8.1 Shell Scripting Fundamentals

Overview and Objectives

Learning Objectives

After completing this section, you will be able to:

  1. Create properly structured shell scripts with appropriate shebang lines, comments, and execution permissions for production use
  2. Implement conditional logic using if/then/else and case statements to make scripts respond to different situations
  3. Apply boolean operators and file/environment tests to create robust decision-making logic in scripts
  4. Design and implement loops using for, while, and until constructs to automate repetitive tasks
  5. Create and utilize functions to organize script code and create reusable components

Real-world Context

Shell scripting is how you get from “I typed that command again” to “the machine does it for me.” In my three decades of system administration, I’ve seen it repeatedly: the admins who script well spend far less time fire-fighting and far more time on work that actually matters.

At some point, manual command execution stops being viable. You need to check disk space on dozens of servers, roll out config changes across multiple environments, or comb through hundreds of log files before a postmortem. Scripts make that work reliable and repeatable — the same commands, in the same order, every time, without the typos.

The fundamentals covered here transfer directly to more advanced tooling. Configuration management systems, CI/CD pipelines, and cloud automation all use the same constructs: conditionals, loops, functions, and text processing. The syntax changes; the concepts do not.

Script Structure and Foundation Elements

A well-structured script communicates its purpose clearly, executes predictably, and can be maintained by other administrators. The conventions below are what make that happen in practice.

The shebang line tells the kernel which interpreter to use when someone runs the script. #!/bin/bash works on most systems, but #!/usr/bin/env bash is more portable because it searches the PATH for bash rather than assuming a fixed location. This matters when scripts move between Linux distributions or when bash is installed somewhere non-standard.

#!/usr/bin/env bash is the conventional choice for scripts you intend to share or deploy across different systems. If you’re writing a quick personal script on a known machine, #!/bin/bash is fine — but the env form costs nothing and buys you flexibility.

Comments do more than document. They explain complex logic, record assumptions about the environment, and help the person who inherits the script six months from now — which is often you. In team environments where multiple people modify the same automation, a comment explaining why a command is written a certain way is worth more than the command itself.

Here’s how proper script structure looks in practice:

#!/usr/bin/env bash
# Script: system-cleanup.sh
# Purpose: Automated system maintenance and cleanup
# Author: Jochen (Monospace Mentor)
# Version: 1.2
# Last Modified: 2025-01-15

# Set strict error handling
set -euo pipefail

# Global variables
SCRIPT_NAME=$(basename "$0")
LOG_FILE="/var/log/system-cleanup.log"
TEMP_DIR="/tmp"
MAX_LOG_SIZE=100000000  # 100MB in bytes

# Function definitions
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}

# Main script execution starts here
log_message "Starting system cleanup process"

The set -euo pipefail line enables strict error handling so the script stops on unexpected failures rather than silently continuing. Global variables at the top give you one place to change configuration values, and defining log_message early means every subsequent part of the script can use it.

Execution permissions are easy to overlook. chmod +x scriptname.sh makes your script executable, but consider who can modify it — scripts that run with elevated privileges should not be writable by unprivileged users.
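The permission workflow above can be sketched as follows. This is an illustrative demo, not from the section: the file is a throwaway created with mktemp, standing in for a real script.

```shell
# Illustrative sketch: write a script file, make it executable, and
# confirm the bit with a file test before trusting it.
script="$(mktemp)"                       # throwaway file standing in for a real script
printf '#!/usr/bin/env bash\necho ok\n' > "$script"

chmod 755 "$script"                      # rwxr-xr-x: only the owner can modify it

# -x confirms the execute bit took effect for the current user
[ -x "$script" ] && status="executable" || status="not executable"
echo "$status"
```

The 755 mode is the usual choice for shared scripts: everyone can run it, but only the owner can change what it does.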

Conditional Statements for Decision Making

Conditional statements let scripts adapt their behaviour based on what they find at runtime: file existence, user input, system conditions, or environment variables. Without them, every script does the same thing every time regardless of what is actually going on.

Understanding the if Statement

The if statement is the basic building block of conditional logic in bash. It evaluates a test condition and executes a block of commands only when that condition is true:

if [ condition ]; then
    commands
fi

The square brackets [ ] represent the test command, which evaluates the condition and returns an exit status. If the condition is true (exit status 0), bash executes the commands between then and fi. If the condition is false (non-zero exit status), bash skips the command block entirely.

The semicolon after the closing bracket is required when placing then on the same line as if. Alternatively, you can place then on the next line without the semicolon:

if [ condition ]
then
    commands
fi

Extending Logic with else

The else clause provides an alternative path when the if condition fails. This allows your script to handle both positive and negative test results with grace:

if [ condition ]; then
    commands_for_true
else
    commands_for_false
fi

When the condition is true, bash executes the commands after then and skips everything after else. When the condition is false, bash skips the commands after then and executes the commands after else. This ensures exactly one path executes, never both.

Multiple Conditions with elif

The elif (else if) clause allows you to test multiple conditions in sequence. This is more efficient and readable than nested if statements when you need to choose between several mutually exclusive options:

if [ condition1 ]; then
    commands_for_condition1
elif [ condition2 ]; then
    commands_for_condition2
elif [ condition3 ]; then
    commands_for_condition3
else
    commands_for_none_true
fi

Bash evaluates conditions in order and executes the commands for the first true condition it encounters. Once a condition matches, bash skips all remaining elif and else clauses. The final else clause is optional and provides a default action when none of the conditions are true.

Test Conditions and Operators

The condition inside the brackets uses test operators to compare values, check file properties, or evaluate strings. Common test operators include:

  • String comparisons: = (equal), != (not equal), -z (empty), -n (non-empty)
  • Numeric comparisons: -eq (equal), -ne (not equal), -lt (less than), -gt (greater than)
  • File tests: -f (regular file), -d (directory), -e (exists), -r (readable), -w (writable), -x (executable)

Always quote variables in test conditions to prevent errors when variables contain spaces or are empty: [ "$variable" = "value" ] instead of [ $variable = value ].
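A minimal sketch of why the quoting matters, using a hypothetical variable whose value contains a space:

```shell
# A value with a space in it -- the classic trap
filename="my file"

# Quoted: test sees exactly one operand and the comparison behaves correctly
if [ "$filename" = "backup" ]; then
    result="match"
else
    result="no match"
fi
echo "$result"

# Unquoted, [ $filename = "backup" ] would expand to [ my file = backup ],
# which test rejects with a "too many arguments" error.
```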

Here’s a practical example of conditional logic in a backup script:

#!/usr/bin/env bash
# Backup script with intelligent decision making

BACKUP_SOURCE="/home/users"
BACKUP_DEST="/backup/daily"
FREE_SPACE=$(df "$BACKUP_DEST" | tail -1 | awk '{print $4}')
REQUIRED_SPACE=5000000  # 5GB in KB

# Check if source directory exists
if [ ! -d "$BACKUP_SOURCE" ]; then
    echo "Error: Source directory $BACKUP_SOURCE does not exist"
    exit 1
elif [ ! -w "$BACKUP_DEST" ]; then
    echo "Error: Cannot write to backup destination $BACKUP_DEST"
    exit 2
elif [ "$FREE_SPACE" -lt "$REQUIRED_SPACE" ]; then
    echo "Warning: Insufficient disk space for backup"
    echo "Required: ${REQUIRED_SPACE}KB, Available: ${FREE_SPACE}KB"
    exit 3
else
    echo "All conditions met, proceeding with backup"
    tar -czf "$BACKUP_DEST/backup-$(date +%Y%m%d).tar.gz" "$BACKUP_SOURCE"
fi

The script checks three prerequisites before doing any actual work: source directory existence, write permission on the destination, and available disk space. Each gets its own descriptive error message and a distinct exit code, so a calling script or monitoring system can tell exactly which check failed.

Understanding Case Statements

A case statement compares one value against multiple patterns and runs the matching branch. It is cleaner than a chain of elif comparisons when you are matching against specific values rather than ranges or compound conditions.

The basic syntax of a case statement follows this pattern:

case $variable in
    pattern1)
        commands
        ;;
    pattern2)
        commands
        ;;
    *)
        default_commands
        ;;
esac

Case Statement Components

The $variable after case can be any string expression — a variable, command substitution, or literal string. Bash evaluates this expression once and compares it against each pattern in order.

Each pattern can be a literal string, a glob pattern with wildcards (*, ?, [...]), or multiple alternatives separated by pipe symbols (|). Patterns are matched literally, not as regular expressions.

After each pattern and its closing parenthesis come the commands to execute when that pattern matches, terminated by a double semicolon (;;). The ;; tells bash to skip the remaining patterns, preventing the fall-through behaviour found in some other languages.

The * pattern matches anything not caught by earlier patterns and is the default case. It is optional but worth including so unexpected input does not silently fall through. The esac keyword (case spelled backward) closes the statement, just as fi closes an if.

Pattern Matching Features

Case patterns support several matching options:

  • Literal strings: "DEBUG" matches exactly “DEBUG”
  • Wildcards: "*.log" matches any string ending in “.log”
  • Character classes: "[0-9]*" matches strings starting with a digit
  • Multiple patterns: "start"|"restart" matches either “start” or “restart”
  • Negation: !(pattern) matches anything except the pattern (requires shopt -s extglob)

Here’s how case statements work in practice:

# Handle different log levels in a monitoring script
case "$LOG_LEVEL" in
    "DEBUG")
        echo "Debug mode enabled - verbose output"
        set -x
        ;;
    "INFO")
        echo "Normal operation mode"
        ;;
    "WARNING")
        echo "Warning level - reduced output"
        ;;
    "ERROR")
        echo "Error level - critical messages only"
        exec 2>/dev/null  # illustrative only: discards stderr for the rest of the script
        ;;
    *)
        echo "Unknown log level: $LOG_LEVEL"
        echo "Valid options: DEBUG, INFO, WARNING, ERROR"
        exit 1
        ;;
esac

The wildcard pattern (*) at the end handles any value not matched by the specific cases, printing a clear usage message and exiting with an error rather than silently doing nothing.

Boolean Operators and Complex Logic

Boolean operators let you combine conditions so a single if tests multiple criteria at once, rather than writing a chain of nested statements.

The AND Operator (&&)

&& requires all conditions to be true. If the first is false, bash skips the rest — the result is already settled. Use it when multiple things all have to hold before you proceed: a file exists AND you have write permission before you try to modify it.

The OR Operator (||)

|| succeeds if any condition is true. If the first is true, bash stops there. It fits fallback checks — is a process running, or does its PID file exist, before you try to start it.

The NOT Operator (!)

! inverts a condition. ! [ -f "$lockfile" ] is often cleaner than hunting for the right negative flag.

Combining Operators with Parentheses

Parentheses group conditions and control evaluation order. ( condition1 && condition2 ) || condition3 evaluates the parenthesised group first, then applies OR — so the precedence does what you actually mean rather than what the parser defaults to.

Short-Circuit Evaluation

Because && and || stop as soon as the result is determined, the second condition may never run. That’s useful: [ -f "$file" ] && grep -q "pattern" "$file" won’t try to grep a file that doesn’t exist. It can also hide bugs if you rely on a side effect in the second condition — write your tests so that matters.
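The grouping and short-circuit behaviour can be sketched together. The file path and numbers below are illustrative (`/dev/null` is used only because it always exists); note that `( )` runs the group in a subshell, so assignments inside it would not persist:

```shell
# (exists AND under threshold) OR count is zero
file="/dev/null"     # stand-in for a real file; always present
count=5
threshold=10

if ( [ -e "$file" ] && [ "$count" -lt "$threshold" ] ) || [ "$count" -eq 0 ]; then
    status="proceed"
else
    status="skip"
fi
echo "$status"

# Short-circuit guard: the second command only runs if the file test succeeds
[ -e "$file" ] && echo "file present, safe to inspect"
```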

Here’s how boolean operators work in practice:

#!/usr/bin/env bash
# System health check with complex conditions

LOAD_THRESHOLD=2.0
DISK_THRESHOLD=90
MEMORY_THRESHOLD=85

# Get current system metrics
CURRENT_LOAD=$(uptime | awk '{print $(NF-2)}' | sed 's/,//')
DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100}')

# Complex condition checking multiple thresholds
if [ "$(echo "$CURRENT_LOAD > $LOAD_THRESHOLD" | bc)" -eq 1 ] && \
   [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ] || \
   [ "$MEMORY_USAGE" -gt "$MEMORY_THRESHOLD" ]; then
    
    echo "ALERT: System resources critical"
    echo "Load: $CURRENT_LOAD, Disk: ${DISK_USAGE}%, Memory: ${MEMORY_USAGE}%"
    
    # Take corrective action
    if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ] && [ -d "/tmp" ]; then
        echo "Cleaning temporary files..."
        find /tmp -type f -atime +7 -delete
    fi
    
elif [ "$(echo "$CURRENT_LOAD > 1.0" | bc)" -eq 1 ] || [ "$DISK_USAGE" -gt 75 ]; then
    echo "WARNING: System resources elevated"
    echo "Load: $CURRENT_LOAD, Disk: ${DISK_USAGE}%, Memory: ${MEMORY_USAGE}%"
else
    echo "System resources normal"
fi

The first condition combines high load AND high disk usage OR high memory usage — the kind of compound check that would take several nested if statements to express otherwise. The bc command handles the floating-point load average comparison since [ ] only does integer arithmetic.

File and Environment Variable Tests

Scripts need to check file properties and environment conditions before taking action. These tests prevent errors, catch missing prerequisites early, and let scripts adapt to the environment they’re running in.

File Existence and Type Tests

File test operators go well beyond simple existence checks — you can test permissions, file types, size, and modification times.

Basic Existence Tests:

  • -e file: Returns true if the file exists (any type - regular file, directory, device, etc.)
  • -f file: Returns true if the file exists AND is a regular file (not a directory or special file)
  • -d file: Returns true if the file exists AND is a directory
  • -L file: Returns true if the file exists AND is a symbolic link
  • -S file: Returns true if the file exists AND is a socket
  • -p file: Returns true if the file exists AND is a named pipe (FIFO)

File Permission Tests:

  • -r file: Returns true if the file exists and is readable by the current user
  • -w file: Returns true if the file exists and is writable by the current user
  • -x file: Returns true if the file exists and is executable by the current user
  • -O file: Returns true if the file exists and is owned by the current user
  • -G file: Returns true if the file exists and its group matches the current user’s effective group ID

File Property Tests:

  • -s file: Returns true if the file exists and has a size greater than zero
  • -u file: Returns true if the file exists and has the setuid bit set
  • -g file: Returns true if the file exists and has the setgid bit set
  • -k file: Returns true if the file exists and has the sticky bit set

File Comparison Tests

You can compare files directly using special operators:

  • file1 -nt file2: Returns true if file1 is newer than file2 (based on modification time)
  • file1 -ot file2: Returns true if file1 is older than file2
  • file1 -ef file2: Returns true if file1 and file2 refer to the same file (same device and inode)
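The -nt operator is the basis of the rebuild-if-newer pattern: regenerate an output file only when its source has changed. A self-contained sketch, using throwaway mktemp files and a backdated timestamp in place of real build artifacts:

```shell
# Rebuild-if-newer: compare modification times with -nt
src="$(mktemp)"                  # freshly created, so its mtime is "now"
out="$(mktemp)"
touch -t 202001010000 "$out"     # backdate the output to 2020-01-01 00:00

if [ "$src" -nt "$out" ]; then
    action="rebuild"
else
    action="up to date"
fi
echo "$action"
```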

Environment Variable Tests

Environment variable tests let scripts adapt to the environment they run in: checking that required variables are set, comparing their values, and branching accordingly.

Variable State Tests:

  • -z "$VAR": Returns true if the variable is empty or unset (zero length)
  • -n "$VAR": Returns true if the variable is non-empty (non-zero length)
  • [ "$VAR" ]: Shorthand for -n "$VAR", returns true if variable is non-empty

String Comparison Tests:

  • "$VAR" = "value": Returns true if the variable exactly equals the specified value
  • "$VAR" != "value": Returns true if the variable does not equal the specified value
  • "$VAR" \< "value": Returns true if the variable is lexicographically less than the value (the < must be escaped inside [ ], or use [[ ]], since an unescaped < is interpreted as input redirection)
  • "$VAR" \> "value": Returns true if the variable is lexicographically greater than the value (same escaping rule applies)

Numeric Comparison Tests (for variables containing numbers):

  • "$VAR" -eq number: Returns true if the variable equals the number
  • "$VAR" -ne number: Returns true if the variable does not equal the number
  • "$VAR" -lt number: Returns true if the variable is less than the number
  • "$VAR" -le number: Returns true if the variable is less than or equal to the number
  • "$VAR" -gt number: Returns true if the variable is greater than the number
  • "$VAR" -ge number: Returns true if the variable is greater than or equal to the number

Parameter Expansion for Variable Testing

Bash provides special parameter expansion syntax for handling undefined or empty variables:

  • ${VAR:-default}: Returns the value of VAR if set and non-empty, otherwise returns “default”
  • ${VAR:=default}: Like above, but also assigns “default” to VAR if it was unset
  • ${VAR:+alternate}: Returns “alternate” if VAR is set and non-empty, otherwise returns nothing
  • ${VAR:?message}: Returns the value of VAR if set and non-empty, otherwise prints “message” and exits
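The expansions above can be demonstrated in a few lines. The variable names and values here are hypothetical, chosen only to show each form:

```shell
# One unset variable and one set variable to exercise the expansions
unset EDITOR_CHOICE
CONFIG_DIR="/etc/myapp"               # illustrative value

echo "${EDITOR_CHOICE:-vi}"           # unset -> falls back to "vi"
echo "${CONFIG_DIR:-/etc/default}"    # set -> its own value wins

: "${EDITOR_CHOICE:=nano}"            # := also assigns the default to the variable
echo "$EDITOR_CHOICE"                 # now "nano"

echo "${CONFIG_DIR:+configured}"      # :+ returns the alternate only when set
```

The `: "${VAR:=default}"` idiom is a common way to apply the assigning form without otherwise acting on the value, since `:` is a no-op command.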

Important Quoting Considerations

Always quote variables in test conditions. An unquoted variable containing a space expands to multiple words — turning [ $filename = "backup" ] into [ my file = backup ], which test rejects with a "too many arguments" error. Empty variables are just as dangerous: [ $var = "value" ] collapses to [ = "value" ] when var is unset. Writing [ "$variable" = "value" ] prevents both failure modes, and forgetting to do so is one of the most common sources of subtle script bugs.

Here’s a practical example of file testing in a log rotation script:

#!/usr/bin/env bash
# Log rotation script with comprehensive file checks

LOG_FILE="/var/log/application.log"
ARCHIVE_DIR="/var/log/archives"
MAX_SIZE=10485760  # 10MB in bytes

# Comprehensive file and directory checks
if [ ! -e "$LOG_FILE" ]; then
    echo "Log file $LOG_FILE does not exist - nothing to rotate"
    exit 0
elif [ ! -f "$LOG_FILE" ]; then
    echo "Error: $LOG_FILE exists but is not a regular file"
    exit 1
elif [ ! -r "$LOG_FILE" ]; then
    echo "Error: Cannot read $LOG_FILE - check permissions"
    exit 2
fi

# Check archive directory
if [ ! -d "$ARCHIVE_DIR" ]; then
    echo "Creating archive directory: $ARCHIVE_DIR"
    mkdir -p "$ARCHIVE_DIR" || {
        echo "Error: Cannot create archive directory"
        exit 3
    }
elif [ ! -w "$ARCHIVE_DIR" ]; then
    echo "Error: Cannot write to archive directory $ARCHIVE_DIR"
    exit 4
fi

# Check file size
if [ "$(stat -f%z "$LOG_FILE" 2>/dev/null || stat -c%s "$LOG_FILE")" -gt "$MAX_SIZE" ]; then
    echo "Rotating log file (size exceeds ${MAX_SIZE} bytes)"
    mv "$LOG_FILE" "$ARCHIVE_DIR/application-$(date +%Y%m%d-%H%M%S).log"
    touch "$LOG_FILE"
    chmod 644 "$LOG_FILE"
else
    echo "Log file size within limits - no rotation needed"
fi

The stat command uses -f%z on macOS and -c%s on Linux; the 2>/dev/null || stat -c%s pattern handles both. After rotating, the script creates a new empty log file and resets its permissions so the application can resume writing.

Here’s an example of environment variable testing in a deployment script:

#!/usr/bin/env bash
# Deployment script with environment validation

# Required environment variables
REQUIRED_VARS=("DEPLOY_ENV" "APP_VERSION" "DATABASE_URL")

# Check for required variables
for var in "${REQUIRED_VARS[@]}"; do
    if [ -z "${!var}" ]; then
        echo "Error: Required variable $var is not set"
        exit 1
    fi
done

# Validate deployment environment
case "$DEPLOY_ENV" in
    "development"|"staging"|"production")
        echo "Deploying to $DEPLOY_ENV environment"
        ;;
    *)
        echo "Error: Invalid deployment environment: $DEPLOY_ENV"
        echo "Valid environments: development, staging, production"
        exit 2
        ;;
esac

# Check if we're running as the correct user
if [ "$USER" != "deploy" ] && [ "$DEPLOY_ENV" = "production" ]; then
    echo "Error: Production deployments must run as 'deploy' user"
    exit 3
fi

# Proceed with deployment
echo "Environment validated - proceeding with deployment"

The ${!var} syntax is indirect expansion: it reads the value of the variable whose name is stored in var. This lets the loop check each required variable by name without repeating the same if [ -z ... ] block multiple times.

Loop Constructs for Repetitive Operations

Loops are where scripting pays off most obviously. Instead of running the same command by hand on dozens of servers or hundreds of files, you write it once and let the loop handle the rest.

Understanding For Loops

The for loop excels at processing lists of items — files, directories, server names, or user accounts. It assigns each item in turn to a variable, then executes the loop body for that item, making it ideal for batch operations where you know the set of targets in advance:

for variable in list; do
    commands
done

variable holds the current item, list is a space-separated sequence of values, and the do/done keywords bracket the commands to run on each iteration. The list can come from many sources: literal values (for day in Monday Tuesday Wednesday), glob patterns (for file in /var/log/*.log), command substitution (for user in $(cut -d: -f1 /etc/passwd)), brace expansion (for num in {1..10}), or arrays (for item in "${array[@]}").
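Two of those list sources can be seen feeding the same loop shape in a short, self-contained sketch:

```shell
# Brace expansion as the list source
total=0
for num in {1..5}; do
    total=$((total + num))
done
echo "Sum: $total"

# Literal values as the list source, collecting three-letter abbreviations
days=""
for day in Monday Tuesday Wednesday; do
    days="$days${day:0:3} "
done
echo "Days: $days"
```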

Understanding While Loops

While loops continue executing as long as a condition remains true. They’re ideal for monitoring situations, processing data streams, or implementing retry logic — particularly when you don’t know in advance how many iterations you’ll need:

while [ condition ]; do
    commands
done

The condition is evaluated before each iteration. The loop continues as long as it returns true (exit status 0) and terminates when it returns false (non-zero exit status). Common patterns include infinite loops (while true or while :) for continuous monitoring daemons, counter loops that increment a variable until it reaches a limit, line-by-line file processing, and service monitoring that polls until a desired state is reached.
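The counter-loop pattern mentioned above looks like this in its simplest form (the attempt limit is arbitrary):

```shell
# Run a block a fixed number of times, tracking the count ourselves
attempts=0
max_attempts=3

while [ "$attempts" -lt "$max_attempts" ]; do
    attempts=$((attempts + 1))
    echo "Attempt $attempts of $max_attempts"
done
echo "Done after $attempts attempts"
```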

Understanding Until Loops

Until loops provide the opposite logic of while loops — they execute until a condition becomes true, which is useful for waiting operations where you want to keep retrying until success occurs:

until [ condition ]; do
    commands
done

The condition is evaluated before each iteration: the loop continues as long as it returns false (non-zero exit status) and terminates when it returns true (exit status 0). This inverted logic reads naturally for polling scenarios such as waiting for a service to start (until systemctl is-active service), retrying a failed operation, polling for an expected file to appear, or checking network connectivity.
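The wait-for-readiness pattern can be sketched with a stand-in check. The service_ready function below is a simulation that "succeeds" on the third poll, taking the place of a real probe such as systemctl is-active:

```shell
# Simulated polling: pretend the service comes up on the third check
polls=0
service_ready() {
    polls=$((polls + 1))
    [ "$polls" -ge 3 ]    # false on polls 1 and 2, true from poll 3
}

until service_ready; do
    echo "Waiting... (poll $polls)"
done
echo "Service ready after $polls polls"
```

In a real script you would add a sleep between polls and a maximum retry count so the loop cannot spin forever.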

Reading Files Line by Line

A common use of while loops in system administration involves processing files line by line, which lets you handle configuration files, user lists, or any structured text data:

while IFS= read -r line; do
    commands
done < filename

IFS= prevents read from trimming leading and trailing whitespace; -r reads lines literally without interpreting backslash escapes; < filename feeds the file as input to the loop without creating a subshell. For structured files like /etc/passwd, you can set IFS=: to split each line on colons and assign fields to separate variables:

while IFS=: read -r field1 field2 field3; do
    commands
done < filename
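A self-contained sketch of the field-splitting form, using inline passwd-style records (generated here so the example needs no external file; the names are invented):

```shell
# Colon-separated records, one per line, in the style of /etc/passwd
records="alice:Alice Adams:engineering
bob:Bob Brown:finance"

names=""
while IFS=: read -r user fullname dept; do
    names="$names$user "
    echo "$user works in $dept"
done <<< "$records"      # bash here-string feeds the data to the loop
echo "Users: $names"
```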

Loop Control Statements

Two keywords let you alter a loop’s flow mid-execution. break immediately exits the current loop regardless of the loop condition — useful when you’ve found what you need and further iterations are unnecessary. continue skips the remaining commands in the current iteration and jumps to the next one, letting you filter out items you want to ignore without terminating the loop entirely:

for file in *.txt; do
    [ ! -r "$file" ] && continue    # Skip unreadable files
    [ "$(wc -l < "$file")" -eq 0 ] && continue    # Skip empty files
    
    if grep -q "ERROR" "$file"; then
        echo "Found errors in $file"
        break    # Stop processing after first error file
    fi
done

Loop Performance Considerations

Pipes create subshells, which can silently swallow variable changes:

# ❌ INCORRECT: The pipe creates a subshell — counter changes don't persist
counter=0
cat file | while read line; do
    counter=$((counter + 1))
done
echo $counter  # Still 0!

# ✅ CORRECT: Redirect the file directly — the while loop stays in the current shell
counter=0
while read line; do
    counter=$((counter + 1))
done < file
echo $counter  # Shows actual count

For large files, avoid calling external commands inside loops for every line. Where possible, use tools like awk or sed that process the whole file in one pass rather than invoking a new process per iteration.

Here’s a practical for loop example that processes multiple log files:

#!/usr/bin/env bash
# Process multiple log files for error analysis

LOG_DIRECTORY="/var/log/applications"
ERROR_REPORT="/tmp/error-summary.txt"
ERROR_COUNT=0

# Initialize report
echo "Error Analysis Report - $(date)" > "$ERROR_REPORT"
echo "=================================" >> "$ERROR_REPORT"

# Process all log files in the directory
for logfile in "$LOG_DIRECTORY"/*.log; do
    # Skip if no log files match the pattern
    [ ! -f "$logfile" ] && continue
    
    echo "Processing: $(basename "$logfile")"
    
    # Count errors in this file
    file_errors=$(grep -c "ERROR" "$logfile" 2>/dev/null)
    file_errors=${file_errors:-0}  # empty only if the file was unreadable
    ERROR_COUNT=$((ERROR_COUNT + file_errors))
    
    # Add details to report
    echo "" >> "$ERROR_REPORT"
    echo "File: $(basename "$logfile")" >> "$ERROR_REPORT"
    echo "Error count: $file_errors" >> "$ERROR_REPORT"
    
    # Include recent errors if any exist
    if [ "$file_errors" -gt 0 ]; then
        echo "Recent errors:" >> "$ERROR_REPORT"
        tail -50 "$logfile" | grep "ERROR" | tail -5 >> "$ERROR_REPORT"
    fi
done

echo "Analysis complete. Total errors: $ERROR_COUNT"
echo "Report saved to: $ERROR_REPORT"

The [ ! -f "$logfile" ] && continue guard handles the case where the glob pattern matches nothing — without it, the loop would execute once with the literal string *.log as the value. Each iteration counts errors and appends them to the running report; the accumulation happens because the loop body runs in the current shell, not a subshell.

Here's a while loop example that monitors disk space and retries cleanup until usage drops below the threshold:

#!/usr/bin/env bash
# Monitor disk space and wait for cleanup

DISK_THRESHOLD=85
CHECK_INTERVAL=300  # 5 minutes

while true; do
    DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
    
    if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ]; then
        echo "$(date): Disk usage critical: ${DISK_USAGE}%"
        
        # Attempt automatic cleanup
        echo "Attempting automatic cleanup..."
        find /tmp -type f -atime +1 -delete
        find /var/log -name "*.log.*" -mtime +30 -delete
        
        # Wait and check again
        sleep "$CHECK_INTERVAL"
    else
        echo "$(date): Disk usage normal: ${DISK_USAGE}%"
        break
    fi
done

The sleep "$CHECK_INTERVAL" inside the loop prevents the script from hammering the disk with constant df calls. The break fires once disk usage drops back below the threshold, ending the monitoring run cleanly.

Here is the same file-processing pattern applied to user account creation:

#!/usr/bin/env bash
# Process user list for account creation

USER_LIST="/tmp/new-users.txt"
FAILED_USERS="/tmp/failed-creations.txt"

# Clear previous failure log
> "$FAILED_USERS"

# Read user list line by line
while IFS=: read -r username fullname department; do
    # Skip empty lines and comments
    [[ "$username" =~ ^#.*$ ]] && continue
    [ -z "$username" ] && continue
    
    echo "Creating user: $username ($fullname) - $department"
    
    # Create user account
    if useradd -c "$fullname" -m "$username"; then
        echo "Successfully created: $username"
        
        # Set initial password (user must change on first login)
        echo "$username:temppass123" | chpasswd
        passwd -e "$username"
    else
        echo "Failed to create: $username" | tee -a "$FAILED_USERS"
    fi
    
done < "$USER_LIST"

# Report results
if [ -s "$FAILED_USERS" ]; then
    echo "Some user creations failed. See: $FAILED_USERS"
else
    echo "All users created successfully"
fi

IFS=: splits each line on colons, so username, fullname, and department are populated in one read call. The passwd -e call forces a password change on first login — without it, users would log in with the temporary password and have no reason to change it.

Functions for Code Organization and Reusability

Functions turn a script from a long list of commands into something with structure. You give a chunk of logic a name, and then you can call it from anywhere in the script — or test it in isolation. They eliminate repeated code blocks and make the intent of the main script body much easier to read.

Understanding Function Definition

A function definition creates a named command that you can call anywhere in your script. Functions can accept parameters, perform complex operations, and return exit codes to indicate success or failure. Bash supports two equivalent syntaxes:

function_name() {
    commands
}
function function_name {
    commands
}

The function name follows standard variable naming rules (letters, numbers, and underscores). In the first form, the parentheses signal that this is a function definition; in the second, the function keyword serves the same purpose. Either way, the curly braces contain the body — the commands that run each time the function is called.

Function Parameters and Arguments

Functions can accept parameters just like shell scripts, using positional parameters $1, $2, $3, and so on. $@ expands all parameters as separate words (preserving quoting), while $* expands them as a single word. $# gives the number of parameters passed, and $0 holds the script name rather than the function name. A well-written function assigns parameters to named local variables and validates that required arguments are present:

my_function() {
    local param1="$1"
    local param2="$2"
    local param3="${3:-default_value}"
    
    # Check if required parameters are provided
    if [ $# -lt 2 ]; then
        echo "Usage: my_function param1 param2 [param3]" >&2
        return 1
    fi
    
    # Function logic here
}

Practical application: alias functions

The difference between functions and aliases is that functions can take arguments and contain logic. An alias is text substitution; a function runs code. The psg example below takes a search term, pipes it through ps aux, and strips the grep process from the results — something an alias cannot do cleanly.

function psg() {
  ps aux | grep "$1" | grep -v 'grep'
}

Local Variables in Functions

Local variables prevent naming conflicts with the rest of the script and make functions predictable regardless of what the caller has set:

my_function() {
    local var1="value1"
    local var2="$1"
    local result
    
    # Function operations
    result="computed value"
    
    echo "$result"
}

Variables declared without local are global and affect the entire script. Variables declared with local exist only within the function and hide any global variable with the same name. Function parameters ($1, $2, etc.) are automatically local — they do not leak out to the caller.
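The scoping difference can be demonstrated directly. The function and variable names below are hypothetical:

```shell
# One global, two functions: with and without 'local'
count=10

bump_global() {
    count=99          # no 'local': modifies the caller's variable
}

bump_local() {
    local count=0     # shadows the global only inside this function
    count=$((count + 1))
}

bump_local
after_local=$count    # still 10 -- the local change never leaked out

bump_global
after_global=$count   # now 99 -- the global was modified in place
echo "$after_local $after_global"
```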

Function Return Values

Functions can return data two ways: exit codes and output. Use return n to set the exit status (0–255), where 0 means success and any non-zero value means failure. If there is no return statement, the function exits with the status of its last command. To return actual data, write it to stdout with echo or printf and capture it with command substitution (result=$(my_function)). Send error messages to stderr so they do not pollute the captured output.

Here are both methods combined in one function:

validate_and_process() {
    local input="$1"
    
    # Validation
    if [ -z "$input" ]; then
        echo "Error: No input provided" >&2
        return 1
    fi
    
    # Processing
    local processed=$(echo "$input" | tr '[:lower:]' '[:upper:]')
    
    # Return processed data
    echo "$processed"
    return 0
}

# Usage
if result=$(validate_and_process "hello"); then
    echo "Success: $result"
else
    echo "Function failed"
fi

Function Organization and Best Practices

Use descriptive names that clearly indicate the function’s purpose — check_disk_space over check, backup_database over backup, validate_ip_address over validate.

Each function should have one clear purpose. Functions that do one thing well are easier to test and debug. Avoid functions that perform multiple unrelated operations; break complex operations into smaller, focused functions instead.

Functions should handle errors gracefully: validate input parameters, check for required files or permissions, return appropriate exit codes, and send error messages to stderr.

Document complex functions with comments:

# Function: backup_file
# Purpose: Creates a timestamped backup copy of a file
# Parameters: $1 - source file path
# Returns: 0 on success, 1-4 on various error conditions
# Output: Path to backup file on success
backup_file() {
    # Function implementation
}

Here’s a practical example showing function organization in a system maintenance script:

#!/usr/bin/env bash
# System maintenance script with modular functions

# Global configuration
LOG_FILE="/var/log/maintenance.log"
TEMP_CLEANUP_DAYS=7
LOG_RETENTION_DAYS=30

# Logging function
log_message() {
    local level="$1"
    local message="$2"
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$level] $message" | tee -a "$LOG_FILE"
}

# Check disk space function
check_disk_space() {
    local mount_point="${1:-/}"
    local threshold="${2:-90}"
    
    local usage=$(df "$mount_point" | tail -1 | awk '{print $5}' | sed 's/%//')
    
    if [ "$usage" -gt "$threshold" ]; then
        log_message "WARNING" "Disk usage on $mount_point: ${usage}%"
        return 1
    else
        log_message "INFO" "Disk usage on $mount_point: ${usage}% (OK)"
        return 0
    fi
}

# Clean temporary files function
cleanup_temp_files() {
    local temp_dir="${1:-/tmp}"
    local days="${2:-$TEMP_CLEANUP_DAYS}"
    
    log_message "INFO" "Cleaning temporary files older than $days days in $temp_dir"
    
    local count=$(find "$temp_dir" -type f -atime +$days | wc -l)
    
    if [ "$count" -gt 0 ]; then
        find "$temp_dir" -type f -atime +$days -delete
        log_message "INFO" "Cleaned $count temporary files"
    else
        log_message "INFO" "No temporary files to clean"
    fi
}

# Rotate log files function
rotate_logs() {
    local log_dir="${1:-/var/log}"
    local retention_days="${2:-$LOG_RETENTION_DAYS}"
    
    log_message "INFO" "Rotating logs older than $retention_days days in $log_dir"
    
    # Compress logs older than 1 day
    find "$log_dir" -name "*.log" -mtime +1 -exec gzip {} \;
    
    # Remove compressed logs older than retention period
    local deleted_count=$(find "$log_dir" -name "*.log.gz" -mtime +$retention_days | wc -l)
    find "$log_dir" -name "*.log.gz" -mtime +$retention_days -delete
    
    log_message "INFO" "Removed $deleted_count old log files"
}

# Main execution
main() {
    log_message "INFO" "Starting system maintenance"
    
    # Check disk space on critical mount points
    for mount in "/" "/var" "/home"; do
        if [ -d "$mount" ]; then
            check_disk_space "$mount" 85
        fi
    done
    
    # Perform cleanup tasks
    cleanup_temp_files "/tmp" 7
    cleanup_temp_files "/var/tmp" 14
    rotate_logs "/var/log" 30
    
    log_message "INFO" "System maintenance completed"
}

# Run main function
main "$@"

Each function has one job: log a message, check disk space, clean files, or rotate logs. The main function at the bottom just calls them in order. This makes the high-level logic readable without needing to scroll through the implementation details, and makes it possible to test check_disk_space independently by calling it directly.
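One common idiom that makes this kind of testing easy is a source guard: run main only when the script is executed directly, so source healthcheck.sh loads the functions without running anything. This is a general bash pattern rather than something the maintenance script above already contains; greet here is a placeholder function:

```shell
#!/usr/bin/env bash
# Library-style script: function definitions above, guarded entry point below

greet() {
    echo "Hello, ${1:-world}"
}

main() {
    greet "$@"
}

# BASH_SOURCE[0] is this file; $0 is whatever was executed. They match only
# when the script is run directly, so sourcing the file skips main entirely
# and just loads the function definitions for a test harness to call.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```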

Here’s an example of a function that returns both data and an exit code:

# Function that validates and normalizes IP addresses
validate_ip() {
    local ip="$1"
    local normalized_ip
    
    # Check basic format
    if [[ ! "$ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
        return 1
    fi
    
    # Check each octet is valid (0-255); the regex already rules out
    # negatives, so only the upper bound needs testing
    IFS='.' read -ra octets <<< "$ip"
    for octet in "${octets[@]}"; do
        if [ "$octet" -gt 255 ]; then
            return 2
        fi
    done

    # Normalize by removing leading zeros. Force base 10 with 10#:
    # printf "%d" treats a leading zero as octal and rejects octets
    # such as 08 or 09 outright.
    normalized_ip="$((10#${octets[0]})).$((10#${octets[1]})).$((10#${octets[2]})).$((10#${octets[3]}))"
    echo "$normalized_ip"
    return 0
}

# Usage example
if result=$(validate_ip "192.168.001.100"); then
    echo "Valid IP address: $result"
else
    echo "Invalid IP address format"
fi

The if result=$(validate_ip "192.168.001.100"); then line does two things at once: it captures the function’s output into result and tests its return code. If validation fails, result is empty and the else branch runs. This pattern — output for data, return code for status — is the standard approach for functions that need to communicate both.
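When a function defines several distinct failure codes, as validate_ip does, a case statement on $? extends the same pattern beyond a plain if/else. A sketch (classify is an illustrative wrapper name; the validation logic mirrors the function above):

```shell
#!/usr/bin/env bash

# Same contract as validate_ip above: normalized address on stdout,
# return 1 for a malformed string, 2 for an out-of-range octet
validate_ip() {
    local ip="$1"
    [[ "$ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]] || return 1
    local octets octet
    IFS='.' read -ra octets <<< "$ip"
    for octet in "${octets[@]}"; do
        [ "$octet" -le 255 ] || return 2
    done
    # 10# forces base 10 so leading zeros are not parsed as octal
    echo "$((10#${octets[0]})).$((10#${octets[1]})).$((10#${octets[2]})).$((10#${octets[3]}))"
}

classify() {
    local result
    result=$(validate_ip "$1")
    case $? in                     # branch on the return code just captured
        0) echo "Normalized: $result" ;;
        1) echo "Malformed address" ;;
        2) echo "Octet out of range" ;;
    esac
}

classify "192.168.001.100"   # Normalized: 192.168.1.100
classify "10.0.0"            # Malformed address
classify "300.1.2.3"         # Octet out of range
```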

Common Pitfalls

Quoting variables. This is the most common source of subtle bugs. Always quote variable references in test conditions: [ "$variable" = "value" ] not [ $variable = value ]. An unquoted variable containing a space splits into two words and triggers an error such as "too many arguments"; an unquoted empty variable disappears entirely, leaving a malformed test like [ = value ].
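Both failure modes, side by side (check_answer is a throwaway name for illustration):

```shell
#!/usr/bin/env bash

check_answer() {
    local answer="$1"
    # Quoted: an empty $answer is still one (empty) argument, so the
    # test is well-formed. Unquoted, [ $answer = "yes" ] would expand
    # to [ = "yes" ] and fail with "unary operator expected".
    if [ "$answer" = "yes" ]; then
        echo "matched"
    else
        echo "no match"
    fi
}

check_answer "yes"     # matched
check_answer ""        # no match

name="Alpha Server"
# Unquoted, $name splits into two words:
#   [ Alpha Server = "Alpha Server" ]   -> "too many arguments"
[ "$name" = "Alpha Server" ] && echo "quoted comparison works"
```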

Exit status is inverted from most languages. In bash, 0 means success and non-zero means failure — the opposite of what most programmers expect. In a test condition, 0 (success) evaluates as true. Keep this straight or your if logic will be backwards.
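A quick way to internalize this, using true and false, two commands whose only job is to exit 0 and 1 respectively (status_of is an illustrative helper):

```shell
#!/usr/bin/env bash

# Run any command and report its exit status
status_of() {
    "$@"
    echo "exit status: $?"
}

if true; then
    echo "exit 0 is true"        # this branch runs
fi

if ! false; then
    echo "exit 1 is false"       # ! inverts the status, so this runs too
fi

status_of true     # exit status: 0
status_of false    # exit status: 1
```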

Pipes create subshells. Variable assignments inside a piped loop do not persist after the loop finishes. Use input redirection (done < file) rather than a pipe to keep the loop in the current shell.
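A self-contained demonstration (printf stands in for a real input file):

```shell
#!/usr/bin/env bash

count=0

# Piped loop: 'while' runs in a subshell, so its increments vanish
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
echo "after pipe: $count"        # after pipe: 0

# Redirected loop: runs in the current shell, so the count survives.
# 'done < file' behaves the same way; process substitution is used
# here only to keep the example self-contained.
count=0
while read -r line; do
    count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "after redirect: $count"    # after redirect: 3
```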

Function variables are global by default. Any variable set inside a function without local modifies the global scope. This causes hard-to-trace bugs when a function silently overwrites a variable the caller depends on. Always use local unless global scope is intentional.

Unhandled failures. Commands fail for many reasons: missing files, wrong permissions, exhausted resources. Use set -e to exit on unexpected errors, check return codes for critical operations, and write error messages that say what went wrong and where.

Untrusted input. Validate and quote variables that come from user input or external sources. Never pass untrusted input to eval. When creating temporary files in world-writable directories like /tmp, use mktemp to avoid predictable filenames that attackers could exploit.
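A minimal sketch of the mktemp pattern, paired with a trap so the file is removed even if the script exits early (the template name is arbitrary):

```shell
#!/usr/bin/env bash

# mktemp creates the file with an unpredictable name and 0600 permissions;
# the XXXXXX template is replaced with random characters
tmpfile=$(mktemp /tmp/healthcheck.XXXXXX)

# Delete the file on any exit: normal completion, set -e, or most signals
trap 'rm -f "$tmpfile"' EXIT

date > "$tmpfile"
echo "scratch data written to $tmpfile"
```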

Further Reading

  1. “Classic Shell Scripting” by Arnold Robbins and Nelson Beebe - Covers bash scripting fundamentals with a strong focus on portable scripting and real-world system administration examples. Good for building habits that hold up across different Unix environments.

  2. “Learning the bash Shell” by Cameron Newham - A thorough walkthrough of bash features that progresses from basics to advanced scripting. Useful if you want to understand why bash works the way it does, not just how to use it.

  3. “The Linux Command Line” by William Shotts - Chapters 24–37 cover shell scripting from fundamentals through more advanced techniques. A free version is available at linuxcommand.org.

  4. “Bash Pocket Reference” by Arnold Robbins - A slim reference for syntax, built-in commands, and scripting patterns. Handy to keep open while writing scripts.

  5. Advanced Bash-Scripting Guide — A sprawling online reference that covers bash in great depth. Not the best starting point, but useful when you need to understand a specific edge case.

Lab Exercises

Lab Overview

You will build a system health monitoring script called healthcheck.sh across three exercises. Each exercise adds one layer to the script, so every piece of code you write stays in the final version. By the end you will have a working tool that checks disk space across multiple mount points and waits for a remote host to become reachable — something you could actually drop into a server’s cron schedule.

Create the script file before you start:

touch ~/healthcheck.sh
chmod +x ~/healthcheck.sh

Lab Exercise 1: Decision Logic

Open ~/healthcheck.sh in your editor. Add the shebang line and a function that checks disk usage on a given mount point and reports its status:

#!/usr/bin/env bash
# healthcheck.sh — system health monitoring script

WARN_THRESHOLD=${WARN_THRESHOLD:-80}
CRIT_THRESHOLD=${CRIT_THRESHOLD:-90}

check_disk() {
    local mount="$1"
    local usage

    if [ ! -d "$mount" ]; then
        echo "UNKNOWN: $mount is not a valid mount point"
        return 3
    fi

    usage=$(df "$mount" | tail -1 | awk '{print $5}' | sed 's/%//')

    if [ "$usage" -ge "$CRIT_THRESHOLD" ]; then
        echo "CRITICAL: $mount is ${usage}% full"
        return 2
    elif [ "$usage" -ge "$WARN_THRESHOLD" ]; then
        echo "WARNING: $mount is ${usage}% full"
        return 1
    else
        echo "OK: $mount is ${usage}% full"
        return 0
    fi
}

check_disk "/"
check_disk "/tmp"
check_disk "/nonexistent"

Run the script and observe the output:

~/healthcheck.sh

Test the threshold logic by temporarily lowering the warning threshold and re-running:

WARN_THRESHOLD=5 ~/healthcheck.sh

Verification steps:

  • The /nonexistent check prints UNKNOWN without crashing the script
  • Lowering the threshold causes at least one WARNING line to appear
  • Each check_disk call returns the code (0, 1, 2, or 3) that matches the status it printed; confirm with echo $? after a single call

Lab Exercise 2: Loops

Replace the three standalone check_disk calls at the bottom of your script with a for loop that checks a list of mount points, and add an until loop that waits for a host to respond to ping:

# Check all mount points
MOUNT_POINTS=("/" "/tmp" "/var")

for mount in "${MOUNT_POINTS[@]}"; do
    check_disk "$mount"
done

# Wait for a host to become reachable
HOST="${HOST:-8.8.8.8}"
MAX_ATTEMPTS=${MAX_ATTEMPTS:-5}
attempt=0

until ping -c1 -W1 "$HOST" &>/dev/null; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
        echo "CRITICAL: $HOST unreachable after $MAX_ATTEMPTS attempts"
        break
    fi
    echo "Waiting for $HOST... (attempt $attempt)"
    sleep 2
done

if ping -c1 -W1 "$HOST" &>/dev/null; then
    echo "OK: $HOST is reachable"
fi

Run the script and watch both sections execute:

~/healthcheck.sh

To test the retry logic without waiting for a real timeout, try an address you know is unreachable on your network and reduce MAX_ATTEMPTS to 2:

HOST="192.0.2.1" MAX_ATTEMPTS=2 ~/healthcheck.sh

Verification steps:

  • All mount points in MOUNT_POINTS are checked, including /var if it exists
  • With an unreachable host, the loop exits cleanly after MAX_ATTEMPTS attempts and prints the CRITICAL message
  • No infinite loop occurs — the break fires correctly

Lab Exercise 3: Functions

Extract the host-reachability logic into a reusable function called check_host. Place it alongside check_disk near the top of the script, before the main body:

check_host() {
    local host="$1"
    local max_attempts="${2:-5}"
    local attempt=0

    until ping -c1 -W1 "$host" &>/dev/null; do
        attempt=$((attempt + 1))
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "CRITICAL: $host unreachable after $max_attempts attempts"
            return 2
        fi
        echo "Waiting for $host... (attempt $attempt)"
        sleep 2
    done

    echo "OK: $host is reachable"
    return 0
}

Replace the inline retry block at the bottom of the script with a call to the new function. Use local variables inside it and a default argument for max_attempts:

# Check all mount points
for mount in "${MOUNT_POINTS[@]}"; do
    check_disk "$mount"
done

# Check host reachability (3 attempts max)
check_host "8.8.8.8" 3

Run the complete script one final time:

~/healthcheck.sh

Verify that the function’s return code works correctly by temporarily replacing the check_host "8.8.8.8" 3 call at the bottom of healthcheck.sh with this conditional block, then running the script:

if check_host "8.8.8.8" 3; then
    echo "Network check passed"
else
    echo "Network check failed"
fi

Verification steps:

  • The check_host function accepts both one and two arguments — the second has a working default
  • Calling check_host with an unreachable address returns exit code 2
  • The main script body is now a clean list of function calls with no inline logic

Troubleshooting Guide

Issue: df output format differs from expected

  • Cause: Some systems add extra lines for filesystem headers or use different column ordering
  • Solution: Confirm the output of df / manually and adjust the awk '{print $5}' field number if needed
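On GNU coreutils, df can also print just the use% column directly, which sidesteps field numbering entirely. This is GNU-only; BSD and macOS df have no --output option:

```shell
# GNU df: emit only the use% column for one filesystem, then strip
# the header and the % sign, leaving a bare number
usage=$(df --output=pcent / | tail -1 | tr -dc '0-9')
echo "root filesystem usage: ${usage}%"
```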

Issue: ping flags not recognised

  • Cause: macOS and some Linux distributions use different ping option syntax
  • Solution: Try ping -c1 -t1 (macOS) or ping -c1 -W1 (Linux); check man ping for your system

Issue: The until loop runs forever

  • Cause: The break statement is missing or placed outside the loop body
  • Solution: Check indentation carefully — the if [ "$attempt" -ge "$MAX_ATTEMPTS" ] block and break must be inside the until … done block

Issue: WARN_THRESHOLD override in environment doesn’t take effect

  • Cause: Variables defined inside the script shadow the environment at assignment time
  • Solution: Define thresholds at the top of the script with default-value expansion, WARN_THRESHOLD=${WARN_THRESHOLD:-80}, which keeps a value exported in the environment; an unconditional WARN_THRESHOLD=80 overwrites it

Assessment

Multiple Choice Questions

Question 1: What is the purpose of the shebang line in a shell script?

  • a) To specify which interpreter should execute the script
  • b) To add comments to the script
  • c) To make the script executable
  • d) To set environment variables

Question 2: Which conditional structure is most appropriate for comparing a single variable against multiple specific values?

  • a) if/then/else
  • b) while loop
  • c) case statement
  • d) for loop

Question 3: What does the test condition [ -f "$filename" ] check for?

  • a) If the file exists and is executable
  • b) If the file exists and is a directory
  • c) If the file exists and is readable
  • d) If the file exists and is a regular file

Question 4: In the condition [ "$disk_usage" -gt 90 ] && [ -w "$log_file" ], when will the overall condition be true?

  • a) When either condition is true
  • b) When both conditions are true
  • c) When the first condition is false
  • d) When exactly one condition is true

Question 5: Which loop type would be most appropriate for processing each file in a directory?

  • a) for loop
  • b) until loop
  • c) while loop
  • d) case statement

Question 6: What is the difference between local variable="value" and variable="value" inside a function?

  • a) There is no difference
  • b) Local variables are read-only
  • c) Local variables are only accessible within the function
  • d) Local variables cannot be passed to other functions

Question 7: Which command makes a shell script executable?

  • a) sudo script.sh
  • b) exec script.sh
  • c) run script.sh
  • d) chmod +x script.sh

Question 8: What does the condition [ -z "$variable" ] test for?

  • a) If the variable is empty or unset
  • b) If the variable contains only zeros
  • c) If the variable is a number
  • d) If the variable contains spaces

Question 9: In a while loop, when does the loop stop executing?

  • a) When the condition becomes true
  • b) After a fixed number of iterations
  • c) When the condition becomes false
  • d) When the script encounters an error

Question 10: What is the recommended shebang line for maximum portability across different systems?

  • a) #!/bin/bash
  • b) #!/usr/bin/env bash
  • c) #!/bin/sh
  • d) #!/usr/local/bin/bash

Short Answer Questions

Question 11: Explain the difference between a while loop and an until loop. Provide a scenario where each would be most appropriate.

Question 12: Write a function called backup_file that takes a filename as a parameter, checks if the file exists and is readable, and creates a backup copy with a timestamp. Include proper error handling.

Question 13: Describe three different ways to handle multiple conditions in shell scripts. Give an example of when you would use each approach.

Updated 2026-03-10