Syntax Rules - Bash Notes 1 - 黯羽轻扬

1. Basics

Basic rules of bash scripts:

Bash expands variables at run time, splits the result into a command and its arguments, and then executes that command.

General workflow:

# 1. Create/edit the `.sh` file (the extension is only a convention)
vim test.sh
# 2. Add executable permission
chmod +x test.sh
# 3. Execute
./test.sh

Simple example:

#!/bin/bash
# Declare a variable
str='hoho'
# Print the variable's value
echo "$str"

The first line, #!/bin/bash, gives the path to the interpreter; which bash shows where bash is installed on your system.

P.S. #! is called a shebang (rendered in Chinese as 'shì bàn', which is also roughly how it is pronounced). For more information, see Shebang: The #! symbol on Linux.

2. Variables

1. Environment Variables

HOME    # Absolute path of the current user's home directory
USER    # Current user name
PWD     # Current working directory
# ...
# Run `env` to list the rest

They need no declaration and can be used directly, much like __dirname and __filename in node.

3 ways to create environment variables:

Method 1: add the variable to a bashrc file (system-level /etc/bashrc or user-level ~/.bashrc) to make it permanent; every newly started interactive shell picks it up.
Method 2: set a temporary variable on the command line that runs a script; it is visible only to the process running that script.
Method 3: export a variable; it is visible in the current shell and in every child process started afterwards.

For example:

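# Method 1
# For zsh the file is `~/.zshrc`
# (without export, _ENV would only be a shell variable, not an environment variable)
echo 'export _ENV=product' >> ~/.bashrc
source ~/.bashrc
echo $_ENV

# Method 2
_ENV=product ./test.sh
# Inside ./test.sh, _ENV can be read:
echo $_ENV

# Method 3
_ENV=product; export _ENV
# Start a new shell
bash
echo $_ENV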
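
The visibility rules of the three methods can be checked without touching any rc file. The following is a minimal sketch, assuming bash is on the PATH; the DEMO_* variable names are hypothetical and exist only for this demonstration.

```bash
# Hypothetical variable names, chosen only for this demonstration.
DEMO_PLAIN=plain              # shell variable, not exported
export DEMO_EXPORTED=shared   # environment variable, inherited by child processes

# A child bash process sees only the exported variable.
child_view=$(bash -c 'echo "plain=[$DEMO_PLAIN] exported=[$DEMO_EXPORTED]"')
echo "$child_view"

# An assignment placed in front of a command is visible to that command only.
one_off=$(DEMO_TEMP=once bash -c 'echo "$DEMO_TEMP"')
echo "child saw: $one_off, parent sees: [${DEMO_TEMP:-}]"
```

The child reports an empty value for the plain variable, and the parent never sees DEMO_TEMP.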

2. Global and Local Variables

#!/bin/bash
# Variables are global by default
VAR="global variable"
function fn() {
    VAR="updated global variable"
    # Local variables can only be declared inside a function, with the local keyword
    local VAR="local variable"
    echo $VAR
}
fn
echo $VAR

# Output
local variable
updated global variable

Variables are declared on first assignment and are global by default. Local variables can only be declared inside a function, using the local keyword. Also note:

No spaces are allowed on either side of the equals sign, because bash parses each line as "command, space, arguments".
Quotes are optional; as in CSS, they are only required when the value contains spaces (or other characters that are special to the shell).
There is no hoisting: a local variable exists from its declaration until the function returns, and a global variable exists from its assignment until the script ends.
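
The scope rule has one subtlety in bash that is easy to miss: a local variable is also visible to functions called from the function that declared it. A minimal sketch follows; the function and variable names are hypothetical.

```bash
# Function and variable names here are hypothetical.
inner() {
    # No local declaration here: this sees the caller's local, if there is one.
    echo "inner sees: $msg"
}
outer() {
    local msg="local to outer"
    inner
}

msg="global"
from_outer=$(outer)   # inner is called from outer
from_top=$(inner)     # inner is called from the top level
echo "$from_outer"    # inner sees: local to outer
echo "$from_top"      # inner sees: global
```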

3. Accessing Variable Values

Use $variable_name to get the value, e.g., $VAR.

Variable interpolation rule: variables referenced in double quotes are expanded, while those in single quotes are not, just like in PHP.
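
A short sketch of the interpolation rule; the variable names are only for illustration.

```bash
name='world'
double="hello $name"   # double quotes: $name is expanded
single='hello $name'   # single quotes: kept literally
echo "$double"         # hello world
echo "$single"         # hello $name
```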

{} can isolate variable names, protecting them:

${VAR}abc   # The value of VAR immediately followed by the string abc

Braces are mandatory when accessing array elements, for example:

arr=(aa b ccc)
# Prints aa[1], which is not what we want:
# $arr yields aa (the first element), and the string [1] is appended to it
echo $arr[1]
# Prints b
echo ${arr[1]}

3. Branches and Loops

1. Conditional Statements

if condition    # the test command or the [ ] operator
then
    statements...
else            # "else if" is written elif
    statements...
fi

The condition is usually a test command or the equivalent [ ] form, for example:

if [ $X -lt $Y ];   # X is less than Y
if [ -n "$X" ];     # X is non-empty (string length is not 0); without the quotes an empty X tests as true
if [ -e $path ];    # The file exists

# Numeric comparison
if test 2 -gt 1; then echo "number 2 > 1"; fi
# Equivalent to
if [ 2 -gt 1 ]; then echo "number 2 > 1"; fi

# String comparison (an unescaped > would be parsed as output redirection)
if test 2 \> 11; then echo "string 2 > 11"; fi
# Equivalent to
if [ 2 \> 11 ]; then echo "string 2 > 11"; fi

There are 3 details here: operand types, semicolons, and spaces:

-gt compares numerically, while > compares strings. The operands are not converted for you: -gt reports an error if an operand is not an integer, and > must be escaped as \> inside test or [ ], otherwise bash treats it as output redirection.
; separates the parts of a block when a statement is written on one line. The first semicolon ends the condition and the second ends the then part; neither can be omitted.
Spaces separate a command from its arguments (besides spaces, the default delimiters include tabs and newlines; see IFS below).
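
A minimal sketch of the operand-type and quoting details, assuming bash's built-in [ ]; the variable names are only for illustration. Inside [ ], > has to be written as \> so that bash does not treat it as redirection.

```bash
# Numeric vs. string comparison of the same operands.
if [ 2 -gt 11 ]; then numeric=yes; else numeric=no; fi   # 2 is not greater than 11
if [ 2 \> 11 ]; then string=yes; else string=no; fi      # "2" sorts after "11"
echo "numeric: $numeric, string: $string"

# An unquoted empty variable disappears, leaving [ -n ], which is true.
empty=''
if [ -n $empty ]; then unquoted=true; else unquoted=false; fi
if [ -n "$empty" ]; then quoted=true; else quoted=false; fi
echo "unquoted: $unquoted, quoted: $quoted"
```

The first echo prints "numeric: no, string: yes" and the second prints "unquoted: true, quoted: false", which is why quoting the operand of -n is the safer habit.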

P.S. Common ways to treat a numeric string as a number are ((str)) (or $((str)) to obtain the value), `expr str`, and $(expr str). The first is arithmetic evaluation built into bash; the other two run the external expr command through command substitution.
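
A small sketch comparing these forms, assuming the external expr command is installed (it is part of most systems' core utilities); the variable names are only for illustration.

```bash
a='7'
b='5'
sum=$((a + b))               # arithmetic expansion, evaluated by bash itself
diff=$(expr "$a" - "$b")     # expr is an external program run via command substitution
if (( a > b )); then bigger=a; else bigger=b; fi   # (( )) used as a numeric condition
echo "$sum $diff $bigger"    # 12 2 a
```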

Space examples:

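# Spaces are critical
if [ 1=2 ];         # 1=2 is taken as a single operand (a non-empty string, so the test is true); no operator is seen
# The spaces just inside [ ] are critical too
if [-e $path]; then # Error: command [-e not found, because this is parsed as '[-e' '$path]'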

Other test operators can be checked via the man test command.

bash also provides something similar to switch, but the syntax is very strange:

case $variable in
    pattern1)
        command...
        ;;  # break
    pattern2|pattern3)
        command...
        ;;
    patternN)
        command...
        ;;
    *)  # default case
        command...
esac
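
As a concrete instance of the template above, here is a minimal sketch that classifies a file name by its extension. The function name classify and the categories are hypothetical.

```bash
# Hypothetical helper: classify a file name by its extension.
classify() {
    case "$1" in
        *.html|*.htm)
            echo "markup"
            ;;
        *.sh)
            echo "script"
            ;;
        *)  # default case
            echo "other"
            ;;
    esac
}

classify index.html   # markup
classify test.sh      # script
classify notes.txt    # other
```

The patterns are glob patterns, not regular expressions, and the first matching branch wins.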

2. Loop Statements

There are 3 types of loops: for, while, and until. The syntax rules are as follows:

# for loop
for f in $( ls /var/ ); do
    echo $f
done
# Or on one line (semicolons separate the parts)
for f in $( ls /var/ ); do echo $f; done

# while loop
times=6
while [ $times -gt 0 ]; do
    echo Value of times is: $times
    let times=times-1
done
# One-line form
times=6; while [ $times -gt 0 ]; do echo Value of times is: $times; let times=times-1; done

# until loop
times=0
until [ $times -gt 5 ]; do
    echo Value of times is: $times
    let times=times+1
done

Besides for...in, there is also C-style:

arr=(1 '2 3' 4)
len=3
for (( i=0; i<$len; i++)); do
    echo ${arr[${i}]}
done

The basic rule of loops:

A for...in loop iterates over the words produced by splitting its list on the characters in IFS (' ', '\t', '\n' by default).

IFS stands for Internal Field Separator. Its default value is space, tab, and newline, so watch out for this situation:

for f in $( ls -l /var/ ); do echo $f; done

The output result is not as expected:

total
0
drwx------
2
root
wheel
68
8
23
2015
agentx

It should have been like this:

total 0
drwx------   2 root       wheel        68  8 23  2015 agentx

To read whole lines in the loop, you need to change IFS:

# Split on newlines only (IFS stays changed for the rest of the shell session)
IFS=$'\n'; for f in $( ls -l /var/ ); do echo $f; done
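
An alternative sketch that reads whole lines without changing IFS for the rest of the session: a while loop with read, where IFS is cleared for the read command only. The sample text stands in for the output of ls -l and is made up for illustration.

```bash
# Made-up sample standing in for the output of `ls -l`.
listing=$'total 0\ndrwx------   2 root  wheel  68  8 23  2015 agentx'

count=0
last=''
# IFS= applies to read only; -r keeps backslashes literal.
while IFS= read -r line; do
    count=$((count + 1))
    last=$line
    echo "line $count: $line"
done <<< "$listing"
echo "read $count lines"
```

Because the loop is fed by a here-string rather than a pipe, it runs in the current shell and count is still set after the loop.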

Additionally, loops are often used with the wildcard *, for example:

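# Wildcards
echo *      # Names of all files/directories in the current directory, separated by spaces
echo *.html # All .html files in the current directory

# All html files directly under the test directory
for htmlFile in ~/Documents/projs/test/*.html; do echo "$htmlFile"; done
# All html files under test, including subdirectories
# (in bash, ** only recurses after `shopt -s globstar`, available since bash 4.0)
shopt -s globstar
for htmlFile in ~/Documents/projs/test/**/*.html; do echo "$htmlFile"; done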

Loop + Wildcard is very convenient for operating on directory files.
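
A self-contained sketch of the loop plus wildcard pattern, run against a throwaway directory created with mktemp so that it does not depend on any real project path. The file names are made up for illustration.

```bash
workdir=$(mktemp -d)   # throwaway directory for the demo
touch "$workdir/a.html" "$workdir/b c.html" "$workdir/notes.txt"

found=0
for htmlFile in "$workdir"/*.html; do
    # If nothing matches, bash leaves the pattern as-is; skip that case.
    [ -e "$htmlFile" ] || continue
    found=$((found + 1))
    echo "$htmlFile"
done
echo "found $found html files"

rm -r "$workdir"
```

Looping over the glob directly, with the variable quoted, keeps a name such as "b c.html" in one piece.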

4. Functions

1. Function Declaration

function function_name {
    command...
}

# 或者
function_name () {
    command...
}

If the function keyword is omitted, () must follow the function name; otherwise the name is treated as a command. Note also that bash requires whitespace after { and a ; or newline before }. For example:

# Syntax error near `}': fn is run as a command with { as its first argument
fn { echo fn; }; fn

If the function keyword is present, the () after the function name is optional.

Functions may be declared in any order, as long as each one is declared before the call that reaches it actually runs, for example:

fn1() { echo fn1:`fn2 $1`; }
fn2() { echo fn2:$1; }

fn1 hoho
# Output
# fn1:fn2:hoho

If fn1 is called before fn2 is declared, the call fails because fn2 cannot be found:

# Error: fn2: command not found
fn1() { echo fn1:`fn2 $1`; }; fn1 hoho; fn2() { echo fn2:$1; }

2. Calling and Argument Passing

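Arguments are read through positional parameters; a function does not declare its parameters explicitly:

# Declare the function
fn() { echo $0$1; }
# Call with no arguments
fn
# Pass one string argument, hoho
fn hoho

Inside a function, the following parameters are available (they cannot be assigned to directly):

$0  # Name of the script or shell, not the function name (bash keeps that in ${FUNCNAME[0]}); not an argument, so $* and $@ exclude it and $# does not count it
$n  # The nth argument, counting from 1
$*  # All arguments; "$*" joins them into one string separated by spaces (the first character of IFS)
$@  # All arguments; "$@" keeps each argument as a separate, quoted word
$#  # Number of arguments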

P.S. Note that $10 is not ${10}; the former is $1 followed by the string 0, while the latter is the value of the 10th argument.
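
A quick sketch of the $10 versus ${10} difference; the function name show_tenth is hypothetical.

```bash
# Hypothetical helper that prints both forms.
show_tenth() {
    echo "$10"     # $1 followed by a literal 0
    echo "${10}"   # the tenth argument
}

show_tenth a b c d e f g h i j
# a0
# j
```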

These positional parameters also hold the arguments passed to a script on the command line, for example:

# sub.sh
echo $1-$2=`expr $1 - $2`

# Run from the command line
./sub.sh 1 2
# Output
# 1-2=-1

The difference between $* and $@ matters. Put simply, when double-quoted:

"$*" = "$1 $2 $3..."
"$@" = "$1" "$2" "$3"...

Example:

# Unquoted, there is no difference: both are word-split, so "b c" breaks in two
fn1() { for arg in $*; do echo line:$arg; done; }; fn1 "a" "b c" "d"
fn2() { for arg in $@; do echo line:$arg; done; }; fn2 "a" "b c" "d"
# Output
# line:a
# line:b
# line:c
# line:d

# The difference shows up inside double quotes
fn1() { for arg in "$*"; do echo line:$arg; done; }; fn1 "a" "b c" "d"
# Output
# line:a b c d
fn2() { for arg in "$@"; do echo line:$arg; done; }; fn2 "a" "b c" "d"
# Output
# line:a
# line:b c
# line:d

Inside double quotes the number of iterations differs: "$*" iterates once, while "$@" iterates three times. It is therefore generally recommended to use "$@".
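
The same difference matters when one function forwards its arguments to another. A minimal sketch, with hypothetical function names, that counts how many arguments arrive on the other side:

```bash
# Hypothetical helpers: count_args reports how many arguments it received.
count_args()   { echo "$#"; }
forward_star() { count_args "$*"; }   # everything joined into one argument
forward_at()   { count_args "$@"; }   # arguments forwarded unchanged
forward_bare() { count_args $@; }     # unquoted: "b c" is split in two

forward_star "a" "b c" "d"   # 1
forward_at   "a" "b c" "d"   # 3
forward_bare "a" "b c" "d"   # 4
```

Only the quoted "$@" form hands over exactly the arguments that were passed in.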

3. Return Values

None of the 3 methods is ideal; avoid returning values where you can (modify outer variables directly instead). If you must return a value, the recommended way is to run the function in a subshell and echo the result back, although the caveats (see the code comments) are tedious too. Examples:

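# 1. return
fn() {
    return -2
}
fn
# $? holds the exit status of the last command: 0 means success, non-zero means failure
# The actual output is 254: out-of-range values wrap around, so 256 becomes 0 and -1 becomes 255
echo $?

# Drawbacks: return reports the function's exit status, so it can only carry an integer in [0, 255], never a string
#            Also, `$?` must be read immediately after the function call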


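# 2. Run in a subshell and pass the value back with echo
fn() {
    echo -2
    # Discard error messages so they do not pollute the output
    cat xxx 2> /dev/null
}

echo $(fn)

# Drawbacks: what ends up on standard output may not be clean
#            (the function body may contain more than one `echo`, or other statements such as `print` or `printf` that also write to standard output)
#            Also, if something fails while the function runs, the error text can get mixed in (avoidable, but tedious)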

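# 3. So-called pass by reference
fn() {
    # By convention, the first argument is the name of the variable that receives the result
    local res=$1
    # Compute $2 + $3, then create a global variable with that name to carry the result back
    eval $res=$(($2 + $3))
}

fn result 1 2
echo $result

# Drawback: this is really just passing a value through a global variable; the name is passed in dynamically instead of being hard-coded in the function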
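
Methods 1 and 2 can also be combined: the value travels on standard output while the exit status only signals success or failure. The following is a sketch under that assumption, not the only way to do it; the function name divide is hypothetical.

```bash
# Hypothetical helper: prints the quotient on stdout, reports failure via its status.
divide() {
    if [ "$2" -eq 0 ]; then
        echo "division by zero" >&2   # diagnostics go to stderr, not into the result
        return 1
    fi
    echo $(( $1 / $2 ))
}

# The assignment's exit status is that of the command substitution.
if quotient=$(divide 7 2); then
    echo "ok: $quotient"
else
    echo "failed"
fi

if bad=$(divide 1 0 2>/dev/null); then
    outcome="ok: $bad"
else
    outcome="failed with status $?"
fi
echo "$outcome"
```

The first call prints "ok: 3" (integer division) and the second prints "failed with status 1".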

5. Arrays

1. Declaration and Assignment

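# Empty array
arr=()
# Array of strings; elements are separated by spaces, not commas
arr=(1 2 3 'we together')

# Assign by index; the element is created if it does not exist
arr[0]=4
arr[6]='sixth'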

Array indices start at 0. Indices need not be contiguous when assigning; an element is created if it does not exist yet.
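
Because indices may have gaps, the element count and the highest index can differ. A short sketch using ${!arr[@]}, which expands to the indices that are actually set:

```bash
arr=(1 2 3 'we together')
arr[6]='sixth'            # leaves indices 4 and 5 unset

echo "${#arr[@]}"         # 5: number of elements, not highest index + 1
echo "${!arr[@]}"         # 0 1 2 3 6: the indices that are set
```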

2. Traversal

The for loop doesn't need to know the array length:

arr=(1 2 3 '4 5')
for i in "${arr[@]}"; do echo $i; done

Pay special attention to "${arr[@]}": as with positional parameters in functions, "${arr[*]}" and "${arr[@]}" produce a different number of iterations:

for i in "${arr[@]}"; do echo $i; done
# Output
# 1
# 2
# 3
# 4 5

for i in "${arr[*]}"; do echo $i; done
# Output
# 1 2 3 4 5

while and until need to know the array length:

# Get the array length
len=${#arr[@]}
i=0
while [ $i -lt $len ]; do echo ${arr[$i]}; i=$((i+1)); done
# Or with until (walks the array from the end)
until [ $len -lt 1 ]; do len=$((len-1)); echo ${arr[$len]}; done

Note: ${#str} gets the string length, which is very similar to ${#arr[@]} for array length and easily confused.
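
A side-by-side sketch of the length forms that are easy to mix up; the variable names are only for illustration.

```bash
str='hello'
arr=(aa b ccc)

echo "${#str}"       # 5: characters in the string
echo "${#arr[@]}"    # 3: number of elements in the array
echo "${#arr}"       # 2: length of the first element, "aa"
echo "${#arr[2]}"    # 3: length of the element at index 2, "ccc"
```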

6. Command Substitution

Command substitution refers to executing a shell command within a bash script and obtaining its output (roughly speaking; no strict definition found).

If executed directly, the result is output to standard output (screen). To retrieve the result, command substitution is needed:

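# Run directly: the output of ls goes to the screen
ls
# Command substitution: nothing is printed; the output is stored in a variable
lsResult=`ls`

# This looks like it discards the result (neither printed nor stored)
# In fact it will probably fail: the string that ls produced is itself executed as a command
`ls`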

There are 2 common forms of command substitution, backticks and $( ), for example:

# Backticks: nesting requires escaping the inner backticks
files=`ls`
# $( ): nests without escaping
files="$(ls)"
# Nested example
# Number of files and directories in the current directory, plus 1
echo $(expr $(ls | wc -l) + 1)

An interesting point: both command substitution methods execute commands in a new shell (i.e., a subshell) and have no impact on the current shell, making it easy to isolate the operating environment, for example:

# Run cd in a new environment
lsParent="$(cd ../; ls)"
# Afterwards pwd is unchanged; no need to cd back
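
A small sketch that makes the isolation visible: both the directory change and a variable assignment made inside the substitution are gone once it returns. The variable names are only for illustration.

```bash
start_dir=$PWD
counter=1

# Everything inside $( ) runs in a subshell.
listing=$(cd / && counter=99 && pwd)

echo "$listing"    # /
echo "$counter"    # 1: the assignment inside the subshell did not leak out
[ "$PWD" = "$start_dir" ] && echo "still in the original directory"
```

The flip side is that a subshell cannot hand values back through variables; only its output and exit status reach the parent.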

The rules covered so far are basically enough to write moderately complex (and useful) bash scripts.
