
Common Simple Commands - Bash Notes 2

Free · 2017-02-18 · #Tool #shell find #shell count lines #shell split logs #shell find files #shell md5

Notes on commonly used simple shell commands.

cat(concatenate)

Read, display, and concatenate file contents.

Concatenate content from standard input with file content:

echo 'from stdin' | cat - test.sh

The cat command uses - to represent standard input.

If file is a single dash (`-`) or absent, cat reads from the standard input.

Other commonly used options:

# Add line numbers to file content
cat -n test.sh
# Squeeze multiple consecutive empty lines into one
cat -s test.sh
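A quick sanity check of the two options (the file name demo.txt is made up for the demo):

```shell
# Create a throwaway file containing a run of blank lines
printf 'first\n\n\n\nsecond\n' > demo.txt
# -s squeezes the run of blank lines down to a single blank line
cat -s demo.txt
# -n numbers every line of the output
cat -n demo.txt
```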

find

Basic Rules

Traverse downwards from a directory, matching and processing files that meet conditions.

# List all files/folders and sub-files/folders in the current directory
find .
# Separate with \0 (useful when file paths contain newlines)
find . -print0

# Wildcards
find -name "*.js"
# Ignore case
find -iname "*.js"
# Multiple conditions
find . \( -name "e*" -o -name "s*" \)
# Path matching
find ../tnode -path "*node*"
# Same as -path, but the argument is a regular expression
find . -regex ".*/e.*h$"
# Ignore case
find . -iregex ".*/e.*h$"

# Negation argument (an independent argument, can be used with -name/path/regex etc.)
find . ! -iregex ".*/e.*h$"
# For example, exclude paths containing node_modules
find ../tnode ! -regex ".*node_modules.*"

# Specify directory depth; -maxdepth 1 means looking one level down (i.e., children of .., not grandchildren)
# Putting the depth options first also avoids a warning from GNU find about option order
find .. -maxdepth 1 -name "*.js"
# You can also specify a starting depth; -mindepth 2 -maxdepth 2 searches only the grandchildren of .., not its children or great-grandchildren
find .. -mindepth 2 -maxdepth 2 -name "*.js"
# Use -mindepth alone to find files beyond a given depth (e.g. deeply nested libs)
find .. -mindepth 20 -regex ".*node_modules.*\.js$"

Search by File Type

# Specify file/folder; -type f means outputting files only
find ../tnode ! -regex ".*node_modules.*" -type f

P.S. The order of arguments affects search efficiency; for instance, checking depth before filtering by type is faster.

Mapping between file types and type argument values:

Regular file: f
Symbolic link: l
Directory: d
Character device: c
Block device: b
Socket: s
FIFO: p
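For instance, restricting a listing to directories only (sub_a and sub_b are hypothetical names):

```shell
mkdir -p sub_a sub_b
touch sub_a/note.txt
# -type d keeps only directories; swap in -type f to keep only regular files
find . -maxdepth 1 -type d
```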

Search by Time

Each file has 3 types of timestamps:

Access time: -atime
Modification time: -mtime
Change time: -ctime

The argument value is an integer representing days, which can be prefixed with + or - to signify greater than or less than. For example:

# Find files in the parent directory accessed from yesterday until now
find .. -type f -atime -1

There are also units in minutes:

# -amin, -mmin, -cmin
find .. -type f -amin $((-1 * 60 * 24))

You can also specify a file as a reference to find newer files (those with more recent modification times):

# Find files in the parent directory newer than ~/.bash_profile
find .. -type f -newer ~/.bash_profile
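A self-contained sketch of -newer (the file names are made up; the sleep keeps the two modification times distinct):

```shell
touch old.txt
sleep 1
touch new.txt
# Only files modified after old.txt are printed
find . -maxdepth 1 -type f -newer old.txt
```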

Search by File Size

# Files in the current directory larger than 1K
find . -type f -size +1k

Supports b (512-byte blocks), c (bytes), w (two-byte words), k, M, and G as units. Note that the first four are lowercase while M and G are uppercase; the same convention appears in other commands such as split.
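A sanity check of the size test (big.bin and small.txt are invented names):

```shell
# Write exactly 2048 bytes so the file exceeds the 1k threshold
dd if=/dev/zero of=big.bin bs=1024 count=2 2>/dev/null
touch small.txt
# +1k matches big.bin but not the empty small.txt
find . -maxdepth 1 -type f -size +1k
```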

Other Usages

# Find and delete
find . -type f -name "*.tmp" -delete
# Match file permissions
find . -type f -perm 777 -print
find . -type f -user ayqy

Combine with -exec to execute other commands:

# Find and format output
find . -type f -exec printf "file: %s\n" {} \;
# Find and back up
find . -type f -mtime +7 -exec cp {} bak/ \;

P.S. The escaped semicolon at the end is used to indicate the end of the -exec argument value and is mandatory.
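Besides `\;`, find also accepts `+` as the terminator, which packs many paths into a single invocation (similar to xargs) instead of forking once per file; a small sketch with made-up file names:

```shell
touch a.sh b.sh
# With `+`, ls -l runs once and receives both paths on one command line;
# with `\;` it would be run separately for every match
find . -maxdepth 1 -name "*.sh" -exec ls -l {} +
```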

-exec can only execute a single command. To execute multiple commands, write them into a file and execute that instead. For example, write the backup command into bak.sh:

#!/bin/bash
BAK_DIR=bak

if ! test -e "$BAK_DIR"; then
    mkdir "$BAK_DIR"
fi

for file in "$@"; do
    # Quote $file so paths containing spaces survive word splitting
    cp "$file" "$BAK_DIR"
done

Then find and execute:

find . -type f -mtime +7 -exec ./bak.sh {} \;

-prune to exclude things from the search:

# Skip .git and node_modules directories
find . \( -name ".git" -prune \) -o \( -name "node_modules" -prune \) -o \( -type f -print \)

xargs

The xargs command reshapes data read from stdin into arguments for another command, so it typically appears right after a pipe. Basic form:

cmd | xargs

Convert multiline input into single-line output:

# Replace newlines with spaces
cat test.sh | xargs

Convert single-line input into multiline output:

# Break by the number of arguments per line
echo "1 22 3 4 5 6 7" | xargs -n 3

-d specifies the delimiter to perform string split:

# split
echo "1,2,3,4" | xargs -d ,
# The `-d` argument is a GNU extension and is not available on FreeBSD or macOS; achieve the same effect with other tools
echo "1,2,3,4" | tr , ' '

-I specifies the replacement string:

# replace; the first xargs puts one argument per line, because -I splits its input on newlines (GNU xargs also ignores -n when -I is given)
echo "1 2 3 4" | xargs -n 1 | xargs -I {} find {}.txt

find combined with xargs:

# Find and delete
find . -type f -name "*.tmp" -print0 | xargs -0 rm -f

Here, -print0 and xargs -0 use \0 as the delimiter to prevent filenames like temp file.tmp, which contain the default delimiter, from being split into two arguments.
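A quick demonstration of why the \0 pairing matters (the file name is invented and contains a space on purpose):

```shell
touch "temp file.tmp"
# Without -print0/-0, xargs would split the name into `temp` and `file.tmp`
find . -maxdepth 1 -name "*.tmp" -print0 | xargs -0 rm -f
```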

Count lines of code:

find . -type f -name "*.sh" -print0 | xargs -0 wc -l

Execute multiple commands for a single argument:

# Same effect as the replace above (printf is used because plain echo does not expand \n in bash)
printf '1\n2\n3\n4\n' | (while read arg; do find "$arg.txt"; done)

xargs can only execute one command per argument. By using a loop inside a subshell to read instead, you can execute multiple commands within the loop body.

P.S. The parentheses here are the subshell operators, which open a subshell to execute the commands inside; they are not the conditional grouping mentioned earlier, so do not escape them.

tr(translate)

Replace, delete, and squeeze characters from standard input for string processing.

#  Case conversion
echo 'Ho Hoho hoho' | tr 'a-z' 'A-Z'

If the two character sets are of different sizes, the latter set is padded with its last character. For example:

# The result is ABC XXX
echo 'abc xyz' | tr 'a-z' 'A-X'

P.S. A character set is written as start-end. If start-end does not form an ascending sequence, the three characters (the start, the dash, and the end) are treated as ordinary literals.

Note: tr only maps each input character; it does not perform string matching or replacement. It is a character-level operation, not a string-level one.
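A classic illustration of this character-level mapping is ROT13, which rotates every letter 13 places (applying it twice restores the input):

```shell
# h→u, e→r, l→y, o→b
echo 'hello' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
# → uryyb
```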

Other common options:

# -d to delete characters
# Result is a, a , 579
echo 'hohoa, hoa 123, 4579' | tr -d 'ho0-4'
# -c to get the complement set, usually combined with -d to delete characters in the complement set, keeping only those in the specified character set
# Result is hohoho1234
echo 'hohoa, hoa 123, 4579' | tr -d -c 'ho0-4'
# -s to squeeze characters (replace consecutive duplicate characters with one)
# Result is ha, ha
echo 'hhhhhha, ha' | tr -s 'a-z'

Use character classes as sets:

# Case conversion
echo '124abcX1' | tr '[:lower:]' '[:upper:]'

Other character classes can be viewed via man tr.

md5sum, sha1sum

These two commands are used to calculate checksums. For example:

# Calculate the MD5 of a file
# The result is `32-character hexadecimal string filename`
md5sum test.sh

P.S. macOS does not have md5sum or sha1sum by default; they need to be installed separately.

Verify using an MD5 file:

# Use an MD5 file to check if files are correct
md5sum -c file.md5
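End to end, generation and verification look like this with GNU coreutils (data.txt is a made-up name; as noted above, macOS needs md5sum installed separately):

```shell
echo 'payload' > data.txt
# Record the checksum, then verify the file against it
md5sum data.txt > data.md5
md5sum -c data.md5    # prints `data.txt: OK` while the file is intact
```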

Use md5deep to generate the MD5 of a folder; it needs to be installed separately (same for sha1deep):

# Install via yum
yum install md5deep
# Calculate the MD5 of a folder
# -r for recursive, -l to generate relative paths (default is absolute paths)
md5deep -rl dir > dir.md5
# Verify against the generated MD5 file
md5sum -c dir.md5

sort & uniq

The sort command sorts lines, and uniq removes duplicates. They are generally used together. For example:

# Sort each line of file.txt lexicographically and remove duplicates
sort file.txt | uniq
# Or
sort -u file.txt

Sorts in ascending lexicographical order by default; -n sorts numerically, and -r sorts in reverse:

# With -n, lines that do not start with a number are treated as 0, so letters sort before positive numbers
sort -n file.txt
sort -r file.txt

Other common options:

# Check if file content is ordered; use -nC for numerical order
# A return value of 0 indicates it is ordered
sort -C file.txt; echo $?
# Sort by the second column
sort -k 2 file.txt
# Sort by characters 2 through 5 of the first field (the key syntax is FIELD.CHAR)
sort -k 1.2,1.5 file.txt
# Use \0 as the delimiter (useful when combining with other commands via pipes)
sort -z file.txt
# Ignore leading whitespace characters
sort -b file.txt

The uniq command can only be used on ordered input, so it is generally used in combination with sort:

# Show only unique lines (lines appearing more than once are filtered out)
uniq -u sorted.txt
# Count occurrences of each line
uniq -c sorted.txt
# Find duplicate lines
uniq -d sorted.txt
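A classic application of the pair is a frequency count, here over a made-up words.txt:

```shell
printf 'apple\nbanana\napple\ncherry\napple\n' > words.txt
# sort groups duplicates together, uniq -c counts each group,
# and sort -rn ranks the groups by count, highest first
sort words.txt | uniq -c | sort -rn
```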

De-duplication can also specify a key:

# -s to skip the first few characters, -w to specify the length of the key
uniq -s 3 -w 2 sorted.txt

P.S. macOS does not have the -w option.

split

The split command is used to split large files. For example:

# Split data.txt into multiple 1k files
split -b 1k data.txt

By default, it generates filenames like xaa, xab, xac, etc., and strictly splits by size. Lines may be truncated, or even a Chinese character could be split.

Generated filenames can be manually specified; the last argument is the prefix (default is x). -a specifies the suffix length. Check man split for other options.

You can also split files by line count:

# 10 lines per file, with filenames `small.aa, small.ab, small.ac...`
split -l 10 test.sh 'small.'
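Since split only cuts, cat can put the pieces back together; a round-trip sketch with invented names:

```shell
seq 1 25 > data.txt
# 25 lines at 10 lines per piece → small.aa, small.ab, small.ac
split -l 10 data.txt 'small.'
# The glob expands in lexicographic order, which matches the split order
cat small.* > rejoined.txt
cmp data.txt rejoined.txt && echo 'identical'
```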

P.S. I didn't know this command existed; I once searched for a text editor capable of editing large files just to split an sql backup manually...

P.S. Another more powerful file splitting command is csplit, often used to split log files, which can split based on the presence of specific text content.
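A minimal csplit sketch (GNU csplit; app.log and the `---` marker are invented): each match of the pattern starts a new piece, and `{*}` repeats the match as many times as possible.

```shell
printf 'start\n---\nsection one\n---\nsection two\n' > app.log
# -s silences the per-piece byte counts; -f sets the output prefix
csplit -s -f part. app.log '/^---$/' '{*}'
ls part.*
```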

Other Tips

Temporary File Naming

Ubuntu and Debian have the tempfile command to generate temporary filenames (a random string), and mktemp is widely available elsewhere. Failing both, you can build a name from the RANDOM environment variable or the current process ID:

# Print the value of the RANDOM environment variable
echo $RANDOM
# Print the current process ID
echo $$
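A sketch combining the two approaches (the myapp prefix is arbitrary):

```shell
# mktemp creates the file atomically and prints its name
tmp1=$(mktemp)
# Fallback: compose a name by hand from the PID and $RANDOM (bash/ksh/zsh)
tmp2="/tmp/myapp.$$.$RANDOM"
echo "$tmp1"
echo "$tmp2"
rm -f "$tmp1"
```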

String Extraction

The %, %%, #, ## operators provide powerful string extraction features:

file=logo.png
# Extract the filename without its extension
filename=${file%.*}
echo filename:$filename
# Extract extension
ext=${file##*.}
echo ext:$ext

Usage is as follows:

# Delete the shortest string matching the wildcard to the right of % from the end of var's value, matching from right to left
${var%.*}
# %% is greedy, deleting the longest match, while % deletes the shortest
${var%%.*}

# Delete the string matching the wildcard to the right of # from the value of var, matching from left to right
${var#*.}
# Corresponding greedy matching
${var##*.}

Extracting the extension should use ## greedy matching, as filenames like file.txt.md5 contain multiple dots.
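The same four operators make quick work of paths (the example path is arbitrary):

```shell
path=/var/log/app/error.log
# ## deletes the longest `*/` prefix → like basename
echo "${path##*/}"   # error.log
# % deletes the shortest `/*` suffix → like dirname
echo "${path%/*}"    # /var/log/app
```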
