KUAS Engineering

Working with multiple files and directories

The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.

Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.

Review

Review of important concepts:

The file system manages the storage of data on the disk.
Files contain data.
Directories contain files or other directories, forming a directory tree.
cd path changes the current working directory.
ls path lists information about a file or directory. With no argument, ls lists the files in the current working directory.
pwd prints the current working directory.
/ at the start of a path means the root directory at the 'top' of the filesystem.
An absolute path specifies a location starting from the root directory (and therefore always begins with /).
A relative path specifies a location starting from the current working directory.
Directory names in a path are separated by / characters.
“..” is the name of the parent directory; “.” is the name of the current directory.
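The path rules above can be tried out safely in a scratch directory. The following is only a minimal sketch: the directory names a and b are made up for illustration, and it assumes the mktemp command is available to create a temporary directory.

```shell
# Work in a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/a/b"     # hypothetical directory names

cd "$scratch/a/b"           # absolute path: begins with /
here=$(pwd)

cd ..                       # relative path: ".." is the parent directory
parent=$(pwd)

cd ./b                      # relative path: "." is the current directory
back=$(pwd)

echo "$here"
echo "$parent"
echo "$back"
```

The first and last lines printed are the same directory, reached once by an absolute path and once by a relative path.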

Copying directories

The command cp files… directory copies one or more files into directory. If any of the files are directories, cp reports an error and does not copy them.

To copy an entire directory (recursively) use cp with the -r option.

The cp -r files… directory command copies one or more files into directory. If any of the files are directories, each of those directories is copied along with all of its contents.

Let's practice on a simple directory hierarchy.

Use the mkdir and echo commands to recreate the dir1 directory and its three files as shown in the diagram. The content of the three files is not important.

$ cd /tmp
$ mkdir dir1
$ echo 1 > dir1/file1.txt
$ echo 2 > dir1/file2.txt
$ echo 3 > dir1/file3.txt
$ ls -lR dir1
dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file3.txt

Use cp -rv (recursive and verbose) to copy the entire directory dir1 to a new directory tree called dir2.

$ cp -rv dir1 dir2
'dir1' -> 'dir2'
'dir1/file3.txt' -> 'dir2/file3.txt'
'dir1/file2.txt' -> 'dir2/file2.txt'
'dir1/file1.txt' -> 'dir2/file1.txt'

Because dir2 does not yet exist, it is first created in the current directory and then the contents of dir1 are copied to dir2. The -v option shows you the directory being created and the files being copied.

What will happen if you run the same cp -rv dir1 dir2 command again?

$ cp -rv dir1 dir2
'dir1' -> 'dir2/dir1'
'dir1/file3.txt' -> 'dir2/dir1/file3.txt'
'dir1/file2.txt' -> 'dir2/dir1/file2.txt'
'dir1/file1.txt' -> 'dir2/dir1/file1.txt'
$ ls -lR dir2
dir2:
total 64
drwxr-xr-x 2 piumarta dialout 170 Oct 26 05:57 dir1
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file3.txt
dir2/dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file3.txt

Because dir2 already exists, dir1 is copied into dir2; the new copy of dir1 does not replace dir2.
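The two cases (destination missing, destination already present) can be compared side by side in a scratch directory. This is a minimal sketch that assumes mktemp is available; it uses a single file to keep the listing short.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir dir1
echo 1 > dir1/file1.txt

cp -r dir1 dir2       # dir2 does not exist yet: dir2 becomes a copy of dir1
first=$(ls dir2)

cp -r dir1 dir2       # dir2 exists now: dir1 is copied into dir2
second=$(ls dir2)

echo "after the first copy:"
echo "$first"
echo "after the second copy:"
echo "$second"
```

After the second copy the listing of dir2 includes dir1 as well as file1.txt.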

Removing directories

The rmdir dir command removes the directory dir.

Try removing dir1.

$ rmdir dir1
rmdir: failed to remove 'dir1': Directory not empty

A directory must be empty before it can be removed.

You could remove the files dir1/file1.txt, dir1/file2.txt, and dir1/file3.txt one at a time but that would be tedious. Instead, remove all three at the same time using a wildcard. The path dir1/* expands to all three of the files in dir1. If you use rm -v dir1/* (-v for verbose) then each name will be printed as it is removed. Once the three files are removed you will be able to remove their parent directory dir1.

Use rm -v dir1/* to remove all the files in dir1.

$ ls dir1
file1.txt file2.txt file3.txt
$ rm -v dir1/*
removed 'dir1/file1.txt'
removed 'dir1/file2.txt'
removed 'dir1/file3.txt'
$ rmdir dir1
$ ls dir1
ls: cannot access 'dir1': No such file or directory

We still have dir2 which contains three files and a copy of the original dir1 (with three more files inside that directory). The * wildcard is less useful when removing this many files. Instead you can use rm -r (-r for recursive) which will remove the contents of a directory before removing the directory itself.

Use rm -r dir2 to remove dir2 and all of its contents.

$ ls -F dir2
dir1/ file1.txt file2.txt file3.txt
$ rm -r dir2
$ ls dir2
ls: cannot access 'dir2': No such file or directory

WARNING! When you delete a file from the command line it is gone forever. There is no 'trash can' that collects deleted files. There is no way to restore a deleted file later if you change your mind.

Wildcards

In the exercises above the argument dir1/* matched all the filenames in dir1. The shell expanded the pattern dir1/* into three separate arguments: dir1/file1.txt, dir1/file2.txt, and dir1/file3.txt.

The * character actually matches any sequence of characters (zero or more) except /. You can use it to match 'anything' in a part of a filename. You can also use it more than once to match 'anything' in several different parts of a filename.
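Because echo simply prints its arguments, it is a convenient way to see what the shell turns a pattern into. The sketch below assumes mktemp is available and uses made-up file names in a scratch directory.

```shell
# Work in a throwaway directory containing four hypothetical files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch notes.txt notes.bak data.txt draft.txt

all_txt=$(echo *.txt)     # anything ending in .txt
d_txt=$(echo d*.txt)      # begins with d and ends in .txt
with_o=$(echo *o*.*)      # * used three times: an o somewhere before a dot

echo "$all_txt"
echo "$d_txt"
echo "$with_o"
```

The last pattern matches only notes.bak and notes.txt, because they are the only names with an o before the dot.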

List all files in /etc that begin with b, that end with .conf, or that have a . anywhere in their name.

$ ls /etc/b*
/etc/baseprofile /etc/bash_completion
$ ls /etc/*.conf
/etc/nsswitch.conf
$ ls -d /etc/*.*
/etc/init.d /etc/nsswitch.conf /etc/rebase.db.i386 /etc/vimrc.less
/etc/minirc.dfl /etc/persistprofile.sh /etc/sessionsaliases.sh /etc/xmodmap.esc

Another useful wildcard character is ? which matches exactly one of any character (except /).

List all files in /etc that have an o and an f in their name separated by exactly one other character (it does not matter which character).

$ ls /etc/*o?f*
/etc/nsswitch.conf /etc/ssh_config

One more useful wildcard pattern is [chars] which matches exactly one of any of the chars listed between the square brackets.

List all files in /etc that have two consecutive vowels ('a', 'e', 'i', 'o', or 'u') in their name.

$ ls -d /etc/*[aeiou][aeiou]*
/etc/bash_completion /etc/defaults /etc/screenrc /etc/version
/etc/bash_completion.d /etc/group /etc/sessionsaliases.sh

When the chars contains a range of consecutive characters, you can specify the entire range using “first-last”.

Use the “[first-last]” pattern to list all files in /etc whose name contains at least one digit.

$ ls -d /etc/*[0-9]*
/etc/X11 /etc/at-spi2 /etc/dbus-1 /etc/gtk-3.0 /etc/pkcs11 /etc/rebase.db.i386
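The contents of /etc differ from one machine to another, so it can help to try ?, [chars], and [first-last] on files whose names you know. The following is a minimal sketch with made-up file names in a scratch directory (assuming mktemp is available).

```shell
# Work in a throwaway directory containing four hypothetical log files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch run1.log run2.log run10.log runA.log

one_char=$(echo run?.log)      # exactly one character between "run" and ".log"
one_or_two=$(echo run[12].log) # that one character must be 1 or 2
digit=$(echo run[0-9].log)     # that one character must be a digit

echo "$one_char"
echo "$one_or_two"
echo "$digit"
```

None of the three patterns matches run10.log, because ? and [...] each match exactly one character.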

The wildcard patterns explained above are expanded by the shell according to the files that actually exist in the filesystem. What happens if you use a wildcard pattern that does not match any files?

Try to delete some non-existent 'log' files: dir1/*.log.

$ rm dir1/*.log
rm: can't remove 'dir1/*.log': No such file or directory

If the wildcard pattern does not match any files, it is simply left unexpanded. When the command tries to access a file named by a wildcard expression, the file does not exist and an error message is generated.
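You can watch a pattern being left unexpanded by giving it to echo instead of rm. This sketch assumes the default settings of bash (other shells, or bash with options such as nullglob turned on, behave differently) and uses a made-up file name.

```shell
# Work in a throwaway directory with an empty dir1.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir dir1

unmatched=$(echo dir1/*.log)   # nothing matches: the pattern is passed on as-is
touch dir1/a.log               # hypothetical log file
matched=$(echo dir1/*.log)     # now the pattern expands to the matching name

echo "$unmatched"
echo "$matched"
```

The first line printed is the pattern itself, including the * character; the second is the name of the file that now matches.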

Dry runs: using "echo" to preview commands

A 'dry run' is a rehearsal or practice that takes place before the real performance. In computing, a dry run shows you what a command would do without actually doing it. Dry runs are particularly useful for checking which files a wildcard pattern will match, for example before actually removing them.

For the next exercise, set up your dir1 directory as above, containing six files:

three text files file1.txt, file2.txt, and file3.txt, containing the words think, for, and yourself;
three data files file1.dat, file2.dat, and file3.dat, containing the number of characters in the corresponding .txt files.

$ mkdir dir1
$ echo think > dir1/file1.txt
$ echo for > dir1/file2.txt
$ echo yourself > dir1/file3.txt
$ wc -c dir1/file1.txt > dir1/file1.dat
$ wc -c dir1/file2.txt > dir1/file2.dat
$ wc -c dir1/file3.txt > dir1/file3.dat
$ ls -l dir1
total 3
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file1.dat
-rw-r--r-- 1 user UsersGrp 6 Oct 26 16:51 file1.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file2.dat
-rw-r--r-- 1 user UsersGrp 4 Oct 26 16:51 file2.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file3.dat
-rw-r--r-- 1 user UsersGrp 9 Oct 26 16:51 file3.txt

Use the echo command to perform a dry-run of removing:

all the .txt files in dir1,
all the .dat files in dir1,
the .txt and .dat files for only file2 (two files in total),
the .txt and .dat files for file1 and file3 (four files in total).

$ echo rm dir1/*.txt
rm dir1/file1.txt dir1/file2.txt dir1/file3.txt
$ echo rm dir1/*.dat
rm dir1/file1.dat dir1/file2.dat dir1/file3.dat
$ echo rm dir1/file2.*
rm dir1/file2.dat dir1/file2.txt
$ echo rm dir1/file[13].*
rm dir1/file1.dat dir1/file1.txt dir1/file3.dat dir1/file3.txt

Why is it called a 'dry run'?

Creating files and updating timestamps

The touch command updates the last modification time of an existing file to be the current date and time. If the file does not exist, an empty file is created.

Create two empty files called file1 and file2.

$ cd dir1
$ ls -lt file[12]
ls: file[12]: No such file or directory
$ touch file1 file2
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
$ touch file2
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
$ touch file1
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2

Note how touching a file moves it to the top of the 'most recent' list (ls -t).
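The two behaviours of touch (creating an empty file, and updating the timestamp of an existing file without changing its content) can be checked as follows. This is a minimal sketch with made-up file names in a scratch directory, assuming mktemp is available.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch empty.txt               # the file does not exist: it is created empty
size=$(wc -c < empty.txt)     # number of bytes in the new file

echo think > word.txt
touch word.txt                # the file exists: only its timestamp changes
content=$(cat word.txt)

echo "$size"
echo "$content"
```

The new file contains zero bytes, and touching word.txt leaves the word inside it untouched.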

Generating path names using brace expressions

Wildcards are used to match existing file names. They cannot be used to generate file names for non-existent files or directories, for example, to create a set of needed files or directories.

Try using a wildcard to create ten empty files called test0, test1, test2, …, test9.

$ touch test[0123456789]
$ ls test*
test[0123456789]

Creating a single file called test[0123456789] is not what you intended. It happened because the shell could not find any existing file matching the pattern test[0123456789], and so passed the pattern to touch unexpanded.

A brace expression will generate multiple words based on a list or sequence of values. The list of values to generate is written between curly braces { and } with items in the list separated by commas. For example, the expression {a,b,c} generates three separate words a, b, and c. The brace expression can appear in a larger pattern, for example, the expression p{a,b,c}q generates three separate words paq, pbq, and pcq.
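Since brace expressions do not depend on which files exist, echo shows exactly what they generate. The sketch below assumes the shell is bash (brace expansion is not part of the basic POSIX shell); the report file names are made up and are never created.

```shell
#!/usr/bin/env bash
# Brace expressions generate words whether or not matching files exist.
words=$(echo p{a,b,c}q)
names=$(echo report-{draft,final}.txt)   # hypothetical names, no files needed

echo "$words"
echo "$names"
```

Putting echo in front of a command that uses a brace expression is therefore another kind of dry run.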

Use a brace expression to generate the command needed to create the five files test0.txt to test4.txt.

$ touch test{0,1,2,3,4}.txt
$ ls test*
test0.txt test1.txt test2.txt test3.txt test4.txt

When a sequence of numbers or letters is needed, the list can contain just the first and last values separated by two dots (..). This is called a sequence expression. For example, the sequence expression p{a..z}q generates a list of 26 words, starting with paq and pbq, and ending with pyq and pzq.

Use a brace expression to generate the command needed to create the five files test5.txt to test9.txt.

$ touch test{5..9}.txt
$ ls test*
test0.txt test1.txt test2.txt test3.txt test4.txt test5.txt test6.txt test7.txt test8.txt test9.txt

In a sequence expression that generates numbers, writing the first (or last) value with leading 0s sets the minimum width of the generated numbers, which are padded with 0s as necessary. This is useful if leading 0s are needed. For example, the following sequence expressions generate lists of 100 words:

test{0..99} generates test0, test1, … , test98, test99,
tt{000..099} generates tt000, tt001, … , tt098, tt099, and
t{00000..99} generates t00000, t00001, … , t00098, t00099.
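Sequence expressions can be previewed with echo in the same way. This is a minimal sketch that assumes bash version 4 or later (older versions of bash do not add the leading 0s).

```shell
#!/usr/bin/env bash
# Sequence expressions: letters, zero-padded numbers, and a count of words.
letters=$(echo {a..e})
padded=$(echo {000..5}.txt)
wide=$(echo t{00000..2})
count=$(echo test{0..99} | wc -w)   # how many words were generated

echo "$letters"
echo "$padded"
echo "$wide"
echo "$count"
```

Counting the generated words with wc -w is a quick way to confirm that a sequence covers the range you intended.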

CSV files and the "cut" command

Text files are often used as simple 'databases' for storing captured sensor data, the results of data processing, etc. The shell provides several commands for manipulating data stored in this kind of text file.

A comma-separated value (CSV) file is one example of this kind of text file database. Each line is a record and each field in that record is separated from the next with a specified delimiter character. In a CSV file the delimiter is a comma, “,”.

The cut command selects and prints fields from exactly this kind of text file. By default it uses a 'tab' character to separate fields (just as a copy-paste operation between Excel and a text editor does) but this can be changed using a command line option. cut has the following command line options:

-d character specifies the delimiter character. To manipulate CSV files, use: “cut -d ,”
-f fields tells cut which fields to print. Fields are numbered starting at 1, and fields can be a single field number or several field numbers separated by commas.
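The two options can be seen working together on a very small file. The following is a minimal sketch; the file name sample.csv and its contents are made up for illustration, and mktemp is assumed to be available.

```shell
# Work in a throwaway directory with a tiny, made-up CSV file.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'id,name,room' '1,Ada,101' '2,Alan,202' > sample.csv

names=$(cut -d , -f 2 sample.csv)     # one field: the second column
pairs=$(cut -d , -f 1,3 sample.csv)   # two fields: the first and third columns

echo "$names"
echo "$pairs"
```

When several fields are selected, cut prints them separated by the same delimiter that was used in the input.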

Create a CSV file called directory.txt that contains the following data. (The easiest way is to copy the text from this web page and paste it into a text editor, or into “cat > directory.txt” followed by Control+D to simulate end-of-file.)

name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053

Use the cut command to extract just the “office” column from the data.

$ cut -d , -f 3 directory.txt
office
042
301
201
404
202
304
302

The tail command has an option to print a file starting at a specific line number. The syntax is: “tail -n +number”. For example, “tail -n +5 file” will print the contents of file starting from the 5th line in the file.
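The effect of the +number form is easy to see on a short list of words. This is only a small sketch, using printf to produce five made-up lines of input.

```shell
# Five lines of input; keep everything from line 3 onwards.
rest=$(printf '%s\n' one two three four five | tail -n +3)

echo "$rest"
```

The first two lines are skipped and the remaining three are printed unchanged.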

Pipe (|) the output from the previous command into tail. Use the tail -n +number option to print the input starting at line number 2.

$ cut -d , -f 3 directory.txt | tail -n +2
042
301
201
404
202
304
302

The grep command understands patterns similar to the shell's wildcard patterns; in particular, bracket patterns such as [0-9] mean the same thing in both. (The shell uses patterns to select file names and grep uses them to select lines of text.)

Each office number in our sample data is three digits long. The first digit says which floor the office is on. One way to extract just the office numbers on the second floor is to use grep to search for numbers matching the pattern “2[0-9][0-9]”. You can then count how many offices are on the second floor using “wc -l”.

Write a pipeline of commands that prints how many offices are located on the third floor. Try very hard to do this without looking at the sample answer. If you cannot find the solution, click on the link below to view the answer.

Sample answer
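The original sample answer is not reproduced here. One possible pipeline, following the second-floor description above, is sketched below; it recreates directory.txt in a scratch directory (assuming mktemp is available) so that it can be run on its own.

```shell
# Recreate the sample data in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
cat > directory.txt <<'EOF'
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
EOF

# office column -> drop the header line -> keep 3xx numbers -> count the lines
third_floor=$(cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]' | wc -l)

echo "$third_floor"
```

Each stage of the pipeline can be run on its own first (cut alone, then cut piped into tail, and so on) to check its output before the next stage is added.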

Summary

echo data > file can be used to create a file containing a line of data.
touch file can be used to create an empty file or to update its modification time to 'now'.
mkdir directory creates a new directory.
cp oldfile newfile copies (duplicates) oldfile to newfile.
cp files… directory copies one or more files into an existing directory; add -r to copy directories as well.
rm files… removes (deletes) files.
rmdir directory removes (deletes) a directory which must be empty.
rm -r directory removes (deletes) a directory and all its contents, recursively.
“*” in a file name matches zero or more characters, so “*.txt” matches all files ending in “.txt”.
“?” in a file name matches any single character, so “?.txt” matches “a.txt” but not “any.txt”.
“[characters]” in a file name matches any one of the characters, so “[aeiou].txt” matches “a.txt” but not “b.txt”.
“[first-last]” in a file name matches any character in the range first to last, so “*[a-m].txt” matches “boa.txt” but not “constrictor.txt”.
Wildcards (*, ?, []) are expanded by the shell to match files that already exist. They cannot generate new (non-existent) file names.
{a,b,c} expands to three words: a, b, and c.
p{a,b,c}q{x,y,z}r expands to nine words: paqxr paqyr paqzr pbqxr pbqyr pbqzr pcqxr pcqyr pcqzr
{000..5}.txt expands to six words: 000.txt 001.txt 002.txt 003.txt 004.txt 005.txt
tail -n +number displays input starting at line number (and continuing until the last line).
There is no 'trash': when a file or directory is deleted it is gone immediately and forever.
cut -d char -f fields prints the given fields from its input lines using char as the field delimiter. The fields are numbered from 1 and multiple field numbers are separated by commas.