Note: This week we will practice and learn more about working with multiple files and directories, and about how text files can be used to store simple databases. In the two weeks following this one we will study command sequencing (scripts), control, and shell variables.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.
tail and cut, to extract fields from a simple database stored as a text file.

The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.
First make sure you understand the important topics from the previous two weeks.
The command cp files… directory copies one or more files into directory.
If any of the files are directories then cp will not copy them and will report an error.
To copy an entire directory (recursively) use cp with the -r option.

The command cp -r files… directory copies one or more files into directory.
If any of the files are directories, each one is copied along with all of its contents.
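The difference between the two forms can be checked safely in a scratch directory. The following is only a sketch: it assumes `mktemp -d` is available, and the names `src`, `dest1`, `dest2`, and `notes.txt` are made up for the illustration.

```shell
# Work in a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir src dest1 dest2
echo hello > src/notes.txt

# Without -r, cp reports an error and does not copy the directory.
cp src dest1 || echo "cp without -r did not copy src"

# With -r, the directory and everything inside it is copied.
cp -r src dest2
ls dest2/src        # lists notes.txt
```

The exact wording of the error message differs between systems, so the sketch only relies on cp returning a failure status.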
Let's practice on a simple directory hierarchy.
Use the mkdir
and echo
commands to recreate the dir1
directory
and its three files as shown in the diagram.
The content of the three files is not important.
$ cd /tmp
$ mkdir dir1
$ echo 1 > dir1/file1.txt
$ echo 2 > dir1/file2.txt
$ echo 3 > dir1/file3.txt
$ ls -lR dir1
dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file3.txt
Use cp -rv
(recursive and verbose)
to copy the entire directory dir1
to a new directory tree called dir2
.
$ cp -rv dir1 dir2
'dir1' -> 'dir2'
'dir1/file3.txt' -> 'dir2/file3.txt'
'dir1/file2.txt' -> 'dir2/file2.txt'
'dir1/file1.txt' -> 'dir2/file1.txt'
Because dir2
does not yet exist, it is first created in the current directory and then the contents of dir1
are copied to dir2
.
The -v
option shows you the directory being created and the files being copied.
What will happen if you run the same cp -rv dir1 dir2
command again?
$ cp -rv dir1 dir2
'dir1' -> 'dir2/dir1'
'dir1/file3.txt' -> 'dir2/dir1/file3.txt'
'dir1/file2.txt' -> 'dir2/dir1/file2.txt'
'dir1/file1.txt' -> 'dir2/dir1/file1.txt'
$ ls -lR dir2
dir2:
total 64
drwxr-xr-x 2 piumarta dialout 170 Oct 26 05:57 dir1
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file3.txt

dir2/dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file3.txt
Because dir2
already exists, dir1
is copied into dir2
;
the new copy of dir1
does not replace dir2
.
The rmdir dir
command removes the directory dir.
Try removing dir1
.
$ rmdir dir1
rmdir: failed to remove 'dir1': Directory not empty
A directory must be empty before it can be removed.
You could remove the files dir1/file1.txt
, dir1/file2.txt
, and dir1/file3.txt
one at a time but that would be tedious.
Instead, remove all three at the same time using a wildcard.
The path dir1/*
expands to all three of the files in dir1
.
If you use rm -v dir1/*
(-v
for verbose)
then each name will be printed as it is removed.
Once the three files are removed you will be able to remove their parent directory dir1.
Use rm -v dir1/*
to remove all the files in dir1
.
$ ls dir1
file1.txt file2.txt file3.txt
$ rm -v dir1/*
removed 'dir1/file1.txt'
removed 'dir1/file2.txt'
removed 'dir1/file3.txt'
$ rmdir dir1
$ ls dir1
ls: cannot access 'dir1': No such file or directory
We still have dir2
which contains three files and a copy of the original
dir1
(with three more files inside that directory).
The *
wildcard is less useful when removing this many files.
Instead you can use rm -r
(-r
for recursive) which
will remove the contents of a directory before removing the directory itself.
Use rm -r dir2
to remove dir2
and all of its contents.
$ ls -F dir2
dir1/ file1.txt file2.txt file3.txt
$ rm -r dir2
$ ls dir2
ls: cannot access 'dir2': No such file or directory
When you delete a file from the command line it is gone forever. There is no 'trash can' that collects deleted files. There is no way to restore a deleted file later if you change your mind.
In the exercises above the argument dir1/* matched all the filenames in dir1.
The shell expanded the pattern dir1/* into three separate arguments: dir1/file1.txt, dir1/file2.txt, and dir1/file3.txt.
The *
character actually matches any sequence of characters (zero or more) except /
.
You can use it to match 'anything' in a part of a filename.
You can also use it more than once to match 'anything' in several different parts of a filename.
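One way to see what a pattern expands to is to give it to echo in a scratch directory. This is a minimal sketch, assuming `mktemp -d` is available; the four file names are invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch alpha.txt beta.txt alpha.dat gamma.log

echo *.txt      # alpha.txt beta.txt
echo al*        # alpha.dat alpha.txt
echo *a.*t      # alpha.dat alpha.txt beta.txt  (two * in one pattern)
echo *m*.*      # gamma.log
```

Because echo only prints its arguments, nothing is changed by trying a pattern this way.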
List all files in /etc
that begin with b
, that end with .conf
, or that have a .
anywhere in their name.
$ ls /etc/b*
/etc/baseprofile /etc/bash_completion
$ ls /etc/*.conf
/etc/nsswitch.conf
$ ls -d /etc/*.*
/etc/init.d /etc/nsswitch.conf /etc/rebase.db.i386 /etc/vimrc.less
/etc/minirc.dfl /etc/persistprofile.sh /etc/sessionsaliases.sh /etc/xmodmap.esc
Another useful wildcard character is ?
which matches exactly one of any character (except /
).
List all files in /etc
that have an o
and an f
in their name separated by exactly one other character (it does not matter which character).
$ ls /etc/*o?f*
/etc/nsswitch.conf /etc/ssh_config
One more useful wildcard pattern is [chars]
which matches exactly one of any of the chars listed between the square brackets.
List all files in /etc that have two consecutive vowels ('a', 'e', 'i', 'o', or 'u') in their name.
$ ls -d /etc/*[aeiou][aeiou]*
/etc/bash_completion /etc/defaults /etc/screenrc /etc/version
/etc/bash_completion.d /etc/group /etc/sessionsaliases.sh
When chars contains a range of consecutive characters, you can specify the entire range using “first-last”.
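The three single-character patterns can be compared side by side in a scratch directory. This is only a sketch, assuming `mktemp -d` is available; the file names are invented so that each pattern selects a different subset.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch cat.txt cot.txt cut.txt coat.txt c9t.txt

echo c?t.txt          # c9t.txt cat.txt cot.txt cut.txt  (coat.txt has two characters there)
echo c[aeiou]t.txt    # cat.txt cot.txt cut.txt
echo c[0-9]t.txt      # c9t.txt
echo c[a-o]t.txt      # cat.txt cot.txt
```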
Use the “[first-last]
” pattern to list all files in /etc
whose name contains at least one digit.
$ ls -d /etc/*[0-9]*
/etc/X11 /etc/at-spi2 /etc/dbus-1 /etc/gtk-3.0 /etc/pkcs11 /etc/rebase.db.i386
The wildcard patterns explained above are expanded by the shell according to the files that actually exist in the filesystem. What happens if you use a wildcard pattern that does not match any files?
Try to delete some non-existent 'log' files: dir1/*.log.
$ rm dir1/*.log
rm: can't remove 'dir1/*.log': No such file or directory
If the wildcard pattern does not match any files, it is simply left unexpanded. When the command tries to access a file named by a wildcard expression, the file does not exist and an error message is generated.
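The same behaviour can be observed harmlessly with echo. This sketch assumes bash (or sh) with its default settings and an available `mktemp -d`; other shells, or bash with non-default options, may treat an unmatched pattern differently. The name `today.log` is invented.

```shell
scratch=$(mktemp -d)
cd "$scratch"       # a new, empty directory: nothing can match yet

echo *.log          # prints the unexpanded pattern: *.log
touch today.log
echo *.log          # now prints: today.log
```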
A 'dry run' is a rehearsal or practice that takes place before the real performance. In computing, a dry run shows you what a command would do without actually doing it. A dry run is especially useful for checking which files a wildcard pattern matches, for example before actually removing them.
For the next exercise, set up your dir1 directory as above, containing six files:
- file1.txt, file2.txt, and file3.txt, containing the words think, for, and yourself;
- file1.dat, file2.dat, and file3.dat, containing the number of characters in the corresponding .txt files.

$ mkdir dir1
$ echo think > dir1/file1.txt
$ echo for > dir1/file2.txt
$ echo yourself > dir1/file3.txt
$ wc -c dir1/file1.txt > dir1/file1.dat
$ wc -c dir1/file2.txt > dir1/file2.dat
$ wc -c dir1/file3.txt > dir1/file3.dat
$ ls -l dir1
total 3
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file1.dat
-rw-r--r-- 1 user UsersGrp 6 Oct 26 16:51 file1.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file2.dat
-rw-r--r-- 1 user UsersGrp 4 Oct 26 16:51 file2.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file3.dat
-rw-r--r-- 1 user UsersGrp 9 Oct 26 16:51 file3.txt
Use the echo command to perform a dry-run of removing:
- .txt files in dir1,
- .dat files in dir1,
- .txt and .dat files for only file2 (two files in total),
- .txt and .dat files for file1 and file3 (four files in total).

$ echo rm dir1/*.txt
rm dir1/file1.txt dir1/file2.txt dir1/file3.txt
$ echo rm dir1/*.dat
rm dir1/file1.dat dir1/file2.dat dir1/file3.dat
$ echo rm dir1/file2.*
rm dir1/file2.dat dir1/file2.txt
$ echo rm dir1/file[13].*
rm dir1/file1.dat dir1/file1.txt dir1/file3.dat dir1/file3.txt
The touch
command updates the last modification time of an existing file to be the current date and time.
If the file does not exist, an empty file is created.
Create two empty files called file1
and file2
.
$ cd dir1
$ ls -lt file[12]
ls: file[12]: No such file or directory
$ touch file1 file2
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
$ touch file2
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
$ touch file1
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
Note how touching a file moves it to the top of the 'most recent' list (ls -t).
Wildcards are used to match existing file names. They cannot be used to generate file names for non-existent files or directories, for example, to create a set of needed files or directories.
Try using a wildcard to create ten empty files called test0
, test1
, test2
, …, test9
.
$ touch test[0123456789]
$ ls test*
test[0123456789]
Creating a single file called test[0123456789]
is not what you intended.
That is what happened because the shell could not find any existing file to match
the pattern test[0123456789]
and so did not expand it in the command line.
A brace expression will generate multiple words based on a list or sequence of values.
The list of values to generate is written between curly braces {
and }
with items in the list separated by commas.
For example, the expression {a,b,c}
generates three separate words a
, b
, and c
.
The brace expression can appear in a larger pattern,
for example, the expression p{a,b,c}q
generates three separate words
paq
, pbq
, and pcq
.
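Because brace expressions do not depend on existing files, echo is enough to preview what they generate. This is a small sketch that assumes the shell is bash (brace expansion is not part of plain POSIX sh); the word `file` and the extensions are arbitrary examples.

```shell
# Brace expansion needs no existing files; echo just prints the generated words.
echo {a,b,c}            # a b c
echo p{a,b,c}q          # paq pbq pcq
echo file.{txt,dat}     # file.txt file.dat
```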
Use a brace expression to generate the command needed to create the five files
test0.txt
to test4.txt
.
$ touch test{0,1,2,3,4}.txt
$ ls test*
test0.txt test1.txt test2.txt test3.txt test4.txt
When a sequence of numbers or letters is needed, the list can contain just the first and last values separated by “..”.
This is called a sequence expression.
For example, the sequence expression p{a..z}q
generates a list of 26 words,
starting with paq
and pbq
, and ending with pyq
and pzq
.
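A sequence expression can be previewed with echo in the same way. The sketch below assumes bash; `wc -w` is used only to count the generated words, and on some systems its output is padded with leading spaces.

```shell
echo {1..5}             # 1 2 3 4 5
echo p{a..e}q           # paq pbq pcq pdq peq
echo p{a..z}q | wc -w   # counts the generated words: 26
```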
Use a brace expression to generate the command needed to create the five files
test5.txt
to test9.txt
.
$ touch test{5..9}.txt
$ ls test*
test0.txt test1.txt test2.txt test3.txt test4.txt test5.txt test6.txt test7.txt test8.txt test9.txt
In a sequence expression that generates numbers, the first value in the sequence
sets the minimum width of the generated numbers.
This is useful if leading 0
s are needed.
For example, the following sequence expressions generate lists of 100 words:
- test{0..99} generates test0, test1, … , test98, test99, and
- tt{000..099} generates tt000, tt001, … , tt098, tt099, and
- t{00000..99} generates t00000, t00001, … , t00098, t00099.

Text files are often used as simple 'databases' for storing captured sensor data, the results of data processing, etc. The shell provides several commands for manipulating data stored in this kind of text file.
A comma-separated value (CSV) file is one example of this kind of text file database.
Each line is a record and each field in that record is separated from the next with a specified delimiter character.
In a CSV file the delimiter is a comma, “,
”.
The cut
command selects and prints fields from exactly this kind of text file.
By default it uses a 'tab' character to separate fields (just as a copy-paste operation between Excel and a text editor does) but this can be changed using a command line option.
cut has the following command line options:
- -d character specifies the delimiter character. To manipulate CSV files, use: “cut -d ,”
- -f fields tells cut which of the fields you want to print. Fields are numbered, starting at 1, and fields can contain multiple field numbers separated by commas.
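The two options can be tried on a few records typed directly into a pipeline. This is just an illustrative sketch; the three records (colour, fruit, quantity) are invented sample data, not part of the course files.

```shell
# Three comma-separated records fed to cut on standard input: print field 2.
printf 'red,apple,3\ngreen,pear,5\nyellow,banana,7\n' | cut -d , -f 2
# apple
# pear
# banana

# Two fields at once: the selected fields are printed with the same delimiter.
printf 'red,apple,3\ngreen,pear,5\nyellow,banana,7\n' | cut -d , -f 1,3
# red,3
# green,5
# yellow,7
```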
Create a CSV file called directory.txt
that contains the following data.
(The easiest way is to copy the text from this web page and paste it into a text editor,
or into “cat > directory.txt
” followed by Control+D to simulate end-of-file.)
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
Use the cut
command to extract just the “office” column from the data.
$ cut -d , -f 3 directory.txt
office
042
301
201
404
202
304
302
The tail
command has an option to print a file starting at a specific line number.
The syntax is: “tail -n +number
”.
For example, “tail -n +5 file
” will print the contents of file starting from the 5th line in the file.
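As a standalone illustration, the option can be tried on a short list of words supplied on standard input instead of a file. The seven words are arbitrary sample input.

```shell
# Seven input lines; tail -n +5 prints from the 5th line to the end.
printf '%s\n' one two three four five six seven | tail -n +5
# five
# six
# seven
```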
Pipe (|
) the output from the previous command into tail
.
Use the tail -n +number
option to print the input starting at line number 2.
$ cut -d , -f 3 directory.txt | tail -n +2
042
301
201
404
202
304
302
The grep command understands patterns that are similar to the shell's wildcard patterns; in particular, [chars] and [first-last] mean the same thing in both.
(The shell uses them to filter file names and grep uses them to filter or select lines of text.)
Each office number in our sample data is three digits long.
The first digit says which floor the office is on.
One way to extract just the office numbers on the second floor is to use grep
to search for numbers matching the pattern “2[0-9][0-9]
”.
You can then count how many offices are on the second floor using “wc -l
”.
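The second-floor count described here might be assembled as follows. This is a sketch rather than the official sample answer: it recreates directory.txt in a scratch directory (assuming `mktemp -d` is available) so that it can be run on its own, and it quotes the grep pattern so that the shell does not try to expand it as a file name wildcard.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Same sample data as directory.txt above.
cat > directory.txt <<'EOF'
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
EOF

# office column -> drop the header line -> keep second-floor offices -> count lines
cut -d , -f 3 directory.txt | tail -n +2 | grep '2[0-9][0-9]' | wc -l
# prints 2 (offices 201 and 202); some systems pad the number with spaces
```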
Write a pipeline of commands that prints how many offices are located on the third floor. Try very hard to do this without looking at the sample answer.
- echo > file can be used to create a file containing a line of data.
- touch file can be used to create an empty file or to update its modification time to 'now'.
- mkdir directory creates a new directory.
- cp oldfile newfile copies (duplicates) oldfile to newfile.
- mv oldfile newfile moves (renames) a file or directory.
- cp files… directory copies one or more files (or, with -r, directories) into an existing directory.
- mv files… directory moves one or more files (or directories) into an existing directory.
- rm files… removes (deletes) files.
- rmdir directory removes (deletes) a directory which must be empty.
- rm -r directory removes (deletes) a directory and all its contents, recursively.
- “*” in a file name matches zero or more characters, so “*.txt” matches all files ending in “.txt”.
- “?” in a file name matches any single character, so “?.txt” matches “a.txt” but not “any.txt”.
- “[characters]” in a file name matches any one of the characters, so “[aeiou].txt” matches “a.txt” but not “b.txt”.
- “[first-last]” in a file name matches any character in the range first to last, so “*[a-m].txt” matches “boa.txt” but not “constrictor.txt”.
- Wildcard patterns (*, ?, []) are expanded by the shell to match files that already exist. They cannot generate new (non-existent) file names.
- {a,b,c} expands to three words: a, b, and c.
- p{a,b,c}q{x,y,z}r expands to nine words: paqxr paqyr paqzr pbqxr pbqyr pbqzr pcqxr pcqyr pcqzr
- {000..5}.txt expands to six words: 000.txt 001.txt 002.txt 003.txt 004.txt 005.txt
- tail -n +number displays input starting at line number (and continuing until the last line).
- cut -d char -f fields prints the given fields from its input lines using char as the field delimiter. The fields are numbered from 1 and multiple field numbers are separated by commas.