This week we will study manipulating multiple files using loops and creating new commands out of sequences of existing commands.
The large sensor data file for the in-class assignment can be downloaded like this: curl -O https://kuas.org/tmp/metar-2019.tgz
Once downloaded, unpack the contents using tar -xf metar-2019.tgz
which will create a directory called metar-2019
containing 8752 files of weather sensor data.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.
The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.
First make sure you understand the important topics from the previous two weeks. Click on this link to review what you should already know:
In the notes below, follow along by typing all the commands shown in bold. Check that the output from your commands is similar to the output shown here.
Download the file planets.tar
from the course web site.
$ cd
$ curl -O https://kuas.org/tmp/planets.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20480  100 20480    0     0   485k      0 --:--:-- --:--:-- --:--:--  487k
The file is a 'tar' archive.
Unpack the archive using the tar command with options -x to extract an archive, -v to be verbose about each file extracted, and -f to give the archive filename on the command line.
$ tar -xvf planets.tar
planets/earth.dat
planets/jupiter.dat
planets/mars.dat
planets/mercury.dat
planets/moon.dat
planets/neptune.dat
planets/pluto.dat
planets/saturn.dat
planets/uranus.dat
planets/venus.dat
You can see from the output that a directory called planets
was created and that all the new files are inside it.
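To experiment with these options without touching the downloaded archive, the following is a minimal sketch. The directory demo, its two files, and the archive demo.tar are invented for illustration, and the -c option (create an archive) is not used elsewhere in these notes.

```shell
# Create and enter a fresh scratch directory so nothing else is touched.
cd "$(mktemp -d)"

# Build a tiny archive to experiment with (demo/ and its files are made up).
mkdir demo
echo "first file" > demo/a.txt
echo "second file" > demo/b.txt
tar -cf demo.tar demo    # -c creates an archive (the opposite of -x)
rm -r demo               # remove the originals so the extraction is visible

# Extract it again: -x extract, -v name each file, -f archive name follows.
tar -xvf demo.tar

# The directory is back, with both files inside it.
ls demo
```

The exact format of the verbose listing can differ between versions of tar, but the extracted files are the same.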
Change to the planets
directory and then check the contents of one of the files using cat
or less
.
$ cd planets
$ cat earth.dat
Name                            Earth
Mass (10^24kg)                  5.97
Diameter (km)                   12,756
Density (kg/m^3)                5514
Gravity (m/s^2)                 9.8
Escape Velocity (km/s)          11.2
Rotation Period (hours)         23.9
Length of Day (hours)           24.0
Distance from Sun (10^6 km)     149.6
Perihelion (10^6 km)            147.1
Aphelion (10^6 km)              152.1
Orbital Period (days)           365.2
Orbital Velocity (km/s)         29.8
Orbital Inclination (degrees)   0.0
Orbital Eccentricity            0.017
Obliquity to Orbit (degrees)    23.4
Mean Temperature (C)            15
Surface Pressure (bars)         1
Number of Moons                 1
Ring System?                    No
Global Magnetic Field?          Yes
The files contain tab-separated values with two columns. The first column describes the data on that line, and the second column contains the data value.
Check the first two lines of the files to see if they all look the same.
$ head -n 2 *.dat
==> earth.dat <==
Name                            Earth
Mass (10^24kg)                  5.97

==> jupiter.dat <==
Name                            Jupiter
Mass (10^24kg)                  1898

...etc...

==> uranus.dat <==
Name                            Uranus
Mass (10^24kg)                  86.8

==> venus.dat <==
Name                            Venus
Mass (10^24kg)                  4.87
Line 17 of every file should contain the mean temperature.
Check line 17 of earth.dat
using the combination of head -n 17
and tail -n 1
that was used earlier.
$ head -n 17 earth.dat | tail -n 1
Mean Temperature (C)            15
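The same head and tail combination works for any line number. As a small sketch on an invented five-line file (words.txt is a made-up name), this picks out line 3:

```shell
# Create and enter a fresh scratch directory.
cd "$(mktemp -d)"

# printf '%s\n' writes each of its arguments on a line of its own.
printf '%s\n' one two three four five > words.txt

# head keeps lines 1 to 3; tail then keeps only the last of those: line 3.
head -n 3 words.txt | tail -n 1    # prints: three
```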
How would you check line 17 of all the files to make sure they contain the mean temperature?
The obvious way is to change earth.dat to *.dat in the command you just used. Will that work?
Try showing the 17th line of each file by running the command with earth.dat changed to *.dat.
$ head -n 17 *.dat | tail -n 1
Mean Temperature (C)            464
That's not right.
We only saw the line for one planet.
Which one was it?
Use grep
to find out.
$ grep 464 *.dat
venus.dat:Mean Temperature (C)  464
Why did you see only one line of output?
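One way to investigate is to repeat the experiment on a smaller scale. The sketch below uses two invented three-line files (a.dat and b.dat are made-up names) so that the whole output of head fits on the screen:

```shell
# Create and enter a fresh scratch directory.
cd "$(mktemp -d)"

# Two made-up files of three lines each.
printf '%s\n' a1 a2 a3 > a.dat
printf '%s\n' b1 b2 b3 > b.dat

# Given several files, head prints a "==> name <==" header and the first
# lines of each file, all as ONE stream of text...
head -n 2 *.dat

# ...so tail receives a single stream and keeps only its very last line.
head -n 2 *.dat | tail -n 1    # prints: b2
```

The pipe connects tail to the combined output of head, not to each file separately, which is why only the last file's line survives.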
To print the 17th line of every file we need to use something more sophisticated: a loop.
To print the 17th line of each file, what we want to do is this (in natural language):
for each file whose name ends in .dat, print the 17th line of that file.
The shell can do this for us using a for
loop.
The syntax (or 'general form') of a for
loop always looks like this:
for thing in list of things
do
    operation_on $thing
done
The word for
is followed by a variable name (in this case thing
),
then the word in
, and then a list of (space-separated) words.
The list of words ends with a newline (or semicolon – see below) and the word do
.
One or more commands then follow, collectively called the body of the loop, ending with the word done
.
The commands in the body will be run as many times as there are words in the list.
Each time the body commands are run, the variable will be set to the next item in the list (starting with the first).
Note that the parts in italics are not meant to be typed literally.
They are descriptive 'placeholders' for some particular list of things that you want to operate on
and some specific operation that you want to perform on those things.
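As a minimal sketch of the general form, the list of things can simply be typed out as words. The variable name planet and the three words in the list are arbitrary choices made for this illustration:

```shell
# The list here is three literal words; the body runs once for each word.
for planet in mercury venus earth
do
    echo visiting $planet
done
# prints:
#   visiting mercury
#   visiting venus
#   visiting earth
```

After the loop finishes, the variable still holds the last item in the list (here, earth).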
Let's make the loop print the 17th line of all the .dat files by substituting *.dat for our list of things and head -n 17 $thing | tail -n 1 for our operation_on.
Note also that the name of the variable thing
is not important;
what is important is that the name used after for
matches the name used inside the loop
to refer to each of the words in the list of things.
Let's change the name thing
to something more meaningful, such as filename:
for filename in *.dat
do
    head -n 17 $filename | tail -n 1
done
Try running the above command, exactly as it is shown.
(If you make a mistake, or the shell gets confused about what you are typing, press Control-C
to
get back to the normal prompt.)
Note that the prompt changes to “>
” as soon as you finish typing the first line.
This is to remind you that you have not yet finished typing the complete for
command.
(A for
loop is not complete until the shell sees the word done
at the end.)
$ for filename in *.dat
> do
> head -n 17 $filename | tail -n 1
> done
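To watch the same pattern at work on files small enough to check by eye, here is a sketch that prints line 2 of two invented three-line files (a.dat and b.dat are made-up names, and line 2 stands in for line 17):

```shell
# Create and enter a fresh scratch directory.
cd "$(mktemp -d)"

# Two made-up files of three lines each.
printf '%s\n' a1 a2 a3 > a.dat
printf '%s\n' b1 b2 b3 > b.dat

# The body runs once per file, so tail is applied to each file separately.
for filename in *.dat
do
    head -n 2 $filename | tail -n 1
done
# prints:
#   a2
#   b2
```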
How did the shell know that the filename
inside the loop was a variable, and not the name of a file?
Because of the $
symbol at the beginning.
Whenever a $
is followed by a name, the shell replaces
the $name
combination with whatever value is currently assigned to the variable with the given name.
Without the $
in front of filename
the head
command would have tried to
print the first 17 lines of the (non-existent) file literally called filename
.
How did the shell know that the filename
after the for
is the name of a variable?
Because the syntax of the for
command says that the next thing in the command must always be the name of a variable.
The $
is not needed (and is even wrong) because we do not want to replace filename
with its value,
we are just telling for
the name of the variable it should set to each item in our list of things.
You can use the echo
command to see exactly how the loop works and what it is doing to the variable.
Use echo
to see how many times the loop is run and to see the value of filename
each time the loop runs.
for filename in *.dat
do
    echo filename is $filename
done
Try moving the $
from the second filename
to the first to see what changes.
What happens if you put a $ in front of both filenames?
What happens if there is no $ at all in front of the filenames?
What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.
for filename in *.dat
do
    ls *.dat
done
What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.
for filename in *.dat
do
    ls $filename
done
Use a loop to make a backup copy of each of the planet .dat
files.
For each file x.dat
, make a copy of that file called backup-x.dat
.
For example, earth.dat
should be copied to a file called backup-earth.dat
.
A single copy command such as the following will not work (try it if you like):
$ cp *.dat backup-*.dat
cp: target 'backup-*.dat' is not a directory
The correct solution follows the same pattern as printing the 17th line of every file.
Of course, the operation should instead copy each file from “$filename
” to “backup-$filename
”.
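One possible shape for such a loop is sketched here on two invented files (a.dat and b.dat, with made-up contents), so it can be tried without touching the planet data:

```shell
# Create and enter a fresh scratch directory.
cd "$(mktemp -d)"

# Two made-up data files.
echo "data for a" > a.dat
echo "data for b" > b.dat

# Copy each file to a new file whose name starts with "backup-".
for filename in *.dat
do
    cp $filename backup-$filename
done

ls    # a.dat  b.dat  backup-a.dat  backup-b.dat
```

The wildcard is expanded once, before the loop starts, so the backup files created during the loop are not themselves copied.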
The previous example shows how a variable is used to form part of a longer name.
The filename
variable is used to create the name backup-$filename
.
When filename
is set to earth.dat
, the longer name will be backup-earth.dat
.
A problem arises when trying to append a letter or digit to a name stored in a variable.
For this reason $filename
can also be written ${filename}
.
Since the characters {
and }
cannot be part of a variable name,
there is no possibility of ambiguity when this form is used inside a longer name next to a letter or a digit.
Delete your backup-*
files.
Then, for each file x.dat, create a backup file called x.dat2.
$ for name in *.dat
> do
> cp $name $name2
> done
cp: missing destination file operand after 'earth.dat'
cp: missing destination file operand after 'jupiter.dat'
...etc...
cp: missing destination file operand after 'uranus.dat'
cp: missing destination file operand after 'venus.dat'
What is the problem?
Use an echo
command to print what the shell will do when it executes the cp
command, like this:
$ for name in *.dat
> do
> echo cp $name $name2
> done
cp earth.dat
cp jupiter.dat
...etc...
cp uranus.dat
cp venus.dat
What happened to earth.dat2
, etc.?
Variable names start with a letter or underscore, which is followed by any number of letters, digits, and underscores.
The shell thinks that the “2
” is part of the variable name;
in other words, that the name of the variable in “$name2
” is “name2
”.
To fix this, use {
and }
around “name
” to separate it from the “2
”.
$ for name in *.dat
> do
> cp $name ${name}2
> done
$ ls
earth.dat     mars.dat      moon.dat      pluto.dat     uranus.dat
earth.dat2    mars.dat2     moon.dat2     pluto.dat2    uranus.dat2
jupiter.dat   mercury.dat   neptune.dat   saturn.dat    venus.dat
jupiter.dat2  mercury.dat2  neptune.dat2  saturn.dat2   venus.dat2
$ rm *.dat2
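The same ambiguity arises with an underscore, which the shell also accepts as part of a variable name. A small sketch, in which the value earth and the suffix _old are chosen only for illustration:

```shell
name=earth

echo backup-$name    # "-" cannot be part of a name: prints backup-earth
echo $name_old       # looks for a variable called name_old: prints an empty line
echo ${name}_old     # the braces end the name before "_old": prints earth_old
```

Characters such as "-" and "." never need the braces, because they can never be part of a variable name.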
Wildcards (*
, ?
, and [...]
) in a
for
loop's list of things are expanded as usual.
What would be the results of running each of the following commands?
for name in p*.dat
do
    echo $name
done

for name in *p*.dat
do
    echo $name
done
Predict the answers, then check them by actually running the commands.
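For comparison, here is a sketch of the same two patterns applied to a different, invented set of files (the four .txt names are made up), which shows how the leading * changes what is matched:

```shell
# Create and enter a fresh scratch directory.
cd "$(mktemp -d)"

# Four empty files with made-up names.
touch apple.txt grape.txt pear.txt plum.txt

# p*.txt matches names that START with p.
for name in p*.txt
do
    echo starts with p: $name
done
# prints pear.txt and plum.txt

# *p*.txt matches names with a p ANYWHERE before .txt.
for name in *p*.txt
do
    echo contains p: $name
done
# prints apple.txt, grape.txt, pear.txt and plum.txt
```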
The up-arrow (or Control+p) and down-arrow (or Control+n) keys can be used to scroll through recent commands.
The left-arrow (or Control+b) and right-arrow (or Control+f) keys let you move around inside a command.
You can edit a previous command by deleting or inserting new content.
Pressing Return
re-runs the (edited) command.
If you try this on a for
loop you will notice that the loop has been recorded on a single line.
To do this the shell has inserted some semicolon “;
” characters to separate the different parts of the loop.
A semicolon has been inserted in approximately the places where a newline was in the original for
loop.
When viewed in the history our loop looks like this:
$ for name in *.dat
> do
> ls $name
> done
earth.dat
jupiter.dat
...etc...
uranus.dat
venus.dat
$ Control+P
$ for name in *.dat; do ls $name; done
The general form of a single-line for
loop is:
for thing in list of things ; do operation on thing ; ...etc... ; done
The semicolons take the place of newlines in the single-line version. Either or both of the semicolons can be replaced by newlines; the shell does not care whether you use semicolons or newlines.
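As a quick check that these forms are interchangeable, the following sketch writes the same loop three ways, using a typed-out list of three numbers chosen for illustration. Each version prints 1, 2 and 3 on separate lines:

```shell
# All on one line, with semicolons.
for n in 1 2 3; do echo $n; done

# First semicolon kept, second replaced by a newline.
for n in 1 2 3; do
    echo $n
done

# Both semicolons replaced by newlines.
for n in 1 2 3
do
    echo $n
done
```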
Write the backup for
loop again, all on one line.
for name in *.dat; do cp $name backup-$name; done
Delete the backup files.
Add a command to echo
the name of each file before copying it,
still putting the entire for
loop on a single line.
$ rm backup-*
$ for name in *.dat; do echo $name; cp $name backup-$name; done
earth.dat
jupiter.dat
mars.dat
mercury.dat
moon.dat
neptune.dat
pluto.dat
saturn.dat
uranus.dat
venus.dat
$ rm backup-*
Let's print the 17th line of each file and redirect the output to another file.
The following will not work:
$ for name in *.dat; do
> head -n 17 $name | tail -1 > lines.txt
> done
$ cat lines.txt
Mean Temperature (C)            464
The problem is that each time around the loop the >
redirection
truncates (empties) lines.txt
before it writes the output from tail
into it.
There are two solutions to this problem.
The first solution is to use another redirection operator, >>
.
This operator appends lines to the output file instead of replacing its contents.
$ for name in *.dat; do
> head -n 17 $name | tail -1 >> lines.txt
> done
$ cat lines.txt
Mean Temperature (C)            15
Mean Temperature (C)            -110
Mean Temperature (C)            -65
Mean Temperature (C)            167
Mean Temperature (C)            -20
Mean Temperature (C)            -200
Mean Temperature (C)            -225
Mean Temperature (C)            -140
Mean Temperature (C)            -195
Mean Temperature (C)            464
The second solution is to move the redirection outside the loop, so that the output of every command executed inside the loop goes through a single output redirection.
$ for name in *.dat; do
> head -n 17 $name | tail -1
> done > lines.txt
$ cat lines.txt
Mean Temperature (C)            15
Mean Temperature (C)            -110
Mean Temperature (C)            -65
Mean Temperature (C)            167
Mean Temperature (C)            -20
Mean Temperature (C)            -200
Mean Temperature (C)            -225
Mean Temperature (C)            -140
Mean Temperature (C)            -195
Mean Temperature (C)            464
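The three placements of the redirection can be compared side by side. This is a sketch only: the word list and the file names inside.txt, append.txt and outside.txt are invented for the comparison.

```shell
# Create and enter a fresh scratch directory.
cd "$(mktemp -d)"

# > inside the loop: the file is emptied on every pass.
for word in one two three; do echo $word > inside.txt; done

# >> inside the loop: each pass adds a line to the end of the file.
for word in one two three; do echo $word >> append.txt; done

# > after done: the file is opened once for the whole loop.
for word in one two three; do echo $word; done > outside.txt

cat inside.txt     # only the last word: three
cat append.txt     # all three words, one per line
cat outside.txt    # all three words, one per line
```

One practical difference between the two working versions: running the >> loop a second time adds three more lines to append.txt, whereas the loop with the redirection after done starts again from an empty file each time it is run.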
The echo
command normally prints a newline character after its arguments.
If you use the option -n
then this newline is not printed.
This lets you use several echo -n
commands to print several things on the same line.
In the following example a semicolon ;
is used (instead of newline) to separate two echo
commands.
The first echo
command uses the option -n
to prevent it printing the newline.
Try running these commands with and without the -n
to see the difference.
$ echo -n hello; echo "," world
hello, world
$ echo hello; echo "," world
hello
, world
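Inside a loop, echo -n lets every pass add to the same output line. A minimal sketch, with a typed-out word list chosen for illustration; the double quotes are there to keep the space after each word:

```shell
# Each pass prints one word and a space, without starting a new line.
for word in one two three
do
    echo -n "$word "
done

# A final echo with no arguments ends the line.
echo
# prints: one two three
```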
In a for
loop, the operation that is performed inside the loop can be another for
loop.
(This is called nesting loops.)
For example:
$ for digit in {1..3}; do for letter in {a,b}; do echo $digit $letter; done; done
1 a
1 b
2 a
2 b
3 a
3 b
Arithmetic expansion is performed on any text written inside double parentheses after a $
symbol, like this:
“$((text))
”.
The entire expression (from “$
” to the closing “)
”) is replaced by the result of
evaluating text as an arithmetic expression.
Within text you can refer to variables without needing to use the $
prefix.
Some examples:
$ foo=32
$ echo foo plus ten is $((foo + 10))
foo plus ten is 42
$ N=1; for L in {a,b,c}; do echo $L$N; N=$((N+1)); done
a1
b2
c3
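Arithmetic expansion is often combined with a loop to accumulate a result. The sketch below adds up the numbers 1 to 5; the variable names total and n are arbitrary choices for this illustration:

```shell
total=0

# Each pass replaces total with its old value plus the current number.
for n in 1 2 3 4 5
do
    total=$((total + n))
done

echo the sum of 1 to 5 is $total    # prints: the sum of 1 to 5 is 15
```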
Write two nested for
loops that print the following multiplication table:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
Don't worry about properly lining up the columns.
The echo
command understands an option -e
that replaces certain sequences of characters with other characters.
One replacement that this enables is to convert “\\t
” into a tab character.
A tab moves the cursor forward to the next column that is a multiple of 8.
Modify your loops to line up the columns, like this:
1       2       3       4       5       6       7       8       9       10
2       4       6       8       10      12      14      16      18      20
3       6       9       12      15      18      21      24      27      30
4       8       12      16      20      24      28      32      36      40
5       10      15      20      25      30      35      40      45      50
6       12      18      24      30      36      42      48      54      60
7       14      21      28      35      42      49      56      63      70
8       16      24      32      40      48      56      64      72      80
9       18      27      36      45      54      63      72      81      90
10      20      30      40      50      60      70      80      90      100
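If you get stuck, the following sketch shows one possible approach on a smaller 3 by 3 table. It assumes the bash shell, where echo accepts both -n and -e. The variable names row and col are arbitrary, and the tab sequence is written inside double quotes so that the shell passes it to echo unchanged.

```shell
for row in 1 2 3
do
    for col in 1 2 3
    do
        # Print one product followed by a tab, staying on the same line.
        echo -n -e "$((row * col))\t"
    done
    # End the current row of the table.
    echo
done
```

Extending the two lists to ten numbers each gives the full table.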
A for loop repeats a command for every item in a list.
A for loop sets a variable to the next item in the list before running the loop body.
Use $name to expand a variable (i.e., get its value), or ${name} if there are letters or digits immediately after the variable.
for loops can be written on one line by replacing newlines with semicolons.
for loops can be nested by writing a loop as the body of another loop.