Week 08 — Loops, scripts

This week we will study manipulating multiple files using loops and creating new commands out of sequences of existing commands.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.

2. Check your understanding of command concepts using the self-assessment questionnaire
  1. Answer each question in the self-assessment questionnaire as honestly as you can.
  2. Revise the topics with the lowest scores, then update your scores.
  3. Repeat the previous step until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.

What you will learn from this class

Notes

The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.

Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.

Review

First make sure you understand the important topics from the previous two weeks. Click on this link to review what you should already know:

Review of previous weeks

In the notes below, follow along by typing all the commands shown in bold. Check that the output from your commands is similar to the output shown here.

Download some reference data

Download the file planets.tar from the course web site.

$ cd
$ curl -O https://kuas.org/tmp/planets.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20480  100 20480    0     0   485k      0 --:--:-- --:--:-- --:--:--  487k

The file is a 'tar' archive. Unpack the archive using the tar command with options -x to extract an archive, -v to be verbose about each file extracted, and -f to give the archive filename on the command line.

$ tar -xvf planets.tar
planets/earth.dat
planets/jupiter.dat
planets/mars.dat
planets/mercury.dat
planets/moon.dat
planets/neptune.dat
planets/pluto.dat
planets/saturn.dat
planets/uranus.dat
planets/venus.dat

You can see from the output that a directory called planets was created and that all the new files are inside it. Change to the planets directory and then check the contents of one of the files using cat or less.

$ cd planets
$ cat earth.dat
Name	Earth
Mass (10^24kg)	5.97
Diameter (km)	12,756
Density (kg/m^3)	5514
Gravity (m/s^2)	9.8
Escape Velocity (km/s)	11.2
Rotation Period (hours)	23.9
Length of Day (hours)	24.0
Distance from Sun (10^6 km)	149.6
Perihelion (10^6 km)	147.1
Aphelion (10^6 km)	152.1
Orbital Period (days)	365.2
Orbital Velocity (km/s)	29.8
Orbital Inclination (degrees)	0.0
Orbital Eccentricity	0.017
Obliquity to Orbit (degrees)	23.4
Mean Temperature (C)	15
Surface Pressure (bars)	1
Number of Moons	1
Ring System?	No
Global Magnetic Field?	Yes

The files contain tab-separated values with two columns. The first column describes the data on that line, and the second column contains the data value.

Check the first two lines of the files to see if they all look the same.

$ head -n 2 *.dat
==> earth.dat <==
Name	Earth
Mass (10^24kg)	5.97

==> jupiter.dat <==
Name	Jupiter
Mass (10^24kg)	1898

...etc...

==> uranus.dat <==
Name	Uranus
Mass (10^24kg)	86.8

==> venus.dat <==
Name	Venus
Mass (10^24kg)	4.87

Line 17 of every file should contain the mean temperature. Check line 17 of earth.dat using the combination of head -n 17 and tail -n 1 that was used earlier.

$ head -n 17 earth.dat | tail -n 1
Mean Temperature (C)	15
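To see why head -n 17 followed by tail -n 1 picks out exactly line 17, you can try the same pipeline on a file whose lines are just their own line numbers. This is a minimal sketch: numbers.txt is a made-up file, and it assumes the seq and mktemp commands are available (they are on most Linux and macOS systems).

```shell
# Work in a fresh temporary directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Create a hypothetical 20-line file whose lines are 1, 2, ..., 20.
seq 1 20 > numbers.txt

# head keeps lines 1 to 17; tail then keeps only the last of those.
head -n 17 numbers.txt | tail -n 1    # prints: 17
```

Replacing 17 with another number N prints line N, as long as the file has at least N lines; otherwise you get the file's last line instead.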

How would you check line 17 of all the files to make sure they contain the mean temperature?

The obvious way is to change earth.dat to *.dat in the command you just used. Will that work?

Try showing the 17th line of each file by running the command with earth.dat changed to *.dat.

$ head -n 17 *.dat | tail -n 1
Mean Temperature (C)	464

That's not right. We only saw the line for one planet. Which one was it? Use grep to find out.

$ grep 464 *.dat
venus.dat:Mean Temperature (C)	464

Why did you see only one line of output?

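Answer: head -n 17 *.dat prints the first 17 lines of every file, one file after another (with a "==> filename <==" header before each), as a single stream of output. tail -n 1 only sees that combined stream, so it prints just its very last line, which is line 17 of the last file in alphabetical order, venus.dat.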
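The same effect is easy to reproduce on a small scale. In this sketch the two files, a.txt and b.txt, are hypothetical stand-ins for the .dat files, and seq and mktemp are assumed to be available.

```shell
# Work in a fresh temporary directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Two hypothetical three-line files: a.txt holds 1-3, b.txt holds 11-13.
seq 1 3 > a.txt
seq 11 13 > b.txt

# head prints the first 2 lines of EACH file, each preceded by a
# "==> name <==" header, all as one stream of output.
head -n 2 a.txt b.txt

# tail sees only that one combined stream, so it keeps just its last
# line: line 2 of the last file, b.txt.
head -n 2 a.txt b.txt | tail -n 1    # prints: 12
```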

To print the 17th line of every file we need to use something more sophisticated: a loop.

Running a command on multiple files using a loop

To print the 17th line of each file, what we want to do is this (in natural language): for each file whose name ends in .dat, print the 17th line of that file.

The shell can do this for us using a for loop. The syntax (or 'general form') of a for loop always looks like this:

for thing in list of things
do
    operation_on $thing
done

The word for is followed by a variable name, the word in, and then a list of (space-separated) words. The loop body (the commands between do and done) will be run as many times as there are words in the list. Each time the loop is run, the variable is set to the next word in the list (starting with the first).
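As a minimal sketch of the general form, here the list of things is three literal words rather than file names (planet is just an illustrative variable name):

```shell
# The body runs once per word; each time, planet holds the next word.
for planet in earth mars venus
do
    echo "planet is $planet"
done
# prints:
# planet is earth
# planet is mars
# planet is venus
```

Because the words are typed directly, no files need to exist for this loop to work.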

Note that the parts in italics are not meant to be typed literally. They are descriptive 'placeholders' for some particular list of things that you want to operate on and some specific operation that you want to perform on those things. Let's make the loop print the 17th line of all the .dat files by replacing the list of things with *.dat and the operation with head -n 17 $thing | tail -n 1.

Note also that the name of the variable thing is not important; what is important is that the name used after for matches the name used inside the loop to refer to each of the words in the list of things. Let's change the name thing to something more meaningful, such as filename.

for filename in *.dat
do
    head -n 17 $filename | tail -n 1
done

Try running the above command, exactly as it is shown. (If you make a mistake, or the shell gets confused about what you are typing, press Control-C to get back to the normal prompt.)

Note that the prompt changes to “>” as soon as you finish typing the first line. This is to remind you that you have not yet finished typing the complete for command. (A for loop is not complete until the shell sees the word done at the end.)

$ for filename in *.dat
> do
>     head -n 17 $filename | tail -n 1
> done

How did the shell know that the filename inside the loop was a variable, and not the name of a file? Because of the $ symbol at the beginning. Whenever a $ is followed by a name, the shell replaces the $name combination with whatever value is currently assigned to the variable with the given name. Without the $ in front of filename the head command would have tried to print the first 17 lines of the (non-existent) file literally called filename.
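A small sketch of the difference the $ makes; earth.dat is only used as text here, so no file needs to exist:

```shell
# Assign a value to the variable filename (no spaces around the =).
filename=earth.dat

# With a $, the shell substitutes the variable's current value.
echo $filename    # prints: earth.dat

# Without a $, filename is just an ordinary word.
echo filename     # prints: filename
```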

How did the shell know that the filename after the for is the name of a variable? Because the syntax of the for command says that the next thing in the command must always be the name of a variable. The $ is not needed (and is even wrong) because we do not want to replace filename with its value; we are just telling for the name of the variable it should set to each item in our list of things.

You can use the echo command to see exactly how the loop works and what it is doing to the variable.

Use echo to see how many times the loop is run and to see the value of filename each time the loop runs.

for filename in *.dat
do
    echo filename is $filename
done

Try moving the $ from the second filename to the first to see what changes.

What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.

for filename in *.dat
do
    ls *.dat
done

What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.

for filename in *.dat
do
    ls $filename
done

Use a loop to make a backup copy of each of the planet .dat files. For each file x.dat, make a copy of that file called backup-x.dat. For example, earth.dat should be copied to a file called backup-earth.dat.

A single copy command such as the following will not work (try it if you like):

$ cp *.dat backup-*.dat
cp: target 'backup-*.dat' is not a directory

The correct solution follows the same pattern as printing the 17th line of every file. Of course, the operation should instead copy each file from “$filename” to “backup-$filename”.

Answer:

for filename in *.dat
do
    cp $filename backup-$filename
done
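If you want to try the backup loop without touching your planet files, this sketch runs it in a throwaway directory. The three empty .dat files are hypothetical stand-ins, and mktemp is assumed to be available. The quotes around "$filename" are an addition that protects names containing spaces; they do not change the result here.

```shell
# Work in a fresh temporary directory with hypothetical sample files.
cd "$(mktemp -d)" || exit 1
touch earth.dat mars.dat venus.dat

# Copy each x.dat to backup-x.dat.
for filename in *.dat
do
    cp "$filename" "backup-$filename"
done

ls
```

The *.dat pattern is expanded once, before the loop starts, so the backup files created inside the loop are not themselves copied again.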

Using variables as parts of filenames

The previous example shows how a variable is used to form part of a longer name. The filename variable is used to create the name backup-$filename. When filename is set to earth.dat, the longer name will be backup-earth.dat.

A problem arises when trying to append a letter or digit directly to a name stored in a variable: the shell treats the extra letter or digit as part of the variable's name. For this reason $filename can also be written ${filename}. Since the characters { and } cannot be part of a variable name, there is no possibility of ambiguity when this form is used inside a longer name next to a letter or a digit.
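A minimal sketch of the ambiguity; the variable name and the value earth.dat are illustrative, and it assumes no variable called name2 has been set in your shell:

```shell
name=earth.dat

# The shell reads $name2 as the variable "name2", which is unset,
# so this prints an empty line.
echo "$name2"

# The braces mark where the variable name ends.
echo "${name}2"    # prints: earth.dat2
```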

Delete your backup-* files. Then, for each file x.dat, create a backup file called x.dat2 (for example, earth.dat2).

$ for name in *.dat
> do
>     cp $name $name2
> done
cp: missing destination file operand after 'earth.dat'
cp: missing destination file operand after 'jupiter.dat'
...etc...
cp: missing destination file operand after 'uranus.dat'
cp: missing destination file operand after 'venus.dat'

What is the problem? The shell thinks that the name of the variable in $name2 is “name2”. Use { and } around name to separate it from the 2.

$ for name in *.dat
> do
>     cp $name ${name}2
> done
$ ls
earth.dat     mars.dat      moon.dat      pluto.dat     uranus.dat
earth.dat2    mars.dat2     moon.dat2     pluto.dat2    uranus.dat2
jupiter.dat   mercury.dat   neptune.dat   saturn.dat    venus.dat
jupiter.dat2  mercury.dat2  neptune.dat2  saturn.dat2   venus.dat2

Using wildcards

Wildcards (*, ?, and [...]) in a for loop's list of things are expanded as usual.
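This sketch shows the expansion happening in a throwaway directory. The files are hypothetical (mktemp is assumed to be available), and the pattern m*.dat is deliberately different from the exercises below so that it does not give away their answers.

```shell
# Work in a fresh temporary directory with hypothetical sample files.
cd "$(mktemp -d)" || exit 1
touch earth.dat mars.dat mercury.dat moon.dat notes.txt

# m*.dat is replaced by the matching names, in sorted order, before the
# loop starts; earth.dat and notes.txt do not match.
for name in m*.dat
do
    echo "$name"
done
# prints: mars.dat, mercury.dat, moon.dat (one per line)

# A pattern that matches nothing is left as it is (bash default behavior),
# so this loop runs once with name set to the literal text z*.dat.
for name in z*.dat
do
    echo "$name"
done
```

Some shells (for example zsh) report an error for a pattern that matches nothing instead of passing it through unchanged.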

What would be the results of running each of the following commands?

for name in p*.dat
do
    echo $name
done

for name in *p*.dat
do
    echo $name
done

Avoiding typing: interactive history

The up-arrow (or Control+p) and down-arrow (or Control+n) keys can be used to scroll through recent commands. The left-arrow (or Control+b) and right-arrow (or Control+f) keys let you move around inside a command. You can edit a previous command by deleting or inserting content. Pressing Return re-runs the (edited) command.

If you try this on a for loop you will notice that the loop has been recorded on a single line. To do this the shell has inserted some semicolon “;” characters to separate the different parts of the loop. A semicolon has been inserted in approximately the places where a newline was in the original for loop.

When viewed in the history our loop looks like this:

$ for name in *.dat
> do
>     ls $name
> done
earth.dat
jupiter.dat
...etc...
uranus.dat
venus.dat
$ (press Control+P)
$ for name in *.dat; do ls $name; done

Putting the entire loop on a single line

The general form of a single-line for loop is:

for thing in list of things ; do operation_on $thing ; ...etc... ; done

The semicolons take the place of newlines in the single-line version. Either or both of the semicolons can be replaced by newlines; the shell does not care whether you use semicolons or newlines.
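A sketch showing that the two layouts are the same command as far as the shell is concerned; word and the list one two are illustrative:

```shell
# Multi-line form.
for word in one two
do
    echo $word
done

# The same loop on one line: semicolons replace the newlines.
# (There is no semicolon directly after "do".)
for word in one two; do echo $word; done
```

Both loops print one and two on separate lines.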

Write the backup for loop again, all on one line.

for name in *.dat; do cp $name backup-$name; done

Delete the backup files. Add a command to echo the name of each file before copying it, still putting the entire for loop on a single line.

$ rm backup-*
$ for name in *.dat; do echo $name; cp $name backup-$name; done
earth.dat
jupiter.dat
mars.dat
mercury.dat
moon.dat
neptune.dat
pluto.dat
saturn.dat
uranus.dat
venus.dat
$ rm backup-*

Using redirection with loops

Let's print the 17th line of each file and redirect the output to another file.

The following will not work:

$ for name in *.dat; do
>     head -n 17 $name | tail -1 > lines.txt
> done
$ cat lines.txt
Mean Temperature (C)	464

The problem is that each time around the loop the > redirection truncates (empties) lines.txt before it writes the output from tail into it. There are two solutions to this problem.

The first solution is to use another redirection operator, >>. This operator appends lines to the output file instead of replacing its contents. Because >> never empties the file, first delete the lines.txt left over from the previous attempt; otherwise its line for Venus will remain at the top.

$ rm lines.txt
$ for name in *.dat; do
>     head -n 17 $name | tail -1 >> lines.txt
> done
$ cat lines.txt
Mean Temperature (C)	15
Mean Temperature (C)	-110
Mean Temperature (C)	-65
Mean Temperature (C)	167
Mean Temperature (C)	-20
Mean Temperature (C)	-200
Mean Temperature (C)	-225
Mean Temperature (C)	-140
Mean Temperature (C)	-195
Mean Temperature (C)	464

The second solution is to move the redirection outside the loop (after done), so that the output of every command executed inside the loop goes into a single redirection. The file is then truncated only once, before the loop starts.

$ for name in *.dat; do
>     head -n 17 $name | tail -1
> done > lines.txt
$ cat lines.txt
Mean Temperature (C)	15
Mean Temperature (C)	-110
Mean Temperature (C)	-65
Mean Temperature (C)	167
Mean Temperature (C)	-20
Mean Temperature (C)	-200
Mean Temperature (C)	-225
Mean Temperature (C)	-140
Mean Temperature (C)	-195
Mean Temperature (C)	464
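The sketch below puts the broken version and both fixes side by side in a throwaway directory. The two files a.dat and b.dat are hypothetical, line 2 stands in for line 17 so the files can stay short, and seq and mktemp are assumed to be available.

```shell
# Work in a fresh temporary directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Two hypothetical data files: line 2 of a.dat is 2, line 2 of b.dat is 5.
seq 1 3 > a.dat
seq 4 6 > b.dat

# Broken: ">" inside the loop empties wrong.txt on every pass,
# so only the last file's line survives.
for name in *.dat; do head -n 2 "$name" | tail -n 1 > wrong.txt; done

# Fix 1: ">>" appends. Remove any old copy first so stale lines cannot remain.
rm -f append.txt
for name in *.dat; do head -n 2 "$name" | tail -n 1 >> append.txt; done

# Fix 2: one redirection after "done", so the file is emptied only once.
for name in *.dat; do head -n 2 "$name" | tail -n 1; done > once.txt

cat wrong.txt    # prints: 5
cat append.txt   # prints: 2 then 5
cat once.txt     # prints: 2 then 5
```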

Challenge

The echo command normally prints a newline character after its arguments. If you use the option -n then this newline is not printed. This lets you use several echo -n commands to print several things on the same line. For example:

$ echo -n hello; echo "," world
hello, world

In a for loop, the operation that is performed inside the loop can be another for loop. (This is called nesting loops.) For example:

$ for digit in {1..3}; do for letter in {a,b}; do echo $digit $letter; done; done
1 a
1 b
2 a
2 b
3 a
3 b
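Here is the same nested loop written with plain word lists instead of {1..3} and {a,b}. Brace expansion is supported by bash but not by every shell, so this form (a sketch, not the only way to write it) also runs in shells that lack it. The inner loop runs to completion on every pass of the outer loop.

```shell
# Outer loop: digit takes the values 1, 2, 3.
for digit in 1 2 3
do
    # Inner loop: for each digit, letter takes the values a, b.
    for letter in a b
    do
        echo $digit $letter
    done
done
# prints six lines: 1 a, 1 b, 2 a, 2 b, 3 a, 3 b
```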

Arithmetic expansion is performed on any text written inside double parentheses after a $ symbol, like this: “$((text))”. The entire expression (from “$” to the closing “)”) is replaced by the result of evaluating text as an arithmetic expression. Within text you can refer to variables without needing to use the $ prefix.

Some examples:

$ foo=32
$ echo foo plus ten is $((foo + 10))
foo plus ten is 42
$ N=1; for L in {a,b,c}; do echo $L$N; N=$((N+1)); done
a1
b2
c3
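A further sketch: a running total built up with arithmetic expansion inside a loop (total and n are illustrative names).

```shell
# Start the total at zero, then add each number in the list to it.
total=0
for n in 1 2 3 4 5
do
    # Inside $(( )) the variables total and n need no leading $.
    total=$((total + n))
done
echo "total is $total"    # prints: total is 15
```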

Write two nested for loops that print the following multiplication table:

1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100

Don't worry about properly lining up the columns.

Hint

The echo command understands an option -e that replaces certain sequences of characters with other characters. One replacement that this enables is to convert “\t” into a tab character. A tab moves the cursor forward to the next tab stop; by default, tab stops are every 8 columns.

Modify your loops to line up the columns, like this:

1	2	3	4	5	6	7	8	9	10
2	4	6	8	10	12	14	16	18	20
3	6	9	12	15	18	21	24	27	30
4	8	12	16	20	24	28	32	36	40
5	10	15	20	25	30	35	40	45	50
6	12	18	24	30	36	42	48	54	60
7	14	21	28	35	42	49	56	63	70
8	16	24	32	40	48	56	64	72	80
9	18	27	36	45	54	63	72	81	90
10	20	30	40	50	60	70	80	90	100

Hint

Summary