~~NOTOC~~ /*{{page>livereload&nodate&noeditbtn&nofooter}}*/ {{page>css&nodate&noeditbtn&nofooter}} ===== Week 08 — Loops, scripts ===== This week we will study how to manipulate multiple files using loops, and how to create new commands out of sequences of existing commands. The large sensor data file for the in-class assignment can be downloaded like this: ''curl -O https://kuas.org/tmp/metar-2019.tgz'' Once downloaded, unpack the contents using ''tar -xf metar-2019.tgz'', which will create a directory called ''metar-2019'' containing 8752 files of weather sensor data. ==== Evaluation ==== Up to 10 points can be gained towards your final score by completing the **in-class assignment** on Friday. ==== Preparation ==== == 1. Complete the self-preparation assignment at home before next class == This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the [[#Notes|Notes]] section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be explored further in the in-class assignment on Friday. == 2. Check your understanding of command concepts using the self-assessment questionnaire == - Answer each question in the self-assessment questionnaire as honestly as you can. - Revise the topics with the lowest scores, then update your scores. - Repeat the previous step until you feel comfortable with most (or all) of the topics. On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone. To succeed at the in-class assignment you should understand the topics outlined in the "Notes" section.
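Once ''metar-2019.tgz'' has been unpacked, one way to confirm that everything arrived is to count the files in the new directory with ''ls'' and ''wc -l''. The sketch below rehearses those steps on a tiny stand-in archive (''demo.tgz'' holding three invented files; both names are hypothetical and have nothing to do with the real METAR data) so that it can be run anywhere without the download. With the real archive you would skip the first part, run ''tar -xf metar-2019.tgz'', and then ''ls metar-2019 | wc -l'', which according to the description above should report 8752.

```shell
#!/bin/bash
# Rehearse "unpack, then count" on a throwaway stand-in archive.
# demo.tgz and its contents are hypothetical; they are NOT the METAR files.

# Work in a fresh temporary directory so nothing else is touched.
cd "$(mktemp -d)" || exit 1

# Build the stand-in archive: a directory containing three small files.
mkdir demo
for n in 1 2 3; do
    echo "reading $n" > demo/file-$n.txt
done
tar -czf demo.tgz demo   # c = create, z = compress with gzip, f = archive file name
rm -r demo               # remove the originals so only the archive is left

# Unpack it the same way as: tar -xf metar-2019.tgz
tar -xf demo.tgz

# Count the extracted files (compare: ls metar-2019 | wc -l)
ls demo | wc -l          # expected: 3
```

The same two-step pattern (extract, then count with ''ls'' piped into ''wc -l'') works for any archive whose contents unpack into a single directory.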
==== What you will learn from this class ==== /**************************************************************** TODO ==== Environment variables ==== ==== Path name expansion ==== ==== Quoting ==== ==== Interactive history ==== ==== Command and filename completion ==== ==== Command substitution ==== ==== Arithmetic substitution ==== ===== Looking for help: man and help ===== ===== Shell scripts ===== * how to scale performing one operation on one file to performing bulk operations on multiple files ****************************************************************/ * How to perform an action on many different files. * How to use a loop to apply a command to many files. * How to use a variable in part of a larger name. * How to use wildcards to control which files a loop processes. * How to use interactive history to modify and/or repeat recent commands. * How to write a loop on a single line. * How to use redirection with loops. /* ++++ Glossary of | ; entry : description ++++ */ /* ==== Further reading ==== */ /****************************************************************/ ==== Notes ==== The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment. Read the notes and try to complete each of the exercises //without// looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it //before// continuing. === Review === First make sure you understand the important topics from the previous two weeks. Click on this link to review what you should already know: ++++ Review of previous weeks | Review of important concepts: * Directories form a tree with ''/'' at the root and files at the leaves. * Absolute paths begin with ''/'' and start in the root; other paths are relative and start in the current working directory. 
* The current working directory is managed using ''cd'' and ''pwd''. * ''ls'' lists directories and file information. * Files are viewed with ''cat'' and ''less'', and are managed with ''mv'' (move), ''cp'' (copy), and ''rm'' (remove). * Directories are managed with ''mkdir'' (make directory), ''rmdir'' (remove directory), and ''rm -r'' (recursively remove a directory and its contents). * Input is redirected from the keyboard to come instead from a file using ''< //file//''; output is redirected from the screen to go to a file using ''> //file//''. * Output from a command is connected to the input of the next command using a pipe ''|''. * Files can be created by redirecting output (from ''cat'', ''echo'', or another command), with ''touch'', or using an editor such as ''nano''. * In file/directory path names, ''*'' matches any sequence of characters, ''?'' matches any single character, and ''[//characters//]'' matches any one of the listed //characters//. * ''Control-C'' terminates a running command and ''Control-D'' simulates 'end of file' when input is being read from the keyboard. * Brace expressions ''{//a//,//b//,...,//c//}'' repeat the name they are part of multiple times while replacing the text between the braces with each of //a//, //b//, etc., and finally //c//. * Sequence expressions ''{//a//..//b//}'' repeat the name they are part of multiple times while replacing the text between the braces with each item in the sequence //a// to //b//, inclusive. * Lines of text are chosen from the start or end using ''head'' or ''tail'', chosen according to a pattern with ''grep'', reordered by ''sort'', made unique with ''uniq'', counted with ''wc'' or ''uniq -c'', and dissected by ''cut''. * Setting the delimiter with ''cut -d ,'' and/or ''sort -t ,'' lets these programs work directly on comma-separated values (CSV). ++++ In the notes below, follow along by typing all the commands shown **in bold**. Check that the output from your commands is similar to the output shown here.
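If you would like to check a few of the review items before starting, the short script below exercises brace expressions, sequence expressions, and ''sort''/''cut'' on comma-separated values. The file names and the three-line CSV in it are typed in by hand purely for illustration; they are not taken from any course file.

```shell
#!/bin/bash
# Brace expression: the surrounding text is repeated once per item.
echo backup-{earth,mars,venus}.dat
# prints: backup-earth.dat backup-mars.dat backup-venus.dat

# Sequence expression: one word per number from 1 to 5, inclusive.
echo day{1..5}
# prints: day1 day2 day3 day4 day5

# A tiny hand-typed CSV with two fields: name,temperature
# sort -t , uses a comma as the field separator; -k 2 -n sorts numerically on field 2.
# cut -d , -f 1 then keeps only the first field (the name).
printf 'mars,-65\nearth,15\nvenus,464\n' | sort -t , -k 2 -n | cut -d , -f 1
# prints, one per line: mars, earth, venus
```

Try changing ''-k 2 -n'' to ''-k 1'' to see the same lines sorted alphabetically by name instead.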
=== Download some reference data === Download the file ''planets.tar'' from the course web site. $ **cd** $ **curl -O https://kuas.org/tmp/planets.tar** % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 20480 100 20480 0 0 485k 0 --:--:-- --:--:-- --:--:-- 487k The file is a 'tar' archive. Unpack the archive using the ''tar'' command with options ''-x'' to e**x**tract an archive, ''-v'' to be **v**erbose about each file extracted, and ''-f'' to give the archive **f**ilename on the command line. $ **tar -xvf planets.tar** planets/earth.dat planets/jupiter.dat planets/mars.dat planets/mercury.dat planets/moon.dat planets/neptune.dat planets/pluto.dat planets/saturn.dat planets/uranus.dat planets/venus.dat You can see from the output that a directory called ''planets'' was created and that all the new files are inside it. Change to the ''planets'' directory and then check the contents of one of the files using ''cat'' or ''less''. $ **cd planets ** $ **cat earth.dat ** Name Earth Mass (10^24kg) 5.97 Diameter (km) 12,756 Density (kg/m^3) 5514 Gravity (m/s^2) 9.8 Escape Velocity (km/s) 11.2 Rotation Period (hours) 23.9 Length of Day (hours) 24.0 Distance from Sun (10^6 km) 149.6 Perihelion (10^6 km) 147.1 Aphelion (10^6 km) 152.1 Orbital Period (days) 365.2 Orbital Velocity (km/s) 29.8 Orbital Inclination (degrees) 0.0 Orbital Eccentricity 0.017 Obliquity to Orbit (degrees) 23.4 Mean Temperature (C) 15 Surface Pressure (bars) 1 Number of Moons 1 Ring System? No Global Magnetic Field? Yes The files contain //tab//-separated values with two columns. The first column describes the data on that line, and the second column contains the data value. Check the first two lines of the files to see if they all look the same. $ **head -n 2 *.dat ** ==> earth.dat <== Name Earth Mass (10^24kg) 5.97 ==> jupiter.dat <== Name Jupiter Mass (10^24kg) 1898 // ...etc... 
// ==> uranus.dat <== Name Uranus Mass (10^24kg) 86.8 ==> venus.dat <== Name Venus Mass (10^24kg) 4.87 Line 17 of every file should contain the mean temperature. Check line 17 of ''earth.dat'' using the combination of ''head -n 17'' and ''tail -n 1'' that was used earlier. $ **head -n 17 earth.dat | tail -n 1** Mean Temperature (C) 15 How would you check line 17 of all the files to make sure they contain the mean temperature? The obvious way is to change ''earth.dat'' to ''*.dat'' in the command you just used. Will that work? Try showing the 17th line of each file by running the command with ''earth.dat'' changed to ''*.dat''. $ **head -n 17 *.dat | tail -n 1** Mean Temperature (C) 464 That's not right. We only saw the line for one planet. Which one was it? Use ''grep'' to find out. $ **grep 464 *.dat ** venus.dat:Mean Temperature (C) 464 Why did you see only one line of output? ++++ Answer | The ''head -n 17 *.dat '' command printed the first 17 lines of every file, which is about 189 lines of output. (Try it!) Because ''*.dat'' expands to a list of files in //alphabetical order//, the last file to be processed was ''venus.dat''. The last 17 lines of output were therefore the first 17 lines of the ''venus.dat'' file, so taking the ''tail -n 1'' of the entire output gave us just line 17 of ''venus.dat''. ++++ To print the 17th line of every file we need to use something more sophisticated: a loop. === Running a command on multiple files using a loop === To print the 17th line of each file, what we want to do is this (in natural language): * for every file ending in ''.dat'' * print the last 1 line of the first 17 lines of the file The shell can do this for us using a ''for'' loop. The syntax (or 'general form') of a ''for'' loop always looks like this: for thing in //list of things// do \_\_\_\_//operation_on// $thing done The word ''for'' is followed by a variable name (in this case ''thing''), then the word ''in'', and then a list of (space-separated) words.
The list of words ends with a newline (or semicolon -- see below) and the word ''do''. One or more commands then follow, collectively called the //body// of the loop, ending with the word ''done''. The commands in the body will be run as many times as there are words in the list. Each time the body commands are run, the variable will be set to the next item in the list (starting with the first). Note that the parts in //italics// are not meant to be typed literally. They are descriptive 'placeholders' for some particular //list of things// that you want to operate on and some specific //operation// that you want to perform on those things. Let's make the loop print the 17th line of all the ''.dat'' files by * using ''*.dat'' for our //list of things// and * using ''head -n 17 $thing | tail -n 1'' for our //operation_on//. Note also that the name of the variable ''thing'' is not important; what //is// important is that the name used after ''for'' matches the name used inside the loop to refer to each of the words in the //list of things//. Let's change the name ''thing'' to something more meaningful, such as ''filename'': for filename in *.dat do \_\_\_\_head -n 17 $filename | tail -n 1 done Try running the above command, exactly as it is shown. (If you make a mistake, or the shell gets confused about what you are typing, press ''Control-C'' to get back to the normal prompt.) Note that the prompt changes to "''>''" as soon as you finish typing the first line. This is to remind you that you have not yet finished typing the complete ''for'' command. (A ''for'' loop is not complete until the shell sees the word ''done'' at the end.) $ **for filename in *.dat **> **do **> **\_\_\_head -n 17 $filename | tail -n 1 **> **done** **How did the shell know that the ''filename'' inside the loop was a variable, and not the name of a file?** Because of the ''$'' symbol at the beginning.
Whenever a ''$'' is followed by a //name//, the shell replaces the ''$//name//'' combination with whatever value is currently assigned to the variable with the given //name//. Without the ''$'' in front of ''filename'' the ''head'' command would have tried to print the first 17 lines of the (non-existent) file literally called ''filename''. **How did the shell know that the ''filename'' after the ''for'' is the name of a variable?** Because the syntax of the ''for'' command says that the next thing in the command must always be the name of a variable. The ''$'' is not needed (and is even //wrong//) because we do not want to replace ''filename'' with its value; we are just telling ''for'' the name of the variable it should set to each item in our //list of things//. You can use the ''echo'' command to see exactly how the loop works and what it is doing to the variable. Use ''echo'' to see how many times the loop is run and to see the value of ''filename'' each time the loop runs. for filename in *.dat do \_\_\_\_echo filename is $filename done Try moving the ''$'' from the second ''filename'' to the first to see what changes. * What will be the output if you put a ''$'' in front of both ''filename''s? * What will be the output if you do not use ''$'' at all in front of the ''filename''s? What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop. for filename in *.dat do \_\_\_\_ls *.dat done What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop. for filename in *.dat do \_\_\_\_ls $filename done Use a loop to make a backup copy of each of the planet ''.dat'' files. For each file ''//x//.dat'', make a copy of that file called ''backup-//x//.dat''. For example, ''earth.dat'' should be copied to a file called ''backup-earth.dat''.
A single copy command such as the following will not work (try it if you like): $ **cp *.dat backup-*.dat ** cp: target 'backup-*.dat' is not a directory The correct solution follows the same pattern as printing the 17th line of every file. Of course, the operation should instead copy each file from "''$filename''" to "''backup-$filename''". ++++ Answer | $ **for name in *.dat **>** do **>** \_\_\_cp $name backup-$name **>** done** ++++ === Using variables as parts of filenames === The previous example shows how a variable is used to form part of a longer name. The ''filename'' variable is used to create the name ''backup-$filename''. When ''filename'' is set to ''earth.dat'', the longer name will be ''backup-earth.dat''. A problem arises when a letter or digit has to appear immediately after a variable name, because the shell cannot tell where the variable name ends. For this reason ''$filename'' can also be written ''${filename}''. Since the characters ''{'' and ''}'' cannot be part of a variable name, there is no possibility of ambiguity when this form is used inside a longer name next to a letter or a digit. Delete your ''backup-*'' files. Then, for each file ''//x//.dat'', try to create a backup file called ''//x//.dat2'' using this loop: $ **for name in *.dat **> **do **> **\_\_\_cp $name $name2 **> **done** cp: missing destination file operand after 'earth.dat' cp: missing destination file operand after 'jupiter.dat' // ...etc... // cp: missing destination file operand after 'uranus.dat' cp: missing destination file operand after 'venus.dat' What is the problem? Use an ''echo'' command to print what the shell will do when it executes the ''cp'' command, like this: $ **for name in *.dat **> **do **> **\_\_\_echo cp $name $name2 **> **done** cp earth.dat cp jupiter.dat // ...etc... // cp uranus.dat cp venus.dat What happened to ''earth.dat2'', etc.? Variable names start with a letter or underscore, followed by any number of letters, digits, and underscores.
The shell thinks that the "''2''" is part of the variable name; in other words, that the name of the variable in "''$name2''" is "''name2''". Since no variable called ''name2'' has been set, ''$name2'' is replaced by nothing at all, and ''cp'' is given only one file name. To fix this, use ''{'' and ''}'' around "''name''" to separate it from the "''2''". $ **for name in *.dat **> **do **> **\_\_\_cp $name ${name}2 **> **done** $ **ls** earth.dat mars.dat moon.dat pluto.dat uranus.dat earth.dat2 mars.dat2 moon.dat2 pluto.dat2 uranus.dat2 jupiter.dat mercury.dat neptune.dat saturn.dat venus.dat jupiter.dat2 mercury.dat2 neptune.dat2 saturn.dat2 venus.dat2 $ **rm *.dat2** === Using wildcards === Wildcards (''*'', ''?'', and ''[...]'') in a ''for'' loop's //list of things// are expanded as usual. What would be the results of running each of the following commands? for name in p*.dat do \_\_\_\_echo $name done for name in *p*.dat do \_\_\_\_echo $name done Predict the answers, then check them by actually running the commands. === Avoiding typing: interactive history === The up-arrow (or ''Control''+''p'') and down-arrow (or ''Control''+''n'') keys can be used to scroll through recent commands. The left-arrow (or ''Control''+''b'') and right-arrow (or ''Control''+''f'') keys let you move around inside a command. You can edit a previous command by deleting existing text or inserting new text. Pressing ''Return'' re-runs the (edited) command. If you try this on a ''for'' loop you will notice that the loop has been recorded on a single line. To do this the shell has inserted some semicolon "'';''" characters to separate the different parts of the loop. A semicolon has been inserted in approximately the places where a newline was in the original ''for'' loop. When viewed in the history our loop looks like this: $ **for name in *.dat **> **do **> **\_\_\_ls $name **> **done ** earth.dat jupiter.dat // ...etc...
// uranus.dat venus.dat $ **''Control''+''p''** $ for name in *.dat; do ls $name; done === Putting the entire loop on a single line === The general form of a single-line ''for'' loop is: for //thing// in //list of things// **;** do //operation on thing// **;** //...etc...// **;** done The semicolons take the place of newlines in the single-line version. Any of the semicolons can be replaced by a newline; the shell does not care whether you use semicolons or newlines. Write the backup ''for'' loop again, all on one line. for name in *.dat; do cp $name backup-$name; done Delete the backup files. Add a command to ''echo'' the name of each file before copying it, still putting the entire ''for'' loop on a single line. $ **rm backup-* ** $ **for name in *.dat; do echo $name; cp $name backup-$name; done** earth.dat jupiter.dat mars.dat mercury.dat moon.dat neptune.dat pluto.dat saturn.dat uranus.dat venus.dat $ **rm backup-* ** === Using redirection with loops === Let's print the 17th line of each file and redirect the output to another file. The following will not work: $ **for name in *.dat; do **> **\_\_\_head -n 17 $name | tail -n 1 > lines.txt **> **done ** $ **cat lines.txt** Mean Temperature (C) 464 The problem is that each time around the loop the ''>'' redirection //truncates// (empties) ''lines.txt'' before it writes the output from ''tail'' into it. There are two solutions to this problem. The first solution is to use another redirection operator, ''>>''. This operator //appends// lines to the output file instead of replacing its contents.
Before trying it, delete the ''lines.txt'' left over from the previous attempt; otherwise, because ''>>'' keeps the existing contents, the old line would remain at the top of the file. $ **rm lines.txt** $ **for name in *.dat; do **> **\_\_\_head -n 17 $name | tail -n 1 >> lines.txt **> **done ** $ **cat lines.txt** Mean Temperature (C) 15 Mean Temperature (C) -110 Mean Temperature (C) -65 Mean Temperature (C) 167 Mean Temperature (C) -20 Mean Temperature (C) -200 Mean Temperature (C) -225 Mean Temperature (C) -140 Mean Temperature (C) -195 Mean Temperature (C) 464 The second solution is to move the redirection outside the loop, so that the output of every command executed inside the loop goes through a single output redirection. $ **for name in *.dat; do **> **\_\_\_head -n 17 $name | tail -n 1 **> **done > lines.txt** $ **cat lines.txt** Mean Temperature (C) 15 Mean Temperature (C) -110 Mean Temperature (C) -65 Mean Temperature (C) 167 Mean Temperature (C) -20 Mean Temperature (C) -200 Mean Temperature (C) -225 Mean Temperature (C) -140 Mean Temperature (C) -195 Mean Temperature (C) 464 === Challenges === The ''echo'' command normally prints a newline character after its arguments. If you use the option ''-n'' then this newline is not printed. This lets you use several ''echo -n'' commands to print several things on the same line. In the following example a semicolon '';'' is used (instead of a newline) to separate two ''echo'' commands. The first ''echo'' command uses the option ''-n'' to prevent it from printing the newline. Try running these commands with and without the ''-n'' to see the difference. $ **echo -n hello; echo "," world** hello, world $ **echo hello; echo "," world** hello , world In a ''for'' loop, the operation that is performed inside the loop can be another ''for'' loop. (This is called //nesting// loops.) For example: $ **for digit in {1..3}; do for letter in {a,b}; do echo $digit $letter; done; done** 1 a 1 b 2 a 2 b 3 a 3 b Arithmetic expansion is performed on any text written inside double parentheses after a ''$'' symbol, like this: "''$((//text//))''".
The entire expression (from "''$''" to the closing "'')''") is replaced by the result of evaluating //text// as an arithmetic expression. Within //text// you can refer to variables without needing to use the ''$'' prefix. Some examples: $ **foo=32** $ **echo foo plus ten is $((foo + 10))** foo plus ten is 42 $ **N=1; for L in {a,b,c}; do echo $L$N; N=$((N+1)); done** a1 b2 c3 Write two nested ''for'' loops that print the following multiplication table: 1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20 24 28 32 36 40 5 10 15 20 25 30 35 40 45 50 6 12 18 24 30 36 42 48 54 60 7 14 21 28 35 42 49 56 63 70 8 16 24 32 40 48 56 64 72 80 9 18 27 36 45 54 63 72 81 90 10 20 30 40 50 60 70 80 90 100 Don't worry about properly lining up the columns. ++++ Hint | $ **for thing in {a,b,c}; do **> **\_\_for other in {1..3}; do **> **\_\_ echo -n $thing **> **\_\_ echo -n $other **> **\_\_ echo -n " " **> **\_\_done **> **\_\_echo **> **done** a1 a2 a3 b1 b2 b3 c1 c2 c3 ++++ The ''echo'' command understands an option ''-e'' that makes it replace certain backslash sequences with other characters. One replacement that this enables is to convert "''\\t''" into a //tab// character. A tab moves the cursor forward to the next column that is a multiple of 8. Modify your loops to line up the columns, like this: 1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20 24 28 32 36 40 5 10 15 20 25 30 35 40 45 50 6 12 18 24 30 36 42 48 54 60 7 14 21 28 35 42 49 56 63 70 8 16 24 32 40 48 56 64 72 80 9 18 27 36 45 54 63 72 81 90 10 20 30 40 50 60 70 80 90 100 ++++ Hint | $ **for i in {1..10}; do echo -ne $i\\t; done; echo** 1 2 3 4 5 6 7 8 9 10 ++++ === Summary === * A ''for'' loop repeats a command for every item in a list. * A ''for'' loop sets a variable to the next item in the list before running the loop //body//.
* Use ''$//name//'' to expand a variable (i.e., get its value), or ''${//name//}'' if there are letters or digits immediately after the variable. * Use the up-arrow key to scroll up through previous commands, then edit and/or repeat them. * ''for'' loops can be written on one line by replacing newlines with semicolons. * ''for'' loops can be nested by writing a loop as the body of another loop. /**** TODO: Filesystem layout: standard directories ****/ /* ---------------- IN CLASS ---------------- * How to save and reuse commands. * How to use variables to store a name or value. * How to use aliases. * How to use shell scripts. * How to use variables in a shell script to access the command-line arguments. * Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion. * Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping. * Use Ctrl+R to search through the previously entered commands. * Use history to display recent commands, and ![number] to repeat a command by number. LOOPS performing the same action on many different files for thing in list of things do operation on $thing done how the prompt changes when waiting for additional input quoting to allow spaces and other funny characters in filenames using ECHO to understand what a loop is going to do before running it for real loop visualised as flowchart ; each execution of loop body visualised as echo process using semicolons to separated the parts of a command instead of newlines organising files into folders what happens if you redirect output to a file being used as input? sort -n lengths.txt > lengths.txt using >> to append output to a file extensions don't mean anything -- .txt is just a convention checking quality of data, removing damaged files * Most files’ names are something.extension. 
The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file. * command >> [file] appends a command’s output to a file. * [first] | [second] is a pipeline: the output of the first command is used as the input to the second. * The best way to use the shell is to use pipes to combine simple single-purpose programs (filters). ---------------------------------------------------------------- NEXT ---------------------------------------------------------------- === Translating characters with the "tr" command === editing longer lines: Control + A E B F P N or arrow keys repeating earlier commands history | less history | grep ! to rerun a command Control + R to reverse search !! runs the previous command !$ is last word of previous command ESC-. inserts the last word of the previous command using ECHO to do a "dry run" protecting > and >> using "" in an ECHO argument nested loops * A for loop repeats commands once for every thing in a list. * Every for loop needs a variable to refer to the thing it is currently operating on. * Use $name to expand a variable (i.e., get its value). ${name} can also be used. * Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion. * Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping. * Use the up-arrow key to scroll up through previous commands to edit and repeat them. * Use Ctrl+R to search through the previously entered commands. * Use history to display recent commands, and ![number] to repeat a command by number. 
---------------------------------------------------------------- shell scripts get middle of file using head and tail save the commands in a file.sh run the 'bash file.sh' replace the built-in file name with "$1" (double quotes around arguments to protect spaces) $@ means all of the arguments, and "$@" means all of the arguments, each one inside implicit double quotes bash options for debugging: -x * Save commands in files (usually called shell scripts) for re-use. * bash [filename] runs the commands saved in a file. * $@ refers to all of a shell script’s command-line arguments. * $1, $2, etc., refer to the first command-line argument, the second command-line argument, etc. * Place variables in quotes if the values might have spaces in them. * Letting users decide what files to process is more flexible and more consistent with built-in Unix commands. ---------------------------------------------------------------- finding things: grep and find grep options: -i -E -w anchoring expressions: ^ and $ command substitution: $() wc -l $(find . -name "*.txt") grep PATTERN $(find .. -name "*.txt") inverting matches: grep -v * find finds files with specific properties that match patterns. * grep selects lines in files that match patterns. * --help is an option supported by many bash commands, and programs that can be run from within Bash, to display more information on how to use these commands or programs. * man [command] displays the manual page for a given command. * $([command]) inserts a command’s output in place. ---------------------------------------------------------------- === Working with multiple files === Let's practice manipulating large numbers of data files using the shell. From the course web site you can download a archive file called ''metars-2019.tgz''. ++++ What is an archive? | An archive is a file that contains other files. You have probably already used a ''.zip'' file, which is popular on Windows. 
Another popular format is ''.tar'' and the compressed version ''.tgz'' (for '**t**ar' compressed with '**gz**ip'). ++++ The ''metars-2019.tgz'' archive contains aviation weather data for Japan. ??? HOW TO DOWLOAD THE FILE TO COMMAND LINE DIRECTORY? Download the file and then extract the files inside it with the command ''tar xf metars-2019.tgz''. ++++ How the ''tar'' command works | The command ''tar'' is short for **t**ape **ar**chive. We don't use magnetic tape to store data any more, but the program is still a popular alternative to ''.zip'' files. The first argument to ''tar'' tells it what to do. In this case ''x'' means e**x**tract files from an archive, and ''f'' means the archive should be read from a **f**ile whose name appears in the next argument. If you add ''v'' for **v**erbose it will also print each filename as it is extracted. ++++ Readings are collected from automated weather stations installed at Japanese airports. Every hour the data from these weather stations is collected and stored in a file. There are 8753 files in the archive (365 days x 24 hours per day = 8760, with a few omissions because of downtime). Each file is named according to the date and time that the data were collected. For example the file ''2019-01-01T00:53:57-japan.txt'' contains the data recorded on 2019/01/01 at 00:53:57 JST. What is the structure of each file? The 'structure' of a file means how the data is arranged within it. Let's look in file ''2019-01-01T01:53:58-japan.txt'' to see what the structure looks like. You can use ''cat 2019-01-01T01:53:58-japan.txt'' to do this, but the file is long (85 lines). To see it one 'page' at a time use the command ''less 2019-01-01T00:53:57-japan.txt''. + is every file of the correct structure? + are there any unreliable data files due to network or system failures? + how many stations are there? + how can we reorganise the data to make it more useful? + how can we simplify the data for analysis? 
Challenge: finding files and directories by type ================================================ The 'find' command finds files and directories by name or by property. The default action of 'find' is to print the path of the files/directories it finds. The general form of the command is: 'find directory -property value' One option for -property is '-type' which understands a value of 'd' or 'f'. So, 'find . -type d' looks in the current directory (.) and finds all directories (-type d) and 'find . -type f' looks in the current directory (.) and finds all regular files (-type f). Assume your current working directory is your home directory. ** Q.11 What command pipeline will count the number of directories under '/usr/lib' that have the digit '2' somewhere in their name? ________________________________________________________________ */ /* === Resources === Software Carpentry, etc. Using the command line puts you in control at the level of the operating system and other fundamental processes that make it work. Many operations and options that are not accessible using a graphical interface (Windows Explorer, Mac Finder, etc.) become accessible to you on the command line. Developers, engineers, scientists, and researchers all use the command line to make themselves faster and more effective (and happier) than would be using only graphical interfaces. Developers love command lines because they can invoke their tools directly with no options hidden from them by a 'development environment'. Full access to debuggers, debugging facilities, and debugging information is possible. Many low-level tools (e.g., for simulation) don't even have a graphical interface, and so using the command line is //the// way to interact with them. Scientists and researchers use the command line to help manage and interact with data, even if that involves hundreds of thousands of files and directories. 
The facilities of the command line, such as bulk operations on files, make it easy to manage and work with that amount of data. Command line tools can very quickly and easily be customised and combined to make your own new tools that perform powerful operations. Converting data between different formats and representations is much easier on the command line than in a 'pointy-clicky' interface. Command line users are faster, more efficient, more effective, and as a result //happier// than GUI users. */ /* syllabus */ /* * Local Variables: * eval: (flyspell-mode) * eval: (ispell-change-dictionary "british") * eval: (flyspell-buffer) * End: */