~~NOCACHE~~
===== Working with multiple files and directories =====
{{page>css&nodate&noeditbtn&nofooter}}
The notes below include several exercises with answers that introduce new concepts.
Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the
exercises //without// looking at the sample answer.
If you cannot complete an exercise using a few short commands then
read the sample answer, practice it, and make sure you
understand it //before// continuing.
==== Review ====
Review of important concepts:
* The file system manages the storage of data on the disk.
* Files contain data.
* Directories contain files or other directories, forming a directory tree.
* ''cd //path//'' changes the current working directory.
  * ''ls //path//'' lists information about a file or directory. With no argument, ''ls'' lists the files in the current working directory.
* ''pwd'' prints the current working directory.
* ''/'' at the start of a path means the //root// directory at the 'top' of the filesystem.
* An //absolute path// specifies a location starting from the root directory (and therefore always begins with ''/'').
* A //relative path// specifies a location starting from the current working directory.
* Directory names in a path are separated by ''/'' characters.
* "''..''" is the name of the parent directory;
"''.''" is the name the current directory.
==== Copying directories ====
The command ''cp //files//... //directory//'' copies one or more //files// into //directory//.
If any of the ''files'' happen to be directories then the ''cp'' command will fail.
To copy an entire directory (recursively) use ''cp'' with the ''-r'' option.
The ''cp -r //files//... //directory//'' command copies one or more //files// into //directory//.
If any of the ''files'' are directories then each directory is copied along with
all of its contents.
Let's practice on a simple directory hierarchy.
Use the ''mkdir'' and ''echo'' commands to recreate the ''dir1'' directory
and its three files as shown in the diagram.
The content of the three files is not important.
{{ 07-dir1-bb.png?473 }}
$ **cd /tmp**
$ **mkdir dir1**
$ **echo 1 > dir1/file1.txt**
$ **echo 2 > dir1/file2.txt**
$ **echo 3 > dir1/file3.txt**
$ **ls -lR dir1**
dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file3.txt
Use ''cp -rv'' (**r**ecursive and **v**erbose)
to copy the entire directory ''dir1'' to a new directory tree called ''dir2''.
$ **cp -rv dir1 dir2**
'dir1' -> 'dir2'
'dir1/file3.txt' -> 'dir2/file3.txt'
'dir1/file2.txt' -> 'dir2/file2.txt'
'dir1/file1.txt' -> 'dir2/file1.txt'
Because ''dir2'' does not yet exist, it is first created in the current directory and then the contents of ''dir1'' are copied to ''dir2''.
The ''-v'' option shows you the directory being created and the files being copied.
What will happen if you run the same ''cp -rv dir1 dir2'' command again?
$ **cp -rv dir1 dir2**
'dir1' -> 'dir2/dir1'
'dir1/file3.txt' -> 'dir2/dir1/file3.txt'
'dir1/file2.txt' -> 'dir2/dir1/file2.txt'
'dir1/file1.txt' -> 'dir2/dir1/file1.txt'
$ **ls -lR dir2**
dir2:
total 64
drwxr-xr-x 2 piumarta dialout 170 Oct 26 05:57 dir1
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file3.txt
dir2/dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file3.txt
Because ''dir2'' already exists, ''dir1'' is copied into ''dir2'';
the new copy of ''dir1'' does not replace ''dir2''.
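If you wanted to refresh ''dir2'' in place instead of nesting a second copy inside it, one option (a sketch, assuming GNU ''cp'') is to append ''/.'' to the source path, which copies the //contents// of ''dir1'' rather than the directory itself:

```shell
# Recreate the example in a scratch directory so nothing here is destructive.
cd "$(mktemp -d)"
mkdir dir1
echo 1 > dir1/file1.txt
echo 2 > dir1/file2.txt
echo 3 > dir1/file3.txt

cp -r dir1 dir2     # dir2 does not exist yet: created as a copy of dir1
cp -r dir1/. dir2   # dir2 exists: copy dir1's contents, without nesting dir1
ls dir2             # file1.txt  file2.txt  file3.txt  (no dir2/dir1)
```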
==== Removing directories ====
The ''rmdir //dir//'' command removes the directory //dir//.
Try removing ''dir1''.
$ **rmdir dir1**
rmdir: failed to remove 'dir1': Directory not empty
A directory must be empty before it can be removed.
You could remove the files ''dir1/file1.txt'', ''dir1/file2.txt'', and ''dir1/file3.txt''
one at a time but that would be tedious.
Instead, remove all three at the same time using a //wildcard//.
The path ''dir1/*'' expands to all three of the files in ''dir1''.
If you use ''rm -v dir1/*'' (''-v'' for **v**erbose)
then each name will be printed as it is removed.
Once the three files are removed you will be able to remove their parent directory ''dir1''.
Use ''rm -v dir1/*'' to remove all the files in ''dir1''.
$ **ls dir1**
file1.txt file2.txt file3.txt
$ **rm -v dir1/* **
removed 'dir1/file1.txt'
removed 'dir1/file2.txt'
removed 'dir1/file3.txt'
$ **rmdir dir1**
$ **ls dir1**
ls: cannot access 'dir1': No such file or directory
We still have ''dir2'' which contains three files and a copy of the original
''dir1'' (with three more files inside that directory).
The ''*'' wildcard is less helpful here because ''dir2'' contains a subdirectory (which ''rm'' alone cannot remove) as well as files.
Instead you can use ''rm -r'' (''-r'' for **r**ecursive) which
will remove the contents of a directory before removing the directory itself.
Use ''rm -r dir2'' to remove ''dir2'' and all of its contents.
$ **ls -F dir2**
dir1/ file1.txt file2.txt file3.txt
$ **rm -r dir2**
$ **ls dir2**
ls: cannot access 'dir2': No such file or directory
When you delete a file from the command line it is gone //forever//.
There is no 'trash can' that collects deleted files.
There is no way to restore a deleted file later if you change your mind.
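If that worries you, ''rm'' has an ''-i'' option that asks for confirmation before each removal; answering anything other than ''y'' leaves the file alone. A small sketch in a throwaway directory:

```shell
cd "$(mktemp -d)"
touch important.txt

# -i makes rm prompt before each removal; here we answer 'n' automatically.
echo n | rm -i important.txt

ls important.txt    # the file is still there
```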
==== Wildcards ====
In the exercises above the argument ''dir1/*'' matched all the filenames in ''dir1''.
The shell //expanded// the pattern ''dir1/*'' into three separate arguments: ''dir1/file1.txt'', ''dir1/file2.txt'', and ''dir1/file3.txt''.
The ''*'' character actually matches any sequence of characters (zero or more) except ''/''.
You can use it to match 'anything' in a part of a filename.
You can also use it more than once to match 'anything' in several different parts of a filename.
List all files in ''/etc'' that begin with ''b'', that end with ''.conf'', or that have a ''.'' anywhere in their name.
$ **ls /etc/b* **
/etc/baseprofile /etc/bash_completion
$ **ls /etc/*.conf**
/etc/nsswitch.conf
$ **ls -d /etc/*.* **
/etc/init.d /etc/nsswitch.conf /etc/rebase.db.i386 /etc/vimrc.less
/etc/minirc.dfl /etc/persistprofile.sh /etc/sessionsaliases.sh /etc/xmodmap.esc
Another useful wildcard character is ''?'' which matches exactly one of any character (except ''/'').
List all files in ''/etc'' that have an ''o'' and an ''f'' in their name separated by exactly one other character (it does not matter which character).
$ **ls /etc/*o?f* **
/etc/nsswitch.conf /etc/ssh_config
One more useful wildcard pattern is ''[//chars//]'' which matches exactly one of any of the //chars// listed between the square brackets.
List all files in ''/etc'' that have two consecutive vowels ('a', 'e', 'i', 'o', or 'u') in their name.
$ **ls -d /etc/*[aeiou][aeiou]* **
/etc/bash_completion /etc/defaults /etc/screenrc /etc/version
/etc/bash_completion.d /etc/group /etc/sessionsaliases.sh
When //chars// includes a range of consecutive characters, you can specify the entire range using "''//first//-//last//''".
Use the "''[//first//-//last//]''" pattern to list all files in ''/etc'' whose name contains at least one digit.
$ **ls -d /etc/*[0-9]* **
/etc/X11 /etc/at-spi2 /etc/dbus-1
/etc/gtk-3.0 /etc/pkcs11 /etc/rebase.db.i386
The wildcard patterns explained above are expanded by the shell according to the files that actually exist in the filesystem.
What happens if you use a wildcard pattern that does not match any files?
Try to delete some non-existent 'log' files: ''dir1/*.log''.
$ **rm dir1/*.log**
rm: cannot remove 'dir1/*.log': No such file or directory
If the wildcard pattern does not match any files, it is simply left //unexpanded//.
When the command tries to access a file named by a wildcard expression, the file does not exist and an error message is generated.
==== Dry runs: using "echo" to preview commands ====
A 'dry run' is a rehearsal or practice that takes place before the real performance.
In computing, a dry run shows you what a command //would// do but without actually doing it.
Dry runs are especially useful for seeing which files a wildcard pattern would match, for example before actually removing them.
For the next exercise, set up your ''dir1'' directory as above, containing six files:
* three text files ''file1.txt'', ''file2.txt'', and ''file3.txt'', containing the words ''think'', ''for'', and ''yourself'';
  * three data files ''file1.dat'', ''file2.dat'', and ''file3.dat'', containing the number of characters in the corresponding ''.txt'' files.
$ **mkdir dir1**
$ **echo think > dir1/file1.txt**
$ **echo for > dir1/file2.txt**
$ **echo yourself > dir1/file3.txt**
$ **wc -c dir1/file1.txt > dir1/file1.dat**
$ **wc -c dir1/file2.txt > dir1/file2.dat**
$ **wc -c dir1/file3.txt > dir1/file3.dat**
$ **ls -l dir1**
total 3
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file1.dat
-rw-r--r-- 1 user UsersGrp 6 Oct 26 16:51 file1.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file2.dat
-rw-r--r-- 1 user UsersGrp 4 Oct 26 16:51 file2.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file3.dat
-rw-r--r-- 1 user UsersGrp 9 Oct 26 16:51 file3.txt
Use the ''echo'' command to perform a dry-run of removing:
* all the ''.txt'' files in ''dir1'',
* all the ''.dat'' files in ''dir1'',
* the ''.txt'' and ''.dat'' files for only ''file2'' (two files in total),
  * the ''.txt'' and ''.dat'' files for ''file1'' and ''file3'' (four files in total).
$ **echo rm dir1/*.txt **
rm dir1/file1.txt dir1/file2.txt dir1/file3.txt
$ **echo rm dir1/*.dat **
rm dir1/file1.dat dir1/file2.dat dir1/file3.dat
$ **echo rm dir1/file2.* **
rm dir1/file2.dat dir1/file2.txt
$ **echo rm dir1/file[13].* **
rm dir1/file1.dat dir1/file1.txt dir1/file3.dat dir1/file3.txt
++++ Why is it called a 'dry run'? |
Fire departments run practice sessions in which fire engines are dispatched, fire hoses are deployed, but water is not actually pumped onto a fire.
Since the exercise performs all the actions of fire-fighting //except// pumping water onto a fire, it is literally a 'dry' run.
++++
==== Creating files and updating timestamps ====
The ''touch'' command updates the last modification time of an existing file to be the current date and time.
If the file does not exist, an empty file is created.
Create two empty files called ''file1'' and ''file2''.
$ **cd dir1**
$ **ls -lt file[12]**
ls: cannot access 'file[12]': No such file or directory
$ **touch file1 file2**
$ **ls -lt file[12]**
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
$ **touch file2**
$ **ls -lt file[12]**
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
$ **touch file1**
$ **ls -lt file[12]**
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
Note how ''touch''ing a file moves it to the top of the 'most recent' list (''ls -t'').
==== Generating path names using brace expressions ====
Wildcards are used to match existing file names.
They cannot be used to generate file names for non-existent files or directories, for example, to create a set of needed files or directories.
Try using a wildcard to create ten empty files called ''test0'', ''test1'', ''test2'', ..., ''test9''.
$ **touch test[0123456789]**
$ **ls test* **
test[0123456789]
Creating a single file called ''test[0123456789]'' is not what you intended.
This happened because the shell could not find any existing file matching
the pattern ''test[0123456789]'' and so left it unexpanded in the command line.
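Before moving on, remove the accidental file. Quoting the name stops the shell from treating the brackets as a wildcard (a sketch in a scratch directory; unquoted, the pattern could instead match real files named ''test0'' to ''test9'' if they existed):

```shell
cd "$(mktemp -d)"
touch 'test[0123456789]'   # recreate the accidentally created file

rm 'test[0123456789]'      # the quotes pass the name through literally
ls                         # nothing left
```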
A //brace expression// will generate multiple //words// based on a list or sequence of values.
The list of values to generate is written between curly braces ''{'' and ''}''
with items in the list separated by commas.
For example, the expression ''{a,b,c}'' generates three separate words ''a'', ''b'', and ''c''.
The brace expression can appear in a larger pattern,
for example, the expression ''p{a,b,c}q'' generates three separate words
''paq'', ''pbq'', and ''pcq''.
Use a brace expression to generate the command needed to create the five files
''test0.txt'' to ''test4.txt''.
$ **touch test{0,1,2,3,4}.txt**
$ **ls test* **
test0.txt test1.txt test2.txt test3.txt test4.txt
When a //sequence// of numbers or letters is needed then the list can contain
just the first and last values separated by ''..''.
This is called a //sequence expression//.
For example, the sequence expression ''p{a..z}q'' generates a list of 26 words,
starting with ''paq'' and ''pbq'', and ending with ''pyq'' and ''pzq''.
Use a brace expression to generate the command needed to create the five files
''test5.txt'' to ''test9.txt''.
$ **touch test{5..9}.txt**
$ **ls test* **
test0.txt test1.txt test2.txt test3.txt test4.txt
test5.txt test6.txt test7.txt test8.txt test9.txt
In a sequence expression that generates numbers, the first value in the sequence
sets the minimum width of the generated numbers.
This is useful if leading ''0''s are needed.
For example, the following sequence expressions generate lists of 100 words:
* ''test{0..99}'' generates ''test0'', ''test1'', ... , ''test98'', ''test99'', and
* ''tt{000..099}'' generates ''tt000'', ''tt001'', ... , ''tt098'', ''tt099'', and
* ''t{00000..99}'' generates ''t00000'', ''t00001'', ... , ''t00098'', ''t00099''.
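Because brace and sequence expressions do not depend on existing files, ''echo'' is a safe way to see exactly what they generate (in bash):

```shell
echo p{a,b,c}q        # paq pbq pcq
echo test{0..4}.txt   # test0.txt test1.txt test2.txt test3.txt test4.txt
echo tt{000..003}     # tt000 tt001 tt002 tt003
```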
==== CSV files and the "cut" command ====
Text files are often used as simple 'databases' for storing captured sensor data, the results of data processing, etc.
The shell provides several commands for manipulating data stored in this kind of text file.
A comma-separated value (CSV) file is one example of this kind of text file database.
Each line is a record and each field in that record is separated from the next with a specified delimiter character.
In a CSV file the delimiter is a comma, "'',''".
The ''cut'' command selects and prints fields from exactly this kind of text file.
By default it uses a 'tab' character to separate fields (just as a copy-paste operation between Excel and a text editor does) but this can be changed using a command line option.
''cut'' has the following command line options:
* ''-d //character//'' specifies the delimiter //character//. To manipulate CSV files, use: "''cut -d ,''"
* ''-f //fields//'' tells ''cut'' which of the fields you want to print. Fields are numbered, starting at 1, and //fields// can contain multiple fields separated by commas.
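Before working on a whole file, you can try ''cut'' on a single line piped in from ''echo'' (a quick sketch):

```shell
echo 'a,b,c' | cut -d , -f 2     # prints: b
echo 'a,b,c' | cut -d , -f 1,3   # prints: a,c
```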
Create a CSV file called ''directory.txt'' that contains the following data.
(The easiest way is to copy the text from this web page and paste it into a text editor,
or into "''cat > directory.txt''" followed by Control+D to simulate end-of-file.)
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
Use the ''cut'' command to extract just the "office" column from the data.
$ **cut -d , -f 3 directory.txt**
office
042
301
201
404
202
304
302
The ''tail'' command has an option to print a file starting at a specific line number.
The syntax is: "''tail -n +//number//''".
For example, "''tail -n +5 //file//''" will print the contents of //file// starting from the 5th line in the file.
Pipe (''|'') the output from the previous command into ''tail''.
Use the ''tail -n +//number//'' option to print the input starting at line number 2.
$ **cut -d , -f 3 directory.txt | tail -n +2**
042
301
201
404
202
304
302
The ''grep'' command understands patterns similar to the shell's wildcards, called //regular expressions//.
(The shell uses wildcard patterns to match file names; ''grep'' uses regular expressions to select lines of text.)
Each office number in our sample data is three digits long.
The first digit says which floor the office is on.
One way to extract just the office numbers on the second floor is to use ''grep'' to search for numbers matching the pattern "''2[0-9][0-9]''".
You can then count how many offices are on the second floor using "''wc -l''".
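Putting those two steps together, the second-floor count described above becomes a single pipeline. The sketch below first recreates ''directory.txt'' in a temporary directory so it can be run anywhere:

```shell
cd "$(mktemp -d)"
cat > directory.txt <<'EOF'
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
EOF

# office column -> drop the header -> keep 2xx numbers -> count them
cut -d , -f 3 directory.txt | tail -n +2 | grep '2[0-9][0-9]' | wc -l   # 2
```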
Write a pipeline of commands that prints how many offices are located on the third floor.
Try very hard to do this without looking at the sample answer.
If you cannot find the solution, click on the link below to view the answer.
++++ Sample answer |
$ **cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]' | wc -l**
3
If this does not make sense, look at the output from each stage of the pipeline.
$ **cut -d , -f 3 directory.txt**
office
042
301
201
404
202
304
302
$ **cut -d , -f 3 directory.txt | tail -n +2**
042
301
201
404
202
304
302
$ **cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]'**
301
304
302
$ **cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]' | wc -l**
3
++++
==== Summary ====
* ''echo > //file//'' can be used to create a //file// containing a line of data.
* ''touch //file//'' can be used to create an empty //file// or to update its modification time to 'now'.
* ''mkdir //directory//'' creates a new //directory//.
* ''cp //oldfile// //newfile//'' copies (duplicates) //oldfile// to //newfile//.
* ''cp //files...// //directory//'' copies one or more //files// (or directories) into an existing //directory//.
* ''rm //files...//'' removes (deletes) //files//.
* ''rmdir //directory//'' removes (deletes) a //directory// which **must** be empty.
* ''rm -r //directory//'' removes (deletes) a //directory// and all its contents, recursively.
* "''*''" in a file name matches zero or more characters, so "''*.txt''" matches all files ending in "''.txt''".
* "''?'' in a file name matches any single character, so "''?.txt''" matches "''a.txt''"" but //not// "''any.txt''".
* "''[//characters//']'' in a file name matches any one of the //characters//, so "''[aeiou].txt''" matches "''a.txt''"" but //not// "''b.txt''".
* "''[//first//-//last//']'' in a file name matches any character in the range //first// to //last//, so "''*[a-m].txt''" matches "''boa.txt''"" but //not// "''constrictor.txt''".
* Wildcards (''*'', ''?'', ''[]'') are expanded by the shell to match files that //already exist//. They cannot generate new (non-existent) file names.
* ''{a,b,c}'' expands to three words: ''a'', ''b'', and ''c''.
* ''p{a,b,c}q{x,y,z}r'' expands to nine words: ''paqxr paqyr paqzr pbqxr pbqyr pbqzr pcqxr pcqyr pcqzr''
* ''{000..5}.txt'' expands to six words: ''000.txt 001.txt 002.txt 003.txt 004.txt 005.txt''
* ''tail -n +//number//'' displays input starting at line //number// (and continuing until the last line).
* There is no 'trash': when a file or directory is deleted it is gone immediately and forever.
* ''cut -d //char// -f //fields//'' prints the given //fields// from its input lines using //char// as the field delimiter.
The //fields// are numbered from 1 and multiple field numbers are separated by commas.
/* ---------------- IN CLASS ----------------
mv //oldfile// //newfile// moves (renames) a file or directory.
mv //files...// //directory// moves one or more //files// (or directories) into an existing //directory//.
doing wc -l on data files
saving output in lengths.txt
see lengths.txt page by page
analyse lengths using sort -n
use head and tail to find longest and shortest files
organising files into folders
what happens if you redirect output to a file being used as input?
sort -n lengths.txt > lengths.txt
using >> to append output to a file
pipelines and understanding what text flows through each step of the pipeline
extensions don't mean anything -- .txt is just a convention
combining cut sort uniq wc
checking quality of data, removing damaged files
* Most files’ names are something.extension. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.
* command >> [file] appends a command’s output to a file.
* [first] | [second] is a pipeline: the output of the first command is used as the input to the second.
* The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).
*/
/*
----------------------------------------------------------------
NEXT
----------------------------------------------------------------
==== Translating characters with the "tr" command ====
LOOPS
performing the same action on many different files
for thing in list of things
do
operation on $thing
done
how the prompt changes when waiting for additional input
variables, word vs $word vs ${word}
using variables in loops -- operation (ls)
using > inside a loop vs >> vs > after the loop
quoting to allow spaces and other funny characters in filenames
cp file-*.txt backup-file-*.txt
!=
for i in files-*; do cp $i backup-$i.txt; done
using ECHO to understand what a loop is going to do before running it for real
loop visualised as flowchart ; each execution of loop body visualised as echo process
using semicolons to separated the parts of a command instead of newlines
editing longer lines: Control + A E B F P N or arrow keys
repeating earlier commands
history | less
history | grep
! to rerun a command
Control + R to reverse search
!! runs the previous command
!$ is last word of previous command
ESC-. inserts the last word of the previous command
using ECHO to do a "dry run"
protecting > and >> using "" in an ECHO argument
nested loops
* A for loop repeats commands once for every thing in a list.
* Every for loop needs a variable to refer to the thing it is currently operating on.
* Use $name to expand a variable (i.e., get its value). ${name} can also be used.
* Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion.
* Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
* Use the up-arrow key to scroll up through previous commands to edit and repeat them.
* Use Ctrl+R to search through the previously entered commands.
* Use history to display recent commands, and ![number] to repeat a command by number.
----------------------------------------------------------------
shell scripts
get middle of file using head and tail
save the commands in a file.sh
run the 'bash file.sh'
replace the built-in file name with "$1" (double quotes around arguments to protect spaces)
$@ means all of the arguments, and "$@" means all of the arguments, each one inside implicit double quotes
bash options for debugging: -x
* Save commands in files (usually called shell scripts) for re-use.
* bash [filename] runs the commands saved in a file.
* $@ refers to all of a shell script’s command-line arguments.
* $1, $2, etc., refer to the first command-line argument, the second command-line argument, etc.
* Place variables in quotes if the values might have spaces in them.
* Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.
----------------------------------------------------------------
finding things: grep and find
grep options: -i -E -w
anchoring expressions: ^ and $
command substitution: $()
wc -l $(find . -name "*.txt")
grep PATTERN $(find .. -name "*.txt")
inverting matches: grep -v
* find finds files with specific properties that match patterns.
* grep selects lines in files that match patterns.
* --help is an option supported by many bash commands, and programs that can be run from within Bash, to display more information on how to use these commands or programs.
* man [command] displays the manual page for a given command.
* $([command]) inserts a command’s output in place.
----------------------------------------------------------------
==== Working with multiple files ====
Let's practice manipulating large numbers of data files using the shell.
From the course web site you can download a archive file called ''metars-2019.tgz''.
++++ What is an archive? |
An archive is a file that contains other files.
You have probably already used a ''.zip'' file, which is popular on Windows.
Another popular format is ''.tar'' and the compressed version ''.tgz'' (for '**t**ar' compressed with '**gz**ip').
++++
The ''metars-2019.tgz'' archive contains aviation weather data for Japan.
??? HOW TO DOWLOAD THE FILE TO COMMAND LINE DIRECTORY?
Download the file and then extract the files inside it with the command ''tar xf metars-2019.tgz''.
++++ How the ''tar'' command works |
The command ''tar'' is short for **t**ape **ar**chive.
We don't use magnetic tape to store data any more, but the program is still a popular alternative to ''.zip'' files.
The first argument to ''tar'' tells it what to do.
In this case ''x'' means e**x**tract files from an archive, and
''f'' means the archive should be read from a **f**ile whose name appears in the next argument.
If you add ''v'' for **v**erbose it will also print each filename as it is extracted.
++++
Readings are collected from automated weather stations installed at Japanese airports.
Every hour the data from these weather stations is collected and stored in a file.
There are 8753 files in the archive (365 days x 24 hours per day = 8760, with a few omissions because of downtime).
Each file is named according to the date and time that the data were collected.
For example the file ''2019-01-01T00:53:57-japan.txt'' contains the data recorded on 2019/01/01 at 00:53:57 JST.
What is the structure of each file?
The 'structure' of a file means how the data is arranged within it.
Let's look in file ''2019-01-01T01:53:58-japan.txt'' to see what the structure looks like.
You can use ''cat 2019-01-01T01:53:58-japan.txt'' to do this, but the file is long (85 lines).
To see it one 'page' at a time use the command ''less 2019-01-01T00:53:57-japan.txt''.
+ is every file of the correct structure?
+ are there any unreliable data files due to network or system failures?
+ how many stations are there?
+ how can we reorganise the data to make it more useful?
+ how can we simplify the data for analysis?
Challenge: finding files and directories by type
================================================
The 'find' command finds files and directories by name or by property.
The default action of 'find' is to print the path of the files/directories it finds.
The general form of the command is: 'find directory -property value'
One option for -property is '-type' which understands a value of 'd' or 'f'.
So, 'find . -type d' looks in the current directory (.) and finds all directories (-type d)
and 'find . -type f' looks in the current directory (.) and finds all regular files (-type f).
Assume your current working directory is your home directory.
** Q.11 What command pipeline will count the number of directories under
'/usr/lib' that have the digit '2' somewhere in their name?
________________________________________________________________
*/