~~NOCACHE~~
===== Working with multiple files and directories =====
{{page>css&nodate&noeditbtn&nofooter}}
The notes below include several exercises with answers that introduce new concepts.
Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the
exercises //without// looking at the sample answer.
If you cannot complete an exercise using a few short commands then
read the sample answer, practice it, and make sure you
understand it //before// continuing.
==== Review ====
Review of important concepts:
* The file system manages the storage of data on the disk.
* Files contain data.
* Directories contain files or other directories, forming a directory tree.
* ''cd //path//'' changes the current working directory.
  * ''ls //path//'' lists information about a file or directory. With no argument, ''ls'' lists the files in the current working directory.
* ''pwd'' prints the current working directory.
* ''/'' at the start of a path means the //root// directory at the 'top' of the filesystem.
* An //absolute path// specifies a location starting from the root directory (and therefore always begins with ''/'').
* A //relative path// specifies a location starting from the current working directory.
* Directory names in a path are separated by ''/'' characters.
* "''..''" is the name of the parent directory;
"''.''" is the name the current directory.
==== Copying directories ====
The command ''cp //files//... //directory//'' copies one or more //files// into //directory//.
If any of the ''files'' happen to be directories then the ''cp'' command will fail.
To copy an entire directory (recursively) use ''cp'' with the ''-r'' option.
The ''cp -r //files//... //directory//'' command copies one or more //files// into //directory//.
If any of the ''files'' are directories then each directory is copied along with
all of its contents.
Let's practice on a simple directory hierarchy.
Use the ''mkdir'' and ''echo'' commands to recreate the ''dir1'' directory
and its three files as shown in the diagram.
The content of the three files is not important.
{{ 07-dir1-bb.png?473 }}
$ **cd /tmp**
$ **mkdir dir1**
$ **echo 1 > dir1/file1.txt**
$ **echo 2 > dir1/file2.txt**
$ **echo 3 > dir1/file3.txt**
$ **ls -lR dir1**
dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file3.txt
Use ''cp -rv'' (**r**ecursive and **v**erbose)
to copy the entire directory ''dir1'' to a new directory tree called ''dir2''.
$ **cp -rv dir1 dir2**
'dir1' -> 'dir2'
'dir1/file3.txt' -> 'dir2/file3.txt'
'dir1/file2.txt' -> 'dir2/file2.txt'
'dir1/file1.txt' -> 'dir2/file1.txt'
Because ''dir2'' does not yet exist, it is first created in the current directory and then the contents of ''dir1'' are copied to ''dir2''.
The ''-v'' option shows you the directory being created and the files being copied.
What will happen if you run the same ''cp -rv dir1 dir2'' command again?
$ **cp -rv dir1 dir2**
'dir1' -> 'dir2/dir1'
'dir1/file3.txt' -> 'dir2/dir1/file3.txt'
'dir1/file2.txt' -> 'dir2/dir1/file2.txt'
'dir1/file1.txt' -> 'dir2/dir1/file1.txt'
$ **ls -lR dir2**
dir2:
total 64
drwxr-xr-x 2 piumarta dialout 170 Oct 26 05:57 dir1
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file3.txt
dir2/dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file3.txt
Because ''dir2'' already exists, ''dir1'' is copied into ''dir2'';
the new copy of ''dir1'' does not replace ''dir2''.
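If you wanted to refresh ''dir2'' in place instead of nesting a second copy inside it, one option (a sketch, assuming GNU ''cp'') is to append ''/.'' to the source path, which copies the //contents// of ''dir1'' rather than the directory itself:

```shell
# Recreate the example in a scratch directory so nothing here is destructive.
cd "$(mktemp -d)"
mkdir dir1
echo 1 > dir1/file1.txt
echo 2 > dir1/file2.txt
echo 3 > dir1/file3.txt

cp -r dir1 dir2     # dir2 does not exist yet: created as a copy of dir1
cp -r dir1/. dir2   # dir2 exists: copy dir1's contents, without nesting dir1
ls dir2             # file1.txt  file2.txt  file3.txt  (no dir2/dir1)
```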
==== Removing directories ====
The ''rmdir //dir//'' command removes the directory //dir//.
Try removing ''dir1''.
$ **rmdir dir1**
rmdir: failed to remove 'dir1': Directory not empty
A directory must be empty before it can be removed.
You could remove the files ''dir1/file1.txt'', ''dir1/file2.txt'', and ''dir1/file3.txt''
one at a time but that would be tedious.
Instead, remove all three at the same time using a //wildcard//.
The path ''dir1/*'' expands to all three of the files in ''dir1''.
If you use ''rm -v dir1/*'' (''-v'' for **v**erbose)
then each name will be printed as it is removed.
Once the three files are removed you will be able to remove their parent directory ''dir1''.
Use ''rm -v dir1/*'' to remove all the files in ''dir1''.
$ **ls dir1**
file1.txt file2.txt file3.txt
$ **rm -v dir1/* **
removed 'dir1/file1.txt'
removed 'dir1/file2.txt'
removed 'dir1/file3.txt'
$ **rmdir dir1**
$ **ls dir1**
ls: cannot access 'dir1': No such file or directory
We still have ''dir2'' which contains three files and a copy of the original
''dir1'' (with three more files inside that directory).
The ''*'' wildcard is less helpful here because ''dir2'' contains a subdirectory (which ''rm'' alone cannot remove) as well as files.
Instead you can use ''rm -r'' (''-r'' for **r**ecursive) which
will remove the contents of a directory before removing the directory itself.
Use ''rm -r dir2'' to remove ''dir2'' and all of its contents.
$ **ls -F dir2**
dir1/ file1.txt file2.txt file3.txt
$ **rm -r dir2**
$ **ls dir2**
ls: cannot access 'dir2': No such file or directory
When you delete a file from the command line it is gone //forever//.
There is no 'trash can' that collects deleted files.
There is no way to restore a deleted file later if you change your mind.
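If that worries you, ''rm'' has an ''-i'' option that asks for confirmation before each removal; answering anything other than ''y'' leaves the file alone. A small sketch in a throwaway directory:

```shell
cd "$(mktemp -d)"
touch important.txt

# -i makes rm prompt before each removal; here we answer 'n' automatically.
echo n | rm -i important.txt

ls important.txt    # the file is still there
```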
==== Wildcards ====
In the exercises above the argument ''dir1/*'' matched all the filenames in ''dir1''.
The shell //expanded// the pattern ''dir1/*'' into three separate arguments: ''dir1/file1.txt'', ''dir1/file2.txt'', and ''dir1/file3.txt''.
The ''*'' character actually matches any sequence of characters (zero or more) except ''/''.
You can use it to match 'anything' in a part of a filename.
You can also use it more than once to match 'anything' in several different parts of a filename.
List all files in ''/etc'' that begin with ''b'', that end with ''.conf'', or that have a ''.'' anywhere in their name.
$ **ls /etc/b* **
/etc/baseprofile /etc/bash_completion
$ **ls /etc/*.conf**
/etc/nsswitch.conf
$ **ls -d /etc/*.* **
/etc/init.d /etc/nsswitch.conf /etc/rebase.db.i386 /etc/vimrc.less
/etc/minirc.dfl /etc/persistprofile.sh /etc/sessionsaliases.sh /etc/xmodmap.esc
Another useful wildcard character is ''?'' which matches exactly one of any character (except ''/'').
List all files in ''/etc'' that have an ''o'' and an ''f'' in their name separated by exactly one other character (it does not matter which character).
$ **ls /etc/*o?f* **
/etc/nsswitch.conf /etc/ssh_config
One more useful wildcard pattern is ''[//chars//]'' which matches exactly one of any of the //chars// listed between the square brackets.
List all files in ''/etc'' that have two consecutive vowels ('a', 'e', 'i', 'o', or 'u') in their name.
$ **ls -d /etc/*[aeiou][aeiou]* **
/etc/bash_completion /etc/defaults /etc/screenrc /etc/version
/etc/bash_completion.d /etc/group /etc/sessionsaliases.sh
When //chars// includes a range of consecutive characters, you can specify the entire range using "''//first//-//last//''".
Use the "''[//first//-//last//]''" pattern to list all files in ''/etc'' whose name contains at least one digit.
$ **ls -d /etc/*[0-9]* **
/etc/X11 /etc/at-spi2 /etc/dbus-1
/etc/gtk-3.0 /etc/pkcs11 /etc/rebase.db.i386
The wildcard patterns explained above are expanded by the shell according to the files that actually exist in the filesystem.
What happens if you use a wildcard pattern that does not match any files?
Try to delete some non-existent 'log' files: ''dir1/*.log''.
$ **rm dir1/*.log**
rm: cannot remove 'dir1/*.log': No such file or directory
If the wildcard pattern does not match any files, it is simply left //unexpanded//.
When the command tries to access a file named by a wildcard expression, the file does not exist and an error message is generated.
==== Dry runs: using "echo" to preview commands ====
A 'dry run' is a rehearsal or practice that takes place before the real performance.
In computing, a dry run shows you what a command //would// do but without actually doing it.
Dry runs are especially useful for seeing which files a wildcard pattern would match, for example before actually removing them.
For the next exercise, set up your ''dir1'' directory as above, containing six files:
* three text files ''file1.txt'', ''file2.txt'', and ''file3.txt'', containing the words ''think'', ''for'', and ''yourself'';
  * three data files ''file1.dat'', ''file2.dat'', and ''file3.dat'', containing the number of characters in the corresponding ''.txt'' files.
$ **mkdir dir1**
$ **echo think > dir1/file1.txt**
$ **echo for > dir1/file2.txt**
$ **echo yourself > dir1/file3.txt**
$ **wc -c dir1/file1.txt > dir1/file1.dat**
$ **wc -c dir1/file2.txt > dir1/file2.dat**
$ **wc -c dir1/file3.txt > dir1/file3.dat**
$ **ls -l dir1**
total 3
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file1.dat
-rw-r--r-- 1 user UsersGrp 6 Oct 26 16:51 file1.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file2.dat
-rw-r--r-- 1 user UsersGrp 4 Oct 26 16:51 file2.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file3.dat
-rw-r--r-- 1 user UsersGrp 9 Oct 26 16:51 file3.txt
Use the ''echo'' command to perform a dry-run of removing:
* all the ''.txt'' files in ''dir1'',
* all the ''.dat'' files in ''dir1'',
* the ''.txt'' and ''.dat'' files for only ''file2'' (two files in total),
  * the ''.txt'' and ''.dat'' files for ''file1'' and ''file3'' (four files in total).
$ **echo rm dir1/*.txt **
rm dir1/file1.txt dir1/file2.txt dir1/file3.txt
$ **echo rm dir1/*.dat **
rm dir1/file1.dat dir1/file2.dat dir1/file3.dat
$ **echo rm dir1/file2.* **
rm dir1/file2.dat dir1/file2.txt
$ **echo rm dir1/file[13].* **
rm dir1/file1.dat dir1/file1.txt dir1/file3.dat dir1/file3.txt
++++ Why is it called a 'dry run'? |
Fire departments run practice sessions in which fire engines are dispatched, fire hoses are deployed, but water is not actually pumped onto a fire.
Since the exercise performs all the actions of fire-fighting //except// pumping water onto a fire, it is literally a 'dry' run.
++++
==== Creating files and updating timestamps ====
The ''touch'' command updates the last modification time of an existing file to be the current date and time.
If the file does not exist, an empty file is created.
Create two empty files called ''file1'' and ''file2''.
$ **cd dir1**
$ **ls -lt file[12]**
ls: cannot access 'file[12]': No such file or directory
$ **touch file1 file2**
$ **ls -lt file[12]**
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
$ **touch file2**
$ **ls -lt file[12]**
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
$ **touch file1**
$ **ls -lt file[12]**
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
Note how ''touch''ing a file moves it to the top of the 'most recent' list (''ls -t'').
==== Generating path names using brace expressions ====
Wildcards are used to match existing file names.
They cannot be used to generate file names for non-existent files or directories, for example, to create a set of needed files or directories.
Try using a wildcard to create ten empty files called ''test0'', ''test1'', ''test2'', ..., ''test9''.
$ **touch test[0123456789]**
$ **ls test* **
test[0123456789]
Creating a single file called ''test[0123456789]'' is not what you intended.
This happened because the shell could not find any existing file matching
the pattern ''test[0123456789]'' and so left it unexpanded in the command line.
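Before moving on, remove the accidental file. Quoting the name stops the shell from treating the brackets as a wildcard (a sketch in a scratch directory; unquoted, the pattern could instead match real files named ''test0'' to ''test9'' if they existed):

```shell
cd "$(mktemp -d)"
touch 'test[0123456789]'   # recreate the accidentally created file

rm 'test[0123456789]'      # the quotes pass the name through literally
ls                         # nothing left
```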
A //brace expression// will generate multiple //words// based on a list or sequence of values.
The list of values to generate is written between curly braces ''{'' and ''}''
with items in the list separated by commas.
For example, the expression ''{a,b,c}'' generates three separate words ''a'', ''b'', and ''c''.
The brace expression can appear in a larger pattern,
for example, the expression ''p{a,b,c}q'' generates three separate words
''paq'', ''pbq'', and ''pcq''.
Use a brace expression to generate the command needed to create the five files
''test0.txt'' to ''test4.txt''.
$ **touch test{0,1,2,3,4}.txt**
$ **ls test* **
test0.txt test1.txt test2.txt test3.txt test4.txt
When a //sequence// of numbers or letters is needed then the list can contain
just the first and last values separated by ''..''.
This is called a //sequence expression//.
For example, the sequence expression ''p{a..z}q'' generates a list of 26 words,
starting with ''paq'' and ''pbq'', and ending with ''pyq'' and ''pzq''.
Use a brace expression to generate the command needed to create the five files
''test5.txt'' to ''test9.txt''.
$ **touch test{5..9}.txt**
$ **ls test* **
test0.txt test1.txt test2.txt test3.txt test4.txt
test5.txt test6.txt test7.txt test8.txt test9.txt
In a sequence expression that generates numbers, the first value in the sequence
sets the minimum width of the generated numbers.
This is useful if leading ''0''s are needed.
For example, the following sequence expressions generate lists of 100 words:
* ''test{0..99}'' generates ''test0'', ''test1'', ... , ''test98'', ''test99'', and
* ''tt{000..099}'' generates ''tt000'', ''tt001'', ... , ''tt098'', ''tt099'', and
* ''t{00000..99}'' generates ''t00000'', ''t00001'', ... , ''t00098'', ''t00099''.
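Because brace and sequence expressions do not depend on existing files, ''echo'' is a safe way to see exactly what they generate (in bash):

```shell
echo p{a,b,c}q        # paq pbq pcq
echo test{0..4}.txt   # test0.txt test1.txt test2.txt test3.txt test4.txt
echo tt{000..003}     # tt000 tt001 tt002 tt003
```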
==== CSV files and the "cut" command ====
Text files are often used as simple 'databases' for storing captured sensor data, the results of data processing, etc.
The shell provides several commands for manipulating data stored in this kind of text file.
A comma-separated value (CSV) file is one example of this kind of text file database.
Each line is a record and each field in that record is separated from the next with a specified delimiter character.
In a CSV file the delimiter is a comma, "'',''".
The ''cut'' command selects and prints fields from exactly this kind of text file.
By default it uses a 'tab' character to separate fields (just as a copy-paste operation between Excel and a text editor does) but this can be changed using a command line option.
''cut'' has the following command line options:
* ''-d //character//'' specifies the delimiter //character//. To manipulate CSV files, use: "''cut -d ,''"
* ''-f //fields//'' tells ''cut'' which of the fields you want to print. Fields are numbered, starting at 1, and //fields// can contain multiple fields separated by commas.
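Before working on a whole file, you can try ''cut'' on a single line piped in from ''echo'' (a quick sketch):

```shell
echo 'a,b,c' | cut -d , -f 2     # prints: b
echo 'a,b,c' | cut -d , -f 1,3   # prints: a,c
```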
Create a CSV file called ''directory.txt'' that contains the following data.
(The easiest way is to copy the text from this web page and paste it into a text editor,
or into "''cat > directory.txt''" followed by Control+D to simulate end-of-file.)
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
Use the ''cut'' command to extract just the "office" column from the data.
$ **cut -d , -f 3 directory.txt**
office
042
301
201
404
202
304
302
The ''tail'' command has an option to print a file starting at a specific line number.
The syntax is: "''tail -n +//number//''".
For example, "''tail -n +5 //file//''" will print the contents of //file// starting from the 5th line in the file.
Pipe (''|'') the output from the previous command into ''tail''.
Use the ''tail -n +//number//'' option to print the input starting at line number 2.
$ **cut -d , -f 3 directory.txt | tail -n +2**
042
301
201
404
202
304
302
The ''grep'' command understands patterns similar to the shell's wildcards, called //regular expressions//.
(The shell uses wildcard patterns to match file names; ''grep'' uses regular expressions to select lines of text.)
Each office number in our sample data is three digits long.
The first digit says which floor the office is on.
One way to extract just the office numbers on the second floor is to use ''grep'' to search for numbers matching the pattern "''2[0-9][0-9]''".
You can then count how many offices are on the second floor using "''wc -l''".
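Putting those two steps together, the second-floor count described above becomes a single pipeline. The sketch below first recreates ''directory.txt'' in a temporary directory so it can be run anywhere:

```shell
cd "$(mktemp -d)"
cat > directory.txt <<'EOF'
name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053
EOF

# office column -> drop the header -> keep 2xx numbers -> count them
cut -d , -f 3 directory.txt | tail -n +2 | grep '2[0-9][0-9]' | wc -l   # 2
```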
Write a pipeline of commands that prints how many offices are located on the third floor.
Try very hard to do this without looking at the sample answer.
If you cannot find the solution, click on the link below to view the answer.
++++ Sample answer |
$ **cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]' | wc -l**
3
If this does not make sense, look at the output from each stage of the pipeline.
$ **cut -d , -f 3 directory.txt**
office
042
301
201
404
202
304
302
$ **cut -d , -f 3 directory.txt | tail -n +2**
042
301
201
404
202
304
302
$ **cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]'**
301
304
302
$ **cut -d , -f 3 directory.txt | tail -n +2 | grep '3[0-9][0-9]' | wc -l**
3
++++
==== Summary ====
* ''echo > //file//'' can be used to create a //file// containing a line of data.
* ''touch //file//'' can be used to create an empty //file// or to update its modification time to 'now'.
* ''mkdir //directory//'' creates a new //directory//.
* ''cp //oldfile// //newfile//'' copies (duplicates) //oldfile// to //newfile//.
* ''cp //files...// //directory//'' copies one or more //files// (or directories) into an existing //directory//.
* ''rm //files...//'' removes (deletes) //files//.
* ''rmdir //directory//'' removes (deletes) a //directory// which **must** be empty.
* ''rm -r //directory//'' removes (deletes) a //directory// and all its contents, recursively.
* "''*''" in a file name matches zero or more characters, so "''*.txt''" matches all files ending in "''.txt''".
* "''?'' in a file name matches any single character, so "''?.txt''" matches "''a.txt''"" but //not// "''any.txt''".
* "''[//characters//']'' in a file name matches any one of the //characters//, so "''[aeiou].txt''" matches "''a.txt''"" but //not// "''b.txt''".
* "''[//first//-//last//']'' in a file name matches any character in the range //first// to //last//, so "''*[a-m].txt''" matches "''boa.txt''"" but //not// "''constrictor.txt''".
* Wildcards (''*'', ''?'', ''[]'') are expanded by the shell to match files that //already exist//. They cannot generate new (non-existent) file names.
* ''{a,b,c}'' expands to three words: ''a'', ''b'', and ''c''.
* ''p{a,b,c}q{x,y,z}r'' expands to nine words: ''paqxr paqyr paqzr pbqxr pbqyr pbqzr pcqxr pcqyr pcqzr''
* ''{000..5}.txt'' expands to six words: ''000.txt 001.txt 002.txt 003.txt 004.txt 005.txt''
* ''tail -n +//number//'' displays input starting at line //number// (and continuing until the last line).
* There is no 'trash': when a file or directory is deleted it is gone immediately and forever.
* ''cut -d //char// -f //fields//'' prints the given //fields// from its input lines using //char// as the field delimiter.
The //fields// are numbered from 1 and multiple field numbers are separated by commas.
/* ---------------- IN CLASS ----------------
mv //oldfile// //newfile// moves (renames) a file or directory.
mv //files...// //directory// moves one or more //files// (or directories) into an existing //directory//.
doing wc -l on data files
saving output in lengths.txt
see lengths.txt page by page
analyse lengths using sort -n
use head and tail to find longest and shortest files
organising files into folders
what happens if you redirect output to a file being used as input?
sort -n lengths.txt > lengths.txt
using >> to append output to a file
pipelines and understanding what text flows through each step of the pipeline
extensions don't mean anything -- .txt is just a convention
combining cut sort uniq wc
checking quality of data, removing damaged files
* Most files’ names are something.extension. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.
* command >> [file] appends a command’s output to a file.
* [first] | [second] is a pipeline: the output of the first command is used as the input to the second.
* The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).
*/
/*
----------------------------------------------------------------
NEXT
----------------------------------------------------------------
==== Translating characters with the "tr" command ====
LOOPS
performing the same action on many different files
for thing in list of things
do
operation on $thing
done
how the prompt changes when waiting for additional input
variables, word vs $word vs ${word}
using variables in loops -- operation (ls)
using > inside a loop vs >> vs > after the loop
quoting to allow spaces and other funny characters in filenames
cp file-*.txt backup-file-*.txt
!=
for i in files-*; do cp $i backup-$i.txt; done
using ECHO to understand what a loop is going to do before running it for real
loop visualised as flowchart ; each execution of loop body visualised as echo process
using semicolons to separated the parts of a command instead of newlines
editing longer lines: Control + A E B F P N or arrow keys
repeating earlier commands
history | less
history | grep
! to rerun a command
Control + R to reverse search
!! runs the previous command
!$ is last word of previous command
ESC-. inserts the last word of the previous command
using ECHO to do a "dry run"
protecting > and >> using "" in an ECHO argument
nested loops
* A for loop repeats commands once for every thing in a list.
* Every for loop needs a variable to refer to the thing it is currently operating on.
* Use $name to expand a variable (i.e., get its value). ${name} can also be used.
* Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion.
* Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
* Use the up-arrow key to scroll up through previous commands to edit and repeat them.
* Use Ctrl+R to search through the previously entered commands.
* Use history to display recent commands, and ![number] to repeat a command by number.
----------------------------------------------------------------
shell scripts
get middle of file using head and tail
save the commands in a file.sh
run the 'bash file.sh'
replace the built-in file name with "$1" (double quotes around arguments to protect spaces)
$@ means all of the arguments, and "$@" means all of the arguments, each one inside implicit double quotes
bash options for debugging: -x
* Save commands in files (usually called shell scripts) for re-use.
* bash [filename] runs the commands saved in a file.
* $@ refers to all of a shell script’s command-line arguments.
* $1, $2, etc., refer to the first command-line argument, the second command-line argument, etc.
* Place variables in quotes if the values might have spaces in them.
* Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.
----------------------------------------------------------------
finding things: grep and find
grep options: -i -E -w
anchoring expressions: ^ and $
command substitution: $()
wc -l $(find . -name "*.txt")
grep PATTERN $(find .. -name "*.txt")
inverting matches: grep -v
* find finds files with specific properties that match patterns.
* grep selects lines in files that match patterns.
* --help is an option supported by many bash commands, and programs that can be run from within Bash, to display more information on how to use these commands or programs.
* man [command] displays the manual page for a given command.
* $([command]) inserts a command’s output in place.
----------------------------------------------------------------
==== Working with multiple files ====
Let's practice manipulating large numbers of data files using the shell.
From the course web site you can download a archive file called ''metars-2019.tgz''.
++++ What is an archive? |
An archive is a file that contains other files.
You have probably already used a ''.zip'' file, which is popular on Windows.
Another popular format is ''.tar'' and the compressed version ''.tgz'' (for '**t**ar' compressed with '**gz**ip').
++++
The ''metars-2019.tgz'' archive contains aviation weather data for Japan.
??? HOW TO DOWLOAD THE FILE TO COMMAND LINE DIRECTORY?
Download the file and then extract the files inside it with the command ''tar xf metars-2019.tgz''.
++++ How the ''tar'' command works |
The command ''tar'' is short for **t**ape **ar**chive.
We don't use magnetic tape to store data any more, but the program is still a popular alternative to ''.zip'' files.
The first argument to ''tar'' tells it what to do.
In this case ''x'' means e**x**tract files from an archive, and
''f'' means the archive should be read from a **f**ile whose name appears in the next argument.
If you add ''v'' for **v**erbose it will also print each filename as it is extracted.
++++
Readings are collected from automated weather stations installed at Japanese airports.
Every hour the data from these weather stations is collected and stored in a file.
There are 8753 files in the archive (365 days x 24 hours per day = 8760, with a few omissions because of downtime).
Each file is named according to the date and time that the data were collected.
For example the file ''2019-01-01T00:53:57-japan.txt'' contains the data recorded on 2019/01/01 at 00:53:57 JST.
What is the structure of each file?
The 'structure' of a file means how the data is arranged within it.
Let's look in file ''2019-01-01T01:53:58-japan.txt'' to see what the structure looks like.
You can use ''cat 2019-01-01T01:53:58-japan.txt'' to do this, but the file is long (85 lines).
To see it one 'page' at a time use the command ''less 2019-01-01T00:53:57-japan.txt''.
+ is every file of the correct structure?
+ are there any unreliable data files due to network or system failures?
+ how many stations are there?
+ how can we reorganise the data to make it more useful?
+ how can we simplify the data for analysis?
Challenge: finding files and directories by type
================================================
The 'find' command finds files and directories by name or by property.
The default action of 'find' is to print the path of the files/directories it finds.
The general form of the command is: 'find directory -property value'
One option for -property is '-type' which understands a value of 'd' or 'f'.
So, 'find . -type d' looks in the current directory (.) and finds all directories (-type d)
and 'find . -type f' looks in the current directory (.) and finds all regular files (-type f).
Assume your current working directory is your home directory.
** Q.11 What command pipeline will count the number of directories under
'/usr/lib' that have the digit '2' somewhere in their name?
________________________________________________________________
*/