Up to 10 points can be gained towards your final score.
Log in to MS Teams using your university account and post a short message to the Self Introduction channel of the Information Literacy team. Introduce yourself, your main interest(s), and say which topics of Information Literacy you already have experience with. (Throughout the course, use the Q&A (English) and/or 質疑応答 (日本語) channels to ask any questions you might have about the course or the content of the classes. If you can answer a question posted there, please do not hesitate to do so.)
One of our teaching assistants (TAs) will send you an e-mail asking you to send a reply with an interesting or funny image attached. Our TAs are official members of the teaching staff for this class and so your reply should follow proper “netiquette”. It should be professional, formal, and polite. Use any reference materials you can to make sure your reply is professional, for example the notes posted on the course web site or articles/blogs about writing e-mail that you can find by searching online. When you think your reply has been properly prepared, send it to our TA.
Using proper netiquette in your e-mail includes choosing an informative subject line, using an appropriate greeting and sign-off, and keeping the content concise, polite, and relevant to the recipient.
Continue to use the Q&A (English) and/or 質疑応答 (日本語) channels to post and answer questions about the course content. Students who contribute outstanding answers to the Q&A (English) channel might gain a bonus point towards their final score.
Team up with one other student from the class (e.g., the student who sits next to you). Send your partner a professional “business” e-mail asking if they would be willing to help you improve your communication skills. Be formal and polite, as you would when seeking a business partnership with someone you do not know personally.
Read the e-mail that you receive from your partner. Think of ways that it could be improved. Write a formal reply that explains how you think they could improve their e-mail to you. When you receive the reply suggesting how to improve your original e-mail, reply one more time formally thanking your partner for their time and kindness.
There are many different kinds of communication tools. In an educational or professional environment the most important are collaboration support systems and e-mail.
Students and younger employees likely use text messaging (texting) for their daily communication. Text messages are brief and usually answered within a few minutes or hours. Because of the limitations of the message format, they are most suited for informal conversation with friends and family. However, texting from personal devices can sometimes be appropriate to alert colleagues to emerging situations such as arriving late for a meeting because of a train delay. Faculty members and managers are likely to prefer e-mail for all professional communication.
E-mail is the standard communication tool in professional (academic, industry) life. Its advantages include permanence, searchability, non-invasive delivery, and the ability to compose messages of any length with as much care and consideration as are warranted by the situation. Just as you can send an informal birthday greeting to a friend or a formal request to the head of a company by postal mail, so you can send the same kind of content (with the same levels of formality) by e-mail.
The e-mail paradigm is very close to physical mail: a sender writes a message, a third-party mail delivery service (online rather than postal) delivers the message, and a recipient picks up the message and reads it. E-mail messages have several parts, some of which have names that correspond to the same parts of a postal message. Just as in postal mail, every message must specify the recipient's address (as is always written on the front of postal mail), the sender's address (as is often written in the corner or on the back of postal mail), some content containing the actual message (corresponding to the paper inside the postal envelope), and possibly one or more attachments (sometimes called enclosures in postal mail) which are separate documents sent along with the written message.
Every e-mail message contains a header which includes the date, the sender (From:) and recipient (To:) addresses, and the subject of the message. Messages can be delivered to more than one recipient by putting more than one address in the To: line. Messages can also be copied to other people using the Cc: and Bcc: fields. Replying to an e-mail message usually sends the reply to the sender (the From: address in the original message), although this can be changed by setting the Reply-to: field in the header.
| header field | meaning |
|---|---|
| Date: | the time and date the message was sent |
| From: | the sender's address (becomes the To: address if the message is replied to) |
| To: | the address of the recipient(s) who are expected to contribute actively to the conversation |
| Subject: | the purpose of the message or a one-line summary of the content |
| Cc: | 'carbon copy' address(es), for observers of the conversation or non-active participants |
| Bcc: | 'blind carbon copy' address(es), for observers of the conversation whose names will not be made visible to anyone else |
| Reply-to: | the address that will become the To: address in a reply (instead of the From: address) |
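For example, the header of a short message might look something like this (the addresses and subject are invented for illustration):

```
Date: Mon, 10 Oct 2022 09:15:00 +0900
From: taro.yamada@example.ac.jp
To: hanako.suzuki@example.co.jp
Cc: lab-members@example.ac.jp
Subject: Draft agenda for Friday's meeting
```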
E-mail addresses contain two parts separated by an @ symbol. The second part (after the @) is the domain name of the organisation that is responsible for receiving the message. The first part (before the @) is the local name of the person (or department) within that organisation who should receive the message. For example, mail to katsuma.yoshiyuki@kuas.ac.jp will be delivered to a particular organisation (KUAS) and within that organisation a particular person (Mr. Katsuma) will be able to retrieve and read the message. Similarly, e-mail sent to sales@honda.co.jp will be delivered to the sales department within the Honda Motor Company, Ltd.
Writing a professional letter on paper means following social conventions and business etiquette.
Writing a professional e-mail means following “netiquette” (from Internet + etiquette).
Many of the conventions of netiquette are related to making your e-mail easier for the recipient to read.
To develop an intuition for netiquette, just ask two simple questions about every part of your e-mail message: is it effective (does it achieve its purpose), and is it efficient (does it avoid wasting the recipient's time)?
For example…
| part of the message | questions to ask |
|---|---|
| Subject: | Does it accurately reflect the content of the conversation? (Some people look quickly at just subject lines and immediately delete e-mail that appears irrelevant to them.) Could the recipient find this conversation again in the future based only on the subject line? |
| message content | Does the recipient really need to know/read this content at this moment in time? Is the amount of detail just right for the recipient? |
An e-mail message can be composed as if it were a postal letter to the same recipient. The same stock phrases, order of items, and levels of formality and politeness, can be carried over from paper letters to e-mails.
| part | purpose |
|---|---|
| Greeting | Indicates the name of the person you are talking to. Titles (e.g., “Dr.”) can be used if appropriate. Women are “Ms.” (instead of “Mrs.” or “Miss”) unless you know their preference. Using “M.” leaves the gender unspecified, for situations in which it is unknown. Examples: Dear Sir, Dear Madam, Dear Professor, Dear Mr. Secretary, Dear Dr. Spock, Dear Ms. Jones, Dear Mr. Kite, Dear all, Dear colleagues, etc. |
| Opening line | Brief introduction to the message including references to any prior communication if appropriate. In a reply, a thank-you to the sender for their previous message. |
| Body | Main content written simply, clearly, and concisely. |
| Closing | Thanks the reader for their time, expresses eagerness to receive a reply, etc. |
| Signature | A sign-off including “Regards”, or “Sincerely” if you opened with the person's actual name, or “Faithfully” if you did not use their name, followed by the name of the sender. |
| Signature block | Professional contact details about the sender: job title, postal address, telephone number, etc. Can provide similar information to a “business card”. |
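A short message following this structure might look something like the following sketch (the names, addresses, and course details are invented for illustration):

```
Subject: Question about the Week 02 assignment

Dear Professor Tanaka,

Thank you for your explanation of the assignment in last week's class.

I am not sure whether the formatted document should be submitted as a
.docx file or as a PDF. Could you let me know which format you prefer?

Thank you very much for your time. I look forward to your reply.

Sincerely,
Taro Yamada
Faculty of Engineering, Example University
taro.yamada@example.ac.jp
```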
When replying to an e-mail, most mail software automatically inserts a copy of the original message in the reply.
This is called quoting the original content.
Typically each line of the quoted original is preceded with a > symbol to distinguish it from the reply.
Replies can be divided into three styles, according to how this quoted material is used.
| style | description |
|---|---|
| top-posting | The reply is at the start of the message and the entire original message is left quoted underneath the reply. |
| bottom-posting | The entire original message is left quoted at the top and the actual reply is added underneath it. |
| inline reply | The quoted original message is carefully edited to extract just the relevant parts, and the associated replies are inserted underneath each quoted part. |
Efficiency and effectiveness are also important for replies. Top-posting accumulates lots of previous content in reverse order (compared to the chronological order of the conversation) and therefore creates more work for people joining the conversation later on. Bottom-posting accumulates lots of previous content at the top in correct chronological order, but this can force readers to scroll past a lot of history to reach the important part of the message. (The quoted content can be trimmed to minimise the amount of work the reader has to do.)
Inline replies avoid the disadvantages of both top- and bottom-posting. Inline replies are more efficient for the reader, help keep the total size of the message small, and produce better results when searching through e-mail messages for specific information or conversations.
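For example, an inline reply to a (hypothetical) message containing two questions might look like this; each quoted line begins with > and the answer sits directly beneath the part it answers:

```
> Could you send me the agenda for Friday's meeting?

The agenda is attached to this message.

> Will the meeting room have a projector?

Yes, room A301 has a projector; please bring your own HDMI adapter.
```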
While conducting an e-mail conversation, anything that is no longer needed can be removed, anything that is lacking can be added, and anything that is inefficient or ineffective can be modified.
| part of the message | questions to ask |
|---|---|
| Subject: | Does it still accurately reflect the content of the conversation? |
| To: and Cc: | Is the message still going to exactly the right group of people? |
| reply content | Is the structure of the reply efficient and effective? Is replying above the original message more or less effective than replying inline or below it? Is the amount of quoted material just right to give the best context for the reply? |
Facial expressions and other body language are not available when communicating by e-mail. In less formal professional e-mails, the use of emoticons can help to indicate emotions that would accompany parts of a message delivered face-to-face. Some mail software will convert emoticons into graphical emoji for the reader.
| emoticon | typical meaning |
|---|---|
| :-) | humor or happiness |
| :-( | sadness or unhappiness |
| :-D | very large grin |
| :-) :-) | laughing |
| :-p | sticking out tongue (“so there!”, “I told you so!”) |
| ;-) | winking |
| :-\| | disgust |
| :-/ | puzzled |
| :-o | surprised |
Text effects popular with adolescent users, such as aLtErNaTiNg CaPiTaLs iN nOrMaL tExT are very difficult to read and therefore contradict the goal of maximising the effectiveness of communication. Similarly, writing in ALL CAPITAL LETTERS can be interpreted as shouting or yelling; alternatives such as using punctuation for virtual /italics/ or _underlining_ or *boldface* are much gentler on the reader.
Documents that are separate from the main e-mail message but sent along with it are called attachments. Attachments are best kept small and limited in number.
Some mail delivery services will delete e-mails having large attachments, without warning or indication. Original photographs can be very large and are often down-sized before sending by e-mail. PDF files that use unnecessary text effects such as drop-shadow can also be very large. (A good solution to that problem is to avoid using unnecessary text effects in documents.)
Some mail services might also reject messages with too many attachments. Programs such as zip let you gather many files into a single archive for attaching to an e-mail message.
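For example, on the command line the zip program can gather several files (the names below are invented) into one archive that can then be attached as a single document:

```
zip week02-report.zip report.docx data.xlsx figure1.png   # creates week02-report.zip containing the three files
```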
Some mail software reformats message content. If layout such as

    +--------------+-------------------+
    | tabular      | data              |
    +--------------+-------------------+
    | written as   | mono-spaced text  |
    +--------------+-------------------+

needs to be preserved then a plain text file containing the content can be sent as an attachment, to protect it against reformatting.
E-mail is inherently insecure.
Messages are transmitted and stored without encryption, making them relatively easy to intercept and read.
Secret information sent 'in private' by e-mail might easily become public knowledge.
If sensitive information must be sent by e-mail, one way to protect it is to send it in a password-protected .zip archive.
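One way to create such an archive from the command line is zip's -e (encrypt) option, which asks for a password before writing the archive (the file names below are invented):

```
zip -e confidential.zip grades.xlsx   # prompts for a password; the recipient needs the same password to extract the file
```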
Accidentally sending e-mail to the wrong recipient, or replying to everyone in a conversation instead of just the original sender, is a common mistake. One way to mitigate that problem is to leave the recipient fields blank and fill them in just before sending the message.
The sender of an e-mail has no control over who might read it. Negative comments made about a person in an e-mail could eventually be seen by that person, causing embarrassment (or even loss of job) for the sender. A recipient address might be mistyped, for example, or one of the intended recipients might decide to forward the message to the person mentioned in the negative comment.
Cyber-criminals can use e-mail to compromise your computer or your personal information. This includes stealing your financial information to commit fraud. Messages received from unrecognised sender addresses might include attachments that introduce viruses to a computer when opened. Phishing messages are written so that they appear to be sent by a trusted person, such as a bank manager (asking for account details or password “confirmation”), or by an unknown sender seeking a collaboration with apparently huge benefits for the recipient. It is extremely unlikely that such messages are genuine.
A good way to increase the safety of e-mail is to install a spam (junk mail) filter to delete suspicious messages before they are presented for reading, and an anti-virus program that scans attachments for potential threats. Many online web mail services provide these functions for all their users' incoming e-mail. Some (such as gmail) go further by banning outgoing attachments that might contain harmful content.
Up to 10 points can be gained towards your final score.
You should already have a copy of MS Word installed on your computer.
Start it up (or activate the File menu if it is already running) and click on Options at the bottom of the page. In the Word Options pop-up window, select the Language tab on the left and then, under Choose Display Language, move English to the top of the list using the up and down arrows. Using Word in English will make it easier to follow the material in this class, and will help you to improve your English faster.
Download the example Word file.
Follow the instructions in the file to modify the document in the following ways:
The end result might look something like this.1
When you are happy with your formatted document, upload it to MS Teams and submit it. (In MS Teams either click on the “Assignments” tab and then the Week 02 assignment, or click on the assignment inside the announcement in the “General” tab. Then “attach your work” to the assignment and click on “turn in”.)
Please try to finish the assignment before class. The hard deadline for assignments is 23:59 on the day of class.
1 If you used a document formatting system designed for publication, the end result might look something like this.
Glossary of word processing terms
Many kinds of text and document editors exist (as well as almost as many opinions about which ones are the best). Two kinds that you will encounter often as an engineer are text editors, and word processors.
Text editors manipulate any kind of plain text file using an interface that presents the contents of the file simply and literally. A simple plain text file can contain almost any kind of information, from recipes and shopping or 'to-do' lists to meeting minutes or random thoughts and notes. In more technical settings, plain text files might contain configuration settings or program source code.
simple text editors

| OS | editor |
|---|---|
| Linux | LeafPad (packages available in most distributions via apt, yum, etc.) |
| MacOS | TextEdit (bundled with the OS) |
| Windows | Notepad, or Notepad++ (recommended alternative) |
People who spend most of their time editing plain text files (programmers, technical authors, web designers, etc.) might use a much more capable (and complicated) text editor. There are several choices (as well as religious wars fought over which one is the best), for example Emacs, vi, and VS Code, all of which run on the three major operating systems.
Word processors are programs for desktop publishing: the creation and production of structured, formatted documents such as printed letters, reports, and newsletters. Word processors use a graphical 'what you see is what you get' (WYSIWYG, pronounced “wizzi-wig”) interface where content is edited in a form that resembles its final, printed appearance. They almost always use their own proprietary file formats which make no sense when viewed as plain text files, and editing plain text files is almost always impossible using a word processor.
The de-facto standard word processor is Microsoft Word, which means that there is a huge amount of on-line help available for both beginners and experts.
Almost any question about MS Word can be answered by searching in Google (or similar) for MS word followed by the topic of the question.
MS Word is also a very complicated program and the best way to learn it is to actually use it to create documents of increasing complexity.
Learning how to use search engines to answer questions about MS Word is therefore a vital skill for novice (and advanced) users. The results will also include a variety of different media, including video, tutorials, blog posts, and so on, that cater to different learning styles.
One of the first Google results for ‘ms word help’ is a section on Microsoft's own web site called Word help & learning that includes short tutorials on getting started, inserting text, working with pages and layouts, inserting pictures, and saving and printing documents.
Learning the basics of word processing from sites such as this one is excellent preparation for a breadth-first tour of some of the features of Word that engineers and scientists might find the most useful. The following sections present such a tour with reference to the ribbon – the part of the user interface that most people interact with most often.
In the following sections, keyboard shortcuts are shown in [square brackets] beside the names of the tools they activate.
As the name implies, this is where the simplest and most common editing operations are located.
Clipboard contains cut [Control-X], copy [Control-C], and several varieties of paste [Control-V] (depending on whether you want the pasted text to retain its original formatting, adopt the destination formatting, and so on). Clicking once on the format painter and then again in the document copies the format of the text under the insertion point to the text that was clicked on. Double-clicking on the format painter makes it 'sticky': multiple targets can be clicked to copy formatting; press Escape to stop format painting. Format can also be copied by typing [Control-Shift-C] and pasted using [Control-Shift-V]. Clicking on the little diagonal arrow (in the bottom right-hand corner) opens the clipboard dialogue, which handily lets you paste from a recent history of cut and copied text.
Font contains the tools to change font family (the typeface) and size (measured in points, of which there are approximately 72 per 25.4mm of length on the printed page), followed by buttons that increase font size [Control-Shift->], decrease font size [Control-Shift-<], change the case of text (for all-caps, etc.), and clear all formatting from it. On the second line are toggles for boldface [Control-B], italics [Control-I], underline [Control-U], strikeout, subscript [Control-=], and superscript [Control-Shift-+]. The “text effects” button comes next (and is best ignored – trust me), followed by two buttons for text highlight colour (the background colour for text) and font colour (the foreground colour).
Paragraph contains the tools for bulleted lists, numbered lists, and multi-level numbering.
The next two buttons decrease indent and increase indent of the selected text.
The last two buttons on the top row will sort the selected text lines into alphabetical order or toggle
the display of paragraph marks and other typesetting annotations in the text.
On the lower line are buttons that tell the selected text to align left [Control-L], centre [Control-E], align right [Control-R], or to justify [Control-J]. Omitting the next button (which you should also ignore), we have a tool controlling line spacing and then two buttons that affect the shading (background) and border (edges) of the selected table cell or text.
Styles contains collections of format that can be clicked to apply them to text. The formatting of the text at the insertion point can also be copied into a style by right clicking on the associated button and selecting “Update to Match Selection”. Clicking on the little diagonal arrow (in the bottom right-hand corner) [Alt-Control-Shift-S] opens a very handy “styles chooser” dialogue that can remain open during other editing operations.
Editing contains the tools to find [Control-F] and replace [Control-H] text.
No prizes for guessing what is in this tab.
Pages contains tools to insert a front cover page, a new blank page, or a forced page break [Control-Return].
Tables contains almost everything you need to create and edit a table.
Illustrations has tools to insert images and graphical objects of several kinds, including external pictures.
Links creates, modifies, or removes hyperlinks from text.
Header & Footer contains drop-down menus to control the running header, footer, and page numbering applied to all pages in the document.
Symbols has tools to insert mathematical equations or single mathematical symbols into the text.
Page Setup contains tools that control the entire page, including its overall size and the number of text columns.
Everything to do with referring to a part of the document from some other, faraway part.
Table of Contents has a button to create the table of contents and another to update the table, which is useful whenever heading numbering changes.
Footnotes has the insert footnote tool which places the footnote marker at the current insertion point and then prompts for the content of the footnote text.
Citations & Bibliography has the tool to insert citation at the current insertion point which then prompts for the information about a new reference source or the identity of an old reference source that was already entered. The manage sources tool allows editing of reference source details. The style menu controls how the citations and references will be presented, and bibliography inserts the list of references at the insertion point.
Captions adds caption text to a figure via the insert caption tool which prompts for the text of the caption.
Index has the mark entry tool which will include the currently selected text as an index term. Again, a pop-up dialog (which can, very usefully, persist) allows control over the presentation of the index entry. The insert index tool does exactly what it says, at the insertion point.
Useful tools for collaboration and finding out who to blame.
Tracking has several tools to track changes made to the document content, and to control how the tracked changes are presented.
It is there if you need it.
The most important tool here is actually present in every tab. Search [Alt-Q] (also known as tell me) searches all the tools for some specific text and presents the results in a list where they can be directly clicked on. It's the Word equivalent of the Windows 10 [Window-S] key that opens the “Type here to search” feature.
Bordering the page on the left and top are the rulers.
Hovering over a transition from white to grey within either ruler will convert the cursor into a “slide” icon. Clicking and dragging the transition will then change the page margins.
The white blobs in the horizontal ruler control where the elements of lists (bullet or number, text of the item) are placed. If items with several lines of text are not lining up properly after the first line, move these blobs around to fix that. (Typing spaces into the text to try to align things will never look right and is an immediate indication that the author was clueless.)
Double-clicking inside the ruler opens a handy page setup menu which allows much finer control over page and margin dimensions.
The small grey icons visible in the screen-shot at 45, 55, 65, and 75 mm are tab stops.
The tab stops become active whenever text contains Tab characters.
Each paragraph has its own set of tab stops.
From left to right, the stops in the image illustrate the different kinds of tab stop that Word provides (left, centre, right, decimal, and bar), each controlling how the text that follows a Tab character is positioned at that stop.

Clicking on the small icon in the top-left corner cycles it through all the available tab stop types. The kind of tab shown by the icon is inserted into the ruler by double-clicking the ruler's lower edge. This also opens a handy editor dialog to change the positions and types of each tab stop in the ruler.
At the bottom-right of the page is a handy control for zooming in and out.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
You should already have a copy of MS PowerPoint (PPT) installed on your computer.
Start it up (or activate the File menu if it is already running) and click on Options at the bottom of the page. In the PowerPoint Options pop-up window, select the Language tab on the left and then, under Choose Display Language, move English to the top of the list using the up and down arrows. Using PPT in English will make it easier to follow the material in this class, and will help you to improve your English faster.
Starting with a blank presentation, reproduce the document shown in the following videos. (These videos are also embedded at the end of this page, in case you prefer to watch them without leaving your browser.) The versions on the right labeled “eng” have burned-in English captions, while “jpn” have burned-in Japanese captions that were auto-translated (probably very badly).
Substitute your own media – images, videos, etc. – for those shown in the sample document. I recommend you make ample use of the 'pause' button and follow along one step at a time.
You can download the actual sample document featured in the videos, if you think that would be helpful: 03-powerpoint_examples.pptx
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should know how to reproduce the kind of slides shown in the videos, including formatting text in an unusual way (e.g., an unusual font, unusual size).

You can also use Internet search engines to find online tutorials and other educational materials relating to PPT, or even check our library to see if they have a book on the subject. In other words, use any resources you can to achieve your learning goals. At the same time, exercise judgement about good and bad advice you find online, especially when it comes to choosing and organising the content of your slides. (You, and only you, know what content is needed and how best to present it. If you are unsure about choosing and presenting content, the best way to learn how to do that is by making lots of presentations and finding out for yourself what works and what does not work.)
Edward Tufte is a famous proponent of simplicity, clarity, and good design in printed media. His essay The Cognitive Style of PowerPoint is well worth reading and thinking about.1 While not everyone agrees with all of his points, his extreme position is a welcome counterbalance to the mass of bad advice that you will find elsewhere about how to construct an 'amazing' presentation in PPT. His essay offers few solutions to PowerPoint's problems, but a critical response by Jean-Luc Doumont entitled Slides Are Not All Evil does offer some practical advice (starting on page 68).
Remember also that PowerPoint slides are not the only way to create an effective presentation. For making traditional slides, several LaTeX packages are available. There are also web frameworks such as reveal.js with which you can make presentations that would be impossible in PPT. Breaking out of the 'slide' format entirely, I have seen an effective presentation made using a single (long) web page that the presenter simply scrolled through during his talk. Pick the right tool for the job.
1 This document was scanned to PDF with optical character recognition that failed in a handful of places. For example, in the fourth sentence, “2ist century” should be “21st century”.
PowerPoint is a program for preparing presentation support 'slides' that mimic the transparencies that were once used with overhead projectors. It is the de facto standard program for preparing slides used in business meetings.
An estimated 500,000,000 people use it regularly which means there is a huge amount of online help available for it.
One of the first Google results for ‘powerpoint help’ is a section on Microsoft's own web site called PowerPoint help & learning that includes short tutorials on getting started, collaborating, design, animations, pictures and charts, giving presentations, and slides and text.
The PPT user interface is quite similar to Word. The following sections present a breadth-first tour of PPT's ribbon – the part of the user interface that most people interact with most often. Since many of the tool groups are identical to MS Word, we will concentrate here on those aspects that are different.
The Slides group adds common operations performed on slides. The Drawing group gives quick access to some of the most commonly used functions available in the tabs relating to shapes and images.
New for PPT are groups for Illustrations and Media.
Several groups relating to hand-drawn content. Maybe the most useful is the ability to highlight text in a way that looks hand-drawn.
Many distracting Themes are available, taking up most of the ribbon space. Much more useful is the Customise group which contains the Slide Size tool, essential for making a poster or other non-presentation media.
If you need them for a rare special effect, here they are along with sounds that can play when changing slides too. Of most use is the Timing group which provides the ability to automatically advance through a range of slides based on time delay instead of clicking a mouse or pressing a key.
Start Slide Show contains several tools for presenting in different ways. Some of the online presentation possibilities are powerful when used in conjunction with MS Teams: for example, you can show the presenter view on your own screen while showing the actual slide as your shared 'screen' to the other participants in the meeting.
Set up has tools to help perfect the timing of a presentation, as well as to record a video of yourself presenting the slides.
Monitors is where you will find the options relating to multiple monitors, including the projector, which the computer treats as an external monitor. Use Presenter View does exactly what it says on the label, showing the audience only the current slide while you see the current slide, your presenter's notes, a countdown timer, and a preview of the upcoming slide.
Presentation Views includes the Normal view which is the default and the one most people spend most time looking at, along with three others that are useful. Slide Sorter is a thumbnail view that uses the entire width of the window. Notes View is where you can edit the presenter's notes that will be shown only to you when using presenter mode. Reading View lets you see and interact with the slide as you will when it is actually presented. Essential for testing animations, etc.
Master Views is where you switch from editing the normal slide content to editing the Slide Master layouts that dictate how each 'empty' slide is initially set up. If you need to permanently move things around (e.g., to make the title space smaller) or introduce entirely new layouts, do it here on the master slides. (Note that switching to the master slide opens a hidden tab in the ribbon, but the tools it contains should now be familiar to you.) Modifying the layout of printed handouts or 'slides plus notes' is done from here too.
Incompatibilities happen. Some projectors don't like some computers. Some PPT files prepared on one OS do not play in PPT on another OS. In other words, there is no guarantee that your presentation will display at the venue. One way to insure against this is to take a PDF version of the slides, which should display properly from almost anyone's computer. (PDFs do not, in general, display animations. Using PDF as an insurance policy therefore has the additional benefit of discouraging the use of animations in the original slides.)
In the worst case nothing will display at the venue (or the projector will explode, or there will be a power cut, etc.). How well do you know your presentation and material? Could you present the entire talk without using any slides at all?
Printing some thumbnails of the slides can be a handy reference during a talk, both to know what is coming up and as a map to get to a specific slide quickly if someone asks a question.
Simple fonts are more legible than fancier fonts. (Highly decorated fonts, or those that simulate handwriting, have no place in a presentation.)
Not all fonts are available on all computers. Sticking to common fonts (Arial, Times New Roman) almost guarantees a presentation will look the same to everyone.
Contrast aids legibility and therefore the efficiency of communication and information transfer. Black and white have the best contrast of any pair of colours. Other pairs of colours can have good contrast, if chosen with great care.
Colours on a computer monitor are different to the colours produced by a projector. Checking the legibility of coloured information well before a presentation can avoid embarrassment during it. (I once met a projector that refused to admit the existence of 100% saturated green. I spent quite a few minutes of that talk helping the audience to imagine the missing parts of my diagrams. I have avoided pure green in my presentations ever since.)
These 14 videos cover all the essentials of PowerPoint (assuming you already know how to use MS Word) while not diving deeply into any one topic. Use them to understand what features are there, and then explore the full capabilities of the interesting or useful features in more depth on your own.
The sample document created in these videos is available here: 03-powerpoint_examples.pptx
I made 14 × 1.5-minute videos (one per topic) instead of 2 × 10-minute videos (divided in half arbitrarily) or 1 × 20-minute video. I thought that would make the content easier to navigate. However, if you prefer fewer (but longer) videos then please tell me.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
Watch at least videos 1 to 11 (inclusive). If any of the topics are not familiar to you, open a blank Excel workbook and try to reproduce the examples shown in the videos for yourself. (These videos are also embedded at the end of this page, in case you prefer to watch them without leaving your browser.)
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should know how to use basic functions such as SUM() and conditional functions such as SUMIF().

You can also use Internet search engines to find online tutorials and other educational materials relating to Excel, or even check our library to see if they have a book on the subject. In other words, use any resources you can to achieve your learning goals.
Excel is a program for creating tables of data, and performing computation and analysis on that data. It is based on the very old idea of a spreadsheet, which was a large piece of paper used by accountants. Known data was entered into spaces on the spreadsheet and then calculations were performed to calculate new values. The process continued for as many iterations as was required to calculate the final, useful information.
Excel's principles are exactly the same. You enter the data you know into cells, and use formulas to compute data you don't know in other cells. Excel takes care of figuring out the order in which the computations should be performed and the unknown data generated.
|   | A | B | C | … |
|---|---|---|---|---|
| 1 | A1 | B1 | C1 | |
| 2 | A2 | B2 | C2 | |
| 3 | A3 | B3 | C3 | |
| … | | | | |
Excel calls its spreadsheets workbooks.
A workbook consists of a number of rows and columns.
At the intersection of every row and column there is a cell that can contain data.
The cells inside a spreadsheet are therefore laid out in a rectangular grid.
The columns are given letters and the rows are given numbers.
Every cell therefore has a 'name' or 'coordinate' defining where it is located; the official term is reference.
The first column is called A and the first row is called 1, so the top-left cell in the spreadsheet has the reference A1. The cell to its right is B1 and the cell below it is A2.
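For example (a hypothetical three-cell sheet): if A1 contains 3 and A2 contains 4, entering the formula =A1+A2 into A3 makes A3 display 7, and the displayed value updates automatically whenever A1 or A2 changes.

|   | A |
|---|---|
| 1 | 3 |
| 2 | 4 |
| 3 | =A1+A2 (displays 7) |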
How many cells are in a spreadsheet? 17,179,869,184 arranged in 1,048,576 rows of 16,384 columns. Excel is very good at hiding the empty ones from you and so you'll never even see them unless you go looking.
What happens after the column names run out of letters? Like in a cinema, after column Z come columns AA, AB, and AC. After column AZ come columns BA, BB, and BC. After ZZ come columns AAA, AAB, and AAC. (There are not enough columns to ever reach ZZZ.)
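A quick count shows why:

- columns A to Z: 26 column names
- columns AA to ZZ: 26 × 26 = 676 more (702 so far)
- reaching ZZZ would need a further 26 × 26 × 26 = 17,576 names (18,278 in total), but a worksheet stops at 16,384 columns, so the last column is actually named XFD.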
The UI should be very recognisable to anyone having experience with MS Word and PowerPoint. Excel has the same search feature as PowerPoint and Word, so it is easy to look up tools by name or description. I shall dare even to not reproduce its ribbons here.
The least familiar part of Excel might be the way references work. I shall therefore use the space to explain them instead of describing pretty pictures of the user interface.
References work like map coordinates, with a letter for horizontal position and a number for vertical position. They come in two types: relative and absolute.
Relative references are what most people (and almost all beginners) use almost all of the time. One or more letters (naming a column) and one or more digits (naming a row) make up a reference. Even though they are called “relative”, they still identify a cell by its absolute position in the array of cells. So what makes them relative?
The relativity comes from their behaviour when they are used in a formula inside a cell. Formulae can move, either because they are copied and pasted or because rows and cells are inserted or deleted. When a formula moves, Excel looks at the relative positions of (distance between) the original position and the new position. The difference is added to the column letter and row number in the reference. The effect is that the name of the referenced cell changes, so that the formula continues to reference a value stored at the same position relative to wherever the formula happens to be.
Copying the formula B3+D3 and pasting it two rows below the original position causes the references within it to change to B5+D5. Moving that new formula one column to the right causes the references within it to change to C5+E5.
This is bad when many formulae need to refer to a single cell, such as an interest rate, no matter where they may be moved.
By placing a $ in front of any letter or digit in a reference, it becomes absolute. This does not change how it refers to a cell, only how it behaves when the formula that it is part of is moved. The absolute parts of a reference do not have the “distance” between the original and new position of the formula added to them: no matter where the formula is moved to, the absolute parts of the reference will always remain the same. Take our formula containing B3+D3 and change it to $B$3+D3, then perform the same two moves on it. Moving it down two rows changes it to $B$3+D5, and moving that new formula right one column changes it to $B$3+E5.
The second relative cell referenced has moved to remain in the same relative position as the formula, whereas the first absolute reference has not.
Change our formula to $B3+D$3 and perform the same two moves on it. Moving it down two rows changes it to $B5+D$3, and moving that new formula right one column changes it to $B5+E$3.
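As a concrete (hypothetical) example of why absolute references matter: suppose an interest rate is stored in B1 and a column of amounts starts in A3. Putting =A3*$B$1 into C3 and copying it down the column gives:

| cell | formula after copying |
|---|---|
| C3 | =A3*$B$1 |
| C4 | =A4*$B$1 |
| C5 | =A5*$B$1 |

The amount reference follows the formula down the column, while the interest rate reference stays fixed on B1.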
These 12 videos cover most of the essentials of Excel (assuming you already know how to use MS Word and PowerPoint). Use them to understand what features are there, and then explore the full capabilities of the interesting or useful features in more depth on your own.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is part practical preparation and part study.
First, install a command-line environment on your computer that you can use to complete the next few weeks of this course. (Linux and Mac users already have a suitable command-line environment; there is nothing to do. Windows users have several options; please follow that link and install one of the options on your laptop computer.)
Second, review the notes on this page before coming to class. Use Internet search engines to find online tutorials, Wikipedia articles, etc., for any additional information (or explanation) that you might need.
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the next section and further explained in the “Notes” section.
A filesystem (or file system) manages the storage of data on a device (such as a solid-state disk, hard disk drive, or USB flash memory drive). Many hundreds of different filesystems exist, each one providing different tradeoffs between speed, reliability, safety, security, etc. Almost all of them manage data by splitting it into files (sequences of bytes) placed within a hierarchy of directories (which map symbolic names to individual files). In most situations a filesystem refers to a single logical repository of files residing on a single physical medium (e.g., SSD or HDD).
Sizes of files (and filesystems, and the physical media on which they exist) are measured in bytes. One byte contains eight bits, where a bit is a single binary digit (0 or 1). A bit is the smallest unit of information possible. A byte contains eight bits because that is sufficient to store a single character in English. For English and many European languages, a single character in a document will be represented as a single byte. Asian characters typically need more storage: in Japanese, common characters are represented using two bytes but uncommon characters can require up to four bytes of storage.
storage sizes (traditional units)

| unit | name | size (decimal SI¹ units) | size (binary JEDEC² units) |
|---|---|---|---|
| 1 kB | kilobyte | 10^3 = 1,000 bytes | 2^10 = 1,024 bytes |
| 1 MB | megabyte | 10^6 = 1,000,000 bytes | 2^20 = 1,048,576 bytes |
| 1 GB | gigabyte | 10^9 = 1,000,000,000 bytes | 2^30 = 1,073,741,824 bytes |
| 1 TB | terabyte | 10^12 = 1,000,000,000,000 bytes | 2^40 = 1,099,511,627,776 bytes |

storage sizes (IEC binary prefixes)

| unit | name | size |
|---|---|---|
| 1 KiB | kibibyte | 2^10 = 1,024 bytes |
| 1 MiB | mebibyte | 2^20 = 1,048,576 bytes |
| 1 GiB | gibibyte | 2^30 = 1,073,741,824 bytes |
| 1 TiB | tebibyte | 2^40 = 1,099,511,627,776 bytes |
A byte is too small a quantity to be useful when discussing an entire filesystem or disk. SI prefixes are often applied to storage sizes, as shown in the table on the right. Permanent storage (such as disk drives) usually uses the SI (metric) prefixes, which increase by factors of 1000. Volatile storage (such as computer memory) usually uses the JEDEC (binary) prefixes, which increase by factors of 1024.
Available disk space reported by the OS might use either system, leading to discrepancies between advertised (by the manufacturer) and actual (available to the computer user) storage space. For example, a drive advertised in SI units as containing 1 TB of space would be reported as providing only about 931 GB of space by an OS using JEDEC units. This can be a cause for confusion, or even legal action.
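The arithmetic behind that example:

- 1 TB in SI units = 1,000,000,000,000 bytes
- 1 GB in JEDEC (binary) units = 2^30 = 1,073,741,824 bytes
- 1,000,000,000,000 ÷ 1,073,741,824 ≈ 931.3, so the operating system reports roughly 931 GB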
A new set of binary prefixes was standardised by the IEC, corresponding to the JEDEC binary sizes but using a different spelling and (somewhat silly-sounding) names. The prefixes are formed from the SI/JEDEC ones by inserting an 'i' between the multiplier (k, M, G, etc.) and the unit (B). Their names are formed from the first two letters of the corresponding SI/JEDEC prefix (ki, me, gi, te, etc.) and the first two letters of the word 'binary' (bi); hence kibibyte, mebibyte, and so on.
Linux and recent versions of MacOS have adopted these prefixes for displaying information that uses 1024-based scales (such as memory statistics). Windows and the mass media have so far ignored them.
Filesystems are used to store:

- application programs: on Windows, look in 'C:\Program Files' to find lots of applications; on MacOS, look in '/Applications'; on Linux, look in '/usr/bin'.
- the files used for virtual memory (paging/swap): on Windows this is 'C:\pagefile.sys'; on MacOS look in '/private/var/vm'; on Linux you probably have part of the physical disk dedicated to this task, using its own filesystem that is independent of OS and user files.
No matter what filesystem is being used, the application sees a very simple model of storage: files (sequences of bytes) organised into a hierarchy of named directories.
A directory maps names onto files.
In the example on the right, 'Documents' is a directory containing two entries. One entry points to a regular file called 'local' (containing a document represented as a sequence of bytes). The other entry points to another directory called 'MobaXterm'. Using a 'family tree' metaphor, 'Documents' is called the parent directory of 'MobaXterm'.
The root directory, which is at the top of the tree structure, has no name. (File and directory names are stored in their parent directory. Since the root directory has no parent directory, there is simply nowhere to store a name for it.)
To specify a particular file or directory, start at the root and describe the path that must be followed to find that file or directory.
For example, starting from the root, the 'Documents' directory has the following path:

- the 'Users' directory
- the 'piumarta' directory
- the 'Documents' directory

Each element in the path is separated by a “/” character (or “\” on Windows). The root directory has no name so we start with an empty name, then the separator, then 'Users', another separator, and so on. The final path is therefore: '/Users/piumarta/Documents' (or '\Users\piumarta\Documents' on Windows).
On most computers there is only one root directory.
For historical reasons Windows is an exception and has one root directory per filesystem, or 'volume' in Windows terminology.
Each volume is named by a single letter followed by a colon, in this case 'C:'. Usually the volume name is prepended to paths (i.e., added at the beginning of the path). The correct path to the 'Documents' directory in Windows would therefore be: 'C:\Users\piumarta\Documents'.
Windows explorer shows you the path to the current directory above the list of files. If you click in it you will see it written in the notation shown above. You can also type into the location bar, or copy/paste from/to it, using the same notation.
Every directory contains two special entries whose names are '.' and '..'. The name '.' points to the inode of the directory itself (so the path '/Users/././././.' still refers to the '/Users' directory). The name '..' points to the inode of the parent directory (so the path '/Users/Administrator/../piumarta/.' refers to my account's directory). The only exception is the root directory, which has no parent, and so for it the name '..' points back to the root directory again.
When you open a folder in Mac Finder or Windows Explorer, or type 'ls' in a command line window, you are looking at a list of directory entries. Every entry in the directory has a unique (file or directory) name and associates that name with some storage on the disk where the contents of the file or directory are stored.
The structure describing where the contents are stored is called an index node, or inode for short.
Inodes are not stored in the directories, but in a separate table on the disk.
There is exactly one inode specifying where the contents for any given file or directory are stored on the disk.
However, more than one name can be associated with a given inode (and therefore file) by having more than one directory entry specify the same inode as the location of the file's contents.
An inode contains all the information relating to a file's or directory's contents, including its size, its owner, its access permissions, and timestamps recording when it was last accessed and modified. The inode also contains a list of blocks on the disk where the contents of the file/directory are actually stored. Each block has a fixed size, typically 4096 bytes, and is identified by a number (unique within the filesystem) that can easily be converted into the physical location on the disk where the block is stored.
(Windows, just to be difficult, calls its inodes 'MFT records' where MFT stands for 'master file table'.)
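On Linux or MacOS you can see inode numbers, and a second name attached to the same inode, for yourself on the command line (the file names below are invented):

```
echo hello > original.txt       # create a small file
ln original.txt alias.txt       # add a second directory entry (a 'hard link') for the same inode
ls -li original.txt alias.txt   # -i shows the inode number: both names list the same one
```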
Hard disk drives (HDD) store data magnetically on the surface of spinning metal disks. The data is read and written by tiny heads that move between the edge and centre of the disks. It should now be obvious why they are called disk drives.
Each surface on the metal disks is divided into concentric tracks. Each track is divided into a number of sectors where the data is actually stored. The size of each sector is fixed by the disk drive manufacturer and cannot be changed.
A disk drive, whether solid-state or rotating, stores information in a fixed number of fixed-size sectors. An HDD sector is typically 512 bytes long, and so a 1 TiB (2^40-byte) hard disk would contain 2,147,483,648 sectors of data. A sector is the smallest unit of data that can be transferred to or from the disk.
While sectors can be addressed directly, most filesystems do not do that for reasons of efficiency. Instead they combine several sectors into a block and treat that as the smallest unit of data when managing space on the disk. A typical block size is 8 sectors, or 4096 bytes. (Of course, Windows has to be different to everyone else and uses the terms 'allocation unit' or 'cluster'.)
Each block has a unique number, and data is always read or written to the disk in multiples of the block size.
Why am I bothering to tell you about block (allocation unit size) size? Efficiency.
Before you can store information on a disk you have to format it. Formatting writes all the data structures that are needed to describe an empty filesystem onto the disk. It puts in place the framework into which you can start creating new files and directories.
While you are formatting a disk you will likely be given the choice of what block (allocation unit) size to use. The default on many filesystems (including Windows, surprisingly) is 4096 bytes. You can almost certainly increase or decrease this size (by factors of a power of 2).
The smallest unit of information that can be allocated is one block. A one-byte file therefore consumes an entire block for its contents, no matter what block size is chosen. The rest of the block is wasted. This is called internal fragmentation of the storage and there is nothing you can do about it (other than reduce the block size).
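You can observe internal fragmentation for yourself on Linux or MacOS; the exact numbers depend on your filesystem's block (allocation unit) size:

```
printf x > tiny.txt   # a file whose contents are exactly one byte
ls -l tiny.txt        # the logical size reported: 1 byte
du -h tiny.txt        # the space actually allocated: one whole block, typically 4.0K
```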
On the other hand, a huge file will contain very many blocks of data and will therefore have a very large block list in its inode. This wastes space in the inode and potentially makes accessing the file less efficient, because the contents of the file are distributed over many different blocks located far apart on the surface of the disk. (This is what most people mean when they say 'fragmentation' in the context of disk storage. It is also the problem that is solved by the 'disk defragmenter' tool in Windows which attempts to rearrange the blocks belonging to each file to keep them closer together on the disk, ideally in a single contiguous sequence of blocks.)
If you expect your disk to contain mostly very small files (common in big data analytics) then a small block (allocation unit) size will perform better. If you expect your disk to contain mostly very large files (common in audio/video media editing) then a large block (allocation unit) size will perform better. If you have no idea what to expect then the default block (allocation unit) size is probably going to be fine.
In your programs you will use functions such as open(), read(), write(), and close() to access and modify the contents of files. Those functions directly manipulate the directories, inodes, and disk blocks described above.
The filesystem's job is to make sure that happens safely and efficiently, and that you never notice all of the underlying complexity.
Installing and configuring MobaXterm: https://projects.ncsu.edu/hpc/Documents/mobaxterm.php
In-depth explanation of disks, filesystems, and network storage: https://www.netmeister.org/book/04-file-systems.pdf
Wikipedia's entry on path names: https://en.wikipedia.org/wiki/Path_(computing)
Microsoft's explanation of Windows path names: https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats
You do not rise to the level of your goals, you fall to the level of your systems. James Clear
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is part practical preparation and part study.
First, read the following sections from the Command line interface guide:
Second, complete the practical exercise in the Notes section below. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.)
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.
You will learn what the command line (shell) is, why you should know how to use it, and how to perform basic operations such as running commands and moving around the filesystem.
Using the command line puts you in control at the level of the operating system and other fundamental processes that make it work. Many operations and options that are not accessible using a graphical interface (Windows Explorer, Mac Finder, etc.) become accessible to you on the command line.
Developers, engineers, scientists, and researchers all use the command line to make themselves faster and more effective (and happier) than they would be using only graphical interfaces.
What is the command line anyway?
The rest of this document leads you through a practical exploration of basic command line features.
Things you should type are shown in a grey box like this. Keys you should type include `Enter` or `Return` (don't type the word, just press the key), and `Control-C`, which means “hold down the Control key while typing the character C”.
To follow all the steps you will need a text editor.
If you do not already have an editor then one possibility is nano.
It is installed by default on Mac and many Linux distributions.
In MobaXterm you can install it by typing `apt-get install nano` on the command line (while connected to the Internet, since it has to be downloaded before it is installed).
- `ls` and wait.
- `Enter` (or `Return` if you have it).

Remark: The shell will wait (literally) forever for you to press `Enter`. If the computer is not responding, did you simply forget to press `Enter`? From now on I will assume you press `Enter` after every command you type.
- `cat` (followed by `Enter`).
- `Control-C`.

Remark: If you give no arguments to some programs then they use your keyboard for input. If the computer is not responding, did you forget to tell a program which file to read from?
- `echo`.
- `echo hello world`.
- `echo    hello     world` (the same command typed with extra white space between the words).
- `echo hello world` and press the cursor-left key until the cursor is in the middle of the line before pressing `Enter`.

Remark: White space is used to break the line into a command part followed by zero or more argument parts. Once the line is broken into parts the white space is discarded. It does not matter how much white space you use, or even where the cursor is positioned in the line when you press `Enter`.
- `echo ~` (this is the path name of your home directory)
- `cd` (this changes your current directory to your home directory)
- `pwd` (this shows you where you are; check you actually are in your home directory)
- `ls` (this will show you the details of the files and directories in your home directory)
- `ls /home/<your-username>` or `ls /Users/<your-username>` (e.g., `ls /home/piumarta` – this also shows you your home directory)
- `ls ~` (of course, this is another way to list your home directory)
- `ls .` (a single dot is another name for “this directory”, which is either your home or the last directory you changed to using `cd`)
- `ls /home` (or maybe `ls /Users` – this shows you the directory where all accounts are stored, the parent of your home directory)
- `ls ..` (this also shows you the parent of your home directory, because your current directory is your home and '..' is the name of the parent directory of the current directory)

If there are more names in the /home (or /Users) directory, pick one of them. Let's call that name <name>. (If there are none, just use your own name again.)
ls ~<name>
(this is another name for the home directory of the user called <name>)cd ..
(this changes your current directory to /home, where all the home directories are stored)pwd
(this prints the working directory, proving you are now “in” the directory where home directories are stored)ls
(you will see your account name listed, and the names of any other accounts on the computer)ls <your-username>
(this is a relative path, which begins in the working directory instead of the root directory)cd -
(this should print nothing, but… where are you now?)pwd
(cd understands the special argument '-' which means “the directory I was in before this one”)Remark: There are several ways to specify locations in the computer, and one of them is implicit (the current working directory) and often used as a default when you do not specify any other directory.
type echo (this shows you that echo is a built-in command, implemented in the shell itself; when you echo things, the shell performs the “echo”ing for you directly)
type ls (this shows you that ls is a program that is stored on the disk; the shell runs the ls program for you whenever you type its name)
Remark: Commands are either built-in to the shell, or they are programs stored in the filesystem just like any other file. Having a user program manage the running of other user programs in this way was one of the reasons why shells were invented.
Remark: There is nothing special about commands, and you can add lots (and lots) of new commands by installing programs on your computer in places such as /usr/bin or /usr/local/bin.
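If you are curious, bash's type command also accepts a -a option (a small extra beyond these notes) that lists every place a name is found, which makes the built-in/program distinction easy to see; the exact path shown will depend on your computer:
$ type -a echo
echo is a shell builtin
echo is /usr/bin/echo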
ls . (this shows you all the files in the current directory, not the directory itself)
ls -d . (this shows you the details of the directory '.' itself, not the files that it contains; -d means “list directories as themselves”)
ls -a (now you can see the hidden directory entries, which start with '.', including '.' itself and its ancestor '..')
ls -aF (this will put a '/' after directory names, and a '*' after executable files)
ls -F /usr/bin (there is a large collection of executable files in /usr/bin)
cd /tmp (this puts you in a directory meant for temporary files)
pwd (make sure the cd command really worked and this prints /tmp)
mkdir mydir (MaKes a DIRectory called mydir)
ls -ld mydir (check that you are the owner of the directory: -l = long format, -d = show information about directories themselves, not about the files they contain)
cp /etc/passwd mydir (this copies the file /etc/passwd into your mydir directory. We can do better, though. Try the following instead…)
cp -vip /etc/passwd mydir/ (this version employs several “safety features” that command line pros use often: -v means “verbose”, it prints each file as it is copied; -i means “interactive”, it asks you whether you want to overwrite any files that already exist; -p means “preserve permissions”, in particular the copy will have the same timestamp as the original)
Remark: use the options that programs like ls and cp provide so that they give you the information and protection from mistakes that you want, and make use of the (very) few “safety” features (such as a trailing / on directory names) that are available in the command line.
nano data.txt and type the following ten lines into the file:
one
two
three
four
five
six
seven
eight
nine
ten
Press Control-O and Enter (to write Out the file), then Control-X (to eXit nano).
ls -il (you can see your file, its owner, and how long it is; the first, probably huge, number is the disk address of the inode describing the file's contents)
cp data.txt copy1.txt
ls -il (you can see copy1.txt is the same size but has a different inode: the contents were copied)
nano copy1.txt and then add this is copy1 at the start of the file. Press Control-O, Enter, Control-X (write out the file and exit).
ls -il (you can see copy1.txt is now larger than data.txt, but its inode has not changed)
cat data.txt (this concatenates the files named in the command arguments and prints them on the terminal; you can see that the original file is unchanged)
cat copy1.txt (you can see that the copy has been changed)
ls -il (you can see that the inode of copy1.txt has not changed, but the contents of the storage blocks of the file were changed)
Remark: cp makes a brand new directory entry and a brand new inode and then copies the contents of the original file into brand new storage.
Remark: When you edit a file with nano the inode does not change, only the contents of the file change. Continue with this section to see why this is significant.
ln data.txt copy2.txt (this creates another link to data.txt's inode called copy2.txt)
ls -li (you can see that the inode numbers of data.txt and copy2.txt are the same. The ln program made a new directory entry but did not copy the inode. You can also see that the link count of data.txt and copy2.txt is 2, whereas the link count of copy1.txt remains 1, because there are now two directory entries pointing at the one inode shared by data.txt and copy2.txt)
nano copy2.txt and add "this is copy2" at the start of the file; then press Control-O, Enter, and Control-X to write out the file and exit.
ls -il (you can see copy2.txt and data.txt are both now larger)
cat copy2.txt (you can see that copy2.txt has been changed)
cat data.txt (because copy2.txt and data.txt share the same inode, they both change when you edit either one of them; they are the same file, but with multiple directory entries pointing to it with different names)
Remark: ln makes a new link to an existing inode and file contents. If you modify any one of the files sharing the same inode, they will all change in exactly the same way.
Remark: The link count of a file (or directory) tells you how many directory entries “point to” (share) the same inode.
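To see the sharing at a glance, you can list all three files together and compare the first column (the inode number) and the link-count column; the actual inode numbers on your computer will of course be different:
$ ls -li data.txt copy1.txt copy2.txt
In that listing, data.txt and copy2.txt show the same inode number and a link count of 2, while copy1.txt has its own inode and a link count of 1.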
find . -type d (this will print all the directories in or under the current directory; it will probably only print '.' unless you created more directories)
find .. -type d (this will print all the directories in or under the parent directory; it should probably find several more names, including 'mydir')
find . -type f (this will print all the files in or under the current directory; it should print at least data.txt, copy1.txt, and copy2.txt)
find . -name *.txt (this will print an error message… why?!? Let's find out…)
Remark: You can search for files based on their type: file (-type f), directory (-type d), etc.
echo find . -name *.txt (this will print the command that the shell just ran; it says “find . -name copy1.txt copy2.txt data.txt”, which is not what you wanted – the shell expanded *.txt into the names of all the .txt files)
find . -name '*.txt' (this will print all the files in, or under, '.' whose names end with '.txt')
find . -name 'c*' (this will print copy1.txt and copy2.txt; all the files whose names start with 'c')
Remark: You can use echo to see exactly what the shell is doing with complex arguments.
Remark: You can use quote characters '…' to stop the shell messing with your arguments; often you want *.txt to mean “all the text files in this directory”, but in this case you did not want that at all.
grep e data.txt (this will search the content of data.txt for all lines that have the letter 'e' in them; you should see one, three, five, etc., but not two, four, or six)
grep two * (this will search the content of all files in the current directory for lines that contain 'two'. You should see all three lines from all three files. Because there was more than one file argument on the command line, grep also prints the name of the file(s) where the target string 'two' was found)
Remark: You can search for content in one or more files.
Remark: You can search for files based on whether they contain particular content.
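One more grep option that fits here (a small addition beyond the steps above): -l prints only the names of the files that contain a match, instead of the matching lines themselves:
$ grep -l two *.txt
copy1.txt
copy2.txt
data.txt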
ls -l /usr/bin (there is a lot of output)
ls -l /usr/bin > /tmp/files.txt (there is no output; what happened? All the output from ls was redirected (written) to /tmp/files.txt instead of to the screen)
cat /tmp/files.txt (there's the output that would have gone to the screen)
Remark: Command output can be saved in a file using the redirection operator > file.
Remark: There is too much output in files.txt to see it all at once.
less /tmp/files.txt (this will show you the output one page at a time. Press space to move forward a page; press the up and down arrows to move backward or forward one line; press G to go to the end and g to go to the start of the file; press q to quit.)
Remark: To view a large amount of data one page at a time, use the program 'less'.
grep ed /tmp/files.txt (this finds all programs in /usr/bin that have 'ed' in their name; not especially useful, but it illustrates an important point…)
Remark: The output of a command can be redirected to (saved in) a file and analysed using other programs.
grep ed and then enter these lines:
hello
are
we
bored
yet?
(grep will echo back only the line containing 'ed'.) Then press Control-C (to terminate the program).
Remark: Many commands can read from the keyboard as well as reading from files.
grep ed < /tmp/files.txt (this will act as if you had typed the input, but the input is taken from the file /tmp/files.txt)
Remark: Just as output can be redirected to a file using >, input can be redirected from a file using <.
What if you wanted to avoid creating a temporary file in between ls and grep?
Type ls -l /usr/bin | grep ed (this prints all files in /usr/bin whose name includes ed. The output of ls was redirected to the input of grep, without using an explicit temporary file in the middle. [There actually is a temporary file in the middle, but it is invisible and exists only in the computer's memory.])
Remark: The output of one program can be sent to the input of another program.
wc and then enter these two lines:
Why was the computer late for the meeting?
Because it had a hard drive.
Then press Control-D. (wc will print the number of lines, words, and characters that you typed)
Remark: wc can analyse text files by counting characters, words, and lines.
Remark: When a program is reading from the keyboard, Control-D is a way to make the program believe that it has reached the end of the input file. Try it with cat: run cat, type a few lines, then press Control-D.
ls -l /usr/bin | grep e | wc -l (this prints the number of programs in /usr/bin that have an 'e' in their name)
Remark: Programs can be chained together into long pipelines by joining inputs to outputs together.
Remark: In this example, grep is acting as a filter. It reads input, filters it in some way, and then writes the result to its output.
Remark: Many command line utilities are built this way, so that they can be composed to perform useful functions. Individually they are all quite small and simple, but together their behaviour can be very complex. The flexibility to compose them in many ways is one reason that the command line is so powerful for managing and analysing data.
Remark: The '|' character is called “pipe”; it is used a lot by command line “pros”.
Note: This week we will practice and learn more about working with multiple files and directories, and about how text files can be used to store simple databases. In the two weeks following this one we will study command sequencing (scripts), control, and shell variables.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.
This week you will learn how to use tail to print a file starting from a chosen line, and how to use cut to extract fields from a simple database stored as a text file.
The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.
First make sure you understand the important topics from the previous two weeks.
The command cp files… directory
copies one or more files into directory.
If any of the files
happen to be directories then the cp
command will fail.
To copy an entire directory (recursively) use cp
with the -r
option.
The cp -r files… directory
command copies one or more files into directory.
If any of the files
are directories then each of those directories is copied along with
all of its contents.
Let's practice on a simple directory hierarchy.
Use the mkdir
and echo
commands to recreate the dir1
directory
and its three files as shown in the diagram.
The content of the three files is not important.
$ cd /tmp $ mkdir dir1 $ echo 1 > dir1/file1.txt $ echo 2 > dir1/file2.txt $ echo 3 > dir1/file3.txt $ ls -lR dir1 dir1: total 48 -rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file1.txt -rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file2.txt -rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file3.txt
Use cp -rv
(recursive and verbose)
to copy the entire directory dir1
to a new directory tree called dir2
.
$ cp -rv dir1 dir2 'dir1' → 'dir2' 'dir1/file3.txt' → 'dir2/file3.txt' 'dir1/file2.txt' → 'dir2/file2.txt' 'dir1/file1.txt' → 'dir2/file1.txt'
Because dir2
does not yet exist, it is first created in the current directory and then the contents of dir1
are copied to dir2
.
The -v
option shows you the directory being created and the files being copied.
What will happen if you run the same cp -rv dir1 dir2
command again?
$ cp -rv dir1 dir2 'dir1' → 'dir2/dir1' 'dir1/file3.txt' → 'dir2/dir1/file3.txt' 'dir1/file2.txt' → 'dir2/dir1/file2.txt' 'dir1/file1.txt' → 'dir2/dir1/file1.txt' $ ls -lR dir2 dir2: total 64 drwxr-xr-x 2 piumarta dialout 170 Oct 26 05:57 dir1 -rw-r–r– 1 piumarta dialout 2 Oct 26 05:54 file1.txt -rw-r–r– 1 piumarta dialout 2 Oct 26 05:54 file2.txt -rw-r–r– 1 piumarta dialout 2 Oct 26 05:54 file3.txt dir2/dir1: total 48 -rw-r–r– 1 piumarta dialout 2 Oct 26 05:57 file1.txt -rw-r–r– 1 piumarta dialout 2 Oct 26 05:57 file2.txt -rw-r–r– 1 piumarta dialout 2 Oct 26 05:57 file3.txt
Because dir2
already exists, dir1
is copied into dir2
;
the new copy of dir1
does not replace dir2
.
The rmdir dir
command removes the directory dir.
Try removing dir1
.
$ rmdir dir1 rmdir: failed to remove 'dir1': Directory not empty
A directory must be empty before it can be removed.
You could remove the files dir1/file1.txt
, dir1/file2.txt
, and dir1/file3.txt
one at a time but that would be tedious.
Instead, remove all three at the same time using a wildcard.
The path dir1/*
expands to all three of the files in dir1
.
If you use rm -v dir1/*
(-v
for verbose)
then each name will be printed as it is removed.
Once the three files are removed you will be able to remove their parent directory dir1
.
Use rm -v dir1/*
to remove all the files in dir1
.
$ ls dir1 file1.txt file2.txt file3.txt $ rm -v dir1/* removed 'dir1/file1.txt' removed 'dir1/file2.txt' removed 'dir1/file3.txt' $ rmdir dir1 $ ls dir1 ls: cannot access 'dir1': No such file or directory
We still have dir2
which contains three files and a copy of the original
dir1
(with three more files inside that directory).
The *
wildcard is less useful when removing this many files.
Instead you can use rm -r
(-r
for recursive) which
will remove the contents of a directory before removing the directory itself.
Use rm -r dir2
to remove dir2
and all of its contents.
$ ls -F dir2 dir1/ file1.txt file2.txt file3.txt $ rm -r dir2 $ ls dir2 ls: cannot access 'dir2': No such file or directory
When you delete a file from the command line it is gone forever. There is no 'trash can' that collects deleted files. There is no way to restore a deleted file later if you change your mind.
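One habit that can soften this risk (a suggestion, not part of the exercises): the -i option makes rm ask for confirmation before each deletion, so a mistyped wildcard gives you a chance to say no. The file name below is made up, and the exact wording of the prompt depends on your system:
$ rm -i oldnotes.txt
rm: remove regular file 'oldnotes.txt'? n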
In the exercises above the argument dir2/*
matched all the filenames in dir2
.
The shell expanded the pattern dir2/*
into three separate arguments: dir2/file1
, dir2/file2
, and dir2/file3
.
The *
character actually matches any sequence of characters (zero or more) except /
.
You can use it to match 'anything' in a part of a filename.
You can also use it more than once to match 'anything' in several different parts of a filename.
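As a quick illustration before the exercises (echo is a harmless way to see what a pattern would match; the names in /etc differ from computer to computer), the pattern below uses * twice, once before the 's' and once before '.conf':
$ echo /etc/*s*.conf
On the sample system shown in the answers below this would match /etc/nsswitch.conf.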
List all files in /etc
that begin with b
, that end with .conf
, or that have a .
anywhere in their name.
$ ls /etc/b* /etc/baseprofile /etc/bash_completion $ ls /etc/*.conf /etc/nsswitch.conf $ ls -d /etc/*.* /etc/init.d /etc/nsswitch.conf /etc/rebase.db.i386 /etc/vimrc.less /etc/minirc.dfl /etc/persistprofile.sh /etc/sessionsaliases.sh /etc/xmodmap.esc
Another useful wildcard character is ?
which matches exactly one of any character (except /
).
List all files in /etc
that have an o
and an f
in their name separated by exactly one other character (it does not matter which character).
$ ls /etc/*o?f* /etc/nsswitch.conf /etc/ssh_config
One more useful wildcard pattern is [chars]
which matches exactly one of any of the chars listed between the square brackets.
List all files in /etc
that have two consecutive vowels ('a', 'e', 'i', 'o', or 'u') in their name.
$ ls -d /etc/*[aeiou][aeiou]* /etc/bash_completion /etc/defaults /etc/screenrc /etc/version /etc/bash_completion.d /etc/group /etc/sessionsaliases.sh
When the chars contains a range of consecutive characters, you can specify the entire range using “first-last
”.
Use the “[first-last]
” pattern to list all files in /etc
whose name contains at least one digit.
$ ls -d /etc/*[0-9]* /etc/X11 /etc/at-spi2 /etc/dbus-1 /etc/gtk-3.0 /etc/pkcs11 /etc/rebase.db.i386
The wildcard patterns explained above are expanded by the shell according to the files that actually exist in the filesystem. What happens if you use a wildcard pattern that does not match any files?
Try to delete some non-existent 'log' files: dir1/*.log
.
$ rm dir1/*.log rm: can't remove 'dir1/*.log': No such file or directory
If the wildcard pattern does not match any files, it is simply left unexpanded. When the command tries to access a file named by a wildcard expression, the file does not exist and an error message is generated.
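You can see this for yourself without deleting anything (a quick check; at this point dir1 no longer exists, so nothing can match):
$ echo dir1/*.log
dir1/*.log
The pattern is passed through to echo exactly as you typed it.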
A 'dry run' is a rehearsal or practice that takes place before the real performance. In computing, a dry run shows you what a command would do but without actually doing it. One example of how useful they are is to see what files would be matched by wildcard patterns, for example before actually removing them.
For the next exercise, set up your dir1 directory as above, containing six files:
file1.txt, file2.txt, and file3.txt, containing the words think, for, and yourself;
file1.dat, file2.dat, and file3.dat, containing the number of characters in the corresponding .txt files.
$ mkdir dir1 $ echo think > dir1/file1.txt $ echo for > dir1/file2.txt $ echo yourself > dir1/file3.txt $ wc -c dir1/file1.txt > dir1/file1.dat $ wc -c dir1/file2.txt > dir1/file2.dat $ wc -c dir1/file3.txt > dir1/file3.dat $ ls -l dir1 total 3 -rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file1.dat -rw-r--r-- 1 user UsersGrp 6 Oct 26 16:51 file1.txt -rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file2.dat -rw-r--r-- 1 user UsersGrp 4 Oct 26 16:51 file2.txt -rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file3.dat -rw-r--r-- 1 user UsersGrp 9 Oct 26 16:51 file3.txt
Use the echo command to perform a dry-run of removing:
all the .txt files in dir1,
all the .dat files in dir1,
the .txt and .dat files for only file2 (two files in total),
the .txt and .dat files for file1 and file3 (four files in total).
$ echo rm dir1/*.txt rm dir1/file1.txt dir1/file2.txt dir1/file3.txt $ echo rm dir1/*.dat rm dir1/file1.dat dir1/file2.dat dir1/file3.dat $ echo rm dir1/file2.* rm dir1/file2.dat dir1/file2.txt $ echo rm dir1/file[13].* rm dir1/file1.dat dir1/file1.txt dir1/file3.dat dir1/file3.txt
The touch
command updates the last modification time of an existing file to be the current date and time.
If the file does not exist, an empty file is created.
Create two empty files called file1
and file2
.
$ cd dir1 $ ls -lt file[12] ls: file[12]: No such file or directory $ touch file1 file2 $ ls -lt file[12] -rw-r–r– 1 user UsersGrp 0 Oct 26 18:33 file1 -rw-r–r– 1 user UsersGrp 0 Oct 26 18:33 file2 $ touch file2 $ ls -lt file[12] -rw-r–r– 1 user UsersGrp 0 Oct 26 18:33 file2 -rw-r–r– 1 user UsersGrp 0 Oct 26 18:33 file1 $ touch file1 $ ls -lt file[12] -rw-r–r– 1 user UsersGrp 0 Oct 26 18:33 file1 -rw-r–r– 1 user UsersGrp 0 Oct 26 18:33 file2
Note how touch
ing a file moves it to the top of the 'most recent' list (ls -t
).
Wildcards are used to match existing file names. They cannot be used to generate file names for non-existent files or directories, for example, to create a set of needed files or directories.
Try using a wildcard to create ten empty files called test0
, test1
, test2
, …, test9
.
$ touch test[0123456789] $ ls test* test[0123456789]
Creating a single file called test[0123456789]
is not what you intended.
That is what happened because the shell could not find any existing file to match
the pattern test[0123456789]
and so did not expand it in the command line.
A brace expression will generate multiple words based on a list or sequence of values.
The list of values to generate is written between curly braces {
and }
with items in the list separated by commas.
For example, the expression {a,b,c}
generates three separate words a
, b
, and c
.
The brace expression can appear in a larger pattern,
for example, the expression p{a,b,c}q
generates three separate words
paq
, pbq
, and pcq
.
Use a brace expression to generate the command needed to create the five files
test0.txt
to test4.txt
.
$ touch test{0,1,2,3,4}.txt $ ls test* test0.txt test1.txt test2.txt test3.txt test4.txt
When a sequence of numbers or letters are needed then the list can contain
just the first and last values separated by ..
.
This is called a sequence expression.
For example, the sequence expression p{a..z}q
generates a list of 26 words,
starting with paq
and pbq
, and ending with pyq
and pzq
.
Use a brace expression to generate the command needed to create the five files
test5.txt
to test9.txt
.
$ touch test{5..9}.txt $ ls test* test0.txt test1.txt test2.txt test3.txt test4.txt test5.txt test6.txt test7.txt test8.txt test9.txt
In a sequence expression that generates numbers, the first value in the sequence
sets the minimum width of the generated numbers.
This is useful if leading 0s are needed.
For example, the following sequence expressions generate lists of 100 words:
test{0..99} generates test0, test1, … , test98, test99,
tt{000..099} generates tt000, tt001, … , tt098, tt099, and
t{00000..99} generates t00000, t00001, … , t00098, t00099.
Text files are often used as simple 'databases' for storing captured sensor data, the results of data processing, etc. The shell provides several commands for manipulating data stored in this kind of text file.
A comma-separated value (CSV) file is one example of this kind of text file database.
Each line is a record and each field in that record is separated from the next with a specified delimiter character.
In a CSV file the delimiter is a comma, “,
”.
The cut command selects and prints fields from exactly this kind of text file.
By default it uses a 'tab' character to separate fields (just as a copy-paste operation between Excel and a text editor does) but this can be changed using a command line option.
cut has the following command line options:
-d character specifies the delimiter character. To manipulate CSV files, use: “cut -d ,”
-f fields tells cut which of the fields you want to print. Fields are numbered starting at 1, and fields can be a list of several field numbers separated by commas (see the short example below).
Create a CSV file called directory.txt
that contains the following data.
(The easiest way is to copy the text from this web page and paste it into a text editor,
or into “cat > directory.txt
” followed by Control+D to simulate end-of-file.)
name,given,office,phone,lab,phone Adams,Douglas,042,0042,092,0092 Kay,Alan,301,3001,351,3051 Knuth,Donald,201,2001,251,2051 Lee,Tim,404,4004,454,4054 McCarthy,John,202,2002,252,2052 Shannon,Claude,304,3004,351,3051 Vinge,Vernor,302,3003,352,3053
Use the cut
command to extract just the “office” column from the data.
$ cut -d , -f 3 directory.txt office 042 301 201 404 202 304 302
The tail
command has an option to print a file starting at a specific line number.
The syntax is: “tail -n +number
”.
For example, “tail -n +5 file
” will print the contents of file starting from the 5th line in the file.
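A quick way to see the effect (this assumes the seq command, which prints a sequence of numbers, is available on your system):
$ seq 5 | tail -n +3
3
4
5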
Pipe (|
) the output from the previous command into tail
.
Use the tail -n +number
option to print the input starting at line number 2.
$ cut -d , -f 3 directory.txt | tail -n +2 042 301 201 404 202 304 302
The grep command understands patterns that are similar to the shell's wildcard patterns.
(The shell uses its patterns to match file names; grep uses its patterns to select lines of text.)
Each office number in our sample data is three digits long.
The first digit says which floor the office is on.
One way to extract just the office numbers on the second floor is to use grep
to search for numbers matching the pattern “2[0-9][0-9]
”.
You can then count how many offices are on the second floor using “wc -l
”.
Write a pipeline of commands that prints how many offices are located on the third floor. Try very hard to do this without looking at the sample answer. If you cannot find the solution, click on the link below to view the answer.
echo > file can be used to create a file containing a line of data.
touch file can be used to create an empty file or to update its modification time to 'now'.
mkdir directory creates a new directory.
cp oldfile newfile copies (duplicates) oldfile to newfile.
mv oldfile newfile moves (renames) a file or directory.
cp files… directory copies one or more files (or directories) into an existing directory.
mv files… directory moves one or more files (or directories) into an existing directory.
rm files… removes (deletes) files.
rmdir directory removes (deletes) a directory, which must be empty.
rm -r directory removes (deletes) a directory and all its contents, recursively.
“*” in a file name matches zero or more characters, so “*.txt” matches all files ending in “.txt”.
“?” in a file name matches any single character, so “?.txt” matches “a.txt” but not “any.txt”.
“[characters]” in a file name matches any one of the characters, so “[aeiou].txt” matches “a.txt” but not “b.txt”.
“[first-last]” in a file name matches any character in the range first to last, so “*[a-m].txt” matches “boa.txt” but not “constrictor.txt”.
Wildcards (*, ?, []) are expanded by the shell to match files that already exist. They cannot generate new (non-existent) file names.
{a,b,c} expands to three words: a, b, and c.
p{a,b,c}q{x,y,z}r expands to nine words: paqxr paqyr paqzr pbqxr pbqyr pbqzr pcqxr pcqyr pcqzr
{000..5}.txt expands to six words: 000.txt 001.txt 002.txt 003.txt 004.txt 005.txt
tail -n +number displays input starting at line number (and continuing until the last line).
cut -d char -f fields prints the given fields from its input lines using char as the field delimiter. The fields are numbered from 1 and multiple field numbers are separated by commas.
This week we will study manipulating multiple files using loops and creating new commands out of sequences of existing commands.
The large sensor data file for the in-class assignment can be downloaded like this: curl -O https://kuas.org/tmp/metar-2019.tgz
Once downloaded, unpack the contents using tar -xf metar-2019.tgz
which will create a directory called metar-2019
containing 8752 files of weather sensor data.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.
The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.
First make sure you understand the important topics from the previous two weeks. Click on this link to review what you should already know:
In the notes below, follow along by typing all the commands shown in bold. Check that the output from your commands is similar to the output shown here.
Download the file planets.tar
from the course web site.
$ cd $ curl -O https://kuas.org/tmp/planets.tar % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 20480 100 20480 0 0 485k 0 --:--:-- --:--:-- --:--:-- 487k
The file is a 'tar' archive.
Unpack the archive using the tar
command with options
-x
to extract an archive,
-v
to be verbose about each file extracted, and
-f
to give the archive filename on the command line.
$ tar -xvf planets.tar planets/earth.dat planets/jupiter.dat planets/mars.dat planets/mercury.dat planets/moon.dat planets/neptune.dat planets/pluto.dat planets/saturn.dat planets/uranus.dat planets/venus.dat
You can see from the output that a directory called planets
was created and that all the new files are inside it.
Change to the planets
directory and then check the contents of one of the files using cat
or less
.
$ cd planets $ cat earth.dat Name Earth Mass (10^24kg) 5.97 Diameter (km) 12,756 Density (kg/m^3) 5514 Gravity (m/s^2) 9.8 Escape Velocity (km/s) 11.2 Rotation Period (hours) 23.9 Length of Day (hours) 24.0 Distance from Sun (10^6 km) 149.6 Perihelion (10^6 km) 147.1 Aphelion (10^6 km) 152.1 Orbital Period (days) 365.2 Orbital Velocity (km/s) 29.8 Orbital Inclination (degrees) 0.0 Orbital Eccentricity 0.017 Obliquity to Orbit (degrees) 23.4 Mean Temperature (C) 15 Surface Pressure (bars) 1 Number of Moons 1 Ring System? No Global Magnetic Field? Yes
The files contain tab-separated values with two columns. The first column describes the data on that line, and the second column contains the data value.
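Because the delimiter is a tab (which is cut's default), you can already pull out just the value column with last week's cut command; no -d option is needed, assuming the columns really are separated by single tab characters:
$ cut -f 2 earth.dat | head -n 3
Earth
5.97
12,756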
Check the first two lines of the files to see if they all look the same.
$ head -n 2 *.dat ==> earth.dat <== Name Earth Mass (10^24kg) 5.97 ==> jupiter.dat <== Name Jupiter Mass (10^24kg) 1898
...etc... ==> uranus.dat <== Name Uranus Mass (10^24kg) 86.8 ==> venus.dat <== Name Venus Mass (10^24kg) 4.87
Line 17 of every file should contain the mean temperature.
Check line 17 of earth.dat
using the combination of head -n 17
and tail -n 1
that was used earlier.
$ head -n 17 earth.dat | tail -n 1 Mean Temperature (C) 15
How would you check line 17 of all the files to make sure they contain the mean temperature?
The obvious way is to change earth.dat
to *.dat
in the command you just used.
Will that work?
Try showing the 17th line of each file by running the command with earth.dat changed to *.dat.
$ head -n 17 *.dat | tail -n 1 Mean Temperature (C) 464
That's not right.
We only saw the line for one planet.
Which one was it?
Use grep
to find out.
$ grep 464 *.dat venus.dat:Mean Temperature (C) 464
Why did you see only one line of output?
To print the 17th line of every file we need to use something more sophisticated: a loop.
To print the 17th line of each file, what we want to do is this (in natural language): for each file whose name ends in .dat, print the 17th line of that file.
The shell can do this for us using a for
loop.
The syntax (or 'general form') of a for
loop always looks like this:
for thing in list of things
do
operation_on $thing
done
The word for
is followed by a variable name (in this case thing
),
then the word in
, and then a list of (space-separated) words.
The list of words ends with a newline (or semicolon – see below) and the word do
.
One or more commands then follow, collectively called the body of the loop, ending with the word done
.
The commands in the body will be run as many times as there are words in the list.
Each time the body commands are run, the variable will be set to the next item in the list (starting with the first).
Note that the parts in italics are not meant to be typed literally.
They are descriptive 'placeholders' for some particular list of things that you want to operate on and some specific operation that you want to perform on those things.
Let's make the loop print the 17th line of all the .dat files by using
*.dat for our list of things, and
head -n 17 $thing | tail -1 for our operation_on.
Note also that the name of the variable thing
is not important;
what is important is that the name used after for
matches the name used inside the loop
to refer to each of the words in the list of things.
Let's change the name thing
to something more meaningful, such as filename:
for filename in *.dat
do
head -n 17 $filename | tail -n 1
done
Try running the above command, exactly as it is shown.
(If you make a mistake, or the shell gets confused about what you are typing, press Control-C
to
get back to the normal prompt.)
Note that the prompt changes to “>
” as soon as you finish typing the first line.
This is to remind you that you have not yet finished typing the complete for
command.
(A for
loop is not complete until the shell sees the word done
at the end.)
$ for filename in *.dat > do > head -n 17 $filename | tail -n 1 > done
How did the shell know that the filename
inside the loop was a variable, and not the name of a file?
Because of the $
symbol at the beginning.
Whenever a $
is followed by a name, the shell replaces
the $name
combination with whatever value is currently assigned to the variable with the given name.
Without the $
in front of filename
the head
command would have tried to
print the first 17 lines of the (non-existent) file literally called filename
.
How did the shell know that the filename
after the for
is the name of a variable?
Because the syntax of the for
command says that the next thing in the command must always be the name of a variable.
The $
is not needed (and is even wrong) because we do not want to replace filename
with its value,
we are just telling for
the name of the variable it should set to each item in our list of things.
You can use the echo
command to see exactly how the loop works and what it is doing to the variable.
Use echo
to see how many times the loop is run and to see the value of filename
each time the loop runs.
for filename in *.dat
do
echo filename is $filename
done
Try moving the $
from the second filename
to the first to see what changes.
What changes if you put a $ in front of both filenames?
What changes if there is no $ at all in front of the filenames?
What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.
for filename in *.dat
do
ls *.dat
done
What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.
for filename in *.dat
do
ls $filename
done
Use a loop to make a backup copy of each of the planet .dat
files.
For each file x.dat
, make a copy of that file called backup-x.dat
.
For example, earth.dat
should be copied to a file called backup-earth.dat
.
A single copy command such as the following will not work (try it if you like):
$ cp *.dat backup-*.dat cp: target 'backup-*.dat' is not a directory
The correct solution follows the same pattern as printing the 17th line of every file.
Of course, the operation should instead copy each file from “$filename
” to “backup-$filename
”.
The previous example shows how a variable is used to form part of a longer name.
The filename
variable is used to create the name backup-$filename
.
When filename
is set to earth.dat
, the longer name will be backup-earth.dat
.
A problem arises when trying to append a letter or digit to a name stored in a variable.
For this reason $filename
can also be written ${filename}
.
Since the characters {
and }
cannot be part of a variable name,
there is no possibility of ambiguity when this form is used inside a longer name next to a letter or a digit.
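A quick illustration of the difference (nothing is copied here; echo just shows what the shell produces):
$ name=earth.dat
$ echo $name2
$ echo ${name}2
earth.dat2
The first echo prints only a blank line, because the shell looked for a variable called name2, which is not set.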
Delete your backup-*
files.
Then, for each file x.dat, create a backup copy of that file called x.dat2.
$ for name in *.dat > do > cp $name $name2 > done cp: missing destination file operand after 'earth.dat' cp: missing destination file operand after 'jupiter.dat' ...etc... cp: missing destination file operand after 'uranus.dat' cp: missing destination file operand after 'venus.dat'
What is the problem?
Use an echo
command to print what the shell will do when it executes the cp
command, like this:
$ for name in *.dat > do > echo cp $name $name2 > done cp earth.dat cp jupiter.dat ...etc... cp uranus.dat cp venus.dat
What happened to earth.dat2
, etc.?
Variable names start with a letter which is followed by any number of letters and digits.
The shell thinks that the “2
” is part of the variable name;
in other words, that the name of the variable in “$name2
” is “name2
”.
To fix this, use {
and }
around “name
” to separate it from the “2
”.
$ for name in *.dat > do > cp $name ${name}2 > done $ ls earth.dat mars.dat moon.dat pluto.dat uranus.dat earth.dat2 mars.dat2 moon.dat2 pluto.dat2 uranus.dat2 jupiter.dat mercury.dat neptune.dat saturn.dat venus.dat jupiter.dat2 mercury.dat2 neptune.dat2 saturn.dat2 venus.dat2 $ rm *.dat2
Wildcards (*
, ?
, and [...]
) in a
for
loop's list of things are expanded as usual.
What would be the results of running each of the following commands?
for name in p*.dat
do
echo $name
done
for name in *p*.dat
do
echo $name
done
Predict the answers, then check them by actually running the commands.
The up-arrow (or Control+p) and down-arrow (or Control+n) keys can be used to scroll through recent commands.
The left-arrow (or Control+b) and right-arrow (or Control+f) keys let you move around inside a command.
You can edit a previous command by deleting or inserting new content.
Pressing Return
re-runs the (edited) command.
If you try this on a for
loop you will notice that the loop has been recorded on a single line.
To do this the shell has inserted some semicolon “;
” characters to separate the different parts of the loop.
A semicolon has been inserted in approximately the places where a newline was in the original for
loop.
When viewed in the history our loop looks like this:
$ for name in *.dat
> do
> ls $name
> done
earth.dat
jupiter.dat
...etc...
uranus.dat
venus.dat
$ Control+P
$ for name in *.dat; do ls $name; done
The general form of a single-line for
loop is:
for thing in list of things ; do operation on thing ; ...etc... ; done
The semicolons take the place of newlines in the single-line version. Either or both of the semicolons can be replaced by newlines; the shell does not care whether you use semicolons or newlines.
Write the backup for
loop again, all on one line.
for name in *.dat; do cp $name backup-$name; done
Delete the backup files.
Add a command to echo
the name of each file before copying it,
still putting the entire for
loop on a single line.
$ rm backup-* $ for name in *.dat; do echo $name; cp $name backup-$name; done earth.dat jupiter.dat mars.dat mercury.dat moon.dat neptune.dat pluto.dat saturn.dat uranus.dat venus.dat $ rm backup-*
Let's print the 17th line of each file and redirect the output to another file.
The following will not work:
$ for name in *.dat; do > head -n 17 $name | tail -1 > lines.txt > done $ cat lines.txt Mean Temperature (C) 464
The problem is that each time around the loop the >
redirection
truncates (empties) lines.txt
before it writes the output from tail
into it.
There are two solutions to this problem.
The first solution is to use another redirection operator, >>
.
This operator appends lines to the output file instead of replacing its contents.
$ for name in *.dat; do > head -n 17 $name | tail -1 >> lines.txt > done $ cat lines.txt Mean Temperature (C) 15 Mean Temperature (C) -110 Mean Temperature (C) -65 Mean Temperature (C) 167 Mean Temperature (C) -20 Mean Temperature (C) -200 Mean Temperature (C) -225 Mean Temperature (C) -140 Mean Temperature (C) -195 Mean Temperature (C) 464
The second solution is to move the redirection outside the loop, so that all the output produced by the commands inside the loop becomes part of a single redirection.
$ for name in *.dat; do > head -n 17 $name | tail -1 > done > lines.txt $ cat lines.txt Mean Temperature (C) 15 Mean Temperature (C) -110 Mean Temperature (C) -65 Mean Temperature (C) 167 Mean Temperature (C) -20 Mean Temperature (C) -200 Mean Temperature (C) -225 Mean Temperature (C) -140 Mean Temperature (C) -195 Mean Temperature (C) 464
The echo
command normally prints a newline character after its arguments.
If you use the option -n
then this newline is not printed.
This lets you use several echo -n
commands to print several things on the same line.
In the following example a semicolon ;
is used (instead of newline) to separate two echo
commands.
The first echo
command uses the option -n
to prevent it printing the newline.
Try running these commands with and without the -n
to see the difference.
$ echo -n hello; echo "," world hello, world $ echo hello; echo "," world hello , world
In a for
loop, the operation that is performed inside the loop can be another for
loop.
(This is called nesting loops.)
For example:
$ for digit in {1..3}; do for letter in {a,b}; do echo $digit $letter; done; done 1 a 1 b 2 a 2 b 3 a 3 b
Arithmetic expansion is performed on any text written inside double parentheses after a $
symbol, like this:
“$((text))
”.
The entire expression (from “$
” to the closing “)
”) is replaced by the result of
evaluating text as an arithmetic expression.
Within text you can refer to variables without needing to use the $
prefix.
Some examples:
$ foo=32 $ echo foo plus ten is $((foo + 10)) foo plus ten is 42 $ N=1; for L in {a,b,c}; do echo $L$N; N=$((N+1)); done a1 b2 c3
Write two nested for
loops that print the following multiplication table:
1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20 24 28 32 36 40 5 10 15 20 25 30 35 40 45 50 6 12 18 24 30 36 42 48 54 60 7 14 21 28 35 42 49 56 63 70 8 16 24 32 40 48 56 64 72 80 9 18 27 36 45 54 63 72 81 90 10 20 30 40 50 60 70 80 90 100
Don't worry about properly lining up the columns.
The echo
command understands an option -e
that replaces certain sequences of characters with other characters.
One replacement that this enables is to convert “\t” into a tab character.
A tab moves the cursor forward to a column that is a multiple of 8.
Modify your loops to line up the columns, like this:
1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20 24 28 32 36 40 5 10 15 20 25 30 35 40 45 50 6 12 18 24 30 36 42 48 54 60 7 14 21 28 35 42 49 56 63 70 8 16 24 32 40 48 56 64 72 80 9 18 27 36 45 54 63 72 81 90 10 20 30 40 50 60 70 80 90 100
A for loop repeats a command for every item in a list.
A for loop sets a variable to the next item in the list before running the loop body.
Use $name to expand a variable (i.e., get its value), or ${name} if there are letters or digits immediately after the variable.
for loops can be written on one line by replacing newlines with semicolons.
for loops can be nested by writing a loop as the body of another loop.
This week we will study while
loops and if
statements,
several ways to test variable values and file properties,
and some useful ways to manipulate the values stored in variables.
Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.
This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.
On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.
To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.
This week you will learn how to use:
while loops to repeat commands while a condition is true.
if statements to optionally run commands based on a condition.
the test command to test the properties of files, strings, and numbers.
the man and help commands, and the --help option accepted by most commands, to find documentation.
The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.
Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.
Make sure you understand the topics from last week. Click on the link below to expand a brief review.
As well as the indicated exercises, try typing in all the examples for yourself. If you can think of ways to modify the example to change the behaviour, try them. Exploration is the best way to learn.
Variables are used to store data.
Variable names must begin with a letter which can be followed by any number of letters or digits.
(The underscore “_
” is treated as a letter.)
Names that conform to these rules are legal (allowed by the rules);
names that break these rules are illegal (not allowed by the rules).
Some examples of legal variable names:
Name | Why it is legal |
---|---|
a | starts with a letter |
abcdef | letter followed by any number of letters |
a1b2c3 | letter followed by any number of letters or digits |
FooBar999Baz | letter followed by any number of letters or digits |
_ | underscore _ is a letter too |
_1234_ | letter followed by digits and a letter |
LONG_VARIABLE_NAME_NUMBER_1 | letter followed by lots of letters and a final digit |
Some examples of illegal variable names:
Name | Why it is not legal |
---|---|
0 | does not start with a letter |
2things | does not start with a letter |
x@y | @ is neither a letter nor a digit |
final value | space is neither a letter nor a digit |
You create or set a variable using the =
assignment operator.
The syntax (general form) of assignment is:
variableName=value
where variableName
follows the rules explained above and
value
is a single word (such as a filename), number, etc., with no spaces.
There must not be any space either side of the =
symbol.
You get the value of a variable by writing a $
before the variable's name.
For example:
$ metars=/tmp/metars-2019 $ echo $metars /tmp/metars-2019
Again, there must be no space between the $
and the variable name.
What if you want to put a space inside a value stored in a variable? You can protect spaces using quotation marks.
Single quotes around a value like this 'value'
will protect everything inside the value.
Wildcards (*
, ?
, etc.), dollar signs ($
), and other special characters will be completely ignored.
Spaces inside the value will be considered part of the value.
Double quotes around a value like this "value"
will protect everything inside the value
except for expansions (see below) introduced by the $
character.
One such expansion is getting the value of a variable using $name
.
$ foo='$woohoo $$$ * .* how about this?' single quotes stop *, ?, and $ from being treated specially $ echo '$foo' single quotes stop $ from being treated specially $foo $ echo " * $foo ? " double quotes allow $ to get the value of foo * $woohoo $$$ * .* how about this? ? but * and ? wildcards are still ignored
If you want a value with spaces inside, use '…'
.
If you want a value with spaces inside and variables to be expanded, use "…"
.
The $
character is used to transform variables and other values in the command line by a process called expansion.
There are several kinds of expansion:
A $
followed by a variable name expands to the value stored in the variable.
(If the variable is not set to any value then the result is blank.)
Braces {
and }
around the variable name are optional
but are necessary when a variable expansion is followed immediately by a letter or digit
that is not part of the variable name,
as in the last example below.
$ metars=/tmp/metars-2019 $ echo $metars /tmp/metars-2019 $ me=myself $ echo $me myself $ echo ${metars} /tmp/metars-2019 $ echo ${me}tars myselftars
The brace syntax ${variable} also provides several mechanisms that modify the value retrieved from variable.
Within a variable expansion with braces, a suffix (such as a file extension) can be removed
by following the variable name with %suffix
.
$ filename=2019-01-01T00:53:57-japan.txt $ echo ${filename} 2019-01-01T00:53:57-japan.txt $ echo ${filename%.txt} 2019-01-01T00:53:57-japan $ echo ${filename%-japan.txt} 2019-01-01T00:53:57
A prefix can be removed by following the variable name with #prefix
.
$ echo ${filename} 2019-01-01T00:53:57-japan.txt $ echo ${filename#2019} -01-01T00:53:57-japan.txt $ echo ${filename#2019-??} -01T00:53:57-japan.txt $ echo ${filename#2019-??-??} T00:53:57-japan.txt
In both cases (${name%pattern}
and ${name#pattern}
)
you can use wildcards such as ?
in the prefix or suffix.
$ echo ${filename} 2019-01-01T00:53:57-japan.txt $ echo ${filename#2019-??} -01T00:53:57-japan.txt $ echo ${filename#2019-??-??} T00:53:57-japan.txt $ echo ${filename%:??-*} 2019-01-01T00:53
You can also replace a pattern anywhere in a value with some other text using /pattern/replacement
after the variable name:
$ echo ${filename} 2019-01-01T00:53:57-japan.txt $ echo ${filename/T/ at time } 2019-01-01 at time 00:53:57-japan.txt
The variable expansions above should be all you need in most cases, but there are several more that you might need to use occasionally. If you are interested, here is a table showing most of them. (Click on the 'link' to toggle the table.)
String operators available during ${...} expansion
Imagine that you are running out of disk space on your computer.
You have a lot of 'lossless' music stored in .wav
(Microsoft 'wave') files.
You could halve the amount of space they use by converting them to .flac
(free lossless audio codec) files.
The program ffmpeg
can do this for you.
The syntax is:
ffmpeg -i input-filename.wav output-filename.flac
First, make some 'fake' .wav
files like this:
for i in {1..9}; do echo $i > track-$i.wav; done
1. Write a “for wav in …
” loop that echo
s the names of all the *.wav
files in the current directory, one at a time.
track-1.wav track-2.wav ... track-9.wav
2. Change the echo
command so that for every file it prints two things: the original name ($wav
) as well as the name
with the original .wav
suffix removed.
track-1.wav track-1 track-2.wav track-2 ... track-9.wav track-9
3. Change the echo
command so that for every file it prints two things: the original name ($wav
) as well as the name
with the original .wav
suffix removed and a new .flac
suffix added.
track-1.wav track-1.flac track-2.wav track-2.flac ... track-9.wav track-9.flac
4. Change the echo
command so that for every .wav
file in the current directory your loop prints:
ffmpeg -i filename.wav filename.flac
The output of your loop should look like this:
ffmpeg -i track-1.wav track-1.flac ffmpeg -i track-2.wav track-2.flac ... ffmpeg -i track-8.wav track-8.flac ffmpeg -i track-9.wav track-9.flac
(If you had some genuine .wav
files, and a copy of the ffmpeg
program, you could remove the echo
from your loop body and it really would convert all the .wav
files to .flac
for you.)
Parameters are the values passed to a shell script on the command line.
Whereas variables are named, parameters are numbered starting at 1.
(If you ever happen to need it, $0
is the name of the shell script exactly as it appeared on the command line.)
There are three other special variables that are useful inside shell scripts.
$#
expands to the number of command-line arguments, and
both $@
and $*
expand to a sequence containing all of the command-line arguments separated by spaces.
Parameter | Meaning |
---|---|
$1 | The first command-line argument |
$2 | The second command-line argument |
(and so on…) | |
$# | The number of command line arguments |
$@ | All of the command line arguments |
$* | All of the command line arguments |
Write a shell script that prints a single number showing how many command-line arguments it is run with.
(Don't forget you have to make it executable using chmod +x filename
before you can run it.)
The variables $@
and $*
behave differently when quoted.
To illustrate the difference, consider the following script:
#!/bin/sh
echo 'using "$@":'
for argument in "$@"; do
echo "$argument"
done
echo 'using "$*":'
for argument in "$*"; do
echo "$argument"
done
Running this script with three command-line arguments one
, "two too"
, and three
produces this result:
$ ./script one "two too" three using "$@": one two too three using "$*": one two too three
Create the script shown above.
Run it with arguments one "two too" three
.
Run it with other arguments, including no arguments.
You can see that "$*"
expands to a list of command-line arguments all inside one pair of double quotes ("
).
In other words, "$*"
is one single value containing all of the command line arguments.
On the other hand, "$@"
expands to a list of command-line arguments where each separate argument is inside a pair of double quotes ("
).
In other words, "$@"
is one value per argument, each value containing a quoted version of the corresponding argument.
Expansion | Equivalent |
---|---|
"$*" | Single value containing all arguments: "$1 $2 $3 $4 …" |
"$@" | Multiple values, one per argument: "$1" "$2" "$3 "$4" … |
In a for
loop you should almost always use "$@"
(to repeat the loop for each argument).
for argument in "$@"; do some_operation_on "$argument"; done
When assigning to a variable you should probably always use "$*"
, however most shells are clever enough to let you use either.
all_arguments="$*" all_arguments="$@"
You can evaluate arithmetic expressions by enclosing them in double parentheses preceded by a $
character: $((expression))
Within the expression you can use the normal arithmetic operators and the names of variables (without a $
in front of them).
$ echo $((2+4*10)) 42 $ two=2 $ ten=10 $ echo $((two+4*ten)) 42 $ total=0 $ for n in {1..10}; do total=$((total+n)); done $ echo $total 55 $ n=1 $ for word in one two three four; do echo $n $word; n=$((n+1)); done 1 one 2 two 3 three 4 four
Modify your shell script from the previous exercise so that it prints each command-line argument preceded with its number, starting at 1. For example:
$ ./script one "two too" three 1 one 2 two too 3 three
Write a shell script called factorial
that calculates the factorial of its command line argument.
Recall that factorial(n) = n * (n-1) * (n-2) * … * 1.
$ ./factorial 5 120
Sometimes you will need to store the output of a command in a variable, or use the output of one command as an argument to another command. Command substitution provides a way to do this.
The pattern $(command)
is replaced with the output from running command
.
Note that command can include command-line options and arguments, and can even be a pipeline made from several commands.
The result can be used to set the value of a variable.
In the following examples, note the use of double quotation marks around the command substitutions
to protect any spaces in the output from the commands.
$ ls | wc -l
8752
$ pwd
/Users/piumarta/metar-2019
$ numFiles="$(ls | wc -l)"
$ dirName="$(pwd)"
$ echo there are $numFiles files in the directory $dirName
there are 8752 files in the directory /Users/piumarta/metar-2019
Another way of doing the same thing, without variables, is to use the command substitutions directly where their output is needed:
$ ls | wc -l
8752
$ pwd
/Users/piumarta/metar-2019
$ echo there are $(ls | wc -l) files in the directory $(pwd)
there are 8752 files in the directory /Users/piumarta/metar-2019
Write a shell script called nfiles.sh
that prints the number of files in each of the directories written on the command line
followed by the name of the directory.
$ ./nfiles.sh . /bin /usr/bin
42 .
124 /bin
1486 /usr/bin
(Of course, your results will differ.)
A for
loop is executed once for each member of a list of items.
Other control structures include
the while
loop that executes until a condition becomes false,
the until
loop that executes until a condition becomes true, and
the if
statement that conditionally executes (or not) a sequence of commands.
The syntax (general form) of a while
loop is
while TEST
do
    COMMANDS
done
or on a single line like this:
while TEST ; do COMMANDS ; done
The COMMANDS part works exactly like it does in a for
loop.
The TEST part should be a command that can either succeed or fail.
The while
loop will continue to run its TEST and the COMMANDS until the TEST fails.
A useful command to use for the TEST part of a while
loop is test
, which can do many things.
One thing test
can do is compare two numbers.
Command | Succeeds if… | Example | |
---|---|---|---|
test LHS -lt RHS | LHS < RHS | test $num -lt $limit | $num is less than $limit |
test LHS -le RHS | LHS <= RHS | test $num -le 0 | $num is zero or negative |
test LHS -eq RHS | LHS = RHS | test $num -eq 0 | $num is zero |
test LHS -ne RHS | LHS ≠ RHS | test $num -ne -1 | $num is not -1 |
test LHS -ge RHS | LHS >= RHS | test $num -ge 0 | $num is non-negative |
test LHS -gt RHS | LHS > RHS | test $num -gt 0 | $num is positive |
Combining a while
loop with test
and arithmetic expansion to update a counter:
$ counter=0
$ while test $counter -lt 5; do
>   echo $counter
>   counter=$((counter+1))
> done
0
1
2
3
4
The if
statement conditionally executes a sequence of commands.
The syntax of if
statements is:
if TEST
then
    COMMANDS
fi
or on a single line like this:
if TEST ; then COMMANDS ; fi
The COMMANDS will be run only if the TEST succeeds.
Using the test
command again for the TEST:
$ n=3
$ if test $n -lt 5; then
>   echo $n is less than 5
> fi
3 is less than 5
Another form of the if
statement provides a second set of commands to be run if the TEST fails.
if TEST
then
    COMMANDS1
else
    COMMANDS2
fi
or on a single line like this:
if TEST ; then COMMANDS1 ; else COMMANDS2; fi
First the TEST command is run. If TEST succeeds then COMMANDS1 are run. If TEST fails then COMMANDS2 are run.
$ n=7
$ if test $n -lt 5; then
>   echo $n is less than 5
> else
>   echo $n is not less than 5
> fi
7 is not less than 5
The test
command can also check the properties of a file or directory, the size of a string, or the relationship between two strings.
Command | Succeeds if… |
---|---|
test -d FILE | FILE exists and is a directory |
test -e FILE | FILE exists |
test -f FILE | FILE exists and is a regular file |
test -r FILE | FILE is readable |
test -s FILE | FILE exists and is non-empty |
test -w FILE | FILE is writable |
test -x FILE | FILE is executable |
test FILE1 -nt FILE2 | FILE1 is newer than FILE2 |
test FILE1 -ot FILE2 | FILE1 is older than FILE2 |
test -z STRING | STRING is empty |
test -n STRING | STRING is not empty |
test STRING1 = STRING2 | the strings are equal |
test STRING1 != STRING2 | the strings are not equal |
test STRING1 < STRING2 | STRING1 comes before STRING2 in dictionary order |
test STRING1 > STRING2 | STRING1 comes after STRING2 in dictionary order |
test -v VAR | the shell variable named VAR is set |
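As a quick illustration of the string and file forms (the variable answer and the file empty-file below are just made-up examples):
$ answer=yes
$ if test "$answer" = yes; then echo you said yes; fi
you said yes
$ touch empty-file
$ if test -s empty-file; then echo it has content; else echo it is empty; fi
it is empty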
Combining the test for a directory with the if
statement:
$ if test -d subdir; then
>   echo subdir already exists
> else
>   echo creating subdir
>   mkdir subdir
> fi
creating subdir
$ if test -d subdir; then
>   echo subdir already exists
> else
>   echo creating subdir
>   mkdir subdir
> fi
subdir already exists
Modify your nfiles.sh
script so that it checks each command-line argument.
If the argument is a directory, the script prints the number of files in the directory followed by the argument (as before).
If the argument is not a directory, the script prints '?' and then the argument.
$ ./nfiles.sh . /bin /usrbin /bin/ls
43 .
124 /bin
? /usrbin
? /bin/ls
Hint: instead of using two echo
commands, set a variable (e.g., n
) to either
the number of files in the directory or the value '?'.
At the end of your loop use a single echo
command to print n
and then the argument.
Modify your nfiles.sh
script so that it checks each command-line argument.
If the argument is a directory, the script prints the number of files in the directory followed by the argument (as before).
If the argument is a regular file, the script prints 'F' and then the argument.
If the argument is neither a directory nor a file (e.g., it does not exist) then the script prints '?' followed by the argument.
$ ./nfiles.sh . /bin /usrbin /bin/ls
43 .
124 /bin
? /usrbin
F /bin/ls
Hint: the commands in the else
part of your if
statement should include another if
statement that
tests whether the non-directory argument is a regular file (test -f
).
This second if
selects between 'F' for a file or '?' for everything else.
The meanings of the above test
forms can be inverted by placing a !
(“not”) in front of them.
Command | Succeeds if… |
---|---|
test ! EXPR | EXPR fails (is false) |
Combining if
with the test for a directory (-d
) and inverting it (!
) to mean “the directory does not exist”:
if test ! -d subdir; then   # subdir does not exist, so...
    mkdir subdir            # make it
fi
You can combine two or more test
forms with logical “and” or logical “or”:
Command | Succeeds if… |
---|---|
test EXPR1 -a EXPR2 | both EXPR1 and EXPR2 succeed (are true) |
test EXPR1 -o EXPR2 | either EXPR1 or EXPR2 succeeds (is true) |
To check if your log file exists as a regular file (-f
) and (-a
) is writable (-w
):
if test -f logfile -a -w logfile; then
    echo logfile is a regular file and is writable
fi
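Similarly, a small sketch using -o (logical "or"); the file names here are only examples:
if test -f data.txt -o -f data.csv; then
    echo found a data file to process
fi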
Many shells have an alternative version of test
called [
(open square bracket).
Instead of test expression
you can write [ expression ]
which looks quite a lot nicer.
Note that you must put spaces on both sides of the opening “[
” and another before the final “]
”.
$ numFiles=$(ls | wc -l)
$ echo $numFiles
43
$ while [ ${#numFiles} -lt 5 ]; do   # make numFiles be five characters wide, padded with '0's on the left
>   numFiles="0$numFiles"            # add a '0' to the left of numFiles
> done
$ echo $numFiles
00043
Modify your nfiles.sh
script so that it prints the first item on each line
(the number of files, or an 'F' or a '?') right-justified in a field 5 characters wide.
Use spaces to pad the number (or 'F' or '?') on the left to the required width.
$ ./nfiles.sh . /bin /usrbin /bin/ls
   43 .
  124 /bin
    ? /usrbin
    F /bin/ls
Many commands can be used as the test or condition in a loop or if
statement.
For example, grep
succeeds if it finds a match and fails if it cannot find a match.
if grep -q -s pattern files... ; then
    echo I found the pattern in the files.
else
    echo The pattern does not occur in the files.
fi
(-q
tells grep
not to print any output, and -s
tells grep
not to complain about missing files.)
See Finding information about commands and programs below for different ways
to look for information about success/failure of commands and their other options that help when using them as tests in loops and
if
statements.
Two built-in commands help with infinite loops.
Command | Succeeds |
---|---|
true | always |
false | never |
The following while
loop will never stop.
(If you try it, press Control+C to make it stop.)
while true; do
    echo are you bored yet?
    sleep 1
done
The following while
loop will stop immediately and never execute the echo
command.
while false; do
    echo this cannot happen
done
One use of true
and false
is to set a flag in a shell script to affect an if
statement later on.
USE_LOGFILE=true      # true ⇒ use log file; false ⇒ don't
if $USE_LOGFILE; then
    echo "Running analysis at $(date)" >> logfile.txt
fi
You can break out of a while
or for
loop using the break
command.
You can jump back to the test at the start of a while
loop using the continue
command.
Inside a for
loop, the continue
command restarts the loop body with the loop variable set to the next item in the list of items.
$ for i in {1..10}; do
>   if [ $i -eq 5 ]; then break; fi      # break out of the loop if i = 5
>   if [ $i -eq 3 ]; then continue; fi   # restart the loop if i = 3
>   echo $i
> done
1
2
4
Modify your nfiles.sh
script so that it uses a flag to remember whether any arguments were non-directories.
If at least one argument was not a directory (it was a regular file, or did not exist) then
print a message at the end of the script saying: Warning: non-directories were encountered
.
$ ./nfiles.sh . /bin
43 .
124 /bin
$ ./nfiles.sh . /bin /usrbin /bin/ls
43 .
124 /bin
? /usrbin
F /bin/ls
Warning: non-directories were encountered
You can terminate a shell script (or your interactive shell session) using exit
.
if test ! -d data; then
    echo "data directory does not exist: giving up"
    exit 1
fi
The argument to exit
is optional and should be a number.
0
is success and non-zero is failure.
This allows an entire script to be used as the TEST of a loop or if statement: the script signals success or failure through its exit value.
Write a short script called exit0.sh
that immediately uses exit 0
to terminate its own execution.
Write another short script called exit1.sh
that immediately uses exit 1
to terminate its own execution.
Use an if
statement to verify which script 'succeeds' and which script 'fails'.
$ if ./exit0.sh; then echo succeeded; else echo failed; fi
$ if ./exit1.sh; then echo succeeded; else echo failed; fi
Which exit
value represents 'success'?
Which exit
value represents 'failure'?
Modify nfiles.sh
so that it succeeds if all arguments were directories and fails if any arguments were non-directories.
Test whether it works using an if
statement on the command line.
$ if ./nfiles.sh . /bin; then echo OK; else echo KO; fi
43 .
124 /bin
OK
$ if ./nfiles.sh . /bin /usrbin /bin/ls; then echo OK; else echo KO; fi
43 .
124 /bin
? /usrbin
F /bin/ls
Warning: non-directories were encountered
KO
You can save a lot of time by typing the first few characters of a filename and then pressing the Tab
key.
The shell will try to find a file matching what you typed, and then 'complete' the part of the filename that you did not type.
If there is more than one matching file, the shell will complete up to the point where the file names diverge.
If there is only one matching file, the shell will complete the entire filename and then add a space at the end.
$ touch a-file-with-a-very-long-name
$ ls a-                             # press the Tab key
$ ls a-file-with-a-very-long-name   # the shell completes the name
a-file-with-a-very-long-name
$ touch a-file-with-an-equally-long-name
$ ls a-                             # press the Tab key to complete the name
$ ls a-file-with-a                  # press Tab again to list the matching files
a-file-with-an-equally-long-name  a-file-with-a-very-long-name
$ ls a-file-with-a                  # the command line remains in the same state
Programs such as test
(and many others) have a large number of command line options.
Don't bother trying to memorise more than two or three of the most useful options.
Instead, know where to look up information when you need it.
There are several ways to find information about a command, depending on the kind of command it is.
(Note: MobaXterm has its own non-standard help
command that does not work as shown below.)
$ help true
true: true
    Return a successful result.

    Exit Status:
    Always succeeds.
$ help help
help: help [-dms] [pattern …]
    Display information about builtin commands.

    Displays brief summaries of builtin commands.  If PATTERN is specified,
    gives detailed help on all commands matching PATTERN, otherwise the
    list of help topics is printed.

    Options:
      -d    output short description for each topic
      -m    display usage in pseudo-manpage format
      -s    output only a short usage synopsis for each topic matching PATTERN

    Arguments:
      PATTERN   Pattern specifying a help topic

    Exit Status:
    Returns success unless PATTERN is not found or an invalid option is given.
Using help
you can find information about the syntax of loops and conditionals, the options understood by
echo
and other commands, and even obtain a list of all the builtin commands by typing help
with no arguments.
Notice the last section, “Exit Status”.
This tells you when the command will 'succeed' and when it will 'fail'.
You can use the command as a TEST in a loop or if
statement to check
its “exit status” and therefore to test for whatever situation affects that status, according to
the description of the command.
Commands that are not builtin to the shell usually have a manual page.
Use man command
to read the manual page describing command.
Use man -k keyword
to see a list of manual pages related to the given keyword.
(Note that the version of man
used by MobaXterm does not provide the -k keyword
option.)
$ man ls
LS(1)                        User Commands                        LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]… [FILE]…

DESCRIPTION
       List information about the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort is speci-
       fied.

       -a, --all
              do not ignore entries starting with .

       ...etc...
Note that the manual page for a command that can 'succeed' or 'fail'
(and which is therefore useful in loop and if
statement tests)
will almost always include an “Exit Status” section describing what situations you can
test for using the command.
Many programs respond to the option -h
or -help
or --help
by printing brief instructions about how to use that program.
$ cat --help
Usage: /bin/cat [OPTION]… [FILE]…
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help               display this help and exit
      --version            output version information and exit

Examples:
  /bin/cat f - g  Output f's contents, then standard input, then g's contents.
  /bin/cat        Copy standard input to standard output.
Commands that are useful as TESTs will generally tell you about their “exit status” too.
For example, on my computer, the output from grep --help
includes the following two lines:
Exit status is 0 if any line is selected, 1 otherwise;
if any error occurs and -q is not given, the exit status is 2.
In summary:
- In variable names, the underscore character _ counts as a letter.
- Set a variable with name=value.
- Protect spaces and special characters with 'single quotes' or "double quotes".
- Inside 'single quotes' neither "$" expansions nor wildcards work.
- Inside "double quotes" variables and other "$" expansions are performed, but wildcards are ignored.
- Expand (use) the value of a variable with $name or ${name}.
- Remove prefixes with ${name#pattern} and suffixes with ${name%pattern}.
- Command-line arguments are available as $1 for the first argument, $2 for the second, and so on.
- Use "$@" to obtain a list of all the arguments (with spaces inside individual arguments preserved).
- $((expression)) expands to the result of evaluating the given arithmetic expression.
- $(command) expands to the result of running command (which can be a pipeline).
- A while loop performs some commands until a 'test' program or command fails.
- An if statement conditionally runs some commands based on the success of a 'test' or 'condition' command.
- The test command implements many kinds of tests and then succeeds or fails in a way useful for while and if.
- [ expression ] is shorthand for test expression.
- Any command that succeeds or fails can be used as the TEST in an if statement or loop.
- break exits from the loop immediately and continue restarts the loop immediately.
- exit terminates the entire script immediately. (Use it to terminate your shell, too.)
- Press Tab to complete a filename. Press it again to see a list of possible completions.
- Use help command, man program, and program --help to learn about commands and programs.

This week's topic is the Internet: what it is and how it works.
Up to 10 points can be gained towards your final score by completing the in-class quiz on Friday.
This week's preparation is to watch some short videos about networks and the Internet.
Begin by watching four short videos (averaging about 5.5 minutes each) on YouTube. These videos present an easy introduction to the topic. They also have very good captions in Japanese. To turn the captions on, click the “Subtitles/Captions” button at the bottom-right of the video window. Next to that button is a “settings” button (it looks like a cog wheel) where you can change the captions to your preferred language.
1. What is the Internet? (3.5 minutes)
https://www.youtube.com/watch?v=Dxcc6ycZ73M&list=PLzdnOPI1iJNfMRZm5DDxco3UdsFegvuB7&index=1
2. Wires, cables, and WiFi (6.5 minutes)
https://www.youtube.com/watch?v=ZhEf7e4kopM&list=PLzdnOPI1iJNfMRZm5DDxco3UdsFegvuB7&index=2
3. IP addresses and DNS (6.5 minutes)
https://www.youtube.com/watch?v=5o8CwafCxnU&list=PLzdnOPI1iJNfMRZm5DDxco3UdsFegvuB7&index=3
4. Packets, routing, and reliability (6 minutes)
https://www.youtube.com/watch?v=AYdF7b3nMto
If you do not understand some word or phrase then pause the video and use Google, Wikipedia, or even YouTube to find a brief explanation online. If the explanation does not help (has too much or too little information, or is badly written) move rapidly on to a different explanation. If you cannot find any explanation that makes sense, ask on Teams for recommendations.
Deepen your knowledge of these ideas by watching the following videos. The total time is about 25 minutes. Use the summaries below to make sure you understood the important ideas presented in each video.
1. NICs (3.5 minutes)
https://www.youtube.com/watch?v=oo-tn17rUBo
Network Interface Cards (NICs) connect computers, printers, and other devices to a network. A single device can have multiple connections to one or more networks. NICs can have connectors for electrical “copper” Ethernet wires (RJ45) or for “optical fibre” carrying light (SFP, Small Form-factor Pluggable). Every NIC has a unique, hard-coded (cannot be changed) Media Access Control (MAC) address. This address is used to identify pairs of NICs that are communicating directly with each other on a local area network (LAN).
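If you would like to see the MAC addresses of your own computer's NICs, you can usually do so from the command line (the exact command depends on your system; on MobaXterm or Linux the first form should work):
$ ip link         # Linux: shows each interface and its "link/ether" (MAC) address
$ ifconfig        # macOS and older Linux systems
$ ipconfig /all   # Windows Command Prompt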
2. Local area network devices (7 minutes)
https://www.youtube.com/watch?v=1z0ULvg_pW8
A hub connects all devices on a network together. A hub is not intelligent: it copies every packet received to all the other connected devices. This creates unnecessary traffic and wastes bandwidth.
A switch is intelligent: it learns the physical (MAC) addresses of all the devices connected to it. A switch sends received packets only to the intended destination device. Switches reduce unnecessary traffic on the network. Hubs and switches form a local area network (LAN), and communication is always direct between two devices based on MAC addresses.
A router communicates both with a LAN and with an external “wide area” network (WAN), usually the Internet. The Internet uses IP (Internet Protocol) addresses. When a router receives a packet, the IP address determines whether the packet is meant for the LAN or the WAN. If the packet is meant for the LAN it is delivered directly to a local device. If the packet is meant for the WAN it is forwarded to another router connected to another LAN. Routers act as a gateway for each LAN. The Internet is made of many routers connected together. Hubs and switches create networks; routers connect networks together (an interconnected network = an internet).
3. Breaking data into packets (5 minutes)
https://www.youtube.com/watch?v=oj7A2YDgIWE
Networks are connected together to make the Internet. Billions of devices are connected to the Internet. To get data from one part of the world to another we need to package the data and then send it through many routers to its destination. Before it is transmitted over the network, large data is chopped into smaller pieces called PACKETS. Each packet is sent individually, along with information about how to reconstruct the original data. The destination of a packet is indicated by an IP address. Each router understands how to forward a packet to the next router, one step or “hop” closer to the final destination. When the destination router is reached, the data is sent to the local device that should receive it. The device reassembles the original data, which is then presented to the user.
4. Naming and DNS: the Domain Name Service (6 minutes)
https://www.youtube.com/watch?v=mpQZVYPuDGU
Computers are identified by numbers, but humans prefer to use names. To make communication easier, the domain name system (DNS) resolves (translates) domain names into the numbers that are IP addresses. You can use the Internet with IP addresses directly if you like, but using names is much easier. DNS works like a telephone directory: before calling a remote telephone you look up the number based on the name of the person you want to reach. To turn a domain name into an IP address, your local “resolver” (usually your router) first asks a root server. Your resolver is then referred to a series of other servers, each one closer to the answer. Finally it asks the name server owned by Yahoo about “yahoo.com”, which provides an authoritative answer.
For example, to turn “yahoo.com” into an IP address, the resolver asks a root server, which points it at the “.com” name servers, which in turn point it at Yahoo's own name server; that server replies with the authoritative answer containing the IP address.
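You can watch this resolution happen yourself with the nslookup tool mentioned in the troubleshooting video below (the answers you get will depend on where and when you run it):
$ nslookup yahoo.com           # ask your default resolver for the IP address(es) of yahoo.com
$ nslookup yahoo.com 8.8.8.8   # ask Google's public DNS server instead of your own resolver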
5. Communication protocols: TCP vs UDP (4 minutes)
https://www.youtube.com/watch?v=uwoD5YsGACg
TCP (transmission control protocol) is one of the main protocols used on the Internet. TCP guarantees that all transmitted data is received, in the correct order. First a connection is established between two computers that want to communicate. This “synchronises” the communication (using “SYN” packets) so the computers know what packets have been sent and received. Data is then transferred between the computers. The receiver tells the sender about missing packets, which are re-transmitted. The receiver sorts received packets into their original order, reassembles the data inside them, and delivers it to the local application.
Typical uses of TCP: loading web pages (HTTP/HTTPS), e-mail, and file transfer, where every byte must arrive intact and in order.
UDP is a connectionless protocol. There is no initial connection, no synchronisation between sender and receiver, and therefore no guarantee of reliability. UDP does not care (or even know) if data is lost or if it arrives out of order at the destination. Because of lower overheads (no synchronisation, retransmission, reordering) UDP is faster than TCP.
Typical uses of UDP: streaming audio and video, online games, and voice or video calls, where speed matters more than perfect delivery.
6. Networking tools (4.5 minutes)
https://www.youtube.com/watch?v=vJV-GBZ6PeM
The ping command sends packets from one computer to another requesting a simple “I am here” reply. Use it to detect router errors (“destination host unreachable”), DNS naming errors, etc.
The traceroute command (tracert on Windows) shows the route packets take through the Internet to reach another machine. Use it to pinpoint where a problem lies when packets do not reach their destination.
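For example, assuming these tools are installed on your system, you might try the following (replace the host name with any server you like):
$ ping -c 4 www.example.com     # send 4 echo requests and stop (on Windows: ping -n 4)
$ traceroute www.example.com    # list every router "hop" on the way (on Windows: tracert)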
Watch these videos if you would like additional information.
1. How does the Internet work? Networks and addresses explained (10 minutes)
An alternative explanation of how the Internet works: https://www.youtube.com/watch?v=82m2du-zgmY
2. The Internet vs. The Web (5 minutes)
A brief history of why and how the Internet was created: https://www.youtube.com/watch?v=CX_HyY3kbZw
3. Network troubleshooting using PING, TRACERT, IPCONFIG, and NSLOOKUP (14.5 minutes)
More networking tools (ifconfig, nslookup): https://www.youtube.com/watch?v=AimCNTzDlVo
Note: The “traceroute” and “ifconfig” commands are called “tracert” and “ipconfig” on Windows. If you have MacOS or Linux, “traceroute” and “ifconfig” work similarly to the commands described in this video.
This week's topic is about mobility: moving your data, and your computation, around the Internet.
Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.
The tools covered include ftp, scp, rsync, and remote desktop (VNC).
This week's preparation is to watch some short videos about Internet services that help with mobility of data and computation.
Video | Link |
---|---|
What is FTP? | https://m.youtube.com/watch?v=wig1szO7en8 |
Transfer Files via FTP on Windows | https://m.youtube.com/watch?v=58I2YoKJ3dc |
Using SSH and SCP | https://m.youtube.com/watch?v=rm6pewTcSro |
How to make backups using rsync | https://m.youtube.com/watch?v=8d5B1JC-1d4 |
How to use TightVNC | https://www.youtube.com/watch?v=x9xTyh63Tos |
| https://www.youtube.com/watch?v=98rQ9J5XE_g (Japanese) |
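As a preview of what the SSH/SCP and rsync videos demonstrate, the basic commands look roughly like this (the user, host, and path names are placeholders that you must replace with your own):
$ scp report.txt user@host.example.com:backup/        # copy one file to a remote machine over SSH
$ rsync -av project/ user@host.example.com:project/   # synchronise a whole directory, copying only what changed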
What is FTP? https://m.youtube.com/watch?v=wig1szO7en8
Summary: FTP (File Transfer Protocol) is a standard way to transfer files between a client and a server over a network. A client connects to an FTP server, logs in (or uses anonymous access), and can then list directories, download files with get, and upload files with put.
You can practice FTP using a public server provided by speedtest.tele2.net.
The example shown here uses the Windows FTP client, but it will work the same on Mac or Linux.
The parts typed by the user are shown with a red background.
(Click on the image to see it at the original size.)
- Connect with ftp to speedtest.tele2.net.
- Give anonymous as the user name.
- Press Enter (to leave the password blank).
- Use ls or dir to obtain a listing of the public FTP directory.
- Use bin to switch to binary transfer mode.
- Use get 10MB.zip to download one of the test files.
- Use bye to disconnect from the server.
(Note that these zip files contain junk and there is nothing you can recover from them.)
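If you cannot see the screenshot, such a session looks roughly like this (prompts and messages vary between FTP clients, so treat this only as a sketch):
$ ftp speedtest.tele2.net
Name: anonymous
Password:                  (just press Enter)
ftp> ls
...a listing of the server's files, including 10MB.zip and the upload directory, appears...
ftp> bin
ftp> get 10MB.zip
...the download progress is shown...
ftp> bye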
Most web browsers can connect to and view FTP sites.
- Point your browser at ftp://speedtest.tele2.net/
- Click on the upload directory to enter it.
- Create a small file called test.txt containing a few words of text.
- Connect to speedtest.tele2.net using your command-line FTP client, as shown above.
- Use the cd upload command to change to that directory.
- Use the bin command to change to binary mode.
- Use the put test.txt command to send your text file to the server.
- Locate test.txt in your browser, download the file, and verify it has your few words of text inside it.