Week 01 — Professional communication tools

Up to 10 points can be gained towards your final score.

1. Introduce yourself to the class [2 points]

Log in to MS Teams using your university account and post a short message to the Self Introduction channel of the Information Literacy team. Introduce yourself, your main interest(s), and say which topics of Information Literacy you already have experience with. (Throughout the course, use the Q&A (English) and/or 質疑応答 (日本語) channels to ask any questions you might have about the course or the content of the classes. If you can answer a question posted there, please do not hesitate to do so.)

2. Use e-mail for professional communication [8 points]

One of our teaching assistants (TAs) will send you an e-mail asking you to send a reply with an interesting or funny image attached. Our TAs are official members of the teaching staff for this class and so your reply should follow proper “netiquette”. It should be professional, formal, and polite. Use any reference materials you can to make sure your reply is professional, for example the notes posted on the course web site or articles/blogs about writing e-mail that you can find by searching online. When you think your reply has been properly prepared, send it to our TA.

Using proper netiquette in your e-mail includes

  • using an appropriate subject line, greeting, closing, signature, and “signature block” with your professional details;
  • using proper sentences, grammar, and punctuation;
  • quoting the relevant parts of the original messages when you reply to questions; and
  • providing any relevant files as attachments to your reply.

3. Bonus [1 point]

Continue to use the Q&A (English) and/or 質疑応答 (日本語) channels to post and answer questions about the course content. Students who contribute outstanding answers to the Q&A (English) channel might gain a bonus point towards their final score.

Further practice

Team up with one other student from the class (e.g., the student who sits next to you). Send your partner a professional “business” e-mail asking if they would be willing to help you improve your communication skills. Be formal and polite, as you would when seeking a business partnership with someone you do not know personally.

Read the e-mail that you receive from your partner. Think of ways that it could be improved. Write a formal reply that explains how you think they could improve their e-mail to you. When you receive the reply suggesting how to improve your original e-mail, reply one more time formally thanking your partner for their time and kindness.

What you will learn from this class

  • What kinds of professional communications tools are available and how they can be used.
  • How e-mail works and the various parts of e-mail messages.
  • How to use e-mail for effective communication in a professional setting.

Glossary of e-mail terms

Notes

There are many different kinds of communication tools. In an educational or professional environment the most important are collaboration support systems and e-mail.

Texting

Students and younger employees likely use text messaging (texting) for their daily communication. Text messages are brief and usually answered within a few minutes or hours. Because of the limitations of the message format, they are most suited for informal conversation with friends and family. However, texting from personal devices can sometimes be appropriate to alert colleagues to emerging situations such as arriving late for a meeting because of a train delay. Faculty members and managers are likely to prefer e-mail for all professional communication.

E-mail

E-mail is the standard communication tool in professional (academic, industry) life. Its advantages include permanence, searchability, non-invasive delivery, and the ability to compose messages of any length with as much care and consideration as are warranted by the situation. Just as you can send an informal birthday greeting to a friend or a formal request to the head of a company by postal mail, so you can send the same kind of content (with the same levels of formality) by e-mail.

The e-mail paradigm is very close to physical mail: a sender writes a message, a third-party mail delivery service (online rather than postal) delivers the message, and a recipient picks up the message and reads it. E-mail messages have several parts, some of which have names that correspond to the same parts of a postal message. Just as in postal mail, every message must specify the recipient's address (as is always written on the front of postal mail), the sender's address (as is often written in the corner or on the back of postal mail), some content containing the actual message (corresponding to the paper inside the postal envelope), and possibly one or more attachments (sometimes called enclosures in postal mail) which are separate documents sent along with the written message.

Headers and addresses

Every e-mail message contains a header which includes the date, the sender (From:) and recipient (To:) addresses, and the subject of the message. Messages can be delivered to more than one recipient by putting more than one address in the To: line. Messages can also be copied to other people using the Cc: and Bcc: fields. Replying to an e-mail message usually sends the reply to the sender (the From: address in the original message) although this can be changed by setting the Reply-to: field in the header.

Date: the time and date the message was sent
From: the sender's address (becomes the To: address if the message is replied to)
To: the address of the recipient(s) who are expected to contribute actively to the conversation
Subject: the purpose of the message or a one-line summary of the content
Cc: 'carbon copy' address(es), for observers of the conversation or non-active participants
Bcc: 'blind carbon copy' address(es), for observers of the conversation whose names will not be made visible to anyone else
Reply-to: The address that will become the To: address in a reply (instead of the From: address)
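
As a rough illustration of how these header fields fit together, the following sketch builds a message with Python's standard email package. The addresses, names, and message text are invented for the example, and the helper functions build_message and reply_address are hypothetical names, not part of any mail program.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_message(sender, to, subject, body, cc=(), reply_to=None):
    """Assemble a message whose header fields match the list above."""
    msg = EmailMessage()
    msg["Date"] = formatdate(localtime=True)
    msg["From"] = sender
    msg["To"] = ", ".join(to)          # several recipients share one To: line
    if cc:
        msg["Cc"] = ", ".join(cc)      # observers of the conversation
    if reply_to:
        msg["Reply-To"] = reply_to     # overrides From: when replying
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def reply_address(msg):
    """Return the address a reply would normally be sent to."""
    return msg["Reply-To"] or msg["From"]


# All addresses below are invented for illustration.
msg = build_message(
    sender="student@example.com",
    to=["ta@example.com"],
    subject="Reply: image for the Week 01 assignment",
    body="Dear Ms. Jones,\n\nPlease find the image attached.\n\nRegards,\nA. Student\n",
    cc=["observer@example.com"],
    reply_to="student.office@example.com",
)
print(msg)
print("A reply would go to:", reply_address(msg))
```

Printing the message shows the header lines first, then a blank line, then the content, which is the same layout that most mail software reveals with a "show original" or "view source" command.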

E-mail addresses contain two parts separated by an @ symbol. The second part (after the @) is the domain name of the organisation that is responsible for receiving the message. The first part (before the @) is the local name of the person (or department) within that organisation who should receive the message. For example, mail to katsuma.yoshiyuki@kuas.ac.jp will be delivered to a particular organisation (KUAS) and within that organisation a particular person (Mr. Katsuma) will be able to retrieve and read the message. Similarly, e-mail sent to sales@honda.co.jp will be delivered to the sales department within the Honda Motor Company, Ltd.
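
The two-part structure can be sketched in a few lines of Python. This is only a simplified illustration: the function name split_address is hypothetical, and real mail software applies many more rules when checking addresses.

```python
def split_address(address):
    """Split an e-mail address into (local name, domain name)."""
    # Split at the last @ symbol: the domain name never contains one.
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return local, domain


print(split_address("sales@honda.co.jp"))   # ('sales', 'honda.co.jp')
```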

Message content

Example business e-mail.

Example business letter.

Writing a professional letter on paper means following social conventions and business etiquette. Writing a professional e-mail means following “netiquette” (from Internet + etiquette). Many of the conventions of netiquette are related to making your e-mail easier for the recipient to read. To develop an intuition for netiquette, just ask two simple questions about every part of your e-mail message:

  • What is the purpose of this word/sentence/paragraph (or other part) of the e-mail?
  • How can I maximise its efficiency (make it shorter) and effectiveness (make it convey my message better)?

For example…

Subject:   Does it accurately reflect the content of the conversation? (Some people look quickly at just subject lines and immediately delete e-mail that appears irrelevant to them.)
  Could the recipient find this conversation again in the future based only on the subject line?
message content   Does the recipient really need to know/read this content at this moment in time?
  Is the amount of detail just right for the recipient?

An e-mail message can be composed as if it were a postal letter to the same recipient. The same stock phrases, order of items, and levels of formality and politeness, can be carried over from paper letters to e-mails.

Greeting   Indicates the name of the person you are addressing. Titles (e.g., “Dr.”) can be used if appropriate. Address women as “Ms.” (instead of “Mrs.” or “Miss”) unless you know their preference. Using “M.” leaves the gender unspecified, for situations in which it is unknown.
  Examples: Dear Sir, Dear Madam, Dear Professor, Dear Mr. Secretary, Dear Dr. Spock, Dear Ms. Jones, Dear Mr. Kite, Dear all, Dear colleagues, etc.
Opening line   Brief introduction to the message including references to any prior communication if appropriate. In a reply, a thank-you to the sender for their previous message.
Body   Main content written simply, clearly, and concisely.
Closing   Thanks the reader for their time, expresses eagerness to receive a reply, etc.
Signature   A sign-off including “Regards”, or “Sincerely” if you opened with the person's actual name, or “Faithfully” if you did not use their name, followed by the name of the sender.
Signature block   Professional contact details about the sender: job title, postal address, telephone number, etc. Can provide similar information to a “business card”.

Replying

When replying to an e-mail, most mail software automatically inserts a copy of the original message in the reply. This is called quoting the original content. Typically each line of the quoted original is preceded with a > symbol to distinguish it from the reply. Replies can be divided into three styles, according to how this quoted material is used.

top-posting The reply is at the start of the message and the entire original message is left quoted underneath the reply.
bottom-posting The entire original message is left quoted at the top and the actual reply is added underneath it.
inline reply The quoted original message is carefully edited to extract just the relevant parts, and the associated replies are inserted underneath each quoted part.
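
The quoting convention and the inline style can be sketched in Python as follows. The function names quote and inline_reply are hypothetical, and the questions and answers are invented. Mail software performs the quoting step automatically, so this only illustrates the resulting text.

```python
def quote(original):
    """Prefix each line of the original message with '> '."""
    return "\n".join("> " + line if line else ">" for line in original.splitlines())


def inline_reply(pairs):
    """Build an inline reply from (quoted excerpt, answer) pairs."""
    parts = []
    for excerpt, answer in pairs:
        parts.append(quote(excerpt))   # just the relevant part of the original
        parts.append(answer)           # the reply goes directly underneath it
        parts.append("")               # blank line before the next excerpt
    return "\n".join(parts).rstrip() + "\n"


print(inline_reply([
    ("Could you send the image by Friday?", "Yes, it is attached to this message."),
    ("Which file format did you use?", "I used PNG."),
]))
```

Each answer sits directly below the question it addresses, so a reader does not need to scroll through the whole history to find the context.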

Efficiency and effectiveness are also important for replies. Top-posting accumulates lots of previous content in reverse order (compared to the chronological order of the conversation) and therefore creates more work for people joining the conversation later on. Bottom-posting accumulates lots of previous content at the top in correct chronological order, but this can force readers to scroll past a lot of history to reach the important part of the message. (The quoted content can be trimmed to minimise the amount of work the reader has to do.)

Inline replies avoid the disadvantages of both top- and bottom-posting. Inline replies are more efficient for the reader, help keep the total size of the message small, and produce better results when searching through e-mail messages for specific information or conversations.

While conducting an e-mail conversation, anything that is no longer needed can be removed, anything that is lacking can be added, and anything that is inefficient or ineffective can be modified.

Subject:   Does it still accurately reflect the content of the conversation?
To: and Cc:   Is the message still going to exactly the right group of people?
reply content   Is the structure of the reply efficient and effective?
  Is replying above the original message more or less effective than replying inline or below it?
  Is the amount of quoted material just right to give the best context for the reply?

Emoticons

Facial expressions and other body language are not available when communicating by e-mail. In less formal professional e-mails, the use of emoticons can help to indicate emotions that would accompany parts of a message delivered face-to-face. Some mail software will convert emoticons into graphical emoji for the reader.

emoticon emoji typical meaning
:-) :-) humor or happiness
:-( :-( sadness or unhappiness
:-D :-D very large grin
:-) :-) :-) :-) laughing
:-p :-P sticking out tongue (“so there!”, “I told you so!”)
;-) ;-) winking
:-| :-| disgust
:-/ :-/ puzzled
:-o :-o surprised

Text effects popular with adolescent users, such as aLtErNaTiNg CaPiTaLs iN nOrMaL tExT are very difficult to read and therefore contradict the goal of maximising the effectiveness of communication. Similarly, writing in ALL CAPITAL LETTERS can be interpreted as shouting or yelling; alternatives such as using punctuation for virtual /italics/ or _underlining_ or *boldface* are much gentler on the reader.

Attachments

Documents that are separate from the main e-mail message but sent along with it are called attachments. Attachments are best kept small and limited in number.

Some mail delivery services will delete e-mails having large attachments, without warning or indication. Original photographs can be very large and are often down-sized before sending by e-mail. PDF files that use unnecessary text effects such as drop-shadow can also be very large. (A good solution to that problem is to avoid using unnecessary text effects in documents.)

Some mail services might also reject messages with too many attachments. Programs such as zip let you gather many files into a single archive for attaching to an e-mail message.
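
For readers who prefer a script to a desktop zip program, the same bundling can be sketched with Python's standard zipfile module. The function name bundle and the file names are invented for the example, and the demonstration works in a temporary folder so that nothing is left behind.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle(paths, archive_path):
    """Gather several files into one compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            path = Path(path)
            # Store only the file name, not the full folder path.
            archive.write(path, arcname=path.name)
    return Path(archive_path)


# Demonstration with two small throw-away files.
with tempfile.TemporaryDirectory() as folder:
    folder = Path(folder)
    files = []
    for name in ("notes.txt", "data.txt"):
        path = folder / name
        path.write_text("example content\n")
        files.append(path)
    archive = bundle(files, folder / "attachments.zip")
    with zipfile.ZipFile(archive) as check:
        names = check.namelist()

print(names)
```

The single archive can then be attached to the message in place of the individual files.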

Some mail software reformats message content. If layout such as

+--------------+-------------------+
| tabular      | data              |
+--------------+-------------------+
| written as   | mono-spaced text  |
+--------------+-------------------+

needs to be preserved then a plain text file containing the content can be sent as an attachment, to protect it against reformatting.
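
A minimal sketch of that idea, using Python's standard email package, is shown here. The addresses, subject, and file name table.txt are invented for illustration.

```python
from email.message import EmailMessage

# The layout that must survive unchanged (copied from the example above).
table_text = (
    "+--------------+-------------------+\n"
    "| tabular      | data              |\n"
    "+--------------+-------------------+\n"
    "| written as   | mono-spaced text  |\n"
    "+--------------+-------------------+\n"
)

msg = EmailMessage()
msg["From"] = "student@example.com"
msg["To"] = "ta@example.com"
msg["Subject"] = "Results table attached"
msg.set_content(
    "Dear Ms. Jones,\n\nThe table is attached as a plain text file.\n\nRegards,\nA. Student\n"
)

# Attaching the text as a separate file keeps it away from any reformatting
# that mail software applies to the message body.
msg.add_attachment(table_text, filename="table.txt")

attachment = next(msg.iter_attachments())
print(attachment.get_filename())
print(attachment.get_content_type())
```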

E-mail security

E-mail is inherently insecure. Messages are transmitted and stored without encryption, making them relatively easy to intercept and read. Secret information sent 'in private' by e-mail might easily become public knowledge. If sensitive information must be sent by e-mail, one way to protect it is to send it in a password-protected .zip archive.

Accidentally sending e-mail to the wrong recipient, or replying to everyone in a conversation instead of just the original sender, is a common mistake. One way to mitigate that problem is to leave the recipient fields blank and fill them in just before sending the message.

The sender of an e-mail has no control over who might read it. Negative comments made about a person in an e-mail could eventually be seen by that person, causing embarrassment (or even loss of job) for the sender. A recipient address might be mistyped, for example, or one of the intended recipients might decide to forward the message to the person mentioned in the negative comment.

E-mail safety

Cyber-criminals can use e-mail to compromise your computer or your personal information. This includes stealing your financial information to commit fraud. Messages received from unrecognised sender addresses might include attachments that introduce viruses to a computer when opened. Phishing messages are written so that they appear to be sent by a trusted person, such as a bank manager (asking for account details or password “confirmation”), or by an unknown sender seeking a collaboration with apparently huge benefits for the recipient. It is extremely unlikely that such messages are genuine.

A good way to increase the safety of e-mail is to install a spam (junk mail) filter to delete suspicious messages before they are presented for reading, and an anti-virus program that scans attachments for potential threats. Many online web mail services provide these functions for all their users' incoming e-mail. Some (such as Gmail) go further by banning outgoing attachments that might contain harmful content.

2020/09/03 15:56

Week 02 — Text processing

Evaluation

Up to 10 points can be gained towards your final score.

MS Word Options: Choose Display Language.

MS Word Options.

1. Fix your MS Word settings

You should already have a copy of MS Word installed on your computer. Start it up (or activate the File menu if it is already running) and click on Options at the bottom of the page. In the Word Options pop-up window, select the Language tab on the left and then under Choose Display Language move English to the top of the list using the up and down arrows. Using Word in English will make it easier to follow the material in this class, and will help you to improve your English faster.

2. Add formatting to a simple text document

Download the example Word file.

Follow the instructions in the file to modify the document in the following ways:

  • change the font of the “body” text
  • apply title and (sub-)heading styles to the title and (sub-)section heading lines
  • add automatic numbering to the headings
  • convert several lines of text into bulleted and numbered lists
  • add sub-items to those lists with additional indentation
  • convert several lines of text into a table
  • add some tabs to the ruler and use them to align some words and numbers
  • insert an image, give it a caption, and make the text flow around it
  • insert special mathematical symbols and an equation
  • insert a hyperlink to an external web page
  • convert some text into a footnote
  • place a citation in the text and then add a table of references to the end of the document
  • add running headers and footers at the top and bottom of the pages, with automatic page numbers
  • add a table of contents
  • mark some words as index entries and then add an index at the end of the document

The end result might look something like this.1

When you are happy with your formatted document, upload it to MS Teams and submit it. (In MS Teams either click on the “Assignments” tab and then the Week 02 assignment, or click on the assignment inside the announcement in the “General” tab. Then “attach your work” to the assignment and click on “turn in”.)

Please try to finish the assignment before class. The hard deadline for assignments is 23:59 on the day of class.


1 If you used a document formatting system designed for publication, the end result might look something like this.

What you will learn from this class

  • How to edit text using MS Word.
  • How to apply simple formatting to change the appearance of text.
  • How to add lists and tables of information to a document.
  • How to add footnotes, citations, an index, and a table of contents to a document.

Glossary of word processing terms

Notes

Many kinds of text and document editors exist (as well as almost as many opinions about which ones are the best). Two kinds that you will encounter often as an engineer are text editors and word processors.

Text editors

Emacs

Text editors manipulate any kind of plain text file using an interface that presents the contents of the file simply and literally. A simple plain text file can contain almost any kind of information, from recipes and shopping or 'to-do' lists to meeting minutes or random thoughts and notes. In more technical settings, plain text files might contain configuration settings or program source code.

simple text editors
Linux LeafPad (packages available in most distributions via apt, yum, etc.)
MacOS TextEdit (bundled with the OS)
Windows Notepad or Notepad++ (recommended alternative)

Pro-tip

People who spend most of their time editing plain text files (programmers, technical authors, web designers, etc.) might use a much more capable (and complicated) text editor. There are several choices (as well as religious wars fought over which one is the best), for example Emacs, vi, and VS Code, all of which run on the three major operating systems.

Word processors

Microsoft Word

Word processors are programs for desktop publishing: the creation and production of structured, formatted documents such as printed letters, reports, and newsletters. Word processors use a graphical 'what you see is what you get' (WYSIWYG, pronounced “wizzi-wig”) interface where content is edited in a form that resembles its final, printed appearance. They almost always use their own proprietary file formats which make no sense when viewed as plain text files, and editing plain text files is almost always impossible using a word processor.

Half a billion pages of help for MS Word.

The de-facto standard word processor is Microsoft Word, which means that there is a huge amount of on-line help available for both beginners and experts. Almost any question about MS Word can be answered by searching in Google (or similar) for MS word followed by the topic of the question.

MS Word is also a very complicated program and the best way to learn it is to actually use it to create documents of increasing complexity.

Learning how to use search engines to answer questions about MS Word is therefore a vital skill for novice (and advanced) users. The results will also include a variety of different media, including video, tutorials, blog posts, and so on, that cater to different learning styles.

One of the first Google results for ‘ms word help’ is a section on Microsoft's own web site called Word help & learning that includes short tutorials on getting started, inserting text, working with pages and layouts, inserting pictures, and saving and printing documents. Learning the basics of word processing from sites such as this one is excellent preparation for a breadth-first tour of some of the features of Word that engineers and scientists might find the most useful. The following sections present such a tour with reference to the ribbon, the part of the user interface that most people interact with most often.

In the following sections, keyboard shortcuts are shown immediately after the name of the tool that they activate.

Home

As the name implies, this is where the simplest and most common editing operations are located.

Clipboard contains cut Control-X, copy Control-C, and several varieties of paste Control-V (depending on whether you want the pasted text to retain its original formatting, adopt the destination formatting, and so on). Clicking once on the format painter and then again in the document copies the format of the text under the insertion point to the text that was clicked on. Double-clicking on the format painter makes it 'sticky': multiple targets can be clicked to copy formatting; press Escape to stop format painting. Format can also be copied by typing Control-Shift-C and pasted using Control-Shift-V. Clicking on the little diagonal arrow (in the bottom right-hand corner) opens the clipboard dialogue, which handily lets you paste from a recent history of cut and copied text.

Font contains the tools to change font family (the typeface) and size (measured in points, of which there are approximately 72 per 25.4 mm of length on the printed page), followed by buttons that increase font size Control-Shift->, decrease font size Control-Shift-<, change case of text (for all-caps, etc.), and clear all formatting from it. On the second line are toggles for boldface Control-B, italics Control-I, underline Control-U, strikeout, subscript Control-=, and superscript Control-Shift-+. The “text effects” button comes next (and is best ignored, trust me), followed by two buttons for text highlight colour (the background colour for text) and font colour (the foreground colour).
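
To get a feeling for how large a point is, the conversion to millimetres can be sketched in Python. The sketch assumes exactly 72 points per 25.4 mm (the text above says approximately), and the function name points_to_mm is hypothetical.

```python
POINTS_PER_INCH = 72      # assumed exact here; the text says "approximately"
MM_PER_INCH = 25.4


def points_to_mm(points):
    """Convert a font size in points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


for size in (10, 12, 24, 72):
    print(f"{size:>2} pt = {points_to_mm(size):.2f} mm")
```

Under that assumption, 12 point text is a little over 4 mm from the top of the tallest letters to the bottom of the lowest ones.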

Paragraph contains the tools for bulleted lists, numbered lists, and multi-level numbering. The next two buttons decrease indent and increase indent of the selected text. The last two buttons on the top row will sort the selected text lines into alphabetical order or toggle the display of paragraph marks and other typesetting annotations in the text. On the lower line are buttons that tell the selected text to align left Control-L, centre Control-E, align right Control-R, or justify Control-J. Omitting the next button (which you should also ignore) we have a tool controlling line spacing and then two buttons that affect the shading (background) and border (edges) of the selected table cell or text.

Styles contains collections of formatting that can be clicked to apply them to text. The formatting of the text at the insertion point can also be copied into a style by right-clicking on the associated button and selecting “Update to Match Selection”. Clicking on the little diagonal arrow (in the bottom right-hand corner) Alt-Control-Shift-S opens a very handy “styles chooser” dialogue that can remain open during other editing operations.

Editing contains the tools to find Control-F and replace Control-H text.

Insert

No prizes for guessing what is in this tab.

Pages contains tools to insert a front cover page, a new blank page, or a forced page break Control-Return.

Tables contains almost everything you need to create and edit a table.

Illustrations has tools to insert images and graphical objects of several kinds, including external pictures.

Links creates, modifies, or removes hyperlinks from text.

Header & Footer contains drop-down menus to control the running header, footer, and page numbering applied to all pages in the document.

Symbols has tools to insert mathematical equations or single mathematical symbols into the text.

Layout

Page Setup contains tools that control the entire page, including its overall size and the number of text columns.

References

Everything to do with referring to a part of the document from some other, faraway part.

Table of Contents has a button to create the table of contents and another to update table which is useful whenever heading numbering changes.

Footnotes has the insert footnote tool which places the footnote marker at the current insertion point and then prompts for the content of the footnote text.

Citations & Bibliography has the tool to insert citation at the current insertion point which then prompts for the information about a new reference source or the identity of an old reference source that was already entered. The manage sources tool allows editing of reference source details. The style menu controls how the citations and references will be presented, and bibliography inserts the list of references at the insertion point.

Captions adds caption text to a figure via the insert caption tool which prompts for the text of the caption.

Index has the mark entry tool which will include the currently selected text as an index term. Again, a pop-up dialog (which can, very usefully, persist) allows control over the presentation of the index entry. The insert index tool does exactly what it says, at the insertion point.

Review

Useful tools for collaboration and finding out who to blame.

Tracking has several tools to track changes made to the document content, and to control how the tracked changes are presented.

Help

Is there if you need it.

The most important tool here is actually present in every tab. The search tool Alt-Q (also known as tell me) searches all the tools for some specific text and presents the results in a list where they can be directly clicked on. It's the Word equivalent of the Windows 10 Windows-S key that opens the “Type here to search” feature.

Ruler

Bordering the page on the left and top are the rulers.

Hovering over a transition from white to grey within either ruler will convert the cursor into a “slide” icon. Clicking and dragging the transition will then change the page margins.

The white blobs in the horizontal ruler control where the elements of lists (bullet or number, text of the item) are placed. If items with several lines of text are not lining up properly after the first line, move these blobs around to fix that. (Typing spaces into the text to try to align things will never look right and is an immediate indication that the author was clueless.)

Double-clicking inside the ruler opens a handy page setup menu which allows much finer control over page and margin dimensions.

The small grey icons visible in the screen-shot at 45, 55, 65, and 75 mm are tab stops. The tab stops become active whenever text contains Tab characters. Each paragraph has its own set of tab stops.

From left-to-right the stops in the image are:

  • a left tab, which fixes the position of the left edge of text following a Tab character;
  • a right tab, which fixes the position of the right edge of text following a Tab character;
  • a centre tab, which fixes the position of the centre of the text between the preceding and following Tab characters; and
  • a decimal tab, which fixes the positions of decimal points in numbers that follow a Tab character.
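
What a decimal tab works out for you can be imitated in mono-spaced plain text with a short Python sketch. This is only an analogy: in Word the tab stop does the alignment and no spaces should be typed. The function name align_decimal is hypothetical.

```python
def align_decimal(numbers):
    """Pad number strings so that their decimal points line up in one column."""
    # Width of the widest integer part (the text before the decimal point).
    width = max(len(n.partition(".")[0]) for n in numbers)
    aligned = []
    for n in numbers:
        whole = n.partition(".")[0]
        aligned.append(" " * (width - len(whole)) + n)
    return aligned


for line in align_decimal(["3.14159", "42.5", "1000", "0.001"]):
    print(line)
```

Numbers without a decimal point are treated as if the point followed their last digit, which is also how they line up under a decimal tab.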

Clicking on the small icon in the top-left corner cycles it through all the available tab stop types. The kind of tab shown by the icon is inserted into the ruler by double-clicking the ruler's lower edge. This also opens a handy editor dialog to change the positions and types of each tab stop in the ruler.

At the bottom-right of the page is a handy control for zooming in and out.

2020/09/03 16:03

Week 03 — Presentations

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

MS Word Options: Choose Display Language.

MS Word Options.

1. Fix your MS PowerPoint settings

You should already have a copy of MS PowerPoint (PPT) installed on your computer. Start it up (or activate the File menu if it is already running) and click on Options at the bottom of the page. In the PowerPoint Options pop-up window, select the Language tab on the left and then under Choose Display Language move English to the top of the list using the up and down arrows. Using PPT in English will make it easier to follow the material in this class, and will help you to improve your English faster.

2. Complete the self-preparation assignment at home before next class

Starting with a blank presentation, reproduce the document shown in the following videos. (These videos are also embedded at the end of this page, in case you prefer to watch them without leaving your browser.) The versions on the right labeled “eng” have burned-in English captions, while “jpn” have burned-in Japanese captions that were auto-translated (probably very badly).

Substitute your own media – images, videos, etc. – for those shown in the sample document. I recommend you make ample use of the 'pause' button and follow along one step at a time.

You can download the actual sample document featured in the videos, if you think that would be helpful: 03-powerpoint_examples.pptx

3. Check your proficiency with PPT using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can. (Do not press 'submit' when you are finished.)
  2. Check your scores.
  3. Revise those topics having the lowest scores.
  4. When you have made progress with understanding or skill, update your scores.
  5. Repeat from step 2 until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should know how to

  • create a new presentation with a proper title slide
  • create normal sentences, bulleted lists, and numbered lists
  • format words for emphasis (bold, italic, underline, special font, unusual size)
  • use text boxes
  • use different layouts
  • create and fill in tables, charts, and smart art
  • insert images and videos
  • make internal and external links
  • automatically number the slides

You can also use Internet search engines to find online tutorials and other educational materials relating to PPT, or even check our library to see if they have a book on the subject. In other words, use any resources you can to achieve your learning goals. At the same time, exercise judgement about good and bad advice you find online especially when it comes to choosing and organising the content of your slides. (You, and only you, know what content is needed and how best to present it. If you are unsure about choosing and presenting content, the best way to learn how to do that is by making lots of presentations and finding out for yourself what works and what does not work.)

What you will learn from this class

  • How to create a basic presentation using PPT.
  • How to make lists, tables, and charts to present information.
  • How to use different layouts to organise content on the slide.
  • How to indicate relationships visually using smart art and shapes.
  • How to include media such as images and videos.
  • How to put a running footer on your slides indicating slide number, date, your name, etc.

Further reading

Edward Tufte is a famous proponent of simplicity, clarity, and good design in printed media. His essay The Cognitive Style of PowerPoint is well worth reading and thinking about. [1] While not everyone agrees with all of his points, his extreme position is a welcome counterbalance to the mass of bad advice that you will find elsewhere about how to construct an 'amazing' presentation in PPT. His essay offers few solutions to PowerPoint's problems, but a critical response by Jean-Luc Doumont entitled Slides Are Not All Evil does offer some practical advice (starting on page 68).

Remember also that PowerPoint slides are not the only way to create an effective presentation. For making traditional slides, several LaTeX packages are available. There are also web frameworks such as reveal.js with which you can make presentations that would be impossible in PPT. Breaking out of the 'slide' format entirely, I have seen an effective presentation made using a single (long) web page that the presenter simply scrolled through during his talk. Pick the right tool for the job.


[1] This document was scanned to PDF with optical character recognition that failed in a handful of places. For example, in the fourth sentence, “2ist century” should be “21st century”.

Notes

PowerPoint is a program for preparing presentation support 'slides' that mimic the transparencies that were once used with overhead projectors. It is the de facto standard program for preparing slides used in business meetings.

An estimated 500,000,000 people use it regularly, which means there is a huge amount of online help available for it. One of the first Google results for ‘powerpoint help’ is a section on Microsoft's own web site called PowerPoint help & learning that includes short tutorials on getting started, collaborating, design, animations, pictures and charts, giving presentations, and slides and text.

The PPT user interface is quite similar to Word. The following sections present a breadth-first tour of PPT's ribbon – the part of the user interface that most people interact with most often. Since many of the tool groups are identical to MS Word, we will concentrate here on those aspects that are different.

Home

The Slides group adds common operations performed on slides. The Drawing group gives quick access to some of the most commonly used functions available in the tabs relating to shapes and images.

Insert

New for PPT are groups for Illustrations and Media.

Draw

Several groups relating to hand-drawn content. Maybe the most useful is the ability to highlight text in a way that looks hand-drawn.

Design

Many distracting Themes are available, taking up most of the ribbon space. Much more useful is the Customise group which contains the Slide Size tool, essential for making a poster or other non-presentation media.

Transitions

If you need them for a rare special effect, here they are along with sounds that can play when changing slides too. Of most use is the Timing group which provides the ability to automatically advance through a range of slides based on time delay instead of clicking a mouse or pressing a key.

Slide Show

Start Slide Show contains several tools for presenting in different ways. Some of the online presentation possibilities are powerful when used in conjunction with MS Teams. For example, the ability to show the presenter view on your screen while showing the actual slide as your shared 'screen' to other participants in the meeting.

Set up has tools to help perfect the timing of a presentation, as well as to record a video of yourself presenting the slides.

Monitors is where you will find the options relating to multiple monitors, which includes the projector (which the computer considers to be an external monitor). Use Presenter View does exactly what it says on the label, showing the audience only the current slide while you see the current slide, presenter's notes, a countdown timer, and a preview of the upcoming slide.

View

Presentation Views includes the Normal view which is the default and the one most people spend most time looking at, along with three others that are useful. Slide Sorter is a thumbnail view that uses the entire width of the window. Notes View is where you can edit the presenter's notes that will be shown only to you when using presenter mode. Reading View lets you see and interact with the slide as you will when it is actually presented, which is essential for testing animations, etc.

Master Views is where you switch from editing the normal slide content to editing the Slide Master layouts that dictate how each 'empty' slide is initially set up. If you need to permanently move things around (e.g., to make the title space smaller) or introduce entirely new layouts, do it here on the master slides. (Note that switching to the master slide opens a hidden tab in the ribbon, but the tools it contains should now be familiar to you.) Modifying the layout of printed handouts or 'slides plus notes' is done from here too.

Practical advice for giving and preparing presentations

Incompatibilities happen. Some projectors don't like some computers. Some PPT files prepared on one OS do not play in PPT on another OS. In other words, there is no guarantee that your presentation will display at the venue. One way to insure against this is to take a PDF version of the slides, which should display properly from almost anyone's computer. (PDFs do not, in general, display animations. Using PDF as an insurance policy therefore has the additional benefit of discouraging the use of animations in the original slides.)

In the worst case nothing will display at the venue (or the projector will explode, or there will be a power cut, etc.). How well do you know your presentation and material? Could you present the entire talk without using any slides at all?

Printing some thumbnails of the slides can be a handy reference during a talk, both to know what is coming up and as a map to get to a specific slide quickly if someone asks a question.

Simple fonts are more legible than fancier fonts. (Highly decorated fonts, or those that simulate handwriting, have no place in a presentation.)

Not all fonts are available on all computers. Sticking to common fonts (Arial, Times New Roman) almost guarantees a presentation will look the same to everyone.

Contrast aids legibility and therefore the efficiency of communication and information transfer. Black and white have the best contrast of any pair of colours. Other pairs of colours can have good contrast, if chosen with great care.

Colours on a computer monitor are different to the colours produced by a projector. Checking the legibility of coloured information well before a presentation can avoid embarrassment during it. (I once met a projector that refused to admit the existence of 100% saturated green. I spent quite a few minutes of that talk helping the audience to imagine the missing parts of my diagrams. I have avoided pure green in my presentations ever since.)

Videos

These 14 videos cover all the essentials of PowerPoint (assuming you already know how to use MS Word) while not diving deeply into any one topic. Use them to understand what features are there, and then explore the full capabilities of the interesting or useful features in more depth on your own.

The sample document created in these videos is available here: 03-powerpoint_examples.pptx

I made 14 × 1.5-minute videos (one per topic) instead of 2 × 10-minute videos (divided in half arbitrarily) or 1 × 20-minute video. I thought that would make the content easier to navigate. However, if you prefer fewer (but longer) videos then please tell me.

2020/09/03 16:03

Week 04 — Number processing

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

Watch at least videos 1 to 11 (inclusive). If any of the topics are not familiar to you, open a blank Excel workbook and try to reproduce the examples shown in the videos for yourself. (These videos are also embedded at the end of this page, in case you prefer to watch them without leaving your browser.)

IL-04-01   data entry and basic editing  
IL-04-02 rows and columns
IL-04-03 autofill
IL-04-04 tables
IL-04-05 formulas, cell references
IL-04-06 ranges, functions
IL-04-07 conditional functions/formatting
IL-04-08 filtering
IL-04-09 visualisation
IL-04-10 freezing rows and columns
IL-04-11 import and export, CSV
IL-04-13 extended example

2. Check your proficiency with Excel using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can. (Do not press 'submit' when you are finished.)
  2. Check your scores.
  3. Revise those topics having the lowest scores.
  4. When you have made progress with understanding or skill, update your scores.
  5. Repeat from step 2 until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should know how to

  • enter data and perform basic editing and formatting of cell content
  • change the data type of a row or column
  • use conditional formatting to change the appearance of a cell
  • create charts
  • use functions such as SUM() and conditional functions such as SUMIF()
  • identify when an absolute reference is necessary, and use one
  • import CSV files

You can also use Internet search engines to find online tutorials and other educational materials relating to Excel, or even check our library to see if they have a book on the subject. In other words, use any resources you can to achieve your learning goals.

What you will learn from this class

  • how to create spreadsheets and use them to manage numeric data
  • how to quickly enter series of data such as dates
  • how to perform computations with the data in a spreadsheet
  • how to visualise the data in the spreadsheet
  • how to sort and filter the data in a spreadsheet
  • how to manage scrolling in a large spreadsheet
  • how to import and export numeric data
  • how to export a chart from a spreadsheet to another application
  • how to perform an engineering simulation using a spreadsheet

Notes

Excel is a program for creating tables of data, and performing computation and analysis on that data. It is based on the very old idea of a spreadsheet, which was a large piece of paper used by accountants. Known data was entered into spaces on the spreadsheet and then calculations were performed to calculate new values. The process continued for as many iterations as was required to calculate the final, useful information.

Excel's principles are exactly the same. You enter the data you know into cells, and use formulas to compute data you don't know in other cells. Excel takes care of figuring out the order in which the computations should be performed and the unknown data generated.

A B C
1 A1 B1 C1
2 A2 B2 C2
3 A3 B3 C3

Excel calls its spreadsheet files workbooks, and each workbook contains one or more worksheets. A worksheet consists of a number of rows and columns. At the intersection of every row and column there is a cell that can contain data. The cells inside a spreadsheet are therefore laid out in a rectangular grid. The columns are given letters and the rows are given numbers. Every cell therefore has a 'name' or 'coordinate' defining where it is located; the official term is reference. The first column is called A and the first row is called 1, so the top-left cell in the spreadsheet has the reference A1. The cell to its right is B1 and the cell below it is A2.

How many cells are in a spreadsheet? 17,179,869,184 arranged in 1,048,576 rows of 16,384 columns. Excel is very good at hiding the empty ones from you and so you'll never even see them unless you go looking.

What happens after the column names run out of letters? Like in a cinema, after column Z come columns AA, AB, and AC. After column AZ come columns BA, BB, and BC. After ZZ come columns AAA, AAB, and AAC. (There are not enough columns to ever reach ZZZ.)
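The column naming scheme and the cell count quoted above can be checked with a few lines of code. This is only a sketch: the function name column_name is my own invention for illustration and is not part of Excel.

```python
def column_name(index: int) -> str:
    """Convert a 1-based column index to its letter name (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be at least 1")
    letters = ""
    while index > 0:
        # Subtract 1 first because there is no 'zero' letter: A means 1, Z means 26.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


ROWS = 2 ** 20      # 1,048,576 rows
COLUMNS = 2 ** 14   # 16,384 columns

print(ROWS * COLUMNS)        # 17179869184
print(column_name(26))       # Z
print(column_name(27))       # AA
print(column_name(703))      # AAA
print(column_name(COLUMNS))  # XFD
```

The last column is therefore XFD, which is why the names never get as far as ZZZ.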

The interface

The UI should be very recognisable to anyone having experience with MS Word and PowerPoint. Excel has the same search feature as PowerPoint and Word, so it is easy to look up tools by name or description. I shall dare even to not reproduce its ribbons here.

The least familiar part of Excel might be the way references work. I shall therefore use the space to explain them instead of describing pretty pictures of the user interface.

References

References work like map coordinates, with a letter for horizontal position and a number for vertical position. They come in two types: relative and absolute.

Relative references

Relative references are what most people (and almost all beginners) use almost all of the time. One or more letters (naming a column) and one or more digits (naming a row) make up a reference. Even though they are called “relative”, they still identify a cell by its absolute position in the array of cells. So what makes them relative?

The relativity comes from their behaviour when they are used in a formula inside a cell. Formulae can move, either because they are copied and pasted or because rows and columns are inserted or deleted. When a formula moves, Excel looks at the relative positions of (distance between) the original position and the new position. The difference is added to the column letter and row number in the reference. The effect is that the name of the referenced cell changes, so that the formula continues to reference a value stored at the same position relative to wherever the formula happens to be.

Copying the formula B3+D3 and pasting it two rows below the original position causes the references within it to change to B5+D5. Moving that new formula one column to the right causes the references within it to change to C5+E5. This is bad when many formulae need to refer to a single cell, such as an interest rate, no matter where they may be moved.

Absolute references

By placing a $ in front of any letter or digit in a reference, it becomes absolute. This does not change how it refers to a cell, only how it behaves when the formula that it is part of is moved. In the case of absolute parts of references, they do not have the “distance” between the original and new position of the formula added to them. No matter where the formula is moved to, the absolute parts of the reference will always remain the same.

Take our formula containing B3+D3 and change it to $B$3+D3, then perform the same two moves on it. Moving it down two rows changes it to $B$3+D5, and moving that new formula right one column changes it to $B$3+E5. The second relative cell referenced has moved to remain in the same relative position as the formula, whereas the first absolute reference has not.

Change our formula to $B3+D$3 and perform the same two moves on it. Moving it down two rows changes it to $B5+D$3, and moving that new formula right one column changes it to $B5+E$3.
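The rules for relative and absolute references can be sketched in code. This is not how Excel is implemented; it is a minimal illustration that handles only simple formulae made of cell references and operators, and the function names are hypothetical. Parts preceded by $ are left alone and the other parts have the distance moved added to them.

```python
import re

# Optional $, column letters, optional $, row digits (e.g. B3, $B3, D$3, $B$3).
REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def column_to_index(letters: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, AA -> 27)."""
    index = 0
    for letter in letters:
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index


def index_to_column(index: int) -> str:
    """Convert a 1-based index back to column letters (27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def copy_formula(formula: str, rows_down: int, columns_right: int) -> str:
    """Return the formula as it would read after being copied to a new cell."""
    def shift(match: re.Match) -> str:
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:  # relative column: add the horizontal distance
            col = index_to_column(column_to_index(col) + columns_right)
        if not row_abs:  # relative row: add the vertical distance
            row = str(int(row) + rows_down)
        return f"{col_abs}{col}{row_abs}{row}"
    return REFERENCE.sub(shift, formula)


print(copy_formula("B3+D3", 2, 0))    # B5+D5
print(copy_formula("B5+D5", 0, 1))    # C5+E5
print(copy_formula("$B$3+D3", 2, 0))  # $B$3+D5
print(copy_formula("$B3+D$3", 2, 0))  # $B5+D$3
print(copy_formula("$B5+D$3", 0, 1))  # $B5+E$3
```

The printed results match the three worked examples in the text above.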

Videos

These 12 videos cover most of the essentials of Excel (assuming you already know how to use MS Word and PowerPoint). Use them to understand what features are there, and then explore the full capabilities of the interesting or useful features in more depth on your own.

02. Rows and columns
03. Autofill
05. Formulas, references
06. Ranges, functions
07. Conditional functions, formatting
08. Filtering
09. Visualisation
10. Freezing rows and columns
11. Import and export, CSV
13. Extended example

Week 05 — File systems

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is part practical preparation and part study.

First, install a command-line environment on your computer that you can use to complete the next few weeks of this course. (Linux and Mac users already have a suitable command-line environment; there is nothing to do. Windows users have several options; please follow that link and install one of the options on your laptop computer.)

Second, review the notes on this page before coming to class. Use Internet search engines to find online tutorials, Wikipedia articles, etc., for any additional information (or explanation) that you might need.

2. Check your understanding of file system concepts using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can. (Do not press 'submit' when you are finished.)
  2. Check your scores.
  3. Revise those topics having the lowest scores.
  4. When you have made progress with understanding or skill, update your scores.
  5. Repeat from step 2 until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the next section and further explained in the “Notes” section.

What you will learn from this class

  • How to use the SI, JEDEC, and IEC prefixes associated with file size and storage capacity.
  • The purpose of the file system within a computer system.
  • How data is stored physically on a disk.
  • The parameters that can affect the efficiency of data storage on a disk.
  • How the OS manages the allocation of physical storage to files and directories.
  • How files and directories are logically organised on the disk.

Notes

Filesystems

A filesystem (or file system) manages the storage of data on a device (such as a solid-state disk, hard disk drive, or USB flash memory drive). Many hundreds of different filesystems exist, each one providing different tradeoffs between speed, reliability, safety, security, etc. Almost all of them manage data by splitting it into files (sequences of bytes) placed within a hierarchy of directories (which map symbolic names to individual files). In most situations a filesystem refers to a single logical repository of files residing on a single physical medium (e.g., SSD or HDD).

Storage sizes

Sizes of files (and filesystems, and the physical media on which they exist) are measured in bytes. One byte contains eight bits, where a bit is a single binary digit (0 or 1). A bit is the smallest unit of information possible. A byte contains eight bits because that is sufficient to store a single character in English. For English and many European languages, a single character in a document will be represented as a single byte. Asian characters are typically larger. In Japanese, common characters are represented using two bytes but uncommon characters can require up to four bytes of storage.
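How many bytes a character occupies depends on the encoding used to store it, which the paragraph above does not name. As a rough illustration only, the sketch below measures three sample characters in two encodings that I have chosen as assumptions: Shift JIS (a traditional Japanese encoding) and UTF-8 (common on the web). The helper name encoded_size is hypothetical.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


latin = "A"            # an English letter
kanji = "日"           # a common Japanese character
rare = "\U00020BB7"    # an uncommon kanji outside the basic Unicode plane

print(encoded_size(latin, "utf-8"))      # 1
print(encoded_size(latin, "shift_jis"))  # 1
print(encoded_size(kanji, "shift_jis"))  # 2
print(encoded_size(kanji, "utf-8"))      # 3
print(encoded_size(rare, "utf-8"))       # 4
```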

storage sizes (traditional units)
unit   name      size (decimal SI [1] units)          size (binary JEDEC [2] units)
1 kB   kilobyte  10^3 = 1,000 bytes                   2^10 = 1,024 bytes
1 MB   megabyte  10^6 = 1,000,000 bytes               2^20 = 1,048,576 bytes
1 GB   gigabyte  10^9 = 1,000,000,000 bytes           2^30 = 1,073,741,824 bytes
1 TB   terabyte  10^12 = 1,000,000,000,000 bytes      2^40 = 1,099,511,627,776 bytes


[1] Système international, otherwise known as the metric system
[2] Joint Electron Device Engineering Council

storage sizes (IEC binary prefixes)
unit   name      size
1 KiB  kibibyte  2^10 = 1,024 bytes
1 MiB  mebibyte  2^20 = 1,048,576 bytes
1 GiB  gibibyte  2^30 = 1,073,741,824 bytes
1 TiB  tebibyte  2^40 = 1,099,511,627,776 bytes

A byte is too small a quantity to be useful when discussing an entire filesystem or disk. SI prefixes are often applied to storage sizes, as shown in the tables above. Permanent storage (such as disk drives) usually uses the SI (metric) prefixes, which increase by factors of 1000. Volatile storage (such as computer memory) usually uses the JEDEC (binary) prefixes, which increase by factors of 1024.

Available disk space reported by the OS might use either system, leading to discrepancies between advertised (by the manufacturer) and actual (available to the computer user) storage space. For example, a drive advertised in SI units as containing 1 TB of space would be reported as providing only 931 GB of space by an OS using JEDEC units. This can be a cause for confusion, or even legal action.
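The size of the discrepancy is easy to work out. The sketch below converts a capacity advertised with decimal (SI) prefixes into binary gigabytes of 2^30 bytes each; the variable names are mine.

```python
ADVERTISED_BYTES = 10 ** 12   # 1 TB using the SI (decimal) prefix
BINARY_GIGABYTE = 2 ** 30     # 1 GB using the JEDEC (binary) prefix, i.e. 1 GiB

reported = ADVERTISED_BYTES / BINARY_GIGABYTE
shortfall = 1 - ADVERTISED_BYTES / 2 ** 40  # fraction 'missing' from a binary terabyte

print(f"{reported:.2f} GB")   # 931.32 GB
print(f"{shortfall:.1%}")     # 9.1%
```

The gap widens with each larger prefix: about 2.4% at the kilobyte level but about 9.1% at the terabyte level.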

A new set of binary prefixes was standardised by the IEC, corresponding to the JEDEC binary sizes but using a different spelling and (somewhat silly-sounding) names. The prefixes are formed from the SI/JEDEC ones by inserting an 'i' between the multiplier (k, M, G, etc.) and the unit (B). Their names are formed from the first two letters of the corresponding SI/JEDEC prefix (ki, me, gi, te, etc.) and the first two letters of the word 'binary' (bi); hence kibibyte, mebibyte, and so on.

The 'top' utility uses IEC units

Linux and recent versions of MacOS have adopted these prefixes for displaying information that uses 1024-based scales (such as memory statistics). Windows and the mass media have so far ignored them.

Filesystems

Filesystems are used to store:

  • User documents and data
    Documents can be application-specific files (.doc, .xls, etc.) or plain text files.
  • Applications
    Applications (Word, PowerPoint, your e-mail reader, etc.) are stored in files containing programs that the computer can execute. On Windows, look in the directories under 'C:\Program Files' to find lots of applications. On MacOS, look in '/Applications'. On Linux, look in '/usr/bin'.
  • The OS and its utility programs
    The operating system is really just another application, but is run automatically when the computer is booted.
  • Virtual memory
    When running many applications the required memory size can exceed the available physical memory size. The OS deals with this by temporarily moving unused parts of applications from memory onto the disk, and reloading them when they are required again. On Windows this disk space comes from a file called 'C:\pagefile.sys'. On MacOS look in '/private/var/vm'. On Linux you probably have part of the physical disk dedicated to this task, using its own filesystem that is independent of OS and user files.

Filesystem organisation

Hierarchy

No matter what filesystem is being used, the application sees a very simple model of storage:

  • Documents are stored as sequences of bytes in named files.
  • Files are organised within directories, which are mappings from file names to file contents.
  • Directories can also be placed inside directories, leading to a tree structure.

A directory maps names onto files. In the example on the right, 'Documents' is a directory containing two entries. One entry points to a regular file called 'local' (containing a document represented as a sequence of bytes). The other entry points to another directory called 'MobaXterm'. Using a 'family tree' metaphor, 'Documents' is called the parent directory of 'MobaXterm'.

The root directory, which is at the top of the tree structure, has no name. (File and directory names are stored in their parent directory. Since the root directory has no parent directory, there is simply nowhere to store a name for it.)

To specify a particular file or directory, start at the root and describe the path that must be followed to find that file or directory. For example, the 'Documents' directory has the following path:

  • root directory
  • 'Users' directory
  • 'piumarta' directory
  • 'Documents' directory

Explorer

Path

Each element in the path is separated by a “/” character (or “\” on Windows). The root directory has no name so we start with an empty name, then the separator, then 'Users', another separator, and so on. The final path is therefore: '/Users/piumarta/Documents' (or '\Users\piumarta\Documents' on Windows).

On most computers there is only one root directory. For historical reasons Windows is an exception and has one root directory per filesystem, or 'volume' in Windows terminology. Each volume is named by a single letter followed by a colon, in this case 'C:'. Usually the volume name is prepended to paths (i.e., added at the beginning of the path). The correct path to the 'Documents' directory in Windows would therefore be: 'C:\Users\piumarta\Documents'.

Windows explorer shows you the path to the current directory above the list of files. If you click in it you will see it written in the notation shown above. You can also type into the location bar, or copy/paste from/to it, using the same notation.
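To see how a path breaks down into its elements, Python's standard pathlib module can parse both notations without touching the disk. A small sketch using the example paths from this section:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("/Users/piumarta/Documents")
windows = PureWindowsPath(r"C:\Users\piumarta\Documents")

# The first element of each tuple represents the root directory.
print(posix.parts)    # ('/', 'Users', 'piumarta', 'Documents')
print(posix.parent)   # /Users/piumarta

# On Windows the volume name comes before the root.
print(windows.drive)  # C:
print(windows.parts)  # ('C:\\', 'Users', 'piumarta', 'Documents')
```

The 'Pure' path classes only manipulate text, so this example gives the same results on any operating system.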

Special directory entries

Every directory contains two special entries whose names are '.' and '..'. The name '.' points to the inode of the directory itself (so the path '/Users/././././.' refers still to the '/Users' directory). The name '..' points to the inode of the parent directory (so the path '/Users/Administrator/../piumarta/.' refers to my account's directory). The only exception is the root directory, which has no parent, and so for it the name '..' points back to the root directory again.
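The effect of '.' and '..' on a path can be demonstrated with posixpath.normpath from Python's standard library, which simplifies a path by treating it purely as text. (It does not look at the disk, so it is only an approximation of what the filesystem does when it follows the real directory entries.)

```python
import posixpath

# '.' refers to the directory itself, so it can be removed.
print(posixpath.normpath("/Users/././././."))                    # /Users

# '..' steps back up to the parent directory.
print(posixpath.normpath("/Users/Administrator/../piumarta/."))  # /Users/piumarta

# The parent of the root directory is the root directory again.
print(posixpath.normpath("/.."))                                 # /
```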

Allocating storage for files and directories

When you open a folder in Mac Finder or Windows Explorer, or type 'ls' in a command line window, you are looking at a list of directory entries. Every entry in the directory has a unique (file or directory) name and associates that name with some storage on the disk where the contents of the file or directory are stored. The structure describing where the contents are stored is called an index node, or inode for short. Inodes are not stored in the directories, but in a separate table on the disk. There is exactly one inode specifying where the contents for any given file or directory are stored on the disk. However, more than one name can be associated with a given inode (and therefore file) by having more than one directory entry specify the same inode as the location of the file's contents.

Directories, inodes, and blocks

An inode contains all the information relating to a file's or directory's contents, including:

  • type: regular, directory, symbolic link, special
  • link count: how many directory entries point to this inode
  • size: total number of bytes in the file that the inode describes
  • meta data: information about the file/directory itself
    • permissions: who can access the content to read, write, execute it
    • ownership: who created it
    • timestamps: when the content was created, last modified, last accessed

The inode also contains a list of blocks on the disk where the contents of the file/directory are actually stored. Each block has a fixed size, typically 4096 bytes, and is identified by a number (unique within the filesystem) that can easily be converted into the physical location on the disk where the block is stored.
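Much of the information held in an inode can be inspected from a program. The following sketch uses os.stat and os.link from Python's standard library to give one file a second name and then confirm that both names lead to the same inode. The file names are arbitrary examples, and the code assumes a filesystem that supports hard links (as most Linux, MacOS, and Windows filesystems do).

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as directory:
    first = os.path.join(directory, "first.txt")
    second = os.path.join(directory, "second.txt")

    with open(first, "w") as f:
        f.write("hello")

    # Add a second directory entry that specifies the same inode.
    os.link(first, second)

    info_first = os.stat(first)
    info_second = os.stat(second)

    same_inode = info_first.st_ino == info_second.st_ino
    link_count = info_first.st_nlink              # directory entries pointing here
    size = info_first.st_size                     # total bytes of content
    is_regular = stat.S_ISREG(info_first.st_mode) # type: regular file?
    permissions = stat.filemode(info_first.st_mode)

    print(same_inode, link_count, size, is_regular, permissions)
```

On a typical system the first four values printed are True, 2, 5, and True; the permissions string depends on your system's settings.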

(Windows, just to be difficult, calls its inodes 'MFT records' where MFT stands for 'master file table'.)

Physical storage of blocks on the disk

Sectors and blocks

Hard disk drive

Hard disk drives (HDD) store data magnetically on the surface of spinning metal disks. The data is read and written by tiny heads that move between the edge and centre of the disks. It should now be obvious why they are called disk drives.

Each surface on the metal disks is divided into concentric tracks. Each track is divided into a number of sectors where the data is actually stored. The size of each sector is fixed by the disk drive manufacturer and cannot be changed.

A disk drive, whether solid-state or rotating, stores information in a fixed number of fixed-sized sectors. A HDD sector is typically 512 bytes long, and so a 1 TB hard disk (taking 1 TB as 2^40 bytes) would contain 2,147,483,648 sectors of data. A sector is the smallest unit of data that can be transferred to or from the disk.

While sectors can be addressed directly, most filesystems do not do that for reasons of efficiency. Instead they combine several sectors into a block and treat that as the smallest unit of data when managing space on the disk. A typical block size is 8 sectors, or 4096 bytes. (Of course, Windows has to be different to everyone else and uses the terms 'allocation unit' or 'cluster'.)

Each block has a unique number, and data is always read from or written to the disk in multiples of the block size.
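Converting a block number into a location on the disk is simple arithmetic. The sketch below assumes the typical sizes mentioned above (512-byte sectors, 8 sectors per block) and also assumes, for simplicity, that blocks are numbered from zero starting at the very first sector of the disk; real filesystems usually start at some offset. The function name is hypothetical.

```python
SECTOR_SIZE = 512                             # bytes per sector
SECTORS_PER_BLOCK = 8
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK  # 4096 bytes


def block_location(block_number: int) -> tuple[int, int, int]:
    """Return (first sector, last sector, byte offset) for a block number."""
    first_sector = block_number * SECTORS_PER_BLOCK
    last_sector = first_sector + SECTORS_PER_BLOCK - 1
    byte_offset = block_number * BLOCK_SIZE
    return first_sector, last_sector, byte_offset


print(BLOCK_SIZE)             # 4096
print(block_location(0))      # (0, 7, 0)
print(block_location(3))      # (24, 31, 12288)
print(2 ** 40 // SECTOR_SIZE) # 2147483648 sectors in 2^40 bytes
```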

Internal fragmentation

Why am I bothering to tell you about block (allocation unit) size? Efficiency.

Before you can store information on a disk you have to format it. Formatting writes all the data structures that are needed to describe an empty filesystem onto the disk. It puts in place the framework into which you can start creating new files and directories.

While you are formatting a disk you will likely be given the choice of what block (allocation unit) size to use. The default on many filesystems (including Windows, surprisingly) is 4096 bytes. You can almost certainly increase or decrease this size (by factors of a power of 2).

The smallest unit of information that can be allocated is one block. A one-byte file therefore consumes an entire block for its contents, no matter what block size is chosen. The rest of the block is wasted. This is called internal fragmentation of the storage and there is nothing you can do about it (other than reduce the block size).
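The space lost to internal fragmentation can be calculated for any file size and block size. This is a simplified sketch that considers only the blocks holding a file's contents; the function name allocation is my own.

```python
def allocation(file_size: int, block_size: int = 4096) -> tuple[int, int, int]:
    """Return (blocks used, bytes allocated, bytes wasted) for a file."""
    blocks = -(-file_size // block_size)  # integer division, rounding up
    allocated = blocks * block_size
    wasted = allocated - file_size
    return blocks, allocated, wasted


print(allocation(1))            # (1, 4096, 4095)
print(allocation(4096))         # (1, 4096, 0)
print(allocation(4097))         # (2, 8192, 4095)
print(allocation(1, 512))       # (1, 512, 511)
print(allocation(1, 65536))     # (1, 65536, 65535)
```

A one-byte file wastes almost the whole block whatever the block size, but the absolute amount wasted grows with the block size.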

On the other hand, a huge file will contain very many blocks of data and will therefore have a very large block list in its inode. This wastes space in the inode and potentially makes accessing the file less efficient, because the contents of the file are distributed over many different blocks located far apart on the surface of the disk. (This is what most people mean when they say 'fragmentation' in the context of disk storage. It is also the problem that is solved by the 'disk defragmenter' tool in Windows which attempts to rearrange the blocks belonging to each file to keep them closer together on the disk, ideally in a single contiguous sequence of blocks.)

If you expect your disk to contain mostly very small files (common in big data analytics) then a small block (allocation unit) size will perform better. If you expect your disk to contain mostly very large files (common in audio/video media editing) then a large block (allocation unit) size will perform better. If you have no idea what to expect then the default block (allocation unit) size is probably going to be fine.

File abstraction

In your programs you will use functions such as open(), read(), write(), and close() to access and modify the contents of files. Those functions directly manipulate the directories, inodes, and disk blocks described above. The filesystem's job is to make sure that happens safely and efficiently, and that you never notice all of the underlying complexity.
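As an illustration of these functions, here is a short sketch using the low-level versions provided by Python's os module, which correspond closely to the operating system's own open(), read(), write(), and close(). The file name and contents are arbitrary examples.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.txt")

    # Create the file (a new directory entry and inode) and write some bytes.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = os.write(fd, b"hello, filesystem\n")
    os.close(fd)

    # Open the same file again and read the bytes back.
    fd = os.open(path, os.O_RDONLY)
    data = os.read(fd, 4096)  # read at most 4096 bytes
    os.close(fd)

    print(written)  # 18
    print(data)     # b'hello, filesystem\n'
```

Nothing in the program mentions inodes, blocks, or sectors; the filesystem takes care of them.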

Resources and further reading

Installing and configuring MobaXterm: https://projects.ncsu.edu/hpc/Documents/mobaxterm.php
In-depth explanation of disks, filesystems, and network storage: https://www.netmeister.org/book/04-file-systems.pdf
Wikipedia's entry on path names: https://en.wikipedia.org/wiki/Path_(computing)
Microsoft's explanation of Windows path names: https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats


Week 06 — Command line

You do not rise to the level of your goals, you fall to the level of your systems. James Clear

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is part practical preparation and part study.

First, read the following sections from the Command line interface guide:

Second, complete the practical exercise in the Notes section below. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.)

2. Check your understanding of command line concepts using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can.
  2. Revise the topics with the lowest scores, then update your scores.
  3. Repeat the previous step until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.

What you will learn from this class

You will learn what the command line (shell) is, why you should know how to use it, and how to

  • enter commands (and understand how the shell interprets what you entered),
  • use absolute and relative paths,
  • navigate around the filesystem of your computer,
  • see what files and directories are present and how to display their contents,
  • specify the location of files and directories,
  • create, copy, and delete files and directories,
  • find files and directories by type, name, pattern, or content,
  • search files for specific content, and
  • combine existing commands to do new, powerful things.

Notes

Why learn to use the command line?

Using the command line puts you in control at the level of the operating system and other fundamental processes that make it work. Many operations and options that are not accessible using a graphical interface (Windows Explorer, Mac Finder, etc.) become accessible to you on the command line.

Developers, engineers, scientists, and researchers all use the command line to make themselves faster and more effective (and happier) than they would be using only graphical interfaces.

What is the command line anyway?

The rest of this document leads you through a practical exploration of basic command line features. Things you should type are shown in a grey box like this. Keys you should type include Enter or Return (don't type the word, just press the key), and Control-C which means “hold down the Control key while typing the character C”.

Exploring the command line

To follow all the steps you will need a text editor. If you do not already have an editor then one possibility is nano. It is installed by default on Mac and many Linux distributions. In MobaXterm you can install it by typing apt-get install nano on the command line (while connected to the Internet, since it has to be downloaded before it is installed).

Prompt, commands, pressing Enter, terminating programs

  1. Start the shell, wait for it to print the prompt, then type ls and wait.
  2. When you get bored, press Enter (or Return if you have it).

Remark: The shell will wait (literally) forever for you to press Enter. If the computer is not responding, did you simply forget to press Enter?

From now on I will assume you press Enter after every command you type.

  1. Type cat (followed by Enter).
  2. When you get bored, press Control-C.

Remark: If you give no arguments to some programs then they use your keyboard for input. If the computer is not responding, did you forget to tell a program which file to read from?

How the shell parses what you type

  1. Type echo.
  2. Type echo hello world.
  3. Type echo hello world .
  4. Type echo hello world and press the cursor-left key until the cursor is in the middle of the line before pressing Enter.

Remark: White space is used to break the line into a command part followed by zero or more argument parts. Once the line is broken into parts the white space is discarded. It does not matter how much white space you use, or even where the cursor is positioned in the line when you press Enter.
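One way to watch this splitting happen is to count the arguments that a command actually receives. The sketch below assumes bash; count_args is a hypothetical helper written only for this illustration, and the quoted case is included for contrast (quotes keep the white space inside a single argument).

```shell
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

n_plain=$(count_args hello world)              # two words, one space
n_spaced=$(count_args    hello      world   )  # same two words, extra white space
n_quoted=$(count_args "hello      world")      # quotes make this a single argument

echo "$n_plain $n_spaced $n_quoted"            # prints: 2 2 1
```

The first two results are identical because the extra white space is discarded after the line has been broken into parts.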

Directories

  1. Type echo ~ (this is the path name of your home directory)
  2. Type cd (this changes your current directory to your home directory)
  3. Type pwd (this shows you where you are; check you actually are in your home directory)
  4. Type ls (this will show you the names of the files and directories in your home directory)
  5. Type ls /home/<your-username> or ls /Users/<your-username> (e.g., ls /home/piumarta – this also shows you your home directory)
  6. Type ls ~ (of course, this is another way to list your home directory)
  7. Type ls . (a single dot is another name for “this directory”, which is either your home or the last directory you changed to using cd)
  8. Type ls /home (or maybe ls /Users – this shows you the directory where all accounts are stored, the parent of your home directory)
  9. Type ls .. (this also shows you the parent of your home directory, because your current directory is your home and '..' is the name of the parent directory of the current directory)

If there are more names in the /home (or /Users) directory, pick one of them. Let's call that name <name>. (If there are none, just use your own name again.)

  1. Type ls ~<name> (this is another name for the home directory of the user called <name>)
  2. Type cd .. (this changes your current directory to /home, where all the home directories are stored)
  3. Type pwd (this prints the working directory, proving you are now “in” the directory where home directories are stored)
  4. Type ls (you will see your account name listed, and the names of any other accounts on the computer)
  5. Type ls <your-username> (this is a relative path, which begins in the working directory instead of the root directory)
  6. Type cd - (this prints the name of a directory, but… where are you now?)
  7. Type pwd (cd understands the special argument '-' which means “the directory I was in before this one”)

Remark: There are several ways to specify locations in the computer, and one of them is implicit (the current working directory) and often used as a default when you do not specify any other directory.
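The same ideas can be tried safely in a throwaway directory tree. This is a minimal sketch, assuming bash and the mktemp command; the directory names home and alice are invented for the example.

```shell
# Build a small practice tree in a fresh temporary directory (names are made up).
top=$(mktemp -d)
mkdir -p "$top/home/alice"

cd "$top/home/alice"     # absolute path: it starts at the root directory
here=$(pwd)

cd ..                    # relative path: the parent of the working directory
parent=$(pwd)

ls alice > /dev/null     # relative path, resolved from the working directory

cd - > /dev/null         # return to the previous directory (cd - also prints its name)
back=$(pwd)

echo "started in:  $here"
echo "moved up to: $parent"
echo "returned to: $back"
```

The first and last lines printed are the same directory, and the middle line is its parent.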

Where are commands implemented?

  1. Type type echo (this shows you that echo is a built-in command, implemented in the shell itself; when you echo things, the shell performs the “echo”ing for you directly)
  2. Type type ls (this shows you that ls is a program that is stored on the disk; the shell runs the ls program for you whenever you type its name)

Remark: Commands are either built into the shell, or they are programs stored in the filesystem just like any other file. Having a user program manage the running of other user programs in this way was one of the reasons why shells were invented.

Remark: There is nothing special about commands, and you can add lots (and lots) of new commands by installing programs on your computer in places such as /usr/bin or /usr/local/bin.
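The distinction can also be checked from a script. The sketch below assumes bash, whose type command accepts a -t option that prints a single word describing the kind of command; the variable names are chosen only for this example.

```shell
kind_echo=$(type -t echo)    # "builtin": implemented inside the shell
kind_ls=$(type -t ls)        # "file": a program stored in the filesystem
path_ls=$(command -v ls)     # the path of the program that the shell would run

echo "echo is a $kind_echo"
echo "ls is a $kind_ls stored at $path_ls"
```

In an interactive shell ls is sometimes defined as an alias, in which case type -t ls reports "alias" instead.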

Hidden files, and visual clues about file type

  1. Type ls . (this shows you all the files in the current directory, not the directory itself)
  2. Type ls -d . (this shows you the details of the directory '.' itself, not the files that it contains; -d means “list directories as themselves”)
  3. Type ls -a (now you can see the hidden directory entries, which start with '.', including '.' itself and its parent '..')
  4. Type ls -aF (this will put a '/' after directory names, and a '*' after executable files)
  5. Type ls -F /usr/bin (there is a large collection of executable files in /usr/bin)

Creating directories, copying files

  1. Type cd /tmp (this puts you in a directory meant for temporary files)
  2. Type pwd (make sure the cd command really worked and this prints /tmp)
  3. Type mkdir mydir (MaKes a DIRectory called mydir)
  4. Type ls -ld mydir (check that you are the owner of the directory: -l = long format, -d = show information about directories themselves, not about the files they contain)
  5. Type cp /etc/passwd mydir (this copies the file /etc/passwd into your mydir directory. We can do better, though. Try the following instead…)
  6. Type cp -vip /etc/passwd mydir/ (this version employs several “safety features” that command line pros use often:
    • the option -v means “verbose”: it prints each file as it is copied
    • the option -i means “interactive”: it asks you whether you want to overwrite any files that already exist
    • the option -p means “preserve”: the copy keeps the attributes of the original, such as its permissions and, in particular, its timestamp
    • following the destination directory's name with a '/' ensures the destination really is an existing directory: if for some reason the directory does not exist, cp will print an error message. Without the / at the end, if the directory did not exist then the file would be copied to a regular file called “mydir” which is definitely not what we want, so including the / ensures our intentions are enforced)

Remark: Use the options that programs like ls and cp provide so that they give you the information and protection from mistakes that you want, and make use of the (very) few “safety” features (such as trailing / on directory names) that are available in the command line.
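The effect of the trailing / can be seen without touching any important files. This is a minimal sketch that works in a temporary directory; the names source.txt, nodir, and missing are invented for the example, and the exact error message printed by cp differs between systems, so it is hidden here.

```shell
work=$(mktemp -d)
cd "$work"
echo "some data" > source.txt

# No trailing slash: "nodir" does not exist, so cp creates a regular file with that name.
cp source.txt nodir

# Trailing slash: "missing/" must be an existing directory, so cp refuses to copy.
if cp source.txt missing/ 2> /dev/null; then
    slash_result="copied"
else
    slash_result="refused"
fi

ls -l
echo "copy to missing/ was $slash_result"
```

The listing shows nodir as an ordinary file, which is exactly the silent mistake that the trailing / protects against.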

  1. Type nano data.txt
  2. Type the first ten counting numbers, as words, one per line:
    • one
    • two
    • three
    • four
    • five
    • six
    • seven
    • eight
    • nine
    • ten
  3. Type Control-O and Enter (to write Out the file)
  4. Type Control-X (to eXit nano)
  5. Type ls -il (you can see your file, its owner, and how long it is; the first number, which is probably huge, is the inode number that identifies the inode describing the file's contents)
  6. Type cp data.txt copy1.txt
  7. Type ls -il (you can see copy1.txt is the same size but has a different inode: the contents were copied)
  8. Type nano copy1.txt and then add this is copy1 at the start of the file
  9. Type Control-O Enter Control-X (write out the file and exit)
  10. Type ls -il (you can see copy1.txt is now larger than data.txt, but its inode has not changed)
  11. Type cat data.txt (this concatenates the files named in the command arguments and prints them on the terminal; you can see that the original file is unchanged)
  12. Type cat copy1.txt (you can see that the copy has been changed)
  13. Type ls -il (you can see that the inode of copy1.txt has not changed, but the contents of the storage blocks of the file were changed)

Remark: cp makes a brand new directory entry and a brand new inode and then copies the contents of the original file into brand new storage.

Remark: When you edit a file with nano the inode does not change, only the contents of the file change. Continue with this section to see why this is significant.

  1. Type ln data.txt copy2.txt (this creates another link to data.txt's inode called copy2.txt)
  2. Type ls -li (you can see that the inode numbers of data.txt and copy2.txt are the same. The ln program made a new directory entry but did not copy the inode. You can also see that the link count of data.txt and copy2.txt is 2, whereas the link count of copy1.txt remains 1, because there are now two directory entries pointing at the one inode shared by data.txt and copy2.txt)
  3. Type nano copy2.txt and add "this is copy2" at the start of the file (then press Control-O, Enter, and Control-X to write out the file and exit)
  4. Type ls -il (you can see copy2.txt and data.txt are both now larger)
  5. Type cat copy2.txt (you can see that copy2.txt has been changed)
  6. Type cat data.txt (because copy2.txt and data.txt share the same inode, they both changed when you edit either one of them; they are the same file, but with multiple directory entries pointing to it with different names)

Remark: ln makes a new link to an existing inode and file contents. If you modify any one of the files sharing the same inode, they will all change in exactly the same way.

Remark: The link count of a file (or directory) tells you how many directory entries “point to” (share) the same inode.
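The difference between cp and ln can be reproduced without an editor. The sketch below works in a temporary directory and, as an assumption made so that it can run unattended, appends a line with the >> operator instead of editing the file in nano; like nano, >> changes the contents of the existing inode.

```shell
work=$(mktemp -d)
cd "$work"
echo one > data.txt

cp data.txt copy1.txt     # new directory entry, new inode, new storage
ln data.txt copy2.txt     # new directory entry only; the inode is shared

# Append a line through the link.
echo "this is copy2" >> copy2.txt

ls -li                    # compare the inode numbers (first column) and link counts

data_lines=$(wc -l < data.txt)
copy1_lines=$(wc -l < copy1.txt)
echo "data.txt has $data_lines lines, copy1.txt has $copy1_lines"
```

data.txt gains the extra line because it shares an inode with copy2.txt, while copy1.txt is unaffected.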

Finding files by their type

  1. Type find . -type d (this will print all the directories in or under the current directory; it will probably only print '.' unless you created more directories)
  2. Type find .. -type d (this will print all the directories in or under the parent directory; it should probably find several more names, including 'mydir')
  3. Type find . -type f (this will print all the files in or under the current directory; it should print at least data.txt, copy1.txt, and copy2.txt)
  4. Type find . -name *.txt (this will print an error message… why?!? Let's find out…)

Remark: You can search for files based on their type: file (-type f), directory (-type d), etc.

Finding files by name

  1. Type echo find . -name *.txt (this will print the command that the shell just ran; it says “find . -name copy1.txt copy2.txt data.txt” which is not what you wanted – the shell expanded *.txt into the names of all the .txt files)
  2. Type find . -name '*.txt' (this will print all the files in, or under, '.' whose names end with '.txt')
  3. Type find . -name 'c*' (this will print copy1.txt and copy2.txt; all the files whose names start with 'c')

Remark: You can use echo to see exactly what the shell is doing with complex arguments.

Remark: You can use quote characters '' to stop the shell messing with your arguments; often you want *.txt to mean “all the text files in this directory”, but in this case you did not want that at all.
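The effect of quoting can be compared side by side. This is a minimal sketch, assuming bash with its default settings; it works in a temporary directory and the file names are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"
echo a > a.txt
echo b > b.txt
echo c > notes.md

# Unquoted: the shell replaces *.txt with matching names before the command runs.
unquoted=$(echo find . -name *.txt)

# Quoted: the pattern reaches the command exactly as it was typed.
quoted=$(echo find . -name '*.txt')

echo "$unquoted"          # prints: find . -name a.txt b.txt
echo "$quoted"            # prints: find . -name *.txt

# With the quotes in place, find does the pattern matching itself.
found=$(find . -name '*.txt' | sort)
echo "$found"
```

The quoted form finds both .txt files; the unquoted form hands find two file names where it expected one pattern.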

Finding data in files, finding files by their content

  1. Type grep e data.txt (this will search the content of data.txt for all lines that have the letter 'e' in them; you should see one, three, five, etc., but not two, four, or six)
  2. Type grep two * (this will search the content of all files in the current directory for lines that have the word 'two'. You should see three matching lines, one from each of the three files. Because there was more than one file argument on the command line, grep also prints the name of the file(s) where the target string 'two' was found)

Remark: You can search for content in one or more files.

Remark: You can search for files based on whether they contain particular content.
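Both kinds of search are shown in the sketch below, which works in a temporary directory with two invented files. It uses the -l option of grep, which prints only the names of the files that contain a match; that option is not used elsewhere in these notes.

```shell
work=$(mktemp -d)
cd "$work"
printf 'one\ntwo\nthree\n' > first.txt     # printf writes several lines at once
printf 'four\nfive\n' > second.txt

# Find content in a file: the lines of first.txt that contain the letter e.
lines_with_e=$(grep e first.txt)

# Find files by content: the names of the files that contain "two".
files_with_two=$(grep -l two *.txt)

echo "$lines_with_e"
echo "$files_with_two"
```

The first search prints the lines one and three; the second prints only the file name first.txt.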

Redirecting output to a file

  1. Type ls -l /usr/bin (there is a lot of output)
  2. Type ls -l /usr/bin > /tmp/files.txt (there is no output; what happened? All the output from ls was redirected (written) to /tmp/files.txt instead of to the screen)
  3. Type cat /tmp/files.txt (there's the output that would have gone to the screen)

Remark: Command output can be saved in a file using the redirection operator > file.

Remark: There is too much output in files.txt to see it all at once.

  1. Type less /tmp/files.txt (this will show you the output one page at a time. Press space to move forward a page; press up and down arrows to move forward or backward a line; press G to go to the end and g to go to the start of the file; press q to quit.)

Remark: To view a large amount of data one page at a time, use the program 'less'.

  1. Type grep ed /tmp/files.txt (this finds all programs in /usr/bin that have 'ed' in their name; not especially useful, but it illustrates an important point…)

Remark: The output of a command can be redirected to (saved in) a file and analysed using other programs.

Redirecting input from a file

  1. Type grep ed and then enter:
    • hello
    • are
    • we
    • bored (grep will echo back only the line containing 'ed')
    • yet?
  2. Type Control-C (to terminate the program)

Remark: Many commands can read from the keyboard as well as reading from files.

  1. Type grep ed < /tmp/files.txt (this will act as if you have typed the input, but the input is taken from the file /tmp/files.txt)

Remark: Just as output can be redirected to a file using >, input can be redirected from a file using <.
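The two operators can be combined in a small, repeatable experiment. The sketch below uses a short invented list of names in place of the real contents of /usr/bin, so that the result is the same on every computer.

```shell
work=$(mktemp -d)

# Output redirection: the four lines go into a file instead of to the screen.
printf 'sed\ngrep\nchmod\nred\n' > "$work/names.txt"

# Input redirection: grep reads the file as if its lines had been typed at the keyboard.
matches=$(grep ed < "$work/names.txt")

echo "$matches"
```

Only the lines sed and red are printed, because they are the only ones that contain 'ed'.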

Pipelines and filters

What if you wanted to avoid creating a temporary file in between ls and grep?

  1. Type ls -l /usr/bin | grep ed (this prints all files in /usr/bin whose name includes ed. The output of ls was redirected to the input of grep, without using an explicit temporary file in the middle. [There actually is a temporary file in the middle, but it is invisible and exists only in the computer's memory.])

Remark: The output of one program can be sent to the input of another program.

  1. Type wc and then enter these two lines:
    • Why was the computer late for the meeting?
    • Because it had a hard drive.
  2. Type Control-D (wc will print the number of lines, words, and characters that you typed, in that order)

Remark: wc can analyse text files by counting characters, words, and lines.

Remark: When a program is reading from the keyboard, Control-D is a way to make the program believe that it reached the end of the input file. Try it with cat: run cat, type a few lines, then press Control-D.

  1. Type ls -l /usr/bin | grep e | wc -l (this prints the number of programs in /usr/bin that have an 'e' in their name)

Remark: Programs can be chained together into long pipelines by joining inputs to outputs together.

Remark: In this example, grep is acting as a filter. It reads input, filters it in some way, and then writes the result to its output.

Remark: Many command line utilities are built this way, so that they can be composed to perform useful functions. Individually they are all quite small and simple, but together their behaviour can be very complex. The flexibility to compose them in many ways is one reason that the command line is so powerful for managing and analysing data.

Remark: The '|' character is called “pipe”; it is used a lot by command line “pros”.
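A three-stage pipeline with a predictable result looks like this. It is a minimal sketch in which the input lines are generated by printf rather than by ls, so that the count does not depend on what is installed on the computer.

```shell
# Stage 1 produces five lines, stage 2 keeps those containing "e", stage 3 counts them.
count=$(printf 'one\ntwo\nthree\nfour\nfive\n' | grep e | wc -l)

echo "$count"
```

The lines one, three, and five pass through the filter, so the count is 3.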


Week 07 — Command sequencing

Note: This week we will practice and learn more about working with multiple files and directories, and about how text files can be used to store simple databases. In the two weeks following this one we will study command sequencing (scripts), control, and shell variables.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.

2. Check your understanding of command concepts using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can.
  2. Revise the topics with the lowest scores, then update your scores.
  3. Repeat the previous step until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.

What you will learn from this class

  • How to create new directories and files.
  • How to move (rename) and copy (duplicate) files and directories.
  • How to delete directories.
  • How to use wildcards to specify a pattern that expands to many file names.
  • How to use brace expressions to generate new file and directory names.
  • How to skip over the first part of a file using tail.
  • How to use cut to extract fields from a simple database stored as a text file.

Notes

The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.

Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.

Review

First make sure you understand the important topics from the previous two weeks.

Review of previous weeks

Copying directories

The command cp files directory copies one or more files into directory. If any of the files happen to be directories then the cp command will fail.

To copy an entire directory (recursively) use cp with the -r option.

The cp -r files directory command copies one or more files into directory. If any of the files are directories then the directory is copied along with all of its contents.

Let's practice on a simple directory hierarchy.

Use the mkdir and echo commands to recreate the dir1 directory and its three files as shown in the diagram. The content of the three files is not important.

$ cd /tmp
$ mkdir dir1
$ echo 1 > dir1/file1.txt
$ echo 2 > dir1/file2.txt
$ echo 3 > dir1/file3.txt
$ ls -lR dir1
dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:15 file3.txt

Use cp -rv (recursive and verbose) to copy the entire directory dir1 to a new directory tree called dir2.

$ cp -rv dir1 dir2
'dir1' -> 'dir2'
'dir1/file3.txt' -> 'dir2/file3.txt'
'dir1/file2.txt' -> 'dir2/file2.txt'
'dir1/file1.txt' -> 'dir2/file1.txt'

Because dir2 does not yet exist, it is first created in the current directory and then the contents of dir1 are copied to dir2. The -v option shows you the directory being created and the files being copied.

What will happen if you run the same cp -rv dir1 dir2 command again?

$ cp -rv dir1 dir2
'dir1' -> 'dir2/dir1'
'dir1/file3.txt' -> 'dir2/dir1/file3.txt'
'dir1/file2.txt' -> 'dir2/dir1/file2.txt'
'dir1/file1.txt' -> 'dir2/dir1/file1.txt'
$ ls -lR dir2
dir2:
total 64
drwxr-xr-x 2 piumarta dialout 170 Oct 26 05:57 dir1
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:54 file3.txt

dir2/dir1:
total 48
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file1.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file2.txt
-rw-r--r-- 1 piumarta dialout 2 Oct 26 05:57 file3.txt

Because dir2 already exists, dir1 is copied into dir2; the new copy of dir1 does not replace dir2.

Removing directories

The rmdir dir command removes the directory dir.

Try removing dir1.

$ rmdir dir1
rmdir: failed to remove 'dir1': Directory not empty

A directory must be empty before it can be removed.

You could remove the files dir1/file1.txt, dir1/file2.txt, and dir1/file3.txt one at a time but that would be tedious. Instead, remove all three at the same time using a wildcard. The path dir1/* expands to all three of the files in dir1. If you use rm -v dir1/* (-v for verbose) then each name will be printed as it is removed. Once the three files are removed you will be able to remove their parent directory dir1.

Use rm -v dir1/* to remove all the files in dir1.

$ ls dir1
file1.txt file2.txt file3.txt
$ rm -v dir1/*
removed 'dir1/file1.txt'
removed 'dir1/file2.txt'
removed 'dir1/file3.txt'
$ rmdir dir1
$ ls dir1
ls: cannot access 'dir1': No such file or directory

We still have dir2 which contains three files and a copy of the original dir1 (with three more files inside that directory). The * wildcard is less useful when removing this many files. Instead you can use rm -r (-r for recursive) which will remove the contents of a directory before removing the directory itself.

Use rm -r dir2 to remove dir2 and all of its contents.

$ ls -F dir2
dir1/ file1.txt file2.txt file3.txt
$ rm -r dir2
$ ls dir2
ls: cannot access 'dir2': No such file or directory

When you delete a file from the command line it is gone forever. There is no 'trash can' that collects deleted files. There is no way to restore a deleted file later if you change your mind.

Wildcards

In the exercises above the argument dir1/* matched all the filenames in dir1. The shell expanded the pattern dir1/* into three separate arguments: dir1/file1.txt, dir1/file2.txt, and dir1/file3.txt.

The * character actually matches any sequence of characters (zero or more) except /. You can use it to match 'anything' in a part of a filename. You can also use it more than once to match 'anything' in several different parts of a filename.

List all files in /etc that begin with b, that end with .conf, or that have a . anywhere in their name.

$ ls /etc/b*
/etc/baseprofile /etc/bash_completion
$ ls /etc/*.conf
/etc/nsswitch.conf
$ ls -d /etc/*.*
/etc/init.d /etc/nsswitch.conf /etc/rebase.db.i386 /etc/vimrc.less
/etc/minirc.dfl /etc/persistprofile.sh /etc/sessionsaliases.sh /etc/xmodmap.esc

Another useful wildcard character is ? which matches exactly one of any character (except /).

List all files in /etc that have an o and an f in their name separated by exactly one other character (it does not matter which character).

$ ls /etc/*o?f*
/etc/nsswitch.conf /etc/ssh_config

One more useful wildcard pattern is [chars] which matches exactly one of any of the chars listed between the square brackets.

List all files in /etc that have two consecutive vowels ('a', 'e', 'i', 'o', or 'u') in their name.

$ ls -d /etc/*[aeiou][aeiou]*
/etc/bash_completion /etc/defaults /etc/screenrc /etc/version
/etc/bash_completion.d /etc/group /etc/sessionsaliases.sh

When chars contains a range of consecutive characters, you can specify the entire range using “first-last”.

Use the “[first-last]” pattern to list all files in /etc whose name contains at least one digit.

$ ls -d /etc/*[0-9]*
/etc/X11 /etc/at-spi2 /etc/dbus-1 /etc/gtk-3.0 /etc/pkcs11 /etc/rebase.db.i386

The wildcard patterns explained above are expanded by the shell according to the files that actually exist in the filesystem. What happens if you use a wildcard pattern that does not match any files?

Try to delete some non-existent 'log' files: dir1/*.log.

$ rm dir1/*.log
rm: can't remove 'dir1/*.log': No such file or directory

If the wildcard pattern does not match any files, it is simply left unexpanded. When the command tries to access a file named by a wildcard expression, the file does not exist and an error message is generated.
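The wildcard patterns described above, including the case where nothing matches, can be explored in a directory whose contents are known in advance. This is a minimal sketch, assuming bash with its default settings; the file names are invented, and echo is used so that the expanded arguments are simply printed.

```shell
work=$(mktemp -d)
cd "$work"
echo 1 > file1.txt
echo 2 > file2.txt
echo 3 > file3.txt
echo 4 > file1.dat

any_txt=$(echo *.txt)        # * matches any sequence of characters
one_char=$(echo file?.dat)   # ? matches exactly one character
picked=$(echo file[13].*)    # [13] matches either 1 or 3
no_match=$(echo *.log)       # nothing matches, so the pattern is left unexpanded

echo "$any_txt"              # prints: file1.txt file2.txt file3.txt
echo "$one_char"             # prints: file1.dat
echo "$picked"               # prints: file1.dat file1.txt file3.txt
echo "$no_match"             # prints: *.log
```

The last line shows why rm complained about a file called '*.log': the unexpanded pattern was passed to the command as if it were an ordinary file name.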

Dry runs: using "echo" to preview commands

A 'dry run' is a rehearsal or practice that takes place before the real performance. In computing, a dry run shows you what a command would do but without actually doing it. For example, a dry run lets you see which files a wildcard pattern would match before you actually remove them.

For the next exercise, set up your dir1 directory as above, containing six files:

  • three text files file1.txt, file2.txt, and file3.txt, containing the words think, for, and yourself;
  • three data files file1.dat, file2.dat, and file3.dat, containing the number of characters in the corresponding .txt files.

$ mkdir dir1
$ echo think > dir1/file1.txt
$ echo for > dir1/file2.txt
$ echo yourself > dir1/file3.txt
$ wc -c dir1/file1.txt > dir1/file1.dat
$ wc -c dir1/file2.txt > dir1/file2.dat
$ wc -c dir1/file3.txt > dir1/file3.dat
$ ls -l dir1
total 3
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file1.dat
-rw-r--r-- 1 user UsersGrp 6 Oct 26 16:51 file1.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file2.dat
-rw-r--r-- 1 user UsersGrp 4 Oct 26 16:51 file2.txt
-rw-r--r-- 1 user UsersGrp 17 Oct 26 16:51 file3.dat
-rw-r--r-- 1 user UsersGrp 9 Oct 26 16:51 file3.txt

Use the echo command to perform a dry-run of removing:

  • all the .txt files in dir1,
  • all the .dat files in dir1,
  • the .txt and .dat files for only file2 (two files in total),
  • the .txt and .dat files for file1 and file3 (four files in total).

$ echo rm dir1/*.txt
rm dir1/file1.txt dir1/file2.txt dir1/file3.txt
$ echo rm dir1/*.dat
rm dir1/file1.dat dir1/file2.dat dir1/file3.dat
$ echo rm dir1/file2.*
rm dir1/file2.dat dir1/file2.txt
$ echo rm dir1/file[13].*
rm dir1/file1.dat dir1/file1.txt dir1/file3.dat dir1/file3.txt

Why is it called a 'dry run'?

Creating files and updating timestamps

The touch command updates the last modification time of an existing file to be the current date and time. If the file does not exist, an empty file is created.

Create two empty files called file1 and file2.

$ cd dir1
$ ls -lt file[12]
ls: file[12]: No such file or directory
$ touch file1 file2
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
$ touch file2
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
$ touch file1
$ ls -lt file[12]
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file1
-rw-r--r-- 1 user UsersGrp 0 Oct 26 18:33 file2

Note how touching a file moves it to the top of the 'most recent' list (ls -t).

Generating path names using brace expressions

Wildcards are used to match existing file names. They cannot be used to generate file names for non-existent files or directories, for example, to create a set of needed files or directories.

Try using a wildcard to create ten empty files called test0, test1, test2, …, test9.

$ touch test[0123456789]
$ ls test*
test[0123456789]

Creating a single file called test[0123456789] is not what you intended. That is what happened because the shell could not find any existing file to match the pattern test[0123456789] and so did not expand it in the command line.

A brace expression will generate multiple words based on a list or sequence of values. The list of values to generate is written between curly braces { and } with items in the list separated by commas. For example, the expression {a,b,c} generates three separate words a, b, and c. The brace expression can appear in a larger pattern, for example, the expression p{a,b,c}q generates three separate words paq, pbq, and pcq.

Use a brace expression to generate the command needed to create the five files test0.txt to test4.txt.

$ touch test{0,1,2,3,4}.txt
$ ls test*
test0.txt test1.txt test2.txt test3.txt test4.txt

When a sequence of numbers or letters is needed then the list can contain just the first and last values separated by '..'. This is called a sequence expression. For example, the sequence expression p{a..z}q generates a list of 26 words, starting with paq and pbq, and ending with pyq and pzq.

Use a brace expression to generate the command needed to create the five files test5.txt to test9.txt.

$ touch test{5..9}.txt
$ ls test*
test0.txt test1.txt test2.txt test3.txt test4.txt test5.txt test6.txt test7.txt test8.txt test9.txt

In a sequence expression that generates numbers, the first value in the sequence sets the minimum width of the generated numbers. This is useful if leading 0s are needed. For example, the following sequence expressions generate lists of 100 words:

  • test{0..99} generates test0, test1, … , test98, test99,
  • tt{000..099} generates tt000, tt001, … , tt098, tt099, and
  • t{00000..99} generates t00000, t00001, … , t00098, t00099.
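Because brace expressions do not depend on which files exist, they can be previewed safely with echo. The sketch below assumes bash version 4 or later (older versions do not pad sequence numbers with leading 0s); the variable names are chosen only for this example.

```shell
list=$(echo p{a,b,c}q)         # a list of values inside a larger pattern
letters=$(echo {a..e})         # a sequence of letters
numbers=$(echo test{8..11})    # a sequence of numbers, no padding
padded=$(echo t{08..11})       # the leading 0 sets a minimum width of two digits

echo "$list"                   # prints: paq pbq pcq
echo "$letters"                # prints: a b c d e
echo "$numbers"                # prints: test8 test9 test10 test11
echo "$padded"                 # prints: t08 t09 t10 t11
```

Replacing echo with touch or mkdir turns each preview into a command that creates the generated names.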

CSV files and the "cut" command

Text files are often used as simple 'databases' for storing captured sensor data, the results of data processing, etc. The shell provides several commands for manipulating data stored in this kind of text file.

A comma-separated value (CSV) file is one example of this kind of text file database. Each line is a record and each field in that record is separated from the next with a specified delimiter character. In a CSV file the delimiter is a comma, “,”.

The cut command selects and prints fields from exactly this kind of text file. By default it uses a 'tab' character to separate fields (just as a copy-paste operation between Excel and a text editor does) but this can be changed using a command line option. cut has the following command line options:

  • -d character specifies the delimiter character. To manipulate CSV files, use: “cut -d ,”.
  • -f fields tells cut which of the fields you want to print. Fields are numbered, starting at 1, and fields can list several field numbers separated by commas.
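The two options can be tried on a very small file first. This is a minimal sketch: people.txt is a hypothetical three-line file, created in a temporary directory, that holds a cut-down version of the kind of data used in these notes.

```shell
work=$(mktemp -d)
printf 'name,given,office\nKay,Alan,301\nKnuth,Donald,201\n' > "$work/people.txt"

# One field: the second column of every line.
one_field=$(cut -d , -f 2 "$work/people.txt")

# Several fields: columns 1 and 3, printed with the same delimiter between them.
two_fields=$(cut -d , -f 1,3 "$work/people.txt")

echo "$one_field"
echo "$two_fields"
```

The first command prints given, Alan, and Donald on separate lines; the second prints name,office followed by Kay,301 and Knuth,201.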

Create a CSV file called directory.txt that contains the following data. (The easiest way is to copy the text from this web page and paste it into a text editor, or into “cat > directory.txt” followed by Control+D to simulate end-of-file.)

name,given,office,phone,lab,phone
Adams,Douglas,042,0042,092,0092
Kay,Alan,301,3001,351,3051
Knuth,Donald,201,2001,251,2051
Lee,Tim,404,4004,454,4054
McCarthy,John,202,2002,252,2052
Shannon,Claude,304,3004,351,3051
Vinge,Vernor,302,3003,352,3053

Use the cut command to extract just the “office” column from the data.

$ cut -d , -f 3 directory.txt
office
042
301
201
404
202
304
302

The tail command has an option to print a file starting at a specific line number. The syntax is: “tail -n +number”. For example, “tail -n +5 file” will print the contents of file starting from the 5th line in the file.

Pipe (|) the output from the previous command into tail. Use the tail -n +number option to print the input starting at line number 2.

$ cut -d , -f 3 directory.txt | tail -n +2
042
301
201
404
202
304
302

The grep command understands wildcard patterns similar to those used by the shell. (The shell uses them to filter file names and grep uses them to filter or select lines of text.)

Each office number in our sample data is three digits long. The first digit says which floor the office is on. One way to extract just the office numbers on the second floor is to use grep to search for numbers matching the pattern “2[0-9][0-9]”. You can then count how many offices are on the second floor using “wc -l”.
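As a rough illustration of the second-floor example, the sketch below feeds the seven office numbers from the sample data into grep and counts the matching lines. The pattern is written inside single quotes as a precaution, so that the shell passes it to grep unchanged instead of trying to match it against file names.

```shell
# Office numbers from the sample data, one per line; count those on floor 2.
printf '042\n301\n201\n404\n202\n304\n302\n' | grep '2[0-9][0-9]' | wc -l
```

Only 201 and 202 match, so the count is 2. Numbers such as 042 and 302 contain a 2 but it is not followed by two more digits.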

Write a pipeline of commands that prints how many offices are located on the third floor. Try very hard to do this without looking at the sample answer. If you cannot find the solution, click on the link below to view the answer.

Sample answer

Summary

  • echo > file can be used to create a file containing a line of data.
  • touch file can be used to create an empty file or to update its modification time to 'now'.
  • mkdir directory creates a new directory.
  • cp oldfile newfile copies (duplicates) oldfile to newfile.
  • mv oldfile newfile moves (renames) a file or directory.
  • cp files… directory copies one or more files (or directories) into an existing directory.
  • mv files… directory moves one or more files (or directories) into an existing directory.
  • rm files… removes (deletes) files.
  • rmdir directory removes (deletes) a directory which must be empty.
  • rm -r directory removes (deletes) a directory and all its contents, recursively.
  • “*” in a file name matches zero or more characters, so “*.txt” matches all files ending in “.txt”.
  • “?” in a file name matches any single character, so “?.txt” matches “a.txt” but not “any.txt”.
  • “[characters]” in a file name matches any one of the characters, so “[aeiou].txt” matches “a.txt” but not “b.txt”.
  • “[first-last]” in a file name matches any character in the range first to last, so “*[a-m].txt” matches “boa.txt” but not “constrictor.txt”.
  • Wildcards (*, ?, []) are expanded by the shell to match files that already exist. They cannot generate new (non-existent) file names.
  • {a,b,c} expands to three words: a, b, and c.
  • p{a,b,c}q{x,y,z}r expands to nine words: paqxr paqyr paqzr pbqxr pbqyr pbqzr pcqxr pcqyr pcqzr
  • {000..5}.txt expands to six words: 000.txt 001.txt 002.txt 003.txt 004.txt 005.txt
  • tail -n +number displays input starting at line number (and continuing until the last line).
  • There is no 'trash': when a file or directory is deleted it is gone immediately and forever.
  • cut -d char -f fields prints the given fields from its input lines using char as the field delimiter. The fields are numbered from 1 and multiple field numbers are separated by commas.
2020/09/03 16:03

Week 08 — Loops, scripts

This week we will study manipulating multiple files using loops and creating new commands out of sequences of existing commands.

The large sensor data file for the in-class assignment can be downloaded like this: curl -O https://kuas.org/tmp/metar-2019.tgz

Once downloaded, unpack the contents using tar -xf metar-2019.tgz which will create a directory called metar-2019 containing 8752 files of weather sensor data.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.

2. Check your understanding of command concepts using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can.
  2. Revise the topics with the lowest scores, then update your scores.
  3. Repeat the previous step until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.

What you will learn from this class

  • How to perform an action on many different files.
  • How to use a loop to apply a command to many files.
  • How to use a variable in part of a larger name.
  • How to use wildcards to control which files a loop processes.
  • How to use interactive history to modify and/or repeat recent commands.
  • How to write a loop on a single line.
  • How to use redirection with loops.

Notes

The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.

Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.

Review

First make sure you understand the important topics from the previous two weeks. Click on this link to review what you should already know:

Review of previous weeks

In the notes below, follow along by typing all the commands shown in bold. Check that the output from your commands is similar to the output shown here.

Download some reference data

Download the file planets.tar from the course web site.

$ cd
$ curl -O https://kuas.org/tmp/planets.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20480  100 20480    0     0   485k      0 --:--:-- --:--:-- --:--:--  487k

The file is a 'tar' archive. Unpack the archive using the tar command with options -x to extract an archive, -v to be verbose about each file extracted, and -f to give the archive filename on the command line.
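If you would like to experiment with tar before unpacking the real archive, the following sketch builds a tiny archive in a scratch directory and then restores its contents. The directory and file names are made up for the example, and the options -c (create an archive) and -t (list an archive's contents) are additions that the exercises do not require.

```shell
# Work in a scratch directory so that no existing files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# Make a small directory and pack it into an archive (-c creates).
mkdir demo
echo one > demo/a.txt
echo two > demo/b.txt
tar -cf demo.tar demo

# List the archive contents without extracting anything (-t lists).
tar -tf demo.tar

# Delete the originals, then restore them from the archive (-x extracts).
rm -r demo
tar -xvf demo.tar
restored=$(cat demo/a.txt)
echo "demo/a.txt contains: $restored"

# Return to where we started and clean up.
cd "$OLDPWD"
rm -r "$scratch"
```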

$ tar -xvf planets.tar
planets/earth.dat
planets/jupiter.dat
planets/mars.dat
planets/mercury.dat
planets/moon.dat
planets/neptune.dat
planets/pluto.dat
planets/saturn.dat
planets/uranus.dat
planets/venus.dat

You can see from the output that a directory called planets was created and that all the new files are inside it. Change to the planets directory and then check the contents of one of the files using cat or less.

$ cd planets
$ cat earth.dat
Name                          Earth
Mass (10^24kg)                5.97
Diameter (km)                 12,756
Density (kg/m^3)              5514
Gravity (m/s^2)               9.8
Escape Velocity (km/s)        11.2
Rotation Period (hours)       23.9
Length of Day (hours)         24.0
Distance from Sun (10^6 km)   149.6
Perihelion (10^6 km)          147.1
Aphelion (10^6 km)            152.1
Orbital Period (days)         365.2
Orbital Velocity (km/s)       29.8
Orbital Inclination (degrees) 0.0
Orbital Eccentricity          0.017
Obliquity to Orbit (degrees)  23.4
Mean Temperature (C)          15
Surface Pressure (bars)       1
Number of Moons               1
Ring System?                  No
Global Magnetic Field?        Yes

The files contain tab-separated values with two columns. The first column describes the data on that line, and the second column contains the data value.

Check the first two lines of the files to see if they all look the same.

$ head -n 2 *.dat
==> earth.dat <==
Name                          Earth
Mass (10^24kg)                5.97

==> jupiter.dat <==
Name                          Jupiter
Mass (10^24kg)                1898

...etc...

==> uranus.dat <==
Name                          Uranus
Mass (10^24kg)                86.8

==> venus.dat <==
Name                          Venus
Mass (10^24kg)                4.87

Line 17 of every file should contain the mean temperature. Check line 17 of earth.dat using the combination of head -n 17 and tail -n 1 that was used earlier.

$ head -n 17 earth.dat | tail -n 1
Mean Temperature (C)          15

How would you check line 17 of all the files to make sure they contain the mean temperature?

The obvious way is to change earth.dat to *.dat in the command you just used. Will that work?

Try showing the 17th line of each file by running the command with earth.dat changed to *.dat.

$ head -n 17 *.dat | tail -n 1
Mean Temperature (C)          464

That's not right. We only saw the line for one planet. Which one was it? Use grep to find out.

$ grep 464 *.dat
venus.dat:Mean Temperature (C)          464

Why did you see only one line of output?

Answer

To print the 17th line of every file we need to use something more sophisticated: a loop.
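The single line of output can be reproduced on a small scale. The sketch below uses two made-up three-line files in a scratch directory (the names a.txt and b.txt are assumptions of the example) and asks for the second line of each in the same way.

```shell
scratch=$(mktemp -d)
printf 'a1\na2\na3\n' > "$scratch/a.txt"
printf 'b1\nb2\nb3\n' > "$scratch/b.txt"

# head prints the first two lines of each file, each group under a header line.
head -n 2 "$scratch/a.txt" "$scratch/b.txt"

# tail receives all of that as one stream, so it keeps only the very last line.
last=$(head -n 2 "$scratch/a.txt" "$scratch/b.txt" | tail -n 1)
echo "tail -n 1 printed: $last"

rm -r "$scratch"
```

The pipeline prints b2 only: the output of head for every file is joined together before tail sees it, so tail cannot tell where one file ends and the next begins.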

Running a command on multiple files using a loop

To print the 17th line of each file, what we want to do is this (in natural language):

  • for every file ending in .dat
    • print the last 1 line of the first 17 lines of the file

The shell can do this for us using a for loop. The syntax (or 'general form') of a for loop always looks like this:

for thing in list of things
do
    operation_on $thing
done

The word for is followed by a variable name (in this case thing), then the word in, and then a list of (space-separated) words. The list of words ends with a newline (or semicolon – see below) and the word do. One or more commands then follow, collectively called the body of the loop, ending with the word done. The commands in the body will be run as many times as there are words in the list. Each time the body commands are run, the variable will be set to the next item in the list (starting with the first).

Note that the parts in italics are not meant to be typed literally. They are descriptive 'placeholders' for some particular list of things that you want to operate on and some specific operation that you want to perform on those things. Let's make the loop print the 17th line of all the .dat files by

  • using *.dat for our list of things and
  • head -n 17 $thing | tail -n 1 for our operation_on.

Note also that the name of the variable thing is not important; what is important is that the name used after for matches the name used inside the loop to refer to each of the words in the list of things. Let's change the name thing to something more meaningful, such as filename:

for filename in *.dat
do
    head -n 17 $filename | tail -n 1
done

Try running the above command, exactly as it is shown. (If you make a mistake, or the shell gets confused about what you are typing, press Control-C to get back to the normal prompt.)

Note that the prompt changes to “>” as soon as you finish typing the first line. This is to remind you that you have not yet finished typing the complete for command. (A for loop is not complete until the shell sees the word done at the end.)

$ for filename in *.dat
> do
>    head -n 17 $filename | tail -n 1
> done

How did the shell know that the filename inside the loop was a variable, and not the name of a file? Because of the $ symbol at the beginning. Whenever a $ is followed by a name, the shell replaces the $name combination with whatever value is currently assigned to the variable with the given name. Without the $ in front of filename the head command would have tried to print the first 17 lines of the (non-existent) file literally called filename.

How did the shell know that the filename after the for is the name of a variable? Because the syntax of the for command says that the next thing in the command must always be the name of a variable. The $ is not needed (and is even wrong) because we do not want to replace filename with its value; we are just telling for the name of the variable it should set to each item in our list of things.

You can use the echo command to see exactly how the loop works and what it is doing to the variable.

Use echo to see how many times the loop is run and to see the value of filename each time the loop runs.

for filename in *.dat
do
    echo filename is $filename
done

Try moving the $ from the second filename to the first to see what changes.

  • What will be the output if you put a $ in front of both filenames?
  • What will be the output if you do not use $ at all in front of the filenames?

What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.

for filename in *.dat
do
    ls *.dat
done

What is the output of the following loop? Try to predict the output by looking at the code, then check your prediction by executing the loop.

for filename in *.dat
do
    ls $filename
done

Use a loop to make a backup copy of each of the planet .dat files. For each file x.dat, make a copy of that file called backup-x.dat. For example, earth.dat should be copied to a file called backup-earth.dat.

A single copy command such as the following will not work (try it if you like):

$ cp *.dat backup-*.dat
cp: target 'backup-*.dat' is not a directory

The correct solution follows the same pattern as printing the 17th line of every file. Of course, the operation should instead copy each file from “$filename” to “backup-$filename”.

Answer

Using variables as parts of filenames

The previous example shows how a variable is used to form part of a longer name. The filename variable is used to create the name backup-$filename. When filename is set to earth.dat, the longer name will be backup-earth.dat.

A problem arises when trying to append a letter or digit to a name stored in a variable. For this reason $filename can also be written ${filename}. Since the characters { and } cannot be part of a variable name, there is no possibility of ambiguity when this form is used inside a longer name next to a letter or a digit.

Delete your backup-* files. Then, for each file x.dat, create a backup file called x.dat2.

$ for name in *.dat
> do
>    cp $name $name2
> done
cp: missing destination file operand after 'earth.dat'
cp: missing destination file operand after 'jupiter.dat'
...etc...
cp: missing destination file operand after 'uranus.dat'
cp: missing destination file operand after 'venus.dat'

What is the problem? Use an echo command to print what the shell will do when it executes the cp command, like this:

$ for name in *.dat
> do
>    echo cp $name $name2
> done
cp earth.dat
cp jupiter.dat
...etc...
cp uranus.dat
cp venus.dat

What happened to earth.dat2, etc.?

Variable names start with a letter which is followed by any number of letters and digits. The shell thinks that the “2” is part of the variable name; in other words, that the name of the variable in “$name2” is “name2”. To fix this, use { and } around “name” to separate it from the “2”.

$ for name in *.dat
> do
>    cp $name ${name}2
> done
$ ls
earth.dat     mars.dat      moon.dat      pluto.dat    uranus.dat
earth.dat2    mars.dat2     moon.dat2     pluto.dat2   uranus.dat2
jupiter.dat   mercury.dat   neptune.dat   saturn.dat   venus.dat
jupiter.dat2  mercury.dat2  neptune.dat2  saturn.dat2  venus.dat2
$ rm *.dat2

Using wildcards

Wildcards (*, ?, and [...]) in a for loop's list of things are expanded as usual.

What would be the results of running each of the following commands?

for name in p*.dat
do
    echo $name
done

for name in *p*.dat
do
    echo $name
done

Predict the answers, then check them by actually running the commands.

Avoiding typing: interactive history

The up-arrow (or Control+p) and down-arrow (or Control+n) keys can be used to scroll through recent commands. The left-arrow (or Control+b) and right-arrow (or Control+f) keys let you move around inside a command. You can edit a previous command by deleting or inserting new content. Pressing Return re-runs the (edited) command.

If you try this on a for loop you will notice that the loop has been recorded on a single line. To do this the shell has inserted some semicolon “;” characters to separate the different parts of the loop. A semicolon has been inserted in approximately the places where a newline was in the original for loop.

When viewed in the history our loop looks like this:

$ for name in *.dat
> do
>    ls $name
> done
earth.dat
jupiter.dat
...etc...
uranus.dat
venus.dat
$ Control+P
$ for name in *.dat; do ls $name; done

Putting the entire loop on a single line

The general form of a single-line for loop is:

for thing in list of things ; do operation on thing ; ...etc... ; done

The semicolons take the place of newlines in the single-line version. Either or both of the semicolons can be replaced by newlines; the shell does not care whether you use semicolons or newlines.
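To see that the layout really makes no difference, here is the same small loop written three ways. The list of words (alpha and beta) is just an illustration and could be any list.

```shell
# All on one line, using semicolons.
for word in alpha beta; do echo "word is $word"; done

# The same loop using newlines instead of semicolons.
for word in alpha beta
do
    echo "word is $word"
done

# A mixture: a semicolon before do, and a newline before done.
for word in alpha beta; do
    echo "word is $word"
done
```

Each of the three loops prints the same two lines: “word is alpha” and then “word is beta”.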

Write the backup for loop again, all on one line.

for name in *.dat; do cp $name backup-$name; done

Delete the backup files. Add a command to echo the name of each file before copying it, still putting the entire for loop on a single line.

$ rm backup-*
$ for name in *.dat; do echo $name; cp $name backup-$name; done
earth.dat
jupiter.dat
mars.dat
mercury.dat
moon.dat
neptune.dat
pluto.dat
saturn.dat
uranus.dat
venus.dat
$ rm backup-*

Using redirection with loops

Let's print the 17th line of each file and redirect the output to another file.

The following will not work:

$ for name in *.dat; do
>    head -n 17 $name | tail -1 > lines.txt
> done
$ cat lines.txt
Mean Temperature (C)          464

The problem is that each time around the loop the > redirection truncates (empties) lines.txt before it writes the output from tail into it. There are two solutions to this problem.

The first solution is to use another redirection operator, >>. This operator appends lines to the output file instead of replacing its contents.

$ for name in *.dat; do
>    head -n 17 $name | tail -1 >> lines.txt
> done
$ cat lines.txt
Mean Temperature (C)          15
Mean Temperature (C)          -110
Mean Temperature (C)          -65
Mean Temperature (C)          167
Mean Temperature (C)          -20
Mean Temperature (C)          -200
Mean Temperature (C)          -225
Mean Temperature (C)          -140
Mean Temperature (C)          -195
Mean Temperature (C)          464

The second solution is to move the redirection outside the loop, so that the output of every command executed inside the loop is sent through a single output redirection.

$ for name in *.dat; do
>    head -n 17 $name | tail -1
> done > lines.txt
$ cat lines.txt
Mean Temperature (C)          15
Mean Temperature (C)          -110
Mean Temperature (C)          -65
Mean Temperature (C)          167
Mean Temperature (C)          -20
Mean Temperature (C)          -200
Mean Temperature (C)          -225
Mean Temperature (C)          -140
Mean Temperature (C)          -195
Mean Temperature (C)          464
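The three placements of the redirection can be compared side by side without the planet files. This is a minimal sketch that writes to made-up file names in a scratch directory and then counts the lines that each loop left behind.

```shell
scratch=$(mktemp -d)

# > inside the loop: the file is emptied on every pass.
for n in 1 2 3; do
    echo "line $n" > "$scratch/inside.txt"
done

# >> inside the loop: each pass appends one more line.
for n in 1 2 3; do
    echo "line $n" >> "$scratch/append.txt"
done

# > after done: the file is opened once and collects all of the output.
for n in 1 2 3; do
    echo "line $n"
done > "$scratch/outside.txt"

# Count the lines in each file (tr removes any padding that wc adds).
inside=$(wc -l < "$scratch/inside.txt" | tr -d ' ')
append=$(wc -l < "$scratch/append.txt" | tr -d ' ')
outside=$(wc -l < "$scratch/outside.txt" | tr -d ' ')
echo "inside: $inside  append: $append  outside: $outside"

rm -r "$scratch"
```

The counts are 1, 3, and 3. One practical difference between the two working solutions: running the >> loop a second time would add three more lines to the existing file, whereas the loop with > after done starts from an empty file each time it is run.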

Challenges

The echo command normally prints a newline character after its arguments. If you use the option -n then this newline is not printed. This lets you use several echo -n commands to print several things on the same line. In the following example a semicolon ; is used (instead of newline) to separate two echo commands. The first echo command uses the option -n to prevent it printing the newline. Try running these commands with and without the -n to see the difference.

$ echo -n hello; echo "," world
hello, world
$ echo hello; echo "," world
hello
, world

In a for loop, the operation that is performed inside the loop can be another for loop. (This is called nesting loops.) For example:

$ for digit in {1..3}; do for letter in {a,b}; do echo $digit $letter; done; done
1 a
1 b
2 a
2 b
3 a
3 b

Arithmetic expansion is performed on any text written inside double parentheses after a $ symbol, like this: “$((text))”. The entire expression (from “$” to the closing “)”) is replaced by the result of evaluating text as an arithmetic expression. Within text you can refer to variables without needing to use the $ prefix.

Some examples:

$ foo=32
$ echo foo plus ten is $((foo + 10))
foo plus ten is 42
$ N=1; for L in {a,b,c}; do echo $L$N; N=$((N+1)); done
a1
b2
c3

Write two nested for loops that print the following multiplication table:

1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100

Don't worry about properly lining up the columns.

Hint

The echo command understands an option -e that replaces certain sequences of characters with other characters. One replacement that this enables is to convert “\t” into a tab character. A tab moves the cursor forward to the next column that is a multiple of 8.

Modify your loops to line up the columns, like this:

1       2       3       4       5       6       7       8       9       10
2       4       6       8       10      12      14      16      18      20
3       6       9       12      15      18      21      24      27      30
4       8       12      16      20      24      28      32      36      40
5       10      15      20      25      30      35      40      45      50
6       12      18      24      30      36      42      48      54      60
7       14      21      28      35      42      49      56      63      70
8       16      24      32      40      48      56      64      72      80
9       18      27      36      45      54      63      72      81      90
10      20      30      40      50      60      70      80      90      100

Hint

Summary

  • A for loop repeats a command for every item in a list.
  • A for loop sets a variable to the next item in the list before running the loop body.
  • Use $name to expand a variable (i.e., get its value), or ${name} if there are letters or digits immediately after the variable.
  • Use the up-arrow key to scroll up through previous commands, then edit and/or repeat them.
  • for loops can be written on one line by replacing newlines with semicolons.
  • for loops can be nested by writing a loop as the body of another loop.

Week 09 — Expansions, conditionals

This week we will study while loops and if statements, several ways to test variable values and file properties, and some useful ways to manipulate the values stored in variables.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is mostly practice with some familiar shell commands and some new ones. The new commands are explained in the Notes section below, which also contains the practice exercises. (If you are already familiar with the command line, please at least skim the notes to make sure there are no small details that are new to you.) These commands and concepts will be further explored in the in-class assignment on Friday.

2. Check your understanding of the concepts using the self-assessment questionnaire

  1. Answer each question in the self-assessment questionnaire as honestly as you can.
  2. Revise the topics with the lowest scores, then update your scores.
  3. Repeat the previous step until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the “Notes” section.

What you will learn from this class

  • How to remove prefixes and suffixes when you fetch a variable's value.
  • How to replace a pattern with some other text when you fetch a variable's value.
  • How to replace part of a word with the output from a command.
  • How to replace part of a word with the result of an arithmetic expression.
  • How environment variables control your interaction with the command line.
  • How to write words (including filenames) that contain spaces.
  • How to use while loops to repeat commands while a condition is true.
  • How to use if statements to optionally run commands based on a condition.
  • How to use the test command to test the properties of files, strings, and numbers.
  • How to get help: using the man and help commands, and using the --help option with most commands.

Notes

The notes below include several exercises with answers that introduce new concepts. Many of these concepts will be used in this week's in-class assignment.

Read the notes and try to complete each of the exercises without looking at the sample answer. If you cannot complete an exercise using a few short commands then read the sample answer, practice it, and make sure you understand it before continuing.

Review

Make sure you understand the topics from last week. Click on the link below to expand a brief review.

Review

Exercises

As well as the indicated exercises, try typing in all the examples for yourself. If you can think of ways to modify the example to change the behaviour, try them. Exploration is the best way to learn.

Variables

Variables are used to store data. Variable names must begin with a letter which can be followed by any number of letters or digits. (The underscore “_” is treated as a letter.) Names that conform to these rules are legal (allowed by the rules); names that break these rules are illegal (not allowed by the rules).

Some examples of legal variable names:

Name                          Why it is legal
a                             starts with a letter
abcdef                        letter followed by any number of letters
a1b2c3                        letter followed by any number of letters or digits
FooBar999Baz                  letter followed by any number of letters or digits
_                             underscore _ is a letter too
_1234_                        letter followed by digits and a letter
LONG_VARIABLE_NAME_NUMBER_1   letter followed by lots of letters and a final digit

Some examples of illegal variable names:

Name          Why it is not legal
0             does not start with a letter
2things       does not start with a letter
x@y           @ is neither a letter nor a digit
final value   space is neither a letter nor a digit

You create or set a variable using the = assignment operator. The syntax (general form) of assignment is:

variableName=value

where variableName follows the rules explained above and value is a single word (such as a filename), number, etc., with no spaces. There must not be any space either side of the = symbol.

You get the value of a variable by writing a $ before the variable's name. For example:

$ metars=/tmp/metars-2019
$ echo $metars
/tmp/metars-2019

Again, there must be no space between the $ and the variable name.

Quoting

What if you want to put a space inside a value stored in a variable? You can protect spaces using quotation marks.

Single quotes around a value like this 'value' will protect everything inside the value. Wildcards (*, ?, etc.), dollar signs ($), and other special characters will be completely ignored. Spaces inside the value will be considered part of the value.

Double quotes around a value like this "value" will protect everything inside the value except for expansions (see below) introduced by the $ character. One such expansion is getting the value of a variable using $name.

$ foo='$woohoo $$$ * .* how about this?'    ← single quotes stop *, ?, and $ from being treated specially
$ echo '$foo'                               ← single quotes stop $ from being treated specially
$foo
$ echo " * $foo ? "                         ← double quotes allow $ to get the value of foo
 * $woohoo $$$ * .* how about this? ?       ← but * and ? wildcards are still ignored

If you want a value with spaces inside, use '…'. If you want a value with spaces inside and variables to be expanded, use "…".
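The difference between the two kinds of quote, and between quoting and not quoting, shows up clearly when a value contains several spaces in a row. A minimal sketch, using a made-up variable called planet:

```shell
planet=Earth

# Single quotes: nothing is expanded, and the spaces are preserved.
echo 'single: $planet   has   spaces'

# Double quotes: $planet is expanded, and the spaces are preserved.
echo "double: $planet   has   spaces"

# No quotes: $planet is expanded, but each run of spaces only separates
# arguments, so echo prints the words with a single space between them.
echo unquoted: $planet   has   spaces
```

The three lines printed are “single: $planet   has   spaces”, “double: Earth   has   spaces”, and “unquoted: Earth has spaces”.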

Expansions

The $ character is used to transform variables and other values in the command line by a process called expansion. There are several kinds of expansion:

  • variable expansion
  • parameter expansion
  • arithmetic expansion
  • command substitution

Variable expansion

A $ followed by a variable name expands to the value stored in the variable. (If the variable is not set to any value then the result is blank.) Braces { and } around the variable name are optional but are necessary when a variable expansion is followed immediately by a letter or digit that is not part of the variable name, as in the last example below.

$ metars=/tmp/metars-2019
$ echo $metars
/tmp/metars-2019
$ me=myself
$ echo $me
myself
$ echo ${metars}
/tmp/metars-2019
$ echo ${me}tars
myselftars

The brace syntax ${variable} also provides several mechanisms that modify the value retrieved from variable. Within a variable expansion with braces, a suffix (such as a file extension) can be removed by following the variable name with %suffix.

$ filename=2019-01-01T00:53:57-japan.txt
$ echo ${filename}
2019-01-01T00:53:57-japan.txt
$ echo ${filename%.txt}
2019-01-01T00:53:57-japan
$ echo ${filename%-japan.txt}
2019-01-01T00:53:57

A prefix can be removed by following the variable name with #prefix.

$ echo ${filename}
2019-01-01T00:53:57-japan.txt
$ echo ${filename#2019}
-01-01T00:53:57-japan.txt
$ echo ${filename#2019-??}
-01T00:53:57-japan.txt
$ echo ${filename#2019-??-??}
T00:53:57-japan.txt

In both cases (${name%pattern} and ${name#pattern}) you can use wildcards such as ? in the prefix or suffix.

$ echo ${filename}
2019-01-01T00:53:57-japan.txt
$ echo ${filename#2019-??}
-01T00:53:57-japan.txt
$ echo ${filename#2019-??-??}
T00:53:57-japan.txt
$ echo ${filename%:??-*}
2019-01-01T00:53
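When the pattern contains * it could match in more than one place. With a single % or # the shell removes the shortest suffix or prefix that matches. The sketch below shows this with a made-up file name that contains two dots.

```shell
file=report.final.txt

# % removes the shortest suffix matching .* which is .txt
echo "${file%.*}"

# # removes the shortest prefix matching *. which is report.
echo "${file#*.}"
```

The first echo prints report.final and the second prints final.txt.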

You can also replace a pattern anywhere in a value with some other text using /pattern/replacement after the variable name:

$ echo ${filename}
2019-01-01T00:53:57-japan.txt
$ echo ${filename/T/ at time }
2019-01-01 at time 00:53:57-japan.txt

The variable expansions above should be all you need in most cases, but there are several more that you might need to use occasionally. If you are interested, here is a table showing most of them. (Click on the 'link' to toggle the table.)

String operators available during ${...} expansion

Imagine that you are running out of disk space on your computer. You have a lot of 'lossless' music stored in .wav (Microsoft 'wave') files. You could halve the amount of space they use by converting them to .flac (free lossless audio codec) files. The program ffmpeg can do this for you. The syntax is:

ffmpeg -i input-filename.wav output-filename.flac

First, make some 'fake' .wav files like this:

for i in {1..9}; do echo $i > track-$i.wav; done

1. Write a “for wav in …” loop that echos the names of all the *.wav files in the current directory, one at a time.

track-1.wav
track-2.wav
  ...
track-9.wav

2. Change the echo command so that for every file it prints two things: the original name ($wav) as well as the name with the original .wav suffix removed.

track-1.wav track-1
track-2.wav track-2
  ...
track-9.wav track-9

3. Change the echo command so that for every file it prints two things: the original name ($wav) as well as the name with the original .wav suffix removed and a new .flac suffix added.

track-1.wav track-1.flac
track-2.wav track-2.flac
  ...
track-9.wav track-9.flac

4. Change the echo command so that for every .wav file in the current directory your loop prints: ffmpeg -i filename.wav filename.flac

The output of your loop should look like this:

ffmpeg -i track-1.wav track-1.flac
ffmpeg -i track-2.wav track-2.flac
  ...
ffmpeg -i track-8.wav track-8.flac
ffmpeg -i track-9.wav track-9.flac

(If you had some genuine .wav files, and a copy of the ffmpeg program, you could remove the echo from your loop body and it really would convert all the .wav files to .flac for you.)

Answer

Parameter expansion

Parameters are the values passed to a shell script on the command line. Whereas variables are named, parameters are numbered starting at 1. (If you ever happen to need it, $0 is the name of the shell script exactly as it appeared on the command line.)

There are three other special variables that are useful inside shell scripts. $# expands to the number of command-line arguments, and both $@ and $* expand to a sequence containing all of the command-line arguments separated by spaces.

Parameter Meaning
$1 The first command-line argument
$2 The second command-line argument
(and so on…)
$# The number of command line arguments
$@ All of the command line arguments
$* All of the command line arguments
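As a small illustration of the table above, here is a minimal sketch. The function name show_params is made up for this example. It relies on the fact that, inside a shell function, $1, $2 and $# refer to the function's own arguments in the same way that they refer to the command-line arguments inside a script.

```shell
#!/bin/sh
# show_params is a hypothetical helper, not part of the course material.
# Inside a function, $1, $2 and $# describe the function's own arguments.
show_params() {
  echo "first: $1"     # the first argument
  echo "second: $2"    # the second argument
  echo "count: $#"     # how many arguments were given
}

show_params apple banana cherry
```

With the three arguments shown, the last line printed should be "count: 3".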

Write a shell script that prints a single number showing how many command-line arguments it is run with. (Don't forget you have to make it executable using chmod +x filename before you can run it.)

Answer

The variables $@ and $* behave differently when quoted. To illustrate the difference, consider the following script:

#!/bin/sh

echo 'using "$@":'
for argument in "$@"; do
  echo "$argument"
done

echo 'using "$*":'
for argument in "$*"; do
  echo "$argument"
done

Running this script with three command-line arguments one, "two too", and three produces this result:

$ ./script one "two too" three
using "$@":
one
two too
three
using "$*":
one two too three

Create the script shown above. Run it with arguments one "two too" three. Run it with other arguments, including no arguments.

You can see that "$*" expands to a list of command-line arguments all inside one pair of double quotes ("). In other words, "$*" is one single value containing all of the command line arguments.

On the other hand, "$@" expands to a list of command-line arguments where each separate argument is inside a pair of double quotes ("). In other words, "$@" is one value per argument, each value containing a quoted version of the corresponding argument.

Expansion Equivalent
"$*" Single value containing all arguments: "$1 $2 $3 $4 …"
"$@" Multiple values, one per argument: "$1" "$2" "$3" "$4" …

In a for loop you should almost always use "$@" (to repeat the loop for each argument).

for argument in "$@"; do some_operation_on "$argument"; done

When assigning to a variable, prefer "$*" because the result is a single value. Most shells are clever enough to let you use either.

all_arguments="$*"
all_arguments="$@"
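Another way to see the difference between the two quoted forms is to count how many times a for loop runs. The sketch below uses two hypothetical functions (count_with_at and count_with_star) that loop over their own arguments and print the number of iterations.

```shell
#!/bin/sh
# Both function names are made up for this illustration.
count_with_at() {
  n=0
  for argument in "$@"; do n=$((n+1)); done   # one iteration per argument
  echo $n
}

count_with_star() {
  n=0
  for argument in "$*"; do n=$((n+1)); done   # all arguments joined into one value
  echo $n
}

count_with_at one "two too" three      # prints 3
count_with_star one "two too" three    # prints 1
```

The loop over "$@" sees three separate values, while the loop over "$*" sees a single value containing all of the arguments.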

Arithmetic substitution

You can evaluate arithmetic expressions by enclosing them in double parentheses preceded by a $ character: $((expression)) Within the expression you can use the normal arithmetic operators and the names of variables (without a $ in front of them).

$ echo $((2+4*10))
42
$ two=2
$ ten=10
$ echo $((two+4*ten))
42
$ total=0
$ for n in {1..10}; do total=$((total+n)); done
$ echo $total
55
$ n=1
$ for word in one two three four; do echo $n $word; n=$((n+1)); done
1 one
2 two
3 three
4 four
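Shell arithmetic works with whole numbers only: the / operator discards any remainder and the % operator gives the remainder. The following sketch uses both to convert a number of seconds into minutes and seconds. The function name to_min_sec is an assumption made for this example.

```shell
#!/bin/sh
# to_min_sec is a hypothetical helper: it converts seconds to "Nm Ns".
to_min_sec() {
  seconds=$1
  # seconds / 60 is the whole number of minutes, seconds % 60 is what is left over
  echo "$((seconds / 60))m $((seconds % 60))s"
}

to_min_sec 200   # prints 3m 20s
```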

Modify your shell script from the previous exercise so that it prints each command-line argument preceded with its number, starting at 1. For example:

$ ./script one "two too" three
1 one
2 two too
3 three

Answer

Write a shell script called factorial that calculates the factorial of its command line argument. Recall that factorial(n) = n * (n-1) * (n-2) * … * 1.

$ ./factorial 5
120

Answer

Command substitution

Sometimes you will need to store the output of a command in a variable, or use the output of one command as an argument to another command. Command substitution provides a way to do this.

The pattern $(command) is replaced with the output from running command. Note that command can include command-line options and arguments, and can even be a pipeline made from several commands. The result can be used to set the value of a variable. In the following examples, note the use of double quotation marks around the command substitutions to protect any spaces in the output from the commands.

$ ls | wc -l
8752
$ pwd
/Users/piumarta/metar-2019
$ numFiles="$(ls | wc -l)"
$ dirName="$(pwd)"
$ echo there are $numFiles files in the directory $dirName
there are 8752 files in the directory /Users/piumarta/metar-2019

Another way of doing the same thing, without variables, is to use the command substitutions directly where their output is needed:

$ ls | wc -l
8752
$ pwd
/Users/piumarta/metar-2019
$ echo there are $(ls | wc -l) files in the directory $(pwd)
there are 8752 files in the directory /Users/piumarta/metar-2019
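Command substitution works with any command that prints a result. As a further sketch, the hypothetical function describe_path below uses the standard dirname and basename commands (which are not otherwise covered in these notes) to split a path into its directory part and its file part, storing each result in a variable.

```shell
#!/bin/sh
# describe_path is a made-up name for this illustration.
describe_path() {
  dir="$(dirname "$1")"      # everything before the last /
  file="$(basename "$1")"    # everything after the last /
  echo "file $file lives in directory $dir"
}

describe_path /usr/bin/env   # prints: file env lives in directory /usr/bin
```

The double quotes around each substitution protect any spaces that might appear in the path.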

Write a shell script called nfiles.sh that prints the number of files in each of the directories written on the command line followed by the name of the directory.

$ ./nfiles.sh . /bin /usr/bin
42 .
124 /bin
1486 /usr/bin

(Of course, your results will differ.)

Answer

Control structures

A for loop is executed once for each member of a list of items. Other control structures include the while loop that executes until a condition becomes false, the until loop that executes until a condition becomes true, and the if statement that conditionally executes (or not) a sequence of commands.

While loop

The syntax (general form) of a while loop is

while TEST
do
  COMMANDS
done

or on a single line like this:

while TEST ; do COMMANDS ; done

The COMMANDS part works exactly like it does in a for loop. The TEST part should be a command that can either succeed or fail. The while loop will continue to run its TEST and the COMMANDS until the TEST fails.

The ''test'' command

A useful command to use for the TEST part of a while loop is test, which can do many things. One thing test can do is compare two numbers.

Command           Succeeds if…   Example                Meaning of the example
test LHS -lt RHS  LHS <   RHS    test $num -lt $limit   $num is less than $limit
test LHS -le RHS  LHS <=  RHS    test $num -le 0        $num is zero or negative
test LHS -eq RHS  LHS =   RHS    test $num -eq 0        $num is zero
test LHS -ne RHS  LHS =/= RHS    test $num -ne -1       $num is not -1
test LHS -ge RHS  LHS >=  RHS    test $num -ge 0        $num is non-negative
test LHS -gt RHS  LHS >   RHS    test $num -gt 0        $num is positive

Combining a while loop with test and arithmetic expansion to update a counter:

$ counter=0
$ while test $counter -lt 5; do
> echo $counter
> counter=$((counter+1))
> done
0
1
2
3
4
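The same pattern (a counter, a test, and arithmetic expansion) can accumulate a result. The sketch below adds up the numbers from 1 to a given limit. The function name sum_to is an assumption made for this example.

```shell
#!/bin/sh
# sum_to is a hypothetical helper: it prints 1 + 2 + ... + limit.
sum_to() {
  limit=$1
  total=0
  i=1
  while test $i -le $limit; do   # keep going while i <= limit
    total=$((total+i))
    i=$((i+1))
  done
  echo $total
}

sum_to 10   # prints 55
```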

If statement

The if statement conditionally executes a sequence of commands. The syntax of if statements is:

if TEST
then
  COMMANDS
fi

or on a single line like this:

if TEST ; then COMMANDS ; fi

The COMMANDS will be run only if the TEST succeeds. Using the test command again for the TEST:

$ n=3
$ if test $n -lt 5; then
> echo $n is less than 5
> fi
3 is less than 5

Another form of the if statement provides a second set of commands to be run if the TEST fails.

if TEST
then
  COMMANDS1
else
  COMMANDS2
fi

or on a single line like this:

if TEST ; then COMMANDS1 ; else COMMANDS2; fi

First the TEST command is run. If TEST succeeds then COMMANDS1 are run. If TEST fails then COMMANDS2 are run.

$ n=7
$ if test $n -lt 5; then
> echo $n is less than 5
> else
> echo $n is not less than 5
> fi
7 is not less than 5
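An if statement with an else part is a natural way to choose between two values. As a minimal sketch, the hypothetical function larger below prints whichever of its two numeric arguments is bigger.

```shell
#!/bin/sh
# larger is a made-up name for this illustration.
larger() {
  if test "$1" -gt "$2"; then
    echo "$1"    # the first argument is bigger
  else
    echo "$2"    # the second argument is bigger, or they are equal
  fi
}

larger 3 8    # prints 8
larger 12 5   # prints 12
```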

The test command can also check the properties of a file or directory, the size of a string, or the relationship between two strings.

Command Succeeds if…
test -d FILE FILE exists and is a directory
test -e FILE FILE exists
test -f FILE FILE exists and is a regular file
test -r FILE FILE is readable
test -s FILE FILE exists and is non-empty
test -w FILE FILE is writable
test -x FILE FILE is executable
test FILE1 -nt FILE2 FILE1 is newer than FILE2
test FILE1 -ot FILE2 FILE1 is older than FILE2
test -z STRING STRING is empty
test -n STRING STRING is not empty
test STRING1 = STRING2 the strings are equal
test STRING1 != STRING2 the strings are not equal
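test STRING1 \< STRING2 STRING1 comes before STRING2 in dictionary order (the \ stops the shell treating < as a redirection)
test STRING1 \> STRING2 STRING1 comes after STRING2 in dictionary order (the \ stops the shell treating > as a redirection)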
test -v VAR the shell variable named VAR is set

Combining the test for a directory with the if statement:

$ if test -d subdir; then
> echo subdir already exists
> else
> echo creating subdir
> mkdir subdir
> fi
creating subdir
$ if test -d subdir; then
> echo subdir already exists
> else
> echo creating subdir
> mkdir subdir
> fi
subdir already exists
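The string tests from the table above combine with if in the same way. The sketch below uses test -z to supply a default when no name is given. The function name greet and the word "stranger" are assumptions made for this example.

```shell
#!/bin/sh
# greet is a hypothetical helper.
greet() {
  name="$1"
  # The double quotes matter: an empty $name must still count as one (empty) argument.
  if test -z "$name"; then
    echo "hello, stranger"
  else
    echo "hello, $name"
  fi
}

greet Alice   # prints: hello, Alice
greet         # prints: hello, stranger
```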

Modify your nfiles.sh script so that it checks each command-line argument. If the argument is a directory, the script prints the number of files in the directory followed by the argument (as before). If the argument is not a directory, the script prints '?' and then the argument.

$ ./nfiles.sh . /bin /usrbin /bin/ls
43 .
124 /bin
? /usrbin
? /bin/ls

Hint: instead of using two echo commands, set a variable (e.g., n) to either the number of files in the directory or the value '?'. At the end of your loop use a single echo command to print n and then the argument.

Answer

Modify your nfiles.sh script so that it checks each command-line argument. If the argument is a directory, the script prints the number of files in the directory followed by the argument (as before). If the argument is a regular file, the script prints 'F' and then the argument. If the argument is neither a directory nor a file (e.g., it does not exist) then the script prints '?' followed by the argument.

$ ./nfiles.sh . /bin /usrbin /bin/ls
43 .
124 /bin
? /usrbin
F /bin/ls

Hint: the commands in the else part of your if statement should include another if statement that tests whether the non-directory argument is a regular file (test -f). This second if selects between 'F' for a file or '?' for everything else.

Answer

The meanings of the above test forms can be inverted by placing a ! (“not”) in front of them.

Command Succeeds if…
test ! EXPR EXPR fails (is false)

Combining if with the test for a directory (-d) and inverting it (!) to mean “the directory does not exist”:

if test ! -d subdir; then   # subdir does not exist, so...
  mkdir subdir              # make it
fi

You can combine two or more test forms with logical “and” or logical “or”:

Command Succeeds if…
test EXPR1 -a EXPR2 both EXPR1 and EXPR2 succeed (are true)
test EXPR1 -o EXPR2 either EXPR1 or EXPR2 succeeds (is true)

To check if your log file exists as a regular file (-f) and (-a) is writable (-w):

if test -f logfile -a -w logfile; then
  echo logfile is a regular file and is writable
fi
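The -a operator is equally useful with numeric comparisons. As a sketch, the hypothetical function in_range below checks whether a number lies between 1 and 10 (the limits are arbitrary choices for this example).

```shell
#!/bin/sh
# in_range is a made-up name; the limits 1 and 10 are arbitrary.
in_range() {
  # succeeds only if both comparisons succeed
  if test "$1" -ge 1 -a "$1" -le 10; then
    echo yes
  else
    echo no
  fi
}

in_range 7    # prints yes
in_range 42   # prints no
in_range 0    # prints no
```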

Shorthand for ''test''

Many shells have an alternative version of test called [ (open square bracket). Instead of test expression you can write [ expression ] which looks quite a lot nicer. Note that you must put spaces on both sides of the opening “[” and another before the final “]”.

$ numFiles=$(ls | wc -l)
$ echo $numFiles
43
$ while [ ${#numFiles} -lt 5 ]; do   # make numFiles be five characters wide, padded with '0's on the left
> numFiles="0$numFiles"              # add a '0' to the left of numFiles
> done
$ echo $numFiles
00043

Modify your nfiles.sh script so that it prints the first item on each line (the number of files, or an 'F' or a '?') right-justified in a field 5 characters wide. Use spaces to pad the number (or 'F' or '?') on the left to the required width.

$ ./nfiles.sh . /bin /usrbin /bin/ls
   43 .
  124 /bin
    ? /usrbin
    F /bin/ls

Answer

Other commands as loop/conditional tests

Many commands can be used as the test or condition in a loop or if statement. For example, grep succeeds if it finds a match and fails if it cannot find a match.

if grep -q -s pattern files... ; then
  echo I found the pattern in the files.
else
  echo The pattern does not occur in the files.
fi

(-q tells grep not to print any output, and -s tells grep not to complain about missing files.)
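To try this out without touching your own files, the sketch below creates a temporary file, searches it, and then removes it. The function name has_word is made up for this example, and mktemp (which creates an empty temporary file and prints its name) is assumed to be available, as it is on most Linux and Mac systems.

```shell
#!/bin/sh
# has_word is a hypothetical helper: has_word PATTERN FILE
has_word() {
  if grep -q -s "$1" "$2"; then
    echo found
  else
    echo missing
  fi
}

notes="$(mktemp)"                    # assumption: mktemp is installed
echo "the quick brown fox" > "$notes"

has_word fox "$notes"                # prints found
has_word cat "$notes"                # prints missing
has_word fox /no/such/file           # prints missing (and -s hides the error message)

rm "$notes"
```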

See Finding information about commands and programs below for different ways to look for information about success/failure of commands and their other options that help when using them as tests in loops and if statements.

Infinite loops

Two built-in commands help with infinite loops.

Command Succeeds
true always
false never

The following while loop will never stop. (If you try it, type Control+C to make it stop.)

while true; do
  echo are you bored yet?
  sleep 1
done

The following while loop will stop immediately and never execute the echo command.

while false; do
  echo this cannot happen
done

One use of true and false is to set a flag in a shell script to affect an if statement later on.

USE_LOGFILE=true   # true ⇒ use log file; false ⇒ don't

if $USE_LOGFILE; then
  echo "Running analysis at $(date)" >> logfile.txt
fi
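A flag can also be changed while a script runs. In the sketch below, the variable VERBOSE and the function log are hypothetical names: log prints its message only when the flag currently holds true.

```shell
#!/bin/sh
# VERBOSE and log are made-up names for this illustration.
VERBOSE=false

log() {
  # $VERBOSE expands to the command "true" or "false", which if then runs
  if $VERBOSE; then
    echo "log: $1"
  fi
}

log "this is not printed"
VERBOSE=true
log "this is printed"      # prints: log: this is printed
```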

Stopping or restarting loops: ''break'', and ''continue''

You can break out of a while or for loop using the break command. You can jump back to the test at the start of a while loop using the continue command. Inside a for loop, the continue command restarts the loop body with the loop variable set to the next item in the list of items.

$ for i in {1..10}; do
> if [ $i -eq 5 ]; then break; fi      # break out of the loop if i = 5
> if [ $i -eq 3 ]; then continue; fi   # restart the loop if i = 3
> echo $i
> done
1
2
4
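As a slightly more practical sketch, the hypothetical function sum_until_zero below adds up its arguments. It uses continue to skip negative numbers and break to stop at the first zero. Treating zero as an "end of data" marker is an assumption made for this example.

```shell
#!/bin/sh
# sum_until_zero is a made-up name for this illustration.
sum_until_zero() {
  total=0
  for n in "$@"; do
    if [ "$n" -eq 0 ]; then break; fi      # a zero ends the loop early
    if [ "$n" -lt 0 ]; then continue; fi   # skip negative numbers
    total=$((total+n))
  done
  echo $total
}

sum_until_zero 3 -1 4 0 5   # prints 7 (3 + 4; the -1 is skipped and the 5 is never reached)
```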

Modify your nfiles.sh script so that it uses a flag to remember whether any arguments were non-directories. If at least one argument was not a directory (it was a regular file, or did not exist) then print a message at the end of the script saying: Warning: non-directories were encountered.

$ ./nfiles.sh . /bin
   43 .
  124 /bin
$ ./nfiles.sh . /bin /usrbin /bin/ls
   43 .
  124 /bin
    ? /usrbin
    F /bin/ls
Warning: non-directories were encountered

Answer

Stopping a script or shell: ''exit''

You can terminate a shell script (or your interactive shell session) using exit.

if test ! -d data; then
  echo "data directory does not exist: giving up"
  exit 1
fi

The argument to exit is optional and should be a number. 0 is success and non-zero is failure. This allows scripts to control loops and conditionals, as part of their TEST, by returning success or failure from the entire script.
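One way to experiment with exit without closing your own shell is to run it inside a subshell. In the sketch below the body of the hypothetical function check_positive is written between parentheses ( ) instead of braces, which makes it run in a subshell, so exit ends only that subshell. This use of parentheses is not covered elsewhere in these notes and is shown only as an illustration.

```shell
#!/bin/sh
# check_positive is a made-up name. The ( ) around its body run it in a
# subshell, so "exit" ends the subshell and not the calling shell.
check_positive() (
  if test "$1" -le 0; then
    exit 1    # failure: the number is zero or negative
  fi
  exit 0      # success
)

if check_positive 5; then echo OK; else echo KO; fi    # prints OK
if check_positive -3; then echo OK; else echo KO; fi   # prints KO
```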

Write a short script called exit0.sh that immediately uses exit 0 to terminate its own execution.

Answer

Write another short script called exit1.sh that immediately uses exit 1 to terminate its own execution.

Answer

Use an if statement to verify which script 'succeeds' and which script 'fails'.

$ if ./exit0.sh; then echo succeeded; else echo failed; fi
$ if ./exit1.sh; then echo succeeded; else echo failed; fi

Which exit value represents 'success'?

Which exit value represents 'failure'?

Answer

Modify nfiles.sh so that it succeeds if all arguments were directories and fails if any arguments were non-directories. Test whether it works using an if statement on the command line.

$ if ./nfiles.sh . /bin; then echo OK; else echo KO; fi
   43 .
  124 /bin
OK
$ if ./nfiles.sh . /bin /usrbin /bin/ls; then echo OK; else echo KO; fi
   43 .
  124 /bin
    ? /usrbin
    F /bin/ls
Warning: non-directories were encountered
KO

Answer

Command and filename completion

You can save a lot of time by typing the first few characters of a filename and then pressing the Tab key. The shell will try to find a file matching what you typed, and then 'complete' the part of the filename that you did not type. If there is more than one matching file, the shell will complete up to the point where the file names diverge. If there is only one matching file, the shell will complete the entire filename and then add a space at the end.

$ touch a-file-with-a-very-long-name
$ ls a-                                  # press the Tab key
$ ls a-file-with-a-very-long-name        # the shell completes the name
a-file-with-a-very-long-name
$ touch a-file-with-an-equally-long-name
$ ls a-                                  # press the Tab key to complete the name
$ ls a-file-with-a                       # press Tab again to list the matching files
a-file-with-an-equally-long-name  a-file-with-a-very-long-name
$ ls a-file-with-a                       # the command line remains in the same state

Finding information about commands and programs

Programs such as test (and many others) have a large number of command line options. Don't bother trying to memorise more than two or three of the most useful options. Instead, know where to look up information when you need it. There are several ways to find information about a command, depending on the kind of command it is.

Use ''help'' to learn about built-in commands

(Note: MobaXterm has its own non-standard help command that does not work as shown below.)

$ help true
true: true
    Return a successful result.

    Exit Status:
    Always succeeds.
$ help help
help: help [-dms] [pattern …]
    Display information about builtin commands.

    Displays brief summaries of builtin commands. If PATTERN is
    specified, gives detailed help on all commands matching PATTERN,
    otherwise the list of help topics is printed.

    Options:
      -d        output short description for each topic
      -m        display usage in pseudo-manpage format
      -s        output only a short usage synopsis for each topic matching
                PATTERN

    Arguments:
      PATTERN   Pattern specifiying a help topic

    Exit Status:
    Returns success unless PATTERN is not found or an invalid option is given.

Using help you can find information about the syntax of loops and conditionals, the options understood by echo and other commands, and even obtain a list of all the builtin commands by typing help with no arguments.

Notice the last section, “Exit Status”. This tells you when the command will 'succeed' and when it will 'fail'. You can use the command as a TEST in a loop or if statement to check its “exit status” and therefore to test for whatever situation affects that status, according to the description of the command.

Use ''man'' to read the manual page for most programs

Commands that are not builtin to the shell usually have a manual page. Use man command to read the manual page describing command. Use man -k keyword to see a list of manual pages related to the given keyword. (Note that the version of man used by MobaXterm does not provide the -k keyword option.)

$ man ls
LS(1)                         User Commands                         LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]… [FILE]…

DESCRIPTION
       List information about the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

       -a, --all
              do not ignore entries starting with .

       ...etc...

Note that the manual page for a command that can 'succeed' or 'fail' (and which is therefore useful in loop and if statement tests) will almost always include an “Exit Status” section describing what situations you can test for using the command.

Asking programs for help

Many programs respond to the option -h or -help or --help by printing brief instructions about how to use that program.

$ cat --help
Usage: /bin/cat [OPTION]… [FILE]…
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help               display this help and exit
      --version            output version information and exit

Examples:
  /bin/cat f - g  Output f's contents, then standard input, then g's contents.
  /bin/cat        Copy standard input to standard output.

Commands that are useful as TESTs will generally tell you about their “exit status” too. For example, on my computer, the output from grep --help includes the following two lines:

Exit status is 0 if any line is selected, 1 otherwise;
if any error occurs and -q is not given, the exit status is 2.

Summary

  • A variable name begins with a letter, followed by zero or more letters or digits.
  • The underscore _ is a letter.
  • Assign a value to a variable using name=value.
  • To protect spaces inside a word or value, use 'single quotes' or "double quotes".
  • Inside 'single quotes' neither “$” expansions nor wildcards work.
  • Inside "double quotes" variables and other “$” expansions are performed, but wildcards are ignored.
  • Variable expansion: get a variable's value using $name or ${name}.
  • Remove prefixes with ${name#pattern} and suffixes with ${name%pattern}.
  • In a script, access command-line arguments using $1 for the first argument, $2 for the second, and so on.
  • In a script, use "$@" to obtain a list of all the arguments (with spaces inside individual arguments preserved).
  • Arithmetic substitution: $((expression)) expands to the result of evaluating the given arithmetic expression.
  • Command substitution: $(command) expands to the result of running command (which can be a pipeline).
  • A while loop performs some commands until a 'test' program or command fails.
  • The if statement conditionally runs some commands based on the success of a 'test' or 'condition' command.
  • The test command implements many kinds of tests and then succeeds or fails in a way useful for while and if.
  • [ expression ] is shorthand for test expression.
  • Many commands can be used as the 'test' or 'condition' in an if statement or loop.
  • In a loop, break exits from the loop immediately and continue restarts the loop immediately.
  • In a script, exit terminates the entire script immediately. (Use it to terminate your shell, too.)
  • Press Tab to complete a filename. Press it again to see a list of possible completions.
  • Use help command, man program, and program --help to learn about commands and programs.
2020/09/03 16:03

Week 10 — The Internet

This week's topic is the Internet: what it is and how it works.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class quiz on Friday.

What you will learn from this class

  • What the Internet is.
  • What a Network Interface Card is and how NICs are identified by their MAC addresses.
  • How wires, cables, and WiFi connect devices together to form a Local Area Network.
  • How devices on the Internet are identified by their IP address.
  • How the Domain Name System translates names into numeric IP addresses.
  • How data is broken into packets and routed through the Internet from sender to destination.
  • The difference between the Transmission Control Protocol and the User Datagram Protocol.
  • How reliability and performance tradeoffs can be made by choosing TCP or UDP.
  • How to obtain information about your computer's NIC, IP address, DNS server, etc.
  • How to use ping, traceroute, ifconfig, and nslookup to diagnose network problems.

Preparation

This week's preparation is to watch some short videos about networks and the Internet.

Introduction

Begin by watching four short videos (averaging about 5.5 minutes each) on YouTube. These videos present an easy introduction to the topic. They also have very good captions in Japanese. To turn the captions on, click the “Subtitles/Captions” button at the bottom-right of the video window. Next to that button is a “settings” button (it looks like a cog wheel) where you can change the captions to your preferred language.

1. What is the Internet? (3.5 minutes)

https://www.youtube.com/watch?v=Dxcc6ycZ73M&list=PLzdnOPI1iJNfMRZm5DDxco3UdsFegvuB7&index=1

2. Wires, cables, and WiFi (6.5 minutes)

https://www.youtube.com/watch?v=ZhEf7e4kopM&list=PLzdnOPI1iJNfMRZm5DDxco3UdsFegvuB7&index=2

3. IP addresses and DNS (6.5 minutes)

https://www.youtube.com/watch?v=5o8CwafCxnU&list=PLzdnOPI1iJNfMRZm5DDxco3UdsFegvuB7&index=3

4. Packets, routing, and reliability (6 minutes)

https://www.youtube.com/watch?v=AYdF7b3nMto

If you do not understand some word or phrase then pause the video and use Google, Wikipedia, or even YouTube to find a brief explanation online. If the explanation does not help (has too much or too little information, or is badly written) move rapidly on to a different explanation. If you cannot find any explanation that makes sense, ask on Teams for recommendations.

Technical details

Deepen your knowledge of these ideas by watching the following videos. The total time is about 30 minutes. Use the summaries below to make sure you understood the important ideas presented in each video.

1. NICs (3.5 minutes)

https://www.youtube.com/watch?v=oo-tn17rUBo

Network Interface Cards (NICs) connect computers, printers, and other devices to a network. A single device can have multiple connections to one or more networks. NICs can have connectors for electrical “copper” Ethernet wires (RJ45) or light “optical fibre” (SFP, Small Form-factor Pluggable). Every NIC has a unique, hard-coded (cannot be changed) Media Access Control (MAC) address. This address is used to identify pairs of NICs that are communicating directly with each other on a local area network (LAN).

2. Local area network devices (7 minutes)

https://www.youtube.com/watch?v=1z0ULvg_pW8

A hub connects all devices on a network together. A hub is not intelligent: it copies every packet received to all the other connected devices. This creates unnecessary traffic and wastes bandwidth.

A switch is intelligent: it learns the physical (MAC) addresses of all the devices connected to it. A switch sends received packets only to the intended destination device. Switches reduce unnecessary traffic on the network. Hubs and switches form a local area network (LAN), and communication is always direct between two devices based on MAC addresses.

A router communicates both with a LAN and with another external “wide area” network (WAN), usually the Internet. The Internet uses IP (Internet Protocol) addresses. When a router receives a packet, the destination IP address determines whether the packet is meant for the LAN or the WAN. If the packet is meant for the LAN, it is delivered directly to a local device. If the packet is meant for the WAN, it is forwarded directly to another router connected to another LAN. Routers are a kind of gateway for each LAN. The Internet is made of many routers connected together. Hubs and switches create networks; routers connect networks together (as an interconnected network = an internet).
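The router's decision can be sketched as a toy shell function. Everything here is an assumption made for illustration: the function name route_packet is made up, the LAN is assumed to use addresses beginning with 192.168.1., and a case statement with a wildcard pattern stands in for the netmask comparison that a real router performs.

```shell
#!/bin/sh
# Toy model only. route_packet is a made-up name and the LAN prefix
# 192.168.1. is an assumption; real routers compare addresses using netmasks.
route_packet() {
  destination="$1"
  case "$destination" in
    192.168.1.*) echo "deliver directly on the LAN" ;;
    *)           echo "forward to the next router (WAN)" ;;
  esac
}

route_packet 192.168.1.42   # prints: deliver directly on the LAN
route_packet 74.6.231.20    # prints: forward to the next router (WAN)
```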

3. Breaking data into packets (5 minutes)

https://www.youtube.com/watch?v=oj7A2YDgIWE

Networks are connected together to make the Internet. Billions of devices are connected to the Internet. To get data from one part of the world to another we need to package the data and then send it through many routers to its destination. Before it is transmitted over the network, large data is chopped into smaller pieces called PACKETS. Each is sent individually along with information about how to reconstruct the original data. The destination of a packet is indicated by an IP address. Each router understands how to forward a packet to the next router, one step or “hop” closer to the final destination. When the destination router is reached, the data is sent to the local device that should receive it. The device reassembles the original data which is then presented to the user.
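The idea of chopping data into numbered packets and reassembling them at the destination can be imitated with a few standard commands. This is only a toy sketch: the function names packetize and reassemble, the packet size of 4 characters, and the "number:data" packet format are all assumptions made for illustration, and real packets carry much more information.

```shell
#!/bin/sh
# Toy model only: names, packet size and packet format are made up.

# packetize TEXT: print one "sequence-number:data" line per 4-character piece
packetize() {
  n=1
  echo "$1" | fold -w 4 | while IFS= read -r chunk; do
    echo "$n:$chunk"
    n=$((n+1))
  done
}

# reassemble: read packets in any order, sort them by sequence number,
# strip the numbers, and join the pieces back together
reassemble() {
  sort -n -t : -k 1,1 | cut -d : -f 2- | tr -d '\n'
  echo
}

packetize "hello, internet"                          # shows the four packets
packetize "hello, internet" | sort -r | reassemble   # prints: hello, internet
```

The sort -r in the last line delivers the packets in reverse order, to imitate packets that arrive out of order. The sequence numbers allow the receiver to restore the original data anyway.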

4. Naming and DNS: the Domain Name System (6 minutes)

https://www.youtube.com/watch?v=mpQZVYPuDGU

Computers are identified by numbers, but humans like to use names instead of numbers. To make communication easier, the domain name system (DNS) resolves (translates) domain names to numbers that are IP addresses. You can use the Internet with IP addresses if you like, but using names is much easier. DNS works like a telephone directory: before calling a remote telephone you look up the number based on the name of the person you want to reach. To turn a domain name into an IP address number, your local “resolver” (usually your router) first asks a root server. Your resolver is then sent to a series of other servers, each one closer to “yahoo.com”. Finally it asks the name server owned by Yahoo about “yahoo.com” which provides an authoritative answer.

E.g., to turn “yahoo.com” into an IP address:

  • Your resolver asks a root server: “What is the address for 'yahoo.com'?”
  • The root server tells your resolver: “Ask the top-level domain (TLD) server in charge of '.com' addresses”
  • Your resolver asks the '.com' TLD server: “What is the address for 'yahoo.com'?”
  • The '.com' server tells your resolver: “Ask 'ns1.yahoo.com', the authoritative server in charge of all the 'yahoo.com' addresses”
  • Your resolver asks the 'ns1.yahoo.com' server: “What is the address for 'yahoo.com'?”
  • The 'ns1.yahoo.com' server tells your resolver: “74.6.231.20” (the actual answer to the original question)
  • Your resolver then remembers (caches) the answer so that the next time you need to look it up, the answer is immediately available.
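The chain of questions and referrals above can be imitated with a toy shell script. Nothing here contacts a real server: the function names ask_server and resolve, the server name com-tld, and the "refer"/"answer" reply format are all made up for illustration, and the replies are simply copied from the example above. The toy servers ignore which name is being asked about, and the caching step is left out.

```shell
#!/bin/sh
# Toy model only: no network is used and all names and replies are made up.

# ask_server SERVER NAME: print either "refer OTHER-SERVER" or "answer ADDRESS"
ask_server() {
  case "$1" in
    root)          echo "refer com-tld" ;;
    com-tld)       echo "refer ns1.yahoo.com" ;;
    ns1.yahoo.com) echo "answer 74.6.231.20" ;;
    *)             echo "answer unknown" ;;
  esac
}

# resolve NAME: start at the root and follow referrals until an answer arrives
resolve() {
  name="$1"
  server=root
  while true; do
    reply="$(ask_server "$server" "$name")"
    echo "asked $server about $name: $reply"
    kind="${reply%% *}"     # the first word: refer or answer
    value="${reply#* }"     # everything after the first space
    if [ "$kind" = answer ]; then
      echo "$value"
      break
    fi
    server="$value"         # follow the referral
  done
}

resolve yahoo.com
```

The last line printed is the address given by the final server in the chain, 74.6.231.20 in this toy data.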

5. Communication protocols: TCP vs UDP (4 minutes)

https://www.youtube.com/watch?v=uwoD5YsGACg

TCP (transmission control protocol) is one of the main protocols used on the Internet. TCP guarantees that all transmitted data is received, in the correct order. First a connection is established between two computers that want to communicate. This “synchronises” the communication (using “SYN” packets) so the computers know what packets have been sent and received. Data is then transferred between the computers. The receiver tells the sender about missing packets, which are re-transmitted. The receiver sorts received packets into their original order, reassembles the data inside them, and delivers it to the local application.

Typical uses of TCP:

  • e-mail,
  • video/audio streaming,
  • other information that must not have gaps or be rearranged.

UDP is a connectionless protocol. There is no initial connection, no synchronisation between sender and receiver, and therefore no guarantee of reliability. UDP does not care (or even know) if data is lost or if it arrives out of order at the destination. Because of lower overheads (no synchronisation, retransmission, reordering) UDP is faster than TCP.

Typical uses of UDP:

  • real-time sensor data,
  • character position updates in online video games,
  • other information that is constantly and rapidly renewed.

6. Networking tools (4.5 minutes)

https://www.youtube.com/watch?v=vJV-GBZ6PeM

The ping command sends packets from one computer to another requesting a simple “I am here” reply. Use it to detect router errors (“destination host unreachable”), DNS naming errors, etc.

The traceroute command (tracert on Windows) shows the route a packet takes through the Internet to reach another machine. Use it to pinpoint where a problem lies if packets do not reach their destination.
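Because ping succeeds when it receives a reply and fails otherwise, it can be used as the test in an if statement. The sketch below is a minimal example for Mac or Linux, where the -c 1 option tells ping to send a single packet (the Windows version of ping uses -n instead). The function name check_host is made up for this example.

```shell
#!/bin/sh
# check_host is a hypothetical helper: check_host HOSTNAME
# Assumes a Mac or Linux ping, where -c 1 means "send one packet".
check_host() {
  # ping's own output is discarded; only its success or failure is used
  if ping -c 1 "$1" > /dev/null 2>&1; then
    echo "$1 is reachable"
  else
    echo "$1 is not reachable"
  fi
}
```

After defining the function you could type, for example, check_host yahoo.com. The result depends on your network connection.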

Background information

Watch these videos if you would like additional information.

1. How does the Internet work? Networks and addresses explained (10 minutes)

An alternative explanation of how the Internet works: https://www.youtube.com/watch?v=82m2du-zgmY

2. The Internet vs. The Web (5 minutes)

A brief history of why and how the Internet was created: https://www.youtube.com/watch?v=CX_HyY3kbZw

3. Network troubleshooting using PING, TRACERT, IPCONFIG, and NSLOOKUP (14.5 minutes)

More networking tools (ifconfig, nslookup): https://www.youtube.com/watch?v=AimCNTzDlVo

Note: The “traceroute” and “ifconfig” commands are called “tracert” and “ipconfig” on Windows. If you have MacOS or Linux, “traceroute” and “ifconfig” work similarly to the commands described in this video.

2020/09/03 16:03

Week 11 — Mobility of data and computation

This week's topic is about mobility: moving your data, and your computation, around the Internet.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.

What you will learn from this class

  • What the File Transfer Protocol is.
  • How to transfer files to/from a remote server.
  • How to view the contents of a remote server using a web browser.
  • How to set up an FTP server on your Windows computer.
  • How to connect to a command line on a remote machine using secure shell.
  • How to copy files to/from a remote machine securely using secure copy.
  • How to synchronise two directories or files using rsync.
  • What remote desktops are.
  • How to use VNC to connect to a remote desktop.

Preparation

This week's preparation is to watch some short videos about Internet services that help with mobility of data and computation.

Notes

FTP: file transfer protocol

What is FTP?      https://m.youtube.com/watch?v=wig1szO7en8

Summary:

  • File Transfer Protocol (FTP) uses TCP as its underlying transport protocol.
  • FTP allows files or data to be transferred between any two computers on the Internet.
  • Because it uses TCP, all transfers are reliable.
  • FTP uses a “client-server” model: the remote computer (server) stores the files, and your computer (client) transfers files to or from the server.
  • When connecting to the server, the client user (you) must first authenticate (log in).
  • If the connection is accepted, the client (you) can send commands to list directories, upload (send) or download (receive) files, and so on.
  • You can interact with FTP using a graphical client, which looks a bit like the Windows Explorer or Mac Finder.
  • You can also interact with FTP using a command-line interface, typing actual FTP commands yourself, which is much more flexible.
  • FTP is useful for large file transfers or bulk uploads that are impractical with e-mail or online file sharing services.
  • FTP is fast and uses very few resources on the server.
  • FTP is often used by web site hosting services (Wix, GoDaddy, Weebly, etc.) for users to upload their web site content.

Useful FTP client commands

ls   – list the current directory
cd dirname   – change the current directory
bin   – change to binary mode (ensures perfect copies of files)
get filename   – download (receive) files
put filename   – upload (send) files
bye   – terminate session (log out)

Exercise: practice using FTP on a public test server

You can practice FTP using a public server provided by speedtest.tele2.net. The example shown here uses the Windows FTP client, but it will work the same on Mac or Linux. The parts typed by the user are shown with a red background.

  1. Connect to the server using this command: ftp speedtest.tele2.net
  2. When asked, enter the user name: anonymous
  3. When asked for a password, press Enter (to leave the password blank).
  4. Use ls or dir to obtain a listing of the public FTP directory.
  5. Make sure you are transferring files in binary mode with this command: bin
  6. Download a test file using this command: get 10MB.zip
  7. Close the session with this command: bye

(Note that these zip files contain junk and there is nothing you can recover from them.)
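
The same download session can also be scripted. The sketch below uses Python's standard ftplib module and mirrors the steps of the exercise. The function name download_test_file and its ftp_factory parameter are illustrative assumptions (the parameter only exists so that a stand-in object can be used instead of a real connection). Nothing is contacted until the function is called.

```python
from ftplib import FTP


def download_test_file(host="speedtest.tele2.net", name="10MB.zip", ftp_factory=FTP):
    """Connect, log in anonymously, list the directory, download one file, log out.

    ftp_factory is a hypothetical hook: by default it is ftplib.FTP, but any
    object offering the same methods can be substituted for experiments.
    """
    ftp = ftp_factory(host)        # like: ftp speedtest.tele2.net
    ftp.login()                    # no arguments logs in as user "anonymous"
    listing = ftp.nlst()           # like: ls (returns a list of names)
    with open(name, "wb") as out:
        # retrbinary transfers in binary mode (like: bin, then get 10MB.zip)
        ftp.retrbinary("RETR " + name, out.write)
    ftp.quit()                     # like: bye
    return listing
```

Calling download_test_file() with no arguments would save 10MB.zip in the current directory and return the names found in the public FTP directory.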

Exercise: view an FTP server using your web browser

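Many web browsers can connect to and view FTP sites, although recent versions of Chrome and Firefox have removed FTP support. If your browser refuses the URL below, use a graphical FTP client instead.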

  1. Visit this URL in a browser tab or window: ftp://speedtest.tele2.net/
  2. Try clicking on the upload directory to enter it.
  3. Leave the window open while you complete the next exercise.

Exercise: practice uploading a file to an FTP server

  1. Create a small text file called test.txt containing a few words of text.
  2. Connect to speedtest.tele2.net using your command-line FTP client, as shown above.
  3. Once connected, use the cd upload command to change to that directory.
  4. Use the bin command to change to binary mode.
  5. Use the put test.txt command to send your text file to the server.
  6. Refresh your browser window; you should see your file on their server.
  7. Click on test.txt in your browser, download the file, and verify it has your few words of text inside it.
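
For comparison, here is a minimal sketch of the same upload using Python's standard ftplib module. The function name upload_file and the ftp_factory parameter are illustrative assumptions, not part of the exercise. The function does nothing until it is called.

```python
import os
from ftplib import FTP


def upload_file(local_name, host="speedtest.tele2.net", remote_dir="upload", ftp_factory=FTP):
    """Log in anonymously, change to remote_dir, and send one local file.

    ftp_factory is a hypothetical hook so that a stand-in object can be
    used in place of a real ftplib.FTP connection.
    """
    ftp = ftp_factory(host)
    ftp.login()                                  # anonymous login
    ftp.cwd(remote_dir)                          # like: cd upload
    with open(local_name, "rb") as source:
        # storbinary transfers in binary mode (like: bin, then put test.txt)
        ftp.storbinary("STOR " + os.path.basename(local_name), source)
    ftp.quit()                                   # like: bye
```

For example, upload_file("test.txt") would send the text file created in step 1.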

Exercise: practice using FTP on your own Windows machine

SSH: secure shell (remote login) and secure copy

Using SSH and SCP:      https://m.youtube.com/watch?v=rm6pewTcSro

ssh stands for secure shell. ssh is a way for you to remotely access a computer with full command line access.

To connect to a remote machine, use ssh username@address where

  • username is your account on the remote machine and
  • address is the IP address or DNS name of the remote machine.

The default port of ssh is 22, but for security reasons many administrators choose to run ssh on a different port. To change the port that you are connecting to (e.g., port 1234) use the -p (port) option:

ssh -p 1234 username@address

Once connected, ssh will prompt you for your password on the remote machine. Type the password to authenticate and log in. If successful you will have full command-line access to the remote machine, just like running a terminal on that machine.
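
If you ever need to start ssh from a script, the command line described above can be assembled as a list of arguments. This is only a sketch: the helper name ssh_command is an assumption made for illustration, and the username and address values are the same placeholders used in the text.

```python
def ssh_command(username, address, port=22):
    """Build the ssh argument list, adding -p only for a non-default port."""
    args = ["ssh"]
    if port != 22:
        args += ["-p", str(port)]          # lower-case p selects the port for ssh
    args.append("{}@{}".format(username, address))
    return args


print(" ".join(ssh_command("username", "address")))             # ssh username@address
print(" ".join(ssh_command("username", "address", port=1234)))  # ssh -p 1234 username@address
```

Passing the returned list to subprocess.run() would start the connection, and ssh itself would then prompt for the password.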

One thing you can do remotely is to find the full path name for a file that you want to copy securely to your local machine using scp.

scp stands for secure copy. scp is a way to move files between computers that is more secure than FTP. It can copy individual files, multiple files using wildcards, or an entire directory (recursively).

The syntax of scp is just like the cp command:

scp fromPath toPath

If you are copying a directory recursively, add the -r (recursive) option:

scp -r fromDirectoryPath toPath

In either case the toPath can be a filename (which is replaced if it exists) or a destination directory name.

Either fromPath or toPath (but usually not both) can be a remote file. The syntax for a remote path is username@address:path where

  • username is your account name on the remote machine,
  • address is the IP address or DNS name of the remote machine, and
  • path is the path to the remote file, either absolute or relative to your home directory on the remote machine.

(Note that if you want to guarantee that path is the name of a directory then put a trailing / after it as shown in the examples below.)

If the ssh port on the remote machine is not the default 22 then add the -P (port) option (note the capital letter).

Assuming my remote username is piumarta, the remote machine is called server, and ssh is running on port 1234, then

scp -P 1234 piumarta@server:/tmp/data.txt /tmp/
copies /tmp/data.txt from the remote machine server into the directory /tmp on the local machine.
scp -P 1234 /tmp/data.txt piumarta@server:data/
copies /tmp/data.txt from the local machine to the data directory located in my home directory on the remote machine server.
scp -P 1234 -r $HOME piumarta@server:/var/backups/laptop
copies my entire home directory recursively to /var/backups/laptop on the remote machine server.

The last example makes a backup of all your personal files. However, it is not an efficient way to back them up because it will copy every file — even those that have not changed since the last backup.
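
The scp commands above follow a regular pattern, which the following sketch makes explicit. The helper names remote_path and scp_command are assumptions introduced for illustration. The user piumarta, the machine server, and port 1234 are the same example values used in the text.

```python
def remote_path(username, address, path):
    """Format a remote location as username@address:path."""
    return "{}@{}:{}".format(username, address, path)


def scp_command(from_path, to_path, port=22, recursive=False):
    """Build the scp argument list for one copy operation."""
    args = ["scp"]
    if port != 22:
        args += ["-P", str(port)]      # capital P for scp (ssh uses lower-case p)
    if recursive:
        args.append("-r")              # copy a whole directory tree
    args += [from_path, to_path]
    return args


# Remote file to local directory.
print(" ".join(scp_command(remote_path("piumarta", "server", "/tmp/data.txt"), "/tmp/", port=1234)))
# Local file to a directory relative to the remote home directory.
print(" ".join(scp_command("/tmp/data.txt", remote_path("piumarta", "server", "data/"), port=1234)))
```

The first call prints the same command line as the first example above, and the second call prints the second.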

Synchronise files/directories: rsync

The rsync command synchronises two files or directories. The files or directories can be local or remote. The syntax is the same as scp and the -r option (recursive) is used to copy directories.

One popular use of rsync is to make backups of important files. Compared to scp, the advantage of rsync is that only changed files are copied to the destination. The first time you make a backup using rsync can take a long time, but subsequent backups are much faster because they copy only the changes you made to your files. Even very large files are copied efficiently because only the changed parts are updated.
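
To get a feeling for why later backups are faster, here is a small sketch that decides which files would need copying. It is loosely modelled on rsync's default behaviour of skipping files whose size and modification time are unchanged. It is not how rsync is implemented, and it does not attempt rsync's transfer of only the changed parts of a file. The function name files_needing_copy is an assumption made for illustration.

```python
import os


def files_needing_copy(source_dir, dest_dir):
    """List files (relative to source_dir) that are missing from dest_dir
    or whose size or modification time differs from the copy in dest_dir."""
    changed = []
    for folder, _subdirs, names in os.walk(source_dir):
        for name in names:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(dest_dir, rel)
            if not os.path.exists(dst):
                changed.append(rel)            # never backed up before
                continue
            s, d = os.stat(src), os.stat(dst)
            # Compare whole seconds so that file systems with coarser
            # timestamps do not make every file look changed.
            if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
                changed.append(rel)            # modified since the last backup
    return sorted(changed)
```

On the first backup every file is listed. After a copy that preserves modification times, only files edited since then are listed.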

How to make backups using rsync:      https://m.youtube.com/watch?v=8d5B1JC-1d4

Remote desktop

A remote desktop gives access to the graphical user interface of a remote computer over the Internet. A user can interact with the remote system as if they were sitting in front of it. Keyboard and mouse inputs are transferred from the user's computer to the remote computer, and screen updates are transferred from the remote computer to the user's computer.

Typical uses include using a workplace workstation when at home or vice versa, fixing a computer problem remotely, performing administrative tasks easily, and demonstrating processes or software applications. In addition, headless computers that have no monitor, keyboard, or mouse can easily be accessed remotely by administrators.

Protocols for remote desktop connectivity include:

Remote Desktop Protocol
RDP is built in to Windows Professional and higher versions. One disadvantage is that only one person can access the controlled computer at once. When a remote user connects to it, the local user is locked out.
Virtual Network Computing    https://tightvnc.com/
VNC is available on (and interoperates between) Windows, MacOS, and Linux. One advantage is that multiple users can connect to the same remote screen, and a user sitting at the remote computer is not locked out when remote users connect.

How to install VNC:      https://turbofuture.com/computers/How-to-Install-and-Configure-TightVNC

How to use TightVNC:      https://www.youtube.com/watch?v=x9xTyh63Tos

Operating Windows 10 over a network, VNC edition (video in Japanese):      https://www.youtube.com/watch?v=98rQ9J5XE_g

2020/09/03 16:03

Week 12 — The World Wide Web

This week's topic is about the world wide web and how it works.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.

What you will learn from this class

  • What the World Wide Web is.
  • How web servers host the web.
  • How URLs work to identify the location of resources on the web.
  • How web content is provided statically or generated dynamically.
  • How a client uses HTTP to request a web resource from a web server.
  • How a server informs the client about its request succeeding or failing.
  • How different kinds of media are identified within a web page.
  • How hyperlinks create a “web” of pages that spans the Internet and therefore the world.

Preparation

This week's preparation is to watch some short videos about the World Wide Web and then to install software on your computer that will let you run your own web server.

Videos: WWW and HTTP

What is the world wide web? (4 minutes) https://www.youtube.com/watch?v=J8hzJxb0rpc
What is HTTP? (7 minutes) https://www.youtube.com/watch?v=LZJNj-HHfII
How a browser displays a web page (10 minutes) https://www.youtube.com/watch?v=DuSURHrZG6I

Software: Python 3

Linux: you probably already have Python 3 installed, but if not then install it from your repository manager (using, e.g., sudo apt install python3)

MacOS: install from MacPorts (using sudo port install python39) or from Homebrew (using brew install python3) or download an installer from python.org

Windows: download an installer from python.org


When Python is installed you should be able to run either

python3 --version

or

python --version

and see something like “Python 3.5.3” or “Python 3.9.0” printed. You should also be able to run the same python3 (or python) command like this

python3 -m http.server

and see output that looks like “Serving HTTP on :: port 8000 (http://[::]:8000/) …”.

(Press Control+c to stop the program.)

Notes

The three self-preparation videos cover the following topics.

What is the world wide web?

https://www.youtube.com/watch?v=J8hzJxb0rpc (4 minutes)

  • the Web can be used for any activity built around organising or exchanging data
  • the Web is accessible from computers, smart phones, and even cars
  • the Web is not the Internet
    • the Internet is the network computers use to communicate with each other
    • the Web is just one application protocol that uses the Internet for communication
  • a Web server is a computer that is always connected to the Internet, specifically designed to store information and share it with Web browsers
  • one or more Web sites can be hosted on a Web server
  • Web sites are identified by the IP address of their server, usually in the form of a domain name
  • the name (IP address) says which server has the Web site content we want
  • the Web is special because of its non-linear organisation of data (compared to a book which is read linearly, page by page, in sequence)
  • each page or other resource on a Web server has a unique path name that comes after the server name
  • a Uniform Resource Locator (URL) identifies a Web document or resource
    • when people say “a Web address” they usually mean “a URL”
  • a URL combines a protocol (http) with a server address (its DNS name) and a path name to a resource on that server (such as a Web page or media file)
  • URLs can be embedded in Web pages in the form of hyperlinks
  • when you click on a hyperlink your browser displays the document that it refers to
    • this is what most people call “following a link”
  • a single Web page can link to many other related pages or media files
    • unlike a linear book, additional information and ideas can be linked to and expanded as soon as they are encountered
  • the hyperlinks therefore form a loose, interconnected network, like a spider's web
  • in fact you can even say that “The Web” doesn't really exist
    • “The Web” is made from all the spaces between Web pages and the resources that they link to
    • it is a web of relationships, and not a physical thing
    • rather like a family tree, which clearly exists but is not actually a physical thing

What is HTTP?

https://www.youtube.com/watch?v=LZJNj-HHfII (7 minutes)

  • a protocol is a standard procedure (or set of rules) governing how to do something
  • on the Internet, the Hyper Text Transfer Protocol (HTTP) governs how a Web client (browser) asks a Web server for a document or media file
    • a Web client requests content or resources
    • a Web server responds by delivering the content or resource to the client
  • HTTP and the Web are an evolution from sharing plain text files to sharing graphics- and multimedia-rich documents
  • HTML is the language of Web pages which lets you create links to resources stored on any Web server anywhere in the world
  • clicking on a link fetches and displays that resource (often a Web page)
  • a URL is a Uniform Resource Locator that identifies:
    • a specific protocol (often http)
    • an Internet server address (usually by its domain name) and port number (often omitted to use the default)
    • a path to a resource located on the server
  • port 80 is used for normal HTTP, and port 443 is for secure HTTPS (encrypted communication)
  • to fetch the resource described by a URL using HTTP:
    1. the client sends a GET request to the appropriate port on the server, along with the path of the resource it wants
    2. the server sends back the content of the resource
  • if an error occurs the server sends back a standard document that looks like a Web page and which specifies a status code indicating what the problem was
  • the status code in the response is encoded as a number:
    • 1xx: the server is providing the client with some requested information
    • 2xx: the request succeeded and the desired document or resource is provided in the response
    • 3xx: the requested resource has moved
    • 4xx: the request failed because of a client problem or error; e.g., status code 404 means “Resource Not Found”
    • 5xx: the request failed because of a server problem or error
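
The first digit of a status code is enough to tell which of these five groups it belongs to. The sketch below shows this using the HTTPStatus table from Python's standard http module. The CLASSES dictionary and the describe function are assumptions made for illustration, and the wording of each group is a paraphrase of the list above.

```python
from http import HTTPStatus

# Short descriptions of the five groups, keyed by the first digit of the code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection (the resource has moved)",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code, its standard reason phrase, and the group it belongs to."""
    return "{} {}: {}".format(code, HTTPStatus(code).phrase, CLASSES[code // 100])


for code in (200, 301, 404, 500):
    print(describe(code))
# 200 OK: success
# 301 Moved Permanently: redirection (the resource has moved)
# 404 Not Found: client error
# 500 Internal Server Error: server error
```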

The developers of the Firefox browser provide a nice summary of HTTP status codes here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

How a browser displays a web page

https://www.youtube.com/watch?v=DuSURHrZG6I (10 minutes)

This goes a little deeper into the topics of the other two videos and touches on how content is described in the HTML content of a web page.

  • a Web browser is just one kind of Web client
  • any application that understands HTTP can be a Web client
  • first the user tells the browser what they want to look at in the form of a URL
  • a URL is a Uniform Resource Locator that identifies:
    1. a specific protocol (or scheme)
    2. an Internet server address (usually a domain name) and optional port number
    3. a resource on the server identified by its path within the server's file store
    4. optional parameters following a “?”
    5. an optional section name within the page following a “#”
  • URLs in the document can also specify other resources needed by the page
    1. cascading style sheets (CSS) describing how the content should be presented
    2. JavaScript programs that add dynamic behaviour to the content
  • the exact same kind of URL can appear in hyperlinks (or “anchors”) inside a Web document
  • to fetch a Web page, given a URL, the client
    1. opens a TCP connection to the server using the “address” part of the URL
    2. sends an HTTP GET request that specifies the resource it wants using the “path” part of the URL
  • in response to a GET request, the server
    1. looks for a file or other resource corresponding to the path part of the GET request
    2. if possible, sends back the content of the resource for the browser to display
    3. if not possible, sends back a Web “page” that describes what went wrong
  • a normal Web page contains a document described using Hyper Text Mark-up Language (HTML)
  • the browser uses the HTML to build a model of the content of the page including paragraphs, section headings, hyperlinks, etc.
  • if there are any other resources needed to display the page, they are specified by URL and are fetched by the browser while rendering the page
  • any style sheets that were specified are used to choose fonts, colours, etc., for paragraphs, headings, tables, and so on
  • any JavaScript programs that are included in the page start to run to add dynamic behaviour to the document
  • based on the different parts of the page, the browser builds a visual representation of the page and renders it on the screen for the user to see
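
Because any application that understands HTTP can be a Web client, the request and response steps listed above can be tried out without a browser. The following sketch uses only Python's standard library: one function starts the built-in file server in the background, and the other sends a single GET request to it. The function names fetch and serve_current_directory are assumptions made for illustration, and both ends run on your own computer.

```python
import http.client
import http.server
import threading


def serve_current_directory():
    """Start Python's built-in file server on a free port, in a background thread."""
    handler = http.server.SimpleHTTPRequestHandler   # serves files from the current directory
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)   # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path):
    """Open a TCP connection, send one GET request, and return (status, body)."""
    connection = http.client.HTTPConnection("127.0.0.1", port)
    connection.request("GET", path)        # the "path" part of the URL
    response = connection.getresponse()
    body = response.read()                 # the resource, or a page describing the error
    connection.close()
    return response.status, body
```

After server = serve_current_directory(), the port chosen is available as server.server_address[1]. Fetching the path of an existing file should give status 200 together with the file's content, and fetching a path that does not exist should give status 404. Call server.shutdown() to stop the server.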

More technical details

If the above videos were not detailed enough, you can find many longer videos that explain the World Wide Web in much greater detail. Here is an example that is maybe one step up in detail from the videos above: How The Web Works (12 minutes)

Exercise

If you have not already done so, follow the instructions above to install Python 3 on your computer.

With Python 3 installed, running a Web server on your computer is super easy. Create a directory to store your web site and change to it. (I usually call mine something like “html”.)

mkdir html
cd html

Use cat or nano to create a file called index.html that has the following contents:

<html>
<body>
<h1>Hello, world!</h1>
<p>Welcome to your Computer-Wide Web.</p>
</body>
</html>

In the same directory, run this command (use python if you don't have python3):

python3 -m http.server

Open a new tab in your Web browser, paste (or type) the following URL into the address bar

http://localhost:8000

and then press return. If all went well, you should see your web page in the browser.

What that URL means
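
One way to see the parts of that URL is to let Python split it for you. In the sketch below, the function name url_parts and the labels in the dictionary are assumptions chosen to match the vocabulary of the notes above. The second URL, with its parameters and section name, is a made-up example.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the parts described in the notes."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,       # e.g. http
        "server": parts.hostname,       # localhost is a name for your own computer
        "port": parts.port,             # 8000 is the port the Python server listens on
        "path": parts.path,             # which resource on the server
        "parameters": parts.query,      # whatever follows a "?"
        "section": parts.fragment,      # whatever follows a "#"
    }


print(url_parts("http://localhost:8000"))
print(url_parts("http://localhost:8000/index.html?lang=en#top"))   # hypothetical URL
```

For http://localhost:8000 the path is empty, which the Python server treats as a request for the directory itself and answers with index.html when that file exists.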

Try modifying the content of the “index.html” file. For example, add more lines containing “<h1>…</h1>” or more lines containing “<p>…</p>” (with something interesting instead of “…”, obviously).

Pick a word inside a “<p>…</p>” section and put “<i>” in front of it and “</i>” after it.

Pick a word inside a “<p>…</p>” section and put “<b>” in front of it and “</b>” after it.

Try putting “<tt>” and “</tt>” around another word.

How much fun is that? 😀

Don't forget: every time you modify something in your index.html file you must reload the page in your browser to see the change. A convenient way to do this is by pressing Control+r while your browser window is active. (In conjunction with Alt+Tab to switch between applications, you can even edit the index.html file and reload the browser without ever taking your hands off the keyboard.)

2020/09/03 16:03

Week 13 — Web content

This week's topic is about creating content for the Web.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.

What you will learn from this class

  • What HTML is.
  • What the basic structure of a Web page is.
  • How to divide your page into sections.
  • How to add lists and tables to your page.
  • How to include graphics in your page.
  • How to make a hyperlink to another (external) Web page.
  • How to use fragments to identify different parts of your Web page as part of its URL.
  • How to make a hyperlink to another part of your (current) Web page.
  • How to use CSS to change the style of different elements in your web page.
  • How to identify a specific element by naming it, or a group of elements by collecting them into a class.
  • How to add simple dynamic behaviour to a web page using JavaScript.

Preparation

This week's preparation is to watch two videos about Web pages, specifically about hypertext markup language (HTML) and cascading style sheets (CSS), and then to complete a few exercises that will familiarise you with the basics of HTML and CSS.

Videos

The first video (HTML tutorial) covers most (but not all) of the material in the exercises below. I recommend you follow along practically with the first video, creating your own small web site.

Exercises

1. Create a small "Web development" environment

First, create a file called “index.html” in your current directory that contains the following text:

<!DOCTYPE html>
<html>
  <body>
    <p>Welcome to your Computer-Wide Web!</p>
  </body>
</html>

This is “the world's simplest Web page”. To display it in your browser you have two options: open the page directly, or run a small Web server.

Option 1: display the page directly

  1. Open a blank page or tab in your favourite Web browser.
  2. Open a file manager on the current directory. Either navigate there in your file manager, or open a new file manager window from a terminal; for example:
    • on Windows, type: explorer.exe .
    • on Mac, type: open .
    • on Linux, type: thunar . (or substitute whatever your file manager is called).
  3. Drag the file index.html from the file manager window into the address bar of your browser window.

Option 2: run a small HTTP server

  1. Open a new terminal window.
  2. Use cd to navigate to the directory where you created index.html.
  3. Run the HTTP server: python3 -m http.server
  4. Type this URL into your Web browser address bar: http://localhost:8000/index.html

You can make changes to index.html using any text editor. Press the “reload” button (Control+r and/or F5) in the browser window to see the changes.

2. Add a heading to the page

Look at your Web page. It contains three nested elements that are delimited by starting and ending tags:

  1. The entire page which is enclosed in: <html> … </html>
  2. Inside that, the visible content of the page which is enclosed in: <body> … </body>
  3. Inside that, a single paragraph of text which is enclosed in: <p> … </p>

(The first line is not really part of the Web page. It is an SGML document type declaration which tells the browser exactly which set of rules it should follow when trying to understand the rest of the file's contents.)

To make a heading you surround some text with <h1> and </h1>. Since the heading should be visible, it must be inside the body element. Since the logical place for the heading would be before the paragraph of text, it should also come before the p element.

<!DOCTYPE html>
<html>
  <body>
    <h1>Hello, world</h1>
    <p>Welcome to your Computer-Wide Web!</p>
  </body>
</html>

Modify the file and refresh the page to see the heading.

The elements in a Web page form a “hierarchical” tree structure, just like the tree structure of folders and files in a file system. The root of the hierarchy is the html element. The body element is a child of the html element. In your original document the body element is the parent of the p element, and the p element is a child of the body element.

Elements can have any number of children. When you added the h1 element for the heading you put it inside the body element. The h1 is therefore a child of the body element and a sibling of (which means “has the same parent as”) the p element.

Child elements are ordered. Since you added the h1 before the p element in the document, it comes before the p element in the list of children belonging to the body.

(Try swapping the order of the h1 and p elements. The page changes in the obvious way, because you have changed the order of the child elements in the body element.)
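
The tree structure can be made visible with a few lines of Python. This sketch uses the standard html.parser module to print each element indented below its parent. The class name TreePrinter and the function name outline are assumptions made for illustration, and the code assumes that every start tag has a matching end tag, as in the page above.

```python
from html.parser import HTMLParser


class TreePrinter(HTMLParser):
    """Record each element's name, indented according to its depth in the tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1            # whatever follows is a child of this element

    def handle_endtag(self, tag):
        self.depth -= 1            # back up to the parent element


def outline(html_text):
    printer = TreePrinter()
    printer.feed(html_text)
    return printer.lines


PAGE = """<!DOCTYPE html>
<html>
  <body>
    <h1>Hello, world</h1>
    <p>Welcome to your Computer-Wide Web!</p>
  </body>
</html>
"""

print("\n".join(outline(PAGE)))
# html
#   body
#     h1
#     p
```

The h1 and p elements appear at the same indentation because they are siblings, and they are listed in the same order as in the document.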

3. Add some font weight and style changes

Anything that looks like <word> is called a start tag and it indicates the start of some element in the page. Anything that looks like </word> is called an end tag and it indicates the end of some element in the page. With only a few exceptions, the <word> and </word> tags must come in matching pairs. The corresponding element groups its content (everything inside it) together and affects how that content is interpreted or displayed.

For example, to make some text bold you can surround it with <b> and </b> tags. Make the word “your” bold by surrounding it with these tags and refreshing the page in the browser.

To make some text italic you can surround it with <i> and </i> tags. Make the words “Computer-Wide” italic by surrounding them with these tags and refreshing the page in the browser.

<!DOCTYPE html>
<html>
  <body>
    <h1>Hello, world</h1>
    <p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>
  </body>
</html>

Everything in a Web page must follow a strictly nested structure and must be nested without overlapping. For example, in your Web page you have this paragraph:

<p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>

The </b> ends the b element, and the <i> starts the i element. Those two tags cannot be swapped, like this

<p>Welcome to <b>your<i> </b>Computer-Wide</i> Web!</p>

because then the elements would overlap, and the single space between <i> and </b> would have to belong to two elements at the same time. Even though many browsers will do what you expect if you write HTML this way, the results are not guaranteed and you should try hard to avoid it.
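
The nesting rule can be checked mechanically: remember the elements that are currently open, and insist that every end tag closes the most recently opened one. The sketch below does this with Python's standard html.parser module. The names NestingChecker and is_properly_nested are assumptions made for illustration, and the check is deliberately simple: it treats every start tag as needing an end tag, so it does not cover the few exceptions mentioned earlier.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track open elements on a stack and note end tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()           # closes the innermost open element
        else:
            self.problems.append(tag)      # overlaps with some other element


def is_properly_nested(html_text):
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return not checker.problems and not checker.open_tags


good = "<p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>"
bad = "<p>Welcome to <b>your<i> </b>Computer-Wide</i> Web!</p>"
print(is_properly_nested(good))   # True
print(is_properly_nested(bad))    # False
```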

4. Give your page a title

You have probably noticed that your browser tabs contain the name of the web site they are displaying. Each page can specify what it wants this title to be, along with many other kinds of meta data describing the page. This information is not part of the page content and so, logically, it should not be part of the body.

The html (root) element always has two children, the body and another element called the head. The head element contains information about the page (rather than the actual content of the page) and is always present (even if you omit its tags in your HTML code and never mention it at all). The children of the head contain the meta data for the page, e.g., its title.

To add the title to your page, begin by adding the missing <head> and </head> tags before the body. Then inside the head, add a title element that contains the text you want to appear in the browser tab for your page.

<!DOCTYPE html>
<html>
  <head>
    <title>Hello page</title>
  </head>
  <body>
    <h1>Hello, world</h1>
    <p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>
  </body>
</html>

5. Add a horizontal rule at the end of your page

An hr element creates a horizontal rule. Since it separates paragraphs, it should be placed outside any paragraph p tag.

The hr element has no children, so there is no point having separate start <hr> and end </hr> tags. The end tag is incorporated into the start tag by placing the / just inside the end of the start tag, like this: <hr />

<!DOCTYPE html>
<html>
  <head>
    <title>Hello page</title>
  </head>
  <body>
    <h1>Hello, world</h1>
    <p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>
    <hr/>
  </body>
</html>

6. Add an image to your page

Since images can be embedded in text they should appear inside paragraph p elements.

Add another paragraph element after your h1 heading. You can put some text in it, if you want, but the real purpose is to put an image in it. The next step is therefore to find (or download) an image that you like and copy (or save) it in the same directory as your index.html file. Let's say it is called image.jpg.

To add an image to the page, create an img element to hold it. This img element is special in two ways:

  1. Just like the hr element, the img element does not have any children. The end tag can be included in the start tag by putting the “/” inside the start tag, like this: <img />
  2. The img needs to know where the image is stored. It gets this information through a src (source) attribute that is attached to the tag: <img src="path-to-image"/>

From now on I will leave out most of the code, so here is just the part of the file near the img:

<h1>Hello, world</h1>
<p>At least it is not a kitten: <img src="image.jpg"/></p>
<p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>

(Obviously, replace image.jpg with the actual name of your image.)

Is your image too large or too small? You can put more attributes inside the tag. The width attribute affects how wide the image will be when rendered. To make it occupy 20% of the width of the screen, set the width attribute to "20%".

<h1>Hello, world</h1>
<p>At least it is not a kitten: <img src="image.jpg" width="20%"/></p>
<p>Welcome to <b>your</b> <i>Computer-Wide</i> Web!</p>

Note that the attribute values (the name of the image file, the quantity describing the width) should always be written inside quotation marks.

7. Add some lists to your document

HTML supports several kinds of list. An unordered list has items indicated by bullets. The ul element represents an unordered list. It has zero or more li list item child elements.

Lists are usually displayed between paragraphs and so should appear outside p paragraph elements.

<h1>Hello, world</h1>
<ul>
  <li>use <tt>head</tt> to hold the meta data</li>
  <li>use <tt>body</tt> to hold the content</li>
</ul>

(Note the tt elements, within the li list items, that cause their content to be rendered in teletype [fixed-width] font.)

An ol ordered list has numbered items. Its children are the same li elements, but instead of bullets they are indicated by numbers. Try changing <ul>…</ul> to <ol>…</ol> and check the effect.

8. Add a table

A table element contains zero or more table row elements. A table row tr element contains zero or more table data elements. A table data td element contains text, images, or any other content that can appear in a paragraph.

<h1>Hello, world</h1>
<table>
  <tr> <td><tt>ul  </tt></td> <td>for an unordered (bullet) list </td> </tr>
  <tr> <td><tt>ol  </tt></td> <td>for an ordered (numbered) list </td> </tr>
  <tr> <td><tt>li  </tt></td> <td>for each item in either list   </td> </tr>
  <tr> <td><tt>html</tt></td> <td>for the entire Web page        </td> </tr>
</table>

9. Add a hyperlink

Hyperlinks are represented by anchor a elements. The target URL is given as an href (hypertext reference) attribute to the start tag. The children of the a element are the text, images, or other content that will be the “clickable link” visible on the page.

<h1>Hello, world</h1>
<p>Let's go <a href="http://kuas.org">somewhere else</a>!</p>
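
A program that reads a Web page finds its hyperlinks in the same way a browser does: by looking for a elements and reading their href attributes. Here is a minimal sketch using Python's standard html.parser module. The names LinkCollector and find_links are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every anchor (a) element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:      # attrs is a list of (name, value) pairs
                if name == "href":
                    self.links.append(value)


def find_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


print(find_links('<p>Let\'s go <a href="http://kuas.org">somewhere else</a>!</p>'))
# ['http://kuas.org']
```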

10. Add some style to your page to make things look different

Every element has a default “look and feel”. The defaults can be changed by specifying style attributes for elements. Style information is placed inside a style element, and style elements can be placed anywhere in the document (either the head or the body).

A simple style might place a border around our table.

<head>
  <title>Hello page</title>
  <style>
    table {
      border: 1px solid red;
    }
  </style>
</head>

The style element contains selectors and associated declarations. In the example above, the selector table refers to every table element in the document.

Following the selector, enclosed in curly braces, is a set of declarations that apply to the selector. Each declaration has the form: property : value ;

In this example the table selector has only one property affected by style. The border property controls how the outside border of the table elements will be drawn. This example sets the border of all tables to be one pixel wide (1px) drawn as a solid line in the colour red.

11. Add some dynamic behaviour using JavaScript

Let's add a button to the page and make it do something fun when it is pressed.

A button element works a bit like an anchor (a) element. Since buttons can appear anywhere in a paragraph, they should appear inside p elements. The children of the button element specify the content of the button's label.

<h1>Hello, world</h1>
<p>Here is a strange button!
  <button>
    Click Me!
  </button>
</p>

The action that occurs when the button is clicked can be affected by attributes attached to the button element. The onclick attribute contains JavaScript code that will run when the button is clicked. Let's use it to change the colour of the heading.

The first thing we have to add is an identifier that will give a unique name to the heading that we want to modify. To do that, add an id attribute to the h1 element. The value of the id is the name by which we can look up the heading.

<h1 id='hello'>Hello, world</h1>

After adding this attribute we can use the JavaScript function document.getElementById() to find the heading by its identifier. Once found, the same JavaScript code can set the heading's style.color property to change its colour.

<h1 id='hello'>Hello, world</h1>
<p>Here is a strange button!
  <button onclick="document.getElementById('hello').style.color='red'">
    Click Me!
  </button>
</p>

Note that, in JavaScript, strings can be surrounded by "double quotes" or 'single quotes'. As shown in the above example, this is convenient when, e.g., JavaScript strings have to be written inside attribute values (which also must be strings).
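
As a small illustration of the point about quotes, the following sketch can be pasted into the browser's JavaScript console or run with Node.js. The variable names are invented for this example only.

```javascript
// Both quote styles produce exactly the same string value.
const single = 'red';
const double = "red";
console.log(single === double);   // prints true

// Using single quotes for the inner strings lets the whole statement sit
// inside a double-quoted HTML attribute value such as onclick="...".
const onclickCode = "document.getElementById('hello').style.color='red'";
console.log(onclickCode.includes("'hello'"));   // prints true
```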

Finally, to truly make the button strange, try this:

<button id='clicky' onclick="document.getElementById('clicky').remove()">
  Click Me!
</button>

Summary

Your final index.html should look something like this.

<!DOCTYPE html>
<html>
  <head>
    <title>Hello page</title>
    <style>
      table { border: 1px solid red; }
    </style>
  </head>
  <body>
    <h1 id='hello'>Hello, world</h1>
    <p>
      Here is a strange button!
      <button onclick="document.getElementById('hello').style.color='red'">
        Click Me!
      </button>
    </p>
    <p>
      Let's go <a href="http://kuas.org">somewhere else</a>!
    </p>
    <table>
      <tr> <td><tt>ul  </tt></td> <td>for an unordered (bullet) list</td> </tr>
      <tr> <td><tt>ol  </tt></td> <td>for an ordered (numbered) list</td> </tr>
      <tr> <td><tt>li  </tt></td> <td>for each item in either list  </td> </tr>
      <tr> <td><tt>html</tt></td> <td>for the entire Web page       </td> </tr>
    </table>
    <ul>
      <li>use <tt>head</tt> to hold the meta data</li>
      <li>use <tt>body</tt> to hold the content</li>
    </ul>
    <p>
      At least it is not a kitten: <img src="image.jpg" width="20%"/>
    </p>
    <p>
      Welcome to <b>your<i> </b>Computer-Wide</i> Web!
    </p>
    <hr/>
  </body>
</html>

Further information

We have only scratched the surface of HTML, CSS, and JS. (Each one of them could be an entire course all by itself.) If you want to learn more then there is a huge amount of information online about HTML, CSS, and JavaScript. One very good place to start is: https://www.w3schools.com/

2020/09/03 16:03

Week 14 — Web apps and 'The Cloud'

This week's topic is about Web applications and 'The Cloud'.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.

What you will learn from this class

  • What a 'Web app' is.
  • How to implement a simple Web app.
  • What a 'cloud app' is.
  • How to implement a simple cloud app.
  • What the different forms of cloud computing are.
  • What SaaS, PaaS, and IaaS mean and are.
  • What virtualisation is and how it can be used.
  • What the benefits of the cloud are.
  • What the risks of the cloud are.
  • How to set up your own 'cloud' services on your own computers.

Preparation

This week's preparation is to watch three short videos about Web applications and three short videos about 'The Cloud', and then complete a couple of short exercises to experience some of the fundamental concepts.

Videos

Each set of three videos progresses from very simple to more detailed.

Web Apps

What is a Web App? 1:48 https://www.youtube.com/watch?v=qt6gSW-uYKI
What is a web application? 6:28 https://www.youtube.com/watch?v=Hv3czsQh8tg
Web pages, Websites, and Web Applications 6:36 https://www.youtube.com/watch?v=ylbQrYhfa18

The Cloud

Computer Basics: What Is the Cloud? 2:31 https://www.youtube.com/watch?v=4OO77HFcCUs
What is the cloud? 3:00 https://www.youtube.com/watch?v=i9x0UO8MY0g
The Three Ways to Cloud Compute 7:09 https://www.youtube.com/watch?v=SgujaIzkwrE

Notes

There are two exercises below. The first shows how to run code in the user's browser. The second shows how to run a program on the server to generate page content dynamically.

If you do not want to try the second exercise then you can skip the rest of this section.

If you want to try the second exercise then you will need to install a local Web server that can run server-side scripts. The most popular language for that is called PHP. Run the following command to check if you already have PHP installed on your computer:

php --version

If your shell says “command not found” then expand the instructions below to learn how to install it.

Installing PHP

Web projects

The following two mini-projects illustrate (A) how to run a program in the client Web browser when displaying a page, and (B) how to run a program in the server to generate parts of a Web page dynamically. These techniques are the foundations of Web and cloud apps.

Project A: Create a small 'click counter' Web app that runs in your browser

We have already seen some of the necessary parts for making a small Web app:

  • making an input element, e.g., a button,
  • running JavaScript in response to an event, e.g., clicking the button, and
  • identifying elements by name using the id property and the getElementById() JavaScript function.

To make a small 'app' such as a click counter requires just a few additional things, which we will investigate soon:

  • persistent state, i.e., somewhere to store the 'state' of the application, and
  • a way to modify the content of an element, e.g., the text inside a paragraph (p).

1. Create a file containing just the structure of the web page

Let's begin very simply and create a file containing just the HTML describing the visible structure of the application. (You can call the file anything you like, including “index.html”.) The structure will have the following parts:

  • a heading (so we know what the application does),
  • a paragraph displaying the current value of the counter, which should initially be 0, and
  • a paragraph containing a clickable button that increases the counter.

<!DOCTYPE html>
<html>
  <body>
    <h1>Click counter</h1>
    <p>
      0
    </p>
    <p>
      <button>
        Count
      </button>
    </p>
  </body>
</html>

2. Modify the content of the first paragraph when the button is pressed

The text content of a paragraph is stored in the innerHTML property of the p element. Assigning a new value to the innerHTML property of a paragraph will dynamically change what is displayed inside that paragraph. For example, this will remove the old content of a p whose id is my-paragraph and replace it with a 'Welcome' message:

document.getElementById('my-paragraph').innerHTML = 'Welcome to some new content!'

To run this code whenever a button is clicked, put the code in the onclick attribute of the button element. The first time the button is clicked, the 0 that is initially in the paragraph will be replaced by a longer message. Let's add the two things we need: an id for the paragraph (we'll call it display)

<p id="display"> 0 </p>

and the onclick attribute for the button (containing the code that updates the paragraph's content).

<button onclick="document.getElementById('display').innerHTML = 'You clicked!';">

3. Increment the counter when the button is clicked

The innerHTML property of an element can also be read, so we can also store information there. Let's store the value of the counter in the innerHTML property of the display paragraph.

One of the interesting things about JavaScript is that values can be used very freely. If a string 'looks like' a number then you can treat it as a number (and perform arithmetic on it). This means we can use the string stored in the innerHTML of the display paragraph to remember the numeric value of the counter and to represent the string that should be displayed in the paragraph.

The += operator will increment its left-hand side by its right-hand side. For example, x += 42 will increment x by 42, and is shorthand for x = x + 42. Instead of assigning a new string to the display, we'll add 1 to its current content, writing the addition out in full so that it is easy to modify later.

<button onclick="document.getElementById('display').innerHTML = document.getElementById('display').innerHTML + 1;">

Well, that almost worked!

Instead of adding 1 to the numeric value of the count (which is stored as a string), JavaScript converted the 1 into a string and then added that to the end of the displayed count. To increment the numeric value of the count we first have to convert the string "0" into the integer 0.

JavaScript 'pro-tip': if you have a string that you know represents a decimal integer then to convert it into an actual integer you can use the 'unary +' operator. So, if x contains the string "41" then

x + 1

will be "411", whereas

+x + 1

will be 42, which is exactly the effect we want when incrementing our counter.
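
The difference is easy to check in the browser's JavaScript console or with Node.js. This is only a small sketch of the behaviour described above:

```javascript
const x = "41";           // a string that looks like a number

console.log(x + 1);       // prints 411 (binary + sees a string and concatenates)
console.log(+x + 1);      // prints 42  (unary + converts x to a number first)
console.log(typeof +x);   // prints number
```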

Putting a unary + in front of the left hand side of the binary + operator makes the counter work properly.

<button onclick="document.getElementById('display').innerHTML = +document.getElementById('display').innerHTML + 1;">

4. Tidy up the code

More complex Web apps will, of course, have a lot more JavaScript code. The script element provides a place to put JavaScript code (in a similar way to the style element providing a place to put CSS code). One advantage of using a script element is that you can define things like 'helper functions' and global variables to store persistent values. Let's tidy up the code of our Web app in the following ways:

  • look up the display paragraph just once, and remember it in a global variable called display,
  • create a function that updates the counter value, with a parameter that gives the amount to increment it by, and
  • use that function in the onclick action of the button.

<script>
  var display = document.getElementById("display");
 
  function count(amount)
  {
    display.innerHTML = +display.innerHTML + amount;
  }
</script>

The first line of this script creates the global variable display and assigns to it the element whose id is display. (In other words, the variable will remember the p paragraph element that is holding and displaying the current counter value.)

The last four lines of this script declare a function called count with one parameter amount. When the function count is called with a numeric argument it first converts display.innerHTML to an integer, adds amount to that integer, and then stores the result back in display.innerHTML. Storing the results back into the display.innerHTML has the side effect of causing the browser to redraw the changed parts of the screen, thereby showing the updated counter.
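
To experiment with the function outside a Web page (for example in Node.js), the display element can be replaced by a plain object with an innerHTML property. That stand-in object is an assumption made only for this sketch; in the browser the variable holds the real p element.

```javascript
// Stand-in for the <p id="display"> element (an assumption for this sketch);
// in the browser this would be document.getElementById("display").
var display = { innerHTML: "0" };

function count(amount)
{
  // unary + converts the stored text to a number before the addition
  display.innerHTML = +display.innerHTML + amount;
}

count(1);
count(10);
count(-1);
console.log(display.innerHTML);   // prints 10
```

Because the amount is a parameter, the same function can serve buttons that add or subtract different amounts.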

The final Web app looks like this:

<!DOCTYPE html>
<html>
  <body>
    <h1>Click counter</h1>
    <p id="display">
      0
    </p>
    <script>
      var display = document.getElementById("display");
 
      function count(amount)
      {
        display.innerHTML = +display.innerHTML + amount;
      }
    </script>
    <p>
      <button onclick="count(1)">
        Count
      </button>
    </p>
  </body>
</html>

Exercises and challenges

1. Does it matter where you put the script element? What happens if you move it to the start of the body?

2. Add a second button that decrements the counter.

3. Replace the buttons with a row of four buttons labelled -10, -1, +1, and +10. Make the buttons add the appropriate amount to the counter. Add another row containing just one button labelled reset that resets the counter to 0 when it is clicked.

4. Explore some more pieces of JavaScript. Maybe start with the functions setInterval and clearInterval, which can be used to call a function regularly, for example, once every second. Use these functions to implement a stop watch with three buttons: “start”, “stop”, and “clear”.

Project B: Create a simple 'text chat' Cloud app that executes on both server and client

Note: to run this example you (or one of your friends) need to have a computer with PHP installed. PHP only needs to be installed on the server, so you can still try this example by connecting your browser (the client) to the app running on your friend's Web server where PHP is installed. To do this, visit the same URL that you would use on your own computer, localhost:8000/chat.php, but replace localhost with the IP address of your friend's computer. Since the app is 'served' from your friend's Web server, there is nothing left for you to do (except maybe help them to implement the chat app!).

However, if you did install PHP on your computer, then please have fun following along with the rest of this project.

0. Start a Web server that can run server-side scripts

If you are still running the Python Web server, stop it first.

Create a directory for your chat project. Change to that directory and start the PHP Web server on port 8000.

mkdir WebChat
cd WebChat
php -S localhost:8000

(If you want your friends to be able to connect to your server from their remote computers, run “php -S 0.0.0.0:8000” instead.)

1. Create a file containing just the structure of the web page

What is the simplest text chat app that you can imagine? Obviously, at the very least, the client would need:

  • an input field for you to enter a text message to send to the chat,
  • a button to press to send the message, and
  • a display area for the previously sent messages to be displayed.

It could look something like the window shown on the right. So, just as with the simple counter Web app, let's begin by entering the HTML that we need to display the static structure of the page.

Because we will be running some PHP code every time the web page is accessed, we have to use a file name ending in .php (instead of .html). You can call your file anything you like, but something like chat.php would be appropriate.

Inside a .php file you can write HTML, including CSS style and JavaScript that runs on the client, exactly like any other Web page. The following HTML will create the basic static structure of the page:

<html>
<head>
  <title>PHP Chat</title>
</head>
<body>
  <p>
    <textarea rows="10" cols="80">
    </textarea>
  </p>
  <p>
    <form>
      <input type="text" size="70" />
      <input type="submit" value="send" />
    </form>
  </p>
</body>
</html>

The new elements here are textarea, form, and input.

A textarea creates a box of a given size (rows and columns) that can display and/or allow editing of text.

input creates various kinds of input element according to its type attribute. A text input creates a single-line text box that you can type into. A submit input creates a button whose value attribute can be used to set its label.

A form groups several input elements together and allows data from those elements to be sent ('posted') back to the server. Within the form, clicking the submit input button will cause the form's data to be posted back to the server. One source of data sent back to the server is the content of any text input boxes. (Conveniently, pressing Enter inside a text input box will also post the content of the form back to the server.)

2. Add some dynamically generated content to the page

To demonstrate how PHP generates content in the page, let's insert the name of the server into the page title.

In addition to normal HTML, a .php file can also contain PHP code placed between a start tag “<?php” and an end tag “?>”. The code between those tags is run on the server before the content of the file is sent back to the client in the response to the HTTP GET request. The tags and PHP code are removed from the page content, but anything that the PHP code prints while it is running is added to the page content in their place.

The following PHP code prints the name of the server:

echo gethostname()

Adding that code into the content of the title element, between <?php and ?> tags, inserts the server name into the page title every time the client fetches the page.

<head>
  <title>PHP Chat @ <?php echo gethostname() ?></title>
</head>

3. Make the "send" button upload the text input area to the server

A form element collects input from one or more of its child input elements and then sends the data entered into those inputs back to the server when the form is 'submitted'. When the data is sent back, two things happen: the data is included with the HTTP request, and the server responds with a new Web page that will replace the original one in the browser. This allows the server to change the content of the page every time the user submits data.

To send the form data back to the server, and also to see that data and confirm the upload worked, we have to do three things:

  1. tell the form how to reload the page when the submit button is pressed,
  2. tell the text input what name it should use to identify its content (the text the user entered) when the form data is sent back to the server, and
  3. insert the uploaded data into the textarea content as the page is reloading, to show the upload worked.

First, add two attributes to the form to make it reload the page when its data is submitted. The method attribute says how to send the form data back to the server. GET and POST are the usual ways, and we'll use POST. The action attribute says what URL should be used to reload a 'result' Web page. We will use the original chat app URL. All variable names in PHP begin with a dollar sign “$”. The PHP variable $_SERVER["PHP_SELF"] contains the path of the app page, which we echo into the action attribute. (Passing it through htmlspecialchars first ensures that any special characters in the path are treated as text, not as HTML.)

<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]) ?>" method="POST">
  <input type="text" size="70" name="send" />
  <input type="submit" value="send" />
</form>

The name attribute has been added to the text input element. Setting that attribute to send allows the server to retrieve the content of the text input element using the name send when the form data is uploaded to the server. (I used send but you can use any name you like, as long as it is consistent between the input element and the PHP code that retrieves the data.)
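
To see roughly what travels to the server, the sketch below uses JavaScript's built-in URLSearchParams class to encode a field called send in the default form encoding (application/x-www-form-urlencoded) and then decode it again. The message text is invented for the example, and on the real server the decoding is done by PHP, not by JavaScript.

```javascript
// Encode the form's fields the way a browser does by default:
// name=value pairs, with spaces written as + signs.
const body = new URLSearchParams({ send: "hello world" }).toString();
console.log(body);                   // prints send=hello+world

// The receiving side decodes the body and looks the value up by name.
const received = new URLSearchParams(body);
console.log(received.get("send"));   // prints hello world
```

The name used to look up the value must match the name attribute of the input element, which is why the two have to be kept consistent.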

If you reload the page you should now find that clicking the send button, or pressing Enter in the text input field, will cause the page to reload.

4. Echo the uploaded data in the text area

The input text is currently being thrown away by the server. Let's instead insert that text into the text area, so that we can see the data is being uploaded properly.

To do this we will insert some PHP code into the textarea content. This code has to

  1. retrieve the data from the text input that was uploaded along with the page request (under the name send), and
  2. if the uploaded data is not empty, echo it to provide new content for the textarea.

When the page is reloaded the PHP variable $_REQUEST contains all the information about the request sent by the client, including any data that was uploaded from forms on the page. $_REQUEST behaves like an array, indexed by the name of the data you want. The data sent from the input text is included in the $_REQUEST array using the name send. (We told the form element to call that data send, using the name attribute of the text input.) The server can therefore access that data as $_REQUEST["send"].

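Inside the textarea we will add some PHP code that retrieves the $_REQUEST["send"] data (or an empty string if nothing was sent), stores it in the variable $send, and then (if it is not empty) echoes it (to make it become the content of the textarea element).

<p>
  <textarea rows="10" cols="80"><?php
    // ?? supplies "" when no message was sent (e.g., on the first visit)
    $send = $_REQUEST["send"] ?? "";
    // htmlspecialchars makes characters such as < appear as text, not HTML
    if ($send) echo htmlspecialchars($send);
?></textarea>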
</p>
 

5. Accumulate the chat text in persistent storage on the server

PHP can write to, and read from, files stored on the server. Let's append each line of text sent in $_REQUEST["send"] to a 'chat log' file, and then insert the contents of that file into the page as the content of the textarea element. The effect will be to collect all the text sent to the chat app and echo it all in the display area.

The code will have to

  1. add a newline to the end of the uploaded text,
  2. append the line to a 'chat log' file,
  3. get the entire contents of the 'chat log' file, and
  4. echo the contents of the 'chat log' file to provide the contents of the textarea element.
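
The four steps above can be tried out independently of the Web page. The sketch below uses JavaScript for Node.js rather than PHP, purely to illustrate the append-then-read-back logic; the helper name appendToChat and the temporary file name are made up for this example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper mirroring the four steps; returns the text to display.
function appendToChat(logFile, send) {
  if (send) {
    // steps 1 and 2: add a newline, then append the line to the log file
    fs.appendFileSync(logFile, send + "\n");
  }
  // steps 3 and 4: read the whole log back (empty if nothing was sent yet)
  return fs.existsSync(logFile) ? fs.readFileSync(logFile, "utf8") : "";
}

// Usage: a throw-away log file in the system's temporary directory.
const logFile = path.join(os.tmpdir(), "chatlog-sketch-" + process.pid + ".txt");
if (fs.existsSync(logFile)) fs.unlinkSync(logFile);

const empty = appendToChat(logFile, "");          // nothing sent yet
appendToChat(logFile, "hello");
const chat = appendToChat(logFile, "is anyone there?");
console.log(chat);                                // the two messages, one per line
fs.unlinkSync(logFile);                           // tidy up
```

Appending (rather than overwriting) is what makes the log accumulate messages across separate requests.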

Let's store the 'chat log' in a file called chatlog.txt. Instead of echoing the $send text into the textarea, first (if it is not empty) append a newline ("\n") to it (using the string concatenation operator “.”) and then use the function file_put_contents to append the result to the chatlog.txt file.

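Next, let's use file_get_contents to retrieve the entire contents of chatlog.txt (once that file exists) and, if it is not empty, echo those contents to make them become the content of the textarea element.

  <textarea rows="10" cols="80"><?php
    // ?? supplies "" when no message was sent (e.g., on the first visit)
    $send = $_REQUEST["send"] ?? "";
    if ($send) {
      $send = $send."\n";
      file_put_contents("chatlog.txt", $send, FILE_APPEND);
    }
    // the log file does not exist until the first message has been sent
    $chat = file_exists("chatlog.txt") ? file_get_contents("chatlog.txt") : "";
    // htmlspecialchars makes characters such as < appear as text, not HTML
    if ($chat) echo htmlspecialchars($chat);
?></textarea>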

We now have a working Web chat app! Your complete app should look like this:

<!DOCTYPE html>
<html>
<head>
  <title>PHP Chat @ <?php echo gethostname() ?></title>
</head>
<body>
  <p>
    <textarea rows="10" cols="80"><?php
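      // ?? supplies "" when no message was sent (e.g., on the first visit)
      $send = $_REQUEST["send"] ?? "";
      if ($send) {
        $send = $send."\n";
        file_put_contents("chatlog.txt", $send, FILE_APPEND);
      }
      // the log file does not exist until the first message has been sent
      $chat = file_exists("chatlog.txt") ? file_get_contents("chatlog.txt") : "";
      // htmlspecialchars makes characters such as < appear as text, not HTML
      if ($chat) echo htmlspecialchars($chat);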
  ?></textarea>
  </p>
  <p>
    <form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]) ?>" method="POST">
      <input type="text" size="70" name="send" />
      <input type="submit" value="send" />
    </form>
  </p>
</body>
</html>

Exercises and challenges

1. Automatically set the focus to the text input area when the page (re)loads.

Currently it is necessary to click in the text input area to begin typing into it. It would be a better user experience if the text input area were automatically focused when the page (re)loads. Let's set the focus to the correct element when the page has finished loading.

Similarly to the onclick attribute for button elements, you can use the onload attribute of the body element to run some JavaScript when the page has finished loading completely. To do this:

  • give the text input element an identifier (e.g., “text”), and then
  • set the onload attribute of the body element to JavaScript that sets focus to the element with the identifier text.

The code you need will look something like this:

document.getElementById('text').focus()

2. Scroll the textarea element to the bottom when the page is (re)loaded.

When the chat textarea has more lines of content than space to display them, it displays the oldest lines instead of the newest ones. To fix this, you can

  • give the textarea element an identifier (e.g., chat), and
  • scroll the chat area to the bottom using JavaScript.

The JavaScript to scroll a textarea to the bottom (most recent lines) looks like this:

var chat = document.getElementById('chat');
chat.scrollTop = chat.scrollHeight;

3. Prepend the time to each message.

In PHP you can obtain the time as a string like this:

date("H:i:s")

If you prepend this to the message (in the same statement that appends the newline character to it) you will get timestamps in the chat.

4. Add each participant's name to their chat messages.

Add another text input field to the form that contains your name. Automatically include your name at the start of the message. The easiest way to do this is to send your name (in addition to the chat message text) to the server when the form is submitted, include that “name” value in the message stored in the chat log file, and also include the same “name” data in the generated page as the value attribute of the name text input field. This sounds a bit 'circular' but it both includes your name in messages and preserves your chosen name across message submissions (page reloads).

Is this a Web app or a cloud app?

Calling this a 'cloud app' is maybe debatable. However, this app does do many of the things a Cloud app would do…

  • store persistent data entered by the user on the server,
  • modify what the user sees based on prior input, and
  • allow multiple users to collaborate simultaneously from multiple remote clients.

The server-side data is stored in a simple text file and the server code runs only on one server. If the persistent data were instead stored in a distributed database, and if multiple servers cooperated to share the work of handling thousands of simultaneous users, then there would be no question at all that this qualifies as a 'cloud app'.

2020/09/03 16:03

Week 15 — Safety and security

This week's topic is about using computers and networks safely and securely.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.

What you will learn from this class

  • The ways in which various media are unreliable and why backups are important.
  • What online safety is and some things you can do to improve it.
  • What Internet security is and some ways you can ensure it.
  • What Internet privacy is and some ways you can protect it.
  • What kinds of cyberattack exist including malware, viruses, and man-in-the-middle attacks.
  • How to create a strong password.
  • How to protect your computer against attack using firewalls and anti-virus software.
  • How to identify and avoid e-mail phishing attacks.
  • What a VPN is and how it improves your security and privacy.
  • What network neutrality is.
  • What Tor is and how it protects your privacy and anonymity.

Preparation

This week's preparation is to watch three short videos about safety, security, and privacy when using the Internet. You can also watch several more (optional) videos to learn about geoblocking and online anonymity.

Videos

The following three videos describe several topics related to Internet safety, security, and privacy. A short summary of the important content follows each video URL.

What is cyber security: how it works 7:06 https://www.youtube.com/watch?v=inWWhr5tnEA
  • phishing e-mails ask you for personal information (e.g., online account or banking details)
  • they try to convince you that there is a good reason to give them that information
  • the information is instead used to steal your identity and/or property
  • cyberattacks are crimes committed using the Internet or Web
  • malware is any kind of software that can cause harm
    • a 'trojan' (from 'trojan horse') is software that allows an external hacker to control your computer
    • 'adware' generates money for the attacker by causing you to see advertisements that you would normally not see
    • 'spyware' gathers information about you and sends it to the cybercriminal who can (for example) sell it
    • 'viruses' are programs that replicate themselves and then spread over the Internet, and that can damage machines, networks, and data
  • man-in-the-middle attacks occur when cybercriminals intercept or monitor your Internet communications
    • if they record your communication with an online service, they can replay the recording later and pretend to be you
  • password attacks attempt to guess your password, allowing the cybercriminal to pretend to be you
  • cybersecurity is a range of techniques and technologies you use to try to avoid cyberattacks
  • a firewall filters communication between you and the Internet and only allows authorised communications to pass
    • e.g., you might not allow incoming connections to your secure shell (ssh) port
  • a honey pot lures attackers away from real services
    • e.g., you might arrange for incoming connections to the standard ssh port to time out, very slowly, wasting a lot of the cybercriminals' time
    • the real ssh that you actually use can be running on a non-standard port
  • passwords should be difficult to guess
  • anti-virus software protects you against viruses and malware
  • a good junk e-mail filter can eliminate a lot of phishing attacks
  • cyberattacks against institutions can cause serious loss of data or even money
  • an advanced persistent threat is a cybercriminal who gains access to a system and then steals data or money slowly over a long period of time
  • a denial of service attack floods a service with many false connections, preventing legitimate users from connecting
    • the false connections often come from thousands of PCs distributed across the world that have been infected by a criminal's trojan malware
  • ethical hackers try to break into their employer's own computer systems, thereby identifying weaknesses in the security
  • security architects design strategies and apply technologies to remove those weaknesses

How to make a strong password 1:37 https://www.youtube.com/watch?v=q5DYkzOrz_I
  • often a good password is your only defence against having your personal or financial information stolen
  • using a common or simple password is like leaving the door of your house open while you go on holiday: anyone can gain access
    • avoid weak passwords: like this (1.9 seconds to crack, using freely available software on a typical 2020 computer)
  • a strong password is easy to create if you know what precautions to take
    • mix capital and small letters: LiKe tHiS (6 minutes to crack)
    • replace letters with similar-looking digits: L1k3 tHiS (2 minutes 15 seconds to crack)
    • add special or punctuation characters: L1k3 tH15!? (8 hours to crack)
    • use longer passwords, e.g., by using a pass phrase instead of a single word: m0r3 L1K3 th15! P3RH4P5? (3 million years to crack)
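
The reason that longer, more varied passwords take longer to crack can be seen with a little arithmetic. The following sketch estimates the size of the search space in bits, assuming an attacker who simply tries every combination of the character classes that appear in the password. That assumption, the class sizes, and the function name searchSpaceBits are all choices made for this illustration; real attacks that use dictionaries of common words can do much better, and the crack times quoted in the video were calculated differently.

```javascript
// Rough estimate of how many bits of guessing work a password needs,
// assuming every combination of its character classes must be tried.
function searchSpaceBits(password) {
  if (password.length === 0) return 0;
  let alphabet = 0;
  if (/[a-z]/.test(password)) alphabet += 26;          // small letters
  if (/[A-Z]/.test(password)) alphabet += 26;          // capital letters
  if (/[0-9]/.test(password)) alphabet += 10;          // digits
  if (/[^a-zA-Z0-9]/.test(password)) alphabet += 33;   // ASCII punctuation and space
  // combinations = alphabet ** length, so bits = length * log2(alphabet)
  return password.length * Math.log2(alphabet);
}

for (const p of ["like this", "LiKe tHiS", "L1k3 tH15!?", "m0r3 L1K3 th15! P3RH4P5?"]) {
  console.log(p, Math.round(searchSpaceBits(p)), "bits");
}
```

Each additional bit doubles the number of guesses needed, which is why adding length usually helps more than adding one more kind of character.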

What is a VPN and how does it work? 3:22 https://www.youtube.com/watch?v=lh-72JCv0rg
  • VPN = virtual private network
  • a VPN connects your computer to a remote (trusted) network over an (untrusted) Internet connection
    • your computer appears to be part of the remote trusted network, not the local untrusted network
  • all communication between your machine and the remote network is encrypted, which stops cybercriminals from intercepting it
    • even on a public WiFi (e.g., in a coffee shop) nobody can intercept or spy on your VPN communications
  • a VPN makes you part of your institution's network, even when you are working at home or in a hotel
    • or the other way around; e.g., when at KUAS I often use a VPN to connect my laptop to my home network, giving me much better access to Internet services
  • at your institution, other computers and devices think that you are physically present on their network
    • printers, file shares, etc., on the remote (trusted) network are all available to you
  • you can also use a VPN to stop your ISP from spying on your Internet or Web activity and selling or logging that information
  • there are dedicated VPN companies that you can use just for this purpose, but make sure they are trustworthy before using them
  • a geoblocked website is one that is only accessible from certain parts of the world
    • your IP address is used to determine where, approximately, you are located
    • video streaming services, and some online games, use geoblocking to control which countries can access their servers
  • you can use a VPN to get around geoblocking by appearing to be located in a different country
    • when connected to the VPN, you appear to be accessing the Internet from the physical location of the remote network
    • e.g., I use VPNs in other countries to access online banking, because the banks use geoblocking to prevent 'foreigners' from trying to access the service
    • e.g., I use a VPN to watch English movies on streaming services (such as Amazon) that are geoblocked in Japan because of distribution/licensing restrictions
  • some ISPs throttle communication (make it artificially slow) when downloading files, using peer-to-peer networks, or transferring other specific kinds of data
  • a VPN can be used to hide the nature of your communications and avoid the throttling, ensuring 'network neutrality'
  • people living in countries that censor Internet services (China, Russia, etc.) can use a VPN to 'tunnel' out from their country to the open Internet
  • the secure, encrypted communication channel that a VPN creates between your computer and a remote (trusted) network is called a 'tunnel'
  • a VPN service is only as safe and trustworthy as the people who run it (and the remote network it connects you to)
    • maybe the VPN operator is logging all your activity to analyse and sell!
    • one way to avoid this is to set up your own VPN, on your own rented server in another country
      • you then know that the communication is secure, and that your activity is not being logged and analysed or sold
      • such a server can cost as little as a few hundred yen per month
      • software such as OpenVPN makes setting up your own VPN quite easy to do (especially if you have been studying this Information Literacy course!)
  • there are also other high-tech ways to track your Internet use, even over a VPN
    • systems such as Tor can protect you from this by hiding your true location and the content of your communication
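
The geoblocking points above can be illustrated with a small Python sketch. It is only a model under stated assumptions: the IP ranges (reserved for documentation), the country assignments, and the function names are all hypothetical, and real services use large geolocation databases rather than a two-entry table. The point it shows is that the server can only judge the address it sees, which is the VPN server's address when you are connected to a VPN.

```python
import ipaddress

# Hypothetical IP-to-country table. Real services use large geolocation
# databases; these ranges are reserved for documentation examples.
COUNTRY_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "JP",     # pretend: an ISP in Japan
    ipaddress.ip_network("198.51.100.0/24"): "GB",  # pretend: a VPN server in the UK
}

# Hypothetical licensing rule: the service may only be used from the UK.
ALLOWED_COUNTRIES = {"GB"}


def country_of(ip):
    """Return the country code for an IP address, or None if it is unknown."""
    address = ipaddress.ip_address(ip)
    for network, country in COUNTRY_BY_NETWORK.items():
        if address in network:
            return country
    return None


def server_allows(visible_ip):
    """Decide access using only the address the request appears to come from."""
    return country_of(visible_ip) in ALLOWED_COUNTRIES


home_ip = "192.0.2.17"        # your real address in Japan
vpn_exit_ip = "198.51.100.5"  # the address of the VPN server in the UK

print(server_allows(home_ip))      # direct connection: False
print(server_allows(vpn_exit_ip))  # same request through the VPN: True
```

Note that an address that is not in the table at all is also refused, because the sketch only allows countries that it can positively identify.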

Note that there are now two common uses of the term 'VPN', which can usually be distinguished by context.

  1. the original, technical definition: a VPN extends a remote, trusted network and allows computers located outside that network to become virtually part of that network
  2. the new, commercial definition: a service (often paid) that allows you to connect to a remote VPN server and its network (usually in a country of your choice) to avoid geoblocking or other censorship
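
In both senses the underlying mechanism is the 'tunnel': each packet you send is encrypted and wrapped inside another packet addressed to the VPN server. The following Python sketch is a toy model of that idea, not a description of how any real VPN works. The host names, the packet layout, and the XOR 'cipher' are assumptions made purely for illustration (real VPN software such as OpenVPN uses proper, well-tested cryptography).

```python
import json
import secrets


def xor_bytes(data, key):
    """Toy cipher: XOR each data byte with a key byte. NOT real VPN encryption."""
    if len(data) > len(key):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def wrap(inner_packet, key, vpn_server):
    """Encrypt the whole inner packet and address the result to the VPN server."""
    plaintext = json.dumps(inner_packet).encode("utf-8")
    return {"to": vpn_server, "payload": xor_bytes(plaintext, key)}


def unwrap(outer_packet, key):
    """Done by the VPN server: decrypt the payload to recover the inner packet."""
    plaintext = xor_bytes(outer_packet["payload"], key)
    return json.loads(plaintext.decode("utf-8"))


# A random key shared by your computer and the VPN server (hypothetical setup).
shared_key = secrets.token_bytes(256)

# The packet you really want to send, including its true destination.
inner = {"to": "intranet-printer.example", "data": "print report.pdf"}

outer = wrap(inner, shared_key, "vpn.example")

print(outer["to"])                         # all an eavesdropper sees: vpn.example
print(unwrap(outer, shared_key) == inner)  # the VPN server recovers the original: True
```

Anyone between you and the VPN server, including your ISP, sees only packets going to vpn.example with unreadable contents, so they cannot tell which service you are really using or what kind of data you are transferring.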

The following videos are optional but you can watch them if you are interested in learning more about security, privacy, and anonymity.

Note that the first of these videos, about geoblocking, has been censored by YouTube. YouTube forces you to log in to 'prove' that you are over 18 (a violation of your privacy) before they will allow you to watch the video. The video contains nothing that is inappropriate for young people, so their censorship is really about limiting access to the knowledge it contains. Presumably Google (who own YouTube) believe your knowing about geoblocking, and how to circumvent it, is not in their financial and/or business and/or political interests. (Google, Facebook, Twitter, etc., engage in massive amounts of censorship to restrict or remove content from their platforms that criticises or contradicts their favoured political narratives and long-term socio-economic agendas.) I have fixed their unethical overreach by downloading the video and making a local copy available for you to view from this Web page.

What is geoblocking? 4:54 https://www.youtube.com/watch?v=AkALEDV2Exk (censored: view the local copy above)
Using the Tor browser for online anonymity 7:15 https://www.youtube.com/watch?v=xCXOSRsirR8
Is Tor or VPN better for privacy, security, anonymity? 12:31 https://www.youtube.com/watch?v=6ohvf03NiIA
How to make your own VPN 25:53 https://www.youtube.com/watch?v=gxpX_mubz2A

Notes

What is security?

The term security refers to the protection of individuals, organisations, and property against external threats and criminal activities. Security is focused on preventing deliberate actions that are intended to inflict harm on an individual, organisation, or property. (Bank security includes having serious locks to prevent unauthorised access to the underground vault where the big pile of gold that used to give actual value to your paper money was stored until about 50 years ago when paper money was made worthless, taking away your financial security in an activity that certainly should be considered criminal.)

What is safety?

The term safety means being protected from anything that might cause harm. The harm might come from known dangers or from unintended accidents. (Astronaut safety includes protection from the extreme temperatures in outer space. Building site safety includes wearing a hard hat to protect against accidentally dropped objects.)

What is privacy?

The term privacy relates to the rights you have to control your personal information, who can access it, and how it is used. The personal information might be explicitly collected or implied from your behaviour. (When downloading a smartphone app you agree to what personal information it can collect from your e-mails, camera, location, etc. You might also take steps to actively prevent anyone from knowing which Web sites you browse, or which products you are buying for how much from which vendors. In the case of 'free' services, you often pay by giving up your privacy: until recently, Google scanned all your Gmail communications to help them decide what advertisements you should see. In 2017 they said they were going to stop doing that. Maybe they did, but even so: whenever any corporation provides an online service for 'free' then it is always the service's users who are that corporation's commercial product and source of profit, almost always at the expense of the users' privacy.)

What is network neutrality?

The term network neutrality refers to the principle that Internet Service Providers (ISPs) must not discriminate against particular uses of the Internet. Discrimination could be in the form of a slower (or capped) service, or additional fees. (If Rakuten ran the Internet in Japan then they could violate net neutrality to favour their own business by making it harder for you to choose alternatives. For example they could provide slower Internet service, or charge additional usage fees, whenever you access Amazon to make an online purchase. Geoblocking can be considered a kind of violation of net neutrality. Some countries have laws that require net neutrality from ISPs, and some content distributors such as Netflix try to license content in ways that do not require them to implement any geoblocking.)

2020/09/03 16:03