KUAS Engineering

Week 12 — The World Wide Web

This week's topic is the World Wide Web and how it works.

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class exercises on Friday.

What you will learn from this class

  • What the World Wide Web is.
  • How web servers host the web.
  • How URLs work to identify the location of resources on the web.
  • How web content is provided statically or generated dynamically.
  • How a client uses HTTP to request a web resource from a web server.
  • How a server informs the client about its request succeeding or failing.
  • How different kinds of media are identified within a web page.
  • How hyperlinks create a “web” of pages that spans the Internet and therefore the world.

Preparation

This week's preparation is to watch some short videos about the World Wide Web and then to install software on your computer that will let you run your own web server.

Videos: WWW and HTTP

What is the world wide web? (4 minutes) https://www.youtube.com/watch?v=J8hzJxb0rpc
What is HTTP? (7 minutes) https://www.youtube.com/watch?v=LZJNj-HHfII
How a browser displays a web page (10 minutes) https://www.youtube.com/watch?v=DuSURHrZG6I

Software: Python 3

Linux: you probably already have Python 3 installed, but if not then install it from your repository manager (using, e.g., sudo apt install python3)

macOS: install from MacPorts (using sudo port install python39) or from Homebrew (using brew install python3) or download an installer from python.org

Windows: download an installer from python.org

When Python is installed you should be able to run either

python3 --version

or

python --version

and see something like “Python 3.5.3” or “Python 3.9.0” printed. You should also be able to run the same python3 (or python) command like this

python3 -m http.server

and see output that looks like “Serving HTTP on :: port 8000 (http://[::]:8000/) …”.

(Press Control+c to stop the program.)

Notes

The three self-preparation videos cover the following topics.

What is the world wide web?

https://www.youtube.com/watch?v=J8hzJxb0rpc (4 minutes)

  • the Web can be used for any activity built around organising or exchanging data
  • the Web is accessible from computers, smart phones, and even cars
  • the Web is not the Internet
    • the Internet is the network computers use to communicate with each other
    • the Web is just one application that uses the Internet for communication
  • a Web server is a computer that is always connected to the Internet, specifically designed to store information and share it with Web browsers
  • one or more Web sites can be hosted on a Web server
  • Web sites are identified by the address of their server, usually given as a domain name that DNS translates into the server's IP address
  • the name says which server has the Web site content we want
  • the Web is special because of its non-linear organisation of data (compared to a book which is read linearly, page by page, in sequence)
  • each page or other resource on a Web server has a unique path name that comes after the server name
  • a Uniform Resource Locator (URL) identifies a Web document or other resource
    • when people say “a Web address” they usually mean “a URL”
  • a URL combines a protocol (http) with a server address (its DNS name) and a path name to a resource on that server (such as a Web page or media file)
  • URLs can be embedded in Web pages in the form of hyperlinks
  • when you click on a hyperlink your browser displays the document that it refers to
    • this is what most people call “following a link”
  • a single Web page can link to many other related pages or media files
    • unlike a linear book, additional information and ideas can be linked to and expanded as soon as they are encountered
  • the hyperlinks therefore form a loose, interconnected network, like a spider's web
  • in fact you can even say that “The Web” doesn't really exist
    • “The Web” is made from all the spaces between Web pages and the resources that they link to
    • it is a web of relationships, and not a physical thing
    • rather like a family tree, which clearly exists but is not actually a physical thing
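
As a rough illustration of how hyperlinks turn separate pages into a web, the following Python sketch pulls the link targets out of a small piece of HTML using the standard html.parser module. The sample page, its URLs, and the class name LinkCollector are all made up for this example and are not part of the course material.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the target URL of every <a href="..."> hyperlink in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


# A made-up page; the URLs are placeholders chosen for illustration.
SAMPLE_PAGE = """
<html><body>
<p>See the <a href="/notes/http.html">HTTP notes</a> and
<a href="http://example.org/">another site</a>.</p>
</body></html>
"""

collector = LinkCollector("http://example.com/index.html")
collector.feed(SAMPLE_PAGE)
for link in collector.links:
    print(link)
```

Each collected URL names another page, which may itself contain further links. Repeating the same step on every page that is found is, in essence, how the web of relationships is traversed.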

What is HTTP?

https://www.youtube.com/watch?v=LZJNj-HHfII (7 minutes)

  • a protocol is a standard procedure (or set of rules) governing how to do something
  • on the Internet, the Hypertext Transfer Protocol (HTTP) governs how a Web client (browser) asks a Web server for a document or media file
    • a Web client requests content or resources
    • a Web server responds by delivering the content or resource to the client
  • HTTP and the Web are an evolution from sharing plain text files to sharing graphics- and multimedia-rich documents
  • HTML is the language of Web pages which lets you create links to resources stored on any Web server anywhere in the world
  • clicking on a link fetches and displays that resource (often a Web page)
  • a URL is a Uniform Resource Locator that identifies:
    • a specific protocol (often http)
    • an Internet server address (usually by its domain name) and port number (often omitted to use the default)
    • a path to a resource located on the server
  • port 80 is used for normal HTTP, and port 443 is for secure HTTPS (encrypted communication)
  • to fetch the resource described by a URL using HTTP:
    1. the client sends a GET request to the appropriate port on the server, along with the path of the resource it wants
    2. the server sends back the content of the resource
  • if an error occurs the server sends back a standard document that looks like a Web page and which specifies a status code indicating what the problem was
  • the status code in the response is encoded as a number:
    • 1xx: informational; the server is sending an interim response while it continues to handle the request
    • 2xx: the request succeeded and the desired document or resource is provided in the response
    • 3xx: the requested resource has moved and the client is redirected to its new location
    • 4xx: the request failed because of a client problem or error; e.g., status code 404 means “Not Found”
    • 5xx: the request failed because of a server problem or error
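
The grouping of status codes by their first digit can be sketched in a few lines of Python. This is only an illustration: the function name status_class and its short descriptions are made up for this example, while HTTPStatus comes from Python's standard http module and supplies the standard phrase for each code.

```python
from http import HTTPStatus


def status_class(code):
    """Return a short description of the class a status code belongs to."""
    # Hypothetical labels chosen for this example; the first digit decides.
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    # HTTPStatus(code).phrase is the standard phrase, such as "Not Found".
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

A client only needs to look at the first digit to decide what to do next; the remaining digits say more precisely what happened.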

The developers of the Firefox browser provide a nice summary of HTTP status codes here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

How a browser displays a web page

https://www.youtube.com/watch?v=DuSURHrZG6I (10 minutes)

This goes a little deeper into the topics of the other two videos and touches on how content is described in the HTML of a web page.

  • a Web browser is just one kind of Web client
  • any application that understands HTTP can be a Web client
  • first the user tells the browser what they want to look at in the form of a URL
  • a URL is a Uniform Resource Locator that identifies:
    1. a specific protocol (or scheme)
    2. an Internet server address (usually a domain name) and optional port number
    3. a resource on the server identified by its path within the server's file store
    4. optional parameters following a “?”
    5. an optional section name within the page following a “#”
  • URLs in the document can also specify other resources needed by the page
    1. cascading style sheets (CSS) describing how the content should be presented
    2. JavaScript programs that add dynamic behaviour to the content
  • the exact same kind of URL can appear in hyperlinks (or “anchors”) inside a Web document
  • to fetch a Web page, given a URL, the client
    1. opens a TCP connection to the server using the “address” part of the URL
    2. sends an HTTP GET request that specifies the resource it wants using the “path” part of the URL
  • in response to a GET request, the server
    1. looks for a file or other resource corresponding to the path part of the GET request
    2. if possible, sends back the content of the resource for the browser to display
    3. if not possible, sends back a Web “page” that describes what went wrong
  • a normal Web page contains a document described using HyperText Markup Language (HTML)
  • the browser uses the HTML to build a model of the content of the page including paragraphs, section headings, hyperlinks, etc.
  • if there are any other resources needed to display the page, they are specified by URL and are fetched by the browser while rendering the page
  • any style sheets that were specified are used to choose fonts, colours, etc., for paragraphs, headings, tables, and so on
  • any JavaScript programs that are included in the page start to run to add dynamic behaviour to the document
  • based on the different parts of the page, the browser builds a visual representation of the page and renders it on the screen for the user to see

More technical details

If the above videos were not detailed enough, you can find many longer videos that explain the World Wide Web in much greater detail. Here is an example that is maybe one step up in detail from the videos above: How The Web Works (12 minutes)

Exercise

If you have not already done so, follow the instructions above to install Python 3 on your computer.

With Python 3 installed, running a Web server on your computer is super easy. Create a directory to store your web site and change to it. (I usually call mine something like “html”.)

mkdir html
cd html

Use cat or nano to create a file called index.html that has the following contents:

<html>
<body>
<h1>Hello, world!</h1>
<p>Welcome to your Computer-Wide Web.</p>
</body>
</html>

In the same directory, run this command (use python if you don't have python3):

python3 -m http.server

Open a new tab in your Web browser, paste (or type) the following URL into the address bar

http://localhost:8000

and then press return. If all went well, you should see your web page in the browser.

What that URL means
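
One way to see the parts of that URL is to let Python split it up. The sketch below uses urlsplit from the standard urllib.parse module; the second, longer URL is made up purely to show the optional parts and does not refer to a real page.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://localhost:8000")
print(parts.scheme)      # the protocol: http
print(parts.hostname)    # the server name: localhost, meaning this computer
print(parts.port)        # the port number: 8000
print(repr(parts.path))  # the path is empty, so the browser asks for "/"

# A made-up URL that also has a path, parameters, and a section name.
full = urlsplit("http://localhost:8000/notes/week12.html?lang=en#exercise")
print(full.path, full.query, full.fragment)
```

Because no path is given, the browser requests “/”, and the Python web server answers with the index.html file in the directory where it was started.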

Try modifying the content of the “index.html” file. For example, add more lines containing “<h1>…</h1>” or more lines containing “<p>…</p>” (with something interesting instead of “…”, obviously).

Pick a word inside a “<p>…</p>” section and put “<i>” in front of it and “</i>” after it.

Pick a word inside a “<p>…</p>” section and put “<b>” in front of it and “</b>” after it.

Try putting “<tt>” and “</tt>” around another word.

How much fun is that? 😀

Don't forget: every time you modify something in your index.html file you must reload the page in your browser to see the change. A convenient way to do this is by pressing Control+r while your browser window is active. (In conjunction with Alt+Tab to switch between applications, you can even edit the index.html file and reload the browser without ever taking your hands off the keyboard.)