~~NOTOC~~
{{page>css&nodate&noeditbtn&nofooter}}

===== Week 12 — The World Wide Web =====

This week's topic is the World Wide Web and how it works.

==== Evaluation ====

Up to 10 points can be gained towards your final score by completing the **in-class exercises** on Friday.

==== What you will learn from this class ====

  * What the World Wide Web is.
  * How web servers host the web.
  * How URLs work to identify the location of resources on the web.
  * How web content is provided statically or generated dynamically.
  * How a client uses HTTP to request a web resource from a web server.
  * How a server informs the client about its request succeeding or failing.
  * How different kinds of media are identified within a web page.
  * How hyperlinks create a "web" of pages that spans the Internet and therefore the world.

==== Preparation ====

This week's preparation is to watch some short videos about the World Wide Web and then to install software on your computer that will let you run your own web server.

**Videos:** WWW and HTTP

| What is the world wide web? | (4 minutes) | https://www.youtube.com/watch?v=J8hzJxb0rpc |
| What is HTTP? | (7 minutes) | https://www.youtube.com/watch?v=LZJNj-HHfII |
| How a browser displays a web page | (10 minutes) | https://www.youtube.com/watch?v=DuSURHrZG6I |
/* | How The Web Works | (12 minutes) | https://www.youtube.com/watch?v=hJHvdBlSxug | */

**Software:** Python 3

Linux: you probably already have Python 3 installed, but if not then install it with your package manager (using, e.g., ''sudo apt install python3'')

macOS: install from [[https://www.macports.org/install.php|MacPorts]] (using ''sudo port install python39'') or from [[https://rajputankit22.medium.com/install-python3-in-macos-high-sierra-675d58913e6b|Homebrew]] (using ''brew install python3'') or download an installer from [[https://www.python.org/downloads/mac-osx/|python.org]]

Windows: download an installer from [[https://www.python.org/downloads/windows|python.org]]

++++ Click here for detailed Windows instructions |
  - Open this web page: https://www.python.org/downloads/windows
  - Click on "Latest Python 3 Release"
  - Scroll to the bottom of the page
  - Download "Windows x86-64 executable installer" (today that happens to be: https://www.python.org/ftp/python/3.9.0/python-3.9.0-amd64.exe)
  - Run the installer (e.g., right-click on it and choose "open").
  - Select **both** "Install launcher for all users" **and** "Add Python 3.9 to PATH" {{python-install-1.png?direct}} {{12-windows-install-path.png?direct}}
  - Click "Customize installation"
  - Select everything {{python-install-2.png?direct}}
  - Click "Next"
  - Select "Install for all users" (and check the location at the bottom has changed to something like "C:\Program Files\Python39") {{python-install-3.png?direct}}
  - Click "Install"
  - Allow administrator access if prompted.
  - Close the "Setup was successful" window. {{python-install-4.png?direct}}

The next time you open a command line (MobaXterm, WSL, etc.) you should be able to run python and check the version using this command

  python --version

which should print something like "Python 3.9.0".
++++

When Python is installed you should be able to run either

  python3 --version

or

  python --version

and see something like "''Python 3.5.3''" or "''Python 3.9.0''" printed. You should also be able to run the same ''python3'' (or ''python'') command like this

  python3 -m http.server

and see output that looks like "''Serving HTTP on :: port 8000 (http://[::]:8000/) ...''". (Press ''Control''+''c'' to stop the program.)

==== Notes ====

The three self-preparation videos cover the following topics.

=== What is the world wide web? ===

https://www.youtube.com/watch?v=J8hzJxb0rpc (4 minutes)

  * the Web can be used for any activity built around organising or exchanging data
  * the Web is accessible from computers, smart phones, and even cars
  * the Web is not the Internet
  * the Internet is the network computers use to communicate with each other
  * the Web is just one application protocol that uses the Internet for communication
  * a Web server is a computer that is always connected to the Internet, specifically designed to store information and share it with Web browsers
  * one or more Web sites can be hosted on a Web server
  * Web sites are identified by the IP address of their server, usually in the form of a domain name
  * the name (IP address) says which server has the Web site content we want
  * the Web is special because of its non-linear organisation of data (compared to a book which is read linearly, page by page, in sequence)
  * each page or other resource on a Web server has a unique path name that comes after the server name
  * a Uniform Resource Locator (URL) identifies a Web document or resource
  * when people say "a Web address" they usually mean "a URL"
  * a URL combines a protocol (http) with a server address (its DNS name) and a path name to a resource on that server (such as a Web page or media file)
  * URLs can be embedded in Web pages in the form of //hyperlinks//
  * when you click on a hyperlink your browser displays the document that it refers to
  * this is what most people call "following a link"
  * a single Web page can link to many other related pages or media files
  * unlike a linear book, additional information and ideas can be linked to and expanded as soon as they are encountered
  * the hyperlinks therefore form a loose, interconnected network, like a spider's web
  * in fact you can even say that "The Web" doesn't really exist
  * "The Web" is made from all the //spaces// between Web pages and the resources that they link to
  * it is a web of //relationships//, and not a physical thing
  * rather like a family tree, which clearly exists but is not actually a physical thing

=== What is HTTP? ===

https://www.youtube.com/watch?v=LZJNj-HHfII (7 minutes)

  * a protocol is a standard procedure (or set of rules) governing how to do something
  * on the Internet, the Hyper Text Transfer Protocol (HTTP) governs how a Web client (browser) asks a Web server for a document or media file
  * a Web client //requests// content or resources
  * a Web server //responds// by delivering the content or resource to the client
  * HTTP and the Web are an evolution from sharing plain text files to sharing graphics- and multimedia-rich documents
  * HTML is the language of Web pages which lets you create links to resources stored on any Web server anywhere in the world
  * clicking on a link fetches and displays that resource (often a Web page)
  * a URL is a Uniform Resource Locator that identifies:
    * a specific //protocol// (often ''http'')
    * an Internet server //address// (usually by its domain name) and port number (often omitted to use the default)
    * a //path// to a resource located on the server
  * port 80 is used for normal HTTP, and port 443 is for secure HTTPS (encrypted communication)
  * to fetch the resource described by a URL using HTTP:
    - the client sends a ''GET'' request to the appropriate port on the server, along with the path of the resource it wants
    - the server sends back the content of the resource
  * if an error occurs the server sends back a standard document that looks like a Web page and which specifies a //status code// indicating what the problem was
  * the status code in the response is encoded as a number:

| 1//xx// | the server is providing the client with some requested information |
| 2//xx// | the request succeeded and the desired document or resource is provided in the response |
| 3//xx// | the requested resource has moved |
| 4//xx// | the request failed because of a client problem or error; e.g., status code ''404'' means "Resource Not Found" |
| 5//xx// | the request failed because of a server problem or error |

The developers of the Firefox browser provide a nice summary of HTTP status codes here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

=== How a browser displays a web page ===

https://www.youtube.com/watch?v=DuSURHrZG6I (10 minutes)

This goes a little deeper into the topics of the other two videos and touches on how content is described in the HTML content of a web page.

  * a Web browser is just one kind of Web client
  * any application that understands HTTP can be a Web client
  * first the user tells the browser what they want to look at in the form of a URL
  * a URL is a Uniform Resource Locator that identifies:
    - a specific protocol (or //scheme//)
    - an Internet server address (usually a domain name) and optional port number
    - a resource on the server identified by its path within the server's file store
    - optional parameters following a "''?''"
    - an optional section name within the page following a "''#''"
  * URLs in the document can also specify other resources needed by the page
    - cascading style sheets (CSS) describing how the content should be presented
    - JavaScript programs that add dynamic behaviour to the content
  * the exact same kind of URL can appear in hyperlinks (or "anchors") inside a Web document
  * to fetch a Web page, given a URL, the client
    - opens a TCP connection to the server using the "address" part of the URL
    - sends an HTTP ''GET'' request that specifies the resource it wants using the "path" part of the URL
  * in response to a ''GET'' request, the server
    - looks for a file or other resource corresponding to the path part of the ''GET'' request
    - if possible, sends back the content of the resource for the browser to display
    - if not possible, sends back a Web "page" that describes what went wrong
  * a normal Web page contains a document described using Hyper Text Mark-up Language (HTML)
  * the browser uses the HTML to build a model of the content of the page including paragraphs, section headings, hyperlinks, etc.
  * if there are any other resources needed to display the page, they are specified by URL and are fetched by the browser while rendering the page
  * any style sheets that were specified are used to choose fonts, colours, etc., for paragraphs, headings, tables, and so on
  * any JavaScript programs that are included in the page start to run to add dynamic behaviour to the document
  * based on the different parts of the page, the browser builds a visual representation of the page and //renders// it on the screen for the user to see

=== More technical details ===

If the above videos were not detailed enough, you can find many longer videos that explain the World Wide Web in much greater detail. Here is an example that is maybe one step up in detail from the videos above: [[https://www.youtube.com/watch?v=hJHvdBlSxug|How The Web Works]] (12 minutes)

==== Exercise ====

If you have not already done so, follow the instructions above to install Python 3 on your computer.

With Python 3 installed, running a Web server on your computer is //super// easy.

Create a directory to store your web site and change to it. (I usually call mine something like "''html''".)

  mkdir html
  cd html

Use ''cat'' or ''nano'' to create a file called ''index.html'' that has the following contents:

  <h1>Hello, world!</h1>
  <p>Welcome to your Computer-Wide Web.</p>

In the same directory, run this command (use ''python'' if you don't have ''python3''):

  python3 -m http.server

Open a new tab in your Web browser, paste (or type) the following URL into the address bar

  http://localhost:8000

and then press ''return''. If all went well, you should see your web page in the browser.

{{12-hello-world.png?direct}}

++++ What that URL means |
''http'' is the protocol, ''localhost'' is the DNS name of the server (and ''localhost'' always means the computer on which the program, in this case your web browser, is running), and ''8000'' is the port on which your Python web server is communicating.
++++

Try modifying the content of the "''index.html''" file. For example, add more lines containing "''<h1>''...''</h1>''" or more lines containing "''<p>''...''</p>''" (with something interesting instead of "...", obviously). Pick a word inside one of these sections and put an opening inline tag such as "''<b>''" in front of it and the matching closing tag "''</b>''" after it. Try the same with another word and a different pair of tags, such as "''<i>''" and "''</i>''" or "''<u>''" and "''</u>''". How much fun is that? 😀

**Don't forget:** //every// time you modify something in your ''index.html'' file you //must// reload the page in your browser to see the change. A convenient way to do this is by pressing ''Control''+''r'' while your browser window is active. (In conjunction with ''Alt''+''Tab'' to switch between applications, you can even edit the ''index.html'' file and reload the browser without ever taking your hands off the keyboard.)
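
If you are curious what the browser is doing on your behalf, the request and response described in the notes can be sketched in a few lines of Python. This is only an illustration and not part of the exercise: the function names ''describe_url'', ''status_meaning'' and ''fetch'' are made up for this sketch, and ''fetch'' assumes that a web server (for example the ''python3 -m http.server'' from above) is running at the address you give it.

```python
# A minimal sketch of a Web client. The function names are illustrative
# (hypothetical); they are not part of any library or of the course material.
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts listed in the notes."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # e.g. "http"
        "server": parts.hostname,    # e.g. "localhost"
        "port": parts.port,          # None if the URL omits the port
        "path": parts.path,          # e.g. "/index.html"
        "parameters": parts.query,   # the text after "?"
        "section": parts.fragment,   # the text after "#"
    }


def status_meaning(code):
    """Map a status code to its class (1xx to 5xx) from the table in the notes."""
    classes = {
        1: "informational",
        2: "success",
        3: "resource moved",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def fetch(url):
    """Send a GET request and return (status code, content as bytes)."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib reports 4xx and 5xx responses by raising HTTPError,
        # which still carries the status code and the error page.
        return error.code, error.read()
```

With your server still running, calling ''fetch("http://localhost:8000/index.html")'' should return a 2//xx// status together with the content of your page, while asking for a path that does not exist should return a 4//xx// status and the server's error page.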
/* syllabus */ /* * Local Variables: * eval: (flyspell-mode) * eval: (ispell-change-dictionary "british") * eval: (flyspell-buffer) * End: */