KUAS Engineering

Week 05 — File systems

Evaluation

Up to 10 points can be gained towards your final score by completing the in-class assignment on Friday.

Preparation

1. Complete the self-preparation assignment at home before next class

This week's self-preparation assignment is part practical preparation and part study.

First, install a command-line environment on your computer that you can use to complete the next few weeks of this course. (Linux and Mac users already have a suitable command-line environment; there is nothing to do. Windows users have several options; please follow that link and install one of the options on your laptop computer.)

Second, Review the notes on this page before coming to class. Use Internet search engines to find online tutorials, Wikipedia articles, etc., for any additional information (or explanation) that you might need.

2. Check your understanding of file system concepts using the self-assessment questionnaire

Answer each question in the self-assessment questionnaire as honestly as you can. (Do not press 'submit' when you are finished.)
Check your scores.
Revise those topics having the lowest scores.
When you have made progress with understanding or skill, update your scores.
Repeat from step 2 until you feel comfortable with most (or all) of the topics.

On Thursday evening, press 'submit'. In class on Friday we will check which topics were difficult for everyone.

To succeed at the in-class assignment for this class you should understand the topics outlined in the next section and further explained in the “Notes” section.

What you will learn from this class

How to use the SI, JEDEC, and IEC prefixes associated with file size and storage capacity.
The purpose of the file system within a computer system.
How data is stored physically on a disk.
The parameters that can affect the efficiency of data storage on a disk.
How the OS manages the allocation of physical storage to files and directories.
How files and directories are logically organised on the disk.

Notes

A filesystem (or file system) manages the storage of data on a device (such as a solid-state disk, hard disk drive, or USB flash memory drive). Many hundreds of different filesystems exist, each one providing different tradeoffs between speed, reliability, safety, security, etc. Almost all of them manage data by splitting it into files (sequences of bytes) placed within a hierarchy of directories (which map symbolic names to individual files). In most situations a filesystem refers to a single logical repository of files residing on a single physical medium (e.g., SSD or HDD).

Storage sizes

Sizes of files (and filesystems, and the physical media on which they exist) are measured in bytes. One byte contains eight bits, where a bit is a single binary digit (0 or 1). A bit is the smallest unit of information possible. A byte contains eight bits because that is sufficient to store a single character in English. For English and many European languages, a single character in a document will be represented as a single byte. Asian characters are typically larger. In Japanese, common characters are represented using two bytes but uncommon characters can require up to four bytes of storage.

storage sizes (traditional units)
unit	name	size (decimal SI¹ units)	size (binary JEDEC² units)
1 kB	kilobyte	10³ = 1,000 bytes	2¹⁰ = 1,024 bytes
1 MB	megabyte	10⁶ = 1,000,000 bytes	2²⁰ = 1,048,576 bytes
1 GB	gigabyte	10⁹ = 1,000,000,000 bytes	2³⁰ = 1,073,741,824 bytes
1 TB	terabyte	10¹² = 1,000,000,000,000 bytes	2⁴⁰ = 1,099,511,627,776 bytes

¹ Système international, otherwise known as the metric system
² Joint Electron Device Engineering Council

storage sizes (IEC binary prefixes)
unit	name	size
1 KiB	kibibyte	2¹⁰ = 1,024 bytes
1 MiB	mebibyte	2²⁰ = 1,048,576 bytes
1 GiB	gibibyte	2³⁰ = 1,073,741,824 bytes
1 TiB	tebibyte	2⁴⁰ = 1,099,511,627,776 bytes

A byte is too small a quantity to be useful when discussing an entire filesystem or disk. SI prefixes are often applied to storage sizes, as shown in the table on the right. Permanent storage (such as disk drives) usually use the SI (metric) prefixes, which increase by factors of 1000. Volatile storage (such as computer memory) usually use the JEDEC (binary) prefixes, which increase by factors of 1024.

Available disk space reported by the OS might use either system leading to discrepancies between advertised (by the manufacturer) and actual (available to the computer user) storage space. For example, a drive advertised in SI units as containing 1 TB of space would be reported as providing only 931 MB of space by an OS using JEDEC units. This can be a cause for confusion, or even legal action.

A new set of binary prefixes was standardised by the IEC, corresponding to the JEDEC binary sizes but using a different spelling and (somewhat silly-sounding) names. The prefixes are formed from the SI/JEDEC ones by inserting an 'i' between the multiplier (k, M, G, etc.) and the unit (B). Their names are formed from the first two letters of the corresponding SI/JEDEC prefix (ki, me, gi, te, etc.) and the first two letters of the word 'binary' (bi); hence kibibyte, mebibyte, and so on.

Linux and recent versions of MacOS have adopted these prefixes for displaying information that uses 1024-based scales (such as memory statistics). Windows and the mass media have so far ignored them.

Filesystems

Filesystems are used to store:

User documents and data
Documents can be application-specific files (.doc, .xls, etc.) or plain text files.
Applications
Applications (Word, PowerPoint, your e-mail reader, etc.) are stored in files containing programs that the computer can execute. On Windows, look in the directories under 'C:\Program Files' to find lots of applications. On MacOS, look in '/Applications'. On Linux, look in '/usr/bin'.
The OS and its utility programs
The operating system is really just another application, but is run automatically when the computer is booted.
Virtual memory
When running many applications the required memory size can exceed the available physical memory size. The OS deals with this by temporarily moving unused parts of applications from memory onto the disk, and reloading them when they are required again. On Windows this disk space comes from a file called 'C:\pagefile.sys'. On MacOS look in '/private/var/vm'. On Linux you probably have part of the physical disk dedicated to this task, using its own filesystem that is independent of OS and user files.

Filesystem organisation

No matter what filesystem is being used, the application sees a very simple model of storage:

Documents are stored as sequences of bytes in named files.
Files are organised within directories, which are mappings from file names to file contents.
Directories can also be placed inside directories, leading to a tree structure.

A directory maps names onto files. In the example on the right, 'Documents' is a directory containing two entries. One entry points to a regular file called 'local' (containing a document represented as a sequence of bytes). The other entry points to another directory called 'MobaXterm'. Using a 'family tree' metaphor, 'Documents' is called the parent directory of 'MobaXterm'.

The root directory, which is at the top of the tree structure, has no name. (File and directory names are stored in their parent directory. Since the root directory has no parent directory, there is simply nowhere to store a name for it.)

To specify a particular file or directory, start at the root and describe the path that must be followed to find that file or directory. For example, the 'Documents' directory has the following path:

root directory
'Users' directory
'piumarta' directory
'Documents' directory

Each element in the path is separated by a “/” character (or “" on Windows). The root directory has no name so we start with an empty name, then the separator, then 'Users', another separator, and so on. The final path is therefore: '/Users/piumarta/Documents' (or '\Users\piumarta\Documents' on Windows).

On most computers there is only one root directory. For historical reasons Windows is an exception and has one root directory per filesystem, or 'volume' in Windows terminology. Each volume is named by a single letter followed by a colon, in this case 'C:'. Usually the volume name is prepended to paths (i.e., added at the beginning of the path). The correct path to the 'Documents' directory in Windows would therefore be: 'C:\Users\piumarta\Documents'.

Windows explorer shows you the path to the current directory above the list of files. If you click in it you will see it written in the notation shown above. You can also type into the location bar, or copy/paste from/to it, using the same notation.

Special directory entries

Every directory contains two special entries whose names are '.' and '..'. The name '.' points to the inode of the directory itself (so the path '/Users/././././.' refers still to the '/Users' directory). The name '..' points to the inode of the parent directory (so the path '/Users/Administrator/../piumarta/.' refers to my account's directory). The only exception is the root directory, which has no parent, and so for it the name '..' points back to the root directory again.

Allocating storage for files and directories

When you open a folder in Mac Finder or Windows Explorer, or type 'ls' in a command line window, you are looking at a list of directory entries. Every entry in the directory has a unique (file or directory) name and associates that name with some storage on the disk where the contents of the file or directory are stored. The structure describing where the contents are stored is called an index node, or inode for short. Inodes are not stored in the directories, but in a separate table on the disk. There is exactly one inode specifying where the contents for any given file or directory are stored on the disk. However, more than one name can be associated with a given inode (and therefore file) by having more than one directory entry specify the same inode as the location of the file's contents.

An inode contains all the information relating to a file's or directory's contents, including:

type: regular, directory, symbolic link, special
link count: how many directory entries point to this inode
size: total number of bytes in the file that the inode describes
meta data: information about the file/directory itself
- permissions: who can access the content to read, write, execute it
- ownership: who created it
- timestamps: when the content was created, last modified, last accessed

The inode also contains a list of blocks on the disk where the contents of the file/directory are actually stored. Each block has a fixed sized, typically 4096 bytes, and is identified by a number (unique within the filesystem) that can easily be converted into the physical location on the disk where the block is stored.

(Windows, just to be difficult, calls its inodes 'MFT records' where MFT stands for 'master file table'.)

Physical storage of blocks on the disk

A disk drive, whether solid-state or rotating, stores information in a fixed number of fixed-sized sectors. A sector is typically 512 bytes long, and so a 1TB disk would contain 2,147,483,648 sectors of data. A sector is the smallest unit of data that can be transferred to or from the disk.

While sectors can be addressed directly, most filesystems do not do that for reasons of efficiency. Instead they combine several sectors into a block and treat that as the smallest unit of data when managing space on the disk. A typical block size is 8 sectors, or 4096 bytes. (Of course, Windows has to be different to everyone else and uses the terms 'allocation unit' or 'cluster'.)

Each block has a unique number, and data is always read or written to the disk in multiples of the block size.

Why am I bothering to tell you about block (allocation unit size) size? Efficiency.

Before you can store information on a disk you have to format it. Formatting writes all the data structures that are needed to describe an empty filesystem onto the disk. It puts in place the framework into which you can start creating new files and directories.

While you are formatting a disk you will likely be given the choice of what block (allocation unit) size to use. The default on many filesystems (including Windows, surprisingly) is 4096 bytes. You can almost certainly increase or decrease this size (by factors of a power of 2).

The smallest unit of information that can be allocated is one block. A one-byte file therefore consumes an entire block for its contents, no matter what block size is chosen. The rest of the block is wasted. This is called internal fragmentation of the storage and there is nothing you can do about it (other than reduce the block size).

On the other hand, a huge file will contain very many blocks of data and will therefore have a very large block list in its inode. This wastes space in the inode and potentially makes accessing the file less efficient, because the contents of the file are distributed over many different blocks located far apart on the surface of the disk. (This is what most people mean when they say 'fragmentation' in the context of disk storage. It is also the problem that is solved by the 'disk defragmenter' tool in Windows which attempts to rearrange the blocks belonging to each file to keep them closer together on the disk, ideally in a single contiguous sequence of blocks.)

If you expect your disk to contain mostly very small files (common in big data analytics) then a small block (allocation unit) size will perform better. If you expect your disk to contain mostly very large files (common in audio/video media editing) then a large block (allocation unit) size will perform better. If you have no idea what to expect then the default block (allocation unit) size is probably going to be fine.

File abstraction

In your programs you will use functions such as open(), read(), write(), and close() to access and modify the contents of files. Those functions directly manipulate the directories, inodes, and disk blocks described above. The filesystem's job is to make sure that happens safely and efficiently, and that you never notice all of the underlying complexity.