DATA



Filesystems

Durable storage in computers is managed and structured by filesystems.

They organize the available storage space and provide the means to quickly index and find stored data. The smallest unit of data organization is a file. A filesystem defines a common namespace for indexing files and controls where on the storage media each file is located. A filesystem is, however, not concerned with the internal structure of files (the file format), which is defined by individual application software.

Files

All digital documents and computer programs are stored in files. A computer file is a uniquely identified storage container that holds a linear sequence of bits of some length (the file size). A file may have one or more names within a filesystem, along with associated metadata: typically timestamps for creation, last modification and last access, as well as ownership information and access control attributes.
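
As an illustration of the metadata described above, the following minimal sketch reads what the filesystem records about a file using Python's standard `os.stat` call. The helper name `describe_file` and the dictionary keys are chosen for this example only. Which fields are meaningful depends on the operating system and filesystem; a portable creation timestamp, for instance, is not available everywhere and is therefore left out.

```python
import os
import tempfile


def describe_file(path):
    """Return the size and some common metadata the filesystem keeps for a file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,            # length of the byte sequence
        "modified": st.st_mtime,             # last content modification (seconds since epoch)
        "accessed": st.st_atime,             # last access (may be coarse or disabled)
        "owner_uid": st.st_uid,              # numeric owner id (0 on Windows)
        "permissions": oct(st.st_mode & 0o777),  # access control bits
        "link_count": st.st_nlink,           # number of names (hard links) for this file
    }


if __name__ == "__main__":
    # Create a small temporary file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"hello")
        name = handle.name
    print(describe_file(name))
    os.remove(name)
```

The `link_count` field reflects the point that one file can have several names: each additional hard link increases it by one.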

File systems

Filesystems define the internal structure and external conventions by which files are placed on and retrieved from storage media. Many different kinds of filesystems exist, optimized for the differences between storage media such as hard drives, flash memory, optical media and magnetic tape. Some filesystems allow files to grow and shrink over time without relocating data that is already stored (e.g. the disk filesystems NTFS, HFS and ext4), while others lack such operations because of restrictions imposed by the underlying media (e.g. LTFS for tape and ISO 9660 for optical disc). Some filesystems provide access to remote storage via network protocols (e.g. NFS, SMB, AFP). Only a few filesystems, for optical and tape media, are standardised and accessible across operating systems. Most filesystems are native to one operating system (e.g. NTFS for Windows, HFS for OS X, or ext4 for Linux). Sometimes drivers exist to make them available on other platforms.

Namespaces

Most filesystems organize file names into hierarchies of directories or folders. Each folder can contain files and other folders, often without a fixed limit on their number, forming a tree structure. Besides its file name, each file in such a hierarchy also has a file path, which lists the sequence of folder names from the root of the directory tree down to the directory where the file resides. The character used to separate directory names in a path is specific to the operating system, not to the storage media or filesystem used. Unix-like systems, for example, use a forward slash (/) while Windows systems use a backslash (\).
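
The difference in path separators can be shown with a short sketch based on Python's standard `pathlib` module, which models both conventions without touching a real filesystem. The function name `path_components` and the example paths are made up for this illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def path_components(path_text, windows=False):
    """Split a path into its components using Unix or Windows conventions."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path_text).parts


if __name__ == "__main__":
    # The same logical location written in both conventions.
    print(path_components("/home/alice/report.txt"))
    print(path_components(r"C:\Users\alice\report.txt", windows=True))
```

In both cases the last component is the file name and the preceding components describe the route from the root of the tree to the containing directory.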

File names must be unique within a folder, and almost every filesystem restricts the maximum length of a name or of a full path, although the limits differ. Some filesystems are case-sensitive, meaning they treat names that differ only in case as different files; others are case-insensitive and consider such names identical. The allowed character set may also differ between filesystems. All these differences often make it difficult to copy data between filesystems from different vendors. For greatest interoperability it is necessary to limit file names, and if possible also path names, to at most 255 characters and to use only uppercase ASCII letters (A-Z), digits (0-9) and a few special characters (_-.).
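
The interoperability rule above can be turned into a simple check. The sketch below is one possible reading of that rule, not a standard API: `is_portable_name` accepts only names of at most 255 characters made of A-Z, 0-9 and the characters `_`, `-` and `.`, and `case_collisions` (also a hypothetical helper) reports names that would clash on a case-insensitive filesystem.

```python
import re

MAX_NAME_LENGTH = 255
# Uppercase ASCII letters, digits, underscore, dot and hyphen only.
PORTABLE_NAME = re.compile(r"[A-Z0-9_.-]+")


def is_portable_name(name):
    """Return True if the name follows the conservative naming rule."""
    return len(name) <= MAX_NAME_LENGTH and PORTABLE_NAME.fullmatch(name) is not None


def case_collisions(names):
    """Group names that differ only in case and would clash when case is ignored."""
    groups = {}
    for name in names:
        groups.setdefault(name.casefold(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    for candidate in ["REPORT_2020.TXT", "report.txt", "MY FILE.TXT"]:
        print(candidate, is_portable_name(candidate))
    print(case_collisions(["Readme.txt", "README.TXT", "notes.txt"]))
```

Running such a check before copying a directory tree to another filesystem gives early warning of names that might be rejected or silently merged.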

Storage Space Management

A filesystem typically allocates and manages raw storage space in fixed-size blocks. The block size can be chosen when the filesystem is created, which lets a user optimize for a few large or many small files. A block is the smallest unit of allocation, so every file occupies a whole number of blocks and at least one. The filesystem keeps track of used and unused blocks. When files are deleted, their blocks are marked for reuse. When files are created, the filesystem looks for free space large enough for the request. Fragmentation occurs when a file is stored in non-contiguous blocks. This can happen when a file grows beyond its pre-allocated size or when the storage media fill up. Access to fragmented files can be considerably slower, degrading overall performance and user experience.
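
The arithmetic behind block allocation and a simple measure of fragmentation can be sketched as follows. This is a simplified model that follows the description above (every file occupies at least one block); real filesystems may store very small files differently. All function names are invented for this example.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks a file occupies (at least one in this model)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Round up to the next whole block using integer arithmetic.
    return max(1, (file_size + block_size - 1) // block_size)


def slack_bytes(file_size, block_size):
    """Allocated but unused bytes in the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


def count_extents(block_numbers):
    """Count runs of consecutive block numbers; more than one run means fragmentation."""
    extents = 0
    previous = None
    for block in block_numbers:
        if previous is None or block != previous + 1:
            extents += 1
        previous = block
    return extents


if __name__ == "__main__":
    print(blocks_needed(4097, 4096))   # one byte over a block needs two blocks
    print(slack_bytes(4097, 4096))     # most of the second block is unused
    print(count_extents([10, 11, 12, 40, 41, 7]))
```

The slack calculation shows why block size is a trade-off: large blocks waste space when there are many small files, while small blocks mean more blocks to track for each large file.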

Data Integrity

Modern filesystems support a feature called journalling, which uses transactions to keep track of changes to the filesystem's metadata and sometimes also to file content. Journalling helps prevent data corruption and data loss in the event of system crashes and power failures. Depending on the implementation, journalling adds little or no performance overhead in exchange for much greater reliability.
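
The transactional idea behind journalling can be illustrated with a toy model. The sketch below is not how any particular filesystem implements its journal; it assumes a journal that is simply a list of records, where a `("write", txid, key, value)` record describes an intended metadata change and a `("commit", txid)` record marks a transaction as complete. The record layout and the `replay` function are assumptions made for this example.

```python
def replay(journal):
    """Rebuild metadata from a journal, ignoring transactions that never committed."""
    # First pass: find every transaction that reached its commit record.
    committed = {record[1] for record in journal if record[0] == "commit"}
    table = {}
    # Second pass: apply writes in log order, but only for committed transactions.
    for record in journal:
        if record[0] == "write" and record[1] in committed:
            _, _, key, value = record
            table[key] = value
    return table


if __name__ == "__main__":
    journal = [
        ("write", 1, "a.txt", 100),   # transaction 1 sets the size of a.txt
        ("commit", 1),
        ("write", 2, "a.txt", 200),   # transaction 2 starts ...
        ("write", 2, "b.txt", 50),    # ... but the system crashes before it commits
    ]
    print(replay(journal))
```

After the simulated crash, recovery keeps the state of transaction 1 and discards the half-finished transaction 2, so the metadata is never left in a partially updated state.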

Only very few filesystems are designed with end-to-end data integrity in mind (ZFS is the noteworthy example). The importance of the issue is widely underestimated, even though storage systems face a multitude of error scenarios: silent data corruption or degradation while data rests on disk, current spikes, firmware and driver bugs, or DMA parity errors when copying data between disk and RAM. Unless a filesystem provides special protection against such errors, data may silently become corrupted.
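
Where the filesystem offers no such protection, an application can detect (though not repair) silent corruption by keeping a checksum next to the data and verifying it on every read. The following minimal sketch uses SHA-256 from Python's standard `hashlib` module. The record layout and the function names `store` and `verify` are assumptions for this example and do not describe the internals of ZFS or any other filesystem.

```python
import hashlib


def store(data):
    """Package data together with a checksum computed at write time."""
    return {"data": data, "sha256": hashlib.sha256(data).hexdigest()}


def verify(record):
    """Return True if the data still matches the checksum recorded at write time."""
    return hashlib.sha256(record["data"]).hexdigest() == record["sha256"]


if __name__ == "__main__":
    record = store(b"important archive content")
    print(verify(record))                      # data is intact

    # Simulate silent corruption: flip one bit in the first byte.
    damaged = bytearray(record["data"])
    damaged[0] ^= 0x01
    record["data"] = bytes(damaged)
    print(verify(record))                      # mismatch is detected
```

A checksum only reveals that something changed. Recovering the original content additionally requires a redundant copy or error-correcting information.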

Filesystems Overview