File Transfer Software

Files can be transferred between computers in many different ways and over many different protocols.

Some protocols perform best in local networks; others are specifically designed for transfer across wide area networks. Today, it is important to use only protocols that employ encryption and authenticate the remote server, in order to protect the confidentiality and integrity of data.

Direct File Transfer

The simplest and most robust form of file transfer is a dedicated file transfer protocol that performs an explicit file transfer. Direct file transfer works in client/server mode, requiring a client program that pushes or pulls data and a server program that serves client connections. Historically, the File Transfer Protocol (FTP) was used across industries. For secure transmission, FTP is today combined with TLS (FTPS). FTPS is not to be confused with the SSH File Transfer Protocol (SFTP), which is a file transfer extension to the Secure Shell (SSH) and is also widely used by system operators. Some storage services support web interfaces that use HTTPS as the transport protocol, for example the Amazon Simple Storage Service (S3), which has become an industry standard in cloud storage.
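
As an illustration of a client pulling a file over FTPS, the following is a minimal sketch using Python's standard ftplib module. The function names and all parameters (host, user, password, file names) are hypothetical placeholders, and the sketch assumes the server supports explicit TLS.

```python
import ssl
from ftplib import FTP_TLS


def make_tls_context() -> ssl.SSLContext:
    """Return a TLS context that verifies the server certificate and host name."""
    return ssl.create_default_context()


def download_over_ftps(host: str, user: str, password: str,
                       remote_name: str, local_path: str) -> None:
    """Pull one file from an FTPS server (all arguments are placeholders)."""
    with FTP_TLS(host, context=make_tls_context()) as ftps:
        ftps.login(user, password)  # secures the control connection before sending credentials
        ftps.prot_p()               # also encrypt the data connection
        with open(local_path, "wb") as out:
            ftps.retrbinary(f"RETR {remote_name}", out.write)
```

The default context rejects servers whose certificate cannot be validated, which covers the server authentication requirement; without the call to prot_p() only the control connection would be encrypted.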

Internet file transfer typically operates over TCP, which is known to perform suboptimally over congested links and long-distance connections. Companies like IBM/Aspera offer proprietary protocols and software that use patented ways of dealing with packet loss and network congestion to achieve extremely fast transfer speeds over UDP.

Cloud Synchronization

Because direct file transfer protocols require a server that is reachable on a known public IP address and a well-known port, they cannot easily be used behind firewalls or home routers, or between mobile computers. To solve this problem, a number of now popular file synchronization services emerged (e.g. Dropbox, Box, OneCloud). They combine always-available servers, called cloud storage, with a client program that watches both the shared online storage and a local directory and copies all new or updated files to keep all storage locations in sync.
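
The copy step of such a sync client can be sketched as follows. This is a simplified, one-way illustration that assumes both storage locations are reachable as local directories and that a newer modification time means an updated file. The function name sync_one_way is hypothetical; real services use their own proprietary protocols and also handle deletions and conflicts.

```python
import shutil
from pathlib import Path


def sync_one_way(source: Path, target: Path) -> list[str]:
    """Copy files that are new or newer in source; return their relative paths."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        # A file is copied if it is missing in the target or older there.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel.as_posix())
    return copied
```

A client would run this step repeatedly, or whenever the operating system reports a change, and once in each direction to keep both locations in sync.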

Cloud sync uses proprietary protocols and software. Many protocols lack essential configuration options to selectively copy or delete files, and most vendors do not even encrypt files before upload. For business workflows, cloud storage should only be used for already encrypted data unless the shared information is already publicly available.

Networked File Systems

Networked file systems are special network protocols that give multiple users shared remote access to large storage volumes. The actual file transfer is transparent to the user because the remote file system appears much like local storage.

Networked file systems have clear benefits over direct file transfer because only the data currently required by an application is moved to a client, on the fly. Hence clients can process files larger than the available local storage. Processing can also start instantly without waiting for a download to complete. However, access to remote files may be considerably slower when the number of users or the volume of traffic exceeds the capacity of the network links or the storage server's I/O.

Concurrent access becomes an issue when multiple users want to update the same file. Typical applications, such as video editing software, are not prepared for this case. Networked file systems therefore provide a feature called file locking to prevent undesired interference. Locking enables a single user to obtain exclusive write permission for a file while all other concurrent users are at most permitted to read.
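
The idea of an exclusive lock can be sketched with the advisory locks that Unix-like systems expose through Python's fcntl module. The helper names are hypothetical, and the example runs against a local file. Advisory locks only protect against programs that also request the lock, and whether such a lock is honored across a networked file system depends on the protocol and on how the volume is mounted.

```python
import fcntl


def try_exclusive_lock(file_obj) -> bool:
    """Try to take an exclusive lock without waiting; return True on success."""
    try:
        fcntl.flock(file_obj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Another open handle already holds the lock.
        return False


def unlock(file_obj) -> None:
    """Release a lock previously taken on this file object."""
    fcntl.flock(file_obj, fcntl.LOCK_UN)
```

An application would call try_exclusive_lock() before writing and fall back to read-only access when it returns False.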

Examples of widely used networked file systems are NFS (Network File System, originally developed by Sun Microsystems), SMB/CIFS (Server Message Block/Common Internet File System, used with Microsoft Windows) and AFP (Apple Filing Protocol, used with Apple computers).

Inherent network latency issues make networked file systems practical only in local networks, even though most protocols can also operate across the Internet. For security reasons, however, they should not be publicly exposed.

Peer-to-Peer File Transfer

BitTorrent is a peer-to-peer file transfer protocol for large files that works without central servers or a central authority. BitTorrent can be used to reduce the server and network load of a single operator when distributing large and popular files because the load is spread across all computers participating in a download. For example, Amazon S3 has offered BitTorrent as a more efficient alternative to HTTP.

BitTorrent clients identify files based on a cryptographic hash (SHA-1) of their content. The SHA-1 hash serves as a file integrity checksum, so peers are able to verify received data immediately and don't have to trust other peers. Downloads happen in segments, and different segments can be downloaded from multiple source peers in parallel. Downloads can be paused and resumed at any time. The actual file data is stored with all peers that have already received segments. The peers participating in transmitting a file are called a swarm. A swarm regularly updates its global state so all peers can learn which sources offer which file segments.
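
The segment verification described above can be sketched as follows. This is a simplified illustration rather than the BitTorrent wire format: it assumes the list of expected segment hashes is already known to the downloading peer, and the function names are hypothetical.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size segments and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check a received segment against the known digest for its position."""
    return hashlib.sha1(piece).digest() == expected[index]
```

Because each segment is checked on its own, a peer can accept segments from different sources in any order and simply discard and re-request any segment whose hash does not match.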