Understanding cloud storage models
Who would have thought that storing bits could get so incredibly complicated? Storage has always contained a plethora of protocols, from Fibre Channel to iSCSI to SMB in all its variations, but the arrival of flash and the continual growth of virtualization have turned an already dense topic into a tangled jungle of acronyms, protocols, and abstractions.
The virtualization of the data center has prompted a virtualization wave in storage as well, gradually pulling storage away from physical protocols and toward logical, abstracted storage models like instance storage and volume storage. By providing abstractions, the data center has steadily decoupled virtual machines from storage protocols.
The story of cloud storage is in many ways a story of virtualization. I’ll start with physical environments, move to virtualization, where virtual and physical models begin to diverge, and finish with the cloud, where the physical is almost completely abstracted by virtual models.
At the root of all storage is some set of physical storage protocols, so I’ll begin with a quick recap of physical storage. Three major classes of physical storage models are in use today: direct attached storage (DAS), the storage area network (SAN), and network attached storage (NAS).
DAS. Direct attached storage is the simplest storage model. We are all familiar with DAS; this is the model used by most laptops, phones, and desktop computers. The fundamental unit in DAS is the computer itself; the storage for a server is not separable from the server itself. In the case of a phone it is physically impossible to remove the storage from the compute, but even in the case of servers, where it is theoretically possible to pull disk drives, once a drive is separated from the server, it is generally wiped before reuse. SCSI and SATA are examples of DAS protocols.
SAN. Eventually the storage industry recognized the utility of separating storage from the compute. Rather than attaching disks to each individual computer, we placed all the disks on a single cluster of servers and accessed the disk over the network. This simplifies storage management tasks such as backup and failure repair. This division of storage and compute is often called shared storage, since multiple computers will use a single pool of storage.
NAS. While SANs allow us to move LUNs between one computer and another, the block protocols they use were not designed to concurrently share data in the same LUN between computers. To allow this kind of sharing we need a new kind of storage built for concurrent access. In this new kind of storage we communicate with the storage using file system protocols, which closely resemble the file systems run on local computers. This kind of storage is known as network attached storage. NFS and SMB are examples of NAS protocols.
Virtualization changed the landscape of the modern data center for storage as it did for compute. Just as physical machines were abstracted into virtual machines, physical storage was abstracted into virtual disks.
In virtualization, the hypervisor provides an emulated hardware environment for each virtual machine, including computer, memory, and storage. VMware, the initial modern hypervisor, chose to emulate local physical disk drives as a way to provide storage for each VM. Put another way, VMware chose the local disk drive (DAS) model as the way to expose storage to virtual machines.
From storage protocols to storage models
That VMware chose to implement virtual disks, a DAS-style block storage model, on top of NAS or SAN, illustrates one of the interesting characteristics of modern data center storage. Because the IO from a virtual machine is handed off to software in the hypervisor, rather than to hardware on a device bus, the protocol used by the VM to communicate with the hypervisor does not need to match the protocol the hypervisor uses to communicate with the storage itself.
This leads to a separation between the storage model that is exposed upward to the VM and administrator, and the storage protocol that is used by the hypervisor to actually store the data. In the case of virtual disks, VMware designed them according to a DAS storage model, then used a NAS storage protocol to implement them.
The landscape of the data center is shifting again as virtualized environments morph into cloud environments. Cloud environments embrace the virtual disk model pioneered in virtualization, and they provide additional models to enable a fully virtualized storage stack. Cloud environments attempt to virtualize the entire storage stack so that they can provide self-service and a clean separation between infrastructure and application.
Cloud environments come in many forms. They can be implemented by enterprises as private clouds using environments like OpenStack, CloudStack, and the VMware vRealize suite. They can also be implemented by service providers as public clouds such as Amazon Web Services, Microsoft Azure, and Rackspace.