Hello Readers! Last month I wrote “Docker???!! Images???!!!”, a blog post that gave a quick introduction to Docker and its key benefits. This post focuses on what Virtual Machines and Containers are, and how the two compare.
Software development and deployment models have changed drastically over the last few decades. While software development has transitioned from the classical waterfall to agile & scrum models, deployment techniques have moved to more lightweight and lean approaches as well.
The last couple of years in particular have been a definitive and transformative phase for the DevOps (Development-Operations) industry, as it has seen a radical shift from the traditional Virtual Machine (VM) model to a modern container-based approach.
Let’s understand what Virtual Machines are:
As the name suggests, a virtual machine is a logical/software abstraction of a machine that end users can treat as a physical machine. Virtualization can be done at the platform level, emulating a complete operating system, or at the process level, restricting it to a single process.
In both cases, a virtualization layer acts as an intermediary to the host machine for resource requests such as CPU, memory and disk. A hypervisor, also called a virtual machine monitor (VMM), is the tool that typically provides this virtualization layer and is used to create, manage and maintain virtual machines running on a host. VirtualBox, Virtual PC and Parallels are some of the most commonly used hypervisors.
The ascent of virtualization over the years is unsurprising, as it solves a major business challenge: you can run multiple operating systems on a single machine, thereby reducing both capital (CAPEX) and operational (OPEX) expenditures. Although virtualization started off as a software-only concept, leading chip makers such as Intel and AMD adopted hardware-assisted virtualization (aka accelerated virtualization), providing an end-to-end virtualization environment with significantly improved compute capacity. Virtual machines also help streamline the operational aspects of deployment and maintenance, including high availability and disaster recovery.
While the advancements in virtualization technology have continued over the years, it’s still a resource-heavy proposition as each VM operates like a self-contained system with its own resource needs.
In that sense, what is needed is a more lightweight approach in which logical entities can share the operating system and its associated resources. ‘Containers’, which are logical entities running on the same operating system as the host, offer exactly this more efficient alternative. Containers can be viewed as transparent clients that use resources as if they were applications running directly on the host system.
While the concept of containers has been around for quite some time, the containerization approach only saw serious adoption once ‘Docker’ was introduced.
Why does Docker use container technology?
The essence of Docker-based containers is that they eliminate the heavy lifting involved in setting up virtual machines and making sure your applications run seamlessly across multiple such instances. Application developers are thus abstracted from platform-specific details, and administrators can leverage standard environments (called Docker images) without worrying about differences in operating system versions, CPU and memory configuration, and so on.
Since Docker containers do not run a guest operating system, they are usually much quicker to start than virtual machines. Docker is similarly effective when it comes to snapshotting your application into an image and deploying it across multiple environments: development, test and production.
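To make the image idea concrete, here is a minimal, hypothetical sketch: the application's environment is described once in a Dockerfile, built into an image, and that same image is then run unchanged in development, test and production. (The app/ directory and the myapp image name are assumptions for illustration.)

```dockerfile
# Build a self-contained image for a static site served by Nginx.
FROM nginx
# Bake the application's files into the image.
COPY app/ /usr/share/nginx/html/
```

Built once with `docker build -t myapp:1.0 .`, the same image can then be started on any host with a Docker daemon via `docker run -d -p 8080:80 myapp:1.0`.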
Under the Hood
Docker harnesses some powerful kernel-level technology and puts it at our fingertips. The concept of a container in virtualization has been around for several years, but by providing a simple tool set and a unified API for managing some kernel-level technologies, such as LXCs (Linux Containers), cgroups and a copy-on-write filesystem, Docker has created a tool that is greater than the sum of its parts. The result is a potential game-changer for DevOps, system administrators and developers.
Docker provides tools that make creating and working with containers as easy as possible. Containers sandbox processes from each other. For now, you can think of a container as a lightweight equivalent of a virtual machine.
Linux Containers and LXC, a user-space control package for Linux Containers, constitute the core of Docker. LXC uses kernel-level namespaces to isolate the container from the host. The user namespace separates the container’s and the host’s user database, thus ensuring that the container’s root user does not have root privileges on the host. The process namespace is responsible for displaying and managing only processes running in the container, not the host. And, the network namespace provides the container with its own network device and virtual IP address.
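These namespaces can be observed on any Linux host, with or without Docker, since every process exposes its namespace memberships under /proc. A minimal sketch (Linux only):

```shell
# Print the namespace IDs the current shell belongs to.
# Processes in the same namespace show the same ID; a process inside
# a container would show different pid/net/mnt IDs than the host.
for ns in pid net user mnt; do
    printf '%s -> %s\n' "$ns" "$(readlink "/proc/$$/ns/$ns")"
done
```

Comparing these IDs for a host shell and a containerized shell is a quick way to confirm that the two really are in separate namespaces.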
Another kernel facility Docker uses via LXC is Control Groups (cgroups). While namespaces are responsible for isolation between host and container, control groups implement resource accounting and limiting. Besides allowing Docker to limit the resources consumed by a container, such as memory, disk space and I/O, cgroups also output lots of metrics about these resources. These metrics allow Docker to monitor the resource consumption of the various processes within the containers and make sure that each gets only its fair share of the available resources.
In addition to the above components, Docker has been using AuFS (Advanced Multi-Layered Unification Filesystem) as the filesystem for containers. AuFS is a layered filesystem that can transparently overlay one or more existing filesystems. When a process needs to modify a file, AuFS creates a copy of that file; this mechanism is called copy-on-write. AuFS is also capable of merging multiple layers into a single representation of a filesystem.
The really cool thing is that AuFS allows Docker to use certain images as the basis for containers. For example, you might have a CentOS Linux image that can be used as the basis for many different containers. Thanks to AuFS, only one copy of the CentOS image is required, which results in savings of storage and memory, as well as faster deployments of containers. An added benefit of using AuFS is Docker’s ability to version container images. Each new version is simply a diff of changes from the previous version, effectively keeping image files to a minimum. But, it also means that you always have a complete audit trail of what has changed from one version of a container to another.
Traditionally, Docker has depended on AuFS to provide a copy-on-write storage mechanism. However, the recent addition of a storage driver API is likely to lessen that dependence. Initially, three storage drivers are available: AuFS, VFS and Device Mapper, the last of which is the result of a collaboration with Red Hat.
As of version 0.7, Docker works with all Linux distributions. However, it does not work with most non-Linux operating systems, such as Windows and OS X. The recommended way of using Docker on those OSes is to provision a virtual machine on VirtualBox using Vagrant.
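For example, one might provision such a VirtualBox VM with a Vagrantfile along these lines (a minimal sketch: the box name is an assumption, and Vagrant's built-in Docker provisioner is used):

```ruby
# Vagrantfile: bring up a Linux VM on VirtualBox with Docker installed.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"   # any Docker-capable Linux box works
  config.vm.provision "docker"        # installs Docker inside the VM
end
```

After `vagrant up`, `vagrant ssh` drops you into a Linux shell where Docker containers can be run as usual.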
Containers vs. Virtual Machines
Docker is certainly the most influential open source project of the moment. Why is Docker so successful? Is it going to replace Virtual Machines? Will there be a big switch? If so, when?
Let’s look at the past to understand the present and predict the future. Before virtual machines, system administrators used to provision physical boxes to their users. The process was cumbersome, not completely automated, and it took hours if not days. When something went wrong, they had to run to the server room to replace the physical box.
With the advent of virtual machines, DevOps teams could install a hypervisor on all their boxes and then simply provision new virtual machines upon request from their users. Provisioning a VM took minutes instead of hours and could be automated. The underlying hardware made less of a difference and was mostly commoditized. If a user needed more resources, the admin would just create a new VM. If a physical machine broke, the admin just migrated or resumed her VMs onto a different host.
Finer-grained deployment models became viable and convenient. Users were not forced to run all their applications on the same box anymore, to exploit the underlying hardware capabilities to the fullest. One could run a VM with the database, another with middleware and a third with the webserver without worrying about hardware utilization. The people buying the hardware and the people architecting the software stack could work independently in the same company, without interference. The new interface between the two teams had become the virtual machine. Solution architects could cheaply deploy each application on a different VM, reducing their maintenance costs significantly. Software engineers loved it. This might have been the biggest innovation introduced by hypervisors.
A few years passed and everybody in the business got accustomed to working with virtual machines. Start-ups don’t even buy server hardware anymore; they just shop on Amazon AWS. One virtual machine per application is the standard way to deploy software stacks.
Application deployment, however, hadn’t changed much since the ’90s. It still involved installing a Linux distro, mostly built for physical hardware, installing the required deb or rpm packages, and finally installing and configuring the application that one actually wanted to run.
In 2013, Docker came out with a simple yet effective tool to create, distribute and deploy applications wrapped in a nice format to run in independent Linux containers. It comes with a registry that acts like an app store for these applications, which I will call “cloud apps” for clarity. Deploying the Nginx webserver is now just one “docker pull nginx” away, which is much quicker and simpler than installing the latest Ubuntu LTS. Docker cloud apps come preconfigured and without any of the unnecessary packages that are unavoidably installed by Linux distros. In fact, the Nginx Docker cloud app is produced and distributed directly by the Nginx community, rather than by Canonical or Red Hat.
Docker’s outstanding innovation is the introduction of a standard format for cloud applications, together with the registry. Instead of VMs, cloud apps run in Linux containers. Containers had been available for years, but they weren’t particularly popular outside Google and a few other circles. Although they offer very good performance, they have fewer features and weaker isolation than virtual machines. As a rising star, Docker made Linux containers suddenly popular, but containers were not the reason behind Docker’s success; they were incidental.
Containers vs. Other Types of Virtualization
So what exactly is a container, and how is it different from hypervisor-based virtualization? To put it simply, containers virtualize at the operating system level, whereas hypervisor-based solutions virtualize at the hardware level. While the effect is similar, the differences are important and significant, which is why I’ll spend a little time exploring them and the resulting trade-offs.
Both containers and VMs are virtualization tools. On the VM side, a hypervisor makes slices of hardware available. There are generally two types of hypervisors:
- Type 1: runs directly on the bare metal of the hardware
- Type 2: runs as an additional layer of software within a guest OS.
While the open-source Xen and VMware’s ESX are examples of Type 1 hypervisors, examples of Type 2 include Oracle’s open-source VirtualBox and VMware Server. Although Type 1 is a better candidate for comparison to Docker containers, I don’t make a distinction between the two types for the rest of this article.
Containers, in contrast, make available protected portions of the operating system—they effectively virtualize the operating system. Two containers running on the same operating system don’t know that they are sharing resources because each has its own abstracted networking layer, processes and so on.
Operating Systems and Resources:
Since hypervisor-based virtualization provides access to hardware only, you still need to install an operating system. As a result, there are multiple full-fledged operating systems running, one in each VM, which quickly gobbles up resources on the server, such as RAM, CPU and bandwidth. Containers piggyback on an already running operating system as their host environment. They merely execute in spaces that are isolated from each other and from certain parts of the host OS. This has two significant benefits. First, resource utilization is much more efficient. If a container is not executing anything, it is not using up resources, and containers can call upon their host OS to satisfy some or all of their dependencies. Second, containers are cheap and therefore fast to create and destroy. There is no need to boot and shut down a whole OS. Instead, a container merely has to terminate the processes running in its isolated space. Consequently, starting and stopping a container is more akin to starting and quitting an application, and is just as fast.
Both types of virtualization, as well as containers, are illustrated in the accompanying figure.
Isolation for Performance and Security:
Processes executing in a Docker container are isolated from processes running on the host OS or in other Docker containers. Nevertheless, all processes are executing in the same kernel. Docker leverages LXC to provide separate namespaces for containers, a technology that has been present in Linux kernels for 5+ years and is considered fairly mature. It also uses Control Groups, which have been in the Linux kernel even longer, to implement resource auditing and limiting.
The Docker daemon itself also poses a potential attack vector, because it currently runs with root privileges. Improvements to both LXC and Docker should allow containers to run without root privileges and allow the Docker daemon to be executed under a different system user.
Although the type of isolation provided is overall quite strong, it is arguably not as strong as what can be enforced by virtual machines at the hypervisor level. If the kernel goes down, so do all the containers. The other area where VMs have the advantage is their maturity and widespread adoption in production environments. VMs have been hardened and proven themselves in many different high-availability environments. In comparison, Docker and its supporting technologies have not seen nearly as much action. Docker in particular is undergoing massive changes every day, and we all know that change is the enemy of security.
Thank you for reading my blog …!
Keep watching this space for more posts on Docker.