Unikernels - The Rise of the Library Hypervisor in MirageOS by Anil Madhavapeddy, Mindy Preston and Martin Lucina (Docker)

Written by GemmaG (Gemma Gordon)
Published: 2016-10-06 (last updated: 2016-10-08)

Slides: Slideshare

Making hypervisors interesting again!

Conventional hypervisors

  • run full guest OSes with complex emulation needs
  • need scaffolding for device emulation, instruction emulation, etc.
  • hard to compose into existing infrastructure without wrapping a full hypervisor layer

Lots of legacy code, and the hypervisor sits at the bottom of the stack - inflexible and heavyweight.

How can distributed systems use hardware protection more flexibly and composably?

We want a way to use the primitives that the OS provides in more flexible and lightweight ways.

Unikernels
Library operating systems: break the kernel into libraries, then link them with a config file and a boot layer to get portable microservices that boot directly on hypervisors or on Unix.

The library philosophy is the main and most important feature: link the libraries with your application and they provide functionality but don't impose policy.
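As a concrete illustration, here is what a minimal MirageOS configuration file looks like (the Mirage 2.x-era "hello world"; the module name Unikernel.Main follows the mirage-skeleton convention):

    (* config.ml - a minimal MirageOS configuration. The build links
       only the libraries this file names; the same unikernel can then
       be configured for Unix or for a hypervisor target. *)
    open Mirage

    (* the application entry point, parameterised over a console device *)
    let main = foreign "Unikernel.Main" (console @-> job)

    let () = register "hello" [ main $ default_console ]

Running mirage configure --unix or --xen (and, in MirageOS 3.0, -t ukvm) then picks the boot layer without touching the application code.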

How do you democratise unikernels?

  • the benefits are lost when deploying on existing clouds - boot times are longer on AWS, for example
  • it is difficult to manage hypervisor resources from inside a container without host privilege

Library hypervisors

Follow the philosophy of unikernels by extending the "kit" model and breaking the hypervisor down into libraries:

  • core functionality (CPU and memory)
  • benefit: reduction in the TCB (trusted computing base) and a better fit to container-native infrastructure with privilege dropping
  • drawback: no existing support in operating systems

BUT actually there is a lot of open source support if you look closely

What's changed?

  • OSX Hypervisor.framework: shipped but not widely used
  • bhyve: built into FreeBSD

bhyve + xhyve + the OSX Hypervisor framework = HyperKit -> Docker for Mac

Now Linux has /dev/kvm -> ukvm -> MirageOS 3.0

It is now possible for the average developer to build hypervisor applications easily.

Docker for Mac

A native OSX experience that works with existing workflows

  • uses HyperKit, based on xhyve and bhyve
    • sandbox friendly: runs as non-root with the privileges of the local user. Implementations that usually go into the kernel are put into the hypervisor framework instead (OSX kernel -> userspace -> process hardware -> VPNKit)
  • embeds Linux: Alpine Linux is lightweight and optimised for fast boot

HyperKit library structure

DEMO: independent, optional files that each represent an I/O protocol - 240KB in size, and a full hypervisor builds in seconds.

In HyperKit most functionality is linked in as a library: if the app doesn't need a protocol, it isn't linked.

Networking

Conventionally, container network traffic is virtualised as ethernet frames - how do you get these out of the host? Kernel modules act as a bridge, which gets complicated...

How can we work on corporate installations without kernel modules? Unikernels!

Instead of bridging them, we can write functions using VPNKit - the opposite of deep packet inspection. It takes the TCP/IP stack and inverts it: give it the ethernet frames, and it reconstructs the TCP flow state and translates it into socket calls. We now have a hypervisor that uses normal OSX kernel sockets to push traffic to the outside world without bridges - all the virtualisation is hidden.

All network traffic is generated from normal socket calls, so it interacts well with normal workflows.
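In code terms the idea looks roughly like this - a hedged OCaml sketch of the pattern, not VPNKit's actual API (handle_flow, guest_read and guest_write are all assumed names):

    (* Sketch of the VPNKit idea: once the userspace TCP/IP stack has
       reconstructed a flow from the guest's ethernet frames, open an
       ordinary host socket to the real destination and proxy the
       payload bytes in both directions. *)
    open Lwt.Infix

    (* write the whole buffer, handling partial writes *)
    let rec write_all fd buf off len =
      if len = 0 then Lwt.return_unit
      else
        Lwt_unix.write fd buf off len >>= fun n ->
        write_all fd buf (off + n) (len - n)

    (* [guest_read]/[guest_write] move payload bytes to and from the
       guest-side stack; both are assumptions for this sketch *)
    let handle_flow ~dst_ip ~dst_port ~guest_read ~guest_write =
      let addr = Unix.ADDR_INET (Unix.inet_addr_of_string dst_ip, dst_port) in
      let sock = Lwt_unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
      Lwt_unix.connect sock addr >>= fun () ->
      let to_host () =
        let buf = Bytes.create 4096 in
        let rec loop () =
          guest_read buf >>= function
          | 0 -> Lwt.return_unit                  (* guest closed the flow *)
          | n -> write_all sock buf 0 n >>= loop
        in
        loop ()
      in
      let to_guest () =
        let buf = Bytes.create 4096 in
        let rec loop () =
          Lwt_unix.read sock buf 0 (Bytes.length buf) >>= function
          | 0 -> Lwt.return_unit                  (* host side closed *)
          | n -> guest_write buf n >>= loop
        in
        loop ()
      in
      Lwt.join [ to_host (); to_guest () ] >>= fun () ->
      Lwt_unix.close sock

The point is that nothing here needs root or a kernel module: it is plain sockets driven by a userspace TCP/IP stack.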

Summary of Docker for Mac:

  • HyperKit is used to virtualise for domain-specific purposes
  • links MirageOS unikernel libraries

MirageOS + Solo5

Builds are handled by Docker, but docker run shouldn't need privileges.

MirageOS 3.0 has a new library hypervisor for Linux developed by IBM, Docker and Cambridge University.

Solo5

  • runs as a Unix process and opens /dev/kvm for hardware isolation
  • ukvm is a small, modular monitor - it can be 10k in size
  • runs with privilege separation: one process opens /dev/kvm, drops privileges, then executes the unikernel (see the sketch after this list)
  • boot times are the same as process fork times, since all device setup is handled in-process
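A minimal sketch of that privilege-separation pattern, in OCaml for illustration - ukvm itself is written in C, and the uid/gid values below are placeholders:

    (* Acquire the one privileged resource, /dev/kvm, then drop to an
       unprivileged uid/gid before running any guest code. *)
    let () =
      let kvm = Unix.openfile "/dev/kvm" [ Unix.O_RDWR; Unix.O_CLOEXEC ] 0 in
      (* drop privileges: group first, then user, so the setuid call
         cannot leave us holding a privileged group *)
      Unix.setgid 65534;  (* e.g. nogroup - placeholder value *)
      Unix.setuid 65534;  (* e.g. nobody  - placeholder value *)
      (* from here on the process is unprivileged, but the open kvm
         file descriptor is all that is needed to set up and run the
         guest *)
      ignore (kvm : Unix.file_descr)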

QEMU: 25,000 lines; ukvm: 1,162 lines.

boot times are close to 1 second for ukvm

  • due for stable release within the next month
  • intended to be a unikernel template for other projects to share hypervisor code
  • liberally licensed under BSD/Apache2/ISC to encourage adoption and embedding

BOF tomorrow

DEMO

Martin Lucina worked on the bulk of the Solo5 integration.

MirageOS 3.0 targets - highlighting the lightweight nature of ukvm compared with KVM/QEMU.

MirageOS 3.0 can also be targeted to run on a virtio hypervisor.

Size difference of the hypervisor: QEMU = 8MB binary; ukvm = 67KB binary.

Same unikernel, same code.
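For reference, the unikernel main module really is target-independent; a minimal Mirage 2.x-era version looks like this (the log message is made up):

    (* unikernel.ml - the same module runs unchanged on Unix, Xen,
       virtio or ukvm; only the mirage configure target changes *)
    module Main (C : V1_LWT.CONSOLE) = struct
      let start c = C.log_s c "hello from the same code on every target"
    end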

It can run in a Docker container, with the hypervisor part itself running inside the container.

Image: 3 layers - the current debugging dev build, the unikernel runner (the ukvm binary), and the unikernel itself.


REVISITED: How can distributed systems use hardware protection more flexibly and composably?

Unix binaries are being compiled and run without any drama - the question now is how we can plumb through Docker components, e.g. distributed storage, networking, etc.

Hardware protection gives us new, advanced functionality - the challenge is to take a nice dev experience and utilise the functionality that is usually locked away inside the hypervisor.

We want to hear about your use cases and hack on some code!

One idea might be to test SwarmKit: run a massive process that simulates the ethernet and injects failures.