# Secure virtualised workloads

* **workload**: any job/payload that needs to be executed on infrastructure
  * straight on OS
  * in a VM
  * in a container

## Virtual machines

![VM architecture](./img/ch09/vm_diagram.png)

* physical hardware
  * CPU, memory, chipset, I/O...
  * resources often underutilized
  * no isolation
* hardware-level abstraction
  * virtual hardware
  * encapsulates all OS and application state
* virtualization software
  * hypervisor/VMM
  * extra level of indirection to decouple hardware and OS
  * strong isolation between VMs
  * improves utilization
* secure multiplexing
  * isolation on hardware level
  * failure of one VM does not affect others
* entire VM is a file
  * easy to snapshot, clone, move, distribute
  * create once, run anywhere (well we try)
* types
  * **type 1**: hypervisor runs on bare metal (no host OS) (VMware, Microsoft
    Hyper-V, KVM...)
  * **type 2**: hypervisor runs on host OS (VirtualBox, VMware Workstation...)
    * relies on host OS to manage calls to hardware
    * adds latency
    * security risks of host OS exploitable
    * aimed towards developers

![Type 1 virtualisation](./img/ch09/type_1_hypervisor.png){width=50%} \ ![Type 2 virtualisation](./img/ch09/type_2_hypervisor.png){width=50%}

## Containers

* virtualization on OS level
  * much more lightweight -> more dense utilization
  * share same host OS / kernel
* advantages
  * much faster startup
  * easier to manage
  * more containers per host than VMs
* no hardware isolation, so security issues
* the future
  * blur the line between containers and VMs
  * **Kata Containers**: lightweight VM per container (better security)
  * **Microsoft Hyper-V**: sometimes wraps containers in lightweight VM
* Linux Security Modules (LSM)
  * hostile processes can break out of container (badly configured
    namespaces, kernel exploits...)
  * LSM defines mandatory access control
    * lists allowed capabilities (syscalls) per process
    * defined by sysadmin
    * prevents niche syscalls from being exploited
* types
  * **OS-level containerization**: spawn containers straight on host OS + kernel
    * isolation using kernel functionality (namespaces, cgroups...)
    * no need for full guest OS
    * no hardware extensions
    * attackers could escape container and compromise host
    * Docker
  * **micro-VM**: containers in lightweight VMs on host
    * utilizes hardware-enforced isolation
    * containers do not share kernel
    * safer
    * slower startup, worse performance
  * **unikernel**: application compiled together with tailored kernel
    * monitor which syscalls the application uses
    * once known, construct microkernel and fixed-purpose image
    * no user space, only kernel space
    * much smaller attack surface (kernel only contains what's necessary)
    * runs straight on hypervisor or bare metal
    * small footprint, quick to start
  * **sandboxing**: container in sandbox running copy of host kernel
    * syscalls translated to host kernel
    * good isolation
    * slow
    * not all syscalls supported (yet)

![Container layout](./img/ch09/container.png){width=50%} \ ![Micro-VM layout](./img/ch09/micro_vm.png){width=50%}

![Unikernel layout](./img/ch09/unikernel.png){width=50%} \ ![Sandbox layout](./img/ch09/sandbox.png){width=50%}

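
The allow-list idea that shows up twice above (restricting a process to the syscalls it is permitted to use, and profiling an application's syscalls before building a unikernel) can be sketched as a toy model. This is not the API of any real LSM or unikernel toolchain; `build_profile`, `is_allowed` and the sample trace are hypothetical names and data chosen for illustration only.

```python
from typing import Iterable


def build_profile(observed_syscalls: Iterable[str]) -> frozenset[str]:
    """Collect the distinct syscalls seen while monitoring an application."""
    return frozenset(observed_syscalls)


def is_allowed(profile: frozenset[str], syscall: str) -> bool:
    """Default-deny check: only syscalls present in the profile are permitted."""
    return syscall in profile


# Hypothetical trace of a small server process (illustrative data only).
trace = ["read", "write", "accept", "read", "close", "write"]
profile = build_profile(trace)

print(sorted(profile))
print(is_allowed(profile, "read"))    # seen during profiling, so permitted
print(is_allowed(profile, "ptrace"))  # never observed, so denied
```

The design choice worth noting is default-deny: anything not explicitly observed or listed is rejected, which is what keeps rarely used syscalls out of an attacker's reach.
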
## Linux kernel isolation support

* <https://linuxcontainers.org/>
* built into Linux kernel
* LXC (Linux Containers)
  * OS-level virtualization for running containers on Linux host
  * low-level, difficult to use
* LXD (Linux Container Hypervisor)
  * built on top of LXC
  * Canonical development
  * focus on containerising entire operating systems, not individual applications

### Cgroups

* control groups
* Linux feature to separate processes into groups
  * resource limiting e.g. CPU shares
  * prioritization e.g. CPU pinning
  * device access

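
As a rough illustration of resource limiting, the sketch below only computes the values one would write into a group's control files; it does not touch the file system. It assumes the cgroup v2 layout under `/sys/fs/cgroup`, where `cpu.max` takes `<quota> <period>` in microseconds and `memory.max` takes a byte count. The group name `demo` and the helper names are hypothetical.

```python
def cpu_max(cpu_fraction: float, period_us: int = 100_000) -> str:
    """Format a cgroup v2 cpu.max value: "<quota> <period>" in microseconds."""
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    quota_us = int(cpu_fraction * period_us)
    return f"{quota_us} {period_us}"


def memory_max(mebibytes: int) -> str:
    """Format a cgroup v2 memory.max value (bytes)."""
    return str(mebibytes * 1024 * 1024)


def limits_for(group: str, cpu_fraction: float, mebibytes: int) -> dict[str, str]:
    """Map control-file paths to the values that would be written to them."""
    base = f"/sys/fs/cgroup/{group}"  # assumes a cgroup v2 mount at this path
    return {
        f"{base}/cpu.max": cpu_max(cpu_fraction),
        f"{base}/memory.max": memory_max(mebibytes),
    }


# "demo" is a hypothetical group: half a CPU and 256 MiB of memory.
for path, value in limits_for("demo", 0.5, 256).items():
    print(f"{path} <- {value}")
```

Actually applying these limits would mean creating the group directory and writing the values as root, after which every process listed in the group's `cgroup.procs` is bound by them.
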
### Namespaces

* provide isolated view of global resources for a group of processes
  * only see other processes in the same namespace
  * only see allowed devices, users, file system...
* 2 PIDs: global one and one within namespace
* own root file system (copy of host root)

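
One way to see which namespaces a process belongs to is to read the symbolic links under `/proc/<pid>/ns`, whose targets look like `pid:[4026531836]`; two processes share a namespace when the identifiers match. The sketch below is Linux-only and assumes `/proc` is mounted; `parse_ns_link` and `namespaces_of` are illustrative helper names, not part of any library.

```python
import os
import re

# Link targets under /proc/<pid>/ns have the form "<kind>:[<inode>]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into its kind and identifier."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Return {entry name: namespace identifier} for a process."""
    ns_dir = f"/proc/{pid}/ns"
    result: dict[str, int] = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        result[name] = inode
    return result


if os.path.isdir("/proc/self/ns"):
    for name, inode in namespaces_of().items():
        print(f"{name}: {inode}")
```

Running this inside and outside a container typically shows different identifiers for entries such as `pid`, `mnt` and `net`, which is the isolated view described above.
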
## WebAssembly

* W3C standard for portable high-performance applications
* binary code
  * compiled to virtual CPU
  * runs in runtime
* portable compilation target
* near-native performance
* WebAssembly System Interface (WASI): OS-level functionality + integrated
  security

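
To make "binary code" concrete: a WebAssembly binary module starts with the four magic bytes `\0asm` followed by a 32-bit little-endian version number. The sketch below only inspects that 8-byte header and does not validate or run the module; `wasm_version` is an illustrative helper name.

```python
WASM_MAGIC = b"\x00asm"


def wasm_version(module: bytes) -> int:
    """Return the version field of a WebAssembly binary module header."""
    if len(module) < 8 or module[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary module")
    return int.from_bytes(module[4:8], "little")


# The smallest valid module: just the magic bytes and version 1, no sections.
empty_module = b"\x00asm\x01\x00\x00\x00"
print(wasm_version(empty_module))  # 1
```
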
## Trusted execution environment

* confidential computing: protect data in use
  * at-rest data: data on storage, just encrypt it
  * in-transit data: use encryption
  * in-use data: needs to be decrypted before it can be used in application
* TEE looks to address data in use security concern
  * protect *guest* from untrustworthy *host*
  * confidentiality: unauthorized entities cannot view data used in TEE, data
    is encrypted in-memory
  * integrity: prevent tampering (checksums)
  * provable origin: hardware-signed evidence of origin and current state so
    client can verify and decide to trust code running in TEE
* AMD Secure Encrypted Virtualization (SEV, SEV-ES)
* Intel Software Guard Extensions (SGX)
* Intel Trust Domain Extensions (TDX)

![Container architecture](./img/ch09/container_diagram.png)

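
The "provable origin" flow can be sketched roughly as: measure the code, have the hardware sign the measurement, and let the client check both the signature and the expected measurement. Real TEEs use asymmetric signatures backed by a vendor certificate chain; the HMAC with a shared key below is only a stand-in so the sketch runs on the standard library, and every name in it is hypothetical.

```python
import hashlib
import hmac


def measure(code: bytes) -> bytes:
    """Checksum ("measurement") of the code loaded into the TEE."""
    return hashlib.sha256(code).digest()


def sign_evidence(hardware_key: bytes, measurement: bytes) -> bytes:
    """Stand-in for the hardware signature (HMAC instead of a real key pair)."""
    return hmac.new(hardware_key, measurement, hashlib.sha256).digest()


def verify_evidence(hardware_key: bytes, expected_code: bytes,
                    measurement: bytes, signature: bytes) -> bool:
    """Client side: accept only genuine evidence for the code it expects."""
    genuine = hmac.compare_digest(signature, sign_evidence(hardware_key, measurement))
    matches = hmac.compare_digest(measurement, measure(expected_code))
    return genuine and matches


# Illustrative values only: a made-up key and two versions of a workload.
key = b"hypothetical-hardware-key"
good_code = b"print('hello from the enclave')"
tampered_code = b"print('exfiltrate secrets')"

evidence = measure(good_code)
signature = sign_evidence(key, evidence)
print(verify_evidence(key, good_code, evidence, signature))

bad_evidence = measure(tampered_code)
bad_signature = sign_evidence(key, bad_evidence)
print(verify_evidence(key, good_code, bad_evidence, bad_signature))
```

The second check fails even though the signature itself is genuine, because the measurement no longer matches the code the client expects; that combination is what lets a client decide whether to trust the code running in the TEE.
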