diff --git a/09_secure_virtualised_workloads.md b/09_secure_virtualised_workloads.md
new file mode 100644
index 0000000..c6e30ce
--- /dev/null
+++ b/09_secure_virtualised_workloads.md
@@ -0,0 +1,148 @@
+# Secure virtualised workloads
+
+* **workload**: any job/payload that needs to be executed on infrastructure
+    * directly on the OS
+    * in a VM
+    * in a container
+
+## Virtual machines
+
+![VM architecture](./img/ch09/vm_diagram.png)
+
+* physical hardware
+    * CPU, memory, chipset, I/O...
+    * resources often underutilized
+    * no isolation
+* hardware-level abstraction
+    * virtual hardware
+    * encapsulates all OS and application state
+* virtualization software
+    * hypervisor / virtual machine monitor (VMM)
+    * extra level of indirection that decouples the hardware from the OS
+    * strong isolation between VMs
+    * improves utilization
+* secure multiplexing
+    * isolation at the hardware level
+    * failure of one VM does not affect the others
+* entire VM is just a file (or a small set of files)
+    * easy to snapshot, clone, move and distribute
+* create once, run anywhere (well, we try)
+* types
+    * **type 1**: hypervisor runs on bare metal, no host OS (VMware ESXi,
+      Microsoft Hyper-V, KVM...)
+    * **type 2**: hypervisor runs on a host OS (VirtualBox, VMware Workstation...)
+        * relies on the host OS to handle calls to the hardware
+        * adds latency
+        * vulnerabilities in the host OS can be exploited
+        * aimed at developers
+
+![Type 1 virtualisation](./img/ch09/type_1_hypervisor.png){width=50%} \ ![Type 2 virtualisation](./img/ch09/type_2_hypervisor.png){width=50%}
+
+## Containers
+
+* virtualization at the OS level
+* much more lightweight -> denser utilization
+* all containers share the same host OS / kernel
+* advantages
+    * much faster startup
+    * easier to manage
+    * more containers per host than VMs
+* no hardware isolation -> weaker security boundary than VMs
+* the future
+    * blur the line between containers and VMs
+    * **Kata Containers**: lightweight VM per container (better security)
+    * **Microsoft Hyper-V**: can wrap containers in a lightweight VM (Hyper-V isolation)
+* Linux Security Modules (LSM)
+    * hostile processes can break out of a container (badly configured
+      namespaces, kernel exploits...)
+    * LSMs (e.g. SELinux, AppArmor) enforce mandatory access control
+    * policy lists which operations/capabilities each process is allowed
+    * policy is defined by the sysadmin
+    * reduces attack surface: rarely needed (niche) operations cannot be exploited
+* types
+    * **OS-level containerization**: containers run directly on the host OS
+      kernel
+        * isolation using kernel features (namespaces, cgroups...)
+        * no need for a full guest OS
+        * no hardware virtualization extensions needed
+        * attackers who escape the container can compromise the host
+        * e.g. Docker
+    * **micro-VM**: each container runs in a lightweight VM on the host
+        * uses hardware-enforced isolation
+        * containers do not share a kernel
+        * safer
+        * slower startup, worse performance
+    * **unikernel**: application compiled together with a tailored kernel
+        * monitor which syscalls the application uses
+        * once known, build a minimal kernel and a fixed-purpose image
+        * no separate user space, everything runs in kernel space
+        * much smaller attack surface (kernel only contains what's necessary)
+        * runs directly on a hypervisor or on bare metal
+        * small footprint, quick to start
+    * **sandboxing**: container runs in a sandbox that provides its own (user-space) copy of the kernel interface
+        * syscalls are intercepted and translated into calls to the host kernel
+        * good isolation
+        * slow
+        * not all syscalls supported (yet)
+
+
+![Container layout](./img/ch09/container.png){width=50%} \ ![Micro-VM layout](./img/ch09/micro_vm.png){width=50%}
+
+![Unikernel layout](./img/ch09/unikernel.png){width=50%} \ ![Sandbox layout](./img/ch09/sandbox.png){width=50%}
+
+## Linux kernel isolation support
+
+* <https://linuxcontainers.org/>
+* isolation features are built into the Linux kernel
+* LXC (Linux Containers)
+    * OS-level virtualization for running containers on a Linux host
+    * low-level, difficult to use
+* LXD (Linux Container Hypervisor)
+    * built on top of LXC
+    * developed by Canonical
+    * focuses on containerising entire operating systems, not individual applications
+
+### Cgroups
+
+* control groups
+* Linux feature to organise processes into (hierarchical) groups
+    * resource limiting e.g. CPU quota, memory limits
+    * prioritization/placement e.g. CPU shares, CPU pinning (cpuset)
+    * device access control
+
+### Namespaces
+
+* provide an isolated view of global resources for a group of processes
+    * processes only see other processes in the same namespace
+    * only see allowed devices, users, file system...
+    * each process has 2 PIDs: a global one and one within its namespace
+    * own mount table / root file system (initially a copy of the host's mounts)
+
+## WebAssembly
+
+* W3C standard for portable high-performance applications
+* binary instruction format
+    * targets a virtual (stack-based) CPU
+    * executed by a sandboxed runtime
+* portable compilation target
+* near-native performance
+* WebAssembly System Interface (WASI): standard access to OS-level functionality with integrated (capability-based) security
+
+## Trusted execution environment
+
+* confidential computing: protect data in use
+    * data at rest: data on storage, simply encrypt it
+    * data in transit: use encryption (e.g. TLS)
+    * data in use: must be decrypted before the application can process it
+    * TEEs address the security of data in use
+* protect the *guest* from an untrustworthy *host*
+    * confidentiality: unauthorized entities cannot view data used in the TEE;
+      data is encrypted in memory
+    * integrity: prevent tampering (checksums)
+    * provable origin (attestation): hardware-signed evidence of origin and current
+      state, so a client can verify it and decide to trust code running in the TEE
+* AMD Secure Encrypted Virtualization (SEV, SEV-ES)
+* Intel Software Guard Extensions (SGX)
+* Intel Trust Domain Extensions (TDX)
+
+![Container architecture](./img/ch09/container_diagram.png)
diff --git a/img/ch09/container.png b/img/ch09/container.png
new file mode 100644
index 0000000..98154b3
Binary files /dev/null and b/img/ch09/container.png differ
diff --git a/img/ch09/container_diagram.png b/img/ch09/container_diagram.png
new file mode 100644
index 0000000..7f310df
Binary files /dev/null and b/img/ch09/container_diagram.png differ
diff --git a/img/ch09/micro_vm.png b/img/ch09/micro_vm.png
new file mode 100644
index 0000000..3a4aacc
Binary files /dev/null and b/img/ch09/micro_vm.png differ
diff --git a/img/ch09/sandbox.png b/img/ch09/sandbox.png
new file mode 100644
index 0000000..67dceb9
Binary files /dev/null and b/img/ch09/sandbox.png differ
diff --git a/img/ch09/type_1_hypervisor.png b/img/ch09/type_1_hypervisor.png
new file mode 100644
index 0000000..020d424
Binary files /dev/null and b/img/ch09/type_1_hypervisor.png differ
diff --git a/img/ch09/type_2_hypervisor.png b/img/ch09/type_2_hypervisor.png
new file mode 100644
index 0000000..9d8f0f8
Binary files /dev/null and b/img/ch09/type_2_hypervisor.png differ
diff --git a/img/ch09/unikernel.png b/img/ch09/unikernel.png
new file mode 100644
index 0000000..6e8c885
Binary files /dev/null and b/img/ch09/unikernel.png differ
diff --git a/img/ch09/vm_diagram.png b/img/ch09/vm_diagram.png
new file mode 100644
index 0000000..ffb914b
Binary files /dev/null and b/img/ch09/vm_diagram.png differ