The Open Secret about Confidential Computing

Apr 07, 2024

Confidential Computing is an emerging field that aims to protect running workloads (“data in use”) from their environment, thus reducing the Trusted Computing Base (TCB). For VMs, this means that the threat model is updated to not trust the hypervisor. The main push comes from the public cloud vendors, who want to enable running more sensitive workloads. In short, the CPU is trusted and creates a clean VM/enclave that can be measured and attested. The attestation can be sent off to a trusted environment that can exchange it for the secrets needed to perform the work.
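To make the measure, attest and exchange-for-secrets flow concrete, here is a minimal sketch in Python. It is a toy model under loud assumptions, not how any vendor implements it: the names `CPU_KEY`, `measure`, `attest` and `release_secret` are made up for illustration, and an HMAC with a shared key stands in for the hardware signature (real attestation uses an asymmetric key that chains to a vendor certificate).

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# ASSUMPTION: stand-in for a key that only the CPU can use. Real hardware
# signs with an asymmetric key; HMAC keeps this sketch stdlib-only.
CPU_KEY = secrets.token_bytes(32)


def measure(image: bytes) -> bytes:
    """Hash the initial VM/enclave contents (the 'measurement')."""
    return hashlib.sha256(image).digest()


def attest(image: bytes, nonce: bytes) -> Tuple[bytes, bytes]:
    """Toy 'CPU' side: return the measurement and a signature over nonce + measurement."""
    measurement = measure(image)
    signature = hmac.new(CPU_KEY, nonce + measurement, hashlib.sha256).digest()
    return measurement, signature


def release_secret(
    measurement: bytes,
    signature: bytes,
    nonce: bytes,
    expected_measurement: bytes,
    secret: bytes,
) -> Optional[bytes]:
    """Toy 'trusted environment' side: release the secret only for a valid report."""
    good_signature = hmac.new(CPU_KEY, nonce + measurement, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, good_signature):
        return None  # report was not produced with the CPU key (or the nonce is stale)
    if not hmac.compare_digest(measurement, expected_measurement):
        return None  # a different workload than the approved one was started
    return secret
```

The nonce is there so that an old report cannot simply be replayed; the verifier picks a fresh one per request. Everything in this model rests on `CPU_KEY` staying inside the CPU.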

If you read the confidential compute marketing from Azure and GCP, it is easy to think that this is a solved problem. If you read Trust in Computer Systems and the Cloud - which is an excellent introduction to the topic - it acknowledges that this is a hard problem with a lot of things to get right, and a lot of things that are considered out of scope (chapter 11). If you read Security Engineering: A Guide to Building Dependable Distributed Systems - a highly recommended read in general - it is more blunt, saying that the ecosystem can be undermined by a single extracted attestation key, which has already been demonstrated (section 20.6, third edition).
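Why a single extracted attestation key is so damaging can be shown with a small, self-contained toy model. As before, this is a sketch under assumptions: `sign_report` and `verify_report` are hypothetical names, and HMAC stands in for the real asymmetric signature scheme. The point is only that the verifier checks the key, not the hardware, so whoever holds the key can claim any measurement.

```python
import hashlib
import hmac


def sign_report(key: bytes, measurement: bytes) -> bytes:
    """Sign a measurement with the attestation key (HMAC as a stand-in)."""
    return hmac.new(key, measurement, hashlib.sha256).digest()


def verify_report(key: bytes, measurement: bytes, signature: bytes) -> bool:
    """Verifier side: accept any report that carries a valid signature."""
    return hmac.compare_digest(sign_report(key, measurement), signature)


# ASSUMPTION: a fixed dummy key, purely for illustration.
attestation_key = b"\x01" * 32
approved = hashlib.sha256(b"approved workload").digest()

# Honest hardware signs the measurement of what it actually started.
genuine_report = sign_report(attestation_key, approved)

# An attacker who has extracted the key can run anything, outside any
# enclave, and still sign the measurement the verifier expects.
extracted_key = attestation_key
forged_report = sign_report(extracted_key, approved)

# The verifier has no way to tell the two reports apart.
print(verify_report(attestation_key, approved, genuine_report))
print(verify_report(attestation_key, approved, forged_report))
```

In this model both checks succeed, which is exactly the problem: once the key is out, a valid report no longer says anything about where the workload runs.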

The three main technologies used by Azure and GCP are AMD SEV-SNP (VM), Intel SGX (enclave) and Intel TDX (VM). Intel TDX uses Intel SGX for attestation. Both Intel SGX and AMD SEV have had their attestation keys extracted. In the case of Intel SGX, this was used to break UHD Blu-Ray DRM. There are so many side channels in Intel SGX that researchers used a table to keep track of which CPUs are vulnerable to which attacks. When Google looked at both AMD SEV-SNP and Intel TDX, they found a bunch of issues in both. This week, further issues in both solutions were published by researchers at ETH Zurich.

Over time, the Intel and AMD solutions should become harder targets as bugs are found and fixed. However, some architectural decisions might continue to produce bugs; the Intel and AMD approaches are based on microcode running on the same cores as other workloads, rather than on dedicated hardware. This makes them more exposed to side channel attacks (which are the source of many of the known bugs). The upside is that some things can be patched without having to scrap hardware. That said, this also makes the update process a potential attack vector.

AWS uses its proprietary Nitro system, which looks interesting on the surface, but lacks public details and external scrutiny, so it’s hard to compare to the Intel and AMD solutions. Conceptually, Azure and GCP shift some of the trust to the CPU vendor (AMD or Intel, which are already trusted), while AWS takes on the same role with its own custom hardware.

Don’t get me wrong: I think reducing the TCB, which is the aim of Confidential Computing, is a great and desirable goal. It should increase the cost of attacks and help protect sensitive data. But let’s be clear-eyed about the current state of things; measured enthusiasm is in order.

stiankri's blog
