Unlocking the Cloud with Confidential Computing

Nathaniel McCallum, co-founder of the Enarx project, gave a talk about Enarx and Confidential Computing at the 10th edition of All Things Open.

Please find the transcript of the talk below:

We're going to be talking about unlocking the cloud with Confidential Computing.

So everybody loves the cloud, right? The cloud is wonderful. It gives us all sorts of great things that we love that make our life better. All of these things, particularly time to value, are really great, right? We can start an infrastructure project and have it up and running within 24 hours, in many cases. Because we can move quickly and we have that flexibility, this has impacts all across our institutions. And there's lots of reasons to love the cloud. So this is not a talk to try to convince you not to use the cloud, okay?

However, we do need to recognize that there is unfortunately a little bit of a dark side to the cloud, and there are, very broadly speaking, concerns in the industry about public Cloud security. You can see for example the results here from the Bitglass 2020 Cloud Security Report, where 93 percent of respondents were concerned about public Cloud security, and they may have some reason for this. Four million dollars, according to a study that was published just a little bit ago this year, is now the average cost of a data breach, and unfortunately for those of us who are in the United States (which, given the hands we saw for those who live in this area, is most of us), this is nine million dollars on average. And it's not better in the public Cloud, it's actually worse. If you're comparing basically two organizations, and one organization has very low Cloud utilization and another organization has very high Cloud utilization, the average cost of a breach increases by 68 percent. So it is not a small number.

We also have a shifting security liability landscape and this is something that probably should not be underestimated. Just this month, on October 5th, the former Uber Chief Security Officer Joseph Sullivan was found guilty in criminal court over the data breach that they had in 2016. This marks the very first time in history that an executive was actually held criminally liable for a data breach, so it's not just about how much money we're spending. We saw that those numbers are pretty big, right? Four or nine million dollars on average is a pretty big amount of money. There's now jail time associated with that as of this month.

So the question is how do we unlock all of the potential of the cloud while decreasing the risks that are associated and the cost. And we're going to do this by going through a set of data breaches. I've chosen data breaches which I hope are representative of what we have and what I'm going to highlight here is that we actually have a security class problem not a security discipline or a procedure discipline problem. So these are just three of the larger breaches we saw this year. All of these are from 2022.

We had the first one which was the Red Cross. I think this was back in April, and a third-party critical vulnerability did not actually bypass the authentication systems. We're going to see that this is a pattern. In fact, this is a pattern that's rampant around the industry: we tend to think of this as an authentication problem, but it's not an authentication problem. What happened was that, by using this critical vulnerability, they were able to access credentials which were then used to launch an attack against a second system. That second system of course performed authentication normally using the credentials that it was given and granted access to the user, and so we ended up with unauthorized user access.

A second case we had slightly later in the year was Cash App, and this exposed financial data, investment portfolios and such for 8.2 million users. The initial cause of this was that they terminated an employee and they forgot to disable his account. Is anybody worried about that, anyone? It's a problem, right? I mean, this is normal stuff and this is a discipline problem, okay? They failed to revoke the ex-employee's account and because that ex-employee was bitter, that ex-employee was able to gain unauthorized user access and exfiltrate all of that data.

A third attack that happened this year was the LAPSUS$ attacks. Now they were able to attack multiple companies, so let's note that this is not a single person problem. There's not a single institution that's at fault here because they didn't practice good discipline. If we're attacking this wide a swath of companies that are basically already working very hard on security, then we don't have a discipline problem, we have a security class problem. So the attackers were able to exfiltrate hundreds of gigabytes of data, and this included, I think it was like 190 gigabytes of private source code from Microsoft, which was then published publicly. The LAPSUS$ attacks were motivated by financial purposes, right? They were extorting the companies that they were involved with and they used a variety of techniques, but the most successful one, consistent across all of the attacks, was a combination of phishing and multi-factor authentication bombing. Now in this combination of attacks, you'll notice that the first one captures the user's first-factor credentials, and then the second one (MFA bombing) is where, in order to get past multi-factor authentication, they just keep authenticating over and over and over again until the user eventually clicks this.

Now this same strategy, the MFA bombing, was also used in the most recent attack on Uber that just happened, what was this, last month or the beginning of last month, where they did the same MFA bombing attack and they were able to get total compromise of everything in Uber. Uber completely lucked out because the attacker was not able, it appears, to actually get in and do any real damage, but it was not because they didn't have the keys to the kingdom. They did, they had everything, all of the cloud accounts for Uber.

So the last thing I want to point out here is that although we had some different initial causes, the final cause of all of these attacks was unauthorized user access. Is this surprising to anyone? This is the endemic that we are combating in the industry and I want to ask the question: is unauthorized the correct terminology? And I'm going to say that it's not. The reason why it is not the correct terminology is that, when we generally talk about deploying authentication and authorization systems, there was no fault whatsoever in the authentication or authorization of these systems. First, the authentication completed successfully, right? The attacker gained credentials from somewhere and was able to use those credentials to attack a system that was functioning completely perfectly in order to get the data. Second, authorization completed successfully. The authenticated user had permission to access the data, so there is not a problem with the authentication system. That means that this is not an authorization problem. In fact, the common thread that underlies all of these attacks is that they used unattested code with authenticated and authorized user credentials. What do I mean by this? I mean that there was no way to limit what the user could do with the data once they were given access to it.

Let's look at how we used to do things. Anybody old enough to remember the days when we had secure networks that didn't have authentication and you could just download the data that you wanted? Does anybody still think that that's sane? Nobody thinks that's sane, right? So yesterday we sort of thought: okay, you know, we've got a secure network, it's in my company, I can protect everything. I'll just put all the data on there and it's fine. But today we think that's completely nuts, and if you tried to sell that to any CISO or any CSO worth their salt, they're gonna throw you out immediately. And basically what the pattern of this looks like is that the user on the remote system is unknown and the code that they're using to access it is also unknown, right? Users don't interface with data directly, they don't spin the wheel on the disk and watch the bits go by with their eyes. All users have a bit of software that sits between them and the data, that interprets the data for them, and that they then use to exfiltrate it. So the user who is accessing the data is unknown and the code is unknown, and when they ask for the data, the unauthenticated access says: sure, here's the data, you can have it, go to town.

Now we would also then, back in the day, have an untrusted network and we would add authentication and authorization processes in order to be able to stop users from getting data that they weren't allowed to have access to, and so the main thing that has changed historically or changed in this pattern is that in the trusted network case we had users who were unknown and in the untrusted network we could know who the users are but in both cases we did not know what the code was on the other side.

In modern deployments, we talk about zero trust, but what zero trust really is, it's just saying all networks are insecure. We're not going to have some trusted network where people don't need to do authorization. We're going to do that all the time and so the user is known in this case no matter what network you're on and you can ask for data and if you are authenticated and authorized then you receive that data.

However, we think of this as sane today, and what I'm arguing is that we shouldn't. We have new technologies now, which we're going to be talking about, that mean that we can actually do better than this. And in the future, when we look back at ourselves five, ten years from now, we're going to say: how did we ever allow our code to be completely unidentified?

What Confidential Computing brings to the table is the ability to know information about the code on the remote side. So as part of your authorization decision, when you are responding to this attested request for data, you can know for a fact what that code is on the other side and therefore you have a reasonable knowledge of what it's going to do with that data. If the same user authenticates with different code, we simply don't authorize that user, and the reason for this of course is that we want to be able to scope what code can do with data. And what I would like to argue here is that the root cause of a substantial number of our breaches today is the inability to scope what code can do with data once it's received it.
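As a rough illustration of that idea, the sketch below makes the authorization decision depend on both the user and the attested code. Everything in it is an assumption made for the example: the `ALLOWED_CODE` policy table, the `Request` fields and the measurement strings are hypothetical and do not come from any real product.

```python
from dataclasses import dataclass

# Hypothetical policy: which code measurements each user may use on the data.
ALLOWED_CODE = {
    "alice": {"sha256:report-generator-v1"},
}


@dataclass(frozen=True)
class Request:
    user: str                 # identity established by authentication
    user_authenticated: bool  # did authentication succeed?
    code_measurement: str     # identity of the code, established by attestation


def authorize(request: Request) -> bool:
    """Grant access only if the user AND the attested code are both approved."""
    if not request.user_authenticated:
        return False
    return request.code_measurement in ALLOWED_CODE.get(request.user, set())


# Same user, same valid credentials, different code: only the first is allowed.
print(authorize(Request("alice", True, "sha256:report-generator-v1")))
print(authorize(Request("alice", True, "sha256:bulk-export-tool")))
```

The point of the design is that stolen credentials alone are no longer enough: the attacker would also have to run code that the policy already trusts.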

Now we have a coping strategy for this. Our coping strategy basically boils down to blaming the victim. How many of you have heard these statements: you should have rotated your password, right? You should have enabled multi-factor authentication. You should have learned how to identify multi-factor authentication bombing. And in more sophisticated attacks, they're actually doing SIM transfers on cell phones and now we're telling people to lock down their SIM transfers. So this can't be done. Notice that in all of these cases the pattern that re-emerges over and over and over again is that we put the onus and the responsibility for data security on the users. This simply does not scale, and because it does not scale we will continue to see vulnerabilities over and over again. And not only are we going to continue to see vulnerabilities but, as attackers get better at them, the cost of those vulnerabilities is going to rise, particularly as the value of our data actually rises as well.

And you can see I've given a statistic here. What we have is the failure of each of these alternatives. We can't attest the code that's running on the other side today. Or rather, we can, but most people don't know we can. So since we can't do that, we try all these alternatives, but all they really do is push the burden onto the user, and we're failing as an industry to keep up with the data breach cost inflation, which is right now 12.7 percent over the year, right? So the cost of each data breach is going up by that amount every year. Are your budgets increasing to match that amount every year? I would certainly imagine that they're not. Fortunately we do have some new tools to be able to help with this problem.
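To make the compounding concrete, here is a small worked calculation. It assumes, purely for illustration, that the 12.7 percent rate stays constant every year and that it applies to the four million dollar average mentioned earlier; neither assumption is a forecast.

```python
def projected_breach_cost(current_cost: float, annual_growth: float, years: int) -> float:
    """Compound a cost forward by a fixed annual growth rate."""
    return current_cost * (1 + annual_growth) ** years


# Assumed inputs: 4 million dollars today, growing 12.7 percent per year.
for year in range(6):
    cost = projected_breach_cost(4_000_000, 0.127, year)
    print(f"year {year}: ${cost:,.0f}")
```

Under those assumptions the average cost grows by roughly 1.8 times in five years, which is the gap a flat security budget would have to absorb.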

The Confidential Computing Consortium (we'll talk about how they gather the industry together in a minute) is focused on building a technology called Confidential Computing, and this introduces a new type of data isolation.

So type 1 is the one we all know about. We have a server and it's running, you know, multiple applications. It may even have multiple tenants on it, and type 1 isolation ensures that workload A does not mess with workload B. This type of isolation is actually built on a hardware primitive. It's built on the ability to have a virtual address space, because we all have MMUs in our CPUs and the kernel can swap and see which process has visibility to each memory, and then all of our security strategies are built upon this core hardware primitive that has existed since 1962. And so each workload basically gets its own view of memory, and because it has its own view of memory we can now build these security primitives on top of that, and that's what isolates workloads. And we're actually pretty good at this today. I don't think that this is a bad thing. This is something we do pretty well and we do it at scale.

But the way that we do it is the question, because the way that we do it is that we depend upon type 2 isolation. That is, the only way to secure workloads from each other is to secure the host from the workloads, right? So if you have a workload running on that host and it's able to attack the kernel and breach the kernel, suddenly you've now bypassed that hardware primitive we talked about, the ability to have a virtual address space. And because you can bypass that and you can have whatever view of memory you want, all of a sudden you're able to mess with other workloads.

So the way that we today lock down workload from workload is by isolating the host from the workload, so type 1 and type 2 isolation. But we actually need to move to a system where we have a third type: workload from host isolation. VMs and containers do not handle this. They do not do this today, they didn't attempt to, it is currently out of scope. And what we want to do in this case is we want to actually protect the workload from the host. It doesn't matter if you're ring zero and it doesn't matter if you have access to every single page mapping in the system and you can read all of the memory directly, because with Confidential Computing each workload itself exists in a hardware established trusted execution environment and that trusted execution environment is protected by cryptography. So generally speaking, each workload is encrypted with a separate key, and this is set up in the hardware and even the kernel does not have access to be able to mess with this. So even in the highest cases of data breach, we can now keep our workloads isolated from one another. Not because we're protecting the host from the workload, but because we're protecting the workload from the host.

Now there's a Consortium for this, so nobody's doing it alone. The Confidential Computing Consortium was founded by the Linux Foundation and is focused on the enablement of Confidential Computing hardware using open source software. So this is all the good stuff we want, right? We want new hardware primitives to be able to give us the new security tools that we need to be able to solve these classes of problems, and we want this to be powered by open source so that we can do this at scale and we can do this effectively, and so that all of our code is auditable and reviewable and all the bugs are shallow. The Confidential Computing Consortium is very well attended, it has a very large membership list, and my own particular company Profian is in that list. We are happy to say we are joined by esteemed colleagues from all other institutions around the industry and, you know, there's many here. You may notice a few that aren't here. I would encourage you to go just sort of browse the list and see who isn't involved.

Confidential Computing, according to the definition published in a white paper with an unfortunately long URL, minimally provides three things: 1) first, it provides data confidentiality, this means that the memory pages of the application are encrypted so, even if you tap the memory bus and read the accesses, all you see is ciphertext; 2) second, it provides data integrity, this means that you can't tamper, you can't flip the bits on this data without it being detected. Now you might be able to successfully launch a bit flip, for example. The key point is that the software will then begin to fault, or, more accurately speaking, the hardware will begin to fault the software, because it's detected that a modification has occurred; 3) the third thing that Confidential Computing minimally provides is code integrity, this means that the code that is actually running inside of this trusted execution environment can't be tampered with by the host. All of these guarantees are necessary in order to provide the things on the right.
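The data integrity guarantee can be pictured with a toy model. The sketch below is not how any real TEE is implemented: the `ProtectedPage` class is hypothetical, and a software HMAC stands in for the integrity metadata that real hardware keeps out of the host's reach. It only shows the behavior described above, where a bit flip may land but the next access faults.

```python
import hashlib
import hmac
import os


class ProtectedPage:
    """Toy model of one memory page whose integrity is checked on every read."""

    def __init__(self, data: bytes, key: bytes) -> None:
        self._key = key                # assumed to be held by the hardware only
        self.stored = bytearray(data)  # what sits in RAM, reachable by the host
        self._tag = hmac.new(key, data, hashlib.sha256).digest()

    def read(self) -> bytes:
        expected = hmac.new(self._key, bytes(self.stored), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, self._tag):
            raise RuntimeError("integrity fault: page was modified")
        return bytes(self.stored)


page = ProtectedPage(b"account balance: 100", os.urandom(32))
page.read()              # an untouched page reads back normally

page.stored[-1] ^= 0x01  # the host flips a single bit
try:
    page.read()
except RuntimeError as fault:
    print(fault)         # the modification is detected instead of being used
```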

So Confidential Computing may additionally provide 1) code confidentiality, this is where the code that you're actually running is itself confidential from the host; 2) it may provide authenticated launch. Authenticated launch means that before the environment is set up it can perform some sort of cryptographic validation to prove that the environment is sane. But the important difference here between authenticated launch and attestation is that authenticated launch occurs before the first instruction is executed inside the TEE; 3) the third thing that Confidential Computing can optionally provide is programmability, this is the ability to basically write any application you want and run it in this. Some systems may have a limited approach. Profian and Enarx, which we're going to talk about, do not; 4) the fourth thing that it may additionally provide is attestability. This is the attestation which we're talking about and this is in our opinion the most important new security primitive that we've had since 1962; 5) the last one is the ability to recover, so if there is some sort of defect in the hardware or in the firmware, how do we recover in the case when the TEE has been breached and is itself vulnerable? The ability to recover that platform is an important feature, but it's one that is currently optional in the Confidential Computing definition.

Now Profian is particularly interested in providing the highest possible security at the lowest possible cost. This means that we want to provide additional guarantees beyond the three that are on the left, so we always provide code confidentiality, and we always provide programmability and attestability. Because we provide attestability we don't need authenticated launch, so that one we don't provide, but it's not needed, and the last one is that we only run on hardware that offers the ability to do platform recovery.

The key thing I don't want us to get away from however is that even though attestation is in the optional column, according to the definition, this is not a statement of its importance. It's simply a statement of where we are in the industry today: not every technology that is trying to enter the Confidential Computing space currently provides attestation. However it is my view that attestation is the primary feature and that all other features in Confidential Computing are there and exist in order to support attestation as the new Cloud security primitive. Attestation requires confidentiality of data. If you can attest a platform to prove its trustworthiness and the remote party then discloses data to that remote entity, if the TEE does not provide confidentiality of data, then the attestation was worthless. Likewise with integrity of code and data: if you don't have integrity of code and data, then admins can tamper with the runtime behavior, right? They can manipulate what's happening on the inside of the TEE, so what good is it to attest the state of the code to a remote party if, after the attestation has been given and the data has been disclosed, the host can just simply change the behavior of the program? The third feature that's required is a hardware root of trust. This is the thing that actually produces the measurement. We get a signature from the actual hardware and that's rooted in a chain that goes back to the hardware manufacturer. The hardware manufacturer is guaranteeing, through this root of trust, that this particular combination of a CPU and its firmware is something that is valid and should be trusted. If you don't have a hardware root of trust, on the other hand, how can you trust the attestation at all? Anyone could just make up whatever measurement they want and send it over the wire. So notice, going back here, that even though there is a set of required features on the left hand side, all of them are required for the purposes of attestation, and attestation is the key thing that we are driving. It's the value that we want out of Confidential Computing.
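The role of the hardware root of trust can be sketched as follows. This is a deliberately simplified model: real hardware signs with an asymmetric key backed by a manufacturer certificate chain, whereas here a shared-secret HMAC (`VENDOR_KEY`, a made-up value) stands in for that signature so the example runs with the standard library alone. The function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a stand-in for the manufacturer's root of trust. Real platforms
# use asymmetric signatures and a certificate chain, not a shared secret.
VENDOR_KEY = b"hypothetical-vendor-root-key"


def measure(code: bytes) -> bytes:
    """Cryptographic measurement of the code loaded into the TEE."""
    return hashlib.sha256(code).digest()


def hardware_sign(measurement: bytes) -> bytes:
    """What the CPU would do: sign the measurement it produced."""
    return hmac.new(VENDOR_KEY, measurement, hashlib.sha256).digest()


def verify_report(measurement: bytes, signature: bytes, expected: bytes) -> bool:
    """Relying party: the signature must be genuine AND the code must be the expected one."""
    genuine = hmac.compare_digest(hardware_sign(measurement), signature)
    return genuine and hmac.compare_digest(measurement, expected)


expected = measure(b"trusted workload")

good = measure(b"trusted workload")
print(verify_report(good, hardware_sign(good), expected))     # signed, expected code

print(verify_report(good, b"\x00" * 32, expected))            # made-up signature

other = measure(b"modified workload")
print(verify_report(other, hardware_sign(other), expected))   # genuine hardware, wrong code
```

Without the signature check, the second case would pass: anyone could send the expected measurement over the wire, which is exactly the failure described above.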

There are a variety of Confidential Computing technologies, but they come in different flavors and you can't just migrate from one to the other. It's not like you can just run in one and then run in another one. First of all, they run on different CPUs, so if your CPU instruction set isn't the same, then you're out of luck. But the other thing is that there's a process model, where the memory within the process is encrypted, and there's also a virtual machine model, where you bring up a separate address space and that separate address space, or at least part of it, is encrypted. There are a variety of technologies here from various companies. Everyone's playing in this space: if you make a chip, you're thinking about providing Confidential Computing. The two that are widely available today are SGX and SEV, and we'll see precisely how we use those in a moment.

So I'm here to mostly talk about Enarx. Enarx is an open source, opinionated, container like, batteries included, multi-cloud, multi-language, multi-hardware, WebAssembly application deployment framework, built for Confidential Computing and owned by the Linux Foundation. In fact we have the notorious claim to fame to be the very first open source project that was accepted by the Linux Foundation into the Confidential Computing Consortium (we can claim that only by about five minutes, but we'll take it). Enarx's aim is to solve security at scale. We want to increase security through tight integration.

We want first of all ubiquitous encryption of data in use, at rest, and in transit. This should just always be on everywhere all the time. There's no reason to turn it off, and if it's on all the time it means that the data is never decrypted. This allows us to do something really profound, which is that we can separate the user who manages the resources of an application from visibility into the data of the application. So remember those breaches we were talking about before: most of the time in these high profile vulnerabilities, these high profile data breaches, what's happening is that someone whose responsibility it is to make sure an application has the resources it needs to run also necessarily has visibility of the data, and we want to divide those two. We want to separate them. We want to allow people to have admin rights so that they can allocate resources, memory, disk storage, network, and so forth to the application, but we don't want to hand them all of the data itself.

Second, we want confidentiality and integrity of code and data. So this is a higher bar than what the Confidential Computing Consortium defines.

Third, we want code scoping through automatic attestation. The way most platforms do attestation today is they leave it up to the developer. Good luck! Attestation is incredibly hard to do, and it's incredibly hard to do across multiple technologies. It involves careful cryptographic analysis, and one misstep leads to complete and total compromise of the platform.

Fourth, we provide integrated certificate and key management. This is not an add-on. If we're going to have encryption all the time everywhere, then the management of the keys and certificates associated with that encryption always has to be integrated and always has to be completely automated so that there's no admin involvement whatsoever. The most important thing is that the admin must never see the private key, so you can communicate with an application, you can talk to its APIs, you can perhaps interact with data through access control, but you never get direct access to data, and you never get direct access to the keys. We believe that this will provide a decreased cost and time to value due to the simplicity of this model.

It's also our goal to provide an efficient mesh of attested applications. So much of the talk in the Confidential Computing world today is what I call the ego model, that is: I have my application, I want to deploy my application in my cloud, and I want me to be able to see the attestation of my application. Notice that there's no third parties involved in that sort of thinking, but the reality of the situation is, the moment we actually get this up and running for me, I'm going to have to prove it to you, and so we need to be able to build applications that are intelligent about how they can exchange data. They can always be attested, they can always know what the other party on the other side is going to do with the data that's being disclosed.

The last thing that Enarx aims to provide is auditability and decision transparency, and what we mean by this is all the code is always open. We write it in such a way that it is easy to audit, so that you know what's actually happening, and anytime we have to make a decision on behalf of the user, we provide the evidence that was used in order to make that decision. This means that you, as an auditor trying to determine the scope of a breach after the fact, always have access to the data that we used to make that decision ("we" meaning the platform, the code acting on your behalf), so you can determine whether a decision was made correctly at any point within your audit logs.

Enarx provides a Keep architecture. The Keep is what we call the TEE, the trusted execution environment. The Keep refers, by the way, to the most secure part of a castle. We tend to like castle terminology, and the Keep roughly looks like this. We have a technology specific layer on the bottom. Today this supports SGX and SEV, and it will support other technologies as the hardware becomes available and drivers become mature. On top of that we immediately go to a WebAssembly runtime and we use the WebAssembly System Interface, or WASI, in order to provide system APIs. On top of that we're able to use language bindings such as WASI libc, and then you can deploy your application like normal, and we call this whole bundle a Keep.

WebAssembly is really a critical piece in our construction of Confidential Computing. First of all, it's fast. Basically the code runs at near native performance and this is why everyone is shifting to it in the browser. Second, it provides a minimal trusted computing base, so it's very very small, very lightweight. Basically the entirety of the runtime is less code than it takes to boot your Linux kernel on a standard OS. Third, it provides runtime portability and this is particularly crucial for when you have a hardware vulnerability. If you're running on platform A and platform A becomes vulnerable, do you want to go back and re-architect your application to port the code over to something else? Do you want to recompile it for another platform and then retest, perhaps going through a week's worth of test cycles or so? No, what you want to do is redeploy the exact same binary immediately, within minutes, to a different platform that's not vulnerable. Fourth, WebAssembly provides us functional equivalence, and this is important because when you're running Confidential Computing on multiple different CPUs and you produce an artifact out of that, and that artifact contains an attestation saying this ran in this Confidential environment with this code, how do you know that the code was the same if you're running it first on Intel and second on ARM, with different CPU instructions? If you're running native code, you're going to get different cryptographic measurements, and now you have to build an entire management service on top to keep track of all of the versions of all of the code you've ever created so that you can know which ones are equal to each other. But by using WebAssembly and JIT compiling at the last moment, we instead allow all of the workloads to just have the same measurement regardless of which platform they're on. WebAssembly is of course multi-language and provides rich APIs. Another important feature from WebAssembly is shared nothing linking. This is an up and coming feature in WebAssembly, but it basically allows you to link together multiple WebAssembly binaries into a single deployable artifact where, as an artifact of linking, no memory is shared between the linked modules, so a compromise in one part of the WebAssembly doesn't have access to anything else in the WebAssembly. This is really important for secure multi-party compute use cases where you want to have multiple parties that potentially distrust each other executing code and doing analysis in the same compute environment. The last reason for the importance of WebAssembly is industry consolidation. The whole industry is moving rapidly towards WebAssembly support. If you're coding in a language, there's somebody working on making it work well in WebAssembly today.
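The functional equivalence point can be shown with plain hashing. In the sketch below the byte strings are placeholders standing in for real build outputs (only the eight-byte WebAssembly header is real), and SHA-256 is assumed as the measurement function for illustration.

```python
import hashlib


def measurement(artifact: bytes) -> str:
    """Cryptographic measurement of a deployable artifact."""
    return hashlib.sha256(artifact).hexdigest()


# Placeholder bytes standing in for real build outputs.
wasm_module = b"\x00asm\x01\x00\x00\x00" + b"<portable bytecode>"
native_builds = {
    "x86_64": b"<x86_64 machine code>",
    "aarch64": b"<aarch64 machine code>",
}

# Native code: one measurement per platform, which something has to track
# and declare equivalent.
native_measurements = {arch: measurement(code) for arch, code in native_builds.items()}

# WebAssembly: the same bytes are deployed everywhere, so there is one measurement.
wasm_measurements = {arch: measurement(wasm_module) for arch in native_builds}

print(len(set(native_measurements.values())))  # distinct measurements for native builds
print(len(set(wasm_measurements.values())))    # distinct measurements for the Wasm module
```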

Enarx provides not only the runtime but some additional services as well. These services are going to continue to grow. Right now we provide two services: one is called Steward and the other is called Drawbridge. Steward is an attestation aware Certification Authority, so when your application comes up, your workload comes up, we do the attestation and we issue your application a certificate that can be used for encrypted communications and so forth. The second is Drawbridge, and Drawbridge is where the application is staged for launch. You can think of it sort of like Docker Hub, and the important thing here is that this allows us to give a name and a version to applications so that you don't have to just look at cryptographic measurements. You can actually have semantic awareness of what's actually happening in your audit logs. All of this provides a system where we can deliver the code confidentially into the Keep: the Keep comes up essentially empty, it proves itself to the remote party, and only then, over a TLS connection, is the application actually delivered into the Keep.
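The ordering of that launch sequence can be sketched as a tiny state model. None of the names below are Enarx APIs: `Keep`, `issue_certificate` and `deliver_workload` are hypothetical stand-ins, and the boolean `evidence_valid` replaces the real hardware evidence that an attestation aware CA would check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Keep:
    evidence_valid: bool               # stand-in for real hardware evidence
    certificate: Optional[str] = None  # issued only after attestation
    workload: Optional[bytes] = None   # the Keep starts out empty


def issue_certificate(keep: Keep) -> None:
    """Role played by an attestation aware CA: no valid evidence, no certificate."""
    if not keep.evidence_valid:
        raise PermissionError("attestation failed")
    keep.certificate = "hypothetical-certificate"


def deliver_workload(keep: Keep, module: bytes) -> None:
    """Role played by the staging service: code goes only to an attested Keep."""
    if keep.certificate is None:
        raise PermissionError("Keep has not been attested")
    keep.workload = module


keep = Keep(evidence_valid=True)
issue_certificate(keep)                  # step 1: the empty Keep proves itself
deliver_workload(keep, b"\x00asm...")    # step 2: only then is the code delivered
```

Because delivery is gated on the certificate, the host never holds the application code outside an attested environment, which is what makes code confidentiality possible.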

Unfortunately I'm about out of time, and I would love to give you a demo. I'll show you this real quickly while I talk about what the demo would have been. We have written an application called Cryptle. It's everyone's favorite game Wordle, or at least was until it got unpopular. I don't play it anymore but it was very fun for a time. But this was basically a multiplayer version of this game, and we wanted to provide a scenario where we could actually run a service where the access control is whether or not you can guess the words, and we want to see if people can, through memory scanning, actually access the words. We would have shown, you can see here, that it turns purple when you guess someone else's word and, no surprise, when you run this on Wasmtime or Enarx in debug mode, you can just scan the memory and find the word out without trying to guess, so you can bypass the access controls. The most important bit is here, where we can show by scanning the memory that you can't actually find any words, because everything is encrypted and you don't get any access to that data whatsoever, and that's roughly how Confidential Computing works.
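The memory scanning experiment can be imitated in a few lines. This is a toy, not the Cryptle demo itself: the secret word, the memory layout and the hash-based keystream are all made up for illustration, and the XOR cipher here is not a substitute for the hardware memory encryption a real TEE uses.

```python
import hashlib
import os

SECRET_WORD = b"CRANE"  # hypothetical word a player is supposed to guess


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(memory: bytes, key: bytes) -> bytes:
    """XOR the memory image with the keystream; applying it twice decrypts."""
    return bytes(m ^ k for m, k in zip(memory, keystream(key, len(memory))))


def scan(memory: bytes, needle: bytes) -> bool:
    """What a memory-scanning host does: search the raw bytes for the word."""
    return needle in memory


plain_memory = b"\x00" * 64 + SECRET_WORD + b"\x00" * 64
protected_memory = encrypt(plain_memory, os.urandom(32))

print(scan(plain_memory, SECRET_WORD))      # debug mode: the word sits in the clear
print(scan(protected_memory, SECRET_WORD))  # encrypted: the host sees only ciphertext
```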

You can actually try this yourself. So go to try.enarx.dev and you basically just click this button and it's going to pick randomly one of the technologies for you. We ended up with SGX this time, and you can see that the very first application selected to run is the Cryptle application, and I can just click deploy and it's starting that workload now. So it takes maybe a second or so to come up, and then when we click on the link (the certificate's not currently trusted in my browser, we are going to fix that), you can see I now have a running Cryptle application, and that's all it took to launch a Confidential Computing application. By the way, it's written in Rust, and all the source code is available. Feel free to play with it. There are other applications you can look at here, in other languages. We are growing this support. You can also deploy your own applications in this environment as well, so you don't need hardware access to play with this and you can basically get started today.

Where can you run Enarx? Pretty much everywhere. Most of the clouds have support for it, the only major one that does not right now is Google and it's because their bare metal service is just one generation too old. That's a problem that will be fixed I'm sure in the coming months.

For more information check out enarx.dev, try it out yourself at try.enarx.dev. You can check out our GitHub and please star the project. We love to see what people are interested in. It'll also give you notifications about interesting stuff that's going on in the project and if you want to chat with us, come to chat.enarx.dev and we're happy to answer questions. We're happy to help you get it up and running. Anything that we can do to help out, we're glad to help with.

I'll give a one minute overview of Profian. Profian is a new next generation Cloud security startup. We closed a five million seed round last year in July and we are the custodian of the Enarx project. We have launched Profian Assure, which is a set of support and services around the Enarx platform, so that it does have Enterprise support. We did that in August. We're also providing services like cross-institution attestation and the list of services will be growing in the coming days. The thing I most want to drive home about Profian is our vision: we want to make digital privacy and trust possible everywhere for everyone. That's what we are trying to do. Thank you so much for coming!