[00:00.000 --> 00:22.960] Hello everyone, I am Hugo Lefeuvre, PhD student at the University of Manchester. [00:22.960 --> 00:28.120] In this talk, I will present the results of my research on compartment interface vulnerabilities, [00:28.120 --> 00:34.440] a work that will appear in NDSS 23. This is the result of a collaboration between Manchester, [00:34.440 --> 00:42.400] Bucharest, Rice and Unikraft.io. Before starting to talk about interface [00:42.400 --> 00:47.680] vulnerabilities, let me give a little bit of necessary background. A very important [00:47.680 --> 00:53.520] notion in this work is compartmentalization. Compartmentalization is about decomposing [00:53.520 --> 00:58.840] software into lesser-privileged components, such that components only have access to what [00:58.840 --> 01:04.360] they need to do their job. Compartmentalization is not particularly new, so let [01:04.360 --> 01:10.240] me illustrate it with a real-world example: web servers. Typically, web servers are composed [01:10.240 --> 01:15.960] of components that do, on the one hand, privileged things like listening on port 80, and on the [01:15.960 --> 01:21.280] other hand, of other components that perform risky operations like parsing network-provided [01:21.280 --> 01:27.080] data. If we have these two components in the same process, then this process has to be [01:27.080 --> 01:32.480] root, and that's problematic because if an attacker manages to compromise the network-facing [01:32.480 --> 01:38.360] component, for example, they will immediately own the root process. So what people do in [01:38.360 --> 01:45.320] practice is decompose, or compartmentalize, the server into two entities in separate processes: [01:45.320 --> 01:50.560] the master, which is privileged and not exposed to risky operations, and the worker, which [01:50.560 --> 01:56.280] is deprivileged and exposed to network data. Both entities then communicate over shared [01:56.280 --> 02:01.480] memory. Thus, if the worker gets compromised, it will not be able to perform privileged [02:01.480 --> 02:08.080] operations and will remain contained. Recently, we have seen really nice advances in the field [02:08.080 --> 02:13.120] of compartmentalization. People have been taking more fine-grained, more arbitrary, [02:13.120 --> 02:19.200] and more automatic approaches to compartmentalization. What these works do is take arbitrary [02:19.200 --> 02:26.400] applications, identify a particular component that may be untrusted or risky, or trusted [02:26.400 --> 02:33.160] and critical, and compartmentalize it automatically or semi-automatically. The granularity of [02:33.160 --> 02:38.760] the component can vary widely, ranging from libraries to blocks of code. Notice that [02:38.760 --> 02:43.360] I'm talking about compartments here, not processes, as the isolation technology, too, [02:43.360 --> 02:49.160] varies widely. In short, the goal of these works is quite ambitious. It's about compartmentalizing [02:49.160 --> 02:56.480] legacy software with a low engineering effort and a low performance cost. Unfortunately, [02:56.480 --> 03:02.240] as we highlight in this work, things are not as easy as they might seem. In privilege- [03:02.240 --> 03:08.760] separated software, cross-component interfaces are the attack surface. And there, all sorts [03:08.760 --> 03:14.720] of things can go wrong security-wise. Let me give you a few examples. Let's say we have [03:14.720 --> 03:20.560] two compartments.
One on the left, malicious, and the other one on the right, trusted, protecting [03:20.560 --> 03:26.800] some secret. The compartmentalization mechanism guarantees us that Compartment 1 cannot access [03:26.800 --> 03:33.080] Compartment 2's memory directly. So that doesn't work. However, Compartment 1 is still able [03:33.080 --> 03:40.240] to make legitimate API calls to Compartment 2 with, for example, an invalid pointer. If [03:40.240 --> 03:45.880] Compartment 2 doesn't validate the pointer, it risks exploitation. Another example [03:45.880 --> 03:52.000] is the usage of corrupted indexing information, for example a size, index, or bounds, as [03:52.000 --> 03:57.880] is done in this function. Another one is the usage of a corrupted object, such as a tampered [03:57.880 --> 04:04.840] file pointer. And there are many others, which we will partially go through in the next slide. [04:04.840 --> 04:10.080] In this work, we unify all of these vulnerabilities under the concept of compartment interface [04:10.080 --> 04:17.080] vulnerabilities, or CIVs. CIVs encompass traditional confused deputies, Iago attacks, which are [04:17.080 --> 04:22.360] CIVs specific to the system call API, null-pointer dereferences under attacker influence, [04:22.360 --> 04:28.840] and probably many others. They are all attacks revolving around misuse of a legitimate [04:28.840 --> 04:35.080] interface. CIVs are very common when compartmentalizing unmodified applications, as we further [04:35.080 --> 04:41.720] highlight in this talk. They affect all compartmentalization frameworks because they are a fundamental [04:41.720 --> 04:48.080] part of the problem of privilege separation. To put it in more precise words, we define [04:48.080 --> 04:54.800] CIVs as the set of vulnerabilities that arise due to lack of, or improper, control and data [04:54.800 --> 05:01.920] flow validation at compartment boundaries. We observe three classes of CIVs: data leakages, [05:01.920 --> 05:07.200] data corruption, and temporal violations. Within data leakages, we differentiate between [05:07.200 --> 05:11.800] address leakages, which can be leveraged to de-randomize compartments and mount further [05:11.800 --> 05:18.760] attacks, and compartment-confidential data leakages, which result in information disclosure. [05:18.760 --> 05:24.920] Both are due to data oversharing and sharing of uninitialized memory. We have already illustrated [05:24.920 --> 05:30.040] a range of data corruption attacks in the previous slide, but generally, they tend [05:30.040 --> 05:36.520] to happen in situations where interface-crossing data is used without appropriate sanitization. [05:36.520 --> 05:40.240] They can affect control as well as non-control data. [05:40.240 --> 05:47.200] Finally, temporal violations include vulnerabilities like expectations of API usage ordering, usage [05:47.200 --> 05:51.600] of corrupted synchronization primitives, or shared-memory [05:51.600 --> 05:57.440] time-of-check-to-time-of-use. Temporal violations are usually caused by a wide range of behaviors, including missing [05:57.440 --> 06:04.120] copies, double fetches, and generally a lack of enforcement of API semantics. This is a [06:04.120 --> 06:09.640] broad and succinct overview, but the paper provides a full taxonomy including an analysis [06:09.640 --> 06:18.200] of existing defenses.
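To make the spatial patterns above concrete, here is a minimal, hypothetical sketch of the first two examples (an unvalidated interface-crossing pointer, and corrupted indexing information). It is illustrative only, and not taken from any of the studied applications:

    #include <string.h>

    struct session { char secret[32]; /* compartment-confidential data */ };

    /* CIV pattern 1: an interface-crossing pointer is dereferenced without
     * validation. A malicious caller can aim `s` or `out` anywhere in the
     * trusted compartment's address space. */
    void api_export(struct session *s, char *out)
    {
        memcpy(out, s->secret, sizeof s->secret); /* arbitrary read + write */
    }

    /* CIV pattern 2: corrupted indexing information (a size, index, or
     * bounds) crossing the boundary is trusted as-is. */
    void api_fill(struct session *s, const char *src, size_t len)
    {
        memcpy(s->secret, src, len); /* attacker-chosen length: overflow */
    }

In both cases the call itself is perfectly legitimate; it is the data crossing the boundary that carries the attack.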
So having observed and characterized the problem, we asked a few questions. [06:18.200 --> 06:25.120] How many CIVs are there at legacy, unmodified APIs? Are all APIs similarly affected by CIVs, [06:25.120 --> 06:30.080] for example, taking library APIs generally versus module APIs? Do we observe [06:30.080 --> 06:36.160] systematic differences? How hard are these CIVs to address when compartmentalizing? [06:36.160 --> 06:40.640] And finally, how bad are they? If for some reason you don't fix one of them, or just [06:40.640 --> 06:46.400] decide to not fix them at all, what is the impact on the guarantees that compartmentalization [06:46.400 --> 06:51.480] can give you? We believe that it is really critical to understand [06:51.480 --> 06:57.960] these points to be able to provide countermeasures that are adequate, systematic, and usable. [06:57.960 --> 07:03.360] And so the approach that we take in this work to answer these questions is to design a tool, [07:03.360 --> 07:08.320] more particularly a fuzzer, specialized to detect CIVs at arbitrary interfaces, and [07:08.320 --> 07:15.640] we call this tool ConfFuzz. Then we apply ConfFuzz at scale to a range of applications and interfaces [07:15.640 --> 07:22.360] to gather a dataset of real-world CIVs. Finally, we study and systematize [07:22.360 --> 07:29.480] the resulting dataset to extract numerous insights on the problem of CIVs in compartmentalization. [07:29.480 --> 07:33.800] In the next slides, I will give a quick overview of ConfFuzz before focusing on the dataset [07:33.800 --> 07:38.320] and insights. So let me give you a high-level overview [07:38.320 --> 07:43.280] of the fuzzer first. Taking unmodified applications, we instrument [07:43.280 --> 07:48.320] them to intercept cross-compartment calls. Compartments are freely defined, for example, [07:48.320 --> 07:54.240] a particular library boundary or an internal component interface. [07:54.240 --> 08:00.520] We based our prototype on dynamic binary instrumentation using Intel Pin, but also explored other [08:00.520 --> 08:07.640] instrumentation approaches, for example, LLVM-based. The interface between the trusted and untrusted [08:07.640 --> 08:13.400] components is automatically detected using binary debug information. [08:13.400 --> 08:18.720] Our fuzzing monitor then drives the exploration by ordering mutations of the data flow to [08:18.720 --> 08:24.640] simulate attacks from the malicious compartment on the trusted compartment. [08:24.640 --> 08:29.120] The workload used to drive the program is application-specific, for example, benchmark [08:29.120 --> 08:35.120] tools, test suites, custom workloads, etc. You could even plug another fuzzer like OSS-Fuzz [08:35.120 --> 08:39.520] there. Finally, the fuzzer automatically triages [08:39.520 --> 08:48.120] and stores crash reports, which includes de-duplicating, reproducing, minimizing, etc. [08:48.120 --> 08:52.320] The paper provides much greater detail on these technical matters, and I will be happy [08:52.320 --> 08:57.200] to elaborate if you have questions.
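To give a flavor of the interception step just described, here is a minimal sketch of an Intel Pin tool hooking one API endpoint at a compartment boundary. This is an illustration of the general approach, not ConfFuzz's actual implementation, and the endpoint name api_endpoint is made up:

    #include "pin.H"
    #include <iostream>

    // Runs before every call to the hooked endpoint: the place where a
    // fuzzing monitor could record and mutate interface-crossing data.
    VOID OnBoundaryCall(ADDRINT arg0)
    {
        std::cerr << "boundary call, arg0 = " << std::hex << arg0 << std::endl;
    }

    // Hook the endpoint in each loaded image. Here it is found by a
    // hard-coded name; ConfFuzz instead derives the interface
    // automatically from binary debug information.
    VOID ImageLoad(IMG img, VOID *)
    {
        RTN rtn = RTN_FindByName(img, "api_endpoint"); // hypothetical name
        if (RTN_Valid(rtn)) {
            RTN_Open(rtn);
            RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)OnBoundaryCall,
                           IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
            RTN_Close(rtn);
        }
    }

    int main(int argc, char *argv[])
    {
        PIN_InitSymbols();                       // enables RTN_FindByName
        if (PIN_Init(argc, argv)) return 1;
        IMG_AddInstrumentFunction(ImageLoad, 0);
        PIN_StartProgram();                      // never returns
        return 0;
    }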
Using ConfFuzz, we gathered a substantial [08:57.200 --> 09:02.720] dataset that we carefully dissected. Here you can see the paper's big table that [09:02.720 --> 09:07.160] summarizes the dataset. Let's have a closer look at it. [09:07.160 --> 09:15.720] Overall, we applied ConfFuzz to 25 applications and 36 APIs, for a total of 39 scenarios. [09:15.720 --> 09:21.840] We considered a selection of library APIs, module APIs, and internal component APIs, trying [09:21.840 --> 09:26.440] to focus on scenarios that make sense in popular software. [09:26.440 --> 09:32.600] In fact, 16 of these scenarios have been previously considered by about 12 studies in the literature, [09:32.600 --> 09:35.160] and the attacks that we find apply to them as well. [09:35.160 --> 09:44.680] In total, we find 629 CIVs. We classify these CIVs into five impact classes: read impact, [09:44.680 --> 09:51.560] write impact, execution, memory allocator corruption, and null-pointer dereference. [09:51.560 --> 09:56.400] With this data, the first questions that we try to answer are how many CIVs are there [09:56.400 --> 10:04.120] at legacy, unmodified, arbitrary APIs, and are all APIs or code similarly affected? [10:04.120 --> 10:09.520] And looking into this, we quickly confirmed that CIVs are absolutely widespread among [10:09.520 --> 10:15.000] unmodified APIs and code. Having said that, we also highlighted significant [10:15.000 --> 10:20.120] disparities of prevalence among scenarios, and that's the really interesting part. [10:20.120 --> 10:26.120] For example, we observed variations in CIV counts from 0 to 105 across APIs. [10:26.120 --> 10:29.160] That's quite significant. Take a look at this plot, which represents, [10:29.160 --> 10:35.520] for each scenario, the number of vulnerable API endpoints versus non-vulnerable ones. [10:35.520 --> 10:41.480] It clearly shows that CIV prevalence among applications and APIs is very heterogeneous. [10:41.480 --> 10:48.560] We have large and almost totally CIV-free APIs, and small and fully vulnerable APIs. [10:48.560 --> 10:55.840] In fact, we show an entire absence of correlation between API size and CIV count in this dataset. [10:55.840 --> 11:03.360] So while clearly, yes, CIVs are widespread, no, not all APIs are similarly affected. [11:03.360 --> 11:09.600] This motivated us to look into the patterns and effects that influence these observations. [11:09.600 --> 11:15.560] And doing so, we observe recurring APIs and patterns that result in CIVs. [11:15.560 --> 11:20.720] This really supports the idea that the presence of CIVs is influenced by structural [11:20.720 --> 11:26.880] properties of the API, rather than API size or quantity of shared data. [11:26.880 --> 11:32.000] In this talk, I will present one of these patterns, but there are more in the paper. [11:32.000 --> 11:36.160] The particular pattern I want to go through concerns modular APIs. [11:36.160 --> 11:42.560] Indeed, we noticed that modular, or module, APIs are the most CIV-vulnerable interfaces [11:42.560 --> 11:46.960] in our study. On average, we observe that module APIs feature [11:46.960 --> 11:51.920] more CIVs and worse CIVs than any other class of APIs. [11:51.920 --> 11:55.840] And looking at the structure of these interfaces, it makes sense. [11:55.840 --> 12:01.440] Unlike library APIs, module APIs must be very generic and yield high performance. [12:01.440 --> 12:07.200] As a consequence, we have patterns where the application exposes its core internals [12:07.200 --> 12:12.160] and its core state to the module to achieve this genericity and performance. [12:12.160 --> 12:17.200] But this results in a much larger attack surface exposed to the module. [12:17.200 --> 12:23.040] Take the example of this data structure exposed to potentially malicious modules by the Apache [12:23.040 --> 12:28.560] HTTP core. It is very complex, with over 75 fields, [12:28.560 --> 12:34.080] 60% of which are pointers referencing core data structures like memory pools, connection [12:34.080 --> 12:39.920] state structures, or mutexes.
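To picture what such an interface-crossing object looks like, here is a heavily simplified sketch, loosely modeled on Apache httpd's request structure; the field names and types are illustrative, not the actual definition:

    // Sketch of a module-API object exposing core internals. A malicious
    // module can redirect any of the embedded pointers before handing the
    // object back, turning each field into a potential attack primitive.
    struct request_sketch {
        struct pool_sketch  *pool;        // core memory pool
        struct conn_sketch  *connection;  // connection state
        struct mutex_sketch *mutex;       // synchronization primitive
        char                *uri;         // network-derived request data
        // ... dozens more fields, a majority of them pointers ...
    };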
What we observe with this pattern is a [12:39.920 --> 12:48.320] somewhat counter-intuitive thing: modularity is not always good for compartmentalization, [12:48.320 --> 12:52.240] and in many cases, it can even be counterproductive. [12:52.240 --> 12:55.680] This is only one of the patterns that we highlight, and there are more in the paper. [12:57.040 --> 13:02.240] Now, having shown that CIVs are widespread but affect applications [13:02.240 --> 13:05.840] and APIs unequally, let's look at their concrete security impact. [13:05.840 --> 13:11.440] And the first thing that we confirm is that they are quite impactful. In fact, [13:11.440 --> 13:18.000] over 75% of scenarios present in our dataset show at least one write vulnerability. [13:18.640 --> 13:24.960] And worse than that, about 70% of write and read and 50% of execute vulnerabilities [13:24.960 --> 13:31.040] are arbitrary, which means that when the attacker controls [13:31.040 --> 13:36.240] a write or read primitive, they are likely to be able to read and write anywhere. [13:37.040 --> 13:41.520] And while only a smaller portion of these scenarios have execution impact, [13:41.520 --> 13:46.720] it is likely that read and write primitives will be combinable to achieve execution capabilities. [13:47.680 --> 13:52.080] In this talk, I will concretely illustrate this impact with practical scenarios [13:52.080 --> 13:57.920] and real-world CIVs taken from the dataset, where we demonstrate key extraction from a protected [13:57.920 --> 14:02.320] OpenSSL. Once again, we show more details in the paper. [14:03.440 --> 14:08.720] So here, we assume a scenario with two compartments, where the goal is to isolate OpenSSL, [14:09.920 --> 14:13.200] for example, from a compromised web server, NGINX. [14:14.640 --> 14:20.960] Isolating OpenSSL, or part of OpenSSL, is a popular application of compartmentalization, [14:20.960 --> 14:26.240] both in the literature and in industry. Thus, here, the compartment interface, and [14:26.240 --> 14:33.360] therefore the attack surface, is the OpenSSL public API. Unfortunately, we find several CIVs [14:33.360 --> 14:39.600] that enable read, write, and execution impact. Take this option-setting primitive, for example, [14:39.600 --> 14:46.240] which is part of the OpenSSL public API. It dereferences an interface-crossing pointer, [14:46.240 --> 14:51.520] sets it, and returns it, clearly resulting in an arbitrary read and write oracle. [14:51.520 --> 14:56.640] Any attacker that can compromise the application's control flow will likely be able to extract [14:56.640 --> 15:02.880] SSL keys easily. Thus, clearly, if the API is not carefully enough sanitized, [15:02.880 --> 15:07.200] the benefits will be pretty low, at most a form of weak hardening. [15:09.200 --> 15:14.480] Now, you could tell me that it's not a good idea to protect at the public API anyway, [15:14.480 --> 15:20.240] and that we should rather choose the OpenSSL internal key API, which is much smaller. [15:20.240 --> 15:26.480] So, let's take a look at it. This time, we have NGINX and most of OpenSSL in the untrusted compartment, [15:26.480 --> 15:32.880] while we have the small key-handling part of OpenSSL, together with the keys, in the protected compartment. [15:33.840 --> 15:39.600] Unfortunately, here too, we find several CIVs. Take a look at this function of the internal key [15:39.600 --> 15:45.760] API, for example.
I only put the signature, for simplicity's sake, because the function is implemented [15:45.760 --> 15:53.280] in Perl-generated assembly. You can manipulate the in pointer to point to the key that you cannot [15:53.280 --> 16:00.640] directly access, encrypt with a known key, and then decrypt to get the secrets. Hence, here again, [16:00.640 --> 16:05.840] attackers that manage to compromise the application are likely to be able to easily [16:05.840 --> 16:12.640] extract the key. Unfortunately, here, fixing the CIVs requires making the component stateful, [16:12.640 --> 16:18.640] which is a fairly drastic design change. Overall, through these two examples, [16:18.640 --> 16:24.640] I showed how existing OpenSSL isolation strategies collapse when confronted with CIVs, [16:24.640 --> 16:31.360] and how important they are security-wise. To conclude this talk, let's take a quick look [16:31.360 --> 16:38.640] at countermeasures. How do we tackle CIVs? Overall, we see two ways. First, making progress on [16:38.640 --> 16:45.640] automatic and systematic countermeasures. Our paper highlights their limitations as part of our CIV taxonomy. [16:46.640 --> 16:52.640] Second, learning from our study of patterns. We also believe that software component APIs [16:52.640 --> 17:00.640] should be designed to feature low compartmentalization complexity in the first place. We provide a set [17:00.640 --> 17:08.640] of guidelines to achieve this. The two approaches are complementary. Even in the presence of [17:08.640 --> 17:14.640] countermeasures, well-designed APIs are desirable, as the first point is known to be fundamentally [17:14.640 --> 17:20.640] harder. I will not have enough time to go over all the guidelines, but let me try to give you the [17:20.640 --> 17:30.640] gist of them. First, not every interface is a good boundary for privilege separation. Maybe a particular [17:30.640 --> 17:36.640] API doesn't fit privilege separation, and that's fine. In this case, it will be hard to harden anyway. [17:36.640 --> 17:42.640] Second, we recommend that major attention be dedicated to reducing the complexity of [17:42.640 --> 17:48.640] interface-crossing objects: avoiding, for example, the sharing of resource handles, system resource [17:48.640 --> 17:54.640] abstractions, synchronization primitives, et cetera. If this is not possible, it should bring us back [17:54.640 --> 18:00.640] to the first point: the interface is probably not the right point to compartmentalize, for example, [18:00.640 --> 18:06.640] because components are too deeply entangled. Third, compartmentalizable components should [18:06.640 --> 18:12.640] enforce API semantics to be safe, for example, ordering or concurrency support. Under distrust [18:12.640 --> 18:18.640] scenarios, it is not acceptable anymore to assume that the caller will respect them or face the [18:18.640 --> 18:24.640] consequences. We are slowly coming towards the end of this talk, so let me summarize the key points [18:24.640 --> 18:32.640] that I wanted to make. CIVs should be at the center of every compartmentalization approach, and you [18:32.640 --> 18:38.640] will likely not achieve tangible security benefits without considering them. API design patterns [18:38.640 --> 18:44.640] influence the presence of CIVs and their severity. Overall, it's not so much about the size of the [18:44.640 --> 18:52.640] API. It's about the complexity of API-crossing objects. Addressing CIVs is not just a matter of [18:52.640 --> 19:00.640] writing a few checks.
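The option-setting primitive from the OpenSSL scenario illustrates why. Here is a minimal sketch of the pattern, modeled loosely on OpenSSL's SSL_set_options and simplified from the real code:

    struct ssl_sketch { unsigned long options; /* ...many more fields... */ };

    // The session pointer crosses the compartment boundary unvalidated.
    // The callee reads through it, ORs in the flags, writes the result
    // back, and returns it: with an attacker-chosen pointer, this is an
    // arbitrary read-and-write oracle into the trusted compartment.
    unsigned long set_options_sketch(struct ssl_sketch *s, unsigned long op)
    {
        return s->options |= op;
    }

A simple null or range check cannot fix this, because the callee has no way to distinguish a genuine session object from a forged pointer.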
In fact, strong solutions often require refactoring the API. Thus, [19:00.640 --> 19:06.640] compartmentalizing apps goes much further than just setting and enforcing bounds. [19:06.640 --> 19:14.640] We want this work to be an appeal for more research towards addressing the problem of CIVs: [19:14.640 --> 19:20.640] systematically finding them, addressing them, or telling you what interfaces may make good compartmentalization [19:20.640 --> 19:26.640] boundaries. If you are interested in this work, I invite you to check out our paper and the code [19:26.640 --> 19:36.640] and dataset of ConfFuzz. I will now be more than happy to take questions. Thank you. [19:56.640 --> 20:06.640] Thank you, Hugo, for this very accessible talk on this important topic of securing interfaces. [20:06.640 --> 20:14.640] One question maybe that I can start with is something that you brought up yourself as well. [20:14.640 --> 20:20.640] You say it's more about compartmentalization, but it also applies obviously to TEEs. Can you comment [20:20.640 --> 20:28.640] a bit on that? Is that something you consider ConfFuzz, your fuzzer, could be extended to, [20:28.640 --> 20:38.640] something like Gramine? Actually, maybe there are two different parts. I think the conceptual part [20:38.640 --> 20:44.640] about compartment interface vulnerabilities, maybe we could remove the compartment out of [20:44.640 --> 20:50.640] compartment interface vulnerabilities, and just get interface vulnerabilities. I think [20:50.640 --> 20:56.640] it has also been described by other works previously, notably some of the work that you did, Jo. [20:56.640 --> 21:02.640] I think that applies to TEEs really, really well. I think it's just a generic problem about [21:02.640 --> 21:12.640] interfaces, and that fully applies to TEEs. Regarding the fuzzer, from a very technical point of view, [21:12.640 --> 21:22.640] I think that it might need some adaptation to be run on existing TEE software, but it's absolutely [21:22.640 --> 21:28.640] feasible. I think that it could apply there as well. We didn't really explore it because [21:28.640 --> 21:34.640] obviously at some point we needed to restrict the scope of what we're doing, but I think it makes [21:34.640 --> 21:40.640] sense. Following up on that as well, I think you mentioned in your slides one of the technologies [21:40.640 --> 21:46.640] that you could use for compartmentalization. It's not only TEEs, it's also something like CHERI, [21:46.640 --> 21:54.640] which uses capabilities, and I'm wondering, TEEs are not great with these vulnerabilities because [21:54.640 --> 21:58.640] you have these confused deputy attacks that you explained also, where you have a pointer [21:58.640 --> 22:04.640] that you essentially can dereference. With CHERI, with capabilities, you have sort of [22:04.640 --> 22:12.640] native mitigations for many of those; capabilities, I think, were made with the idea of avoiding [22:12.640 --> 22:18.640] confused deputies. Can you comment a bit on what the underlying technology can mean for the vulnerability [22:18.640 --> 22:28.640] of compartmentalization? I'm not sure if I can, I don't think I can share my screen, but maybe [22:28.640 --> 22:36.640] I can. But you can put a link maybe in the chat for people. Actually, in the paper we did [22:36.640 --> 22:46.640] talk about this, so I'm just going to share my screen, but maybe I can. I'm sorry, I just [22:46.640 --> 22:52.640] broke everything. I just posted the link, I don't know if I triggered something terrible.
[22:52.640 --> 22:58.640] I think I see the link, I think you unmuted it or something. So the paper goes into detail; [22:58.640 --> 23:06.640] can you summarize maybe in the minute that remains? Absolutely, yes. So CHERI provides [23:06.640 --> 23:12.640] some features that, as you said, are really nice in addressing some of the spatial part [23:12.640 --> 23:20.640] of the compartment interface spectrum, of the CIV spectrum. It does not solve everything; [23:20.640 --> 23:30.640] it's not magic. Many of the leakage issues remain, and many of the temporal issues remain [23:30.640 --> 23:38.640] as well, because to some extent they are a little bit more high-level than just spatial [23:38.640 --> 23:47.640] things on memory. So they still apply. For example, the issues with ordering of interface [23:47.640 --> 23:53.640] calls. If you have an interface that has some ordering expectations, for example calling [23:53.640 --> 23:59.640] function one before function two, and you don't respect that, CHERI is not necessarily [23:59.640 --> 24:06.640] going to help you. So this is going to remain. So it does address part of it, but it's not [24:06.640 --> 24:18.640] going to solve everything. Thank you.