[00:00.000 --> 00:11.000] Hello everyone, my name is Andrea, and today I'm going to tell you about what I think [00:11.000 --> 00:12.000] about fuzzing in Brazil. [00:12.000 --> 00:18.000] This presentation is not about fuzzing itself, but rather how we've failed at it. [00:18.000 --> 00:26.000] So before I start with the big pose of fuzzing, I will tell you a bit about fuzzing itself. [00:26.000 --> 00:30.000] I hope some of you already know about it, I don't have a lot of time. [00:30.000 --> 00:35.000] So fuzzing is basically an automated testing technique. [00:35.000 --> 00:44.000] The idea is to just send random input to a program to see how it behaves in that case, [00:44.000 --> 00:53.000] and how it works is that you use typically a tool, like a buzzer, that is going to generate random input for you, [00:53.000 --> 00:58.000] and then you're going to close some functions with that random input, [00:58.000 --> 01:06.000] and the buzzer is going to report some findings, and if it finds any interesting input files, [01:06.000 --> 01:09.000] it's going to write them to a course. [01:09.000 --> 01:15.000] Findings in this case can have crashes, can be hands, but can also be timeouts. [01:15.000 --> 01:20.000] So for fuzzing, when you first do it, you typically start from an empty corpus, [01:20.000 --> 01:24.000] but as you run fuzzing, you're going to generate some interesting inputs, [01:24.000 --> 01:31.000] which is helpful because in the next ones, you can just reuse those inputs and start from scratch. [01:31.000 --> 01:34.000] This helps with finding interesting things faster. [01:34.000 --> 01:41.000] So in this VMM, we implemented fuzzing for VM RTIO. [01:41.000 --> 01:47.000] We have three fast targets, one for the RTIO queue, one for the serialization of the RTIO queue, [01:47.000 --> 01:53.000] and one for the RTIO WISO in the Rescue Memo project. [01:53.000 --> 01:56.000] We only have implementation for the packet, so that's what we fuzzed. [01:56.000 --> 01:59.000] During fuzzing, we discovered three crashes, [01:59.000 --> 02:05.000] and only one of them is triggerable by any quotation malicious driver, [02:05.000 --> 02:08.000] and what we have now is that we are able to run fuzzing [02:08.000 --> 02:14.000] for every request that you're submitting to Rescue Memo to the VM RTIO repository. [02:14.000 --> 02:18.000] The fuzzing is apparently using link fuzzer, [02:18.000 --> 02:24.000] and besides the fuzzing that is happening in Rescue Memo itself, [02:24.000 --> 02:27.000] the folks from Cloud Hypervisor are also running fuzzing, [02:27.000 --> 02:32.000] and we also discovered a timeout theme in the package. [02:32.000 --> 02:36.000] So this actually brings me to our first report. [02:36.000 --> 02:44.000] So what is it you want? It should actually be... [02:44.000 --> 02:53.000] It's a people, and that is me. [02:53.000 --> 02:58.000] The first people is that you actually have to run the timeout, [02:58.000 --> 03:02.000] just in the code field for what you're in fuzzing for. [03:02.000 --> 03:06.000] Because the default, for example, for the fuzzing that we were using, [03:06.000 --> 03:11.000] is actually 20 minutes, and since we are just working with what I always would have used, [03:11.000 --> 03:16.000] and there's nothing that can possibly take 20 minutes to process, [03:16.000 --> 03:22.000] so we have to adjust the timeout to 60 seconds in our case, [03:22.000 --> 03:28.000] and this is something that was recommended by the folks from Cloud Hypervisor. [03:28.000 --> 03:33.000] Now, how we're running fuzzing in Rescue Memo is at the library level. [03:33.000 --> 03:37.000] The advantage of this is that it's easier to set up. [03:37.000 --> 03:40.000] So it's really important that it's easy to set up. [03:40.000 --> 03:42.000] It is a good thing. [03:42.000 --> 03:45.000] People are like, oh, but you're running fuzzing at the library level, [03:45.000 --> 03:49.000] so you don't have to have the kernel that's like so easy, so simple. [03:49.000 --> 03:51.000] So they're like, yeah, it's great, right? [03:51.000 --> 03:53.000] I mean, like, this is a good thing. [03:53.000 --> 03:58.000] And yeah, it's a good thing because you can also run on almost any host. [03:58.000 --> 04:02.000] You just have to have a fuzzing install in the repository, [04:02.000 --> 04:04.000] and then you just run fuzzing. [04:04.000 --> 04:07.000] And it also runs in a user space. [04:07.000 --> 04:09.000] There's also disadvantages, of course. [04:09.000 --> 04:14.000] The first one being that you cannot cover the whole repair setup, [04:14.000 --> 04:19.000] so that means that you're going to have some things that are great to be fuzzed. [04:19.000 --> 04:23.000] And then because you are fuzzing in user space, [04:23.000 --> 04:26.000] we need to do some more things for the driver, [04:26.000 --> 04:30.000] and this tends to be a bit complicated. [04:30.000 --> 04:34.000] And also you can find false positives. [04:34.000 --> 04:38.000] With the false positives, the idea is that you will find crashes [04:38.000 --> 04:40.000] that otherwise would not be triggered by a driver, [04:40.000 --> 04:43.000] because maybe you have some other chase in place. [04:43.000 --> 04:47.000] I would say that it's still important to fix these ones as well, [04:47.000 --> 04:50.000] because you never know how you're going to change your code [04:50.000 --> 04:57.000] and how it might end up actually triggering those findings in the future. [04:57.000 --> 05:00.000] And for the mocking of the driver, how it works, [05:00.000 --> 05:04.000] we've already simplified here, [05:04.000 --> 05:07.000] but the idea is that the driver is writing something in memory, [05:07.000 --> 05:10.000] and then the device reads what the driver wrote in memory, [05:10.000 --> 05:14.000] and it does stuff with the data. [05:14.000 --> 05:17.000] We want to fuzze in the human, and part of the piece we're doing in the human [05:17.000 --> 05:20.000] is this side of the device, [05:20.000 --> 05:24.000] and then what we need to mock is actually the driver's side of the communication. [05:24.000 --> 05:28.000] And in fuzzing the human, what we did is that we started this mocking [05:28.000 --> 05:30.000] of the driver from the beginning, [05:30.000 --> 05:33.000] so we needed it anyway to run some unit tests, [05:33.000 --> 05:36.000] we needed it for other kind of testing as well. [05:36.000 --> 05:40.000] So we had an initial mock interface from the beginning, [05:40.000 --> 05:45.000] and when we wanted to do fuzzing, we just evolved the mock driver [05:45.000 --> 05:50.000] in order to support that as well. [05:50.000 --> 05:54.000] Okay, so at the high level, how it happens right now in Rasmussen, [05:54.000 --> 05:56.000] is that we parsed the random bytes, [05:56.000 --> 06:01.000] we initialized the mock driver with the data that was parsed by fuzzer. [06:01.000 --> 06:04.000] At the high level, it ends up with some descriptors [06:04.000 --> 06:07.000] and some key functions that have some random input [06:07.000 --> 06:10.000] that they need to process, [06:10.000 --> 06:13.000] and then we create the queue, [06:13.000 --> 06:18.000] and we call these random functions with random input. [06:18.000 --> 06:26.000] And yeah, the second before is that if you are trying to do fuzzing [06:26.000 --> 06:29.000] and you just start when the project is already mature, [06:29.000 --> 06:34.000] what is going to happen is that it's going to be a bit difficult, [06:34.000 --> 06:38.000] you might find it very complicated to retrofit it. [06:38.000 --> 06:43.000] So instead, I know that it's not necessarily viable to start fuzzing [06:43.000 --> 06:45.000] when you start the project, [06:45.000 --> 06:48.000] but what you can do instead is that you can keep fuzzing [06:48.000 --> 06:50.000] in the back of your head, [06:50.000 --> 06:55.000] and then when you create some mock objects or some unit tests, [06:55.000 --> 07:03.000] you can think about how you can actually reuse them in fuzzing as well. [07:03.000 --> 07:07.000] Which is what we did but not very well. [07:07.000 --> 07:10.000] So one of the crashes that we actually found [07:10.000 --> 07:14.000] was that the mock driver was crashing on invalid input. [07:14.000 --> 07:19.000] So we had to adapt these actually to return errors, [07:19.000 --> 07:21.000] even though it was just one test, [07:21.000 --> 07:23.000] we couldn't just crash on invalid input anymore. [07:23.000 --> 07:28.000] So the idea is to return errors at the level where you want to do fuzzing, [07:28.000 --> 07:38.000] that can be processed at higher levels and so the fuzzing can crash. [07:38.000 --> 07:43.000] And now for structural interfuzzing. [07:43.000 --> 07:45.000] So with those structural interfuzzing, [07:45.000 --> 07:48.000] how it works is that the fuzzer is going to generate some random bytes, [07:48.000 --> 07:53.000] and then you have to interpret these as the bytes that you have to use [07:53.000 --> 07:55.000] for your library. [07:55.000 --> 08:00.000] So with structural interfuzzing, it's really nice because there are some tools [08:00.000 --> 08:05.000] that are just going to basically interpret the random bytes as a structure [08:05.000 --> 08:06.000] that you actually need. [08:06.000 --> 08:11.000] So it's super nice what it does is that it significantly reduces the code [08:11.000 --> 08:13.000] that you need to write, [08:13.000 --> 08:18.000] and even the rest of the information is arbitrary. [08:18.000 --> 08:22.000] Now, we had to change it unfortunately, [08:22.000 --> 08:25.000] before we knew that we had only 270 lines of code, [08:25.000 --> 08:30.000] and now we have around 740 lines of code for the fuzzer. [08:30.000 --> 08:34.000] And unfortunately, it came with some problems, [08:34.000 --> 08:38.000] so that's why we have to actually basically... [08:38.000 --> 08:42.000] The most important part is that it's not really produceable. [08:42.000 --> 08:46.000] So you can't really use the corpus that you had in previous runs, [08:46.000 --> 08:48.000] which was a big problem for us, [08:48.000 --> 08:54.000] because basically what happens is that arbitrary is introducing some... [08:54.000 --> 09:00.000] some randomness in one terminal so that it goes right into the input. [09:00.000 --> 09:11.000] And that basically means that you cannot use the corpus from previous runs. [09:11.000 --> 09:19.000] The thing for here is that we would like that we can do incremental improvements [09:19.000 --> 09:20.000] for the fuzzer, [09:20.000 --> 09:24.000] and we didn't check that what we want to increment [09:24.000 --> 09:26.000] can actually be implemented through N. [09:26.000 --> 09:31.000] So, yeah, instead, a better point would be to make sure [09:31.000 --> 09:38.000] that we can actually reuse the corpus that we generated. [09:38.000 --> 09:42.000] Okay, and now about when fuzzing actually fails. [09:42.000 --> 09:46.000] So, we had a PR in U.S. Human, [09:46.000 --> 09:50.000] at this point we were already running fuzzing for cool requests, [09:50.000 --> 09:57.000] and there was a PR that was introducing actually an overflow. [09:57.000 --> 10:03.000] So here the overflow is that the packet header size addition to the packet length [10:03.000 --> 10:05.000] can actually overflow, [10:05.000 --> 10:10.000] because the packet length is set up by the joint. [10:10.000 --> 10:14.000] This bug, well, I actually found it during code review, [10:14.000 --> 10:19.000] so it was a bit unexpected because I was hoping that the fuzzer was going to find it, [10:19.000 --> 10:21.000] which was not the case. [10:21.000 --> 10:23.000] So after some dive deep, [10:23.000 --> 10:29.000] I realized that running fuzzing for just 15 minutes might not actually be enough, [10:29.000 --> 10:34.000] because this bug was triggered with fuzzing about 40 minutes instead. [10:34.000 --> 10:39.000] So, how we fixed that is that we added a fuzzing session that is optional [10:39.000 --> 10:41.000] and that fuzzed for 24 hours. [10:41.000 --> 10:46.000] This one is to be started manually by the U.S. Human Maintainers, [10:46.000 --> 10:50.000] and should only be started when there are cool requests [10:50.000 --> 10:54.000] that actually impact the fuzzing relation. [10:54.000 --> 10:59.000] This is because we're also consuming a lot of resources [10:59.000 --> 11:05.000] when doing fuzzing, and also you don't want to block all the cool requests for 24 hours. [11:05.000 --> 11:10.000] So typically the last one on the slide is a page that quickly needs to be executed, [11:10.000 --> 11:15.000] so blocking it for one day might not be reasonable for all the cool requests. [11:15.000 --> 11:20.000] So the people here was not trying to fuzze for long enough, [11:20.000 --> 11:27.000] and, yeah, instead we had to work our way to find a way to not block cool requests, [11:27.000 --> 11:31.000] but at the same time to provide a way to fuzze. [11:31.000 --> 11:34.000] Fourth coverage for rust. [11:34.000 --> 11:39.000] So in rust you can actually get coverage information by running LLVM. [11:39.000 --> 11:44.000] In rust you only get light coverage. [11:44.000 --> 11:48.000] So basically this was the starting point of the presentation. [11:48.000 --> 11:54.000] I was thinking I would come here and I'm going to show you how great it is to run fuzzing for [11:54.000 --> 12:00.000] 15 minutes and then more minutes and then the coverage and all these really extravagant things. [12:00.000 --> 12:06.000] And so we ended up with fuzzing for 15 minutes generating 10 for these regions [12:06.000 --> 12:10.000] and the coverage of around 82%. [12:10.000 --> 12:14.000] So it's like, well, it's okay, that's good. [12:14.000 --> 12:17.000] So then let's just run with some minimal coverage as well. [12:17.000 --> 12:20.000] So this is some coverage that we generated from unit tests. [12:20.000 --> 12:25.000] Let's just feed through the fuzzer and see how this changes. [12:25.000 --> 12:27.000] There was no change, actually. [12:27.000 --> 12:30.000] So I was like, okay, it's not bad, not bad. [12:30.000 --> 12:32.000] Let's just run for two weeks. [12:32.000 --> 12:38.000] So what do you think this might happen now? [12:38.000 --> 12:44.000] So actually, it's working. [12:44.000 --> 12:51.000] At this point I was like, you press, I have to change my presentation, so it's not what I expected. [12:51.000 --> 12:53.000] But instead I learned something, right? [12:53.000 --> 12:58.000] So you can't actually use coverage to decide where to stop fuzzing. [12:58.000 --> 13:04.000] So instead what you can do is that you can use coverage information to see when, [13:04.000 --> 13:09.000] what the parts of your code are not actually covered. [13:09.000 --> 13:14.000] And yeah, well, that's about it, actually. [13:14.000 --> 13:18.000] We see the summary of the people that we read into. [13:18.000 --> 13:24.000] And I think now we have a lot of different questions. [13:24.000 --> 13:30.000] Did you look at how the fuzzer works and then what areas were not covered [13:30.000 --> 13:33.000] and try to figure out why it wasn't found in those areas? [13:33.000 --> 13:38.000] Yeah, so the question was if we looked at how the fuzzer works and what areas were not covered. [13:38.000 --> 13:41.000] Yes, we did, and I have a slide for that. [13:41.000 --> 13:44.000] Thanks for the question. [13:44.000 --> 13:49.000] Okay, so actually, I have two slides for that. [13:49.000 --> 13:53.000] There were some functions that we were not calling on purpose. [13:53.000 --> 13:58.000] So because on the virtual queue, for example, we have some functions [13:58.000 --> 14:03.000] that are just iterating over the script chain and then they're doing something with the data. [14:03.000 --> 14:08.000] And at the virtual queue level, you can't do something with the data. [14:08.000 --> 14:14.000] So it's like, okay, this needs to be fast at the high level, like at the device implementation level. [14:14.000 --> 14:18.000] So it's like, okay, we're not going to call these functions, which is a bit hilarious [14:18.000 --> 14:23.000] because that's where Proc Hypervisor actually found the timeout problem, [14:23.000 --> 14:27.000] which we were not able to reproduce with the virtual queue, but still. [14:27.000 --> 14:33.000] And we actually did this one function that shouldn't be called during fuzzing. [14:33.000 --> 14:39.000] And then I rerun the fuzzing and, yeah, it's a bit better, but it's still not great. [14:39.000 --> 14:46.000] And then I looked into what, well, actually, you can't see very well. [14:46.000 --> 14:48.000] That's unfortunate. [14:48.000 --> 14:55.000] Yeah, so I looked into what actually is not covered and you're not seeing there, so you have to trust me. [14:55.000 --> 15:00.000] These are actually errors. [15:00.000 --> 15:04.000] So the printing of errors to files. [15:04.000 --> 15:12.000] So since in the fuzzing we're not actually initializing a logger, these things cannot be triggered by fuzzing. [15:12.000 --> 15:19.000] So there's lots of error in this printing to a file that's not happening through fuzzing. [15:19.000 --> 15:22.000] Yeah. [15:22.000 --> 15:28.000] What's subsequently taken to actually make sure it covers everywhere which needs to be covered? [15:28.000 --> 15:33.000] And so discovering certain areas which clearly aren't covered. [15:33.000 --> 15:35.000] I didn't understand the question. [15:35.000 --> 15:36.000] Which areas of view? [15:36.000 --> 15:44.000] Well, what's subsequently taken to make sure the areas which weren't covered in the fuzzing are going to be covered in the future? [15:44.000 --> 15:51.000] Oh, okay, so the question was what measures are we taking? [15:51.000 --> 15:58.000] In order to make sure that all that was covered before is going to be covered in the next generations? [15:58.000 --> 16:00.000] Yeah. [16:00.000 --> 16:03.000] None? [16:03.000 --> 16:05.000] So right now we're not doing anything. [16:05.000 --> 16:12.000] This whole coverage thing is just something that I need for the presentation and it's not automatic in any way. [16:12.000 --> 16:27.000] But this is actually a good point for future investment to make sure that we're covering code because what we help with as well is that we make sure that new functions that we are adding to the code are also covered. [16:27.000 --> 16:33.000] So it's a great point to make that way. [16:33.000 --> 16:35.000] We're talking about the structure of our fuzzing. [16:35.000 --> 16:36.000] Yeah. [16:36.000 --> 16:41.000] And you mentioned that we cannot reuse the code to explain a bit more about that. [16:41.000 --> 16:47.000] Okay, so the question was how structured are we fuzzing and just that we cannot reuse the code. [16:47.000 --> 16:56.000] Let me see if I actually have a question here. [16:56.000 --> 16:57.000] No. [16:57.000 --> 17:08.000] Okay, so the idea is that what we were using, which is arbitrary, when it was taking the uniformed fuzzer was also adding some randomness to it. [17:08.000 --> 17:16.000] So because it was random, basically, every time it was writing the purpose to the file, it was introducing some randomness to it. [17:16.000 --> 17:24.000] So when the same people get to read again, then it would not have been the same. [17:24.000 --> 17:26.000] So where does the randomness come from? [17:26.000 --> 17:28.000] Where does the randomness come from? [17:28.000 --> 17:38.000] This is just how arbitrary it decided to implement it. There's actually an issue in arbitrary that they are aware of the problem with their not actually... [17:38.000 --> 17:42.000] It doesn't seem like they are just fixing it for some reason. [17:42.000 --> 17:55.000] So what we ended up doing is that we ended up doing some custom serialization with info, which is also a very well-known Rust package. [17:55.000 --> 18:03.000] It's not much more digital than it is arbitrary. And it doesn't matter. [18:03.000 --> 18:11.000] When you discover a bug with this puzzle, does it transform into a unit that's not the one? [18:11.000 --> 18:17.000] The question is, when we discover a bug, does it try to... [18:17.000 --> 18:27.000] Yeah, the way that we are fixing this kind of problem is that we are always adding a regression guess for them just to make sure that they don't... [18:27.000 --> 18:33.000] I was wondering about the computation requirements. So how many cores are you using? [18:33.000 --> 18:37.000] How many cores we are using? [18:37.000 --> 18:45.000] So when we read for two weeks, we actually used 96 cores. [18:45.000 --> 18:52.000] In uniqueness, I do not... So when you are running on coding, because I don't know exactly how many... [18:52.000 --> 18:58.000] One? Maybe? I don't know. But I think we have been running on 96 cores as well. [18:58.000 --> 19:01.000] That was another one. Can I see that guy? [19:01.000 --> 19:04.000] One minute. [19:04.000 --> 19:15.000] We found a color case that we are trying to shrink the crazy smaller color steps. [19:15.000 --> 19:20.000] Oh, this is... I don't know. Let's shut up afterwards. [19:20.000 --> 19:35.000] Thanks.