[00:00.000 --> 00:11.400] Hello everyone, I'm Mehdi Bessa, CTO of Mithril Security, and today I'm going to present [00:11.400 --> 00:19.200] to you BastionLab, a secure, privacy-friendly data framework written in Rust. [00:19.200 --> 00:21.320] Is it working? [00:21.320 --> 00:22.320] Yeah. [00:22.320 --> 00:24.520] It's better this way.

[00:24.520 --> 00:28.800] So when making this project, we came across one big problem. [00:28.800 --> 00:33.000] Let's say, for example, you are a hospital and you want to share critical data, such [00:33.000 --> 00:38.520] as ECG data, heart rate, respiration rate, and so on, with, for example, a startup that [00:38.520 --> 00:46.320] is working on a deep learning algorithm to detect anomalies in that data. [00:46.320 --> 00:52.160] The most usual way today is to use a Jupyter Notebook that you can isolate from the network [00:52.160 --> 00:58.560] and so on, but unfortunately, this is not the appropriate way, because Jupyter Notebooks [00:58.560 --> 01:05.520] allow arbitrary code execution, and with some effort, you can extract the data without [01:05.520 --> 01:11.160] the data owner even seeing that you did it, which is a big problem, especially with sensitive [01:11.160 --> 01:12.160] data.

[01:12.160 --> 01:16.880] Our solution tries to fix this issue. [01:16.880 --> 01:20.080] For example, you will not have direct access to the data. [01:20.080 --> 01:27.600] Only limited operations are allowed, so really just what you need: for example, aggregating [01:27.600 --> 01:34.560] the data, extracting only what you need, or doing some average or calculation [01:34.560 --> 01:39.680] on a subset of the data. But most importantly, only sanitized and authorized [01:39.680 --> 01:47.440] outputs are allowed, meaning, for example, if I don't want the startup or any other actor [01:47.440 --> 01:53.120] that works on my dataset to see the names of my patients, or some critical data such as [01:53.120 --> 02:00.560] whether they have hypertension and so on, I can just set that up in the policy, and they will not [02:00.560 --> 02:06.240] be able to access it unless I explicitly authorize it, and even then, nothing forces me [02:06.240 --> 02:08.320] to accept it.

[02:08.320 --> 02:12.360] I'm going to present our API to you very quickly. [02:12.360 --> 02:13.360] Don't get mad at me. [02:13.360 --> 02:19.720] The API is in Python, but it's backed by the server in Rust, so don't [02:19.720 --> 02:23.240] get mad yet. [02:23.240 --> 02:24.240] Okay. [02:24.240 --> 02:30.120] It doesn't look as bad as I thought, actually it doesn't. [02:30.120 --> 02:31.120] Yeah. [02:31.120 --> 02:32.120] Sorry about that. [02:32.120 --> 02:33.120] Is that working? [02:33.120 --> 02:44.120] No, I think the resolution is not there. [02:44.120 --> 02:45.120] That's okay. [02:45.120 --> 02:46.120] That's okay. [02:46.120 --> 02:48.120] I will go on with just the explanation. [02:48.120 --> 02:49.120] Okay. [02:49.240 --> 02:51.280] That was fun. [02:51.280 --> 02:55.360] No, that's fun. [02:55.360 --> 02:56.880] Ah, thanks. [02:56.880 --> 03:04.220] Sorry about that, so no Python, so good for you in a way! [03:04.220 --> 03:06.800] Thanks.

[03:06.800 --> 03:13.880] Our experience with Rust in building BastionLab: we had seven reasons to choose Rust [03:13.880 --> 03:17.760] for our project, the biggest of which is memory safety. [03:18.760 --> 03:23.160] I think you all know very well what I'm talking about here.
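Going back to the access model described a moment ago (limited operations, sanitized outputs, and owner-approved exceptions), it can be pictured with a small sketch. This is a hypothetical Rust illustration only; the type names, rules, column names, and thresholds are made up and are not BastionLab's actual API.

```rust
// Hypothetical sketch of the kind of access policy described above.
// Names and thresholds are illustrative, not BastionLab's actual types.

/// What the server does when a query would expose sensitive data.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnsafeAction {
    Reject,          // refuse outright
    RequireApproval, // ask the data owner before releasing the result
    Log,             // allow, but record the access
}

/// A minimal policy: which columns are off-limits, the smallest group an
/// aggregation may return, and what to do when a rule is hit.
struct Policy {
    blocked_columns: Vec<String>,
    min_aggregation_size: usize,
    on_violation: UnsafeAction,
}

impl Policy {
    /// Decide whether a query's output may leave the server as-is.
    fn check(&self, touched_columns: &[&str], group_size: usize) -> UnsafeAction {
        let touches_blocked = self
            .blocked_columns
            .iter()
            .any(|b| touched_columns.contains(&b.as_str()));
        if touches_blocked || group_size < self.min_aggregation_size {
            self.on_violation
        } else {
            // Sanitized, aggregated output: nothing to approve.
            UnsafeAction::Log
        }
    }
}

fn main() {
    let policy = Policy {
        blocked_columns: vec!["patient_name".into(), "hypertension".into()],
        min_aggregation_size: 10,
        on_violation: UnsafeAction::RequireApproval,
    };

    // An average heart rate over 50 patients is fine...
    assert_eq!(policy.check(&["heart_rate"], 50), UnsafeAction::Log);
    // ...but anything touching patient names needs the owner's explicit OK,
    // and the owner is still free to say no.
    assert_eq!(
        policy.check(&["patient_name"], 50),
        UnsafeAction::RequireApproval
    );
}
```

The point of the model is that the default path is aggregated, sanitized output, and everything else is gated behind an explicit decision by the data owner.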
[03:23.160 --> 03:33.400] The very paranoid way Rust has of handling multithreading: no mutable statics unless you [03:33.400 --> 03:37.600] use lazy_static or some other technique. [03:37.600 --> 03:41.280] That was a pain to work around, but we did manage. [03:41.280 --> 03:46.720] And the minimal codebase size, thanks to Rust being a low-level programming [03:46.720 --> 03:47.720] language. [03:47.720 --> 03:52.440] It's ideal for the trusted execution environments we are working with, such as for example [03:52.440 --> 03:56.000] AMD SEV, Intel SGX, and so on. [03:56.000 --> 04:01.240] The smaller the codebase, the easier it is to audit.

[04:01.240 --> 04:08.400] Now for the performance reason: as I said, Rust is a low-level programming language, [04:08.400 --> 04:14.680] very close to C in terms of execution speed, but the biggest reason is Polars, because our [04:14.680 --> 04:25.440] API heavily relies on it, except that we implemented a network stack on top so that no one can ever access [04:25.440 --> 04:27.040] the data directly. [04:27.040 --> 04:34.600] Polars also offers some of the best performance for working with datasets: joins, aggregations, [04:34.600 --> 04:35.600] and so on. [04:35.600 --> 04:38.200] It was the easiest way to do it, plus it's in pure Rust. [04:38.200 --> 04:40.640] There are no bindings and so on. [04:40.640 --> 04:42.600] Thanks for that.

[04:42.600 --> 04:46.480] So you can see here the performance, the benchmark we made. [04:46.480 --> 04:51.800] We used pandas as a reference, which is, as you can see here, more than terrible compared [04:51.800 --> 04:53.300] to Polars. [04:53.300 --> 04:58.520] We compared it to Polars lazy; our solution is lazy by default. [04:58.520 --> 05:04.320] Lazy means I only execute a query when I strictly need it, [05:04.320 --> 05:08.320] unlike pandas, which is eager and does it all the time. [05:08.320 --> 05:13.040] That makes a big difference, plus I'm working only on the data that I need and not the whole [05:13.040 --> 05:15.600] dataset if I don't need to. [05:15.600 --> 05:17.560] That's another benchmark on a bigger dataset. [05:17.560 --> 05:24.280] You can see that pandas is still through the roof compared to the other ones.

[05:24.280 --> 05:29.720] Now, how did we do that? [05:29.720 --> 05:33.160] We chose to use the best crates available for the job. [05:33.160 --> 05:42.760] We wanted to use Tonic and Tokio, because Tonic offers gRPC, which allows us to write [05:42.760 --> 05:48.480] clients that are not in Python if we need to, thanks to protobuf, which is implemented [05:48.480 --> 05:52.800] in many languages, and the gRPC protocol as well. [05:52.800 --> 05:59.680] Polars, as I mentioned already; and ring, because in addition to setting up a policy, for example, [05:59.720 --> 06:04.880] if I don't want people to access specific names or whatever, there is also authentication. [06:04.880 --> 06:10.760] With ring, we use the ring implementation directly to verify keys: I need to provide my [06:10.760 --> 06:14.440] public key to access the server. [06:14.440 --> 06:21.000] We use the ring implementation directly to check that the key matches and that the [06:21.000 --> 06:22.000] key is real. [06:22.000 --> 06:27.440] And Tokio, because we heavily use, like mad, multithreading, async, [06:27.440 --> 06:29.480] and so on.
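To make the lazy-versus-eager point from the benchmarks above concrete, here is a minimal sketch using the polars crate locally. It assumes a recent polars version with the "lazy" feature enabled; the data and column names are made up, and this is plain local Polars, not BastionLab's remote API.

```rust
// Minimal sketch of lazy evaluation with the polars crate.
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let df = df![
        "patient_id" => &[1, 1, 2, 2],
        "heart_rate" => &[72.0, 88.0, 95.0, 61.0],
        "resp_rate"  => &[14.0, 18.0, 22.0, 12.0]
    ]?;

    // Nothing executes here: we are only building a query plan.
    let plan = df
        .lazy()
        .filter(col("heart_rate").gt(lit(80.0)))
        .group_by([col("patient_id")])
        .agg([col("resp_rate").mean().alias("mean_resp_rate")]);

    // collect() is the point where the optimized plan actually runs, touching
    // only the columns and rows the query needs, unlike eager pandas-style
    // execution, which materializes every intermediate step.
    let result = plan.collect()?;
    println!("{result}");
    Ok(())
}
```

This build-a-plan-then-collect pattern is what lets the engine skip work and columns a query never touches, which is where most of the gap against eager pandas in the benchmarks comes from.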
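For the authentication part, here is a minimal sketch of verifying a client's public key with the ring crate. The Ed25519 calls are ring's real API, but the challenge/response flow, names, and messages are illustrative assumptions, not BastionLab's actual protocol.

```rust
// Minimal sketch of public-key verification with the ring crate.
use ring::rand::SystemRandom;
use ring::signature::{Ed25519KeyPair, KeyPair, UnparsedPublicKey, ED25519};

fn main() {
    // Client side: generate a keypair; the public half is what gets registered
    // with the server so it knows who is allowed to connect.
    let rng = SystemRandom::new();
    let pkcs8 = Ed25519KeyPair::generate_pkcs8(&rng).expect("rng failure");
    let key_pair = Ed25519KeyPair::from_pkcs8(pkcs8.as_ref()).expect("bad key");

    // Client signs a server-provided challenge to prove it owns the key.
    let challenge = b"nonce-sent-by-the-server";
    let signature = key_pair.sign(challenge);

    // Server side: check the signature against the registered public key.
    let public_key = UnparsedPublicKey::new(&ED25519, key_pair.public_key().as_ref());
    match public_key.verify(challenge, signature.as_ref()) {
        Ok(()) => println!("key matches: client authenticated"),
        Err(_) => println!("verification failed: access denied"),
    }
}
```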
[06:29.480 --> 06:38.480] For example, when a request on a dataset needs to be accepted, we spawn a new thread that sends a request [06:38.480 --> 06:45.200] to the data owner saying: do you want to accept this request, which is about to leak sensitive [06:45.200 --> 06:46.200] data? [06:46.200 --> 06:51.760] It's not worded exactly this way, but that's what it comes down to. [06:51.760 --> 06:55.720] Instead of blocking the whole process, I have a thread that times out after a while: [06:55.720 --> 06:58.400] if I don't accept it, it gets rejected. [06:58.400 --> 07:05.200] But I can say yes or no, and other requests, such as simple, non-sensitive ones, [07:05.200 --> 07:06.200] will be accepted. [07:06.200 --> 07:12.440] And so Tonic plus Tokio work very well together and allow as many connections [07:12.440 --> 07:13.440] as we want. [07:13.440 --> 07:17.280] This is the best we could dream of.

[07:17.280 --> 07:21.160] What I was about to show was supposed to be in a Google Colab, but this is an easier [07:21.160 --> 07:23.360] representation here. [07:23.360 --> 07:29.080] For simplicity reasons we have, sorry, Python code, but only a few lines. [07:29.080 --> 07:35.640] The data owner part uploads the dataset and sets up a policy: for example, I can reject [07:35.640 --> 07:41.200] sensitive requests on my dataset, or I can allow them, but I want them to be logged. [07:41.200 --> 07:42.200] Oh, shit. [07:42.200 --> 07:43.200] Thanks, everyone. [07:43.200 --> 07:44.200] Thank you.
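The timeout-based approval flow described above could be sketched with Tokio roughly as follows. This is a hypothetical illustration assuming each pending approval is a spawned Tokio task answered over a oneshot channel; the names, the 10-minute timeout, and the Verdict type are invented, not BastionLab's implementation (the talk says "thread", but with Tokio this would typically be an async task).

```rust
// Hypothetical sketch of the approval flow: a sensitive query waits in its own
// task and is automatically rejected if the data owner does not answer in time.
// Assumes tokio with the "full" feature set.
use std::time::Duration;
use tokio::sync::oneshot;
use tokio::time::timeout;

#[derive(Debug)]
enum Verdict {
    Approved,
    Rejected,
}

/// Wait for the data owner's yes/no without blocking any other request.
async fn await_owner_decision(answer: oneshot::Receiver<bool>) -> Verdict {
    match timeout(Duration::from_secs(600), answer).await {
        Ok(Ok(true)) => Verdict::Approved,
        // An explicit "no", a dropped sender, or a timeout all mean rejection.
        _ => Verdict::Rejected,
    }
}

#[tokio::main]
async fn main() {
    let (owner_tx, owner_rx) = oneshot::channel();

    // Each pending approval lives in its own task, so the server keeps
    // serving non-sensitive queries in the meantime.
    let pending = tokio::spawn(await_owner_decision(owner_rx));

    // Simulate the data owner clicking "accept" in some notification UI.
    let _ = owner_tx.send(true);

    println!("{:?}", pending.await.unwrap()); // prints: Approved
}
```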