Thank you, thank you everybody for coming. Before we start, I need to say two things. First of all, I'm sorry for this dev room: I'll try to speak as loud as I can, and if you can't see the slides, they are available online. Second, this is a talk about databases, and we are database researchers. So first of all, we don't know everything, and second of all, we might also not understand everything. But regardless, we hope to give you a different perspective on an important problem: how to store data securely using trusted execution environments.

We are PhD students at CWI Amsterdam, and our research focuses on secure databases. In particular, we work on things like encrypted query processing, secure multi-party computation, data privacy, and so on. Our research question here is how to protect data in use. It's relatively easy to protect data at rest, but we also want to hide it while it's being processed. Our example here is the cloud: nowadays it is very common practice to outsource data management to cloud providers, but then we also need to protect information from people who have access to the servers, and from internal attacks. There are techniques to analyze data while keeping it encrypted, like homomorphic encryption, but unfortunately that field doesn't yet have encouraging performance results. So we need to look for something simpler and more efficient to protect our data while it's being processed.

That's why, in this talk, we look at trusted execution environments, and we want to employ them as a technology to ensure confidentiality and isolation of the data. But before we do that, we first need to understand the different techniques to split the components of a database system across a trusted execution environment. In this talk we focus on Intel SGX. For those who don't know it, it's basically a set of hardware instructions to split memory into a secure and an insecure part, where the secure part is called an enclave. In the database field, Intel SGX, and specifically the first version, is a very popular choice for development because it's the most mature one and there is the most research on it. But at the same time, it has some performance limitations for workloads that are typical for database systems. The biggest problem is the limited Enclave Page Cache (EPC) size, which is 128 megabytes in SGX1.

That being said, there are several different models to split a DBMS. We have the full DBMS split, which means we put the whole database inside the enclave, with just a very thin I/O library layer to handle system calls. Then we have the middle DBMS split, which is something in between: it allows more fine-grained optimizations and code splits, and approaches here usually put only the query execution engine inside the enclave, while everything else stays out. And then we have the minimal DBMS split, where only the operators and comparators are inside the enclave, and by operators and comparators I mean plus, minus, equals, and so on. Now that we have a general understanding of the different models, we can look at some practical examples; there is a sketch of the minimal split just below.
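To make that minimal split concrete, here is a rough sketch of where the trust boundary sits. This is plain, self-contained C++ for illustration only, not real SGX code: in a real system the trusted comparator would be an ECALL defined in an EDL file and compiled into the enclave binary, and the XOR "decryption" with a made-up key is a stand-in for a real cipher such as AES-GCM.

```cpp
// Illustrative sketch only -- plain C++, not real SGX code.
#include <cstdint>
#include <iostream>
#include <vector>

// --- trusted side (would live inside the enclave) -------------------------
static const uint64_t kToyKey = 0xDEADBEEF;  // hypothetical demo key

// Decrypt two ciphertexts and compare them, so plaintext never leaves the
// trusted side. Returns -1, 0, or 1, like memcmp.
int trusted_compare(uint64_t ct_a, uint64_t ct_b) {
    uint64_t a = ct_a ^ kToyKey;  // placeholder decryption
    uint64_t b = ct_b ^ kToyKey;
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

// --- untrusted side (the query executor stays outside the enclave) --------
int main() {
    std::vector<uint64_t> encrypted_column = {42 ^ kToyKey, 7 ^ kToyKey};
    // The executor only ever sees ciphertexts; every comparison crosses
    // the trust boundary (an ECALL in real SGX, a function call here).
    if (trusted_compare(encrypted_column[0], encrypted_column[1]) > 0)
        std::cout << "row 0 sorts after row 1\n";
}
```

Note how every single comparison crosses the boundary: that is exactly why this model leaks access patterns and pays a per-operation transition cost.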
So here's a personal favorite: it's called StealthDB, and it's a Postgres extension. We have some Postgres people here, so I'm very biased on this. Basically, StealthDB employs the third model that I mentioned, the minimal DBMS split, so it only implements operators and comparators inside the enclave. This choice was probably made because of the very limited memory available, and of course there are trade-offs. If the full DBMS is not in the enclave, there is more information leakage: for instance, people might be able to infer the size of the database and the operations we are performing inside the enclave. And even though the secure part is so small, there is still a performance cost, around 5% to 30% overhead on transactional queries, where transactional queries are workloads that are heavy in inserts and updates on current data. So this is a very good project, but still not quite what we would like to have if we are running actual real-world workloads.

There are more examples of other databases. There are several SQLite ports, and I think all of them are full DBMS splits, but regardless, they add at least one or two orders of magnitude of overhead to the queries. Then there is a MariaDB-based encrypted database called EdgelessDB. I think I saw some people from Edgeless here, or they were at FOSDEM a few years back. EdgelessDB is a database designed to run inside an enclave: it uses MariaDB with RocksDB storage, it keeps data encrypted and authenticated both on disk and in memory, and it encrypts network connections. So it's a very nice project. Then we have an implementation based on Microsoft SQL Server. I'm sorry, it's not open source, I know, but unfortunately it's one of the most relevant works in the field, because it actually implements the query engine in the enclave and splits the data between sensitive and insensitive tables. It's a very novel idea, but unfortunately it doesn't really work out, because this kind of model assumes a very big enclave, and due to the limitations of SGX1 that is not really feasible in practice. And then we have one analytical engine, where by analytical I mean doing analytics, so business intelligence workloads over a lot of current and historical data. It's called ObliDB, and it implements oblivious physical operators for analytical processing in the cloud. But once again, it is really, really slow because of the enclave size.
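To give an idea of what "oblivious" means there, below is a minimal toy sketch of an oblivious filter: it reads every input tuple and writes every output slot exactly once, so an observer of memory accesses learns nothing about how many tuples matched. This is a didactic illustration under our own assumptions, with invented types, not ObliDB's actual code.

```cpp
// Toy oblivious filter: the access pattern is identical regardless of
// which tuples match, hiding the selectivity of the predicate.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Tuple { int32_t key; int32_t value; bool valid; };

// Scan every input tuple and write every output slot exactly once;
// non-matching tuples become dummies instead of being skipped.
std::vector<Tuple> oblivious_filter(const std::vector<Tuple>& in,
                                    int32_t threshold) {
    std::vector<Tuple> out(in.size());
    for (size_t i = 0; i < in.size(); ++i) {
        bool match = in[i].key > threshold;
        // Same writes happen either way; a hardened implementation would
        // also use constant-time selects (e.g. CMOV) to avoid branches.
        out[i].key   = match ? in[i].key   : 0;
        out[i].value = match ? in[i].value : 0;
        out[i].valid = match;
    }
    return out;  // same size as the input: the match count stays hidden
}

int main() {
    std::vector<Tuple> rows = {{5, 50, true}, {9, 90, true}, {2, 20, true}};
    auto out = oblivious_filter(rows, 4);  // rows 0 and 1 match
    std::printf("output slots: %zu\n", out.size());  // always 3
}
```

The price of this data-independence is obvious: every operator touches all the data all the time, which is a big part of why ObliDB-style processing is so slow on SGX1's tiny enclave.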
So, our contribution: looking at all of this, we noticed two things. First, the big majority of these SGX1 implementations are transactional, because analytical workloads really don't scale: the volume of the data causes overhead through last-level cache misses and EPC swapping. Second, there is no research on SGX2. SGX2 was released a couple of years ago, but all the prototypes I mentioned were made for SGX1. I'm not saying they don't work, but there are no benchmarks and no implementations that are specifically tailored to SGX2. So our contribution is to try and bridge the gap towards efficient and secure analytical processing. To do so, we use DuckDB. Disclaimer: we are not affiliated with DuckDB and we are not paid by them; it's just an open source database that we happen to use because it's developed in our research center.

DuckDB is an open source, embedded, columnar, analytical system. Sorry, there are a lot of buzzwords here; I'm going to explain them in a moment. It's written in C++11 without additional dependencies, and it was actually ported to SGX1 in 2022 by one of our master students. Before explaining what we did with DuckDB on SGX1 and SGX2, I need to give you some fundamental concepts about database internals.

We start with column storage. The difference between row and column storage is that data is stored column by column rather than row by row. For analytical workloads we usually don't need all the columns, just a few of them, so it is more efficient to store each column contiguously, such that we can fetch only what we need. This columnar format also benefits a lot from compression, because there is usually a lot of correlation within the data, and the data is huge. DuckDB specifically implements column-level compression, where data is stored by column and then compressed.

Now we also need to talk a little bit about vectorized execution. This is similar in spirit to the SIMD instructions that you probably know of, but applied to databases: instead of performing an operation on one row at a time, we perform it in batches. So instead of fetching a row, processing it, and returning it, we do the same process with batches. In a very simple query, our next() function returns many tuples rather than one, and we push only the relevant blocks of data up and down the query plan. This is more efficient because we make fewer calls and we can use the CPU more efficiently; there is a small sketch of this idea a bit further on. Thank you for your attention; now Lotte is going to explain how we ported DuckDB to SGX.

Thank you, Ilaria. Okay, so before we go directly to SGX2 and how we did it, we first pay some attention to how it was done on SGX1, because the master student ported DuckDB in two different ways. The first one is the full DBMS split. The main issue here, of course, was that because of the low memory capacity, not all the data would fit in the enclave. The second issue is that system calls are not directly callable inside the enclave, so you either need to reimplement the necessary system calls, or you use some kind of library OS that provides this I/O shim layer. The master student used Graphene; nowadays Graphene is actually called Gramine, and I think last year and the year before there were also talks at FOSDEM about how exactly Gramine works. With Gramine and with fully porting DuckDB into the enclave, there was a 20x slowdown, mainly caused by the expensive EPC swapping. To mitigate this, the student tried, instead of keeping all memory buffers inside the enclave, to move some buffers outside the enclave and encrypt them there, and run DuckDB that way. This already gave a significant speedup, but there was still a 30x slowdown.

The second approach was the minimal DBMS split: he put basically all the operators inside the enclave and left the rest outside, because this still enabled vectorized processing, and that really increases performance.
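Before the second optimization, here is the vectorized-execution sketch promised earlier: a Volcano-style next() that hands out batches of 2048 values (DuckDB's default vector size) instead of single tuples. The class and function names are our own invention for illustration, not DuckDB's actual operator API.

```cpp
// Toy vectorized scan + filter.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr size_t kVectorSize = 2048;  // DuckDB's default batch size

struct Vector {
    int32_t data[kVectorSize];
    size_t count = 0;  // number of valid entries in this batch
};

// Scan operator: next() hands out up to 2048 tuples at a time, not one.
class Scan {
    const std::vector<int32_t>& column_;
    size_t pos_ = 0;
public:
    explicit Scan(const std::vector<int32_t>& column) : column_(column) {}
    bool next(Vector& out) {
        out.count = 0;
        while (pos_ < column_.size() && out.count < kVectorSize)
            out.data[out.count++] = column_[pos_++];
        return out.count > 0;
    }
};

// The consumer runs one tight loop per batch: few calls up and down the
// plan, and the CPU caches and pipeline are used efficiently.
size_t count_greater_than(Scan& scan, int32_t threshold) {
    Vector vec;
    size_t matches = 0;
    while (scan.next(vec))
        for (size_t i = 0; i < vec.count; ++i)
            matches += (vec.data[i] > threshold);
    return matches;
}

int main() {
    std::vector<int32_t> column(10000, 1);
    column[123] = 99;
    Scan scan(column);
    std::printf("matches: %zu\n", count_greater_than(scan, 50));  // 1
}
```

In the enclave setting this matters even more than usual: every call across the boundary is expensive, so moving a whole batch per call amortizes that cost.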
A second optimization he made was replacing ECALLs and OCALLs with asynchronous requests in a shared buffer, also called switchless mode, I think. This also helped, but there was still a 10x slowdown.

A couple of years later, there is now SGX2, and it doesn't suffer from the main memory limitation. So now it's basically much easier to port DuckDB as a whole into the enclave on SGX2, and we did it, also with the use of Gramine, which has itself improved a lot over the last years; that made it actually surprisingly easy. We then ran some benchmarks to see what performance difference there is if you run a database fully inside the enclave. We used TPC-H, which is a standard industry benchmark for analytical workloads, so basically data science workloads: there are no inserts or updates, just analytics. We first compared against Gramine itself, because since Gramine intercepts system calls, it also incurs some overhead; but as you can see, most of the overhead is caused by SGX itself. On average we would say there is a 10 to 20 percent overhead. Here we normalized against baseline DuckDB so that you can see the actual overhead per query, and there are some specific queries, such as query 12 and query 15, where the overhead is actually more than 2x. That might be a bit problematic.

So we did some digging to identify what it is in these queries that causes the overhead, and we found that, strangely enough, the overhead is mainly introduced by OCALLs, so by enclave transitions (EENTER and EEXIT). We tried to investigate further which system call it was, and there was some kind of timing function that seems to be executed outside of the enclave. Also, within these queries there are twice as many page faults. One optimization that we tried, and we're still working on it, was increasing the vector size in DuckDB. Usually the vector size in DuckDB is 2048 tuples, which gives low L1 cache misses, but in the enclave it can incur many EPC transfers. By increasing the vector size you do fewer, larger transfers, and crossing the enclave boundary is very expensive. We actually found that if you increase the vector size to 16384 tuples, the performance overhead is minimized for this workload. A small note: not all queries improved, but for the queries with a lot of overhead it seems to be really beneficial to increase the vector size.

So, this is very much work in progress; it's more a prototype than something you can actually use in production, so please don't do that yet. But we can conclude that analytics can actually perform relatively efficiently in SGX2, and the overhead seems acceptable. The question now is: we can protect data in use, so data in secure memory, but what about the data in unsecure memory? If you go outside the enclave, the data is not protected by default, so we need some kind of encryption mechanism. DuckDB actually has Parquet encryption now, so we are already capable of encrypting Parquet files, decrypting them inside the enclave, and then performing secure analytics. In the end, our goal is to build something that is fully functional and fully secure for users who want to do secure analytics with DuckDB. So yeah, this is our plan for the future; we will of course open source everything. Thank you for your attention.
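As an illustration of that last point, this is roughly what the Parquet encryption flow looks like at the SQL level, based on the encryption support in recent DuckDB releases; the key name, the key material, and the table and file names are made-up examples.

```sql
-- Register a (hypothetical) 256-bit key under the name 'key256';
-- in a deployment this key would only ever exist inside the enclave.
PRAGMA add_parquet_key('key256', '01234567891123456789212345678901');

-- Write an encrypted Parquet file, safe to store in untrusted storage.
COPY lineitem TO 'lineitem.parquet'
    (ENCRYPTION_CONFIG {footer_key: 'key256'});

-- Read it back; decryption happens wherever the query runs,
-- which in our setting is inside the enclave.
SELECT count(*)
FROM read_parquet('lineitem.parquet',
                  encryption_config = {footer_key: 'key256'});
```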
Hi, thank you for a very nice talk. You talked about this overhead that you were attributing to the OCALLs going out of the enclave. Some of the commercial SGX frameworks use techniques where you actually batch these together; they are commonly called asynchronous OCALLs. Did you look into that at all, and do you have some insight into how that could affect performance?

Okay, so your question is whether we looked into asynchronous OCALLs, or the asynchronous buffer, basically. Well, the master student looked into that, and it indeed improved performance. We were planning to do some benchmarks with this specific mode; we just haven't done it yet, but we are still investigating. As far as I understand, it is a little bit less secure to use this mode, so it will always be a trade-off, but I suspect that it will improve performance quite a bit and reduce the overhead in the end.

A probably stupid and provocative question: have you tried shoving the whole database into a secure VM, like SEV-SNP or TDX, and comparing the performance between SGX and a TDX or SEV-SNP solution?

Okay, so the question is whether we tried other secure environments. The answer is no, so we have no performance comparisons yet, but the plan is actually to do that, because not everybody is able to run SGX, right? The hardware field is pretty fragmented, and we also want to find solutions, or at least have comparisons of which one is best to use, and maybe even make some kind of framework that people can adopt to easily run on different kinds of hardware as well.

Thank you. I want to ask about the "fully secure" on the slide. Have you thought about side channels, and what's your vision on that?

Do you want to answer? Yeah, in short: yes, this is a problem, because in all the research that we found there is always a trade-off between performance and security, and literally all the papers build some sort of cost model, not in terms of cost but in terms of information leakage. A lot of papers just say: yes, we acknowledge that there are going to be some trade-offs, some possible attacks. And yes, this is absolutely the case, it can happen. But right now the goal was first to have something that is somewhat functional on some sort of database workload, because as I said, the big limitations of SGX1 made the whole thing completely infeasible. Now that this is actually possible, we can also focus on how to fix these issues. Unfortunately, research has tended not to acknowledge this issue so much in the past, but in the future, yes, we will.
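For readers curious about the asynchronous OCALL idea raised in the first question, here is a toy, self-contained C++ sketch of the underlying mechanism: the enclave-side thread posts requests into a shared buffer instead of exiting the enclave for every call, and an untrusted worker thread services them. Plain std::thread and std::queue stand in for the real SGX switchless machinery; none of this is actual SDK code.

```cpp
// Toy illustration of switchless / asynchronous OCALLs.
#include <atomic>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

struct Request { int id; };

std::queue<Request> shared_buffer;   // lives in untrusted memory in real SGX
std::mutex mtx;
std::atomic<bool> done{false};

void enclave_thread() {              // would run inside the enclave
    for (int i = 0; i < 4; ++i) {
        std::lock_guard<std::mutex> lock(mtx);
        shared_buffer.push({i});     // no EEXIT needed: just a memory write
    }
    done = true;
}

void untrusted_worker() {            // polls and performs the actual syscalls
    while (true) {
        Request r{-1};
        {
            std::lock_guard<std::mutex> lock(mtx);
            if (!shared_buffer.empty()) {
                r = shared_buffer.front();
                shared_buffer.pop();
            } else if (done) {
                return;              // producer finished and queue drained
            }
        }
        if (r.id >= 0)               // busy-waits when idle, fine for a toy
            std::printf("servicing request %d outside the enclave\n", r.id);
    }
}

int main() {
    std::thread worker(untrusted_worker), enclave(enclave_thread);
    enclave.join();
    worker.join();
}
```

The security caveat from the answer above shows up here too: the request stream sits in untrusted shared memory, so an observer sees the timing and volume of calls, which is part of why this mode is considered a trade-off.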