different search parameters, like a bit of genomic data.

This is our architecture. I already mentioned Lens: this is what the researcher
sees in their browser, the front end. It has its own back end, which we call
Spot. In some projects the old Spot, written in Java, is still running, but we
are about to replace it with a new Spot written in Rust. Then there are the Beam
proxies, also written in Rust. Focus is written in Rust. Blaze is a store,
written in Clojure. And then we have those operations, which are shell scripts
mostly.

So what is happening here? A researcher says: I need to find samples of type
plasma that come from donors with diagnosis C61, for example, and where the age
at diagnosis is between 14 and 50, for example. That request goes to Spot, where
it is packed into a Beam task. Beam is a task broker that solves the problems of
the strict network environments we face in hospitals in Germany because of their
data protection concepts. At the sites, which are hospitals and biobanks, there
are Beam proxies that ask Beam: do you have a task for me? When it does, Beam
sends the task, and Focus, this component here, gets the task. Focus then
unpacks the task and decides which endpoint the task is for.
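
The polling and dispatch steps just described can be sketched roughly as follows. This is a minimal Python illustration, not the actual Beam or Focus code: the names `Task`, `Broker`, `poll` and `dispatch`, the site name and the query string are all hypothetical. The point it shows is that the broker never opens a connection to a site; it only answers when a site asks for work.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    site: str       # which site the task is addressed to (hypothetical field)
    endpoint: str   # which local endpoint should answer it, e.g. "blaze"
    query: str      # opaque query payload


class Broker:
    """Central broker: it never connects to a site, it only answers polls."""

    def __init__(self):
        self._queues = {}

    def submit(self, task):
        # Queue the task until the addressed site asks for it.
        self._queues.setdefault(task.site, deque()).append(task)

    def poll(self, site):
        """Called by a site's proxy: 'do you have a task for me?'"""
        queue = self._queues.get(site)
        return queue.popleft() if queue else None


def dispatch(task, endpoints):
    """Dispatcher at the site: pick the handler for the task's endpoint."""
    handler = endpoints.get(task.endpoint)
    if handler is None:
        raise ValueError(f"unknown endpoint: {task.endpoint}")
    return handler(task.query)


broker = Broker()
broker.submit(Task(site="site-a", endpoint="blaze", query="plasma AND C61"))

# Stand-in for a real store; it only echoes the query it was given.
endpoints = {"blaze": lambda q: f"blaze ran: {q}"}

task = broker.poll("site-a")  # outbound request made from inside the site
result = dispatch(task, endpoints) if task else None
```

Because the site always initiates the request, the hospital firewall only has to permit outbound traffic.
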
Blaze is only one of the possible stores, and we can also query other
applications, so it is not only for database types. We have another application
called Exporter and one more called Reporter, and those can also query Blaze
in their own ways. Blaze is actually a FHIR server; FHIR is a standard for the
exchange of information in e-health, and in healthcare and medicine in general.
Focus then runs the query against Blaze or against some other store, gets the
results, and returns them to a Beam proxy, which returns them to Beam, which
returns them to the Lens back end, which is Spot, and in the end the browser
gets the result. And this
component here, Laplace, is used for obfuscation. Obfuscation of data is done
at the sites, so unobfuscated data never leaves the sites; we decided it was
best to put it there.

We have multiple projects that run our Bridgeheads. This set of applications at
the sites is what we call a Bridgehead; you can look later at our Bridgehead
repository, which installs all those components. These are some of the projects
that run Bridgeheads. This is a map of Germany with the Bridgeheads in Germany.
Besides the German Biobank Node, we also have the European version of it, which
has biobanks in other European countries; that is BBMRI-ERIC. Then there is the
German Cancer Consortium, which I already mentioned, and Cancer Core Europe,
which intends to facilitate the translation of clinical research into new
drugs. And because children usually have different types of cancer, and cancers
affect children differently, we have a separate project intended to facilitate
the invention of drugs for pediatric cancers, and also applying existing drugs
that are for adults, but also for those genetic markers for which no drugs
exist. It is intended to facilitate personalized medicine. This is another
project we have.
This is for cancer images, so MRI, CT, PET, CAT. It is intended to enable AI
analysis of images.

Then, I mentioned Beam. Beam is a distributed task broker that enables
communication with biobanks that are behind proxies and have very exotic
configurations. It handles the encryption: the Beam proxies at each site encrypt
and decrypt all the traffic. It also handles certificates, and it only allows
outbound connections, which means connections can only be opened from the Beam
proxies to Beam. Then we have Focus, which is a query dispatcher
in which the obfuscation happens. First I need to mention CQL, which is what we
use: it is Clinical Quality Language. I know there is another CQL that means
something else. Parts of the CQL come from the front end, and currently we are
working with certain query replacements to prevent CQL injections, but soon we
should have the translation of the abstract syntax tree from Lens, from the
front end, into CQL done completely in Focus. I am working on it. The abstract
syntax tree also gets translated, or rather simplified, for EUCAIM, the project
for medical imaging I mentioned before. As I said, it uses the Samply.Laplace
library. These QR codes: you can scan them to get to the GitHub repository. I
hope it is large enough. And if you want to get to the Beam repository, this
is the QR code.

The problem with aggregated data is still that, with a search narrow enough, it
could be deduced in which store, in which database or in which biobank, samples
or data about a certain patient are stored. So we need to offer a similar level
of privacy to the patients who are supposed to consent. They are more likely to
consent to having their samples and their data available if they know that their
level of privacy is the same whether they are in a biobank or not, because we
obfuscate the data enough: we add a small number and we round it up. I am going
to mention why. k-anonymity means that for each set of parameters there would be
at least k patients for whom the search would return results. But that is still
not enough, because we have some rare diagnoses and we can narrow the age range
enough that a search could return only one patient. That is why we had to do
this. We
use a Laplace distribution with certain parameters. We take a random value from
the distribution and add it to every count, to all those counts in all those
stratifiers we get, for example for each diagnosis, for each sample type. This
shows how, depending on the values, we can lower the privacy but make the data
more usable. Here we would get higher values, with b equal to 0.1, and here we
get lower values, but values that are closer to the true state of the database
are more usable. The privacy budget is something everybody has to decide for
themselves, but the sensitivity depends on what is being obfuscated. It is the
number of those resources per patient: for diagnoses it is the number of
diagnoses per patient, and for samples it is the average number of samples per
patient. So we are working with 10 and 3, and for patients, of course, it is one
patient per patient. This is the library, and it is
a Rust crate, and we also made it a Java library for our friends in Erlangen,
who use it in their Java projects. It is highly configurable, and I have
included parameters that might be needed in medical informatics: of course
epsilon and delta, which I mentioned before, but also what to do with values
under 10. We round them to 10; some might want to round them down to 0, or they
can be obfuscated in the usual way. As for zeros, we have chosen not to
obfuscate them. That is because after the search comes another process: the
researchers select the biobanks they want to negotiate with and then use a tool
called the Negotiator, which was made by our friends in Czechia. In the
Negotiator they describe the research they intend to use the samples for, and in
the biobank the head of the biobank, or whoever is tasked with it, but in any
case real humans, decide who is going to get those samples. Samples are very
valuable: once they are used up you do not have them anymore, and it could be
the last sample for a combination of diagnosis and a certain sample type, and
especially certain genetic markers. So for those biobanks that really have zero
values, we did not want people in the biobanks to be bothered. So all our code
is open
source. You can scan this to get to our organization on GitHub, where you can
also look at our other software. And if you want to join us, live in beautiful
Heidelberg, and help cancer research, then scan this: it is a job posting.
Please do not let the fact that the German language is mentioned prevent you
from applying, because my German is still not good enough, and it is not really
a requirement. You will be asked to learn German, but the company pays for it.
Thank you.
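
As a rough illustration of the obfuscation step described in the talk (Laplace noise added to each count, values under 10 rounded to 10, zeros left untouched), here is a minimal Python sketch. It is not the API of the library mentioned in the talk: the function names, the example counts, and the values `sensitivity=3` and `epsilon=0.5` are hypothetical. Two things are assumptions rather than statements from the talk: the scale is taken as b = sensitivity / epsilon, which is the usual choice for the Laplace mechanism, and results are rounded to the nearest multiple of 10 and never reported below 10.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def obfuscate_count(true_count, sensitivity, epsilon, rng, step=10):
    if true_count == 0:
        return 0                      # zeros are left as they are
    if true_count < step:
        return step                   # values under 10 are rounded to 10
    scale = sensitivity / epsilon     # assumption: usual Laplace-mechanism scale b
    noisy = true_count + laplace_noise(scale, rng)
    rounded = step * round(noisy / step)  # assumption: nearest multiple of step
    return max(step, rounded)         # assumption: never report below one step


rng = random.Random(42)               # seeded only to make the sketch repeatable
counts = {"C61": 1234, "C50": 7, "C34": 0}   # made-up counts per diagnosis
obfuscated = {
    code: obfuscate_count(n, sensitivity=3, epsilon=0.5, rng=rng)
    for code, n in counts.items()
}
```

A smaller epsilon gives a larger scale b and therefore noisier, more private counts; a larger epsilon keeps counts closer to the true state of the database.
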