All right, so this is the talk on SpiceDB.
Thanks everyone for showing up
so early in the morning. I'm starting to lose my voice
because there was a long day yesterday of talking
and meeting awesome people.
This is my first FOSDEM.
So who am I?
My name is Jimmy Zelinskie.
I'm the co-founder of a company called
AuthZed, and AuthZed built SpiceDB.
Previously, I worked at Red Hat and CoreOS.
So I've been around in the container and Kubernetes
ecosystem for a pretty long time, basically
since the beginning.
There, I'm actually a maintainer of OCI,
which is the standard specification for Linux containers.
And I've also started a bunch of projects in that space,
notably the Kubernetes Operator Framework and some others.
This talk is entitled SpiceDB.
But since FOSDEM is more of a developer community conference,
I really wanted to focus less on this talk being a vendor
pitch for SpiceDB, but actually more of a level
set about the problems in the authorization space
and the history and status quo of that.
So that everyone understands what might be the best tool
to solve their problems.
I'm not going to try to sell you SpiceDB for all problems,
because the more informed you are,
the better you can pick the product that's actually
going to complement your software stack
and what you need.
And that means there's going to be way more qualified people
using SpiceDB and way more qualified people using other
authorization tooling.
Obviously, I'm the most jazzed about SpiceDB
because I created it.
So why are we all here?
We're all here because there is a not-for-profit organization
called OWASP, which is the Open Worldwide Application
Security Project.
And they kind of got started in the early 2000s.
And they're famous for having this list called the Top 10.
And the Top 10 is basically an enumeration
of the highest risk, the highest threats for web security.
And as of 2017, broken access control was number five.
As of 2021, broken access control is number one.
That means this is the biggest threat to the web
and to all the applications running internet
facing to the web.
But really, the question is, how do we actually
get to this point?
Like, how did this happen?
And how did it happen so quickly?
I'm not going to point any fingers,
but what I'm actually going to do
is kind of dive into two different groups of stakeholders
in kind of the history of authorization.
There is kind of the academia and people
publishing papers in this space, kind of defining concepts.
And then there's the industry practitioners
that are actually building the software
and realizing these systems as they're actually
connected to the web.
I'm going to start with academia first.
So on the right-hand side, you're
going to see a timeline.
And then on the left-hand side, there's
going to be some notes.
And then not for this slide, but you'll
see QR codes in this corner as well.
Those QR codes are going to link to the specific novel paper.
So if you're interested in any of these particular concepts,
you can feel free to scan the QR codes.
But our history of authorization
is going to actually start in the 80s.
And it kind of gets really kicked off
with this publication of the Trusted Computer System
Evaluation Criteria, which is a security practices book
published by the US Department of Defense.
And in it, it's outlining a lot of different security
practices that are effectively a part of the military,
the United States military.
And in it, they kind of describe these two different access
control systems, discretionary and mandatory.
Now, discretionary is effectively just
if you created the idea or the information, you can share it.
And if you're then given access to that, you can share that.
It's at your discretion.
I kind of use file systems
and Google Docs as an example here.
It's not a perfect one-to-one match.
But if someone shares a file with you on a UNIX file system,
you can copy that file if you have read access.
And then you can change whatever permissions on that
and share that, similarly with Google Docs.
So it's at your discretion how you're
going to share that information once you're given read access.
Then there's mandatory access control,
which is effectively a long list, an exhaustive list,
of all the access for a particular thing.
Most notably, people are kind of most familiar with SELinux
as the example of this.
If you're unfamiliar with SELinux,
it's a way of locking down the Linux kernel.
Honestly, it kind of comes with a negative connotation
because mandatory access control is very verbose
and very difficult to get right because you have
to enumerate absolutely everything.
Some people say that the three-letter agency
at the US government that created this
are the only people who actually know how
to configure this correctly.
I don't know if that's actually true or how many people use it.
I know Red Hat is one of the folks that actually
does promote SELinux.
But the one thing about this slide I really
wanted to kind of drive home is these ideas,
they're as old as the military and war itself.
There's nothing novel about the 80s where these ideas got
invented.
But what actually happened was someone only actually ever
thought to write this down in the 80s.
So it took that long after using these ideas for many, many,
many years.
So we jump roughly 10 years, 9 years to 1992,
which happens to also be the year I was born.
That makes me feel relatively old.
But in 1992, we get this paper published on role-based access
control.
And role-based access control, often called
RBAC, is kind of where actually most people
believe the state of the art for authorization systems is.
The core idea is basically there is a group that
is assigned access to a particular thing.
And those groups are called roles.
And then you map users into these roles.
And by means of being in this role,
you get access delegated to you.
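As a rough illustration of that core idea, here is a minimal Python sketch.
It is not from the talk's slides; the role names, user names, and the
`rbac_allows` helper are all made up for the example.

```python
# Hypothetical roles and users, for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "viewer": {"read"},
}

USER_ROLES = {
    "jimmy": {"admin"},
    "guest": {"viewer"},
}


def rbac_allows(user: str, action: str) -> bool:
    """Return True if any role the user is mapped into grants the action."""
    return any(
        action in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Note that `rbac_allows("jimmy", "delete")` says nothing about what is being
deleted; this sketch has no notion of a resource at all.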
The kind of number one problem with RBAC
is that everyone kind of defines it differently.
If you build any enterprise software,
you're going to talk to clients and they're
going to ask you for RBAC.
But if I look at two different enterprise
applications, they implement RBAC entirely differently.
The only commonality is this mapping of users
into groups that then have access.
This is kind of going to be a recurring theme
across all of these papers published in academia,
anything with *BAC, because they're documenting concepts,
but not actually specifications that would give you
an ultimately cohesive and designed and secure system.
So kind of most famously, the biggest issue kind of with RBAC
is that there really is no scope.
If you say someone is an admin, does
that mean they're an admin of the entire web app?
Does that mean they're an admin of a particular resource
in the app?
You just don't know until you actually build it yourself.
So there's not really an easy way
to reason about these systems until you actually touch them.
So we jump actually well into the future now into 2015.
And now is when the paper on ABAC, which is attribute based
access control, is written.
Effectively, the idea behind ABAC is
to kind of generalize on RBAC and say,
the role that you're assigned, that is just one attribute
that your user can have.
And other attributes might be that you logged in
with this IP address.
Many other dynamic attributes can be assigned to you.
The kind of really important thing about ABAC
is it's providing this real time context.
So now you can kind of write rules,
like are they connecting from this country, this subnet,
this time.
You can delegate access at particular windows of time
and kind of perform more logic on these attributes
that folks have.
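A minimal Python sketch of that kind of rule might look like the following.
The `Request` type, the subnet, the country code, and the time window are
all assumptions picked for the example, not anything from the ABAC paper.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network


@dataclass
class Request:
    role: str         # the RBAC role is just one attribute among many
    source_ip: str    # dynamic attribute: where the user connects from
    country: str      # dynamic attribute: resolved country code
    local_time: time  # dynamic attribute: when the request happens


def abac_allows(req: Request) -> bool:
    """Allow admins connecting from one subnet and country during office hours."""
    in_subnet = ip_address(req.source_ip) in ip_network("10.0.0.0/8")
    in_window = time(9, 0) <= req.local_time <= time(17, 0)
    return req.role == "admin" and req.country == "BE" and in_subnet and in_window
```

The role check is just one clause in the rule; every other clause is
evaluated against context that only exists at request time.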
And now we're going to take a huge digression back to 1965.
So if you're unfamiliar, Multics is actually
this operating system that was developed between MIT, GE,
and Bell Labs.
You might not remember it, but it actually inspired
an operating system you're probably familiar with.
Unix.
So Unix is actually an attempt at making
Multics concepts ported to less expensive hardware.
Multics is often credited as the operating system,
like the first operating system that has access control
for the file system.
I actually don't know if that's true,
but it's often credited as that.
So in Multics, you have a file system tree,
so you get hierarchical structure.
And then at every branch, which would be a file or a directory,
you can have five different attributes assigned to that file.
You get read, write, execute, append.
These are all kind of file operations
that you'd be familiar with.
But there's this fifth one that's super interesting called
Trap, and that actually gives you the ability
to do callbacks and to see functions.
And it was initially designed so you
could do file walking in user space.
But kind of the whole thing with Multics,
and the reason why I bring it up, is that there was
inheritance, there was ABAC, and there
were user-defined functions in an authorization system
in 1965, whereas in academia, the ideas behind attributes
were only published in 2015.
So there are systems using these concepts,
but they maybe haven't been formalized and written down
in the concrete form.
And this is kind of like a huge issue with the whole space,
because people are doing things, but they're not really
studying how to make these systems robust with these ideas.
They're kind of more just documenting these ideas ad hoc.
So getting back to the normal timeline, we hit 2019.
It's actually in 2007 that the term
relationship-based access control is coined.
And the idea behind this is actually
that by establishing a chain of relationships,
like Jimmy is a speaker at FOSDEM and speakers at FOSDEM
have access to the FOSDEM speaker Matrix chat.
If you can follow these chains of relationships,
you can actually go from Jimmy has access
to the FOSDEM speaker room.
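Following that chain can be sketched in a few lines of Python. This is a toy
model under stated assumptions: the tuple layout, the object names, and the
`has_relation` helper are hypothetical and are not how any real system
stores or walks its graph.

```python
# Each tuple reads (resource, relation, subject). A subject is either a user
# or a (resource, relation) pair meaning "everyone with that relation".
RELATIONSHIPS = {
    ("event:fosdem", "speaker", "user:jimmy"),
    ("chat:fosdem-speakers", "member", ("event:fosdem", "speaker")),
}


def has_relation(resource, relation, user, seen=None):
    """Return True if a chain of relationships links user to resource."""
    seen = set() if seen is None else seen
    if (resource, relation) in seen:  # guard against cycles in the graph
        return False
    seen.add((resource, relation))
    for res, rel, subject in RELATIONSHIPS:
        if (res, rel) != (resource, relation):
            continue
        if subject == user:  # direct relationship
            return True
        # Indirect relationship: follow the edge to the referenced set.
        if isinstance(subject, tuple) and has_relation(subject[0], subject[1], user, seen):
            return True
    return False
```

Here `has_relation("chat:fosdem-speakers", "member", "user:jimmy")` succeeds
only because a path exists through the `event:fosdem` speaker relationship.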
So this term is kind of coined around then,
and it's looking forward at what tech in the Web 2.0 era
will look like.
It's published initially while considering
how Facebook, the social graph, works internally.
So when you share photos on Facebook,
you say, friends of friends can view this.
You're literally defining it in terms of relationship
to yourself.
So we hit 2019, and actually that's
when Google publishes a paper called Zanzibar, which
is documenting an internal system at Google powered
by these concepts.
And the difference, and the reason why I have 2019 for ReBAC,
is that Google is documenting a concrete
implementation of this.
Unlike a lot of these other papers talking about concepts,
it's talking about an application of these concepts
and really giving you a framework for how
to use this effectively and in a correct way
across multiple products at Google.
So then in 2021, SpiceDB is open sourced,
which is also implementing similar concepts to Zanzibar.
And obviously, I'm going to get into that later.
There are other *BAC models,
but those are kind of the primary ones
that I see mostly in industry.
You can dive into Wikipedia if you're interested in other ones.
But now you've got kind of the industry side of things.
We're leaving academia.
And industry has this problem, which is
they go to build a web application.
And your first job is just build the MVP, the minimum viable
product of your web application.
So what you're going to do is do what you do with everything
in a web application, which is store data in a database,
probably the relational database you're using
for everything else.
And you're going to try to check if a user has particular access
based on some data you store in the database.
It's maybe going to be a role if you're inspired by RBAC.
But maybe it's just an enumeration of the list of users
that can do a particular thing.
So you may have written code that looks like this.
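The slide itself is not captured in this transcript, so what follows is only
a guess at the general shape of such code. It is a minimal Python sketch
using an in-memory SQLite database; the `document_access` table, its
columns, and the `can_edit` helper are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (document, user, role) grant.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_access (document_id INTEGER, user_id INTEGER, role TEXT);
    INSERT INTO document_access VALUES (1, 42, 'editor');
    """
)


def can_edit(user_id: int, document_id: int) -> bool:
    """Check access by looking up a role stored next to the application data."""
    row = conn.execute(
        "SELECT 1 FROM document_access "
        "WHERE document_id = ? AND user_id = ? AND role IN ('owner', 'editor')",
        (document_id, user_id),
    ).fetchone()
    return row is not None
```

The authorization rule (which roles may edit) is baked into the query, so
any change to the rule means changing and redeploying application code.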
But the problem is this falls over at some point in time,
whether fundamentally you build a system that actually
is just really slow, or you have to build a new system that
is way faster than you ever intended it to be.
Or you basically get users of your software
that demand new functionality that is not actually
possible for you to implement until you refactor
your authorization code.
So a great example of that is if they want recursive teams.
So if you have groups of users, what
if you have groups of groups?
Or groups of groups of groups, right?
That is something that most people cannot build,
or they don't build in their initial MVP.
And when you get functionality like that,
you're forced to completely rewrite your authorization
system.
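To see why nesting changes things, here is a hedged Python sketch of what a
groups-of-groups lookup can turn into on a relational database. It uses
SQLite's recursive common table expressions; the `group_members` table, the
`group:` and `user:` prefixes, and the `in_group` helper are all assumptions
made for the example.

```python
import sqlite3

# Hypothetical table: a member is either "user:<name>" or "group:<name>".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE group_members (group_name TEXT, member TEXT);
    INSERT INTO group_members VALUES
        ('engineering', 'group:backend'),
        ('backend', 'group:storage'),
        ('storage', 'user:jimmy');
    """
)


def in_group(user: str, group: str) -> bool:
    """Return True if user belongs to group directly or through nested groups."""
    row = conn.execute(
        """
        WITH RECURSIVE reachable(name) AS (
            SELECT ?
            UNION
            -- strip the 'group:' prefix (6 characters) to get the nested group name
            SELECT substr(gm.member, 7)
            FROM group_members gm
            JOIN reachable r ON gm.group_name = r.name
            WHERE gm.member LIKE 'group:%'
        )
        SELECT 1
        FROM group_members gm
        JOIN reachable r ON gm.group_name = r.name
        WHERE gm.member = ?
        LIMIT 1
        """,
        (group, "user:" + user),
    ).fetchone()
    return row is not None
```

A flat membership check is a single indexed lookup; this version walks an
unknown number of levels on every request, which is the kind of change that
tends to force a redesign.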
The other thing that could happen to you
is your company buys another company,
and they're based in a different continent.
And that means all the requests for checking permissions
now have to travel across an ocean if they want to be correct.
That's a huge problem.
And making sure that the performance is actually
going to be viable, and the answers you're
going to get for authorization questions are correct
is a difficult problem.
So you hit one of these kind of big issues,
and then you kind of are forced to enter the cycle
that I'm going to get into.
But these numbers are kind of fudged.
But the whole point is that if you
take an engineer, probably with expertise in that web app,
who has worked on this authorization system,
it's going to take them a while to implement this.
It's going to be super sensitive because someone else is
going to have to review it.
That person is going to also have to be deeply embedded
in that code base.
They're going to be extraordinarily careful
because any mistake that happens in this code base
is going to be a CVE.
It's giving access to people that shouldn't
otherwise have access.
So that's going to take a long time.
Then you're going to do QA. You might actually
have to perform a security audit before you can deploy
this software because you're deploying
to enterprise environments.
And then you're also probably going
to want to take extra time rolling out these changes
into production.
You probably don't want to deploy it to everyone all at once.
You probably want to deploy to a minor subset
just in case you find something wrong with the code.
And all of this just takes time.
And the problem is it's actually putting security
of your software at odds with development velocity.
Fundamentally, it's going to take you too long
to add this functionality.
And you're going to want to take shortcuts.
But shortcuts are security flaws in your software.
So then it's rinse and repeat. You basically
don't know how long until the pain is going to build up
where you're forced to rewrite these authorization systems.
And that is the mystery box entirely.
You could finish or not even be finished rewriting
your authorization system.
And then all of a sudden, a new user
sets some requirement for you.
And you're doomed.
You have to completely rewrite the thing you just thought
you re-architected to be future proof.
So how do we fix this?
This is a never-ending cycle.
And OWASP themselves actually have recommendations for this.
They say you should no longer adopt RBAC,
but take concepts from ABAC and ReBAC.
Obviously, I'm biased towards ReBAC
because I think it's a more modern approach to this.
But the OWASP folks also give you some high level benefits
to why you would do something, like why
you would adopt these new ones over RBAC.
I'm going to just take this from the ReBAC perspective.
When you're doing a graph-like thing,
a relationship-based system, you're forced to basically talk
about individual entities.
So this user, Jimmy, has access to this particular document.
Because you're doing that, it has this kind of buzzword,
fine-grained.
You're not resolving Jimmy to a role or a group.
You're actually following Jimmy directly to the document.
So you're talking about individual entities in the system.
So as a result, you get actually more fine-grained access.
I'm not trying to generalize about any users
or paint over anything.
I'm actually talking about the exact objects I care about.
And that means you can actually develop systems
where you delegate access to a particular row in a database
or a cell in a spreadsheet.
And all of these systems are designed for speed
because they understand they're going
to have to store a lot of data to be this fine-grained.
And then because your applications
are only talking about the direct objects that they care about,
any of the relationships in the in-between
don't get written in your code.
So you just ask the question, can this user
perform this action on this thing?
How they got access to that?
And if you ever refactor or change
how they get access to that, that
does not live in your code base anymore.
That means you can make changes to your permission system
and not change a single line of code
in any of your web applications.
And believe me, when you do that for the first time,
it is a magical feeling because you
don't have to touch any code.
So then there's also multi-tenancy and management
ease.
And this is just simplicity around modeling.
And then with ABAC and ReBAC systems,
you're paying it forward.
So RBAC might be really easy conceptually for you
to implement at the beginning.
But these systems, the ABAC and ReBAC ones,
they're more focused on forward thinking.
If you need to make changes, like I just described,
you can change ReBAC designs without changing code.
It may be a little bit more effort
for you to get started in building and integrating
with one of these systems.
But by day two, if you ever need to make a change,
it's going to pay dividends.
So I wanted to get deeper into this Zanzibar paper
I talked about earlier, which kicked off the interest in ReBAC
that you see today.
Basically, Zanzibar is a purpose-built graph database
that is very specifically optimized for one thing, which
is finding a path in a graph.
And by virtue of finding that path,
that means that the user has access
to that particular thing.
It's actually one of the few good things
that came out of Google+.
So there's only two things that came out of Google+.
There is Zanzibar internally at Google and then Google Photos.
The novelty of this paper is actually
that it is solving an authorization problem
with a focus on distributed systems.
So if you'll notice, the title of the paper
is Zanzibar: Google's Consistent, Global Authorization
System.
So it is fundamentally trying to tackle authorization
as a distributed systems problem, which is not really
something anyone else has done in the past,
because they kind of acknowledge that if they're
going to deploy one system at Google,
it needs to work across all geos in the world.
And it has to be extremely, extremely reliable,
and it can never be wrong.
These are really difficult requirements.
But the anecdote I like to use is when you're
on a cloud provider like Amazon and you go to provision
something like, say, an S3 bucket,
you're always choosing what region.
But actually, if you go to set IAM rules in a cloud provider
like Amazon, you don't pick the region.
That is because these systems fundamentally
have to be global.
And when you're designing them yourself at a particular scale,
you need to think about how you're
going to make your system global.
And so this paper actually inspired two companies,
Carta and Airbnb, to go forward and implement
their own internal systems based on the ideas in this paper.
None of them are truly 100%, I would say,
authentic to the original paper, but rather the paper
fused with the requirements of their business at the time.
So I think the real superpower to Zanzibar, though,
is this, which is if you go to send someone a Google Doc in Gmail
and they don't have access, Gmail will pop up a box
and tell you, hey, you didn't give access to this person.
That fundamentally means that Gmail actually
has a way to ask questions and check permissions that
are built into Google Drive.
So that means you could have one central source of truth
for authorization data that your whole application suite can
share, microservices can share.
And this is incredibly powerful because not only does it
allow integrations like this, but it also
lets you have that central source of truth
where if you need to audit something,
you can just ask that one service.
It's the only service you have to trust.
It's the only service that you have to query
if you're trying to really dig into any of this data.
So you have a problem like an outage or something, an incident,
and you need to understand what the access control looked like.
So you might be wondering, how do I Zanzibar?
So this is exactly what we set out to do.
Basically, the year after the paper was published,
my co-founders and I left Red Hat to found AuthZed
and basically build SpiceDB as open source.
There were some folks experimenting with the ideas
around ReBAC at the time.
But no one was really moving the needle towards making this
a production thing that you could use
in a real enterprise environment or at a real tech company.
We originally prototyped the thing in Python.
It was type annotated, lazily evaluated, functional Python.
So it was way faster than you'd ever think Python should be,
but it was not fast enough, so we ended up rewriting it
in Go and open sourcing that.
The name is actually inspired by Dune
because internally at Google, the project
was actually called Project Spice because the ACLs must flow.
So the timing for that has actually
been really good with all the Dune resurgence in the movies,
but internally at AuthZed, all of our software
is named after Dune references as an homage.
But if we fast forward to today, the SpiceDB community
has actually gotten contributions
from a lot of companies, big names like Netflix, GitHub,
Google, Red Hat, and Plaid.
And there are production users in small companies, startups,
where it's just the co-founders, all the way up to Fortune 50
companies.
But I still haven't actually told you what SpiceDB is.
So SpiceDB is, as I described with Zanzibar earlier,
this extremely parallel graph database.
So developers basically apply a schema,
just like you would for a relational database.
And I've given an example schema here,
kind of modeling a Google Doc.
And then what they do is they store data inside that database
and query that data according to that schema.
And it's really magic where you can actually make schema changes
in a forward compatible way that
lets you actually modify your permission systems
without changing any code.
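The idea of changing the permission model without touching the caller can be
sketched in plain Python. To be clear, this is a toy model and not SpiceDB's
schema language or API: the dictionary layout, the relation names, and the
`check` helper are all hypothetical.

```python
# A toy "schema": each permission is the union of stored relations
# and/or other permissions defined on the same kind of object.
SCHEMA_V1 = {"document": {"view": ["reader", "writer"], "edit": ["writer"]}}
# A later revision: commenters may now view, and view reuses edit.
SCHEMA_V2 = {"document": {"view": ["reader", "commenter", "edit"], "edit": ["writer"]}}

# Stored relationship data: (resource, relation, user).
RELATIONSHIPS = {
    ("document:roadmap", "writer", "user:jimmy"),
    ("document:roadmap", "commenter", "user:alice"),
}


def check(schema, resource, permission, user):
    """Return True if the schema grants user the permission on resource."""
    kind = resource.split(":", 1)[0]
    permissions = schema[kind]
    for name in permissions[permission]:
        if name in permissions:  # another permission: expand it
            if check(schema, resource, name, user):
                return True
        elif (resource, name, user) in RELATIONSHIPS:  # a stored relation
            return True
    return False
```

The application always asks the same question,
`check(schema, "document:roadmap", "view", "user:alice")`. Swapping
`SCHEMA_V1` for `SCHEMA_V2` flips the answer for Alice without any change
to the calling code or to the stored relationships.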
So we don't actually have a SQL API,
despite being a database.
We give you gRPC and HTTP APIs.
And effectively, the primary interface
we recommend is gRPC for latency reasons.
Because authorization is in the critical path of everything
your web applications are going to do,
and possibly everything at your business,
you really have to make sure the stuff is fast.
Thus, everything needs to be kept in memory.
Everything needs to be returned in single digit
milliseconds.
So gRPC is actually pretty critical for that.
And then in addition to the actual main server,
we also expose servers for powering dev tools.
So you can get auto-complete and things in your editor.
But then also integration testing services.
So it's Kubernetes native,
designed that way from the beginning. Our background
is all in Kubernetes.
So actually, SpiceDB is self-clustering.
So if you deploy just SpiceDB directly onto Kubernetes,
it will discover other nodes and actually
start to divide and shard up the in-memory graph
that it's using to actually serve this data across them
automatically.
We also offer a SpiceDB operator in the open source,
which will then do automated updates for SpiceDB.
Notoriously, having zero downtime updates for a database
is very tricky.
So we just took that problem off the table for most people
and just implemented it automatically for anyone
using Kubernetes.
So we remain true to Zanzibar's goals of consistency
at scale.
So we actually have pluggable data storage systems.
And basically, depending on what your requirements are,
say you need to deploy everywhere in the globe,
you can actually store all of your raw relationship data
in something like Spanner or CockroachDB.
And then you can deploy regional deployments of SpiceDB
that will exist as independent caches for those geos.
But fundamentally, they're sharing all the same core data
and they're consistent across those environments.
If that sounds too complicated for you
or you don't really need that, you're just a single region shop,
that's fine.
We also have deep integrations with Postgres or MySQL
if you just want to use something like Aurora or Amazon
RDS.
Obviously, then there's also in-memory for testing.
We also have a tool called Zed.
Zed is the command line tool.
It basically manages cluster credentials and backups.
It gives you a command for every single SpiceDB API.
And I just kind of give an example
of running a permissions check with debug flags.
You can actually see it gives you a whole graph traversal.
It shows you a tree of how you actually
computed whether or not someone has access
with timing data associated with all that.
So you can see where things slow down.
We have a web IDE.
So actually, the two things you just saw, SpiceDB and Zed,
we compile to WebAssembly and then run that in the browser.
And then we basically build that all on top of Monaco,
the engine that powers VS Code.
And give you a full IDE where you
don't have to install any of the software I just showed you.
You can just go to play.authzed.com
and start playing with this stuff.
Run Zed against live data.
You can load in test data.
And what we actually do is we can generate exhaustively
all of the paths available in the graph for you.
So there's somewhat of a model checking happening here.
So you can actually prove exhaustively
that all of the ways you can traverse the graph
are the ways you think they are.
And that basically lets you prove that a system is correct
without you deploying it into production
or having someone do an extremely long security
audit on your process.
And then you can check this stuff into CI/CD.
So if you make a change to the schema,
you can actually guarantee that certain assertions always
pass and that everything is exhaustively checked.
So Zanzibar is not a silver bullet.
We actually have had to extend Zanzibar
in a bunch of different ways.
So SpiceDB remains true to all of the core concepts
that you'll find in Zanzibar.
But not everyone is Google.
So effectively, not everyone relies on users
being represented the same way.
So we are kind of more flexible
with how people can model their own users.
And then we kind of add on developer experience
because at Google they can say,
you're forced to use the software.
When you're building open source software,
you can't force people to use your software.
You have to compel them to use your software
by having a better experience
than what they're currently doing.
We've also added kind of contextual relationships
with ABAC.
So that means relationships can actually exist
basically dynamically based on context
that you provide at runtime.
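One way to picture a contextual relationship is a stored relationship that
carries a condition, evaluated against context supplied with each check. The
Python sketch below is only an illustration of that idea; it is not
SpiceDB's API or syntax, and the data layout, the cutoff date, and the
`check` helper are hypothetical.

```python
from datetime import datetime, timezone

# (resource, relation, user) -> optional condition over request-time context.
# None means the relationship holds unconditionally.
RELATIONSHIPS = {
    ("document:roadmap", "viewer", "user:jimmy"): None,
    ("document:roadmap", "viewer", "user:contractor"): (
        lambda ctx: ctx["now"] < datetime(2024, 3, 1, tzinfo=timezone.utc)
    ),
}


def check(resource, relation, user, context):
    """Return True if the relationship exists and its condition, if any, holds."""
    key = (resource, relation, user)
    if key not in RELATIONSHIPS:
        return False
    condition = RELATIONSHIPS[key]
    return condition is None or bool(condition(context))
```

The contractor's relationship is always stored, but it only counts while the
`now` value passed in the context is before the cutoff.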
That was a joint project with Netflix.
So if you're wondering how you SpiceDB,
you can go to our Discord,
discord.gg slash SpiceDB or check out GitHub,
basically anywhere on the internet
where you expect to find open source projects.
SpiceDB is there.
So thanks everyone.
Thank you.