[00:00.000 --> 00:12.360]  Hello. Can you see my slides? Yeah. I have only white background, so the light at the
[00:12.360 --> 00:23.520]  top shouldn't be a big issue. And yeah, we're good to go. Okay. Hi, I'm Stefan. I work on
[00:23.520 --> 00:28.040]  generally open source stuff as a freelancer, and I'm here to present something I've been
[00:28.040 --> 00:35.960]  working on as a side project in the last few months. This is part of the Game of Trees project,
[00:35.960 --> 00:43.360]  which I started in November 2017 at an OpenBSD hackathon in Berlin. It's compatible with
[00:43.360 --> 00:49.560]  Git repositories and the Git network protocol, but apart from that, it's not trying to replicate
[00:49.560 --> 00:55.520]  Git specifically, but it's just the idea to reuse these formats because they're very widely used.
[00:55.520 --> 01:03.440]  And they're fairly okay and well designed, so we can just keep using them and not make up our
[01:03.440 --> 01:12.040]  own. And yeah, because it's written on OpenBSD, it uses a lot of OpenBSD specific APIs. There's
[01:12.040 --> 01:18.520]  actually a portable version that's maintained by Thomas Adam, who also does the tmux terminal
[01:18.520 --> 01:26.640]  multiplexer portable version, and you can install this on various systems. And I think Thomas always
[01:26.640 --> 01:31.400]  likes to also explore more options for other systems. If you're interested, if yours is not
[01:31.400 --> 01:37.240]  listed, you can talk to him. And yeah, it's ISC licensed because it aims to be basically as
[01:37.240 --> 01:43.840]  pleasing to OpenBSD developers as possible. That's the whole idea. Now, what we currently have
[01:43.840 --> 01:51.280]  is what's working really well is the client side. And this is basically sort of feature complete at
[01:51.280 --> 01:56.600]  this point. You might want to have some more convenience things, but all the basics are there.
[01:56.600 --> 02:03.400]  Everything is working. You have several frontends which I'll present in the following slides. You
[02:03.400 --> 02:08.040]  have a lot of code that's shared by these frontends, which I've labeled library here because it's in
[02:08.040 --> 02:13.320]  the lib directory of the source tree. One thing that this program does, which is very specific,
[02:13.320 --> 02:20.360]  is that it will not touch repository data outside of programs that are separate and are called
[02:20.360 --> 02:27.800]  libexec helpers. From the program's point of view, if you use the library, you don't see this. You
[02:27.800 --> 02:32.520]  just say, like, open a repository and fetch me some objects and so on. But internally, it will
[02:32.520 --> 02:38.480]  actually start other programs that restrict themselves a lot using pledge and unveil and so
[02:38.480 --> 02:45.920]  on. And those will actually parse the repository data. This is the current list of commands. And
[02:45.920 --> 02:51.320]  I'm quite happy with this set actually. I've been working with this set for the last five years or
[02:51.320 --> 02:58.480]  so. They've slowly been added over time. But I feel very productive with these. And I don't miss
[02:58.480 --> 03:02.920]  anything. I know that some people would like some additional things. But at this point,
[03:02.920 --> 03:07.480]  we're mostly like fine tuning. And you can read the manual page on this URL if you like.
[03:07.480 --> 03:12.640]  You can actually read it from start to finish in order to get a good idea of how the system works
[03:12.640 --> 03:19.120]  and how it's supposed to be used. There's also a gotadmin utility which sort of mirrors cvs
[03:19.120 --> 03:24.760]  admin or svnadmin in the sense that if you're doing something that only requires like specific
[03:24.760 --> 03:30.520]  things where you do something with a repository specifically, you would use that command. This
[03:30.520 --> 03:35.040]  isn't complete. There are some things that I would still like to add here, which we'll go into
[03:35.040 --> 03:41.240]  later. But it's already prepared a lot of code for the server that I'll talk about. Because
[03:41.240 --> 03:47.240]  for example, dealing with pack files is necessary for the server as well as this tool. We have a
[03:47.240 --> 03:55.240]  curses-based, command-line terminal browser thing. You can read commits with that and look at
[03:55.240 --> 04:01.880]  diffs and blame files and so on. It's working really well. And most recently, there's a developer
[04:01.880 --> 04:06.400]  Mark Jamsek who added a lot of convenience to this like vertical scrolling, diffstat display and
[04:06.400 --> 04:11.960]  all sorts of nice things. It doesn't work quite as well on repositories that have a lot of merge
[04:11.960 --> 04:18.400]  commits. I found that some repositories are hard to browse if they use a lot of merges. But for
[04:18.400 --> 04:22.880]  simple repositories, it's really good. And if something is missing and you feel like you would
[04:22.880 --> 04:27.600]  like to use this on a repository with lots of merges, you can please make suggestions as to
[04:27.600 --> 04:33.280]  what we could improve there. You also have a web front end, which is sort of like CVS web or
[04:33.280 --> 04:41.440]  ViewVC. And it's also using the Got code internally to show you files in a web browser and commits
[04:41.440 --> 04:47.160]  and logs and so on. That's written by Tracey Emery. And most recently, Omar Polo has been doing a
[04:47.160 --> 04:52.400]  lot of refactoring there and added a templating mechanism, for example, to deal with generating
[04:52.400 --> 04:58.480]  the HTML, not from printf, but with something more generic. And it's quite nice. It also has
[04:58.480 --> 05:03.440]  RSS feeds for tags, which is probably really outdated, but I think it's kind of nice. You can
[05:03.440 --> 05:15.560]  be notified of new releases that way. Okay, so about the server. So one of the major
[05:15.560 --> 05:20.320]  milestones for any version control system that's ever been developed is that eventually you want
[05:20.320 --> 05:26.320]  to be self hosting. And so far, we've been using a Gitolite setup for this project. And that's
[05:26.320 --> 05:31.960]  working well, but I would really like to be able to run this on an OpenBSD server using my own
[05:31.960 --> 05:37.400]  code. So after putting this off for a long time, because I always thought it would be a lot of
[05:37.400 --> 05:42.200]  work, I finally ran out of things to do on the client side and said, okay, I'm going to look into
[05:42.200 --> 05:48.760]  server things now and started talking to people at hackathons in September and summer last year,
[05:48.760 --> 05:55.720]  basically, and started working in September. By now, you can install it on OpenBSD current. It's
[05:55.720 --> 06:02.720]  not yet in the portable version. Thomas and Omar were going to look at that, but it might take
[06:02.720 --> 06:12.200]  some time still, but eventually it should arrive there. Now, the main use cases I want to support
[06:12.200 --> 06:17.440]  with this are exactly two. One is, of course, I want to be self hosting for my own open source
[06:17.440 --> 06:23.320]  projects and maybe also private repositories. And the other is I want to enable what OpenBSD is
[06:23.320 --> 06:29.440]  using now with CVS, which is anonymous distribution of source code over SSH, where you know that the
[06:29.440 --> 06:35.320]  server you talk to is genuine and should have the right source code for you, but the client doesn't
[06:35.320 --> 06:42.280]  need to authenticate. And every time I want to get source code from a platform like GitHub or
[06:42.280 --> 06:49.640]  GitLab or other forges that exist with Got, I have to upload an SSH key because they will not
[06:49.640 --> 06:56.000]  accept my SSH connection. And because Got only uses SSH, it doesn't implement HTTP support. This
[06:56.000 --> 07:02.960]  is really annoying. And it's not really a technical problem to do this. It's just basically that in
[07:02.960 --> 07:09.080]  their software, they didn't foresee this use case. But I think it's very nice. And you can actually
[07:09.080 --> 07:15.680]  go and try this now if you like. This is the code that I'm talking about running on a server and
[07:15.680 --> 07:23.200]  it's serving Got code and Got portable. You have the host key fingerprints, which you can now take a
[07:23.200 --> 07:29.680]  photo of or whatever. It's also on the website. And yeah, if all of you all at the same time would
[07:29.680 --> 07:37.280]  now go and trigger this, you'd probably trip my SSH rate limiter, especially if FOSDEM is behind
[07:37.280 --> 07:43.640]  NAT, which I hope not. But yeah, be gentle. Maybe if you want to clone from this repo,
[07:43.640 --> 07:48.520]  pick a slide number in your head from between 10 and 37. And when the slide comes up,
[07:48.520 --> 07:57.720]  you start your clone, then you'll be fine. So yeah, I'd like to explain a bit what the
[07:57.720 --> 08:03.520]  Git protocol is doing because without knowing this, you will not understand what a server should be
[08:03.520 --> 08:11.200]  doing. And it turns out that if you leave out HTTP and all this stuff and just concentrate on the
[08:11.200 --> 08:16.560]  plain Git protocol, it's actually really quite simple, if you also ignore some
[08:16.560 --> 08:20.120]  protocol extensions, which we haven't implemented yet. So this is like really a bare-bones clone
[08:20.120 --> 08:25.640]  that we will go through. It's not very complicated. The main thing to understand is
[08:25.640 --> 08:31.640]  that when you're using SSH, the Git client will actually go and run the login shell of the user
[08:31.640 --> 08:37.520]  and then give that a command to run. And Git basically hardcoded the names of these executables
[08:37.520 --> 08:43.840]  in its protocol. So you cannot speak the Git protocol without calling git-upload-pack on the server
[08:43.840 --> 08:50.080]  when you log in, right? Also there's git-receive-pack for the other side, when you're
[08:50.080 --> 08:57.920]  sending something. Anyway, if you run got clone with the -v flag, you will see a
[08:57.920 --> 09:06.160]  trace that is very similar to what I'm showing now. I've left out a few bits. But initially,
[09:06.160 --> 09:13.040]  so this is only Git protocol version zero slash one. Git protocol version two changed some
[09:13.040 --> 09:18.400]  things in a good way. But I haven't implemented that. So we're seeing a version one trace.
[09:18.400 --> 09:26.960]  Initially, the server just sends one message which says, one of the branches I have has this
[09:26.960 --> 09:32.280]  commit hash and this name. And oh, I also have some capabilities. You can see in the trace,
[09:32.280 --> 09:39.040]  these are hidden behind a null byte. Because I suppose very old versions of Git clients didn't
[09:39.040 --> 09:43.520]  really understand the capabilities yet and the null byte made them not read that part of the
[09:43.520 --> 09:48.240]  message. And also for version two, they did the same thing, hiding a version
[09:48.240 --> 09:52.480]  announcement behind two null bytes, because then the next kind, you know, this is a bit
[09:52.480 --> 09:58.800]  hacky, but seems to work. Don't worry about the capabilities. It's not important what
[09:58.800 --> 10:04.160]  they are. What's important to understand also is that each message is wrapped in a packet line,
[10:04.160 --> 10:10.880]  they call it. And that's simply a length plus data framing format for these messages.
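The packet-line framing described here can be sketched in a few lines of Python. This is an illustration, not code from Got: the function names are made up for this example, and the commit ID in the usage note is a placeholder. It assumes the framing used by Git, where a four-digit hexadecimal length counts its own four bytes and `0000` is the flush packet.

```python
FLUSH_PKT = b"0000"  # zero-length packet that terminates a list


def pkt_line(payload: bytes) -> bytes:
    """Frame one message: 4 hex digits of total length, then the data."""
    length = len(payload) + 4  # the length field counts itself
    if length > 0xFFFF:
        raise ValueError("payload too long for a 4-hex-digit length")
    return b"%04x" % length + payload


def read_pkt_lines(stream: bytes):
    """Yield payloads from a byte string until a flush packet is seen."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:  # flush packet: the sender is done
            return
        yield stream[pos + 4:pos + length]
        pos += length


def split_capabilities(first_payload: bytes):
    """Split the first advertised line into commit ID, ref name, capabilities.

    The capabilities sit behind a NUL byte after the reference name.
    """
    ref_part, _, caps = first_payload.rstrip(b"\n").partition(b"\0")
    commit_id, _, refname = ref_part.partition(b" ")
    return commit_id.decode(), refname.decode(), caps.decode().split()
```

For example, `pkt_line(b"hello\n")` produces `b"000ahello\n"`, and `split_capabilities` applied to a placeholder ID followed by ` HEAD`, a NUL byte and `multi_ack side-band-64k` returns the ID, `"HEAD"` and the two capability names.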
[10:10.880 --> 10:18.120]  So then the server keeps sending messages for every branch it has. And here's one more,
[10:18.120 --> 10:23.840]  its main branch happens to be the same as HEAD, because HEAD is a symlink to main, but you know,
[10:23.840 --> 10:29.480]  not important. And the client just keeps storing these. And eventually the server sends a flush
[10:29.480 --> 10:36.120]  packet, which is just a zero length packet and says, okay, I'm done. And in response to which
[10:36.120 --> 10:41.120]  the client will tell the server what it wants. So the client sends similar messages also includes
[10:41.120 --> 10:45.280]  its capabilities in the first message it's sending. And basically says, oh, yeah, I want this
[10:45.280 --> 10:49.800]  commit and this commit and this commit. And eventually it also sends a flush packet to
[10:49.800 --> 10:57.960]  terminate that list. Now if we're doing a clone, right? So we have nothing. But if we already
[10:57.960 --> 11:03.520]  had commits, we could now tell the server what we have by sending have lines, which look just
[11:03.520 --> 11:08.920]  the same as the want lines with more commit IDs. And the server then builds a second set of commits
[11:08.920 --> 11:13.360]  in its memory to say like, oh, okay, the client has all of these already, I don't need to send
[11:13.360 --> 11:18.320]  those and don't need to send any objects that are hanging off these commits. It's basically just an
[11:18.320 --> 11:22.960]  optimization to keep the pack file small that will be sent next, right? So you're not doing a full
[11:22.960 --> 11:27.080]  clone every time. You do a full clone initially. And then once you have something, you tell the
[11:27.080 --> 11:33.280]  server what you already have. So you only fetch the new stuff. And yeah, because we're doing a clone,
[11:33.280 --> 11:38.640]  we're just sending the server "done". And now the client's protocol is already finished. So this
[11:38.640 --> 11:44.360]  is basically the last message the client will ever send. And the server sends one more message in
[11:44.360 --> 11:49.200]  response, which is in this case, a NAK, not acknowledged. I don't know why they chose these
[11:49.200 --> 11:56.280]  words, ACK and NAK. But essentially what these do is, for a NAK, the server keeps sending NAKs
[11:56.280 --> 12:00.720]  while the client is sending have lines to say like, I haven't found a common ancestor yet,
[12:00.720 --> 12:07.120]  please send me more. Because without a common ancestor, the server cannot determine a subset
[12:07.120 --> 12:12.520]  of the commit graph to use for the pack file. Because if the client sends totally unrelated
[12:12.520 --> 12:17.120]  commit hashes the server doesn't know, then the server cannot use this to optimize the pack
[12:17.120 --> 12:23.000]  file. So it keeps sending NAK. And in another case where you would have a common ancestor,
[12:23.000 --> 12:27.960]  the server would send an ACK and commit hash. And the client would then stop sending have lines
[12:27.960 --> 12:36.960]  for this branch. The exact details of this part of the protocol are a bit complicated. And they
[12:36.960 --> 12:44.360]  kept adding extensions to this behavior. So the actual NAK and ACK processing depends on various
[12:44.360 --> 12:50.240]  options that you can set in the protocol, which are all documented in the Git docs. But it's not
[12:50.240 --> 12:55.240]  important for us here now. Basically, the server just tells us, well, I have no common ancestors
[12:55.240 --> 13:02.520]  because you don't have any commits. That's fine. And then the server starts calculating the set of
[13:02.520 --> 13:09.800]  objects it wants to put in the pack file. And what's shown here as coloring, Git calls something else.
[13:09.800 --> 13:15.840]  It calls this like counting and enumerating. I don't know which step does what. But what we do is we
[13:15.840 --> 13:21.400]  have the whole graph and we keep coloring nodes in the graph. It's kind of like mine or theirs or
[13:21.400 --> 13:25.520]  something like this. And then eventually we have a subsection, which in this case would be all of
[13:25.520 --> 13:31.800]  it. And you have all the commits first, and then you go through these commits and traverse all the trees
[13:31.800 --> 13:37.160]  and collect all the trees and blobs that you need to include for the client. And then you have a
[13:37.160 --> 13:44.480]  lot of objects. And you sort them in a certain way. And you go through and check whether you
[13:44.480 --> 13:49.320]  already have a delta for any of these objects and whether the delta base will also be included in
[13:49.320 --> 13:55.080]  the pack it's sending so that you can avoid creating a delta for this object. You just reuse the delta
[13:55.080 --> 13:58.440]  that you already have somewhere, which is an optimization for performance and very important.
[13:58.440 --> 14:03.160]  If you don't do that, your server is going to be super slow. And then you deltify some of the
[14:03.160 --> 14:08.280]  rest of the objects and you're good to go. Now you know what you need to know to start
[14:08.280 --> 14:14.360]  generating a pack file stream. And you start sending this out to the client. And the client
[14:14.360 --> 14:21.640]  downloads it. Once it has everything, it indexes the pack, which is a step where you have the pack file,
[14:21.640 --> 14:26.800]  which is full of compressed and deltified objects. You don't know what's in it because the server
[14:26.800 --> 14:31.200]  didn't tell you anything about the objects. You just told the server, send me this. The server
[14:31.200 --> 14:36.240]  sends you something. Now you don't know what's in there. And to use the pack file, you always need
[14:36.240 --> 14:41.840]  to have an index for it, which tells you which object ID is at which offset in the pack file.
[14:41.840 --> 14:47.600]  So you just read the whole thing. And because Git uses intrinsic object identifiers, you can
[14:48.240 --> 14:52.720]  calculate the IDs yourself based on the contents of the blobs and the trees and the commits and so on.
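As a small illustration of what "intrinsic object identifiers" means, the sketch below computes the ID of an object purely from its contents. It assumes a SHA-1 repository and the usual Git object header of type, space, decimal length and a NUL byte. The function name is made up for this example, and real indexing also has to decompress objects and resolve deltas first.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the hex object ID for content of the given type.

    obj_type is one of "blob", "tree", "commit" or "tag".
    """
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in SHA-1 repositories.
print(git_object_id("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, the client can rebuild the whole ID-to-offset index locally without trusting any metadata from the server.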
[14:52.720 --> 15:00.000]  So you build that up. And then for any of the deltified objects, you also need to make sure
[15:00.000 --> 15:05.440]  that you can actually combine all the deltas to get the right content. And that's the last step.
[15:06.080 --> 15:09.600]  That takes quite a while. On a big pack, anyway,
[15:09.600 --> 15:12.960]  it takes a long time. And then once you have that, you know, okay, I have this pack.
[15:13.760 --> 15:18.000]  The commit I wanted is in there. All the objects that are hanging off of it are,
[15:18.000 --> 15:23.200]  you know, by nature of the hashing structure that Git is using are there. So that's fine,
[15:23.200 --> 15:30.240]  we're going to use this. Then you just create a reference for the Git client to find its initial
[15:30.240 --> 15:36.560]  commit and you can use the repository. In the push case, it works slightly differently.
[15:38.240 --> 15:44.160]  You still have this reference list announcement at the beginning. And instead of saying what it
[15:44.160 --> 15:49.280]  wants, the client proposes reference updates to say, oh, I would like to change the main branch
[15:49.280 --> 15:53.920]  to point to this commit. And I would like to change or add this tag or something like this.
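A reference update proposal of this kind can be sketched as a small parser. This is an illustration under assumptions, not code from Got: it assumes the Git pack protocol's command format of old ID, new ID and reference name separated by spaces, with optional capabilities behind a NUL byte on the first command, and an all-zero ID meaning "no object". The function name and the IDs used in the example are made up.

```python
ZERO_ID = "0" * 40  # all-zero ID stands for "no object" (SHA-1 sized)


def parse_ref_update(payload: bytes):
    """Parse one proposed update into (action, refname, old_id, new_id)."""
    # Capabilities, if present, follow a NUL byte and are ignored here.
    line, _, _caps = payload.rstrip(b"\n").partition(b"\0")
    old_id, new_id, refname = line.decode().split(" ", 2)
    if old_id == ZERO_ID:
        action = "create"   # the reference does not exist yet
    elif new_id == ZERO_ID:
        action = "delete"   # the reference should go away
    else:
        action = "update"   # move the reference from old_id to new_id
    return action, refname, old_id, new_id
```

A command moving `refs/heads/main` from one commit to another parses as an `"update"`, while a command whose old ID is all zeros, such as adding a new tag, parses as a `"create"`.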
[15:54.640 --> 15:58.320]  And then it just sends a pack file. And then the server has to index this and
[15:58.960 --> 16:03.760]  figure out that everything is fine. And whether it wants to change these references or not.
[16:04.400 --> 16:06.880]  And give feedback to the client to say, like, yes, okay,
[16:07.760 --> 16:13.200]  you have changed this branch or you've added this tag and so on. So that's it for the protocol
[16:13.200 --> 16:17.600]  overview. You can find a lot of documentation in the Git source tree about this.
[16:19.600 --> 16:22.160]  They moved the files recently. So if you have an older
[16:23.040 --> 16:27.040]  Git source checkout, it might still be in Documentation/technical,
[16:27.680 --> 16:31.200]  but in the current version, it's in Documentation/gitprotocol-pack.txt,
[16:31.920 --> 16:35.680]  which is the main one for this, but there are also other,
[16:35.680 --> 16:39.520]  similarly named files, which you can also read if you want to know more.
[16:39.520 --> 16:47.600]  Okay, another thing we need to talk about, because this is important to understand why
[16:48.640 --> 16:53.040]  we would need to write our own server in the first place, because there are already several
[16:53.040 --> 16:59.520]  server implementations, right? Why do we want our own? Well, when you write server software,
[16:59.520 --> 17:04.000]  especially on OpenBSD, there are a few design patterns that we use that are not
[17:04.720 --> 17:08.880]  commonly used elsewhere, I would say. I mean, I've never really seen them used widely outside
[17:08.880 --> 17:15.040]  this project, so it's a bit unique in that way and the way it does things, but these things
[17:15.040 --> 17:20.720]  are important to us. So for example, you know that SSH recently had a release where they had a
[17:20.720 --> 17:25.760]  double free, and the advisory, like yesterday, I think, or two days ago, said like, oh, this is
[17:25.760 --> 17:31.840]  not believed to be exploitable. That is because of this. It's not because SSH code is generally
[17:31.840 --> 17:36.880]  great or something. It's because of the design patterns. And so we want these design patterns
[17:36.880 --> 17:44.720]  to be used. And so one of the things you do is that you split your program into several processes
[17:44.720 --> 17:51.520]  that have different tasks. And for each task, you decide what kind of system calls does this task
[17:51.520 --> 18:01.760]  need? And how can I make sure that a process that has network access isn't also able to
[18:01.760 --> 18:07.280]  start new programs or open files and so on. There's unveil which restricts the view of the file
[18:07.280 --> 18:12.480]  system and allows you to completely hide like your dot SSH directory, for example. And other
[18:12.480 --> 18:16.640]  things. Basically, the program, for example, the Got client, says, I need the repository,
[18:16.640 --> 18:21.040]  I need the work tree, I need /tmp, that's all I need to see. And I don't need to see anything
[18:21.040 --> 18:27.440]  else. When you start new programs, you always fork and exec, which means that when you do the
[18:27.440 --> 18:32.720]  exec, the program will be restarted from scratch and OpenBSD's memory randomization will kick in
[18:32.720 --> 18:36.880]  and load all the code segments and text segments and stuff in different locations again,
[18:38.320 --> 18:42.800]  which you do for every request so that when somebody learns information about the address space
[18:42.800 --> 18:48.800]  from an info leak, they cannot use it on the next request. You have messages over pipes to
[18:48.800 --> 18:53.520]  communicate between these programs. And of course, you will have to have access to files
[18:53.520 --> 18:58.240]  and networks somehow, right, especially in isolated contexts. And there what you do is you pass
[18:58.240 --> 19:04.000]  file descriptors over these pipes so that one process opens resources and the other
[19:04.000 --> 19:08.560]  less privileged one is using them. So these are the these are the patterns we use.
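The file descriptor passing pattern can be tried out with a short Python sketch. It is only an illustration of the general mechanism, not how Got does it internally: it assumes a Unix domain socket pair as the channel, uses the standard `SCM_RIGHTS` ancillary message, and runs both ends in one process to stay short. In a real daemon the two ends would live in separate processes with different privileges. All function names are made up for this example.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one open file descriptor across a Unix domain socket."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo() -> bytes:
    """The 'privileged' side opens a resource, the other side only uses it."""
    privileged, restricted = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()          # stands in for an opened file
    os.write(write_end, b"repository data")
    os.close(write_end)
    send_fd(privileged, read_end)            # hand the resource over
    os.close(read_end)                       # sender no longer needs it
    received = recv_fd(restricted)
    data = os.read(received, 100)            # receiver never called open()
    os.close(received)
    privileged.close()
    restricted.close()
    return data
```

The point of the design is that the receiving side never needs permission to open files itself, so it can run with a much smaller set of allowed system calls.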
[19:11.040 --> 19:16.320]  Okay, and so basically, this is what this is. It's a Git server that runs as this kind of
[19:16.320 --> 19:23.680]  multi-process program. It only supports SSH. Git user accounts are mapped to regular shell accounts
[19:23.680 --> 19:28.000]  because I didn't want to reimplement user management. You can have a special purpose
[19:28.000 --> 19:32.560]  login shell for these users to restrict them, if you want. And access permissions are set
[19:32.560 --> 19:36.720]  per repository. I don't want to go very complicated and make it like per branch or something. It's
[19:36.720 --> 19:40.880]  just like, no, if you have access to the repo, you have access, which is good enough, for example,
[19:40.880 --> 19:44.240]  for OpenBSD's model where you get an account and you can commit anywhere.
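Per-repository access of this kind can be modeled roughly as follows. Everything here is hypothetical: the `Rule` type, the `:group` notation and the "most permissive matching rule wins" policy are assumptions made for this sketch and are not taken from the actual configuration format or matching logic of the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject: str    # a user name, or ":name" for a group (made-up notation)
    writable: bool  # True grants read-write, False grants read-only


def access(rules, user, groups):
    """Return "rw", "ro" or None for one user on one repository."""
    best = None
    for rule in rules:
        if rule.subject.startswith(":"):
            matches = rule.subject[1:] in groups
        else:
            matches = rule.subject == user
        if matches:
            if rule.writable:
                return "rw"   # nothing can be more permissive than this
            best = "ro"
    return best


# One repository: a developers group may write, anonymous may only read.
rules = [Rule(":developers", True), Rule("anonymous", False)]
print(access(rules, "anonymous", set()))  # ro
```

There is deliberately no per-branch level in this model: a user either has access to the whole repository or does not.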
[19:44.240 --> 19:50.160]  And when you configure this thing, this is basically what you need to do. You create your
[19:50.160 --> 19:55.920]  repositories, make sure they're owned by the right user that you run the daemon as. And you have at
[19:55.920 --> 20:02.320]  least one repository in your configuration file, which has a path where the repository is, and access
[20:02.320 --> 20:06.880]  permissions for either, in this case, the example would be a group of developers, which you have
[20:06.880 --> 20:15.360]  in /etc/group, and an anonymous user, which can only read. Now, my initial implementation
[20:15.360 --> 20:22.000]  of this looked something like this. It was functional and I could write a test suite for
[20:22.000 --> 20:31.280]  it, which was the main part. This could actually be used to fetch and push changes. But the design
[20:31.280 --> 20:37.440]  wasn't very good in terms of this multiprocess aspect because the parent started, then it
[20:37.440 --> 20:42.080]  started a reader process and a writer process and that was it. And then all these processes were
[20:42.080 --> 20:48.160]  always used for every connection. It did allow us to at least get this up and running, though.
[20:48.800 --> 20:54.080]  And I don't know, I asked for a bit of review and got shocked responses to say like, no,
[20:54.080 --> 20:58.080]  you're doing this all wrong. Fork and exec needs to be done per request and so on. So yeah, okay.
[20:58.080 --> 21:06.240]  But at least functionally, it was already quite okay. And the repository code there is
[21:06.240 --> 21:11.760]  reusing a lot of the code that I already had for like gotadmin and so on. So I mostly had to
[21:11.760 --> 21:23.280]  rewrite a lot of code for the parent process from scratch, which was all of this. This is
[21:23.280 --> 21:27.920]  what it looks like now. So the parent basically encompasses or used to encompass all of this
[21:27.920 --> 21:34.000]  functionality and we'll go through each one by one. So right now, in this current implementation,
[21:34.000 --> 21:40.720]  you have the parent when it starts up, must start as root in order to be able to do certain things
[21:40.720 --> 21:49.440]  like start the listener process as root, for example. And it uses pledge with
[21:49.440 --> 21:54.000]  stdio, proc, exec, which means basically stdio is something you always want, that's like
[21:54.000 --> 21:58.960]  printf and stuff like this. Then you have proc and exec, which allow you to fork and execute
[21:58.960 --> 22:03.840]  programs. And you can also send and receive file descriptors. And that's it what it can do.
[22:04.480 --> 22:10.080]  It also currently does an unveil on itself. So with an X permission, so it can re execute itself
[22:10.080 --> 22:15.200]  with different option flags to start other versions of itself, basically that we will start later.
[22:15.200 --> 22:21.600]  I'm not sure if this is really sound because it used to be said that unveil would inherit to child
[22:21.600 --> 22:27.600]  processes. And I'm not sure what happened to this. Currently, it does not. So it does not inherit,
[22:27.600 --> 22:32.720]  so I can do this and not lose access to, for example, the /tmp directory in the processes I'm
[22:32.720 --> 22:37.040]  starting next. But if that ever changes, we would have to adapt this, but it's not a big deal.
[22:37.040 --> 22:43.920]  You start a listen process, which opens the actual Unix socket that this daemon accepts connections on.
[22:43.920 --> 22:48.080]  So basically, if you're a local user on the system, you can always access it through the socket,
[22:48.080 --> 22:53.360]  but you would normally run this shell that we have, too, which does this for you and speaks the
[22:53.360 --> 23:00.720]  appropriate protocol. It then drops privileges. And the listen process runs as just stdio,
[23:00.720 --> 23:11.440]  sendfd, unix. The unix pledge is needed to operate on the Unix socket. It also does an unveil because
[23:12.640 --> 23:17.520]  the unix pledge allows you to bind other sockets and bind would create other sockets for you
[23:17.520 --> 23:24.720]  somewhere. And we wanted to prevent that. So by unveiling everything, basically hiding everything
[23:24.720 --> 23:33.520]  with unveil, there's no way to create additional Unix sockets for this process. And this process
[23:33.520 --> 23:40.160]  also, as an initial kind of DoS prevention mechanism, enforces a connection limit
[23:40.160 --> 23:45.040]  per UID so that not one user can just connect to the socket and spam it and prevent access for
[23:45.040 --> 23:53.200]  everyone else. Now, the shell is one of the most sensitive parts because this is where users log
[23:53.200 --> 23:58.960]  in and you actually confine them to this program. So you want this to be reasonably secure. It starts
[23:58.960 --> 24:04.640]  out with stdio, recvfd and unix to be able to connect to the Unix socket. But once it's
[24:04.640 --> 24:09.520]  connected, it drops that capability so it can no longer open new ones or do other things related
[24:09.520 --> 24:14.960]  to that. It only has a file descriptor it can talk on. And that's it. And then it starts
[24:14.960 --> 24:20.080]  translating these packet lines that we saw to messages that are internal to the program
[24:20.080 --> 24:30.880]  and go over the pipe to the parent. The parent will then start an authorization process which
[24:30.880 --> 24:40.960]  only runs once. And what this does is it gives itself access to the password database of the
[24:40.960 --> 24:50.400]  system using the getpw pledge and also hides all the file system. And I think this
[24:50.400 --> 24:55.600]  shows something very nice about pledge and unveil when used in combination because I'm actually
[24:55.600 --> 25:01.920]  reading the /etc/passwd and /etc/group files, right? But as per unveil, I shouldn't be able to
[25:01.920 --> 25:08.000]  access those. But because I declared that I want to use the password database, the kernel knows
[25:08.000 --> 25:13.680]  that this process is okay. It's okay for this process to access those files. So it bypasses
[25:13.680 --> 25:18.400]  Unveil in that specific case. Which means I don't have to worry about how the security mechanism
[25:18.400 --> 25:23.520]  is implemented. I don't have to go and say, oh, is my libc when I ask for users going to open this
[25:23.520 --> 25:28.000]  file? Well, maybe I should add an exception for that. Or is it going to do this and such and such
[25:28.000 --> 25:33.680]  syscall? I don't have to worry. I just say like, Pledge, I will do that. And Unveil, I will do
[25:33.680 --> 25:37.440]  that. And they take care of it, which is great for a programmer. It's really nice to program
[25:37.440 --> 25:43.920]  against this. So what this process then does, of course, is matches the users that are logged in
[25:43.920 --> 25:48.640]  against the access rules in the config file you saw earlier and reports the result to the parent
[25:48.640 --> 25:54.640]  and just exits because that's all it needs to do. It's just a one-shot thing. Now, the parent
[25:55.920 --> 26:01.840]  starts two processes if authorization has succeeded. And the shell is kind of waiting
[26:01.840 --> 26:05.440]  because it's like, hey, I sent a message, but you haven't responded yet. But yeah, we're busy,
[26:05.440 --> 26:13.920]  we're setting up. So we start two things right now, a session process and a repository read or
[26:13.920 --> 26:20.080]  write process. Currently, the naming of these is horribly bad. It just was the best I could come
[26:20.080 --> 26:25.360]  up with. And it kind of grew organically from the initial setup with those three processes you saw
[26:25.360 --> 26:30.320]  earlier. But for example, the repository write process is not actually writing to the repository,
[26:30.320 --> 26:36.240]  which you'll see later. So I'm not very happy about this. And also, the session process is
[26:36.240 --> 26:42.480]  basically the most powerful component of the system right now. It's the only one that can
[26:42.480 --> 26:46.960]  actually read and write the repository and create files in there. It can also do the same in
[26:46.960 --> 26:52.240]  /tmp. And for that, it needs all these pledges like rpath, wpath, cpath.
[26:52.800 --> 26:57.200]  And it also needs file attributes and file locking because when it changes references for
[26:57.200 --> 27:01.680]  clients, it needs to make sure that they get locked so that you don't have file system races
[27:01.680 --> 27:06.400]  where two clients commit at the same time and then you end up with a reference that's been
[27:06.400 --> 27:13.680]  overwritten. It also creates temporary files, which the repository process needs and gives it
[27:13.680 --> 27:19.440]  the file descriptors. It handles installing of the pack files and so on. And it has the
[27:19.440 --> 27:27.120]  Git protocol state machine in it. So that's a bit... I would like to continue work there to split
[27:27.120 --> 27:32.240]  this up more, but because I had to have a functional implementation and I had to, like,
[27:32.240 --> 27:36.480]  I wanted to have something functional to clone from, which is there now, which is on the internet.
[27:36.480 --> 27:40.640]  That's fine. But going forward, this needs to be revisited for sure.
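The reference-locking concern described above can be illustrated with a minimal C sketch. It is not taken from the Game of Trees source: the function names `read_ref` and `update_ref_locked`, the `.lock` and `.tmp` file suffixes, and the compare-before-write policy are all assumptions made for illustration. The sketch takes an exclusive flock(2) lock, checks that the reference still holds the value the client expected, and only then replaces it, so a second client that raced with the first is told its view is stale instead of silently overwriting the reference.

```c
#include <sys/file.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the first line of a reference file into buf.
 * A missing file is treated as an empty reference. Returns 0 or -1.
 */
static int
read_ref(const char *path, char *buf, size_t size)
{
	FILE *f;

	buf[0] = '\0';
	f = fopen(path, "r");
	if (f == NULL)
		return errno == ENOENT ? 0 : -1;
	if (fgets(buf, (int)size, f) != NULL)
		buf[strcspn(buf, "\n")] = '\0';
	fclose(f);
	return 0;
}

/*
 * Hypothetical helper: replace the reference with new_id only if it still
 * contains expected_old ("" means the reference must not exist yet).
 * Returns 0 on success, 1 if another client changed the reference first,
 * and -1 on error.
 */
int
update_ref_locked(const char *ref_path, const char *expected_old,
    const char *new_id)
{
	char lock_path[1024], tmp_path[1024], cur[128];
	FILE *f;
	int lock_fd, n, ret = -1;

	n = snprintf(lock_path, sizeof(lock_path), "%s.lock", ref_path);
	if (n < 0 || (size_t)n >= sizeof(lock_path))
		return -1;
	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", ref_path);
	if (n < 0 || (size_t)n >= sizeof(tmp_path))
		return -1;

	/* The lock file is left in place; only the lock itself is released. */
	lock_fd = open(lock_path, O_RDWR | O_CREAT, 0644);
	if (lock_fd == -1)
		return -1;
	if (flock(lock_fd, LOCK_EX) == -1)	/* blocks until we own the lock */
		goto done;

	/* Compare under the lock so no other locker can change it meanwhile. */
	if (read_ref(ref_path, cur, sizeof(cur)) == -1)
		goto done;
	if (strcmp(cur, expected_old) != 0) {
		ret = 1;
		goto done;
	}

	/* Write to a temporary file, then rename it over the reference. */
	f = fopen(tmp_path, "w");
	if (f == NULL)
		goto done;
	if (fprintf(f, "%s\n", new_id) < 0) {
		fclose(f);
		unlink(tmp_path);
		goto done;
	}
	if (fclose(f) != 0 || rename(tmp_path, ref_path) == -1) {
		unlink(tmp_path);
		goto done;
	}
	ret = 0;
done:
	close(lock_fd);		/* closing the descriptor drops the lock */
	return ret;
}
```

A caller would pass the object ID it believes the reference currently points at; a return value of 1 means the update lost the race and should be rejected or retried. Under Pledge, a process doing this would need at least the file read, write, create and locking promises the talk lists for the session process.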
[27:44.160 --> 27:48.640]  The repository read and write processes: apart from the name for repo write, I'm okay with
[27:48.640 --> 27:56.800]  how that's worked out. Both of them can only read from the repository. And what the reader does is
[27:57.520 --> 28:03.120]  it is responsible for creating a pack file and streaming the result to the got shell over
[28:03.120 --> 28:08.880]  a pipe that is created by the session process and handed to both the shell and the reader.
[28:08.880 --> 28:18.880]  And the writer is responsible for receiving a pack file and indexing it. So the indexing
[28:18.880 --> 28:27.280]  is almost done. So the indexing is done there. Okay. I have one minute left, one minute and
[28:27.280 --> 28:32.640]  a half. I'll quickly go through some implementation improvements I'd still like to do. So we should
[28:32.640 --> 28:37.440]  verify what the client has uploaded. Currently we trust it, what to do. The config file is
[28:37.440 --> 28:41.280]  parsed every time a process starts, which isn't ideal, which works, but it's bad if you're changing
[28:41.280 --> 28:47.440]  the file while the process is running. Yeah, session I already mentioned. And the state machines
[28:47.440 --> 28:51.120]  have some funny bugs. So these really need to be rewritten. They're basically like switch statements
[28:51.120 --> 28:55.520]  and ifs and so on. And I'd like to properly separate that out with tables and state transition
[28:55.520 --> 29:00.560]  functions and so on. But it was just a quick way of getting things working. But we already saw
[29:00.560 --> 29:06.160]  like thousands of flush packets flying through this process because an end of file on a socket
[29:06.160 --> 29:11.680]  triggered a flush packet and that was kind of stupid. This has been fixed, but there will still
[29:11.680 --> 29:20.640]  be other bugs like that. We should have some built-in checks so that commits can be verified
[29:20.640 --> 29:25.600]  according to project policies and things like denying merge commits if you don't want them
[29:25.600 --> 29:33.040]  or binary files and so on, preventing a forced push. I'd like to have commit notifications
[29:33.040 --> 29:37.360]  where you, for example, send an email or you can send an arbitrary HTTP request so that
[29:37.360 --> 29:40.640]  if you really want to have a post-commit hook script, you run it somewhere else and we'll
[29:40.640 --> 29:48.640]  give you information and trigger it. Yeah, also it should really keep track of what disk space
[29:48.640 --> 29:54.800]  it has when it accepts pack files and not fill the disk and fail. We should be able to remove
[29:54.800 --> 30:00.080]  redundant pack files that have accumulated over time. I'd like to add SHA2 support
[30:00.080 --> 30:06.400]  and enable it by default once that works so that we use the SHA2 because we have zero
[30:06.400 --> 30:11.280]  production deployments right now, unlike Git, so we can just use the new format they've already
[30:11.280 --> 30:17.040]  defined. Server-side rebasing is another thing. I'm out of time, so I'm not going to go into that,
[30:17.040 --> 30:31.120]  but I think this is it. Sorry for the quick part of you. Thank you very much. I
[30:31.120 --> 30:47.920]  encourage you to ask questions in the hallway. Okay, good.