[00:00.000 --> 00:06.520]  Looking at this from the angle,
[00:06.520 --> 00:09.280]  how can I manage such a large graph in a good way,
[00:09.280 --> 00:12.480]  and moving forward to that, so Nikolai.
[00:12.480 --> 00:14.640]  Thanks for the introduction.
[00:14.640 --> 00:17.200]  We have to speak up because the audience only for the-
[00:17.200 --> 00:20.560]  Okay. So my name is Nikolai Kondrashov.
[00:20.560 --> 00:22.920]  I work at Red Hat on the CKI project,
[00:22.920 --> 00:24.000]  which is building one of
[00:24.000 --> 00:29.200]  those Linux kernel testing systems, for Red Hat and for upstream.
[00:29.200 --> 00:32.000]  I also work in the kernel, louder?
[00:32.000 --> 00:38.640]  Okay. I also work with the KernelCI upstream community on
[00:38.640 --> 00:40.160]  the KCIDB project,
[00:40.160 --> 00:44.360]  which is the source of this presentation,
[00:44.360 --> 00:47.720]  and I do electronics and embedded as a hobby.
[00:47.720 --> 00:50.400]  Okay. So I'm going to walk you
[00:50.400 --> 00:53.520]  quickly through the kernel contribution workflow,
[00:53.520 --> 00:56.720]  through the testing systems,
[00:56.720 --> 00:59.600]  then what we are trying to do with KCIDB at KernelCI,
[00:59.600 --> 01:03.180]  and then how we want to solve the problem,
[01:03.180 --> 01:05.680]  and what the actual problem is with
[01:05.680 --> 01:08.480]  the kernel CI process in general.
[01:08.480 --> 01:10.600]  Then I go briefly through the data model,
[01:10.600 --> 01:13.360]  and a few questions,
[01:13.360 --> 01:15.800]  a few queries that we need,
[01:15.800 --> 01:19.040]  and how it went with Neo4j,
[01:19.040 --> 01:21.600]  and what we can do instead.
[01:21.600 --> 01:25.480]  So the kernel contribution workflow,
[01:25.480 --> 01:28.200]  I don't know if everybody's familiar with that.
[01:28.200 --> 01:30.400]  I hope not because it's not very pleasant.
[01:30.400 --> 01:33.840]  But basically, you do your changes,
[01:33.840 --> 01:34.600]  you commit your changes,
[01:34.600 --> 01:37.040]  then you make an email out of that and send it to
[01:37.040 --> 01:40.120]  a mailing list and to a maintainer for them to review,
[01:40.120 --> 01:41.120]  to give you feedback,
[01:41.120 --> 01:44.280]  then you repeat that again until everybody's satisfied,
[01:44.280 --> 01:46.800]  including the maintainer and whoever is concerned with that change.
[01:46.800 --> 01:50.560]  After this, your patches get merged into
[01:50.560 --> 01:53.480]  a sub-tree for the particular subsystem that you were changing,
[01:53.480 --> 01:56.240]  and then sometime later, this is getting merged into
[01:56.240 --> 01:58.840]  the mainline which Linus maintains,
[01:58.840 --> 02:01.320]  and you're done basically.
[02:01.320 --> 02:04.280]  But at any point in that process,
[02:04.280 --> 02:06.640]  you can get some test results for your change.
[02:06.640 --> 02:08.080]  It could be if you're lucky,
[02:08.080 --> 02:10.280]  you can get it before it even gets reviewed,
[02:10.280 --> 02:12.520]  or sometime it gets reviewed,
[02:12.520 --> 02:15.320]  or after it was merged, any time.
[02:16.800 --> 02:20.680]  So there's a whole bunch of
[02:20.680 --> 02:24.160]  kernel testing systems, this is just a sample.
[02:24.640 --> 02:27.440]  Each of them is trying to solve their own problem.
[02:27.440 --> 02:30.800]  For example, CKI is a Red Hat system,
[02:30.800 --> 02:35.720]  they would test particular hardware that our customers use,
[02:35.720 --> 02:38.640]  particular features that our customers request,
[02:38.640 --> 02:39.600]  to make sure that they work,
[02:39.600 --> 02:41.400]  that the distribution works,
[02:41.400 --> 02:43.560]  Intel tests their hardware,
[02:43.560 --> 02:46.560]  their graphics cards, and make sure that those work.
[02:46.560 --> 02:51.480]  Google fuzzes system calls with syzkaller and syzbot,
[02:51.480 --> 02:54.840]  LKFT from Linaro, they test ARM boards,
[02:54.840 --> 02:58.520]  and finally, KernelCI is aiming to be
[02:58.520 --> 03:00.520]  the official CI system for the Linux kernel,
[03:00.520 --> 03:02.600]  it's supported by the Linux Foundation,
[03:02.600 --> 03:04.760]  and they're trying to run tests on
[03:04.760 --> 03:08.760]  whatever hardware others can provide, whatever we can have.
[03:10.120 --> 03:14.040]  You can see everybody has their own interest in that game.
[03:14.040 --> 03:17.160]  So this is how your various email reports can look from
[03:17.160 --> 03:19.200]  those systems correspondingly,
[03:19.200 --> 03:24.720]  and this is their dashboards from different systems.
[03:24.720 --> 03:29.040]  So KernelCI, as I said,
[03:29.040 --> 03:32.320]  is striving to be the CI system,
[03:32.320 --> 03:35.120]  and we have a testing system and
[03:35.120 --> 03:37.320]  the hardware management and
[03:37.320 --> 03:41.360]  the framework and everything to run the tests in various labs,
[03:41.360 --> 03:42.880]  and these labs can be located in
[03:42.880 --> 03:45.360]  different premises by people who
[03:45.360 --> 03:50.160]  have some hardware to run the tests on,
[03:50.160 --> 03:54.000]  and then that gets collected and put into the database,
[03:54.000 --> 03:57.400]  and then we have various other CI systems
[03:57.400 --> 04:01.920]  collecting their results and sending them to the KCIDB database,
[04:01.920 --> 04:05.080]  and KCIDB was conceived as a system to try to
[04:05.080 --> 04:07.960]  reduce the effort that all CI systems
[04:07.960 --> 04:09.360]  have to put into their dashboards,
[04:09.360 --> 04:11.840]  into their reports, and instead have
[04:11.840 --> 04:15.280]  one dashboard and one report if possible or close to that,
[04:15.280 --> 04:18.120]  and as well to save the developer's attention,
[04:18.120 --> 04:21.280]  which is a precious resource because as you see,
[04:21.280 --> 04:24.360]  it's not so easy to investigate every report
[04:24.360 --> 04:27.680]  and from different CI systems
[04:27.680 --> 04:30.440]  because they are differently formatted emails,
[04:30.440 --> 04:32.440]  different data, different dashboards,
[04:32.440 --> 04:33.520]  you have to look at them this way,
[04:33.520 --> 04:35.680]  that way, and you have to figure it out.
[04:35.680 --> 04:38.280]  So that's what KCIDB is: the effort to bring
[04:38.280 --> 04:40.720]  this all into one place.
[04:40.720 --> 04:45.440]  So conceptually, it's very simple,
[04:45.440 --> 04:47.840]  the CI systems send JSON, which can consist
[04:47.840 --> 04:51.360]  of various objects in any combination,
[04:51.360 --> 04:53.000]  and we have the database we put them in,
[04:53.000 --> 04:55.040]  we have the dashboard to display that,
[04:55.040 --> 04:58.480]  and we have a subscription system where you can give
[04:58.480 --> 04:59.560]  some rules and say like, okay,
[04:59.560 --> 05:01.520]  I want to see these results from this test and from
[05:01.520 --> 05:03.760]  this tree or for this architecture or whatever,
[05:03.760 --> 05:06.040]  and we can generate the reports based on
[05:06.040 --> 05:09.320]  that whenever you need it as the data comes in.
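The subscription idea described here can be sketched roughly as follows. This is only an illustration: the `Result` and `Subscription` classes, their field names, and the e-mail addresses are hypothetical and are not the real KCIDB schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    # Hypothetical, simplified test result; not the real KCIDB schema.
    test: str
    tree: str
    arch: str
    status: str


@dataclass(frozen=True)
class Subscription:
    # A rule: None means "match anything" for that field.
    recipient: str
    test: Optional[str] = None
    tree: Optional[str] = None
    arch: Optional[str] = None

    def matches(self, result: Result) -> bool:
        """True if every field the rule specifies equals the result's."""
        return all(
            wanted is None or wanted == actual
            for wanted, actual in (
                (self.test, result.test),
                (self.tree, result.tree),
                (self.arch, result.arch),
            )
        )


def recipients_for(result, subscriptions):
    """Return the recipients whose rules match an incoming result."""
    return [s.recipient for s in subscriptions if s.matches(result)]


subs = [
    Subscription("alice@example.com", tree="mainline"),
    Subscription("bob@example.com", test="ltp", arch="arm64"),
]
incoming = Result(test="ltp", tree="mainline", arch="x86_64", status="FAIL")
notified = recipients_for(incoming, subs)
```

Because the rules are evaluated per incoming result, this works regardless of the order in which the data arrives.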
[05:09.320 --> 05:12.160]  One important note about this is that
[05:12.160 --> 05:16.560]  compared to a regular CI system where you control everything,
[05:16.560 --> 05:20.600]  in this system, the data can come in in any order.
[05:20.600 --> 05:22.360]  In a regular CI system, you have
[05:22.360 --> 05:24.880]  the results come in the same order as commits come in.
[05:24.880 --> 05:27.160]  So if you tested something earlier,
[05:27.160 --> 05:28.560]  that means for an earlier commit,
[05:28.560 --> 05:29.680]  if you tested something later,
[05:29.680 --> 05:31.200]  it's for a later commit,
[05:31.200 --> 05:35.240]  and you can have a line of history with those results.
[05:35.240 --> 05:36.960]  But for KCIDB,
[05:36.960 --> 05:40.040]  since the data comes from various different CI systems,
[05:40.040 --> 05:43.280]  the results come in in any order you wish.
[05:43.280 --> 05:49.520]  So we have about 100,000 test results per day,
[05:49.520 --> 05:51.400]  a few thousands of builds,
[05:51.400 --> 05:54.160]  and hundreds of revisions tested per day,
[05:54.160 --> 05:57.280]  that are received by the KCIDB database.
[05:57.280 --> 06:00.880]  Well, actually, I think, yeah, that's correct.
[06:00.880 --> 06:02.480]  That's the correct scale.
[06:02.480 --> 06:04.480]  So it looks something like this.
[06:04.480 --> 06:07.800]  This is Grafana, as a prototype dashboard.
[06:07.800 --> 06:09.160]  We're thinking about building a new one,
[06:09.160 --> 06:11.880]  but I don't know how soon that's going to happen.
[06:11.880 --> 06:15.240]  So graphs, tables, all that jazz.
[06:15.240 --> 06:19.880]  And our prototype reports look like this.
[06:19.880 --> 06:24.000]  So what's the problem with kernel CI in general,
[06:24.000 --> 06:26.480]  not with KernelCI, the project?
[06:26.480 --> 06:29.800]  So first of all,
[06:29.800 --> 06:32.280]  the kernel is intended to be
[06:32.280 --> 06:33.680]  an abstraction layer for hardware.
[06:33.680 --> 06:35.080]  That's its whole purpose,
[06:35.080 --> 06:37.360]  and to make it easier to write software.
[06:37.360 --> 06:41.160]  So in theory, to make sure that it works,
[06:41.160 --> 06:42.360]  you have to test it with every piece
[06:42.360 --> 06:43.840]  that you're abstracting away from.
[06:43.840 --> 06:45.800]  But that's not possible, of course,
[06:45.800 --> 06:47.400]  and hardware is expensive,
[06:47.400 --> 06:51.080]  so it's a natural scarcity in this whole system.
[06:51.080 --> 06:54.040]  Then the tests, since you cannot get
[06:54.040 --> 06:56.240]  all the hardware at the same time,
[06:56.240 --> 06:58.600]  and you cannot possibly run all the tests on
[06:58.600 --> 07:02.640]  all the hardware for every commit that people post,
[07:02.640 --> 07:05.240]  it means that sometimes the tests run on this hardware,
[07:05.240 --> 07:06.280]  sometimes on that hardware,
[07:06.280 --> 07:08.000]  sometimes they don't run,
[07:08.000 --> 07:11.480]  and the tests themselves are not so reliable
[07:11.480 --> 07:12.360]  because there's a lot of
[07:12.360 --> 07:13.920]  concurrency management in the kernel,
[07:13.920 --> 07:15.440]  and that's hard to get right,
[07:15.440 --> 07:17.040]  and in general, things happen at
[07:17.040 --> 07:18.560]  the same time in the operating system,
[07:18.560 --> 07:21.640]  so then sometimes they're not so reliable.
[07:21.640 --> 07:24.880]  So you can get a pass on your change,
[07:24.880 --> 07:27.040]  even if it's broken or get a fail on your change,
[07:27.040 --> 07:29.040]  even if it's not broken,
[07:29.040 --> 07:31.680]  or even if it's somebody else's change that broke it,
[07:31.680 --> 07:32.920]  basically, hell.
[07:32.920 --> 07:37.400]  So it's hard to remove noise from those results,
[07:37.400 --> 07:40.640]  and for developers,
[07:40.640 --> 07:43.000]  it's hard to investigate even a valid change.
[07:43.000 --> 07:44.120]  Well, it's the kernel,
[07:44.120 --> 07:45.680]  you have to meet all the conditions,
[07:45.680 --> 07:47.640]  and well, sometimes you have to get the right hardware,
[07:47.640 --> 07:49.040]  or ask people for the right hardware,
[07:49.040 --> 07:51.680]  or ask them to actually run the test and send you results,
[07:51.680 --> 07:54.880]  like you know, over email takes a while.
[07:54.880 --> 07:58.720]  So if we start sending people emails with
[07:58.720 --> 08:02.240]  results that are not valid,
[08:02.240 --> 08:04.280]  false positive, false negatives,
[08:04.280 --> 08:07.800]  then people kind of get pissed because of that,
[08:07.800 --> 08:10.760]  because it takes such a long time to reproduce them.
[08:10.760 --> 08:14.920]  So a lot of CI systems resort to
[08:14.920 --> 08:17.160]  human review before sending those reports,
[08:17.160 --> 08:18.840]  like they see the failures,
[08:18.840 --> 08:20.360]  they say, okay, well, let's send this to
[08:20.360 --> 08:22.480]  this mail list and then they send them,
[08:22.480 --> 08:26.760]  and only a few manage without that so far.
[08:26.760 --> 08:31.400]  So obviously, nobody stops the development to fix CI,
[08:31.400 --> 08:33.280]  because there's just so many developers,
[08:33.280 --> 08:35.760]  and if one system breaks something,
[08:35.760 --> 08:39.160]  like another subsystem doesn't want to care about that,
[08:39.160 --> 08:43.520]  and the feedback loop is just too long.
[08:43.520 --> 08:44.720]  So tests keep running,
[08:44.720 --> 08:46.800]  keep failing, and it takes a while to fix them.
[08:46.800 --> 08:49.360]  So instead of the ideal case where you can move
[08:49.360 --> 08:53.880]  past, only move past the tests if they pass,
[08:53.880 --> 08:55.280]  and then do all the stages,
[08:55.280 --> 08:57.120]  like a review, and then it's merged, and it's test,
[08:57.120 --> 08:58.800]  and it's fine, and then you can upstream it,
[08:58.800 --> 09:03.360]  you get something like this where all tests fail,
[09:03.360 --> 09:05.800]  okay, it's probably not our problem,
[09:05.800 --> 09:07.480]  not have time to investigate it,
[09:07.480 --> 09:10.680]  or we just didn't get any test result with new one.
[09:10.680 --> 09:16.760]  So what we're trying to do is we got to fix this, right?
[09:16.760 --> 09:20.720]  So we got to fix the test results.
[09:20.720 --> 09:25.440]  So we fix the test result.
[09:25.440 --> 09:27.760]  We look at the test output conditions,
[09:27.760 --> 09:30.280]  et cetera, and we add a rule to the database saying like,
[09:30.280 --> 09:32.400]  okay, well, this failed,
[09:32.400 --> 09:33.640]  but we know about this,
[09:33.640 --> 09:34.840]  here's the bug that was open,
[09:34.840 --> 09:37.280]  so don't complain to developers,
[09:37.280 --> 09:39.880]  don't waste their attention,
[09:39.880 --> 09:42.960]  and it looks like this,
[09:42.960 --> 09:44.240]  shiny and sparkly,
[09:44.240 --> 09:45.560]  but after a while,
[09:45.560 --> 09:47.080]  we get this fix into the test,
[09:47.080 --> 09:49.680]  and we repeat the process with another issue.
[09:49.680 --> 09:52.440]  So these things are already working in
[09:52.440 --> 09:54.480]  separate CI systems like the CKI.
[09:54.480 --> 09:57.840]  There's a UI screen for an issue in the kernel,
[09:57.840 --> 10:00.240]  it says like, okay, look for this output in the test,
[10:00.240 --> 10:01.840]  for this string in the test output,
[10:01.840 --> 10:04.000]  if you see it for this test,
[10:04.000 --> 10:08.720]  then we consider it a kernel bug and don't raise the problem.
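A rule of the kind shown on that screen (look for this string in the output of this test) could be sketched like this. The `Issue` fields, the bug URLs, and the log line are made up for illustration and do not reproduce CKI's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    # Hypothetical fields, loosely following the talk.
    report_url: str      # where the known bug is tracked
    test: str            # test the pattern applies to
    output_pattern: str  # regular expression searched in the test output
    culprit: str         # "kernel", "test" or "framework"


def match_issues(test, output, issues):
    """Return the known issues whose pattern matches this test's output."""
    return [
        issue for issue in issues
        if issue.test == test and re.search(issue.output_pattern, output)
    ]


known = [
    Issue("https://bugs.example.com/1", "boot", r"BUG: soft lockup", "kernel"),
    Issue("https://bugs.example.com/2", "ltp", r"cannot open /dev/loop\d+", "test"),
]
log = "[   42.0] watchdog: BUG: soft lockup - CPU#1 stuck for 22s!"
hits = match_issues("boot", log, known)

# A failure explained by at least one known issue need not be reported again.
needs_report = not hits
```

A failure with no matching issue is the interesting case: that is the one worth a developer's attention.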
[10:08.720 --> 10:12.520]  Okay, or CI Bug Log,
[10:12.520 --> 10:14.720]  from Intel's CI system,
[10:14.720 --> 10:16.640]  they have like a huge form
[10:16.640 --> 10:18.600]  for filing this. You can see another string that
[10:18.600 --> 10:20.040]  you're supposed to look for in
[10:20.040 --> 10:22.200]  the error output, and the conditions, and
[10:22.200 --> 10:25.240]  what kind of status you want to assign to the test, et cetera.
[10:25.240 --> 10:29.360]  So here's a dog tags for you to take a breath,
[10:29.360 --> 10:31.800]  and for me to take a drink.
[10:36.800 --> 10:39.960]  So I'll dive into the model.
[10:40.400 --> 10:44.480]  We start with checkouts which basically just specify
[10:44.480 --> 10:47.280]  what kind of revision you're checking out,
[10:47.280 --> 10:51.120]  which repository and branch we have taken it from, and which commit,
[10:51.120 --> 10:52.720]  and if you have patches applied on top,
[10:52.720 --> 10:55.320]  and the patch log and everything like that,
[10:55.320 --> 10:57.800]  then we aggregate that to get the revision data,
[10:57.800 --> 10:59.600]  like from multiple checkouts of the same revision,
[10:59.600 --> 11:01.520]  they get the same single revision,
[11:01.520 --> 11:04.680]  and we have builds which link to the checkouts,
[11:04.680 --> 11:07.400]  to say like, oh, we just tested this checkout,
[11:07.400 --> 11:09.680]  and therefore link to the revision.
[11:09.680 --> 11:11.960]  The builds describe which architecture,
[11:11.960 --> 11:13.760]  compiler and configuration,
[11:13.760 --> 11:15.840]  output files and logs and everything,
[11:15.840 --> 11:18.240]  and we get the test results finally,
[11:18.240 --> 11:20.000]  and yeah, builds can fail,
[11:20.000 --> 11:23.120]  they have failed builds all the time and it stops nobody.
[11:23.120 --> 11:27.360]  So we have what kind of test we are running,
[11:27.360 --> 11:29.520]  the environment it ran on,
[11:29.520 --> 11:31.760]  what kind of result it was,
[11:31.760 --> 11:33.800]  the status result, pass, fail, et cetera,
[11:33.800 --> 11:37.120]  and the output files logs and stuff like that, very typical.
[11:37.120 --> 11:41.240]  Then we get the issues which describe like which bug it is,
[11:41.240 --> 11:43.560]  and who is to blame like the kernel,
[11:43.560 --> 11:45.080]  the test or the framework,
[11:45.080 --> 11:47.960]  and we will have the pattern there matching the test results,
[11:47.960 --> 11:49.360]  okay, this test, this output,
[11:49.360 --> 11:51.600]  what you saw on that screen.
[11:51.600 --> 11:55.280]  The status that it should have and the issue version,
[11:55.280 --> 11:58.320]  because we want to change those issues over time,
[11:58.320 --> 12:00.280]  and finally we have the incidents, which are linking
[12:00.280 --> 12:03.240]  those builds and issues together,
[12:03.240 --> 12:05.800]  so saying like, oh, this is the issue with this build,
[12:05.800 --> 12:07.800]  and things like that.
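The relational part of the model described above might be sketched as plain records like these. All class and field names are simplified guesses made for illustration, not the actual KCIDB schema; the grouping function shows how several checkouts of the same code collapse into one revision.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkout:
    id: str
    repo_url: str
    branch: str
    commit_hash: str
    patchset_hash: str = ""  # empty when no patches are applied on top


@dataclass(frozen=True)
class Build:
    id: str
    checkout_id: str
    arch: str
    compiler: str
    valid: bool  # builds can fail too


@dataclass(frozen=True)
class KernelTest:
    id: str
    build_id: str
    path: str    # which test was run
    status: str  # e.g. "PASS" or "FAIL"


@dataclass(frozen=True)
class Incident:
    # Links one version of an issue to a build or a test.
    issue_id: str
    issue_version: int
    build_id: Optional[str] = None
    test_id: Optional[str] = None


def revisions(checkouts):
    """Group checkouts of the same code into one revision each."""
    grouped = defaultdict(list)
    for c in checkouts:
        grouped[(c.commit_hash, c.patchset_hash)].append(c.id)
    return dict(grouped)


url = "https://git.example.com/linux.git"
checkouts = [
    Checkout("ci_a:1", url, "master", "abc123"),
    Checkout("ci_b:7", url, "master", "abc123"),
    Checkout("ci_a:2", url, "master", "abc123", patchset_hash="p0f1"),
]
by_revision = revisions(checkouts)
```

Here two CI systems checked out the same commit, so they share one revision, while the checkout with patches on top forms a second one.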
[12:07.800 --> 12:11.840]  So that's all we keep in the relational database,
[12:11.840 --> 12:14.760]  but then we got to talk about the revisions.
[12:14.760 --> 12:20.160]  So a revision could be just a commit in the Git history,
[12:20.160 --> 12:21.840]  and here's your graph.
[12:21.840 --> 12:26.920]  So that's the basic thing that we've tried to do,
[12:26.920 --> 12:30.040]  but we also need to have revisions with
[12:30.040 --> 12:31.720]  patches applied on top. Somebody
[12:31.720 --> 12:33.200]  posts the patch on the mailing list.
[12:33.200 --> 12:35.040]  We take it, apply it to some commit,
[12:35.040 --> 12:37.520]  which is pointed to and we test it,
[12:37.520 --> 12:39.080]  we get the results,
[12:39.080 --> 12:42.120]  and we know it was applied to this commit.
[12:42.120 --> 12:46.120]  Then somebody reworks that patch and posts a new version,
[12:46.120 --> 12:48.920]  and we've got to link it both to the commit we tested it
[12:48.920 --> 12:52.360]  upon and to the previous revision of the patch set.
[12:52.360 --> 12:55.320]  Then there is this weird thing when
[12:55.320 --> 12:57.520]  maintainers keep a special branch for
[12:57.520 --> 13:00.360]  CI for the testing systems to pick up
[13:00.360 --> 13:03.040]  their work and test and send them results,
[13:03.040 --> 13:04.480]  and they just keep pushing there like they're
[13:04.480 --> 13:06.120]  working on something, they push there,
[13:06.120 --> 13:08.640]  they get results after a while from testing,
[13:08.640 --> 13:10.320]  then they push a new version,
[13:10.320 --> 13:13.040]  and then they get new results and they got to say like,
[13:13.040 --> 13:15.200]  okay, this is the Git commit history,
[13:15.200 --> 13:17.160]  but we also know that we checked
[13:17.160 --> 13:18.760]  this branch out previously,
[13:18.760 --> 13:22.000]  so this is the child of that branch,
[13:22.000 --> 13:24.320]  of that previous revision.
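The three kinds of links just described (ordinary Git parents, patch sets applied to a commit and superseding an earlier version, and a CI branch pointing back at its previously checked-out state) can be sketched as one graph with labelled edges. The edge labels and revision names below are invented for the example.

```python
class RevisionGraph:
    """Revisions as nodes, with labelled edges pointing at parents."""

    def __init__(self):
        self.parents = {}  # child -> set of (parent, kind) pairs

    def add_edge(self, child, parent, kind):
        self.parents.setdefault(child, set()).add((parent, kind))

    def ancestors(self, revision):
        """All revisions reachable by following parent edges of any kind."""
        seen, stack = set(), [revision]
        while stack:
            node = stack.pop()
            for parent, _kind in self.parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


g = RevisionGraph()
g.add_edge("c2", "c1", "git-parent")         # ordinary Git history
g.add_edge("p1v1", "c1", "applied-to")       # patch set v1 applied on c1
g.add_edge("p1v2", "c2", "applied-to")       # reworked v2 applied on c2
g.add_edge("p1v2", "p1v1", "supersedes")     # v2 follows v1 of the patch set
g.add_edge("ci2", "c2", "git-parent")        # new push to a CI branch
g.add_edge("ci2", "ci1", "previous-checkout")  # what the branch held before
g.add_edge("ci1", "c1", "git-parent")

history = g.ancestors("p1v2")
```

Every edge points from a newer revision to an older one, so the structure stays a directed acyclic graph even with the extra edge kinds.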
[13:26.320 --> 13:29.080]  That's basically it.
[13:29.080 --> 13:31.320]  Well, as you probably all know,
[13:31.320 --> 13:32.800]  this is a directed acyclic graph,
[13:32.800 --> 13:37.000]  so it has directed edges and it doesn't loop on itself.
[13:37.000 --> 13:41.480]  So that's about what I know about graphs.
[13:41.480 --> 13:45.360]  So bear with me.
[13:45.360 --> 13:48.640]  Finally, I think that there's
[13:48.640 --> 13:50.440]  just too many build and test results to
[13:50.440 --> 13:53.560]  put them all into a graph database at least so far.
[13:53.560 --> 13:55.760]  I might be wrong, but that's my idea.
[13:55.760 --> 13:58.440]  We obviously need to keep the graph of
[13:58.440 --> 14:00.800]  the revisions to be able to reason about them,
[14:00.800 --> 14:03.240]  but we might be able to put
[14:03.240 --> 14:05.280]  issues there as well in the same database
[14:05.280 --> 14:07.560]  if it saves us something.
[14:07.560 --> 14:11.440]  So this is just a short list.
[14:11.640 --> 14:14.520]  Basically, what we want to know,
[14:14.520 --> 14:17.000]  okay, as the data comes in,
[14:17.000 --> 14:17.880]  the test results, you've got to
[14:17.880 --> 14:19.800]  triage them and match them against the issues.
[14:19.800 --> 14:21.400]  So we can say, okay, we found an issue here,
[14:21.400 --> 14:25.000]  so don't raise the flag or something like that,
[14:25.000 --> 14:27.040]  like similar, okay.
[14:27.040 --> 14:30.720]  There is no issue here on test result,
[14:30.720 --> 14:33.800]  but we want to raise the flag because there's actually an issue.
[14:33.800 --> 14:38.920]  We cannot possibly try all the issues
[14:38.920 --> 14:42.520]  against all test results because there's going to be a lot.
[14:42.520 --> 14:45.960]  So we have to build a priority for those issues,
[14:45.960 --> 14:48.720]  and then we have to cut off that priority somehow,
[14:48.720 --> 14:50.280]  and say like, okay, at this moment,
[14:50.280 --> 14:51.680]  we can tell the developer that we've
[14:51.680 --> 14:53.080]  basically triaged these results,
[14:53.080 --> 14:54.240]  you can go take a look,
[14:54.240 --> 14:55.640]  but we can still continue and
[14:55.640 --> 14:57.640]  try those issues as the time goes on.
[14:57.640 --> 15:02.640]  So we have to base that on one of
[15:02.640 --> 15:05.400]  the criteria that we might need is how far,
[15:05.400 --> 15:08.680]  for example, that revision is from the current situation,
[15:08.680 --> 15:12.320]  like if this issue only appeared somewhere,
[15:12.320 --> 15:14.480]  I don't know, like 1,000 commits ago,
[15:14.480 --> 15:17.160]  or 1,000 is not that much for the Linux kernel,
[15:17.160 --> 15:19.920]  okay, 10,000 commits ago,
[15:19.920 --> 15:22.400]  then we don't need to try it right now.
[15:22.400 --> 15:23.800]  We can tell the developer, okay, it's fine,
[15:23.800 --> 15:25.160]  and then we'll go and continue
[15:25.160 --> 15:27.720]  try it and if we find something,
[15:27.720 --> 15:29.960]  then we can raise the alarm.
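One way to read this prioritization is as a bounded walk back through the revision graph: issues first seen within the cutoff are checked right away, nearest first, and the rest are left for later. The sketch below assumes a plain `parents` mapping and a made-up `issue_origins` mapping from issue name to the revision where it first appeared; neither comes from the talk.

```python
from collections import deque


def distances_back(parents, start, limit):
    """Breadth-first walk over parent links, at most `limit` steps back."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not look further back than the cutoff
        for parent in parents.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def split_issues(parents, revision, issue_origins, cutoff):
    """Split issues into (check now, check later) by graph distance."""
    dist = distances_back(parents, revision, cutoff)
    near = sorted(
        (name for name, origin in issue_origins.items() if origin in dist),
        key=lambda name: dist[issue_origins[name]],
    )
    far = sorted(name for name in issue_origins if name not in near)
    return near, far


# A linear history: c5 is the newest commit, c1 the oldest.
parents = {"c5": ["c4"], "c4": ["c3"], "c3": ["c2"], "c2": ["c1"], "c1": []}
issue_origins = {"lockup": "c4", "oops": "c1", "leak": "c5"}
now, later = split_issues(parents, "c5", issue_origins, cutoff=2)
```

The developer can be told about the results once the `now` list has been checked, while the `later` list is worked through in the background.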
[15:29.960 --> 15:33.600]  Okay, then we can ask,
[15:33.600 --> 15:35.960]  like what were the last X test results,
[15:35.960 --> 15:37.720]  like for this particular test,
[15:37.720 --> 15:42.760]  for this number of commits to be able to say,
[15:42.760 --> 15:47.000]  okay, this test wasn't often failing,
[15:47.000 --> 15:49.320]  okay, it was failing sometimes, but that's okay,
[15:49.320 --> 15:51.320]  but if it suddenly starts failing more often,
[15:51.320 --> 15:52.680]  we got to raise the alarm,
[15:52.680 --> 15:54.880]  or if it stops failing so often,
[15:54.880 --> 15:58.840]  we got to also raise the alarm and see what's changed.
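The "raise the alarm when the failure rate changes" idea can be sketched as a comparison between the most recent results and the ones before them. The window size and threshold here are arbitrary illustration values, and the history is assumed to be already ordered along the revision graph rather than by arrival time.

```python
def failure_rate(statuses):
    """Fraction of results that failed; statuses are 'PASS' or 'FAIL'."""
    return sum(s == "FAIL" for s in statuses) / len(statuses)


def rate_shift(history, window, threshold):
    """Compare the last `window` results against the ones before them.

    `history` must be ordered along the revision graph, oldest first.
    Returns the change in failure rate if its magnitude reaches
    `threshold`, otherwise None.
    """
    if len(history) < 2 * window:
        return None  # not enough data to compare
    older, recent = history[:-window], history[-window:]
    shift = failure_rate(recent) - failure_rate(older)
    return shift if abs(shift) >= threshold else None


flaky_then_broken = [
    "PASS", "PASS", "FAIL", "PASS", "PASS", "PASS", "PASS", "PASS",
    "FAIL", "FAIL", "PASS", "FAIL",
]
shift = rate_shift(flaky_then_broken, window=4, threshold=0.3)
steady = rate_shift(["PASS"] * 8, window=4, threshold=0.3)
```

A negative shift is reported as well, which covers the case where a test suddenly stops failing and someone should check what changed.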
[15:58.840 --> 16:01.520]  Then we need to track the performance trends,
[16:01.520 --> 16:06.160]  of course, over the history of the development,
[16:06.160 --> 16:09.080]  and once again, we cannot do this just based on time,
[16:09.080 --> 16:11.440]  because some systems move at
[16:11.440 --> 16:14.560]  a different speed and some systems might start to decide to,
[16:14.560 --> 16:16.640]  okay, we're going to test this old branch because
[16:16.640 --> 16:21.080]  some of our clients want to base their BSP on it,
[16:21.080 --> 16:24.520]  want to release some software with that kernel,
[16:24.520 --> 16:25.680]  and we got to start testing it,
[16:25.680 --> 16:26.760]  and it starts coming in like
[16:26.760 --> 16:28.800]  the last year's release or something,
[16:28.800 --> 16:31.400]  and we cannot just take that data into account
[16:31.400 --> 16:34.480]  for testing the current releases or vice versa.
[16:34.480 --> 16:39.160]  So, or for stable kernel maintainer,
[16:39.160 --> 16:42.440]  if Greg wants to release a branch,
[16:42.440 --> 16:44.040]  he might want to see like,
[16:44.040 --> 16:45.680]  okay, which issues were discovered starting
[16:45.680 --> 16:49.880]  from the previous release in this branch,
[16:49.880 --> 16:53.400]  and finally, yeah,
[16:53.400 --> 16:54.760]  like just for the dashboard, like,
[16:54.760 --> 16:56.760]  okay, I want to see issues in this branch,
[16:56.760 --> 16:59.760]  or which branches contain this issue.
[16:59.760 --> 17:05.440]  So, that's what we tried to do with Neo4j.
[17:05.440 --> 17:06.520]  I did basic things,
[17:06.520 --> 17:07.920]  so I wrote a little script to get
[17:07.920 --> 17:11.160]  the Git log in a particular format,
[17:11.160 --> 17:15.960]  and then generate the data for commits and for relations.
[17:15.960 --> 17:21.600]  It was a little over a million commits, which look like this,
[17:21.600 --> 17:24.360]  and it was a little more relations,
[17:24.360 --> 17:26.880]  because as you probably know,
[17:26.880 --> 17:30.240]  a commit can have more than one parent in Git,
[17:30.240 --> 17:32.920]  and it looks like this, very simple.
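The actual script is linked from the slides and is not reproduced in this transcript. A rough equivalent, assuming `git log` with the `%H` (commit hash) and `%P` (parent hashes) placeholders and two CSV files as output, might look like this; the function names and CSV headers are made up.

```python
import csv
import io
import subprocess


def read_git_log(repo):
    """Run `git log`, printing '<hash> <parent hashes...>' per commit."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%H %P"],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_git_log(text):
    """Turn the log text into commit hashes and (child, parent) pairs."""
    commits, relations = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, *parent_hashes = line.split()
        commits.append(commit)
        relations.extend((commit, parent) for parent in parent_hashes)
    return commits, relations


def to_csv(header, rows):
    """Render rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Sample log text with shortened hashes; "n" and "m" are merge commits.
sample = "n m b\nm a b\na r\nb r\nr \n"
commits, relations = parse_git_log(sample)
commits_csv = to_csv(["hash"], [[c] for c in commits])
relations_csv = to_csv(["child", "parent"], relations)
```

On a real repository the input would come from `read_git_log("/path/to/linux")` instead of the sample text. Merge commits are why there are more relations than commits.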
[17:32.920 --> 17:37.520]  So, I loaded this into Neo4j with something like this.
[17:37.520 --> 17:39.840]  This is updated for the latest release.
[17:39.840 --> 17:42.840]  It was different before. Then I created an index for
[17:42.840 --> 17:46.720]  hashes and then loaded the relations,
[17:46.720 --> 17:48.400]  and it worked fine,
[17:48.400 --> 17:50.160]  but not a few days ago when I
[17:50.160 --> 17:52.360]  tried the fresh Neo4j release,
[17:52.360 --> 17:55.680]  it just hung like this forever.
[17:55.680 --> 17:57.480]  So, I don't know, I cannot give you
[17:57.480 --> 17:59.040]  fresh data on how it works right now,
[17:59.040 --> 18:02.160]  but I tried it last year,
[18:02.160 --> 18:05.920]  and I couldn't get an answer to a simple question:
[18:05.920 --> 18:08.240]  whether these two commits are connected.
[18:08.240 --> 18:10.880]  It would just go on forever,
[18:10.880 --> 18:12.960]  then run out of RAM.
[18:13.880 --> 18:17.840]  But with APOC, I could do that.
[18:17.840 --> 18:19.560]  I could get the answer.
[18:19.560 --> 18:24.200]  It was okay, but if I wanted to get
[18:24.200 --> 18:26.200]  the nodes between those two commits,
[18:26.200 --> 18:28.240]  it would do the same thing.
[18:28.240 --> 18:32.960]  But with Git, I complete that in milliseconds.
[18:32.960 --> 18:34.960]  So, here you go.
[18:34.960 --> 18:37.480]  I think the problem, well, in my opinion,
[18:37.480 --> 18:40.320]  is that the graph databases and
[18:40.320 --> 18:44.080]  the software there are aimed at the general graph problem,
[18:44.080 --> 18:46.240]  and not tuned to DAGs.
[18:46.240 --> 18:49.640]  As for how Git does that: Git is tuned to DAGs,
[18:49.640 --> 18:51.440]  they have a lot of optimizations for that,
[18:51.440 --> 18:53.000]  and there are tricks to make
[18:53.000 --> 18:55.480]  repositories like the Linux kernel work.
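One example of a DAG-specific trick is the generation number, which Git's commit-graph feature stores per commit to cut reachability walks short. The sketch below is a simplified Python illustration of that idea over a plain `parents` mapping; it is not Git's implementation, and the commit names are invented.

```python
def generations(parents):
    """Generation number: 1 for roots, else 1 + the maximum over parents."""
    gen = {}
    for start in parents:
        # Iterative post-order walk, to avoid deep recursion on long histories.
        stack = [start]
        while stack:
            node = stack[-1]
            node_parents = parents.get(node, ())
            missing = [p for p in node_parents if p not in gen]
            if missing:
                stack.extend(missing)
            else:
                gen[node] = 1 + max((gen[p] for p in node_parents), default=0)
                stack.pop()
    return gen


def is_ancestor(parents, gen, ancestor, descendant):
    """Walk back from `descendant`, skipping commits that are too old."""
    seen, stack = {descendant}, [descendant]
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        for parent in parents.get(node, ()):
            # Every commit on a path down to `ancestor` has a generation
            # at least as large as the ancestor's, so anything with a
            # smaller one cannot lead there and is pruned.
            if parent not in seen and gen[parent] >= gen[ancestor]:
                seen.add(parent)
                stack.append(parent)
    return False


parents = {
    "r": [], "a": ["r"], "b": ["r"], "side": ["r"], "m": ["a", "b"],
}
gen = generations(parents)
connected = is_ancestor(parents, gen, "a", "m")
unrelated = is_ancestor(parents, gen, "side", "m")
```

The pruning only works because the graph has no cycles, which is the kind of guarantee a general-purpose graph database cannot assume.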
[18:55.480 --> 18:59.880]  So, I know nothing about how you do this.
[18:59.880 --> 19:01.080]  This is magic to me,
[19:01.080 --> 19:03.560]  and this would be new to me in this book.
[19:03.560 --> 19:05.880]  But from a purely engineering perspective,
[19:05.880 --> 19:07.880]  I would have liked to see something like
[19:07.880 --> 19:11.040]  a support for databases that are restricted for DAGs only,
[19:11.040 --> 19:13.480]  and that apparently could be done
[19:13.480 --> 19:16.200]  with not so much computation.
[19:16.200 --> 19:18.400]  Then, once you have that,
[19:18.400 --> 19:20.640]  then you can do some branching and say like,
[19:20.640 --> 19:21.800]  okay, if we are a DAG database,
[19:21.800 --> 19:24.080]  then we can do the optimizations
[19:24.080 --> 19:27.120]  and do the fast thing with them.
[19:27.120 --> 19:29.760]  So, the fallback plan is obviously just
[19:29.760 --> 19:32.800]  put everything in Git, put the commits,
[19:32.800 --> 19:34.960]  and the patches, and all the branches,
[19:34.960 --> 19:37.960]  and all the subsystems, it's going to be a giant repo.
[19:37.960 --> 19:39.400]  Maybe we can manage that,
[19:39.400 --> 19:41.040]  and then query it with libgit2,
[19:41.040 --> 19:45.200]  which is a library for working with Git repository data.
[19:45.200 --> 19:48.680]  Then, well, shuttle the commits with the relational database.
[19:48.680 --> 19:51.160]  Okay, we want to see if between those releases,
[19:51.160 --> 19:52.760]  we have issues and we take
[19:52.760 --> 19:57.160]  the commit hashes from Git and then query the database with that.
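The fallback plan, where Git answers the graph question and the relational database answers the rest, could be sketched as below. For brevity this uses the `git rev-list` command line rather than libgit2, and an in-memory SQLite database with a made-up `incidents(issue_id, commit_hash)` table standing in for the real one.

```python
import sqlite3
import subprocess


def commits_between(repo, old, new):
    """Commits reachable from `new` but not from `old`, via `git rev-list`."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", f"{old}..{new}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def issues_for_commits(db, hashes):
    """Look up issues recorded against any of the given commit hashes."""
    # A temporary table avoids building a huge IN (...) list of parameters.
    db.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (hash TEXT PRIMARY KEY)")
    db.execute("DELETE FROM wanted")
    db.executemany(
        "INSERT OR IGNORE INTO wanted VALUES (?)", [(h,) for h in hashes]
    )
    rows = db.execute(
        "SELECT DISTINCT i.issue_id FROM incidents AS i "
        "JOIN wanted AS w ON w.hash = i.commit_hash ORDER BY i.issue_id"
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE incidents (issue_id TEXT, commit_hash TEXT)")
db.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [("lockup", "c2"), ("oops", "c9"), ("lockup", "c3")],
)
# In real use the hashes would come from commits_between(repo, old, new).
found = issues_for_commits(db, ["c1", "c2", "c3"])
```

The cost of this split is that every graph query becomes two round trips, one to Git and one to the database.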
[19:57.160 --> 19:59.920]  That's all. Thanks.
[19:59.920 --> 20:15.960]  So, we can help you with the Neo4j things.
[20:15.960 --> 20:23.520]  It's just like literally this string, this length.
[20:23.520 --> 20:26.680]  No, it's text index is for full text back for.
[20:26.680 --> 20:29.800]  Okay. Well, it was just this one thing.
[20:29.800 --> 20:35.520]  So, do you have the data somewhere to try it out?
[20:35.520 --> 20:36.640]  Of course. Of course.
[20:36.640 --> 20:38.240]  There's a link from the slides to
[20:38.240 --> 20:41.040]  the script that you can use yourself on any Git repo.
[20:41.040 --> 20:47.000]  Yeah. Any more questions?
[20:47.000 --> 20:48.120]  Yes?
[20:48.120 --> 20:50.240]  Did you try any other graph databases?
[20:50.240 --> 20:54.400]  Well, I looked at the question is,
[20:54.400 --> 20:57.480]  did I try any other graph databases?
[20:57.480 --> 20:59.280]  Yeah, I looked at a bunch of them.
[20:59.280 --> 21:03.440]  Some of them require so much setup that I was just floored,
[21:03.440 --> 21:05.120]  but I read the documentation.
[21:05.120 --> 21:08.720]  I couldn't see any indication that it would be any different
[21:08.720 --> 21:11.040]  because nobody says anything about DAGs,
[21:11.040 --> 21:12.800]  any optimizations or anything.
[21:12.800 --> 21:17.240]  I tried Memgraph before this talk,
[21:17.240 --> 21:23.040]  but I had the same problem with loading revisions,
[21:23.040 --> 21:24.440]  I think for some reason.
[21:24.440 --> 21:26.600]  Because previously, I could load revisions.
[21:26.600 --> 21:31.240]  I guess in Neo4j, the syntax for indexes has changed since then.
[21:31.240 --> 21:35.280]  Maybe I did create the index incorrectly, as I was just hinted.
[21:35.280 --> 21:38.640]  But I could load them in reasonable time before in Neo4j and
[21:38.640 --> 21:42.000]  everything was fine, like querying, except that thing.
[21:42.000 --> 21:44.920]  In Memgraph, I just hit the wall because it's
[21:44.920 --> 21:46.960]  a different syntax slightly, it was slow.
[21:46.960 --> 21:51.720]  But yeah, no such luck and it took like four gigabytes of disk space.
[21:51.720 --> 21:56.720]  So, not too bad, okay.
[21:56.720 --> 21:59.720]  What version of Neo4j was successful?
[21:59.720 --> 22:01.320]  I don't remember now.
[22:01.320 --> 22:04.120]  I think it was, if I take a look now,
[22:04.120 --> 22:04.720]  I think I-
[22:04.720 --> 22:07.640]  The version will also be successful, it's just research.
[22:07.640 --> 22:14.240]  I tried Neo4j Desktop 1.4 before,
[22:14.240 --> 22:16.400]  1.4.15, and that worked.
[22:16.400 --> 22:22.400]  I don't know which Neo4j version was included in it.
[22:22.400 --> 22:26.400]  Any other questions?
[22:26.400 --> 22:27.400]  Thank you so much, Nikolai.
[22:27.400 --> 22:28.400]  Thank you.
[22:28.400 --> 22:33.400]  Thank you, everyone.
[22:33.400 --> 22:36.400]  I'm still looking forward to working with the data in the graph database.
[22:36.400 --> 22:41.400]  Because I think that's actually good for the graph database.
[22:41.400 --> 22:44.400]  And so we can make it work and then Dexter,
[22:44.400 --> 22:48.400]  you can come back and do some large scale analysis on the data.
[22:48.400 --> 22:50.400]  Okay, that would be great.
[22:50.400 --> 22:52.400]  That's what you can do.
[22:52.400 --> 22:54.400]  Yes, thank you.
[22:54.400 --> 22:56.400]  Thank you.
[22:56.400 --> 23:15.400]  Thank you.