[00:00.000 --> 00:08.280] I am Gábor Szárnyas and I'm here with David Püroja. [00:08.280 --> 00:14.320] We work at CWI Amsterdam and we're here to present to you the LDBC social network benchmark. [00:14.320 --> 00:15.600] What is the LDBC? [00:15.600 --> 00:18.560] The abbreviation stands for Linked Data Benchmark Council. [00:18.560 --> 00:23.320] It is a non-profit company founded in 2012 and its mission is to accelerate the progress [00:23.320 --> 00:25.640] in the field of graph data management. [00:25.640 --> 00:30.800] And to this end, it designs and governs the use of graph benchmarks and everything we [00:30.800 --> 00:34.960] do is open source under the Apache version 2 license. [00:34.960 --> 00:40.360] From an organizational perspective, LDBC consists of more than 20 members who all have some [00:40.360 --> 00:42.360] vested interest in graph data management. [00:42.360 --> 00:46.760] We have financial service providers like the Ant Group, database vendors like Oracle, [00:46.760 --> 00:53.120] Neo4j and TigerGraph, cloud vendors like AWS and hardware vendors like Intel. [00:53.120 --> 00:58.840] Also we have individual contributors like David and me who contribute to the benchmarks. [00:58.840 --> 01:04.000] So to put things into context, the last two decades have seen a rise in the use of modern [01:04.000 --> 01:06.440] graph database management systems. [01:06.440 --> 01:10.880] Typically, the data model used in these systems is called a property graph, which is a labelled [01:10.880 --> 01:15.520] graph where both the nodes and the edges can have an arbitrary number of properties. [01:15.520 --> 01:20.320] For example, this is a small social network consisting of five person nodes and a single [01:20.320 --> 01:23.000] city node, which is the city of SPA. [01:23.000 --> 01:25.040] And the properties can be on the nodes. 
[01:25.040 --> 01:30.040] For example, here the nodes have names and the edges have attributes like the date when [01:30.040 --> 01:31.560] the friendship was established. [01:31.560 --> 01:35.760] We can see that Bob and Carl met in 2015. [01:35.760 --> 01:40.920] And if you want to run a query on this system, we can use a graph query where we look for [01:40.920 --> 01:42.920] matches of a given graph. [01:42.920 --> 01:46.280] So here the query says we want to start from Bob. [01:46.280 --> 01:51.240] We want to use an arbitrary number of edges to reach some person who lives in SPA and [01:51.240 --> 01:55.680] we want to do an aggregation to return the number of those people. [01:55.680 --> 02:01.800] If you want to evaluate this, we then start from the person Bob, traverse to all the people [02:01.800 --> 02:06.520] transitively who are known by Bob directly or via multiple edges. [02:06.520 --> 02:08.680] This means all four people here. [02:08.680 --> 02:13.240] We shrink it down to the people who actually live in SPA, then add up the results and get [02:13.240 --> 02:14.960] the result two. [02:14.960 --> 02:21.280] So graph databases use something called a visual graph syntax, also known as the ASCII-art graph [02:21.280 --> 02:27.920] syntax, which is similar to the popular Cypher language of Neo4j. [02:27.920 --> 02:32.080] And here this query is actually really similar to the graph pattern that I have shown. [02:32.080 --> 02:36.360] So there are similarities in how the nodes are formulated, how the edges are captured [02:36.360 --> 02:41.080] in this text, and also how the transitive closure, the little asterisk, is captured [02:41.080 --> 02:42.440] in the query language. [02:42.440 --> 02:46.960] So this is a very intuitive and concise way of formulating the queries. [02:46.960 --> 02:51.240] If we deconstruct this query, we can see three main components. [02:51.240 --> 02:52.640] The first one is relational operators. 
[02:52.640 --> 02:55.400] Obviously, we still need relational operators. [02:55.400 --> 02:58.680] We want to be able to identify people by filtering. [02:58.680 --> 03:03.000] So we filter for Bob, we filter for SPA, and also we want to sometimes aggregate. [03:03.000 --> 03:06.360] So the count aggregation is part of this query. [03:06.360 --> 03:10.560] The pathfinding is really elegant in this formulation because we have knows asterisk [03:10.560 --> 03:13.920] which captures that we can use an arbitrary number of edges. [03:13.920 --> 03:20.600] And the pattern matching which connects the person to SPA is also very concise and readable. [03:20.600 --> 03:26.240] So what is interesting from a future work perspective on graph databases? [03:26.240 --> 03:30.200] Obviously, relational operators are quite well known at this point, and there are endless [03:30.200 --> 03:32.960] papers and techniques on how to implement these. [03:32.960 --> 03:37.720] But we believe that pathfinding and pattern matching are really good in graph databases [03:37.720 --> 03:42.200] compared to traditional relational systems because they provide a more concise syntax [03:42.200 --> 03:44.800] and better algorithms and implementations. [03:44.800 --> 03:49.520] Interestingly enough, even in the last 15 years, there have been lots of papers on better [03:49.520 --> 03:56.160] BFS algorithms, better factorized representations for graph patterns, multi-way worst-case optimal [03:56.160 --> 03:57.720] joins, and so on. [03:57.720 --> 04:01.000] So we believe that these should be adopted by more and more systems. [04:01.000 --> 04:05.400] And to this end, we designed benchmarks that try to push the state of the art and force [04:05.400 --> 04:08.000] systems to adopt better and better techniques. [04:08.000 --> 04:10.160] David will talk about these benchmarks. [04:10.160 --> 04:11.520] Yeah, hi. [04:11.520 --> 04:14.800] So I will give an overview about the social network benchmark. 
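The query walkthrough from the first part of the talk (start from Bob, follow any number of knows edges, keep the people who live in the city, count them) can be sketched in a few lines of Python. This is only an illustration under assumptions: the talk names Bob and Carl, but the other three names, every edge, and who lives where are hypothetical, chosen so that four people are reachable and two of them live in the city. Excluding Bob himself from the result is also an assumption.

```python
from collections import deque

# Hypothetical toy graph: only Bob and Carl are named in the talk.
# All other names, all edges, and all residences are assumptions.
knows = {
    "Bob": ["Carl"],
    "Carl": ["Bob", "Dan"],
    "Dan": ["Carl", "Eve", "Alice"],
    "Eve": ["Dan"],
    "Alice": ["Dan"],
}
lives_in = {"Eve": "SPA", "Alice": "SPA"}


def reachable(start):
    """Pathfinding: everyone reachable from start over one or more knows edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in knows.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    seen.discard(start)  # assumption: the start person is not counted
    return seen


def count_reachable_in_city(start, city):
    """Pattern matching (lives in city) plus the relational count aggregation."""
    return sum(1 for person in reachable(start) if lives_in.get(person) == city)
```

The three components named in the talk map onto the code directly: the breadth-first loop is the pathfinding, the `lives_in` lookup is the pattern match, and `sum` is the aggregation.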
[04:14.800 --> 04:21.520] And so first, we'll go through three parts of this benchmark, so the data sets, two example [04:21.520 --> 04:25.360] queries, and the update operations done in this benchmark. [04:25.360 --> 04:31.360] So here we see a small example of the data sets where on the left side, we see persons [04:31.360 --> 04:37.440] with friendships forming a network, and these persons post messages on the social network [04:37.440 --> 04:42.880] and can reply to each other forming a tree-shaped data structure. [04:42.880 --> 04:48.640] And now we will do one query on this very small data set example. [04:48.640 --> 04:54.720] So with query nine, we want to retrieve messages posted by a given person's friends and friends [04:54.720 --> 04:58.280] of friends before a given date. [04:58.280 --> 05:00.640] And the dates are here shortened for simplicity. [05:00.640 --> 05:06.240] So if we would start with Bob, we will traverse to their friends and friends of friends, retrieve [05:06.240 --> 05:12.120] the messages, and then filter out the ones that are actually before Saturday. [05:12.120 --> 05:15.520] And then we touch upon 10 nodes in this data. [05:15.520 --> 05:21.240] Suppose we would start from another person, so for example Finn, and we traverse again [05:21.240 --> 05:24.640] to their friends and friends of friends. [05:24.640 --> 05:30.240] Here we see that we touch upon five different nodes. [05:30.240 --> 05:33.200] So half of that of Bob. [05:33.200 --> 05:40.680] And this difference can actually be troublesome since runtimes for the same queries are different, [05:40.680 --> 05:45.240] which doesn't help in understanding what's happening. [05:45.240 --> 05:51.640] So for this benchmark, we actually want to select parameters that have similar runtimes [05:51.640 --> 05:56.200] and also to actually stress the technical difficulties in these systems. [05:56.200 --> 06:00.320] So we select the parameters more carefully. 
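One way to picture careful parameter selection is to compute, for every person, how many friends and friends of friends they have, and then keep the persons whose count is closest to the median. The sketch below is a simplification under assumptions, not the benchmark's actual parameter generator: the function names, the median criterion, and the toy friendship graph are all hypothetical.

```python
import statistics


def two_hop_count(friends, person):
    """Number of distinct friends and friends of friends of a person."""
    direct = set(friends.get(person, ()))
    two_hop = set()
    for friend in direct:
        two_hop.update(friends.get(friend, ()))
    return len((direct | two_hop) - {person})


def curate_parameters(friends, k):
    """Pick the k persons whose two-hop count is closest to the median.

    Assumption: similar neighbourhood sizes lead to similar query runtimes.
    Ties are broken by name so the result is deterministic.
    """
    counts = {person: two_hop_count(friends, person) for person in friends}
    median = statistics.median(counts.values())
    ranked = sorted(counts, key=lambda p: (abs(counts[p] - median), p))
    return ranked[:k]


# Hypothetical friendship graph: a simple chain a - b - c - d - e.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
selected = curate_parameters(chain, 2)
```

Picking uniformly at random from `chain` could return the endpoints (two reachable persons) as easily as the middle node (four), which is the kind of runtime spread the speaker wants to avoid.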
[06:00.320 --> 06:05.520] So here we see an example of when we do not select the parameters carefully, just uniformly at [06:05.520 --> 06:06.520] random. [06:06.520 --> 06:12.400] And we can see here a trimodal distribution, a bimodal one, and one with many outliers. [06:12.400 --> 06:14.520] And we don't want that. [06:14.520 --> 06:22.400] So in the data sets, there are also statistics provided, in this example for each person [06:22.400 --> 06:25.240] the number of friends and friends of friends. [06:25.240 --> 06:32.240] Then we want to select persons with a similar number to get more predictable runtimes. [06:32.240 --> 06:37.960] And so if we do that, then we can see here an example that we have unimodal distributions [06:37.960 --> 06:40.880] with very tight runtimes. [06:40.880 --> 06:47.800] And that also helps in understanding the behavior of the queries. [06:47.800 --> 06:50.600] So now we're going to the updates. [06:50.600 --> 06:57.280] And for example, if Eve and Gia want to be friends, we insert a knows edge. [06:57.280 --> 07:01.120] And this is then added to the network. [07:01.120 --> 07:04.080] Suppose that the next operation is inserting a comment. [07:04.080 --> 07:09.120] So Gia's comment replies to a message posted by Eve. [07:09.120 --> 07:12.880] And both messages are posted on the same date. [07:12.880 --> 07:15.280] Then we have another problem. [07:15.280 --> 07:20.360] Because when we are executing these operations concurrently, it can happen that the reply [07:20.360 --> 07:26.720] is inserted earlier than the message in such a network, causing an error. [07:26.720 --> 07:31.320] And to mitigate this, we introduce dependency tracking. [07:31.320 --> 07:36.880] So for each operation, and this also includes the edges, but just for simplicity, the nodes [07:36.880 --> 07:40.960] are shown here with the dependent dates. [07:40.960 --> 07:45.360] We include for each operation a creation date and dependent date. 
[07:45.360 --> 07:50.360] The creation date is when it's scheduled to be executed, and the dependent date is the [07:50.360 --> 07:58.040] one that's, like in this case, for M6, the creation date of M3. [07:58.040 --> 08:02.760] And here we can see, actually, that each operation is dependent on another, forming [08:02.760 --> 08:06.360] a whole chain in the social network. [08:06.360 --> 08:10.880] Suppose now that Eve wants to leave the social network and removes her account. [08:10.880 --> 08:15.920] And so we start with deleting the node of Eve, and this will trigger a cascading effect, [08:15.920 --> 08:22.520] since we then need to remove the edges connected to Eve, the messages posted, and [08:22.520 --> 08:25.440] also the replies to those messages. [08:25.440 --> 08:31.200] We can actually see this huge cascading effect, and that can actually have a large [08:31.200 --> 08:38.440] impact on the data distribution, and also therefore the executability of these operations. [08:38.440 --> 08:45.320] And furthermore, it also influences selecting the parameters, which we have shown before. [08:45.320 --> 08:49.640] And we want to include this delete because it prohibits append-only data structures in [08:49.640 --> 08:54.120] databases and also stresses the garbage collector of these systems. [08:54.120 --> 09:00.360] Now we are going to give another example to also stress the temporal aspect of this benchmark. [09:00.360 --> 09:04.680] So suppose we want to find a path between two persons. [09:04.680 --> 09:11.080] So we have a start person and a destination person, for example, Finn and Gia. [09:11.080 --> 09:16.040] Then we can see here that we have a four-hop path between these persons. [09:16.040 --> 09:22.120] But at one point in the benchmark, it can happen that a knows edge is removed, and [09:22.120 --> 09:25.160] then there is no path anymore. 
[09:25.160 --> 09:29.560] It can also happen that there's another edge inserted between Carl and Gia, and then we [09:29.560 --> 09:31.840] have a path again. [09:31.840 --> 09:36.640] And so for the same parameters, we can actually have three different outcomes. [09:36.640 --> 09:39.480] And to mitigate this, we do temporal parameter selection. [09:39.480 --> 09:46.240] So each parameter is assigned to a time bucket to actually ensure that we have similar results [09:46.240 --> 09:49.320] and therefore also similar run times. [09:49.320 --> 09:52.680] Now going through the benchmark workflow. [09:52.680 --> 09:59.600] So we start with the data gen, and the data gen provides us with a temporal graph spanning [09:59.600 --> 10:05.400] social media activity over three years, and it is modeled to be similar [10:05.400 --> 10:08.560] to the Facebook social network. [10:08.560 --> 10:15.480] It's a Spark-based data generator that can generate data up to 30 terabytes, and it contains [10:15.480 --> 10:24.720] the, you know, skewed data sets, for example, with the nodes and person data in this data. [10:24.720 --> 10:31.160] And so the output is a data set suitable for loading into the system under test, updates [10:31.160 --> 10:38.280] which are then executed during the benchmark, and statistics from which we can select the parameters. [10:38.280 --> 10:42.360] And the selection of the parameters is done in the parameter generator. [10:42.360 --> 10:48.240] This ensures stable query run times and assigns parameters to a temporal bucket. [10:48.240 --> 10:55.920] So it may include parameters only once they are inserted into the data set [10:55.920 --> 11:00.600] and before they are removed from the network. [11:00.600 --> 11:06.760] And then we have a benchmark driver which schedules these operations and ensures that [11:06.760 --> 11:11.120] they can be executed using the dependency tracking. 
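The dependency tracking idea (every operation carries a creation date and a dependent date, and it may only run once the operation it depends on has completed) can be sketched as follows. This is a toy, single-threaded illustration under assumptions and not the actual benchmark driver: the `Operation` class, the integer dates, and the use of `0` for "no dependency" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    creation_date: int   # when the operation is scheduled to execute
    dependent_date: int  # creation date of what it depends on (0 = nothing)


def execute_with_dependencies(operations):
    """Return an execution order in which no operation runs before its dependency.

    Operations may arrive in any order, as they could under concurrency.
    """
    completed_dates = {0}  # 0 stands for "no dependency" (assumption)
    pending = list(operations)
    order = []
    while pending:
        ready = [op for op in pending if op.dependent_date in completed_dates]
        if not ready:
            raise ValueError("some operation depends on something never created")
        for op in ready:
            order.append(op.name)
            completed_dates.add(op.creation_date)
        pending = [op for op in pending if op not in ready]
    return order


# The reply M6 depends on message M3, which depends on its author P1.
arrived = [
    Operation("M6", creation_date=6, dependent_date=3),
    Operation("M3", creation_date=3, dependent_date=1),
    Operation("P1", creation_date=1, dependent_date=0),
]
order = execute_with_dependencies(arrived)
```

Even though the reply arrives first, it is held back until the message it replies to has been inserted, which avoids the error described for concurrent execution.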
[11:11.120 --> 11:16.920] And this is especially important when executing the operations concurrently. [11:16.920 --> 11:21.640] And lastly, we have the system under test where we have, for example, graph databases, [11:21.640 --> 11:24.120] triple stores or relational databases. [11:24.120 --> 11:27.920] And now Gábor will go further into the workloads. [11:27.920 --> 11:36.520] Okay, so graph workloads are actually quite diverse in terms of what they are trying to [11:36.520 --> 11:40.680] achieve, and our benchmark reflects that by having multiple workloads. [11:40.680 --> 11:45.580] We have the social network benchmark interactive workload, which is transactional in nature, [11:45.580 --> 11:47.720] so it has loads of concurrent operations. [11:47.720 --> 11:52.920] The queries here are relatively simple, so they always start from one or two person nodes, [11:52.920 --> 11:55.800] the same as David presented before. [11:55.800 --> 12:00.000] And here the systems are striving to achieve a high throughput, so the competition is getting [12:00.000 --> 12:03.360] as many operations per second as possible. [12:03.360 --> 12:08.400] We are happy to report that we have official results from the last three years, where systems [12:08.400 --> 12:13.560] started with slightly above 5,000 operations per second and have sped up exponentially, [12:13.560 --> 12:19.600] now being close to 17,000 operations per second on a 100 gigabyte dataset. [12:19.600 --> 12:23.280] The other workload of the social network benchmark is called business intelligence. [12:23.280 --> 12:27.800] This is an analytical workload where the queries touch on large portions of the data. [12:27.800 --> 12:33.400] For example, the query on this slide shows a case where we start from a given country [12:33.400 --> 12:37.720] and then find all triangles of friendships in that country. [12:37.720 --> 12:40.400] It's easy to see that this is a very heavy hitting operation. 
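To give a feel for the triangle query mentioned here, the following is a minimal sketch that finds all friendship triangles among persons of one country. It is an illustration under assumptions, not the benchmark's reference implementation: the function name, the edge-list input, and the tiny data set are hypothetical, and a real system would use far more sophisticated join algorithms.

```python
from itertools import combinations


def friendship_triangles(knows_edges, country_of, country):
    """Return all triangles (a, b, c) with a < b < c among persons in a country."""
    people = sorted(p for p, c in country_of.items() if c == country)
    adjacency = {p: set() for p in people}
    for a, b in knows_edges:
        # Keep only friendships where both persons are in the country.
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    triangles = []
    for a in people:
        # Only look at "larger" neighbours so each triangle is found once.
        higher = sorted(n for n in adjacency[a] if n > a)
        for b, c in combinations(higher, 2):
            if c in adjacency[b]:
                triangles.append((a, b, c))
    return triangles


# Hypothetical data: a, b, c live in NL; d lives in BE.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")]
countries = {"a": "NL", "b": "NL", "c": "NL", "d": "BE"}
nl_triangles = friendship_triangles(edges, countries, "NL")
```

The triangle a, c, d exists in the full graph but is not reported because d lives in another country. Even this naive version has to inspect pairs of neighbours for every person, which hints at why the query is expensive on large graphs.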
[12:40.400 --> 12:45.360] It may touch on billions of edges in the graph, and it also has to do a complex computation [12:45.360 --> 12:46.880] to find those people. [12:46.880 --> 12:52.240] So here systems can use either a bulk update or a concurrent update method, and they should [12:52.240 --> 12:57.400] also strive to get both a high throughput and low query run times. [12:57.400 --> 12:58.960] This benchmark is relatively new. [12:58.960 --> 13:02.560] It was released at the end of last year, so we only have a single result, which was done [13:02.560 --> 13:05.040] by a collaboration of TigerGraph and AMD. [13:05.040 --> 13:10.080] We're happy to report that there are more audits under way, so we are going to release [13:10.080 --> 13:13.600] more results in 2023. [13:13.600 --> 13:18.080] So probably you can see from this presentation that these benchmarks can get fairly complex [13:18.080 --> 13:20.240] and implementing them is not trivial. [13:20.240 --> 13:23.960] So we did our best to provide everything our users need. [13:23.960 --> 13:27.680] For each of the workloads that we have presented, we have a specification, we have detailed [13:27.680 --> 13:33.680] academic papers which motivate the design choices and the architecture of these benchmarks. [13:33.680 --> 13:39.160] We released a data generator as well as pre-generated datasets, and we have benchmark drivers and [13:39.160 --> 13:42.160] at least two reference implementations for each of the workloads. [13:42.160 --> 13:46.600] Moreover, we have guidelines on how to execute these benchmarks correctly, how to validate [13:46.600 --> 13:51.320] the results of a given system, and how to ensure that the system will not lose your data [13:51.320 --> 13:54.240] or mangle the transactions. [13:54.240 --> 13:58.680] So we have ACID compliance tests and recovery tests. [13:58.680 --> 14:00.920] This leads us to our auditing process. 
[14:00.920 --> 14:05.760] Similarly to the TPC, the Transaction Processing Performance Council, we have a rigorous auditing [14:05.760 --> 14:11.840] process where vendors can commission an independent third party who will rerun the benchmark in [14:11.840 --> 14:17.680] an executable and reproducible manner, and they will write it up as a full disclosure [14:17.680 --> 14:23.560] report so that the benchmark is understandable by whoever wants to see that result. [14:23.560 --> 14:29.000] This is important because LDBC is trademarked worldwide, and we only allow official audited [14:29.000 --> 14:32.440] results to use the term LDBC benchmark result. [14:32.440 --> 14:35.960] This is not to say that we don't allow people to use this benchmark. [14:35.960 --> 14:40.440] Researchers, practitioners, and developers are welcome to use the benchmark. [14:40.440 --> 14:41.440] They can run it. [14:41.440 --> 14:45.640] They can report the results if it is accompanied by the appropriate disclaimer that this is [14:45.640 --> 14:49.800] not an official LDBC benchmark result. [14:49.800 --> 14:53.480] I would like to talk a bit about standard graph query languages. [14:53.480 --> 14:57.560] This is an important topic because this has been a pain point for graph systems for many [14:57.560 --> 14:58.560] years. [14:58.560 --> 15:03.080] There is a bit of a tower of Babel out there with many languages, most of them using some [15:03.080 --> 15:07.240] sort of visual graph syntax, but always with slightly different semantics and a slightly [15:07.240 --> 15:12.600] different syntax, which makes it difficult for users to adopt these techniques and may [15:12.600 --> 15:16.840] put them in a position of being locked in by their vendors. [15:16.840 --> 15:21.480] In the next couple of years, there are going to be new standard query languages. [15:21.480 --> 15:24.920] These focus on pathfinding and pattern matching. [15:24.920 --> 15:26.840] The first one is called SQL/PGQ. 
[15:26.840 --> 15:31.440] This is an extension to the SQL language and PGQ stands for property graph queries. [15:31.440 --> 15:36.600] This is going to be released next summer, and GQL, the standalone graph query language, [15:36.600 --> 15:39.360] is going to come out in 2024. [15:39.360 --> 15:43.680] We are happy to report that even though we have two new languages, the pattern matching [15:43.680 --> 15:48.920] core of them, the visual graph syntax that we all know and love, is going to be the same, [15:48.920 --> 15:52.840] so users can port at least those bits of their queries. [15:52.840 --> 15:57.240] To give you a taste of what this will look like, here is query 9 that David presented [15:57.240 --> 16:00.560] in the social network benchmark interactive workload. [16:00.560 --> 16:02.520] This query can be formulated in SQL. [16:02.520 --> 16:08.280] It's not too difficult, but the new variants, SQL/PGQ and GQL, can represent it in terms [16:08.280 --> 16:12.920] of a graph pattern, and this is a much more concise formulation. [16:12.920 --> 16:16.920] The difference is even more pronounced for query 13 with the path queries. [16:16.920 --> 16:22.240] Here we can see that in SQL/PGQ, the pattern is really similar to the visual representation. [16:22.240 --> 16:28.640] It just has a source, a target, and an arbitrary number of knows edges denoted by knows asterisk [16:28.640 --> 16:29.840] in between. [16:29.840 --> 16:35.040] In SQL, this is a lot less readable, hard to maintain, and it's even less efficient [16:35.040 --> 16:39.400] because it just implements a unidirectional search algorithm instead of doing a bidirectional [16:39.400 --> 16:42.840] search which has a better algorithmic complexity. [16:42.840 --> 16:46.440] The way LDBC is involved in these new query languages is manifold. [16:46.440 --> 16:52.640] First, it had the G-CORE design language released in 2018 which influenced these languages. 
[16:52.640 --> 16:56.680] Then LDBC has the formal semantics working group which formalized the pattern matching [16:56.680 --> 17:04.000] core of these new languages, and LDBC is doing further research to advance the state [17:04.000 --> 17:05.720] of the art on graph schemas. [17:05.720 --> 17:10.560] We have an industry-driven and a theory-driven group, and what they do will end up in the [17:10.560 --> 17:13.600] new versions of these languages. [17:13.600 --> 17:16.600] The outlook is the LDBC Graphalytics benchmark. [17:16.600 --> 17:23.640] This is a broader benchmark because it can target analytical libraries like NetworkX, [17:23.640 --> 17:27.280] distributed systems like Apache Giraph, or the GraphBLAS API. [17:27.280 --> 17:31.120] This is everything that has to do with analyzing large graphs. [17:31.120 --> 17:36.560] Here the graph is an untyped, unattributed graph, so there are no properties and no labels. [17:36.560 --> 17:42.040] We do use the LDBC social network benchmark dataset, but it is stripped down to the person-knows-person [17:42.040 --> 17:43.040] core graph. [17:43.040 --> 17:48.680] Additionally, we have included a number of well-known datasets like Graph500, Twitter, [17:48.680 --> 17:49.680] and so on. [17:49.680 --> 17:53.280] The algorithms that we run are mostly well-known graph algorithms. [17:53.280 --> 17:58.960] There is the BFS, which starts from a given node and assigns to all of the other nodes the number of steps that [17:58.960 --> 18:02.320] need to be taken to reach them. [18:02.320 --> 18:06.800] We have the famous PageRank centrality algorithm, which highlights the most important nodes in [18:06.800 --> 18:13.480] the network, and we have the local clustering coefficient, community detection using label [18:13.480 --> 18:17.720] propagation, weakly connected components, and shortest paths. [18:17.720 --> 18:20.800] This benchmark is a bit simpler than the social network benchmark. 
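As a small illustration of the first algorithm in that list, here is a minimal sketch of BFS on an untyped, unattributed graph: it starts from a given node and assigns to every reachable node the number of steps needed to reach it. The function name and the example graph are hypothetical, and leaving unreachable nodes out of the result is an assumption rather than the benchmark's prescribed output format.

```python
from collections import deque


def bfs_levels(adjacency, source):
    """Map each node reachable from source to its hop distance from source."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels


# Hypothetical person-knows-person core graph, with ids instead of names.
graph = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3, 5],
    5: [4],
    6: [],  # isolated node, not reachable from node 1
}
levels = bfs_levels(graph, 1)
```

Node 6 does not appear in `levels` because it cannot be reached from node 1.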
[18:20.800 --> 18:23.600] It does not have a rigorous auditing process. [18:23.600 --> 18:29.040] We trust people that they can run this benchmark efficiently and correctly on their own infrastructure, [18:29.040 --> 18:30.800] and they can report results. [18:30.800 --> 18:35.040] If they do so, they will be able to participate in the Graphalytics competition, which has [18:35.040 --> 18:38.840] a leaderboard for the best implementations. [18:38.840 --> 18:44.160] Wrapping up, you should consider joining the LDBC because members can participate in the [18:44.160 --> 18:45.160] benchmark design. [18:45.160 --> 18:49.080] They have a say in where we are going in terms of including new features. [18:49.080 --> 18:55.040] They can commission audits if they are vendors, and members can gain access to these ISO standard [18:55.040 --> 18:58.000] drafts that I mentioned, SQL/PGQ and GQL. [18:58.000 --> 19:01.560] Otherwise, these are not available to the general public. [19:01.560 --> 19:06.440] Pricing-wise, this is free for individuals, and there is a yearly fee for companies. [19:06.440 --> 19:10.520] To sum up, we have presented three benchmarks, the social network benchmark's interactive [19:10.520 --> 19:15.760] workload, its business intelligence workload, and the Graphalytics graph algorithms workload. [19:15.760 --> 19:16.760] We have more benchmarks. [19:16.760 --> 19:22.720] There is the semantic publishing benchmark, which is targeting RDF systems, set in the media [19:22.720 --> 19:24.480] and publishing industry. [19:24.480 --> 19:27.880] There is the financial benchmark, which is going to be released this year, which targets [19:27.880 --> 19:34.160] distributed systems, and it uses the financial fraud detection domain as its area, and it [19:34.160 --> 19:36.800] imposes strict latency bounds on queries. [19:36.800 --> 19:39.880] This is quite a different workload from the previous ones. 
[19:39.880 --> 19:44.440] Of course, graphs are ubiquitous, and they have loads of use cases, so there are many [19:44.440 --> 19:49.040] future benchmark ideas, including graph neural network mining and streaming. [19:49.040 --> 19:51.080] Thank you very much, and we're open to any questions. [19:51.080 --> 19:58.080] Yes. [19:58.080 --> 20:04.560] So, in this one overview there was the graph data set, and the updates were kind of separated. [20:04.560 --> 20:09.960] Is there a possibility to create a graph data set where the updates are included in the [20:09.960 --> 20:14.560] data set, so that the nodes and vertices get timestamps for when they were deleted or when [20:14.560 --> 20:15.560] they were added? [20:15.560 --> 20:16.560] Yes. [20:16.560 --> 20:22.680] So, is it possible to create something like a temporal graph with the timestamps of when [20:22.680 --> 20:27.880] the specific node is created and deleted, and this is actually very easy, because this [20:27.880 --> 20:30.920] is the first step that the data gen creates. [20:30.920 --> 20:36.160] So, when David said that it creates a social network of three years, that has everything [20:36.160 --> 20:40.560] that was ever created or deleted during those three years, and then we have attributes like [20:40.560 --> 20:44.480] creation date and deletion date, and when we turn it into something that's loadable [20:44.480 --> 20:49.640] to the database, we hide deletion dates, because the database, of course, shouldn't be aware [20:49.640 --> 20:54.800] of this, but this is something that the data gen supports out of the box. [20:54.800 --> 21:00.040] Okay, but then it's also possible to get this data set with the deletion date, because you [21:00.040 --> 21:02.080] already said that it's hideable. 
[21:02.080 --> 21:07.720] It's hideable, but we have one which is called the raw temporal data set, and that is available, [21:07.720 --> 21:14.680] and we even published that, so that's something that, yeah, it has a lot of chance to be influential [21:14.680 --> 21:16.520] in the streaming community, I believe. [21:16.520 --> 21:18.920] All right, more questions? [21:18.920 --> 21:19.920] Yeah, Michael? [21:48.120 --> 21:50.840] So the question is, can we extend to other domains? [21:50.840 --> 21:57.360] And we usually emphasize that social networks are not really the domain that is the actual [21:57.360 --> 22:01.360] primary use case for graphs, we just use this because this is really easy to understand, [22:01.360 --> 22:05.640] we don't have to explain person-knows-person, and you can put in all sorts of interesting [22:05.640 --> 22:09.440] technological challenges to a graph domain like this. [22:09.440 --> 22:14.840] It would make sense, and sometimes we are approached by our members saying, we want [22:14.840 --> 22:22.320] to do a new benchmark in the domain X, and we then send them the process that is required [22:22.320 --> 22:27.160] to get one of these benchmarks completed, and that's usually the end of the conversation, [22:27.160 --> 22:33.640] but we are definitely open to have more interesting benchmarks, and of course, a good data generator [22:33.640 --> 22:40.120] is worth gold to all the researchers and the vendors in this community, so that's usually [22:40.120 --> 22:45.560] the hard point, and I would be definitely interested in having a retail graph generator. [22:45.560 --> 22:46.560] Carlo? [22:46.560 --> 22:47.560] Hi. 
[22:47.560 --> 22:57.560] The question is specifically, what do you see the impact of this will be on the industry [22:57.560 --> 23:14.320] or it's more uneductive of evidence if it's, if the system would have improved, or if the [23:14.320 --> 23:15.320] system would get more robust as in that you detect stuff that is doing things and stuff [23:15.320 --> 23:16.320] get fixed, or what's the, yeah. [23:16.320 --> 23:17.320] So the question? [23:17.320 --> 23:18.320] Yeah, the question is about the potential impact. [23:18.320 --> 23:19.960] What could all this achieve? [23:19.960 --> 23:26.600] And we believe that it will help accelerate the field in the sense that systems will get [23:26.600 --> 23:31.280] more mature, because if you want to get an audited result, you have to pass all the ACID [23:31.280 --> 23:36.880] tests, you have to be able to recover after a crash, and ideally you would have to be [23:36.880 --> 23:41.480] fast, so that is hopefully one of the other things that systems will take away. [23:41.480 --> 23:48.120] They will have better optimizers, improved storage, better query execution engines, [23:48.120 --> 23:54.960] and we have seen this in the aftermath of the TPC benchmarks, so those resulted in quite [23:54.960 --> 23:56.320] a big speedup. [23:56.320 --> 24:02.160] So that's one area, and of course there is pricing, we would like that users can get [24:02.160 --> 24:07.240] more transactions per dollar, and the third that we are personally quite interested in [24:07.240 --> 24:09.200] is the new accelerators that come out. 
[24:09.200 --> 24:14.560] So there are, especially in the field of machine learning, there are cards that do fast sparse [24:14.560 --> 24:19.840] matrix multiplications, those could be harnessed specifically for the analytical benchmarks [24:19.840 --> 24:24.960] that we have, and that would be interesting to see how big of a hassle it is to implement [24:24.960 --> 24:35.040] and how big of a speedup they give, cool, all right, okay, thank you very much.