[00:00.000 --> 00:12.440] Hi everyone, my name is Evgeny and I am a Tech Lead of the Friction Data Project at [00:12.440 --> 00:14.600] Open Knowledge Foundation. [00:14.600 --> 00:20.920] Today I'd like to present you our new development called Friction Supplication. [00:20.920 --> 00:28.520] Friction Supplication is an IDE for C3 and other table or general data formats. [00:28.520 --> 00:35.080] This tool hasn't yet been published, so today it will be more like a future preview, but [00:35.080 --> 00:41.600] you can access the CUTBASE github. [00:41.600 --> 00:49.240] The main purpose of Friction Supplication is data analysis, data validation, data publishing [00:49.240 --> 00:55.120] and many other aspects working in the data. [00:55.120 --> 01:01.920] Let's start from overview of the Friction Data Project. [01:01.920 --> 01:08.480] Friction Data Project helps people to publish and consume data. [01:08.480 --> 01:17.000] It's built on top of open data standards, such as table schema, data resource, data [01:17.000 --> 01:25.520] package, you might have heard about them, and on top of these standards you're working [01:25.520 --> 01:28.160] on different software. [01:28.160 --> 01:36.360] For example, we have developed Friction Framework for Python, so people can describe, validate, [01:36.360 --> 01:41.320] extract or transform their data in Python or command line. [01:41.320 --> 01:49.640] We've developed Friction Repository, which is similar to Friction Framework, but it's [01:49.640 --> 02:00.680] run on github infrastructure using github actions, and it provides continuous data validation. [02:00.680 --> 02:10.280] We also last year presented LiveMark, our data visualization and publishing tool. [02:10.280 --> 02:21.800] And today we'll talk about Friction Supplication, but first of all, I'll describe the Friction [02:21.800 --> 02:27.960] Framework and Friction Repository to show why Friction Supplication. [02:27.960 --> 02:36.880] So Friction Framework is written in Python, and it provides command line interface and [02:36.880 --> 02:48.520] Python interface to describe, extract, validate, and transform, including publishing tabular [02:48.520 --> 02:50.020] data. [02:50.020 --> 03:00.400] So this project had been here for a while and we have a rich community and a lot of [03:00.400 --> 03:02.600] people using this. [03:02.600 --> 03:12.000] But it requires you to work with command line interface or know how to code in Python. [03:12.000 --> 03:20.160] So for many people it's just not possible to use Friction Framework. [03:20.160 --> 03:30.920] So to solve this, we developed Friction Repository. [03:30.920 --> 03:40.520] It's a github action, so it works like continuous data validation service. [03:40.520 --> 03:47.880] If you have data published on github or you store some data related project on github, [03:47.880 --> 03:54.560] you can add Friction Repository github action and on every push to your repository it will [03:54.560 --> 03:56.840] be validating your data. [03:56.840 --> 04:07.320] And this solves the problem of people who are not able to use Friction Framework because [04:07.320 --> 04:11.600] it requires programming and knowing command line interface. [04:11.600 --> 04:19.920] But still it's kind of for tech people who know what's github, how to create github actions. [04:19.920 --> 04:26.560] So having said all of this, I'd like to introduce Friction's application. [04:26.560 --> 04:35.360] It's a fully visual tool that can be published as a web application, more like a demo, and [04:35.360 --> 04:40.880] more importantly published as a desktop application that you can install on your computer. [04:41.120 --> 04:51.200] It's fully visual, it's for non-programmers and our goal is to make it really easy to use [04:51.200 --> 05:00.760] so it will just like us in Excel, the file manager like from Jupyter, Notebook, and the [05:00.760 --> 05:14.920] core features that for any CSV file on your computer you'll be able to see it as a table. [05:14.920 --> 05:24.760] And after this you can see the metadata, what fields, counts it has, what types, you can [05:24.760 --> 05:35.000] edit it, you can add validation checks and see how it validates. [05:35.000 --> 05:49.320] And in general this project just does everything, visually, what our more level projects do [05:49.880 --> 05:55.880] for programmers and command line interface users. [05:58.520 --> 06:05.320] It makes sense to add that it's not only for tables, for example you can edit your [06:05.320 --> 06:13.320] metadata like tables, schemas, or data packages, and here I'm going to show you [06:13.800 --> 06:27.800] how you can really date your data table and fix the errors. So when you upload or open a data [06:27.800 --> 06:38.360] file in Fiction's application it provides you a tabular view and the errors and a lot of data [06:38.360 --> 06:48.840] tables in CSV have errors as shown in red and there is a description of what this error is and [06:48.840 --> 06:58.680] you can just edit the cells like in Excel and clean your data and save it. [06:59.640 --> 07:13.960] Okay so the goal for our better release is also to support creating charts from data tables [07:13.960 --> 07:27.880] and we use here our beloved Vega notation or Vega light and so the idea that [07:29.800 --> 07:42.600] it was possible to do a lot of Vega supported charts using data from your tables so you'd be able to [07:44.520 --> 07:57.560] set what count used for x-axis, what will be the transform, what's the type of this axis etc etc. [07:58.520 --> 08:03.640] Currently it's under development but the goal is to have it. [08:03.960 --> 08:17.640] Of course it will be a non-complete data related application if it weren't for CQL [08:17.640 --> 08:27.400] features so yes I fixed the application possible to query your table or data whatever formats it had [08:28.120 --> 08:38.680] like was it CSV or Excel or whatever it will be indexed in a CQL light database so you will be able [08:38.680 --> 08:47.000] to query your data and not only individual tables but more complex queries joining different [08:47.720 --> 08:56.440] data tables so basically it will be just a CQL interface to [08:58.360 --> 09:00.200] data files on your computer. [09:03.320 --> 09:05.880] Okay let's say we're done [09:06.200 --> 09:15.800] with the dating and editing and analyzing our data tables what we can do in friction application is [09:16.360 --> 09:19.480] pack everything as a data package. [09:20.920 --> 09:28.680] Data package is a standard open standard for describing collection of files as said it was [09:28.920 --> 09:36.920] and it is a cornerstone of friction data project so in Fiction's application you can [09:39.400 --> 09:47.080] create a data package and fill it with your files and then they'll be working on [09:47.320 --> 09:58.040] the future for publishing the data so currently working on Github, Xenodo and C-CAN [09:59.800 --> 10:05.160] targets for publishing a data package as a dataset for example [10:07.240 --> 10:12.920] that is a set of files collection files and it will be Xenodo dataset. [10:13.880 --> 10:21.000] These features are already implemented for Fiction as framework and [10:23.080 --> 10:30.440] here we're just working on the visual interface for them and I think it's [10:32.520 --> 10:40.520] very important to provide this way for non-coding people visual way to publish data. [10:40.520 --> 10:51.000] Okay I think that's it for my presentation and as I mentioned Fiction's application is [10:51.000 --> 11:00.360] approaching better that release in a few months so please stay tuned I think we will be presenting [11:01.320 --> 11:05.560] it also on CSVConf in Argentina this year and [11:08.440 --> 11:17.080] when it's released any feedback ideas and suggestions will be like really amazing for us [11:17.080 --> 11:28.680] and it will be helpful. Also just a side thing just don't forget helping people for grain [11:30.360 --> 11:41.160] just waiting a link here and that's it thanks for your time for your attention and have a great [11:41.160 --> 11:53.960] for them thank you. [11:57.560 --> 12:04.760] Thank you Evgeny for this presentation now it's time for the Q&A in its life [12:05.560 --> 12:12.360] we had some questions I know that a lot of people were waiting for the new app and the [12:12.360 --> 12:20.520] presentation and the first question is from Paul can you describe the design process [12:20.520 --> 12:24.360] for the user interface user experience of the application please. [12:24.680 --> 12:28.600] Yeah first of all thanks for [12:31.480 --> 12:37.720] your words and I probably disabled my camera because it sometimes sounds weirdly so [12:39.080 --> 12:51.080] sorry for that so yes so let me put it this way your fiction is data and we are kind of like [12:54.760 --> 13:01.560] project with a like history already but for us we always kind of like limited [13:01.560 --> 13:09.960] regarding resources as many open source projects and funded projects by some [13:11.480 --> 13:20.600] grants and help by community so our goal is just to keep things like really simple and [13:21.400 --> 13:30.920] for the application we use really standard layout and ways to show the information as I [13:30.920 --> 13:40.760] mentioned for like file files excel type of table views and also we used just really high level [13:41.720 --> 13:53.400] JavaScript library material UI to just eliminate all the low level design decisions it just looks [13:53.400 --> 14:01.160] like I don't know Gmail or Google Drive or just based on material UI style of everything. [14:01.160 --> 14:08.920] I'm sorry you muted me. [14:11.400 --> 14:14.760] Thank you for your answer Eveline another question from [14:14.760 --> 14:21.480] you that we discussed before the conference she really likes and find really useful [14:22.280 --> 14:29.160] this new version and when for you will you use this or frictionless or [14:29.720 --> 14:32.360] might choose for example to use open refine? [14:35.880 --> 14:38.600] I think thanks for the question and [14:43.560 --> 14:48.520] I think you would use frictionless when your work is [14:48.520 --> 14:59.720] first of all tied to frictionless standards and I hope and at least I know partially that [14:59.720 --> 15:03.640] more and more people are starting using frictionless standards in their work that [15:03.640 --> 15:10.040] a package table schema and frictionless application is just kind of like the most natural way to [15:10.360 --> 15:21.880] work with these standards. Regarding open refine of course if you are looking for [15:22.760 --> 15:29.720] some already established and well supported and solution with a lot of plugins etc for now of [15:29.720 --> 15:36.040] course you will choose open refine but I'll just suggest to try frictionless application at some [15:36.040 --> 15:44.520] point and see if it's like more modern provides more features maybe unique features so [15:46.440 --> 15:50.600] it's not fair to ask me for this because yeah I'm just [15:55.320 --> 16:02.680] and another question really precise from Paul the table view is playing plan to scale [16:03.080 --> 16:04.120] how many rows? [16:08.680 --> 16:20.200] So regarding the visual interface it will be just a data frame to the database so [16:21.320 --> 16:30.520] the pagination and it can just work for whatever your local database created by [16:30.520 --> 16:39.000] frictionless application can be like scaled so when it's currently working on the [16:39.000 --> 16:45.000] web version of the application next step will be publishing it as a desktop application and since then [16:47.240 --> 16:55.800] the limit only be like local memory so it's it can work it's a frictionless data goal is [16:56.040 --> 17:06.200] kind of like so I'll say that the frictionless data is work the best for like small data sets and [17:06.200 --> 17:12.280] middle-sized data sets and for middle-sized data sets like million of rows say it will be [17:13.000 --> 17:20.600] good because it's just a just a data frame so it's not like for example excel when you just [17:20.600 --> 17:31.160] limit it to like a little bit of rows okay perfect another question from Oleg the publication to [17:31.160 --> 17:36.680] Zenodo github scan looks great any thoughts about plugging in other targets? [17:37.400 --> 17:45.400] Thanks for the question and for the kudos and [17:48.440 --> 17:52.680] yes yes as the application just [17:56.680 --> 18:02.760] creates a visual interface for the on top of frictionless framework for frictionless framework [18:02.840 --> 18:07.640] we have future request for other data targets and I lately just [18:09.240 --> 18:16.760] made some research to add others try it and you can etc etc so [18:18.440 --> 18:26.520] you'll be happy to add new targets and it's only question of resourcing so when we [18:27.480 --> 18:34.360] have an established version of the application and yeah so currently we kind of like having [18:34.360 --> 18:44.280] three of these scan github and Zenodo gives us a chance to check where a thing uses like a proof [18:44.280 --> 18:53.240] of concept and it's tested and it works it will be like really easily easy to add others [18:53.800 --> 19:02.440] okay that is the future perfect from Paul again can you describe the software [19:02.440 --> 19:06.680] stack for the frictionless application thanks Paul [19:10.040 --> 19:22.680] yes so again as mentioned that as any open source project we have limited resources [19:22.760 --> 19:29.480] the frictionless application is basically a wrapper around the frictionless framework [19:30.280 --> 19:40.840] so it's a so I'm not sure what's exactly the question because for such a deep software like [19:40.840 --> 19:47.800] from visual interface to low level stuff we use a lot so regarding big parts friction [19:47.880 --> 19:52.360] application is a wrapper around frictionless framework just a kind of like a client when [19:52.360 --> 19:57.480] frictionless framework is a server and we also will publish at some point frictionless API [19:58.440 --> 20:05.720] so people will be able to use in python the server we use in the application which may be [20:05.720 --> 20:12.680] useful for maybe creating something in house solutions for data validation and [20:13.560 --> 20:23.480] a site of frictionless parts it's react material UI and do you stand for [20:24.840 --> 20:31.640] state management but sorry I'm not sure if it's really want information or it's to low level [20:33.240 --> 20:40.520] I think that people can also watch your live stream after later and may [20:41.080 --> 20:45.160] be able to have this kind of a precise answer that's why it's perfect [20:46.040 --> 20:53.720] but I have another question it's about the table editor check and does it check [20:54.440 --> 20:59.080] all errors from data application validation including the foreign key constraints [21:01.080 --> 21:09.800] currently honestly it doesn't because it's just incomplete but yeah the goal is just to be [21:11.080 --> 21:17.000] like 100 person aligned with the standards and if it's in the stand if that package standard [21:17.800 --> 21:23.640] say that foreign keys should be validated it will be validated in frictionless application so [21:25.160 --> 21:29.720] the goal for the better release is differently so for this [21:29.880 --> 21:41.320] okay I'm checking if there is any other question no I have some question more about the development [21:41.320 --> 21:48.600] of frictionless and the story I have been working on this project for a long time can you maybe [21:50.440 --> 21:57.480] like tell us about the challenges that you face to the different step of the development [21:58.040 --> 22:06.680] and what is for you like this is especially about open source research after what is for [22:06.680 --> 22:14.920] you the advantage of open source but also the maybe the pro and cons of doing this open source [22:14.920 --> 22:26.280] development sorry I didn't hear the first part it was a yes it's about your experience you have [22:26.280 --> 22:31.160] been in this project for a while what are the challenges that you faced [22:34.440 --> 22:39.320] yeah so it's a good question for six minutes we have [22:42.440 --> 22:47.800] maybe like for the also next few hours so [22:47.800 --> 22:58.120] I'll try to say something maybe less usual than maybe something that will be more interesting than [22:58.120 --> 23:12.920] saying that like resources are limited and etc etc so for us maybe this might be interesting that [23:13.640 --> 23:16.600] from the beginning of the project [23:19.000 --> 23:28.520] also it's also probably kind of usual thing I think the initial idea idea great idea created by [23:28.520 --> 23:37.160] Rufus Polak and of these frictionless standards of course it wasn't you know it was too general [23:37.160 --> 23:46.760] because it was an idea and during the life of the project I would say that at least for the [23:46.760 --> 23:53.720] technical part because I was leading initially like the software only now I'm leading the project [23:53.720 --> 24:03.160] in general I see it like trying to pick good parts and remove bad parts so just trying to [24:03.240 --> 24:09.480] figure out what's really useful thing from frictions that we can provide and what's [24:09.480 --> 24:16.600] just this like you know 80 20 person thing so yeah I would maybe suggest [24:18.200 --> 24:27.080] on your open source project to start like to do like as early as possible this [24:27.640 --> 24:34.280] elimination not of things that it's not your like critical path to being useful [24:37.160 --> 24:43.960] because yeah I can say a lot about the usual thing about documentation contribution like [24:44.920 --> 24:52.040] full request no not like synchronized between like among like contributors etc but it's the [24:52.040 --> 24:59.320] same like for every open source project so and maybe I have um related to this uh [24:59.320 --> 25:05.080] topic a question about um for open source we are discussing a lot about sustainability of project [25:05.800 --> 25:15.560] that's why for you what are the main elements to help to yes to have sustainability for for your [25:15.640 --> 25:20.920] project and in the in the next steps maybe for that [25:23.880 --> 25:30.360] frictions data has been supported for a long time by the swan foundation [25:31.480 --> 25:42.200] what work really thank you and currently it's a it's a core open knowledge project [25:42.920 --> 25:53.640] supported by the open knowledge foundation and but still here it's an ongoing discussion of [25:53.640 --> 26:05.720] sustainability and for so this project like this is like not wide enough like for example [26:06.040 --> 26:15.320] a webpack which is now fully funded by I think open collective and just outside contributors [26:15.320 --> 26:19.320] so it's just just a few project like this can [26:22.200 --> 26:32.440] live like just using donations so in our domain I think project like this still needs to [26:32.600 --> 26:47.080] uh build um collaborations uh this some hopefully like ng also like high level data projects and [26:50.360 --> 27:00.120] start providing some um customizations and tailored versions of the software [27:01.080 --> 27:10.200] uh to kind of like to provide the source resources for core development just what I what I think [27:12.920 --> 27:16.840] perfect checking if there is other any other question [27:17.720 --> 27:24.040] for now we are at the end of the day of full day of conference in bris in brissel that's why I think [27:24.840 --> 27:32.920] there is less people connected right now but do you have like uh take last comment about [27:34.120 --> 27:42.360] like your this next step for your project that you you you explain in your in your presentation [27:42.920 --> 27:51.080] like or take home message last last words to to conclude our Q&A question Q&A live stream [27:51.800 --> 27:59.720] um yeah thank you so the next steps will be publishing uh better release as I said and [28:00.920 --> 28:06.440] we're planning to do so in a few months in March April and uh we're targeting [28:07.800 --> 28:16.920] ccv con in Argentina and it will be great if some of our uh listeners can join us there [28:16.920 --> 28:26.520] in Argentina and I hope you'll be doing more uh live uh version presentation [28:27.960 --> 28:33.560] with all the features already working so that that's that's a short uh short uh [28:35.480 --> 28:42.200] like one real issues at uh afghanians thank you so much for your time and your presentation [28:42.760 --> 28:44.280] yeah thanks a lot thank you [28:46.920 --> 28:48.380] you