[00:00.000 --> 00:16.120] Yes, so I'm Mattias Wadman. I'm the author of this fq tool. Now, I'm [00:16.120 --> 00:20.320] not going to be able to show all the slides, I think. I want to do a lot of demo, because [00:20.320 --> 00:28.360] I think it's a good tool to demo. Sorry, can you talk a bit louder? I can try. [00:28.360 --> 00:34.800] What is fq? It's a tool, a language and a set of decoders to work [00:34.800 --> 00:38.680] with data. It used to be more or less binary data, but now it's also text data, [00:38.680 --> 00:45.680] but I guess that's binary also, in a sense. So it's heavily inspired by jq, both in that [00:45.680 --> 00:53.160] it is jq, the language, but also the CLI tool: it's inspired by how [00:53.160 --> 00:58.520] the arguments work and everything. And it's a tool for querying and displaying [00:58.520 --> 01:05.480] data, exactly like jq. But it also has an interactive REPL, so you can work [01:05.480 --> 01:09.320] with it in a more interactive way, and it has auto-completion and a lot of other [01:09.320 --> 01:14.840] bells and whistles to make it very nice to work with. And it's available for a [01:14.840 --> 01:19.720] lot of operating systems. So I like to call it a debugger for files. [01:19.720 --> 01:29.720] That's how I use it. So why jq? It's a very CLI-friendly language. You don't [01:29.720 --> 01:35.600] need any newlines. The syntax is very, very terse. You [01:35.600 --> 01:40.240] can do a lot with little syntax. And it's very composable. You have this [01:40.240 --> 01:46.680] pipe, more or less like shell pipes. But it has these generators. You can [01:46.680 --> 01:51.200] even do loops and iterate and recurse using these pipes. You see [01:51.200 --> 01:56.240] these square brackets like that to iterate, and dot dot is to recurse [01:56.240 --> 02:03.040] through a tree of values, you can say.
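The iterate and recurse operators mentioned here can be modelled outside jq. What follows is only a rough Python sketch of the idea, not jq or fq code: the names `iterate` and `recurse` are made up for illustration, standing in for jq's `.[]` and `..`, and a pipe is modelled as feeding one generator into the next.

```python
from typing import Any, Iterator


def iterate(value: Any) -> Iterator[Any]:
    """Rough model of jq's `.[]`: emit each array element or each object value."""
    if isinstance(value, list):
        yield from value
    elif isinstance(value, dict):
        yield from value.values()
    else:
        raise TypeError(f"cannot iterate over {type(value).__name__}")


def recurse(value: Any) -> Iterator[Any]:
    """Rough model of jq's `..`: emit the value itself, then everything nested in it."""
    yield value
    if isinstance(value, (list, dict)):
        for child in iterate(value):
            yield from recurse(child)


doc = {"a": [1, {"b": 2}], "c": "x"}

# A "pipe" is one generator consuming another: walk the tree, keep only the numbers.
numbers = [v for v in recurse(doc) if isinstance(v, int) and not isinstance(v, bool)]
print(numbers)
```

The point of the sketch is that every step can output zero, one or many values, which is what makes loops and recursion possible with nothing but pipes.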
And it's also kind of like a DSL [02:03.040 --> 02:10.240] for selecting and transforming JSON. You can call it JSON, but it's [02:10.240 --> 02:14.640] not; it just happens to have JSON as input and output, kind of. Internally, you can [02:14.640 --> 02:20.280] call it jq values. So it has arrays and objects and numbers and strings [02:20.280 --> 02:28.320] and booleans. So it's kind of like a superset of JSON. And I have an example here [02:28.320 --> 02:32.880] where you build a new object that has a key a, that is an array, that is one, [02:32.880 --> 02:37.840] two plus three, and empty, and you see that it becomes the object with just [02:37.840 --> 02:44.280] a and the array one, five. And empty is just a function that does nothing. It outputs [02:44.280 --> 02:52.360] no value at all. And you can also select values from the input. So here we [02:52.360 --> 02:56.640] have the object with a and b, and then you create a new object that has this [02:56.640 --> 03:02.480] sum key. And that's the sum of a plus b, so then it's three. And it's a [03:02.480 --> 03:06.880] purely functional language based on generators and backtracking. But it has [03:06.880 --> 03:14.800] conditionals, it has functions, it has bindings, it has all the things you need. [03:14.800 --> 03:20.360] And the default is that you have one jq filter or program that you [03:20.360 --> 03:25.960] run individually on each input file. But then it has functions to tell [03:25.960 --> 03:30.280] it to not behave like that. So you can run one filter on a group of files, for [03:30.280 --> 03:39.640] example. And fq currently has support for 113 formats. And most of them, I [03:39.640 --> 03:44.240] guess half of them, are media related, because I work with media. So things like [03:44.240 --> 03:50.880] MP3, MP4, or FLAC.
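To make the two slide examples concrete, here is a small Python model of them. It assumes the first expression was along the lines of `{a: [1, 2+3, empty]}` and the second `{sum: (.a + .b)}` applied to an object with `a` and `b`; the helper names `value`, `comma` and `empty` are illustrative stand-ins for jq's literals, `,` operator and `empty`, not real jq internals.

```python
from typing import Any, Iterator


def value(v: Any) -> Iterator[Any]:
    """A literal outputs exactly one value."""
    yield v


def empty() -> Iterator[Any]:
    """Model of jq's `empty`: outputs no value at all."""
    return iter(())


def comma(*generators: Iterator[Any]) -> Iterator[Any]:
    """Model of jq's `,`: output everything from each expression in turn."""
    for gen in generators:
        yield from gen


# Collecting with [...] gathers every output into an array, so `empty` adds nothing.
first = {"a": list(comma(value(1), value(2 + 3), empty()))}

# Selecting values from the input and building a new object from them.
inp = {"a": 1, "b": 2}
second = {"sum": inp["a"] + inp["b"]}

print(first, second)
```

Under these assumptions the first result is an object whose array has two elements rather than three, because the third expression produced nothing to collect.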
And there is also support for demuxing some of these [03:50.880 --> 03:55.200] formats. Some of these containers have support for segmenting and [03:55.200 --> 04:00.080] things, so you can kind of recombine it and then decode the demuxed sample [04:00.080 --> 04:05.760] and things. But other people have added other things. So [04:05.760 --> 04:11.480] executable formats, archiving, networking. So you can do pcap nowadays, you can do [04:11.480 --> 04:18.080] TCP reassembly even, and even decode the TCP stream. So there's support for [04:18.080 --> 04:24.640] RTMP, for example. And there's also support for some serialization formats, like [04:24.640 --> 04:30.520] MessagePack and ASN.1 BER and CBOR and those. And there's also [04:30.520 --> 04:35.480] support for some text formats. Some of them you can even decode and encode. So [04:35.480 --> 04:42.000] you can decode it into a jq or JSON value, transform it with jq and then [04:42.000 --> 04:46.000] encode it back to some other text format. You can't do that with the binary [04:46.000 --> 04:50.800] formats. I will see if I get to explaining why that is not easy to [04:50.800 --> 04:57.360] do. So what does it mean when you decode? What does it mean that fq [04:57.360 --> 05:02.800] supports a format? It means that there is some code written [05:02.800 --> 05:09.880] in Go. So there's a kind of DSL for writing decoders. And that produces a [05:09.880 --> 05:14.920] structure that is JSON compatible. But each value in this structure also [05:14.920 --> 05:22.360] knows which bit range it comes from. And they can also optionally have [05:22.360 --> 05:26.400] symbolic mappings. Like, you can map a number to a string, or a string to a [05:26.400 --> 05:32.000] string, or a boolean. In a binary format you usually encode some [05:32.000 --> 05:39.120] number that means something.
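As a rough illustration of what "each value knows its bit range and can have a symbolic mapping" might look like, here is a small Python sketch. It is not fq's Go DSL: the `Field` and `BitReader` names, the one-byte header layout and the `KINDS` table are all invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Field:
    name: str
    value: int                    # the actual number found in the input
    start_bit: int                # first bit of the field
    stop_bit: int                 # one past the last bit of the field
    symbol: Optional[str] = None  # optional symbolic meaning of the number


class BitReader:
    def __init__(self, data: bytes) -> None:
        self.bits = int.from_bytes(data, "big")
        self.total = len(data) * 8
        self.pos = 0

    def field(self, name: str, nbits: int,
              symbols: Optional[Dict[int, str]] = None) -> Field:
        """Read nbits and remember where they came from."""
        start = self.pos
        shift = self.total - start - nbits
        if shift < 0:
            raise EOFError("not enough bits left")
        value = (self.bits >> shift) & ((1 << nbits) - 1)
        self.pos += nbits
        return Field(name, value, start, self.pos, (symbols or {}).get(value))


# Hypothetical 1-byte header: a 2-bit version followed by a 6-bit kind.
KINDS = {1: "audio", 2: "video"}
reader = BitReader(bytes([0b01_000010]))
version = reader.field("version", 2)
kind = reader.field("kind", 6, KINDS)
print(version, kind)
```

Keeping the raw number, its symbol and its bit range together is what lets a tool show both the meaning of a value and exactly which part of the file it came from.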
Like this is the upliver if you get that. And for [05:39.120 --> 05:44.720] media this usually means that you decode everything except the pixels or [05:44.720 --> 05:49.120] the audio, because then you can use ffmpeg or whatever [05:49.120 --> 05:53.040] you want. There are some formats that actually decode to the samples. FLAC, for [05:53.040 --> 05:57.560] example, we have support for actually decoding. So there is a full FLAC decoder in [05:57.560 --> 06:04.600] fq, but you can't listen to the sound. And some formats can use [06:04.600 --> 06:10.000] other formats. There's like a hierarchy of what they use. So [06:10.000 --> 06:14.760] for example pcap uses the Ethernet decoder, which uses the IP decoder. And then you [06:14.760 --> 06:18.080] can even end up with loops here, like zip files in a zip file. [06:18.080 --> 06:25.840] And yeah. And there's also support for formats to pass values. [06:25.840 --> 06:29.840] You can return values from a decoder into another decoder. You don't [06:29.840 --> 06:35.520] see this from the outside. But for example mp4 has some boxes that have [06:35.520 --> 06:39.640] information about how the samples should be decoded, like how long [06:39.640 --> 06:45.760] some fields are and things. So then the mp4 decoder can pass that [06:45.760 --> 06:53.880] information down to another sample decoder. So how do I use it? I use it because [06:53.880 --> 06:59.120] of this. Because I work with media, and more or less the whole mp4 file is [06:59.120 --> 07:05.880] metadata about how to play this. How to seek, how to sync, how to, yeah, [07:05.880 --> 07:13.280] everything. And I guess Derek had a good talk about why multimedia is [07:13.280 --> 07:20.040] basically endless pain. And I guess fq can't really fix the pain, but it can [07:20.040 --> 07:27.240] locate the pain, I guess.
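The earlier point about a container decoder passing information down to a sample decoder can be sketched like this. The functions `decode_config` and `decode_sample` and the one-byte config layout are invented for illustration and are not fq's API; the general pattern of a length-prefix size stored in the container and needed to split each sample does exist in real formats (as far as I know, the NAL length size in mp4's `avcC` box works this way).

```python
from typing import List


def decode_config(config: bytes) -> int:
    """Hypothetical config box: the low 2 bits of its first byte store length_size - 1."""
    return (config[0] & 0b11) + 1


def decode_sample(sample: bytes, length_size: int) -> List[bytes]:
    """Split a sample into length-prefixed units.

    length_size is not in the sample itself; the container decoder has to pass it in.
    """
    units = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length prefix")
        n = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + n > len(sample):
            raise ValueError("truncated unit")
        units.append(sample[pos:pos + n])
        pos += n
    return units


length_size = decode_config(bytes([0b01]))   # container-level information
sample = b"\x00\x02hi\x00\x03you"            # two units with 2-byte length prefixes
units = decode_sample(sample, length_size)
print(units)
```

Decoding the same sample with the wrong `length_size` would silently give different units, which is why the sample decoder cannot work on its own.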
So this is what I use it for. I debug, query, assist [07:27.240 --> 07:32.840] when I work with media files. And usually at work there's [07:32.840 --> 07:36.400] a media file that is broken somehow, it doesn't transcode or it's not in sync [07:36.400 --> 07:40.600] or something, and they ask what is wrong with this file. And we have to figure out [07:40.600 --> 07:45.200] what it is. Is it a decoder problem? A muxer problem? Is it the encoder? [07:45.200 --> 07:53.400] Is it corrupt? Is it whatever? So fq is very useful to quickly triage a lot of [07:53.400 --> 08:02.320] broken media files. And I can just, what is the time? Shortly, what can it [08:02.320 --> 08:07.520] not do? Or what is it not good at? It's not good for encoding, like to change things. [08:07.520 --> 08:13.800] You can change things, but it's more about slicing binaries and then [08:13.800 --> 08:17.960] concatenating them together and writing them out to a new file, kind of. So [08:17.960 --> 08:21.280] you can't just change a value in some JSON structure and just serialize [08:21.280 --> 08:28.160] it back, because, for example, I gave an example here with an mp4 [08:28.160 --> 08:34.280] muxer. What would it mean to add or remove a sample in an mp4 file? Then you [08:34.280 --> 08:39.400] would have to change all the boxes that describe how big the samples are, and it [08:39.400 --> 08:47.960] just cascades away into, yeah. And you can see that ffmpeg's mp4 [08:47.960 --> 08:54.680] implementation is 17,000 lines of C code, very dense C code. So use ffmpeg [08:54.680 --> 09:01.400] if you want to do those kinds of things. And I think you see and Peter will talk [09:01.400 --> 09:11.600] more about encoding, I guess, and why this is complicated.
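A tiny Python sketch of why a small change cascades. It uses the basic box framing (a 32-bit big-endian size that includes the 8-byte header, then a 4-byte type), but the box names `outr` and `innr` and the two-level nesting are made up and far simpler than a real mp4, which also has tables of sample sizes and offsets that would need updating.

```python
import struct
from typing import Tuple


def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 32-bit big-endian size (header included), 4-byte type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be 4 bytes")
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def build(sample: bytes) -> bytes:
    """Made-up nesting: an outer box containing an inner box containing the sample."""
    return box(b"outr", box(b"innr", sample))


def sizes(data: bytes) -> Tuple[int, int]:
    """Read back the size fields of the outer and inner box."""
    outer = struct.unpack(">I", data[0:4])[0]
    inner = struct.unpack(">I", data[8:12])[0]
    return outer, inner


before = sizes(build(b"\x00" * 10))
after = sizes(build(b"\x00" * 11))   # the sample grew by a single byte
print(before, after)
```

One extra byte in the innermost payload changes the size field of every enclosing box, so editing a decoded value and writing it back is never just a local change.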
And you can [09:11.600 --> 09:16.240] repair media files with fq, but you probably have to be more or less an [09:16.240 --> 09:23.760] expert in the format that you're fixing. What I usually do is, you can [09:23.760 --> 09:27.600] fix things by kind of testing decoder configurations, or [09:27.600 --> 09:32.320] you encode something and borrow from another media file and, yeah, [09:32.320 --> 09:37.960] somehow stitch it together to see what it is. I have some fq code [09:37.960 --> 09:44.240] to build ADTS headers and whatever, if you find an AAC frame [09:44.240 --> 09:48.280] somewhere and you don't even know what it is, because an AAC frame has to have metadata [09:48.280 --> 09:55.360] about how many channels it has and what profile it is and things. And it [09:55.360 --> 09:59.680] doesn't do any decoders at runtime at the moment. You can't write a decoder at runtime; [09:59.680 --> 10:03.600] you have to write it in Go now and then compile it. So we'll see, maybe in the [10:03.600 --> 10:07.080] future it's going to have Kaitai support. I have a prototype for Kaitai, but [10:07.080 --> 10:11.560] it is complicated. It also has an expression language, and I have to write [10:11.560 --> 10:18.580] the parser for that. Maybe I will talk to you about this and see. There are more [10:18.580 --> 10:29.120] slides here, but you can read them. I want to do a demo instead. Is it big [10:29.120 --> 10:38.080] enough? So fq is just a CLI tool. It works like this, and you can do this if you want to [10:38.080 --> 10:48.160] list all the formats that it supports.
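On the point about building ADTS headers for a stray AAC frame: the sketch below packs a 7-byte ADTS header (no CRC) in Python. It is not the speaker's fq code, the function name `adts_header` is made up, and the defaults (AAC LC, buffer fullness set to all ones) are assumptions chosen for illustration. It shows why the channel count, sample rate and profile have to be known or guessed: they live in the header, not in the raw frame.

```python
# Sampling frequency index table used by ADTS (index 4 is 44100 Hz).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
                16000, 12000, 11025, 8000, 7350]


def adts_header(payload_len: int, sample_rate: int, channels: int,
                object_type: int = 2) -> bytes:
    """Build a 7-byte ADTS header (no CRC) for one raw AAC frame.

    object_type 2 is AAC LC; the caller has to know or guess all of these.
    """
    frame_len = 7 + payload_len  # the length field covers header plus payload
    fields = [
        (0xFFF, 12),                           # syncword
        (0, 1),                                # ID: 0 = MPEG-4
        (0, 2),                                # layer, always 0
        (1, 1),                                # protection_absent: 1 = no CRC
        (object_type - 1, 2),                  # profile
        (SAMPLE_RATES.index(sample_rate), 4),  # sampling frequency index
        (0, 1),                                # private bit
        (channels, 3),                         # channel configuration
        (0, 1), (0, 1), (0, 1), (0, 1),        # original/copy, home, two copyright bits
        (frame_len, 13),                       # frame length in bytes
        (0x7FF, 11),                           # buffer fullness (all ones here, an assumption)
        (0, 2),                                # number of raw data blocks minus one
    ]
    bits = 0
    width = 0
    for value, nbits in fields:
        if value < 0 or value >> nbits:
            raise ValueError(f"value {value} does not fit in {nbits} bits")
        bits = (bits << nbits) | value
        width += nbits
    return bits.to_bytes(width // 8, "big")


header = adts_header(payload_len=100, sample_rate=44100, channels=2)
print(header.hex())
```

Prepending such a header to each raw frame and concatenating the results is the "add a header and stitch it together" kind of repair described above; whether a decoder accepts the result depends on the guesses being right.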
So if you run fq, and let's see, I have a pcap. [10:48.160 --> 10:54.000] So the first argument is the filter that you want to run, and dot in [10:54.000 --> 10:59.120] jq is just the identity function; it gives you what you put in, so you get [10:59.120 --> 11:03.120] the root, kind of. So here we see that it's a pcap. It has a header, it has [11:03.120 --> 11:12.360] packets and some TCP connections and things. So you can do -i. Actually, [11:12.360 --> 11:17.040] I want to do a crash course in jq. I don't know how many people [11:17.040 --> 11:25.080] know how jq works, so I can do a short version just to show the [11:25.080 --> 11:31.080] particularities of jq. So now I started fq with -i, which gives you [11:31.080 --> 11:36.000] just a null input. You get one input that is just null. So it's just a way [11:36.000 --> 11:41.600] that you can at least evaluate jq expressions, because it always needs to have an input [11:41.600 --> 11:48.600] somehow.
So now you can just write strings or whatever you want to do, [11:48.600 --> 11:57.080] and if you do dot there you just get null. And I guess the most special thing [11:57.080 --> 12:03.280] about jq is the comma operator, and that outputs values, so you can do [12:03.280 --> 12:10.280] 1, 2, 3 and it gives you 1, 2, 3. But then in jq there are some special forms [12:10.280 --> 12:15.800] like this collect, which looks very familiar to us as an array, [12:15.800 --> 12:22.320] that then collects those values into something. But then in jq you [12:22.320 --> 12:29.480] could just write expressions here or whatever you want. So let's say we can [12:29.480 --> 12:37.680] define a function, for example, and then you can just collect that function, or [12:37.680 --> 12:45.120] you can call it two times, or you can map those values. Yeah, so [12:45.120 --> 12:50.520] that was too fast, but you see how you can define functions, you can have bindings, [12:50.520 --> 12:54.960] so it's a full functional language. I like a lot how it [12:54.960 --> 13:08.960] works. So let's go back to that pcap file. So let's see, you can do, for example, if you [13:08.960 --> 13:14.640] want to look at the first packet, you can do this. You can pipe it to d, which [13:14.640 --> 13:21.280] shows more, recursively. When you do this without it, you [13:21.280 --> 13:27.040] actually run d also, but it only shows one level. So if you do d, [13:27.040 --> 13:31.480] it will show all of it. So here you can see, you can write a jq [13:31.480 --> 13:35.000] expression here, like show me the first and the last packet, and it will do [13:35.000 --> 13:43.080] this, and then you can say both of them. Yeah, so you can do things like that. So [13:43.080 --> 13:49.760] let's see, we can go into the TCP connections and we take the first one. We [13:49.760 --> 13:59.080] can do d, and we see that this seems to be an HTTP
request; someone has downloaded [13:59.080 --> 14:13.760] a file. So let's see, we can go to the server stream. Let's see. And [14:13.760 --> 14:17.120] there's one thing about jq, which is that jq doesn't have binary support, [14:17.120 --> 14:24.760] it only has strings. So to support binary, there are some [14:24.760 --> 14:31.320] special functions in fq to tell it that this string is actually binary, or [14:31.320 --> 14:35.720] I want it as binary if it's possible. So then you can say, like, this stream, for [14:35.720 --> 14:42.040] example, if I would just do type, it's just a string, but if I do tobytes you [14:42.040 --> 14:48.440] will actually get the raw bytes. And then I can say I want maybe the first 400, [14:48.440 --> 14:55.080] and dd is something that shows the whole thing, it doesn't truncate the output. So [14:55.080 --> 15:00.560] here we see that this is some kind of HTTP request. [15:00.560 --> 15:21.320] So let's say we want to get the bytes for this. Okay, plus. So here I think I have [15:21.320 --> 15:27.600] the body of the HTTP request. So in fq you can now do, for example, this: [15:27.600 --> 15:32.920] all the decoders in fq are jq functions, so you can do this now and it [15:32.920 --> 15:38.000] will just decode it as an mp4 file. So now you can start a sub-REPL, for [15:38.000 --> 15:45.840] example. Now you're inside the mp4 inside the pcap. And [15:45.840 --> 15:51.960] here you have the whole box tree for the mp4 file, and you can, for example, go [15:51.960 --> 15:56.960] into the samples. I think this is some kind of subtitle mp4 file that I [15:56.960 --> 16:07.120] found somewhere. And here you have the tracks; it has samples. So this is [16:07.120 --> 16:12.040] the raw bytes for that sample, and you see it's some kind of [16:12.040 --> 16:21.400] weird XML subtitle. And fq has support for XML, so you can do this and then you
[16:21.400 --> 16:30.520] get a JSON version of the XML. We can see, almost. Let's see, I [16:30.520 --> 16:35.560] can write something that takes out all the text. This is probably not how you write a [16:35.560 --> 16:44.200] TTML subtitle parser, but we can do a quick version of one. And there [16:44.200 --> 16:49.640] are some functions like grep_by that recursively look for some condition. So here, [16:49.640 --> 16:52.640] for example, you can look for [16:54.640 --> 16:57.640] did not work [16:58.840 --> 17:01.840] why [17:02.840 --> 17:05.840] aha [17:10.840 --> 17:17.280] So now it recursively found all the objects that had the text [17:17.280 --> 17:23.240] field and then just took that text field. So now you can, for example, take [17:23.240 --> 17:32.520] this expression, go out to the, you have the REPL here, [17:36.520 --> 17:42.440] and you can go out to the prompt again and then remove the interactive flag and then [17:42.440 --> 17:51.960] do this instead. Or we can do this if you only want text. And we can even say [17:51.960 --> 17:57.680] we want all samples. So here is all of it. That's the thing you can do [17:57.680 --> 18:00.920] with all these decoders that decode and blah blah, and then you can [18:00.920 --> 18:11.200] iteratively do all this. It's, yeah, it's nice. So let's see. And I also want [18:11.200 --> 18:16.240] to show that you don't have to write all this out. [18:16.240 --> 18:20.920] After a while your expression maybe starts to get very long, so you [18:20.920 --> 18:25.720] want to have more structure. I can show you, I have some helpers for mp4 files, [18:25.720 --> 18:31.040] for example. I spend a lot of time in mp4 files, because that's [18:31.040 --> 18:36.280] what is used everywhere nowadays. So here are some helpers, for example, that [18:36.280 --> 18:39.840] can be written in a more structured way, with indentation and
things, so [18:39.840 --> 18:50.880] you don't go crazy. And I can show here, for example, this is a bit [18:50.880 --> 18:58.920] long, but here is an expression that loads this mp4.jq, and [18:58.920 --> 19:03.680] it finds the video track and then it uses some jq code to produce [19:03.680 --> 19:10.000] gnuplot output, and it uses gnuplot, and then I have my weird tool for [19:10.000 --> 19:14.960] producing images in the terminal. So here's the bit rate for the video, [19:14.960 --> 19:20.240] in bits per second, in this video, and you see that these [19:20.240 --> 19:40.040] tops here are probably the I-frames in the mp4 file. Maybe we can take questions then. Yes? [19:40.040 --> 19:44.160] Yeah, you can use it as a hex editor also. Maybe it's not very convenient as a [19:44.160 --> 19:50.120] hex editor, but, for example, you can do -d bytes or bits, and then you get, [19:50.120 --> 20:03.360] like, it's just a dummy decoder, kind of. So you can do bits on the pcap. So [20:03.360 --> 20:10.800] now you'll just get it. So now you can do so, and then you can concatenate this. [20:10.800 --> 20:15.560] You can build these binary arrays; they [20:15.560 --> 20:20.240] are like iolists in Erlang, if you have used that. It's [20:20.240 --> 20:25.040] just an array with things that can become binary, and then you [20:25.040 --> 20:31.080] can kind of pipe this to tobytes and then you get bytes back. So you can kind of [20:31.080 --> 20:40.880] do this. Yeah, so you see that you can build this. That's how [20:40.880 --> 20:45.440] you can repair things with fq: you take the samples and then add on some [20:45.440 --> 20:49.240] header and then you concatenate the bytes, and you can even decode it with [20:49.240 --> 20:54.360] fq again. So, I mean, you could try to decode this as an mp3 [20:54.360 --> 21:11.320]
file, for example, but, yeah, it won't work. Yes? I would like to have some more [21:11.320 --> 21:15.640] runtime-y version. I can show, there is one of the slides that shows [21:15.640 --> 21:23.800] how the Go DSL looks. Yeah, here is [21:23.800 --> 21:32.880] kind of how the Go DSL works. There's like a d, [21:32.880 --> 21:36.880] which is kind of the context that keeps track of where you are, and [21:36.880 --> 21:40.640] then you create new structs and fields, and you have these mappers. [21:40.640 --> 21:46.880] This is a made-up binary format for live by for open [21:46.880 --> 21:52.800] source licenses that I was going to be a moment there. So that's how most of [21:52.800 --> 21:58.200] the code looks. You can write any Go code. I mean, I prefer to [21:58.200 --> 22:02.320] make them look very declarative, so don't use too much [22:02.320 --> 22:17.560] weird code, keep them simple. Any other questions? Yes? [22:17.560 --> 22:29.520] I would say that it's probably hard, of course, to write the code. I [22:29.520 --> 22:34.960] think it creates a lot of options: what does the user mean when [22:34.960 --> 22:42.000] they change something? Like, do they want, yeah, it's also like, do you want the [22:42.000 --> 22:47.120] checksums to be recalculated or not? What happens if you change [22:47.120 --> 22:51.040] something, if you have demuxed something and then you change the size, [22:51.040 --> 22:56.000] so now the segmenting becomes different, so then it cascades to change the whole [22:56.000 --> 23:02.480] file. So do you want that to happen? It's also that there are [23:02.480 --> 23:07.640] encodings where there are many ways to encode the same integer. For [23:07.640 --> 23:13.280] example, varints can be encoded in many, many ways, so
you would encode and [23:13.280 --> 23:20.280] normalize all that, or should it try to remember how [23:20.280 --> 23:45.520] the original thing was decoded? Yeah, so it's complicated. Any other questions?
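The varint remark can be made concrete with a small Python sketch. It assumes a LEB128-style unsigned varint (7 data bits per byte, high bit meaning "more bytes follow"); the `pad_to` parameter is an invented knob to produce a longer, non-minimal encoding of the same number.

```python
def encode_uleb128(value: int, pad_to: int = 0) -> bytes:
    """Encode an unsigned integer, optionally padded to at least pad_to bytes."""
    if value < 0:
        raise ValueError("only unsigned values are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value or len(out) + 1 < pad_to:
            out.append(byte | 0x80)   # continuation bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes) -> int:
    """Decode one unsigned varint from the start of data."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result
    raise ValueError("unterminated varint")


minimal = encode_uleb128(5)
padded = encode_uleb128(5, pad_to=3)
print(minimal.hex(), padded.hex(), decode_uleb128(minimal), decode_uleb128(padded))
```

Both byte strings decode to the same integer, so an encoder that only sees the decoded value cannot know which form the original file used unless the decoder remembered it.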