[00:00.000 --> 00:12.960] Hi everybody, welcome to my presentation about Kaitai Struct.
[00:12.960 --> 00:17.520] I am Petr Pučil and I have a question.
[00:17.520 --> 00:21.840] How many of you have any experience with Kaitai?
[00:21.840 --> 00:27.800] Okay, there are a few of you.
[00:27.800 --> 00:32.600] What is Kaitai Struct?
[00:32.600 --> 00:41.040] It's a tool for dealing with binary formats, especially parsing.
[00:41.040 --> 00:49.240] It is based on a declarative language, Kaitai Struct YAML, that can be used to specify arbitrary
[00:49.240 --> 00:51.480] binary formats.
[00:51.480 --> 01:03.680] It works as a parser generator and it currently supports 11 target programming languages.
[01:03.680 --> 01:12.440] Parsing means to convert the binary data you see above to structured data, an object
[01:12.440 --> 01:18.440] tree, so that you can work with it later.
[01:18.440 --> 01:25.240] Today I will also introduce a new functionality, which is serialization.
[01:25.240 --> 01:38.920] I've been working on it for the last six months and it currently works in Java.
[01:38.920 --> 01:45.320] Serialization means, I didn't mention that, basically the inverse process.
[01:45.320 --> 01:51.920] You want to create a binary file from an object tree.
[01:51.920 --> 01:56.800] Now something about the history.
[01:56.800 --> 02:07.240] So the author of Kaitai Struct is Mikhail Yakshin and the project started in 2014.
[02:07.240 --> 02:16.520] In 2016, Mikhail decided to release the project as open source and at that time the only supported
[02:16.520 --> 02:20.600] languages were Java and Ruby.
[02:20.600 --> 02:29.480] In 2017, Mikhail presented Kaitai Struct at FOSDEM and by then it already supported eight
[02:29.480 --> 02:36.720] languages and had over 400 stars on GitHub.
[02:36.720 --> 02:41.000] Mikhail also wanted to come today but unfortunately he couldn't.
[02:41.000 --> 02:46.800] But if there is some chat or something, I think he should be there so you can ask him
[02:46.800 --> 02:49.600] some questions or whatever.
[02:49.600 --> 02:51.200] And how is it today?
[02:51.200 --> 03:01.680] So we have 11 target languages and over 3,000 stars on GitHub and Kaitai is used in more
[03:01.680 --> 03:08.400] than 500 GitHub projects.
[03:08.400 --> 03:14.320] So let me share how I discovered Kaitai Struct.
[03:14.320 --> 03:23.240] This was in 2019 and I was playing electronic keyboard with a band and I wanted to create
[03:23.240 --> 03:32.440] a MIDI editor so that I could record the songs on the keyboard and edit them on the computer.
[03:32.440 --> 03:41.080] And I wanted the user to be able to upload a sound bank in the SoundFont 2 binary format
[03:41.080 --> 03:46.360] so that they could control how the song would sound.
[03:46.360 --> 03:54.120] And I wanted a web-based MIDI editor so I searched for a JavaScript parsing library
[03:54.120 --> 04:01.360] for the .sf2 format but I couldn't find one that would work for me.
[04:01.360 --> 04:09.600] So I started writing my own parser but it was really hard and a lot of debugging had
[04:09.600 --> 04:13.320] to be done and it was just not fun.
[04:13.320 --> 04:24.720] And when I finished I came across Kaitai Struct and I found that the two months of work I spent
[04:24.720 --> 04:29.160] on this could be done in just one day with Kaitai.
[04:29.160 --> 04:38.480] So Kaitai impressed me with its concept, simplicity and versatility and I started contributing
[04:38.480 --> 04:40.120] a lot.
[04:40.120 --> 04:47.720] And Kaitai also helped me in my personal development because until then I'd only programmed in
[04:47.720 --> 04:54.440] JavaScript, PHP and a little bit of Python and within a few months I was able to work
[04:54.440 --> 04:59.120] in 14 programming languages that were used in Kaitai.
[04:59.120 --> 05:10.520] And in 2020 I accepted an offer from Mikhail to become an administrator of the project.
[05:10.520 --> 05:17.760] So in my story I showed what options there are to get a parser.
[05:17.760 --> 05:24.800] So the most convenient way that you are probably familiar with is to use a dedicated format
[05:24.800 --> 05:28.520] library in the given language.
[05:28.520 --> 05:36.240] So it will probably have a user-friendly API and can be optimized for the format.
[05:36.240 --> 05:44.280] But sometimes it may be of poor quality and incomplete and it may be difficult to debug
[05:44.280 --> 05:46.560] and fix it.
[05:46.560 --> 05:55.120] And also for the most common formats like JPEG, ELF or ZIP, you can find even several
[05:55.120 --> 06:03.080] libraries and you can choose, but for less common formats, some obscure ones, there will
[06:03.080 --> 06:06.440] simply be no library in your language.
[06:06.440 --> 06:14.680] So we need to look into other options and another option is to simply write your own
[06:14.680 --> 06:15.680] parser.
[06:15.680 --> 06:26.600] But in my experience this is the worst option because it takes a lot of time and you need
[06:26.600 --> 06:34.840] to do a lot of debugging using some debug prints and dumps and it's just not fun.
[06:34.840 --> 06:40.640] But it's what most people do, often because they just don't know any better.
[06:40.640 --> 06:44.080] So that's why I'm here today.
[06:44.080 --> 06:52.480] And well, the problem is that if you have already written a parser for your format in
[06:52.480 --> 06:58.640] Python, for example, and then after some time you are asked to create a Java parser for the
[06:58.640 --> 07:06.760] same format, you basically need to start again.
[07:06.760 --> 07:13.920] So a bit better way is to use a parser combinator, which means that you are essentially still
[07:13.920 --> 07:21.040] writing your own parser, but you are using some building blocks from a library.
[07:21.040 --> 07:28.880] And a parser combinator typically allows you to declaratively define some substructures,
[07:28.880 --> 07:38.880] but still in the code, and, like in C, you can define structs for the fixed-size pieces of the format
[07:38.880 --> 07:46.000] and then you can directly interpret some block of bytes with that struct.
[07:46.000 --> 07:57.320] And there are many parser combinators, perhaps dozens in popular languages, but as with the
[07:57.320 --> 08:08.160] two previous options, you still have the disadvantage that the parser you get this way is still
[08:08.160 --> 08:11.240] bound to the particular language.
[08:11.240 --> 08:15.160] And it may be even bound to an application.
[08:15.160 --> 08:23.120] For example, if it was developed for a graphical editor, so it may be difficult to separate
[08:23.120 --> 08:30.120] just the parser from that application to use it somewhere else.
[08:30.120 --> 08:39.520] And the fourth option is to use a parser generator, which means that you are not writing the parsing
[08:39.520 --> 08:46.880] code directly in the programming language, but instead you describe
[08:46.880 --> 08:54.520] the format structure in a domain-specific language, and this description can then be
[08:54.520 --> 08:58.960] automatically translated into a parser.
[08:58.960 --> 09:08.400] So, Kaitai Struct falls into this category, and the Kaitai Struct language is designed
[09:08.400 --> 09:18.200] so that it's independent of both the application and the programming language.
[09:18.200 --> 09:22.600] Here I'll show you how to work with Kaitai.
[09:22.600 --> 09:26.280] The first stage is compilation.
[09:26.280 --> 09:36.080] So you take this Kaitai Struct specification of the format and in this case, this is a format
[09:36.080 --> 09:45.840] that has one byte because this u1 type means unsigned integer of one byte.
[09:45.840 --> 09:53.560] And you take this Kaitai Struct specification and you compile it using the Kaitai Struct compiler,
[09:53.560 --> 09:55.560] which is a command-line tool.
[09:55.560 --> 10:06.360] And as output, you get the source code of the parser, in this case in Python.
[10:06.360 --> 10:11.200] The main stage is parsing.
[10:11.200 --> 10:18.800] You take the parser generated in the first step.
[10:18.800 --> 10:28.320] You give the input binary file to the parser as input and
[10:28.320 --> 10:35.040] you get parsed data as output, so an object tree.
[10:35.040 --> 10:43.480] And in the case of Kaitai Struct, the generated parser works with the runtime library, so you
[10:43.480 --> 10:50.600] need to include it in your application as well.
[10:50.600 --> 10:51.680] Why use Kaitai?
[10:51.680 --> 10:56.360] What are the advantages?
[10:56.360 --> 11:06.680] So as I already mentioned, the advantage is that you write the KSY specification once
[11:06.680 --> 11:09.320] and you can use it everywhere.
[11:09.320 --> 11:16.320] It standardizes the way we describe binary formats and there are already many formats
[11:16.320 --> 11:25.640] described in the Kaitai format gallery and any described format can be visualized automatically
[11:25.640 --> 11:33.040] in a GraphViz diagram, and the Kaitai Struct language is simple, as you will see.
[11:33.040 --> 11:41.360] There are also several visualization and dumping tools available in Kaitai Struct.
[11:41.360 --> 11:53.360] So the write once, use everywhere feature means that you get parsers in 11 programming languages
[11:53.360 --> 11:57.920] for free from a single KSY specification.
[11:57.920 --> 12:07.440] So in this case, I've had the compiler generate Java, Python and Ruby parsers from a simple
[12:07.440 --> 12:14.120] KSY specification you see on the left.
[12:14.120 --> 12:20.760] When you look for specifications of binary formats, you will find that each one looks
[12:20.760 --> 12:32.640] different and there is no single standard for how to document formats, and Kaitai is used
[12:32.640 --> 12:41.200] or intended primarily for creating parsers, but some people write a KSY specification just
[12:41.200 --> 12:48.440] to document a format in an easy-to-understand way because you don't even have to be a programmer
[12:48.440 --> 13:00.320] to understand a KSY specification and it's often easier than reading these long PDF documents.
[13:00.320 --> 13:08.080] And the Kaitai project includes an extensive gallery of described formats.
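To make the compilation stage above concrete, here is a minimal sketch of what a one-byte format specification like the one described could look like; the file name hello.ksy and the ids are illustrative assumptions, not the actual slide contents.

```yaml
# hello.ksy - a hypothetical format consisting of a single one-byte value
meta:
  id: hello
seq:
  - id: some_byte
    type: u1     # u1 = unsigned integer, one byte
```

Generating a Python parser from it would then look something like `kaitai-struct-compiler -t python hello.ksy` (the exact command name can differ depending on how the compiler is installed), which produces a hello.py file containing the generated parser class.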
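And as a hedged illustration of the parsing stage and of the point that the generated parser works together with the runtime library: in Java, the generated class (here assumed to be called Hello, from the hypothetical spec above) could be used roughly like this; the class, file and getter names are assumptions for illustration.

```java
import io.kaitai.struct.ByteBufferKaitaiStream;

public class ParseExample {
    public static void main(String[] args) throws Exception {
        // The runtime library provides the KaitaiStream implementations;
        // here one is opened over the input binary file.
        ByteBufferKaitaiStream io = new ByteBufferKaitaiStream("input.bin");

        // Hello is the class generated by the compiler from the hypothetical
        // hello.ksy spec above; parsing happens while the object is constructed.
        Hello data = new Hello(io);

        // The parsed attributes are then available as getters on the object tree.
        System.out.println(data.someByte());
    }
}
```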
[13:08.080 --> 13:28.480] At the moment, there are 181 formats described by 76 contributors and there are also several
[13:28.480 --> 13:36.720] hundred more format specifications in various Kaitai projects.
[13:36.720 --> 13:46.400] And so the Kaitai format gallery contains formats of various kinds, for example, as you see,
[13:46.400 --> 13:55.160] archive files, executables, file systems, game data files, multimedia files
[13:55.160 --> 14:04.560] and network protocols, you can go to this page and I took it from there.
[14:04.560 --> 14:10.680] And this suggests the wide applicability of Kaitai.
[14:10.680 --> 14:19.320] And it offers an idea to create an international database of formats where various obscure
[14:19.320 --> 14:26.760] and historical formats would be documented in a uniform way for future preservation.
[14:26.760 --> 14:36.320] And this would guarantee that we could basically read the binary files we write now
[14:36.320 --> 14:43.800] in like 100 or 200 years from now.
[14:43.800 --> 14:49.960] The fact that the Kaitai Struct language is declarative makes it possible to automatically
[14:49.960 --> 15:01.600] visualize the described format in a GraphViz diagram.
[15:01.600 --> 15:05.200] The Kaitai Struct language is simple but powerful.
[15:05.200 --> 15:10.520] You can describe pretty much any binary format with it.
[15:10.520 --> 15:18.840] And a KSY specification starts with the meta section and this sets the little-endian
[15:18.840 --> 15:22.760] byte order as the default.
[15:22.760 --> 15:28.680] The seq section is a sequence of attributes.
[15:28.680 --> 15:34.160] The attribute name is in the id key.
[15:34.160 --> 15:45.000] The type u4 means that in this case num_files will be an unsigned four-byte integer.
[15:45.000 --> 15:50.280] You can define your own types in the types section.
[15:50.280 --> 15:53.000] A field can also be repeated.
[15:53.000 --> 16:04.000] So in this case the files attribute will be a list or an array of the type file.
[16:04.000 --> 16:13.320] In the instances section you can define attributes that start at an arbitrary byte position.
[16:13.320 --> 16:20.200] You can also use a powerful expression language in many places.
[16:20.200 --> 16:27.800] And another built-in type is str, a character string in a certain encoding.
[16:27.800 --> 16:38.040] And if you omit the type and only specify the size, the result is a byte array.
[16:38.040 --> 16:44.680] There are several visualization and dumping tools available for inspecting files.
[16:44.680 --> 16:53.560] And this can be useful, for example, for finding errors, forensic analysis, or debugging.
[16:53.560 --> 17:04.880] And the visualizers allow us to view the structured data parsed from the input file based on a
[17:04.880 --> 17:09.800] Kaitai Struct specification, so something like this.
[17:09.800 --> 17:19.080] And you can use the console visualizer, or also the command-line tool ksdump is available,
[17:19.080 --> 17:24.720] which can give you the same structured data as you can see in JSON format.
[17:24.720 --> 17:30.640] And this can be useful for automation.
[17:30.640 --> 17:35.600] But the most popular visualization tool is the Web IDE.
[17:35.600 --> 17:39.880] You can check it out at this URL.
[17:39.880 --> 17:48.160] And at the top right is a hex dump of the input binary file.
[17:48.160 --> 17:57.280] So in this case I selected this .png file in the file tree on the left.
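Returning to the KSY elements walked through a moment ago (the meta section with the default endianness, the seq section, user-defined types, repetition, instances, strings and raw byte arrays), they could be combined in a small, hypothetical archive-like spec such as the following; all ids here are made up for illustration and do not describe a real format.

```yaml
meta:
  id: example_archive
  endian: le               # little-endian byte order as the default
seq:
  - id: magic
    size: 4                # no type, only size: parsed as a raw byte array
  - id: num_files
    type: u4               # unsigned four-byte integer
  - id: files
    type: file             # user-defined type, see `types` below
    repeat: expr
    repeat-expr: num_files # `files` becomes an array of `file` objects
types:
  file:
    seq:
      - id: name
        type: str
        size: 16
        encoding: UTF-8    # character string in a given encoding
      - id: ofs_body
        type: u4
      - id: len_body
        type: u4
    instances:
      body:                # instance: starts at an arbitrary byte position
        pos: ofs_body
        size: len_body
```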
[17:57.280 --> 18:06.480] And at the top left is the Kaitai Struct specification editor, so a KSY spec editor.
[18:06.480 --> 18:13.800] And according to the Kaitai Struct specification, the input file is parsed and the result is
[18:13.800 --> 18:20.000] the structured data that you see in the object tree at the bottom left.
[18:20.000 --> 18:26.120] And when you edit the Kaitai Struct specification, the input file is automatically parsed again
[18:26.120 --> 18:33.520] and the object tree is updated.
[18:33.520 --> 18:42.200] Serialization is a new feature in Kaitai Struct and it's being developed thanks to the financial
[18:42.200 --> 18:47.800] support of the NLnet Foundation.
[18:47.800 --> 18:56.080] While parsing allows you to read binary data into an object, serialization is all about
[18:56.080 --> 18:58.080] the inverse process.
[18:58.080 --> 19:04.240] So we want to write an object to binary data.
[19:04.240 --> 19:16.800] And currently in Kaitai Struct, the serialization support for Java is fully working and
[19:16.800 --> 19:23.160] C# and Python are in development.
[19:23.160 --> 19:27.240] There are basically two use cases of serialization.
[19:27.240 --> 19:34.960] You can edit an existing file or you can create a new file from scratch.
[19:34.960 --> 19:43.240] And the support for serialization greatly extends the use of already written format specifications
[19:43.240 --> 19:49.160] because now you can use them not only for parsing but also for serialization.
[19:49.160 --> 19:57.480] And this has many uses, for example, you can convert one format into another or it can
[19:57.480 --> 20:06.780] be used for fuzzing or video game modding and so on.
[20:06.780 --> 20:12.960] The serialization process in Kaitai Struct can be divided into four phases.
[20:12.960 --> 20:20.720] First you need to create a KS object and then you fill it with data.
[20:20.720 --> 20:27.880] So you set its individual fields or attributes using setters.
[20:27.880 --> 20:33.560] Then you should call the _check method to check the consistency of the data
[20:33.560 --> 20:37.400] with the format constraints.
[20:37.400 --> 20:50.720] Finally, we can call _write and pass the stream to write to.
[20:50.720 --> 20:58.920] And you can actually check out more details of how to use serialization in Java on this
[20:58.920 --> 21:01.280] page.
[21:01.280 --> 21:12.320] Currently, the serialization support in Kaitai Struct is designed for the general case so
[21:12.320 --> 21:16.760] that it works for every conceivable format specification.
[21:16.760 --> 21:25.240] While a simple solution would work for perhaps most specifications, well, the solution that
[21:25.240 --> 21:29.600] works for all of them was chosen.
[21:29.600 --> 21:33.720] Even at the cost of delegating some tasks to the user.
[21:33.720 --> 21:40.960] In the future, I would like to automate these tasks that need to be done manually at the
[21:40.960 --> 21:47.560] moment so that it's more convenient for the user.
[21:47.560 --> 21:56.840] The basic idea is that the user sets everything, including lengths, offsets, magic signatures,
[21:56.840 --> 22:00.960] and Kaitai Struct checks for consistency.
[22:00.960 --> 22:06.000] Also, only fixed-length streams are considered.
[22:06.000 --> 22:11.800] So once you create a stream, you cannot resize it.
[22:11.800 --> 22:20.640] Finally, I would like to talk about the plans for the future.
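Before the plans for the future, here is a minimal sketch of the four serialization phases just described, in Java, since that is the target where serialization is fully working. The class Hello and its setter are assumed to be generated in serialization mode from the hypothetical hello.ksy spec shown earlier; _check and _write follow the description above, while the exact constructor shape and the way of obtaining the resulting bytes are assumptions that may differ from the official documentation.

```java
import io.kaitai.struct.ByteBufferKaitaiStream;

public class SerializeExample {
    public static void main(String[] args) throws Exception {
        // Phase 1: create the KS object.
        Hello data = new Hello();

        // Phase 2: fill it with data using the generated setters.
        data.setSomeByte(42);

        // Phase 3: check the consistency of the data with the format constraints.
        data._check();

        // Phase 4: write the object to a stream. Only fixed-length streams are
        // supported, so the size has to be chosen up front and cannot change.
        ByteBufferKaitaiStream io = new ByteBufferKaitaiStream(1);
        data._write(io);

        // Assumption: the writable stream exposes its contents as a byte array;
        // the exact accessor may differ between runtime versions.
        byte[] result = io.toByteArray();
        System.out.println(result.length + " byte(s) written");
    }
}
```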
[22:20.640 --> 22:30.800] Serialization for C# and Python is in development and they should be ready in two months.
[22:30.800 --> 22:37.640] There is also interest in adding Rust, C and Julia as target languages.
[22:37.640 --> 22:45.240] And I would also like to see Wireshark dissectors as a target because the concept of Kaitai
[22:45.240 --> 22:48.440] is not limited to programming languages.
[22:48.440 --> 22:56.200] A target can be anything, for example, we already have a target for Construct, which
[22:56.200 --> 23:04.640] is a Python library for parsing and serialization of binary data.
[23:04.640 --> 23:05.640] Thanks for listening.
[23:05.640 --> 23:33.200] Now it's time for your questions.
[23:33.200 --> 23:47.840] Yes, there is a doc key, which you can use on attributes and types in many places,
[23:47.840 --> 24:00.640] and you can write some documentation of the specific element, and in some languages, well,
[24:00.640 --> 24:14.880] it doesn't work like 100% of the time, but the idea is that this documentation should
[24:14.880 --> 24:27.000] translate to the generated parser as doc blocks and then the IDEs and tools for development
[24:27.000 --> 24:32.280] should usually show this documentation in autocompletion.
[24:32.280 --> 24:43.280] Do you support endianness when generating source code, depending on the target machine?
[24:43.280 --> 24:56.080] Yes, there is a feature, calculated endianness it is called, and you can switch the endianness or
[24:56.080 --> 25:05.720] the default endianness based on the value of an arbitrary expression basically, so this can
[25:05.720 --> 25:06.720] ...
[25:06.720 --> 25:12.200] But do you support host endianness and target endianness?
[25:12.200 --> 25:23.560] Well, not really, but it's not that much of a limitation because you can, for example, use
[25:23.560 --> 25:31.400] parameters, for example, to pass it from your application basically because I don't
[25:31.400 --> 25:40.360] know if I can... I don't know if it's a good idea, but another feature of Kaitai Struct
[25:40.360 --> 25:48.880] is that you can define that types can have parameters and even the top-level type... Yeah,
[25:48.880 --> 25:57.840] I should probably at least... Never mind, yeah, and you can define parameters and you
[25:57.840 --> 26:09.000] can easily pass a parameter from your application that will somehow change the behavior of the
[26:09.000 --> 26:13.200] specification, yeah, so it's possible.
[26:13.200 --> 26:24.400] With KSY, you seem to aim to define specifications for certain languages or formats, but for
[26:24.400 --> 26:30.400] languages and formats that already have a specification, how can you ensure that these two specs are
[26:30.400 --> 26:38.400] actually the same and that you're not parsing it differently than other parsers do?
[26:38.400 --> 26:44.800] I don't... Well, you mean that there is already an implementation of some...
[26:44.800 --> 26:53.800] For example, someone's parsing ZIP files out there, how do you guarantee that Kaitai Struct
[26:53.800 --> 26:59.400] will parse ZIP files the same way?
[26:59.400 --> 27:13.000] You don't, basically, but from this point of view, it's just another implementation...
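As a hypothetical KSY fragment illustrating the three features touched on in the answers above, namely the doc key, calculated (switchable) endianness and type parameters, consider the following sketch; the ids and case values are made up for illustration.

```yaml
meta:
  id: params_and_endian_demo
  endian:
    # Calculated endianness: the default byte order is selected by an
    # expression, here based on a byte-order mark read at the start of the file.
    switch-on: bom
    cases:
      0x4949: le
      0x4d4d: be
params:
  # Parameters can be passed in from the application (or from another type)
  # and used in expressions inside the spec.
  - id: header_len
    type: u4
seq:
  - id: bom
    type: u2be
    doc: Byte order mark that decides the endianness of the rest of the file.
  - id: header
    size: header_len
    doc: Raw header bytes; the length is supplied by the caller as a parameter.
```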
Well, [27:13.000 --> 27:20.320] if you compare it to other parsers, for example, so there is, for example, a ZIP parser in
[27:20.320 --> 27:27.240] every language, yeah, so a ZIP parser library, and this Kaitai Struct specification, it's just
[27:27.240 --> 27:34.800] another implementation, so, well, it needs to be developed carefully so that it works
[27:34.800 --> 27:35.800] well or... Yeah.
[27:35.800 --> 27:44.800] I guess you would need a way to translate from a written specification to the Kaitai Struct one
[27:44.800 --> 27:51.280] or the other way around to validate that what you wrote as the spec actually corresponds
[27:51.280 --> 27:57.040] to the actual specification, for example, if a specification already exists in a machine-
[27:57.040 --> 28:04.200] written form, which is readable, I mean, then we should have a tool to convert from
[28:04.200 --> 28:08.360] one to the other, so that would ensure that the parsing is correct.
[28:08.360 --> 28:14.400] But it doesn't help, because the implementation is done by humans, it's impossible, it's
[28:14.400 --> 28:15.400] impossible.
[28:15.400 --> 28:16.400] It's just an introduction.
[28:16.400 --> 28:17.400] You have to run all those things.
[28:17.400 --> 28:18.400] Why?
[28:18.400 --> 28:45.920] I'm wondering if it would be possible to add some functionality to that, not only parsing
[28:45.920 --> 28:52.480] but some very common functionality, do you think you can add that in the KSY form?
[28:52.480 --> 28:57.760] Common functionalities, so... Like, for example, there's a binary format and there's
[28:57.760 --> 29:04.320] very common functionality everybody uses on that, let's say, like, I don't know, cutting
[29:04.320 --> 29:11.960] a part of it or calculating some, I don't know, value, magic value or hash value,
[29:11.960 --> 29:19.040] could you add some extra functionality other than parsing in there?
[29:19.040 --> 29:27.280] Well, so the question was whether we can add some common functionality in
[29:27.280 --> 29:37.120] addition to the format specification, and the answer is that, well, you can do this
[29:37.120 --> 29:46.000] to a certain extent, because there are, I didn't mention them or talk about them, but
[29:46.000 --> 29:57.960] there are value instances, and this is like a calculated attribute,
[29:57.960 --> 30:03.520] so you can write an arbitrary expression in it, and this can calculate, for example,
[30:03.520 --> 30:14.520] some value. Like, I wrote a BMP specification, or I extended it, and I used this, for example,
[30:14.520 --> 30:23.000] to, well, in the BMP format, there are like color masks in different places, depending
[30:23.000 --> 30:33.480] on the header version, and I used a value instance to get it, so, depending on
[30:33.480 --> 30:44.000] the version, so either get it from here or here, or if it's a fixed, I don't
[30:44.000 --> 30:50.720] know if it's a fixed color palette, or what is it called, so, yeah, we can do this to
[30:50.720 --> 31:00.800] a certain extent, but some common functionality, like, I don't know, well, if it would
[31:00.800 --> 31:09.320] require, like, a programming language or something like that, so this would be infeasible, basically,
[31:09.320 --> 31:22.200] because then we would have to have some programming language,
[31:22.200 --> 31:31.240] some language that translates to all targets, which is basically impossible, I
[31:31.240 --> 31:32.240] think, yeah.
[31:32.240 --> 31:52.760] There is some different type of tooling, like, let's say, you know, testing. Could you use
[31:52.760 --> 31:57.760] this tool set to write comprehensive diff tools
[31:58.320 --> 32:01.080] that explain the differences between two binaries
[32:01.080 --> 32:06.080] and that can leverage the existing descriptions
[32:07.880 --> 32:11.880] to explain the differences it came to find?
[32:13.240 --> 32:18.240] Yes, so I think you can compute some diff.
[32:18.240 --> 32:23.240] Basically, how I would do it... I showed the ksdump tool here.
[32:24.920 --> 32:29.920] So I think you could generate the JSON dumps of the two files
[32:31.840 --> 32:36.840] and compare them, but when I did this,
[32:37.240 --> 32:40.240] it was usually very, very massive,
[32:40.240 --> 32:49.240] but you can probably improve that somehow, I don't know.
[32:49.240 --> 33:02.240] But it's, yeah, okay, so thanks and...
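As a sketch of the value instances mentioned in the answer above (not the actual BMP spec, just an illustration of the idea): a value instance is a calculated attribute defined by an arbitrary expression, and it can, for example, pick the right field depending on a version number.

```yaml
meta:
  id: value_instance_demo
  endian: le
seq:
  - id: version
    type: u2
  - id: mask_v1
    type: u4
  - id: mask_v2
    type: u4
instances:
  color_mask:
    # Calculated attribute: chooses between two fields based on the version,
    # loosely similar to the BMP color mask example from the answer.
    value: 'version >= 2 ? mask_v2 : mask_v1'
```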
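And a rough sketch of the diff idea from the last answer, assuming the visualizer package that provides ksdump is installed; the output-format flag and argument order shown here are assumptions and may differ between versions.

```sh
# Dump both binaries into a structured textual form using the same spec,
# then compare the dumps with an ordinary diff tool.
ksdump -f json old_file.bin my_format.ksy > old.json
ksdump -f json new_file.bin my_format.ksy > new.json
diff -u old.json new.json
```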