[00:00.000 --> 00:17.000] Okay. Hello, everybody. So, my name is Doji. I work in the tools group at Red Hat. And [00:17.000 --> 00:24.800] so we're here today to, okay, first of all, thank you for staying. So, yeah, I wanted [00:24.800 --> 00:34.720] to talk about application binary interface analysis today. And, okay, first of all, who [00:34.720 --> 00:44.920] doesn't know about Lubabigel and ABI stuff? So, okay, so I think we'll have something [00:44.920 --> 00:52.480] for you guys. So, what are we going to talk about? So, first of all, I'll introduce what [00:52.480 --> 01:02.480] ABIgel is, and we'll look at how it works, what the project brought recently, and what [01:02.480 --> 01:12.760] we're looking for as far as the future goes. So, ABIgel is about doing analysis of application [01:12.760 --> 01:21.480] binary interfaces. So, it's a set of tools that can do things like compare the ABI of [01:21.480 --> 01:34.160] two binaries, or store the ABI of a binary onto a, you know, disk format. It can do comparison [01:34.160 --> 01:43.400] of binaries, you know, that are in packages, like Debian packages, RPMs, star files, etc. [01:43.400 --> 01:53.600] And it is also a shared library that you can use to write more tools if you want. So, that's [01:53.600 --> 01:59.520] all well and nice as far as marketing goes, but then let's look at, you know, what we [01:59.520 --> 02:06.240] mean by ABI. So, suppose you have a simple program, well, a simple binary which has, [02:06.240 --> 02:13.960] okay, simple, no, very complicated, let's say, which has, you know, three functions [02:13.960 --> 02:21.280] that are here. The types of the functions are defined here in a simple hierarchy. Here [02:21.280 --> 02:27.760] you have the first type that inherits, you know, S0 which inherits base type, and let's [02:27.760 --> 02:39.840] say another type here that inherits S0. Okay. So, that's the first version of it. Let's [02:39.840 --> 02:48.000] see if, I don't know if it compiles. Yes, it does. Then I have a second version of it [02:48.000 --> 02:54.400] which looks quite the same, but, okay, what does it do? Okay, what's the difference between [02:54.400 --> 03:01.760] the two? Very simple. I just inserted a, you know, data memba in the base class, and we [03:01.760 --> 03:08.000] want to know what the impact of this is on the ABI, you know, as far as the binary goes. [03:08.000 --> 03:15.400] So, where am I here? I'm in the source code of the project, and so I've built a version [03:15.400 --> 03:22.560] of it. And so, here we have one of the tools which name is ABI diff, which does what you [03:22.560 --> 03:33.680] think it does. And so, if I run it, what does it say? Basically, there are two changes [03:33.680 --> 03:39.800] as far as ABI goes in that battery due to the change of sin. So, the first change is [03:39.800 --> 03:46.280] about, you know, the first function which is here. And so, it is telling us basically [03:46.280 --> 03:54.160] that that function has a parameter type that changed. And the change is about this, you [03:54.160 --> 04:02.200] know, structure, remember? Something is interesting. The size hasn't changed, even though I've [04:02.200 --> 04:08.760] added, you know, a data member in there. So, yeah, you know the drill, right? If you don't, [04:08.760 --> 04:15.600] I can, you know, explain it more. But size hasn't changed. The base class has changed, [04:15.600 --> 04:23.120] and the change here is a data member insertion at a certain offset, blah, blah, blah. So, [04:23.120 --> 04:32.240] this is the impact of the change of the first type on the first interface, right? And so, [04:32.240 --> 04:42.800] there is another interface that got impacted, right? And the parameter of that function, [04:42.800 --> 04:50.000] which was struct s1, changed as well. The base class changed. The base class was struct [04:50.000 --> 04:57.880] s0, right? And the details of s0 change were reported earlier. So, we don't have to, you [04:57.880 --> 05:06.600] know, repeat it again, right? So, here you see that we compute the changes, right? And [05:06.600 --> 05:12.760] we also analyze those changes so that we can detect if things have been, you know, reported [05:12.760 --> 05:22.640] earlier or not. And also, we mess up with more stuff, because here we say, for instance, [05:22.640 --> 05:29.960] that there were two changes, for instance. But one got filtered out. What does that mean? [05:29.960 --> 05:49.960] So, let's see, for instance, if I recall the... Okay, I'll add a special... So, I've asked [05:49.960 --> 05:55.560] ABIDF to show me what, you know, to show me redundant changes, because by default it removes [05:55.560 --> 06:00.600] redundant changes. And we see that we have the third function that was impacted as well [06:00.600 --> 06:09.600] by, you know, the change we created. And so, well, all the changes that, you know, impact [06:09.600 --> 06:18.040] function three were already reported. So, this is why it was suppressed. That change [06:18.040 --> 06:23.200] was suppressed by default, because it was redundant. So, it's not just... We're not just [06:23.200 --> 06:29.600] diffing things. We're analyzing the diffs, and we're trying to, you know, massage those [06:29.600 --> 06:40.320] diffs so that they can be consumed by human beings. So, this is what we mean by analyzing [06:40.320 --> 06:51.320] ABI's, basically. So, how it works. The library used to implement the tools has a front-end, [06:51.320 --> 06:58.000] which is kind of backward. The front-end reads the binary. Usually, it is back-ends that [06:58.000 --> 07:04.320] writes binaries, but, okay, here, backward. So, we read the binary, which has to be in [07:04.320 --> 07:10.280] the ELF format right now, and we build an internal representation of it. We look at [07:10.280 --> 07:18.840] the publicly defined and exported symbols of, you know, declarations, basically functions [07:18.840 --> 07:24.960] and variables. We build a representation of them and their types. And then, we construct [07:24.960 --> 07:31.640] the graph of the types like that and their subtypes, and we pull all that together, and [07:31.640 --> 07:38.760] we call that an ABI corpus. A corpus is an artifact for us that represents the ABI of [07:38.760 --> 07:46.360] the binary we were looking at. And so, there is a middle-end that acts on that internal [07:46.360 --> 07:53.120] representation. Said otherwise, it acts on ABI corpora. Corpora being the plural of corpus [07:53.120 --> 08:01.120] in Latin, right? Let's be pedantic. So, we can, as you've seen, compare two instances [08:01.120 --> 08:09.280] of ABI corpus. Then, we build an internal representation of the result of the comparison. [08:09.280 --> 08:17.200] We call that an diff IR. So, it's a different IR. And then, we perform transformations on [08:17.200 --> 08:24.280] that diff IR, like categorization. So, we would walk the graph and say, okay, this [08:24.280 --> 08:32.400] diff node, we've seen it before. So, we'll mark it as being redundant to this other one. [08:32.400 --> 08:39.080] And then, they can be, you know, transformations that are suppression as well. Well, suppression. [08:39.080 --> 08:46.840] We will mark the nodes as being suppressed. For instance, because the user wrote something [08:46.840 --> 08:54.760] that we call a suppression specification file requiring that some types changes might not [08:54.760 --> 09:06.760] be reported. So, once we have that well-massaged diff IR, we have backends that walk that diff [09:06.760 --> 09:15.920] IR, obviously, or the initial IR and do useful stuff, like writing, you know, emitting reports, [09:15.920 --> 09:25.880] for instance, or emitting, you know, the representation of the ABI corpus in a disk-saved format that [09:25.880 --> 09:36.440] we called ABI XML. So, what we've done recently, so I'm going, you know, a bit fast because, [09:36.440 --> 09:42.680] you know, to let time for questions and stuff, and we can go on and, let's say, not very [09:42.680 --> 09:48.600] structured discussion afterwards, if you like. So, yeah, in the recent times, what we've [09:48.600 --> 09:57.320] done is, well, you know Dwarf, you know that it changes all the time with new versions [09:57.320 --> 10:08.680] of Dwarf producers. So, with GCC-11 and LLVN-14, the default Dwarf version was bumped to the [10:08.680 --> 10:14.920] version 5, which is quite ancient, actually. I think it was released in 2017 or something. [10:14.920 --> 10:23.640] So, yeah, we support most of that right now. And another major thing that happened recently [10:23.640 --> 10:34.200] was that thanks to folks in this room that I won't, don't worry, I won't give your name. [10:34.200 --> 10:40.440] New debug info format were added because, yeah, we started with Dwarf only. And so, the [10:40.440 --> 10:46.120] CTF debug info format was, support was added to the Babigel. So, basically now, if you [10:46.120 --> 10:56.040] have a binary having CTF and or Dwarf, you can choose whatever you want to use as a source [10:56.040 --> 11:07.840] of type information. So, things being how they are, the code got changed a bit to, you [11:07.840 --> 11:17.920] know, to be turned into a multi front-end architecture. We also have a multi backend [11:17.920 --> 11:24.520] architecture, basically, because we have different types of reports. The one I've shown you [11:24.520 --> 11:31.120] is the default one, which is quite verbose. So, some people like it more terse. So, yeah. [11:31.120 --> 11:39.440] And who knows whatever weird request users might come with in the future. So, yeah. Different [11:39.440 --> 11:48.840] report backends. And, well, it's not, doesn't stop there. We are still working on, you know, [11:48.840 --> 11:56.440] on new stuff while coming from user request. So, yeah, the, apparently the new kids on [11:56.440 --> 12:06.000] the block, well, new kids in town now, cool stuff is BPF, right? And with BPF comes BTF, [12:06.000 --> 12:17.240] which is the type description format of BPF. And so, there were some requests to support [12:17.240 --> 12:25.800] that. So, it is now in mainline, even though it's not in Babigel mainline, but it's not [12:25.800 --> 12:31.480] released yet. It should be released in the next version. So, what do we do with that? [12:31.480 --> 12:41.720] What's that thing? Basically, because BTF describes the C types, basically, we are using [12:41.720 --> 12:49.600] that to compare the interface exposed by the kernel to its modules. So, we're doing that [12:49.600 --> 13:00.520] with CTF already, with BTF now, and also with DWARF. With DWARF, it is much less fast, [13:00.520 --> 13:13.200] shall we say, than with the CTF support and BTF. So, people can, people are using that [13:13.200 --> 13:23.680] feature to, you know, analyze the KABI, basically, kernel ABI, that thing that doesn't exist. [13:23.680 --> 13:31.680] And then we've had, you know, weird project specific request over the year. And the last [13:31.680 --> 13:43.360] one that, you know, came in last month, I say, or yeah, yeah, last month, yeah, in January, [13:43.360 --> 13:47.920] was to have a, I call that the library set ABI analysis. So, basically, it's a project [13:47.920 --> 13:55.600] that has a huge library, a huge library, and they're planning to split it in different [13:55.600 --> 14:05.160] libraries, right? But then they keep ABI compatibility, they're supposed to. And so, they would like [14:05.160 --> 14:13.640] to ensure that the set of, you know, broken down libraries has an ABI that is equivalent [14:13.640 --> 14:18.200] or compatible with the first initial one. This is what I, you know, call the library [14:18.200 --> 14:27.000] set ABI analysis. So, we're going to add support for that in, I don't know if it's going to [14:27.000 --> 14:33.080] be in the next version or not. So, yeah, these are the kinds of things we are, we're working [14:33.080 --> 14:43.000] on. So, yeah. And now, I'll let you ask questions if you, if you have any, yeah. [14:43.000 --> 14:49.520] Does the library have any support for language specific ABI? So, languages are good on top [14:49.520 --> 14:52.920] of C, for example, but they have language schemes? [14:52.920 --> 15:06.040] Yeah, exactly. So, yes. So, there, of course, DWARF is multi-language. So, if the compiler [15:06.040 --> 15:12.320] of that language emits DWARF, then we're good to go. There is a small layer of language [15:12.320 --> 15:18.960] specific stuff we add, you know, for reporting so that we can talk, report stuff in the native [15:18.960 --> 15:24.560] language of the programmer, you know, who wrote the thing. So, to give you a concrete [15:24.560 --> 15:33.240] example, right now, we support C++, C, Fortran. Someone asked me for Rust support. So, we [15:33.240 --> 15:38.240] had that, basically. We have some crashes on OCaml. So, I thought we were supporting [15:38.240 --> 15:42.960] it, too, but I need to do some stuff. So, yeah. Basically, yeah. It needs work, but... [15:42.960 --> 15:47.680] The new language, I just have to define a small layer for the mangling logic. [15:47.680 --> 15:59.240] For the mangling logic. So, okay. I can show you, let me show you an example. So, yeah. [15:59.240 --> 16:07.280] I was writing. So, yeah. Let's see. So, you see, for instance, in C++, we'll compare, [16:07.280 --> 16:17.240] so here, you see this function, the function 3. I'll change it in the second version here, [16:17.240 --> 16:26.320] function 3, and add, and I'll add an integer here, right? Yes. Let's, whoops. We compile [16:26.320 --> 16:48.600] that, whoops. And, whoops. Weird stuff happened. So, look at what it is saying here. So, you [16:48.600 --> 16:56.280] see here, because we're in C++, I changed function 3 in the source code. Yeah. Let me [16:56.280 --> 17:02.880] just, yeah. See? I changed function 3 here, and I added a parameter, you know? That's [17:02.880 --> 17:10.240] what the programmer would say. But then, from the binary standpoint, what we're seeing [17:10.240 --> 17:16.000] is that the first function was removed, and then another one got added. This is because [17:16.000 --> 17:24.480] in C++, the name of the symbols of the two functions, the two versions of the functions, [17:24.480 --> 17:32.400] are different. They have a different mangling, okay? So, we go to, we go from the name of [17:32.400 --> 17:41.280] the symbol to the name of the declaration, right? So, but if I do the same in C, then, [17:41.280 --> 17:50.000] like, yeah, I knew you would ask that question. I don't know you, but, so, and I have second [17:50.000 --> 18:07.160] version here. Boom, boom, boom. And, so, here, some, oh, sorry. I changed the name of, sorry, [18:07.160 --> 18:13.480] I changed the parameter of the function there, but this is in C, okay? And so, if I go in [18:13.480 --> 18:25.840] the, ah, sorry, if I go in the shell and I look at, boom, at the two, so, this is, the [18:25.840 --> 18:31.800] first one was hello, and this one is bye, of course, because I think this is going to [18:31.800 --> 18:42.960] be the last C here, because in C, the name of the two symbols are the same. Now, we say [18:42.960 --> 18:48.080] that the function has changed. So, these are the kind of things that we'll have to adapt, [18:48.080 --> 18:54.120] basically, but there is not much to do. In some cases, you have mangling, and in the [18:54.120 --> 18:59.280] others, other cases, you don't. So, you don't have anything to do with the, for the mangling, [18:59.280 --> 19:05.640] the, you know, does that answer your question? Roughly. Roughly, yeah. You have this code, [19:05.640 --> 19:10.720] part of the code, which decodes the mangled name to an unreadable name. No, no, because [19:10.720 --> 19:18.200] the, the, the, the matching is done by dwarf. So, we know that this symbol is for this declaration. [19:18.200 --> 19:21.800] So, we don't have to do the mangling, the mangling or demangling. We, you know, we'll [19:21.800 --> 19:25.920] look at the addresses and we know that this symbol is for that one. So, yeah, we don't [19:25.920 --> 19:30.000] really care about, yeah. Another, yeah, please go ahead. [19:30.000 --> 19:46.000] Oh, there is none. No, no, no, no, no, no, no, no, it's, so, yeah, I, just to, to refresh [19:46.000 --> 19:53.160] the question, to repeat the question for the, yeah. What are the performance issues, you [19:53.160 --> 19:58.120] know, when we analyze, like, big libraries, like, you know, he said, I love VM, but, you [19:58.120 --> 20:08.200] know, there is WebKit, Gecko, etc., etc. So, we have a, when we're looking at, when we're [20:08.200 --> 20:15.360] looking at dwarf, we have a fundamental problem, which is the duplication of types. Here we [20:15.360 --> 20:23.360] are in the business of comparing things, right? And so, when we compare types, basically, [20:23.360 --> 20:30.120] we are in a, the, the land of quadratic algorithms, right? So, things are inherently slow if we [20:30.120 --> 20:40.160] do them naively, right? And so, the thing is, in dwarf, every single type unit is represented. [20:40.160 --> 20:45.920] But then, when you have a, the final binary, the final shell library, for instance, and [20:45.920 --> 20:51.240] you have, I don't know, you know, 1,000 translation units, and in every single translation unit, [20:51.240 --> 20:56.120] you had the string type, for instance, that was used. Then, you will have the string, [20:56.120 --> 21:02.880] the string type represented 1,000 times, at least, you know, in the, in the, in the dwarf. [21:02.880 --> 21:11.680] And so, we must be sure that those 100 occurrences of string are the one and the same. We can't [21:11.680 --> 21:16.920] just look at the name and say they're the same, because they could be otherwise, right? [21:16.920 --> 21:21.760] And so, we have to compare them and make sure they're the same, and then we'll say, okay, [21:21.760 --> 21:28.160] I'll just keep one and throw away the others. This is the duplication of type, it is called. [21:28.160 --> 21:37.000] And so, this process takes a huge amount of time, which is, well, for, for huge libraries, [21:37.000 --> 21:43.920] it can take, you know, it can take forever. So, we have heuristics to make this thing, [21:43.920 --> 21:52.600] you know, be faster, but then, you know, it takes time. So, we have some of the heuristics [21:52.600 --> 22:01.280] that we're using now are, is in the land of partitioning, like we will do things, you [22:01.280 --> 22:11.000] know, like piecewise, and, and try it so that we can do things in parallel, right? It is [22:11.000 --> 22:16.560] not mainline yet, but this is the, you know, the, the future we're, we're thinking about. [22:16.560 --> 22:23.680] Another approach is to have the types be de-duplicated before we intervene. This is what, for instance, [22:23.680 --> 22:31.600] the CTF guys do with C. So, they will do the de-duplication at debug info production time, [22:31.600 --> 22:37.960] and then in that, in that case, we're golden. There is another, another case where we're [22:37.960 --> 22:43.400] doing that is when we're building distribution packages, like, for instance, I don't know, [22:43.400 --> 22:50.800] RPM or Debian package or whatever, there is a tool which is called DWZ, which does the [22:50.800 --> 22:57.880] de-duplication to one extent. Well, when it works, it works. It does the de-duplication, [22:57.880 --> 23:04.280] but the problem is DWZ has the same issue as us, and sometimes when the binary is too [23:04.280 --> 23:11.560] big, DWZ will just give up, and in that case, well, we have to use our little hands and [23:11.560 --> 23:15.240] do the de-duplication in line, and then, well, we'll spend time. [23:15.240 --> 23:20.800] But this because someone should get DWZ, turn it into a library and put it in the linker? [23:20.800 --> 23:21.800] Yes, and, yes. [23:21.800 --> 23:23.200] Do it in link time? [23:23.200 --> 23:29.640] Yeah, we can, yeah, that's, that's something that, that's one of the things that we need [23:29.640 --> 23:38.240] to do to improve the entire ecosystem of these things, and, yeah, that's definitely, yeah. [23:38.240 --> 23:46.280] So, yeah. So, as, do we have other questions? [23:46.280 --> 23:52.480] Yes, are there any other formats that are on your road map? [23:52.480 --> 23:59.800] Right now, no, but, you know, like, three months ago, BTF was not on my road map, so, you know, [23:59.800 --> 24:03.960] the future is not what it used to be, so, I don't know, yeah. [24:03.960 --> 24:13.600] Anyway, so, yeah, we are on, hosted on Sourceware, we still use mailing lists, you send us patches, [24:13.600 --> 24:22.600] and yeah, you can find us on IRC, on the, on the, of this network, and, well, thank you very much.