[00:00.000 --> 00:13.440] Hello, Fozdom. I am Nandini Jomsanas. I am a software toolchain engineer at Ember Cozum. [00:13.440 --> 00:23.000] I lead the Core 5 GNU toolchain project. I am also a UK electronic scholar from UK ESF. [00:23.000 --> 00:34.000] UK ESF encourages young electronic scholars, students to study electronics and pursue a career in the sector. [00:34.000 --> 00:45.000] UK ESF also connects top UK universities with leading employees. [00:45.000 --> 01:01.000] In this talk, I will be giving you a tutorial on how to add a GCC built-in to the RISC-5 compiler. [01:01.000 --> 01:08.000] Okay, so what is a built-in? Well, in C++ and C, there are two types of functions. [01:08.000 --> 01:21.000] You've got your user defined functions and your built-in functions. User defined functions are functions that the programmer has defined within their code so they can use it. [01:21.000 --> 01:26.000] But a built-in function are functions that are already implemented in the compiler. [01:26.000 --> 01:35.000] So the programmer doesn't need to write specific code for it and can directly use these built-ins. [01:35.000 --> 01:41.000] Many low-level architectures in GCC use built-ins. [01:41.000 --> 01:52.000] Built-ins look superficially like any C function, but there are intrinsics to the compiler which are directly implemented within. [01:52.000 --> 02:05.000] These built-ins have specific patterns to be matched in the machine description file and have access to unique individual machine functionalities. [02:05.000 --> 02:16.000] Because they are integrated within GCC, they are more efficient than using just simple inline assembly. [02:16.000 --> 02:28.000] For RISC-5, this presents an excellent opportunity to expose the ISA extension to C and C++ programmers. [02:28.000 --> 02:35.000] This is an example of a simple built-in in GCC which takes the square root of a float. [02:35.000 --> 02:45.000] There are tons and tons of GCC built-ins, but I don't know if you know, but there's probably like two in RISC-5. [02:45.000 --> 02:50.000] And this is why I'm giving you a tutorial about it so we can add more. [02:50.000 --> 02:56.000] It is important to say that yes, we call it a built-in function, but it's not really a function. [02:56.000 --> 03:07.000] There are any corresponding entry or exit points and a just cannot be obtained. [03:07.000 --> 03:12.000] Here is the square root float built-in that is implemented in GCC. [03:12.000 --> 03:21.000] If you want to find it in GCC built-ins.dev, all of the source code will be linked at the end, so don't worry, I will give that to you. [03:21.000 --> 03:38.000] And if you want to make a specific RISC-5 built-in, then you would go into the link below, or the path at the below, which will be in RISC-5 built-ins.cc. [03:38.000 --> 03:43.000] Yes, I'm talking a lot about built-ins, we could simply just use inline assembly. [03:43.000 --> 03:49.000] But this is why we shouldn't be using inline assembly. [03:49.000 --> 03:57.000] If you want to use inline assembly, you have to annoyingly specify the pattern every single time you use inline assembly. [03:57.000 --> 03:59.000] Sometimes you can get it wrong. [03:59.000 --> 04:07.000] GCC does not know about this built-in, so there's a huge risk of data flow information being lost. [04:07.000 --> 04:16.000] Again, GCC does not know about this instruction that you're using with inline assembly, so optimization cannot be used. [04:16.000 --> 04:25.000] The reason we use built-in functions, well, all of your data flow information will be retained. [04:25.000 --> 04:28.000] Patterns can be recognized and used elsewhere by GCC. [04:28.000 --> 04:33.000] You only need to specify the pattern once, and that will be in the machine description file. [04:33.000 --> 04:40.000] And then, voila, you just need to use your built-ins, put in the arguments, and the programmer will be fine. [04:40.000 --> 04:47.000] And again, with built-in functions, they're implemented directly in the compiler. [04:47.000 --> 04:51.000] So GCC will know about it and can use their optimization flags. [04:51.000 --> 04:54.000] What do I talk about when I say optimization? [04:54.000 --> 04:59.000] Well, GCC has a bunch of optimization flags. [04:59.000 --> 05:02.000] Here are two that I'm currently using as an example. [05:02.000 --> 05:07.000] The first one is with the flag minus 0. [05:07.000 --> 05:10.000] I don't think that is. That's the basic level of optimization. [05:10.000 --> 05:13.000] In fact, I don't think that's any optimization at all. [05:13.000 --> 05:20.000] This is just hardcore assembly, which you will use for cv.er, which I'll explain later. [05:20.000 --> 05:29.000] And when you use an optimization flag, minus 02, that will increase performance, reduce compilation time. [05:29.000 --> 05:37.000] GCC optimizes those assembly instructions because it knows that it doesn't need to be used. [05:37.000 --> 05:44.000] You might have noticed that I'm using cv.erw, probably wondering what the hell that is. [05:44.000 --> 05:54.000] Well, cv.erw is part of cv3 to e4ep iso extensions, also core 5 iso extensions. [05:54.000 --> 05:59.000] The cv.erw is part of event load extension. [05:59.000 --> 06:06.000] We are currently implementing version 2 of this in Open Hardwares core 5 GCC and binutils. [06:06.000 --> 06:14.000] The first set of extensions, the first set of versioning has the first five extensions, [06:14.000 --> 06:19.000] and then version 2 has event load, SIMD and bit manipulation. [06:19.000 --> 06:27.000] I would like to emphasize that all of these extensions and instructions are in binutils, the assembly and the linker. [06:27.000 --> 06:34.000] But it's time to add built-ins in GCC. [06:34.000 --> 06:37.000] I am going to be using event load for this tutorial. [06:37.000 --> 06:45.000] This is because event load only has one instruction, so it's a very beginner-friendly task. [06:45.000 --> 06:59.000] That instruction is cv.erw, which will load a word and cause the cv3 to e4ep process cycle to go into sleep state. [06:59.000 --> 07:07.000] This is an instruction that GCC will not know about because it's very machine-specific. [07:07.000 --> 07:12.000] Thus, we need a built-in. [07:12.000 --> 07:20.000] Before we get into all of this, it is very important to call out the naming conventions of these built-ins. [07:20.000 --> 07:27.000] A general convention name for a built-in in GCC will just be built-in and then the instruction name. [07:27.000 --> 07:34.000] But if you want to make it a RISC-5 specific built-in, it will be built-in RISC-5, the vendor and the name. [07:34.000 --> 07:43.000] For a core-5 specific one, it will be built-in RISC-5, cv4 core-5, the extension name and then the instruction name. [07:43.000 --> 07:52.000] Yes, I understand it's a bit long-winded, but it is very important to emphasise which vendor, which architecture you want to use, [07:52.000 --> 07:54.000] what extension, what instruction. [07:54.000 --> 08:00.000] It just makes it a lot easier for the programmer to know which instructions they want to use. [08:00.000 --> 08:07.000] So for my built-in, and if you want to use it, it will be called underscore underscore built-in underscore RISC-5, [08:07.000 --> 08:11.000] underscore cv underscore aw underscore aw. [08:11.000 --> 08:21.000] Because there's only one instruction, I just call it the same thing. [08:21.000 --> 08:25.000] So this is an example of how to use this built-in. [08:25.000 --> 08:28.000] This built-in will take a void pointer. [08:28.000 --> 08:38.000] It will be loading it from a specific memory address and then loading it into a general-purpose register, [08:38.000 --> 08:46.000] which is an unsigned day-to-bit integer. [08:46.000 --> 08:57.000] From this example, yes, the only thing you'll have to do is just put in the pointer and it will return your unsigned day-to-bit integer value. [08:57.000 --> 09:04.000] Can you speak a little louder, please? [09:04.000 --> 09:09.000] Oh, okay, sorry. [09:09.000 --> 09:18.000] Now that I've spoken about what event load is, it's time to add an extension to GCC. [09:18.000 --> 09:34.000] So most of these implementation for adding an extension will be in RISC-5.common.cc. [09:34.000 --> 09:40.000] So we've called our extension xcv, which will be the main extension, [09:40.000 --> 09:45.000] and then you'll have a sub-extensions, which will be xcvew. [09:45.000 --> 09:50.000] There isn't any ISO-specific class, so I'll just use a macro none, [09:50.000 --> 09:57.000] and this will be the first version of it. [09:57.000 --> 10:05.000] Because I am implementing a sub-extension, we'll have to imply it here by putting the sub-extension first [10:05.000 --> 10:13.000] and then the main or parent extension. [10:13.000 --> 10:18.000] Next, we add the corresponding masks and targets. [10:18.000 --> 10:30.000] Before we do all of this, we need to go into RISC-5.opt to emphasize or add the target variable and the corresponding core five flags. [10:30.000 --> 10:40.000] This file is very sensitive, and so you'll have to, even though it's two lines, if you mess it up, then you've got GCC crashing everywhere. [10:40.000 --> 10:51.000] So you have to be very careful in this file, and then you use that flag for your corresponding target, [10:51.000 --> 10:58.000] but you also use it when you have to specify your GCC options. [10:58.000 --> 11:09.000] So I've done that in RISC-5.common.cc, which is here. [11:09.000 --> 11:17.000] Now it gets into the interesting stuff to actually define the built-in. [11:17.000 --> 11:23.000] RISC-5 has a function already made for us, so we can make these built-ins. [11:23.000 --> 11:27.000] That is in RISC-5's built-ins.cc. [11:27.000 --> 11:33.000] It takes in five arguments, and I'll be going through all of these in the following slides. [11:33.000 --> 11:46.000] That'll be the instruction name, the built-in name, built-in type, function type, and availability predicate. [11:46.000 --> 11:54.000] So using this function, I have created my own file, which is called corev.dev, [11:54.000 --> 12:00.000] and this is where all the corev-related built-ins will be in. [12:00.000 --> 12:11.000] My first built-in will be in corev.dev, and the name of the instruction name will be C-V-E-L-W-S-I for single integer. [12:11.000 --> 12:17.000] The name of the built-in that the programmers will be using will be C-V-E-L-W-E-L-W, [12:17.000 --> 12:21.000] but that will be expanded to built-in RISC-5. [12:21.000 --> 12:32.000] Then you've got the corresponding built-in types, function types, availability predicate, and I'll go into that more. [12:32.000 --> 12:41.000] So the instruction patterns, this is probably the most difficult part of the whole built-in implementation. [12:41.000 --> 12:47.000] So the insert name is the name of the associated instruction pattern in the machine description file. [12:47.000 --> 12:52.000] It uses, it takes in five operands, but the last operand is optional, [12:52.000 --> 12:56.000] but I recommend you putting in if you can. [12:56.000 --> 13:05.000] You've got the name, you've got the RTL template, conditions, output template, and instant attributes, [13:05.000 --> 13:13.000] and that will be all in RISC-5.md, but I will be creating my own md for corev-specific, [13:13.000 --> 13:20.000] so we don't merge it into RISC-5.md. [13:20.000 --> 13:25.000] So this is an example of RTL templates or register transfer language. [13:25.000 --> 13:33.000] It's a template that is very, very similar to intermediate representation that GCC uses. [13:33.000 --> 13:42.000] It's a template that GCC will take and then put in the corresponding registers or operands that it needs to do. [13:42.000 --> 13:48.000] So this is my instruction pattern that I will be using for this built-in. [13:48.000 --> 14:01.000] The name will be RISC-5 underscore CV as we've previously defined it. [14:01.000 --> 14:10.000] I am using the set pattern and this will take a destination register and a source register. [14:10.000 --> 14:18.000] The destination, I think, this will be the destination register, the first operand, [14:18.000 --> 14:28.000] and I've used the match operand pattern which will take m as machine mode and the index of this operand, [14:28.000 --> 14:33.000] the predicate and the constraint. [14:33.000 --> 14:40.000] The machine mode for this will be SI which is a single integer, it's 32 bits. [14:40.000 --> 14:43.000] It's zero for the index of this operand. [14:43.000 --> 14:48.000] We usually start with zero as the indexing. [14:48.000 --> 14:57.000] The predicate for this will be a register operand as we'll be loading it into a general purpose register [14:57.000 --> 15:09.000] and the constraint will be R emphasizing as register equals to meaning it's going to be written to. [15:09.000 --> 15:20.000] Next part of this is the source register which will be the memory specific address. [15:20.000 --> 15:26.000] So we're using mem to specify the size of the object being referenced. [15:26.000 --> 15:30.000] SI being single integer, 32 bits. [15:30.000 --> 15:42.000] Again, we're using match operand to match the register or the pointer to the specific address. [15:42.000 --> 15:48.000] The index number will be one because that's the next number. [15:48.000 --> 15:59.000] I am using an address operand and then p specifying as pointer. [15:59.000 --> 16:06.000] I am using an unspect volatile for this instruction because it's a volatile operation. [16:06.000 --> 16:08.000] It's very machine specific. [16:08.000 --> 16:13.000] It can get difficult and there are times where it could be trapped. [16:13.000 --> 16:29.000] We are referencing in this state that is fragile and vulnerable so that is why I've been using an unspect volatile. [16:29.000 --> 16:38.000] Now that I've talked about the RTL pattern, we talk about the condition. [16:38.000 --> 16:46.000] The condition is important to add so that the instruction can only be generated within these conditions. [16:46.000 --> 17:03.000] You can only generate this pattern if the target is to X call VELW and that it's not a 64 bit target. [17:03.000 --> 17:08.000] Next we talk about the orange bit which is the output template. [17:08.000 --> 17:14.000] The output template will be what you will see in the assembly. [17:14.000 --> 17:21.000] You define it with the instruction name so cv.el and then slash t for tad. [17:21.000 --> 17:27.000] This is where you use those index numbers to reference which operands you want to use. [17:27.000 --> 17:33.000] I will be referencing %0 and then %a1. [17:33.000 --> 17:40.000] %0 will be the destination register and %1 will be the source register. [17:40.000 --> 17:50.000] I am using %a to substitute as a memory reference. [17:50.000 --> 17:59.000] Lastly we talk about the optional operand but again this is something we should try to put in if you are going to add a built-in. [17:59.000 --> 18:09.000] We want to tell GCC that this is a load type of instruction and the mode is SI throughout the whole built-in. [18:09.000 --> 18:28.000] The reason I have added this optional operand is that the instruction can still be generated but GCC can now optimise it knowing that it is a load, knowing that it is in machine mode SI. [18:28.000 --> 18:32.000] That is now the big part of the built-in. [18:32.000 --> 18:37.000] We have discussed the instant name and the template name. [18:37.000 --> 18:39.000] Here it comes to the built-in types. [18:39.000 --> 18:46.000] In RISC-5 there are currently only two types of built-in types. [18:46.000 --> 18:52.000] Those built-in types can be found in RISC-5 built-ins.cc. [18:52.000 --> 18:59.000] This is RISC-5 built-in direct and RISC-5 built-in direct no target. [18:59.000 --> 19:12.000] RISC-5 built-in direct corresponds directly to a machine pattern we have just created whereas RISC-5 built-in direct no target does the same thing but the return type will be void. [19:12.000 --> 19:18.000] But we are returning a general register operand or theta bit unsigned integer. [19:18.000 --> 19:26.000] So we will be using RISC-5 built-in direct. [19:26.000 --> 19:29.000] Next comes the function types. [19:29.000 --> 19:35.000] And again, everything is in RISC-5 built-ins.cc. [19:35.000 --> 19:41.000] And currently there are only two types of prototypes for RISC-5. [19:41.000 --> 19:45.000] You can only return. [19:45.000 --> 19:47.000] You can only have a returning type. [19:47.000 --> 19:51.000] You can only have a return type and one argument. [19:51.000 --> 20:06.000] In coming presentations I will be talking about it a bit more because I only have 45 minutes to talk about this presentation. [20:06.000 --> 20:18.000] When it comes to defining which return types and argument types we are using that will be in RISC-5-f types.dev. [20:18.000 --> 20:36.000] So the comment says that it will expand to RISC-5 underscore unsigned integer and then avoid pointer because that's what I will be using for my built-in type. [20:36.000 --> 20:39.000] Lastly we have the availability predicate. [20:39.000 --> 20:45.000] This is very similar to the conditions we had in the RTL template. [20:45.000 --> 20:52.000] So we use this avail function that has been declared in RISC-5 built-ins.cc. [20:52.000 --> 20:57.000] It takes the name of your availability predicate and then the corresponding conditions. [20:57.000 --> 21:13.000] As you can see it's very similar to the condition we had in the RTL template which is a target reference and then it's not a 64-bit target. [21:13.000 --> 21:22.000] Now that we've added the extension and the instruction and the built-in it's time to test it. [21:22.000 --> 21:26.000] And this is a very simple test just to make sure that it works. [21:26.000 --> 21:28.000] It's a compilation test. [21:28.000 --> 21:32.000] It takes in a void pointer with an offset. [21:32.000 --> 21:35.000] It returns an unsigned 32-bit value. [21:35.000 --> 21:38.000] You can see there are comments on the side. [21:38.000 --> 21:40.000] These are deja vu comments. [21:40.000 --> 21:49.000] We are using deja vu because we want to use a simulator or it can be used on microcontrollers. [21:49.000 --> 21:56.000] It's a framework testing model that we use for our test scripts. [21:56.000 --> 22:04.000] The first comment we'll talk about telling it it can be an execution or a compilation test. [22:04.000 --> 22:09.000] So this will be a compilation test because we haven't got an executable target yet. [22:09.000 --> 22:13.000] The second line is to tell you the options for this built-in. [22:13.000 --> 22:24.000] If you don't specify the options then this test won't run because this instruction only works within X core VELW. [22:24.000 --> 22:38.000] And then the last line or the last comment will be for checking if our instruction has been generated in the assembly. [22:38.000 --> 22:41.000] And it should be generated once. [22:41.000 --> 22:43.000] There are dashes to escape. [22:43.000 --> 22:59.000] It's very sensitive because it's a regular expression type of framework. [22:59.000 --> 23:02.000] We've got a run script for this. [23:02.000 --> 23:10.000] It's very important to build GCC because I've been running tests without building GCC and wondering why it doesn't work. [23:10.000 --> 23:15.000] And it wasn't until our GCC experts told us, no, you've got a run build. [23:15.000 --> 23:23.000] You have to run GCC and then run it. [23:23.000 --> 23:34.000] So this shows the results from our run test scripts. [23:34.000 --> 23:39.000] Although it's just one test, there are 18 passes. [23:39.000 --> 23:43.000] That is because it goes through nine optimization levels. [23:43.000 --> 23:57.000] The optimization level goes through a scan assembly test and then a compilation test. [23:57.000 --> 24:04.000] Like I promised, I put up the slides for where all of this will be found. [24:04.000 --> 24:11.000] This will be found in GitHub's Open Hardware Core 5 Vinutils and Core 5 GCC. [24:11.000 --> 24:14.000] This is also part of the Open Hardware group. [24:14.000 --> 24:22.000] We are still looking for volunteers and people to contribute to this project. [24:22.000 --> 24:27.000] And it's very important to also mention the GCC internals manual. [24:27.000 --> 24:30.000] It's probably the guru of GCC. [24:30.000 --> 24:36.000] That's what I rely on the most now. [24:36.000 --> 24:49.000] Thank you for listening to my presentation. Do you have any questions? [24:49.000 --> 24:50.000] Yes? [24:50.000 --> 24:51.000] I have a question. [24:51.000 --> 25:00.000] So I know that these built-in functions are used by the code people, which I think is what came before the Core 5 project, right? [25:00.000 --> 25:08.000] I think they use it for various mathematical functions to speed them up. [25:08.000 --> 25:19.000] I was just wondering, what I'm interested in, what I'm working on, is using higher level compilers to compile into automatically generated kernels. [25:19.000 --> 25:26.000] What's not clear to me right now is that if I use a built-in, then I would need to compile to a C code, right? [25:26.000 --> 25:35.000] Is there any way that you can still reuse part of this work without having to use C code, or would you always need to go to C code? [25:35.000 --> 25:39.000] For now, I've just been using C code, so I'm not really sure. [25:39.000 --> 25:42.000] I don't know. [25:42.000 --> 25:44.000] If you've got a fault, I'm fine. [25:44.000 --> 25:49.000] There's the C API, so you can sort of wire it into it. [25:49.000 --> 25:55.000] This is in the compiler, so you just need to find your own code to reach to the client. [25:55.000 --> 25:58.000] So in this case, you would also use these things in Fortran code. [25:58.000 --> 26:00.000] You could, yeah. [26:00.000 --> 26:08.000] I have an amazing that myself, I've been working with the staff, so there's no reason for this not to work. [26:08.000 --> 26:14.000] It's expressed in terms of a C code, so it has to be expressed somehow. [26:14.000 --> 26:20.000] I was a bit confused more about the built-in concept in general, [26:20.000 --> 26:25.000] because I mean, usually people use C code to not be machine specific, [26:25.000 --> 26:29.000] but if you use it like a built-in, then you become machine specific, right? [26:29.000 --> 26:30.000] Yeah. [26:30.000 --> 26:31.000] Oh, yeah. [26:31.000 --> 26:34.000] It depends on the built-in. [26:34.000 --> 26:37.000] GCC has built-ins that are sort of general. [26:37.000 --> 26:46.000] I mean, like all the maths functions, for example, like a body of maths, it's not machine specific. [26:46.000 --> 26:50.000] And it says, obviously, compiler specific. [26:50.000 --> 26:58.000] It's not that specific in this case, yeah, but because you can have other kind of other mathematics. [26:58.000 --> 26:59.000] Yeah. [26:59.000 --> 27:02.000] Okay, at least architecture specific, right? [27:02.000 --> 27:05.000] Well, actually it is not architecture specific. [27:05.000 --> 27:07.000] It's a general. [27:07.000 --> 27:12.000] Yeah, but even for mathematics built-in functions, you always have, [27:12.000 --> 27:17.000] not always, but mostly, yeah, kind of architecture specific. [27:17.000 --> 27:22.000] Oh, yeah, there can be stuff like encoding of numbers or such like. [27:22.000 --> 27:23.000] Yeah. [27:23.000 --> 27:27.000] It's a sort of, you know, just because. [27:27.000 --> 27:31.000] So it should work, yeah. [27:31.000 --> 27:37.000] Actually, that's one way to avoid these architecture specific. [27:37.000 --> 27:45.000] Like, rather than encoding a non-pattern into your code, just by using a constant or bit pattern [27:45.000 --> 27:53.000] and then sort of casting to proper floating point type, you can use built-in non. [27:53.000 --> 28:08.000] It's a built-in function that produces the correct encoding of a non for your target. [28:08.000 --> 28:10.000] Okay, thank you for listening to my presentation. [28:10.000 --> 28:12.000] Thank you. [28:12.000 --> 28:13.000] For me. [28:13.000 --> 28:24.000] Thank you.