[00:00.000 --> 00:14.000] Okay, LibreSock project is creating free and open source ship design for a family of system [00:14.000 --> 00:20.640] on ship products for powering routers, cell phones, laptops, your laptop maybe in the [00:20.640 --> 00:22.080] future. [00:22.080 --> 00:30.520] So it uses the power extension set, augmented with 3D upcodes, accelerating 3D graphics [00:30.520 --> 00:38.320] and also video acceleration upcodes and audio decoders and coders. [00:38.320 --> 00:48.520] So we needed for avoiding right proprietary binary blobs and drivers and no reverse engineering [00:48.520 --> 00:57.840] needed for supporting our hardware and well it is hard to do so we do it by grants and [00:57.840 --> 01:08.040] donations, we use a pool of community experts on newsnet, IRC, academia and also commercial [01:08.040 --> 01:13.600] partners which will produce our ship as you see later. [01:13.600 --> 01:21.680] So it is architecture is a traditional fetch decode issue and execute pipeline but to increase [01:21.680 --> 01:29.320] performance you use parallel decoders to decode instructions in advance, a vectored issue [01:29.320 --> 01:43.560] so one instruction can generate many results at one time utilizing the functions units [01:43.560 --> 01:54.400] of the ship with the parallel execution units and well managing all things scoreboard dependency [01:54.400 --> 01:56.560] tracking design. [01:56.560 --> 02:07.400] Well, we started from zero from the power is instruction set specification as published [02:07.400 --> 02:17.560] by the open power foundation which is a open standard and you can submit the extensions [02:17.560 --> 02:22.720] to it and that's one of the reasons we chose the power architecture. [02:22.720 --> 02:30.960] So the power has this big manual with all the assembly instructions and all these formats [02:30.960 --> 02:37.920] and we take these formats, these tables and auto-generate by Python and Python decoder [02:37.920 --> 02:47.760] from these specifications and the specification of the power architecture also has a sealed [02:47.760 --> 02:55.600] code and with this sealed code which is for humans but we use it for auto-translation [02:55.600 --> 03:04.600] to a simulator, Python simulator and we start from the beginning just simulating instructions [03:04.600 --> 03:11.680] in software then you use the simulator to test against this one, the last one and the [03:11.680 --> 03:22.200] harder simulator will verify against the software simulator and finally FPGA and even an ASIC. [03:22.200 --> 03:29.160] Let's jump here so this is like an imagine you have an add instruction coming in and [03:29.160 --> 03:36.040] the LU has to process it but before processing it, it has to receive operands like add what [03:36.040 --> 03:41.840] A and B but A and B can be the result of a previous instruction which is still being [03:41.840 --> 03:49.320] processed so it has to wait, wait where, in here, in here and here and when it has to [03:49.320 --> 03:55.760] have a read transaction, read the transaction then it will fill the buffers then the add [03:55.760 --> 04:03.600] instructions can proceed and it will generate a result and condition codes but maybe you [04:03.600 --> 04:10.120] cannot write them right now because you'll overwrite another instruction so we wait here [04:10.120 --> 04:17.240] and here also and this has to be managed so one of my tasks is to simulate this to see [04:17.240 --> 04:27.520] if it works well and do what? Formal verification which is so, so good, so, so interesting. [04:27.520 --> 04:34.960] With normal simulation you just throw random inputs maybe and some test cases but how do [04:34.960 --> 04:42.320] you know that you didn't hit a corner case? Well, the formal verification is like it's [04:42.320 --> 04:50.840] try everything at once. Actually, it starts from a bad result that you don't want to have [04:50.840 --> 05:00.320] and it shows you the input which reaches that bad result. Yes, so that's the bit of [05:00.320 --> 05:09.840] the thing. We get a simple core, first we do not do these function units all in parallel [05:09.840 --> 05:17.800] it just was one to test all of this is working put it here and then we read an instruction [05:17.800 --> 05:26.240] which decode instruction and then run the instruction terribly slowly but we validate [05:26.240 --> 05:34.880] the function units and the decoder. Next step which we do, we did already is we vectorize [05:34.880 --> 05:45.000] it so we put a read 64 because there are 32 instructions you add vector prefix to them [05:45.000 --> 05:51.200] and this vector prefix will tell you to read a predicate so from the vector you say no [05:51.200 --> 05:57.160] I don't want all of them I want the even ones or the odd ones or the ones with pass [05:57.160 --> 06:05.960] it a test like if then else but vectorized and then you run the vector loop so when instruction [06:05.960 --> 06:15.000] again can generate many may take the place of many many instructions and well now we [06:15.000 --> 06:22.680] go to the next steps we have this working now we have to do it in parallel we want to [06:22.680 --> 06:30.440] have performance it is working now performance so to be a performant you need to while you [06:30.440 --> 06:36.000] execute in one instruction you are decoding the next one you are fetching the next next [06:36.000 --> 06:45.040] one and if there is a jump instruction by chance and it doesn't match what you are fetching [06:45.040 --> 06:57.000] you have to reset the pipeline yes test well where you are right now we have a development [06:57.000 --> 07:04.160] environment that any of you can download and test in your computer you can do it is running [07:04.160 --> 07:12.040] in a shoot and then you can do make test to run the tests and if you have an FPGA board [07:12.040 --> 07:18.680] you can compile the bit stream and put into a supported board and we even did a ASIC [07:18.680 --> 07:28.040] with it well for the ASIC we need the PBK which is a process development kit that the factories [07:28.040 --> 07:39.360] don't give it to you freely so that part is done by a third party we don't touch property [07:39.360 --> 07:47.440] stuff but while it was done yes and we hope in the future to have a free PDK with it so [07:47.440 --> 07:58.360] the FPGA is booted we have a bare metal like Arduino like FPGA team running Zephyr OS was [07:58.360 --> 08:07.400] ported with network so networking was proved and the Linux with a serial console yes we [08:07.400 --> 08:15.400] have the test silicon with that little a simple core and it is being carefully tested because [08:15.400 --> 08:24.640] you have few chips produced one not to burn them so they are tested in a lab yes and the [08:24.640 --> 08:31.040] parties underway with the new instructions vector instructions already and the new instructions [08:31.040 --> 08:38.680] been submitted to the open power foundation for standardization and we are porting algorithms [08:38.680 --> 08:48.080] cryptographic algorithms and multimedia etc. so what you aim we aim to port and boot a [08:48.080 --> 08:57.880] Linux distro in the future eventually we want to have a full-term change GCC with vectorization [08:57.880 --> 09:05.920] they find these tensions to include the texture upcodes for 3d acceleration so you notice there [09:05.920 --> 09:14.000] will not be a GPU the instructions the CPU will be the GPU and well we need the hardware [09:14.000 --> 09:23.880] and software developers and testers and also well documentation optional no okay so who [09:23.880 --> 09:31.320] will build the chips well you just have research research money right well who produce thousands [09:31.320 --> 09:38.400] of chips for the market well we are partnered with red semiconductor which is have the mission [09:38.400 --> 09:46.880] of producing these chips producing a powerful and power efficient chip with our car so if [09:46.880 --> 09:54.920] you see some of them some of us some of them with this logo on the shirts you can talk [09:54.920 --> 10:16.360] to them you're here hello David hello people so that's it thank you very much thank you [10:16.360 --> 10:25.320] very much for the presentation there's a few minutes left for questions so in the back [10:25.320 --> 10:40.280] of the room I see someone waving just a moment thanks for your presentation you said that [10:40.280 --> 10:46.880] you had some test chips going on what's the status of the bring-up like how far did you [10:46.880 --> 10:55.240] get in the bring-up process okay we know the clock is working it has an well the azik [10:55.240 --> 11:02.560] it's not only for Libresock but for the academic institution to test their design so they are [11:02.560 --> 11:17.760] testing this the clock generation and well we know the clock generation works just that [11:17.760 --> 11:33.320] so maybe anyone else with a question okay thank you for your attention.