[00:57.260 --> 01:04.260]  And if you're building a ramp, there's going to be a black line, because that's where the camera is pointing to.
[01:04.260 --> 01:05.260]  Ah, okay, I see.
[01:05.260 --> 01:06.260]  You can compare it to there.
[01:06.260 --> 01:07.260]  It's in the black lines.
[01:07.260 --> 01:08.260]  Okay.
[01:15.260 --> 01:17.260]  Hello, everyone.
[01:19.260 --> 01:20.260]  Silence.
[01:21.260 --> 01:22.260]  Hello, everyone.
[01:22.260 --> 01:27.260]  So, my talk is about building FPGA bitstreams with open source tools.
[01:27.260 --> 01:36.260]  The important part here is open source tools, because building bitstreams for FPGAs was quite a pain in the ass.
[01:36.260 --> 01:40.260]  A while ago, you had vendor tools, which were large.
[01:40.260 --> 01:41.260]  You had to install them.
[01:41.260 --> 01:43.260]  Some of them were working on Linux.
[01:43.260 --> 01:45.260]  Some were working on specific distributions.
[01:45.260 --> 01:50.260]  If you didn't have the right version, it wasn't fun.
[01:50.260 --> 01:58.260]  And a few years ago, there were more developments on open source toolchains for FPGAs.
[01:58.260 --> 02:09.260]  And since then, a colleague of mine, Stefan, and I have been fiddling around with these tools, trying them out and doing stuff with FPGAs.
[02:09.260 --> 02:13.260]  And our experience from that is what I will be showing today.
[02:15.260 --> 02:18.260]  About me, I'm Michael Tretter.
[02:18.260 --> 02:23.260]  I work at Pengutronix as an embedded Linux developer in the graphics team.
[02:23.260 --> 02:31.260]  So, usually I'm doing software in Linux, sometimes user space, sometimes drivers.
[02:31.260 --> 02:34.260]  So, that's my background.
[02:34.260 --> 02:39.260]  Why am I doing FPGAs first?
[02:39.260 --> 02:40.260]  Agenda first.
[02:40.260 --> 02:45.260]  I will show you the open source FPGA tool chain.
[02:45.260 --> 02:49.260]  Then I will show you an example FPGA-based system.
[02:49.260 --> 03:05.260]  And in the end, I will show you the insights and pain points we had when we were developing this system and give you a conclusion and an outlook on the next steps that we are tackling.
[03:05.260 --> 03:07.260]  Use cases for FPGAs.
[03:07.260 --> 03:15.260]  You want to use them if you have real-time requirements and you need a high data throughput.
[03:15.260 --> 03:18.260]  That's two things that you need in graphics as well.
[03:18.260 --> 03:23.260]  You have high data throughput because you need to push all the image data from one point to another.
[03:23.260 --> 03:29.260]  You have real-time requirements because you have 30 FPS, 60 FPS that you have to address.
[03:29.260 --> 03:34.260]  So, you have a limited time span for each frame where the frame must be finished.
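To put numbers on that per-frame time span, here is a small sketch of the arithmetic. The function name is just illustrative; the only inputs are the frame rates mentioned above.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to finish one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (30, 60):
    # One frame has to be completely processed within this window.
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
```

So at 30 FPS each frame has roughly 33 ms, and at 60 FPS roughly 17 ms.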
[03:34.260 --> 03:50.260]  Another use case: you would also use them for prototyping such systems, because you can fiddle around and experiment with different implementations.
[03:50.260 --> 03:55.260]  The FPGA open source toolchain is basically these four steps.
[03:55.260 --> 04:00.260]  You start with some HDL description of your bitstream.
[04:00.260 --> 04:05.260]  Usually it's Verilog or VHDL.
[04:05.260 --> 04:11.260]  Then you have Yosys, which synthesizes the code into a netlist.
[04:11.260 --> 04:18.260]  nextpnr places and routes the netlist for your specific FPGA implementation.
[04:18.260 --> 04:24.260]  So, nextpnr needs to know about your FPGA architecture and which vendor you're using.
[04:24.260 --> 04:31.260]  So, this is different for Xilinx, for Altera, which is Intel now, or Lattice.
[04:31.260 --> 04:37.260]  So, you need to know something about your FPGA's internal workings for that.
[04:37.260 --> 04:50.260]  And in the end, you have a packer, which takes the routed design with the configuration and writes an actual bitstream, which consists of all the bits that configure the FPGA.
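As a rough sketch of those steps for a Lattice ECP5, the three tool invocations could be assembled like this. The file names, the top module name, and the device and package flags are assumptions for illustration and have to be checked against the actual board; the function only builds the command lines and does not run anything.

```python
def toolchain_commands(top, sources, constraints,
                       device_flag="--um5g-85k", package="CABGA554"):
    """Build the command lines for synthesis, place and route, and packing.

    device_flag and package are assumed values for an ECP5 board and must
    match the FPGA that is actually on the board.
    """
    netlist = f"{top}.json"       # netlist written by Yosys
    text_config = f"{top}.config"  # routed design written by nextpnr
    bitstream = f"{top}.bit"       # final bitstream written by the packer

    synth = ["yosys", "-p", f"synth_ecp5 -top {top} -json {netlist}", *sources]
    pnr = ["nextpnr-ecp5", device_flag, "--package", package,
           "--json", netlist, "--lpf", constraints, "--textcfg", text_config]
    pack = ["ecppack", text_config, bitstream]
    return [synth, pnr, pack]


for cmd in toolchain_commands("top", ["top.v"], "board.lpf"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order, so that a failing step stops the build.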
[04:50.260 --> 04:59.260]  That was quick and I won't go deeper into this because, as I said, we were working on this for a while and had several talks on that.
[04:59.260 --> 05:03.260]  One by Stefan, you find it on YouTube.
[05:03.260 --> 05:08.260]  It's about those FPGA tools.
[05:08.260 --> 05:19.260]  There he goes deeper into how these tools work, how you can interact with them, how you can call them and how all of this stuff is working.
[05:19.260 --> 05:25.260]  And a second talk, on building open hardware with open software, by me, is on YouTube as well.
[05:25.260 --> 05:40.260]  And there I go into details on how we automated this and put the FPGA toolchain into a Yocto-based BSP, so that we have reproducibility of the bitstreams, of the tools that are used,
[05:40.260 --> 05:57.260]  our configuration and other providers of libraries that we are using, so we can run a specific checkout of the Git repository of our BSP and be sure that we get the same build as the last time.
[05:57.260 --> 06:14.260]  This is also in our CI now, so we have some CI running that takes the status of the bitstream and gives us a working bitstream again from the previous day.
[06:14.260 --> 06:16.260]  So what's in the bitstream?
[06:16.260 --> 06:22.260]  Usually, if you come as a Linux developer, you want to run Linux in your bitstream.
[06:22.260 --> 06:28.260]  That's where you take RISC-V, a soft-core CPU.
[06:28.260 --> 06:31.260]  Some of them are able to run Linux.
[06:31.260 --> 06:36.260]  Most of them can be synthesized for your FPGA.
[06:36.260 --> 06:40.260]  So you have quite a few of these.
[06:40.260 --> 06:53.260]  VexRiscv is one, implemented in SpinalHDL; BOOM, implemented in Chisel; Rocket, implemented in Chisel; or CVA6, implemented in SystemVerilog.
[06:53.260 --> 07:00.260]  I said previously you take code for your FPGA in some hardware description language.
[07:00.260 --> 07:02.260]  I said it's Verilog.
[07:02.260 --> 07:22.260]  The four cores are implemented in three different languages, so this is quite a few different languages, and each of these cores needs some peripherals to make it actually work, also implemented in the same language.
[07:22.260 --> 07:30.260]  So once you decide on a core, you have more or less decided which hardware description language and tools you are using.
[07:30.260 --> 07:34.260]  That's not something that's really flexible for us.
[07:34.260 --> 07:39.260]  And this is the point where LiteX comes in.
[07:39.260 --> 07:49.260]  LiteX is implemented in Migen, which is another hardware description language, which is based on Python.
[07:49.260 --> 08:07.260]  And there is Linux on LiteX, which gives you various pre-configured FPGA bitstreams that contain some RISC-V core and support various FPGA boards, which you can just run.
[08:07.260 --> 08:20.260]  So here you see an example from the LiteX Git repository. It shows an Acorn-based board with a multi-core Linux SoC.
[08:20.260 --> 08:28.260]  You have some access to DDR. You have some access to SATA and UART.
[08:28.260 --> 08:37.260]  So this is basically an SoC that's able to run Linux, with enough peripherals to actually make it work.
[08:37.260 --> 08:46.260]  So there is an example for that available.
[08:46.260 --> 09:01.260]  The question that arose was: okay, we have this example system, but we want to fiddle around with the FPGA and want to run our own bitstream in there, or at least be able to customize it to some point.
[09:01.260 --> 09:16.260]  So our starting question for this was: can we add our own custom cores that are written in Verilog into the bitstream that falls out of LiteX?
[09:16.260 --> 09:19.260]  For that we decided to come up with a demo system.
[09:19.260 --> 09:38.260]  So the requirements for that were: we are using a LambdaConcept ECPIX-5 board. You saw it before, with an ECP5 FPGA by Lattice, which is supported by Yosys and the entire toolchain.
[09:38.260 --> 09:41.260]  So support there is great.
[09:41.260 --> 09:50.260]  We want to put VexRiscv with Linux into the FPGA, to run Linux,
[09:50.260 --> 09:54.260]  and because it's already there: the demo systems have it.
[09:54.260 --> 10:01.260]  And we want to add to our system an LED ring because LEDs are flashy, LEDs attract people.
[10:01.260 --> 10:07.260]  So it's something nice and we want to have interaction with the system.
[10:07.260 --> 10:22.260]  That's why we put a hand wheel there as well, so that you, as a user, can give some input to the system and get some feedback from it.
[10:22.260 --> 10:30.260]  It didn't boot probably because it didn't have power.
[10:30.260 --> 10:37.260]  So, starting with VexRiscv with Linux: as I said, LiteX already supports this.
[10:37.260 --> 10:50.260]  You can go to the Linux on LiteX repository, look for this file, and you'll see an implementation for VexRiscv running on the Lattice-based ECPIX-5.
[10:50.260 --> 11:02.260]  VexRiscv is written in SpinalHDL. That's neither Migen nor Verilog.
[11:02.260 --> 11:15.260]  In LiteX, or in preparation for using it in LiteX, you generate the Verilog code for this VexRiscv core and wrap it in Python, or Migen.
[11:15.260 --> 11:26.260]  And after that, LiteX can just integrate the newly generated core into the SoC.
[11:26.260 --> 11:40.260]  It's an example target which supports LiteDRAM for memory access and LiteSDCard for using an SD card.
[11:40.260 --> 11:52.260]  The LED ring supports the WS2812 protocol, which is a single-wire protocol to control one LED or more than one LED.
[11:52.260 --> 12:02.260]  It's usually used in Adafruit LED strips, but you can find various cheap clones on Alibaba and wherever.
[12:02.260 --> 12:12.260]  There is already a core for controlling this protocol implemented in LiteX, so it's there, coded in Migen.
[12:12.260 --> 12:24.260]  It works as an I/O-mapped bus slave, so from Linux you just write to the registers and the LEDs flash and change their colors.
[12:24.260 --> 12:37.260]  And on the bus you have four bytes per LED so that's the three colors plus whatever.
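With four bytes per LED, the register for a given LED sits at a fixed stride from the base address. A minimal sketch of that addressing follows; the base address and the byte layout of the color word are assumptions for illustration, not taken from the LED core itself.

```python
LED_BASE = 0x2000_0000   # hypothetical base address of the LED core
BYTES_PER_LED = 4        # from the talk: four bytes per LED on the bus


def led_register_address(index: int, base: int = LED_BASE) -> int:
    """Return the bus address of the register for LED number `index`."""
    if index < 0:
        raise ValueError("LED index must not be negative")
    return base + index * BYTES_PER_LED


def pack_color(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit channels into one 32-bit word.

    The layout 0x00RRGGBB is an assumption; the top byte is left unused.
    """
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("color channels must be in 0..255")
    return (red << 16) | (green << 8) | blue


print(hex(led_register_address(3)), hex(pack_color(255, 0, 16)))
```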
[12:37.260 --> 12:46.260]  Input is done via a hand wheel which is from Amazon which is usually used for CNC stuff.
[12:46.260 --> 13:03.260]  So it has 100 steps and you can just turn it around, and it acts as a rotary encoder with two signals, from which you can find the direction of rotation.
[13:03.260 --> 13:07.260]  We took some code off the internet for that.
[13:07.260 --> 13:09.260]  It's implemented in Verilog.
[13:09.260 --> 13:12.260]  There are various examples for that.
[13:12.260 --> 13:26.260]  We wrapped it in Python so that we can use it in LiteX, and this one runs as a bus master and is able to control the LED core via this connection.
[13:26.260 --> 13:35.260]  So it just sends "write this color to this LED" on the bus.
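How the direction falls out of the two encoder signals can be sketched in software. This is a generic quadrature decoder in Python, not the Verilog core used in the demo; which rotation counts as positive is an assumption and depends on the wiring.

```python
# Valid transitions between the 2-bit states (a << 1) | b.
# One direction walks 0 -> 1 -> 3 -> 2 -> 0, the other walks it backwards.
_STEP = {
    (0, 1): +1, (1, 3): +1, (3, 2): +1, (2, 0): +1,
    (0, 2): -1, (2, 3): -1, (3, 1): -1, (1, 0): -1,
}


def decode(samples):
    """Return the net number of quadrature steps in a list of (a, b) samples.

    Repeated states and invalid jumps (both signals changing at once)
    are ignored and contribute zero.
    """
    position = 0
    previous = None
    for a, b in samples:
        state = (a << 1) | b
        if previous is not None:
            position += _STEP.get((previous, state), 0)
        previous = state
    return position


forward = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(decode(forward), decode(forward[::-1]))
```

The sign of the result gives the direction, and the magnitude could be used to pick which LED on the ring to light.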
[13:35.260 --> 13:44.260]  So if you put all of it together: in the middle we have the FPGA. On the right side,
[13:44.260 --> 13:50.260]  we have the already existing system with the VexRiscv running Linux.
[13:50.260 --> 13:53.260]  The LiteSDCard, which is connected to the SD card.
[13:53.260 --> 13:58.260]  The LiteDRAM, which is connected to our memory.
[13:58.260 --> 14:03.260]  All of these are put together by a Wishbone bus.
[14:03.260 --> 14:05.260]  Details are not important.
[14:05.260 --> 14:09.260]  It's a memory bus to communicate between cores.
[14:09.260 --> 14:19.260]  On the left-hand side, up here, you see the encoder, which is implemented in Verilog,
[14:19.260 --> 14:22.260]  which is connected to the hand wheel.
[14:22.260 --> 14:26.260]  So this is one thing we added to the bitstream.
[14:26.260 --> 14:43.260]  And we have the LED.py, which is Migen, for the LED controller, which is also added to the bitstream and controls the LED ring.
[14:43.260 --> 14:48.260]  So how is this integrated into LiteX?
[14:48.260 --> 14:51.260]  We created a new LiteX target.
[14:51.260 --> 14:55.260]  LiteX distinguishes between targets and platforms.
[14:55.260 --> 14:57.260]  Platforms are the actual boards.
[14:57.260 --> 15:02.260]  Targets are the SoCs that you synthesize into your FPGA.
[15:02.260 --> 15:10.260]  So we took the existing platform and added a new target for our own SoC.
[15:10.260 --> 15:13.260]  Because it's Python, we have inheritance.
[15:13.260 --> 15:23.260]  We just inherited from the example base SoC with the LiteX core and the VexRiscv core and the other stuff.
[15:23.260 --> 15:33.260]  We configured it, instantiated it, reconfigured the pins so that we can actually connect our own peripherals.
[15:33.260 --> 15:47.260]  So the hand wheel and the LEDs to the connector of the board, added the LED core and added the rotary encoder.
[15:47.260 --> 15:53.260]  In our case it's a decoder for interacting with the hand wheel and we are done.
[15:53.260 --> 16:02.260]  All of this customization is about 200 lines of Python code.
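The inheritance pattern described here can be sketched with plain Python stand-ins. The classes below are not LiteX classes and the method names are made up for illustration; they only show the shape of "inherit from the example SoC, then add two cores".

```python
class ExampleBaseSoC:
    """Stand-in for the example SoC of the existing target (not a LiteX class)."""

    def __init__(self):
        # Cores that the example system already brings along.
        self.cores = ["cpu", "dram", "sdcard"]

    def add_core(self, name):
        """Register one more core in the SoC (hypothetical helper)."""
        if name in self.cores:
            raise ValueError(f"core {name!r} already present")
        self.cores.append(name)


class DemoSoC(ExampleBaseSoC):
    """Custom target: the example SoC plus LED ring and hand wheel decoder."""

    def __init__(self, num_leds=12):
        super().__init__()          # configure and instantiate the base SoC
        self.num_leds = num_leds    # assumed parameter of the LED ring
        self.add_core("led_ring")
        self.add_core("handwheel_decoder")


print(DemoSoC().cores)
```

Everything the base class sets up is reused unchanged, which is why the customization stays small.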
[16:02.260 --> 16:11.260]  So, what we encountered and fixed during this process: after we added our custom cores,
[16:11.260 --> 16:17.260]  suddenly Linux was booting, but it was not able to access the SD card anymore.
[16:17.260 --> 16:27.260]  We suspected various problems, like timing, or that our bitstream generation had failed, or something.
[16:27.260 --> 16:37.260]  In the end it turned out that, due to adding our new custom cores, the memory map on the bus had changed
[16:37.260 --> 16:44.260]  and the SD card controller had changed its base address, and Linux didn't know about that
[16:44.260 --> 16:50.260]  because we were just using the device tree that we took from the example.
[16:50.260 --> 16:59.260]  So we changed our Yocto build to take the configuration for the Linux core,
[16:59.260 --> 17:06.260]  generate a device tree from that and give it to Linux, and that way Linux
[17:06.260 --> 17:15.260]  uses the correct base address and is able to use the SD card again.
[17:15.260 --> 17:22.260]  So that's something that will come up later as well, because if you reconfigure your base system
[17:22.260 --> 17:30.260]  and the device tree changes, you have to make sure that your device tree matches the system that you're using.
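The effect can be reproduced with a toy model: if base addresses are assigned automatically, adding a core can move an existing one, and a device tree node written for the old address no longer matches. The allocator, the addresses and the node layout below are assumptions for illustration, not how LiteX actually assigns addresses.

```python
def allocate_bases(cores, start=0xF000_0000, stride=0x800):
    """Toy allocator: hand out base addresses in sorted name order."""
    return {name: start + i * stride for i, name in enumerate(sorted(cores))}


def dts_node(name, base, size):
    """Render a minimal device tree node for one memory-mapped core."""
    return f"{name}@{base:x} {{\n\treg = <0x{base:x} 0x{size:x}>;\n}};"


before = allocate_bases(["sdcard", "uart"])
after = allocate_bases(["sdcard", "uart", "leds"])

# The SD card core moved, so a node generated from `before` would be stale.
print(hex(before["sdcard"]), "->", hex(after["sdcard"]))
print(dts_node("sdcard", after["sdcard"], 0x800))
```

Generating the node from the same map that the bitstream was built with keeps the two in sync.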
[17:30.260 --> 17:41.260]  And one other thing: there is some bootloader in the FPGA, or in the bitstream, for early bring-up,
[17:41.260 --> 17:47.260]  which corresponds to the ROM code or the ROM bootloader that Marco mentioned previously.
[17:47.260 --> 17:55.260]  So it's in the bitstream and it's usually required. So we compiled the bootloader,
[17:55.260 --> 18:01.260]  and then it started to resynthesize the bitstream and run place and route again.
[18:01.260 --> 18:07.260]  So this is quite fast, but it's still, in our case, six to eight minutes.
[18:07.260 --> 18:13.260]  That's not something you want to have if you're just compiling a really small binary.
[18:13.260 --> 18:24.260]  So we changed our Yocto build to synthesize and keep enough space in the ROM area,
[18:24.260 --> 18:29.260]  compile the ROM bootloader, and just put it together afterwards.
[18:29.260 --> 18:37.260]  This works a lot faster if you're fiddling around there.
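One small part of that approach can be sketched: the compiled bootloader has to fit into the space that was reserved at synthesis time, and gets padded to exactly that size. The function name, the sizes and the fill value are assumptions; how the padded image is then merged into the bitstream is tool-specific and not shown here.

```python
def pad_rom_image(code: bytes, reserved_size: int, fill: int = 0x00) -> bytes:
    """Pad a bootloader binary to the ROM size reserved in the bitstream.

    Raises ValueError if the binary does not fit, because in that case the
    bitstream would have to be synthesized again with a larger ROM.
    """
    if len(code) > reserved_size:
        raise ValueError(
            f"bootloader is {len(code)} bytes, only {reserved_size} reserved"
        )
    return code + bytes([fill]) * (reserved_size - len(code))


image = pad_rom_image(b"\x13\x00\x00\x00", reserved_size=16)
print(len(image), image.hex())
```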
[18:37.260 --> 18:40.260]  Pain points.
[18:40.260 --> 18:43.260]  Migen is not really maintained anymore.
[18:43.260 --> 18:45.260]  There is a successor for it.
[18:45.260 --> 18:49.260]  It's Amaranth.
[18:49.260 --> 18:57.260]  LiteX is currently still using Migen; migrating it to Amaranth is not really feasible.
[18:57.260 --> 19:03.260]  There are ideas to change it to a new hardware description language.
[19:03.260 --> 19:08.260]  I'm not sure what's happening there, but we saw, especially for simulation,
[19:08.260 --> 19:16.260]  that Migen generates invalid Verilog code, or code you wouldn't want to simulate.
[19:16.260 --> 19:18.260]  That's something.
[19:18.260 --> 19:21.260]  Currently we have to live with it.
[19:21.260 --> 19:29.260]  Then another observation was that the Yocto environment wasn't as reproducible as we expected.
[19:29.260 --> 19:32.260]  We're not sure what's the cause for that.
[19:32.260 --> 19:42.260]  Maybe we are missing somewhere that there is some seed for compiling or for place and route which we haven't fixed.
[19:42.260 --> 19:53.260]  We saw that there is sometimes a failing bitstream falling out of the Yocto build, which should be reproducible.
[19:53.260 --> 20:01.260]  We wanted to use JTAG to debug the early boot of the VexRiscv and look into it.
[20:01.260 --> 20:08.260]  We are already flashing our bitstream via JTAG onto the FPGA.
[20:08.260 --> 20:18.260]  We have JTAG connected to the board, but we are not able to connect it to the VexRiscv JTAG connector
[20:18.260 --> 20:23.260]  so that we could add the VexRiscv core to our JTAG chain.
[20:23.260 --> 20:26.260]  That's something we can work around.
[20:26.260 --> 20:31.260]  We can just use different pins but we have to use a second JTAG for that.
[20:31.260 --> 20:40.260]  That's something that's not that great and we haven't figured out yet.
[20:40.260 --> 20:50.260]  Coming to the conclusion: adding and customizing LiteX targets is really convenient.
[20:50.260 --> 20:52.260]  It's something you have to figure out.
[20:52.260 --> 20:59.260]  You have to work into it.
[20:59.260 --> 21:06.260]  Once you're in there, I said it's 200 lines for configuring.
[21:06.260 --> 21:09.260]  That is nice.
[21:09.260 --> 21:16.260]  The step from Blinky, just getting an LED flashing, to an SoC is really large.
[21:16.260 --> 21:20.260]  It's not really surprising, because it's much more.
[21:20.260 --> 21:29.260]  But also in terms of things that you have to configure, things where you misconfigured something and it's just not working:
[21:29.260 --> 21:38.260]  you have really many knobs that you can turn, and your system will surprise you.
[21:38.260 --> 21:44.260]  With all this toolchain, LiteX: there are different modules in LiteX.
[21:44.260 --> 21:46.260]  All of them have to be in sync.
[21:46.260 --> 21:53.260]  Sometimes it happens that if you just update one component, you will run into surprises.
[21:53.260 --> 21:59.260]  It's really important to make sure that you have something around this entire tool chain
[21:59.260 --> 22:03.260]  that keeps all your tools fixed on a specific version, or
[22:03.260 --> 22:11.260]  that you know which version you are using, also for reporting bugs.
[22:11.260 --> 22:18.260]  That's what Yocto is doing for us, or is the plan to have.
[22:18.260 --> 22:25.260]  The next steps are: maybe we want to run KernelCI on the Linux on LiteX system
[22:25.260 --> 22:30.260]  so that we can just take a Linux kernel and run it against the system.
[22:30.260 --> 22:36.260]  There comes the problem that KernelCI expects a device tree that's upstream,
[22:36.260 --> 22:43.260]  which kind of conflicts with the generation of the device tree that has to match your target.
[22:43.260 --> 22:50.260]  Then we see that Linux takes ages to boot on the VexRiscv, about two minutes.
[22:50.260 --> 22:55.260]  Not sure if that's because the core is just slow
[22:55.260 --> 23:04.260]  or if there is something we are waiting for in user space that can be fixed.
[23:04.260 --> 23:07.260]  The VexRiscv actually supports multi-core systems.
[23:07.260 --> 23:11.260]  We weren't able to boot this yet.
[23:11.260 --> 23:20.260]  Maybe we want to look into using the same system on the LambdaConcept with different RISC-V cores.
[23:20.260 --> 23:23.260]  As I said before, there are four different cores in different languages.
[23:23.260 --> 23:26.260]  All of them usually generate Verilog code,
[23:26.260 --> 23:33.260]  so it should be possible to integrate them into LiteX.
[23:33.260 --> 23:36.260]  Show me the source.
[23:36.260 --> 23:38.260]  It's on GitHub.
[23:38.260 --> 23:47.260]  This is a Yocto meta layer where you can find the code for the editing
[23:47.260 --> 23:52.260]  or the entire SOC configuration, the code for the hand wheel
[23:52.260 --> 23:55.260]  and a few other fixes that we did.
[23:55.260 --> 24:06.260]  You can add this to a Yocto workspace and should be able to reproduce the bitstream that we built.
[24:06.260 --> 24:09.260]  Thank you for your attention.
[24:09.260 --> 24:11.260]  That's me. That's my colleague, Stefan.
[24:11.260 --> 24:15.260]  You can send us an email if you have questions
[24:15.260 --> 24:19.260]  or just ask me and find me somewhere.
[24:19.260 --> 24:28.260]  Thank you.
[24:28.260 --> 24:31.260]  We have time for literally one question.
[24:31.260 --> 24:39.260]  This person here was the first one.
[24:39.260 --> 24:43.260]  Thank you for your talk.
[24:43.260 --> 24:49.260]  I had one question about the address changing in the device tree.
[24:49.260 --> 24:55.260]  In microcontroller world, you can have a lot of microcontrollers
[24:55.260 --> 25:00.260]  with the same addresses for the common devices like for STM32.
[25:00.260 --> 25:15.260]  Do all the STM32 have the same base address for RAM?