[00:00.000 --> 00:12.000]  All right. We are ready. We fixed it. We broke again and we fixed it again or I didn't do anything. The green shirts did it.
[00:12.000 --> 00:16.000]  All right, so next we have Francisco Iglesias. Now we're going to start building.
[00:16.000 --> 00:17.000]  Yes, yes.
[00:17.000 --> 00:27.000]  And to me at least, I'm just going to rant, but to me because it's always interesting to see how emulation is used in the enterprise, in the, you know, people's money world.
[00:28.000 --> 00:33.000]  Or not, or not. Let's see how it goes. All right.
[00:33.000 --> 00:51.000]  Okay. Hi everybody and welcome to this presentation. My name is Francisco Denon. I work at AMD with QMU Development and System C Development.
[00:51.000 --> 01:16.000]  Okay. So I'll try. I have a little threat problem, but okay. So and today I will be speaking about our open source cold simulation solution and the agenda of the talk then.
[01:16.000 --> 01:34.000]  It is, first, I will give a short introduction into what cold simulation is. And thereafter, I will be speaking a little about the AMD silence QMU itself and proceed with introducing live system C.
[01:34.000 --> 01:57.000]  Tell them, as you see, and since the repository system C, tell them closing demo. And lastly, I will show a short demo where QMU is co-simulating with a couple of RTL memories and using the infrastructure live system C.
[01:57.000 --> 02:02.000]  Tell them, as you see. Sorry, can you speak up a little bit more? Even more.
[02:02.000 --> 02:23.000]  Yeah. So in this slide, I tried to capture the one of the trade-offs that is done when you choose simulation technique for your RTL and it is the trade-off between speed and design capacity visibility.
[02:23.000 --> 02:36.000]  And we see that the three techniques that is used for RTL development, RTL simulation or simulation, FPGA prototyping, they all come with a different cost on the simulation speed.
[02:36.000 --> 02:51.000]  And on the left side here also, we have the virtual platforms that are fast and great for software development, but they do not help with pure RTL debugging or development.
[02:51.000 --> 03:19.000]  So an approach that can be used here to try to leverage from the two worlds here is to place a portion of interest in the portion of the RTL on one of the RTL simulation techniques and then keep the rest of the system modeled in one of the virtual platforms.
[03:20.000 --> 03:38.000]  And this way you will then keep most of the system simulated at a quite fast speed while still keeping the visibility to this portion of RTL that is in focus.
[03:39.000 --> 03:43.000]  So this is what we mean with co-simulation that you are mixing these two worlds.
[03:49.000 --> 04:10.000]  In our open source co-sim solution, we have the SILINX QMU where we model the processing systems of the FPGAs and then we have system C that we use for modeling the programmable logic.
[04:11.000 --> 04:28.000]  And LIM system C, the LMSC, it has bridges that allows us to connect the system C models of RTL and also FPGA prototypes and the hardware emulators.
[04:29.000 --> 04:32.000]  I will be speaking more about the bridges shortly.
[04:33.000 --> 04:43.000]  But first, a little about the AMD SILINX QMU fork.
[04:44.000 --> 04:55.000]  So this is where we have our improved support and modeling for the SILINX platforms then.
[04:55.000 --> 05:10.000]  And today it is based on the mainline QMU version 7.1.0 and we upgrade it around once a year to a more recent mainline version.
[05:11.000 --> 05:16.000]  And the AMD SILINX QMU then has some extra functionality.
[05:17.000 --> 05:24.000]  One of these is that it can create machines through a hardware DTB.
[05:25.000 --> 05:31.000]  And this allows us for having a more flexible machine creation and modification process.
[05:33.000 --> 05:39.000]  And the AMD SILINX QMU also has an implementation of the remote port protocol.
[05:40.000 --> 05:54.000]  This protocol is the protocol that is used when we co-simulate both different QMU architectures and also when we co-simulate with system C.
[05:54.000 --> 06:09.000]  This is an overview of this where we see an AR64 QMU co-simulating with a microblaze QMU and also with a system C application on the side.
[06:10.000 --> 06:27.000]  Continuing with the LibSystem C. This is a project that was started by Edgar Iglesias in 2016 and the license is MIT.
[06:28.000 --> 06:41.000]  One of the core features is that it has the remote port protocol implementation in system C that is then used for connecting with QMU and co-simulating with QMU.
[06:42.000 --> 06:47.000]  And going together with this, it also has system C wrappers, what we call.
[06:48.000 --> 06:54.000]  These are for wrappers for our SYNX in Campyversal, Rosonetten.
[06:54.000 --> 07:05.000]  And the short description of a wrapper is that it wraps QMU into a system C module so that for the rest of the system C application,
[07:06.000 --> 07:18.000]  the interaction from the other modules with QMU is done through the standard system C interfaces as TLM and signals, etc.
[07:19.000 --> 07:39.000]  The library also has TLM bridges into AXE4, AXE3, AXE4 Lite, EPBAs, A-slite, CHI, CXS, TLP, XDMII.
[07:39.000 --> 07:51.000]  And a bridge converts the communication from the TLM site into the protocol-specific site.
[07:51.000 --> 07:58.000]  So here's an example of the TLM to AXE bridge, which translates TLM into AXE.
[07:59.000 --> 08:18.000]  And these bridges then is what allows us to co-simulate, for example, in this case an AXE, DUT, that has been generated from RTL.
[08:19.000 --> 08:30.000]  So we see here that the system C wrapper communicates through TLM to the bridge that then converts this TLM to AXE signaling.
[08:30.000 --> 08:37.000]  And communicates through this AXE signaling with the AXE DUT then.
[08:38.000 --> 08:46.000]  And this is how QMU on the left-hand side then can access the DUT.
[08:52.000 --> 09:00.000]  There are also RTL bridges in the library for AXE4, 3, AXE4 Lite, AXE4, CHI and CXS.
[09:01.000 --> 09:05.000]  And the RTL bridges have two components.
[09:05.000 --> 09:13.000]  The first one is the bridge itself that is placed on the FPGA or in a Harvard emulator.
[09:13.000 --> 09:21.000]  And the other component is the driver of the bridge that is placed on the system C application software side.
[09:21.000 --> 09:36.000]  So the way it goes is that TLM transaction enters the driver, which then configures the RTL bridge to replicate this transaction as an AXE transaction, for example, inside the FPGA or the Harvard emulator.
[09:37.000 --> 10:02.000]  And this is an example of when these bridges are used with an Albeo U250 card, where we have between the bridge and the bridge driver and the bridge, we have some infrastructure there.
[10:03.000 --> 10:07.000]  The fire PCIe next year made them.
[10:07.000 --> 10:20.000]  And one can see these components as a transport channel where the driver accesses go through towards the RTL bridge.
[10:21.000 --> 10:30.000]  And looking at how it looks inside a hardware emulator is very similar.
[10:30.000 --> 10:41.000]  But instead of PCIe and here the vendor bridges are used for this transport.
[10:42.000 --> 11:00.000]  In the library we also have protocol checkers for AXE 4, AXE 3, AXE 4 Lite and AXE Lite CHI.
[11:01.000 --> 11:16.000]  And the protocol checkers, they are connected to the signals and monitors the signals and try to find issues, violations to the protocols then.
[11:17.000 --> 11:30.000]  Also in the library we have modules that can be used for generating AXE traffic.
[11:30.000 --> 11:35.000]  So we have AXE, AXE LiteMasters and AXE Interconnect.
[11:35.000 --> 11:54.000]  So the masters here, they generate ace transactions towards the interconnect and the interconnect will then when required snoop the other masters and otherwise forward the transaction to the TLM memory at the bottom.
[11:54.000 --> 12:18.000]  We have a similar setup for CHI where we have request nodes that generate CHI traffic and a CHI interconnect that does snoopy when required or forwards the request to a slave node at the bottom.
[12:25.000 --> 12:33.000]  Also in the library we have a tool called PySimGen that can generate simulations from IP exact descriptions.
[12:33.000 --> 12:46.000]  And there's a basic TLM traffic generator that one can configure to generate randomized traffic or provide a description of transactions to issue.
[12:46.000 --> 12:55.000]  And there are some simple, easy co-simulation examples that one can have a look at as a starting point.
[12:55.000 --> 13:04.000]  There's a lot of documentation for all the components and we also have an extensive test suite.
[13:05.000 --> 13:21.000]  The system seat TLM CoSIM demo is also a project that was started by Edgar Iglesias in 2016 and the license sense is MIT.
[13:21.000 --> 13:50.000]  And this contains several QMU co-simulation demos where we co-simulate the SyncMP QMU and VERSAL QMU with PL model on the system seat side and there's also a risk five demo where a risk five QMU is co-simulating with an open source.
[13:51.000 --> 13:56.000]  Internet controller core on the system seat side.
[13:56.000 --> 14:04.000]  We have several X86 QMU that co-simulate with PCIe endpoint models on the system seat side.
[14:04.000 --> 14:17.000]  And there is also a PySimGen demo where the system seat side of the co-simulation has been completely generated by from IP exact.
[14:18.000 --> 14:28.000]  And these demos they serve, they demonstrate how to embed the live system seat library in an own project and how to use it.
[14:28.000 --> 14:56.000]  So for the demo that I'll show now, it is a, here I will be launching a Linux system on the SyncMP QMU and it will be co-simulating with a system seat app where that includes a couple of RTL memories.
[14:56.000 --> 15:08.000]  One of the RTL memories is XC4 interface and the second one has a XC4 light interface.
[15:08.000 --> 15:23.000]  On the XC4 light signals there's a protocol checker connected and I also modified the XC4 light memory here and I injected that error so that we can see that the protocol checker finds this.
[15:23.000 --> 15:27.000]  So let's see then.
[15:54.000 --> 16:13.000]  So we see here that on this left terminal this is where QMU is being launched and on the yellow terminal on the top is where the system seat application has been launched.
[16:14.000 --> 16:38.000]  And we will start by doing some accesses to the XC4 memory and thereafter here comes the accesses for the XC4 memory and then thereafter we will do an access towards the XC4 light memory that has an error in it.
[16:38.000 --> 16:46.000]  And here we see that the protocol checker found the error and outputted some description message.
[16:46.000 --> 17:04.000]  After the simulation you get a trace that we can inspect and we can see here, follow the access signals and look at the transactions just issued.
[17:04.000 --> 17:20.000]  See that it is the expected data that we're seeing in here and you can see those at the bottom here that these are the data that we were writing to the memory.
[17:20.000 --> 17:26.000]  Then the protocol checker's error is also connected to a signal in this case.
[17:26.000 --> 17:37.000]  So for the transaction that failed it can be found when this signal has been asserted.
[17:37.000 --> 17:51.000]  So this is seen at the bottom here where there is the asserted signal and then we can look into the transaction here and find the problem.
[17:56.000 --> 18:07.000]  And that is all what I have today.
[18:07.000 --> 18:10.000]  Thank you for listening.
[18:26.000 --> 18:29.000]  That's a dumb question which I'm known for.
[18:29.000 --> 18:39.000]  No, so because like I said at the beginning I'm very interested in how this works in enterprises and I'm curious how do you guys like decide a feature to be implemented?
[18:39.000 --> 18:41.000]  How do you plan that kind of stuff?
[18:41.000 --> 18:47.000]  Do you know how that works in the community or if you're in your basement?
[18:47.000 --> 18:52.000]  Do you mean like in QMU or in the system C or overall?
[18:53.000 --> 18:57.000]  So it ends up with me.
[18:57.000 --> 18:58.000]  Yes.
[18:58.000 --> 19:01.000]  So how do we decide the features that we implement?
[19:01.000 --> 19:08.000]  And it's actually the demand that drives this.
[19:08.000 --> 19:16.000]  So if we see that some team internally at AMD siblings needs a feature in QMU then we implement it.
[19:16.000 --> 19:27.000]  Or if we see if there's a feature that might be useful later forward going forward.
[19:27.000 --> 19:33.000]  Not right now but perhaps in a year or so that also then we will consider implementing it too.
[19:33.000 --> 19:47.000]  So and often it ends up that our demands are pretty similar to all other developer or all other demands.
[19:47.000 --> 19:52.000]  So if we do a feature, implement a feature, it often becomes useful for others as well.
[19:52.000 --> 19:56.000]  Not only for the silencs, AMD silencs in part.
[19:56.000 --> 20:02.000]  A small follow-up. You guys probably do Agile like the rest of the world.
[20:02.000 --> 20:08.000]  I'm curious like how do you guys refine the story like this in Agile service?
[20:08.000 --> 20:11.000]  And I'm very sorry.
[20:11.000 --> 20:16.000]  Okay, how do we use Agile development in this?
[20:16.000 --> 20:21.000]  I don't care about Agile. I really care about the refinements. I don't like Agile actually.
[20:21.000 --> 20:26.000]  Like how do you guys brainstorm together on a feature? What do you put on paper?
[20:26.000 --> 20:30.000]  Like it needs to be this but how do we do this?
[20:30.000 --> 20:37.000]  Because it's not always comparable to something that already exists with emulators.
[20:37.000 --> 20:41.000]  It's usually something that's never been done before.
[20:41.000 --> 20:45.000]  I'm really sorry about this question.
[20:45.000 --> 20:50.000]  I know it's a very good question and I have to admit it.
[20:50.000 --> 20:55.000]  I'm not sure if we have such a process that we're probably looking at here.
[20:55.000 --> 21:03.000]  We get a request in our group, implement. We need this feature from, for example, one of the RTL groups.
[21:03.000 --> 21:08.000]  They need a feature, they ask us and we implement it.
[21:08.000 --> 21:14.000]  So we don't have really a process where we kind of do this very Agile in that sense.
[21:14.000 --> 21:23.000]  This is our team. It might be different in other teams at AMD.
[21:23.000 --> 21:29.000]  So Chris, how do you get the system C model from Verilog?
[21:29.000 --> 21:38.000]  And does that also work for co-gen generated IP which might be implemented?
[21:38.000 --> 21:43.000]  So how do we get the system C model from Verilog?
[21:43.000 --> 21:51.000]  So there's an open source tool named Verilator that will Verilog and create the module for you.
[21:51.000 --> 21:57.000]  But it's not going to work for the co-gen generated IP which is encrypted and which Verilator cannot process.
[21:57.000 --> 22:02.000]  For that I'm not sure how to do that. Sorry for that.
[22:02.000 --> 22:07.000]  There is no free line.
[22:07.000 --> 22:26.000]  I don't have to speak on that because I have to admit that I'm mostly on the QMU development side.
[22:26.000 --> 22:35.000]  But if you ping me afterwards I can take your card and see if I can contact give you a correct contact or something.
[22:35.000 --> 22:40.000]  Is there something for VHDL as well?
[22:40.000 --> 22:44.000]  I think there are tools that do this.
[22:44.000 --> 22:54.000]  But if there is a tool that automatically generates a system C model from VHDL, there are tools apparently.
[22:54.000 --> 23:00.000]  I'm pretty sure there are too. But we have not used them.
[23:00.000 --> 23:08.000]  Are you limiting yourself to the synthesizable subset of system C or do you don't care?
[23:08.000 --> 23:21.000]  No, we don't limit ourselves to system C now.
[23:21.000 --> 23:26.000]  I'm coming from the world of open source software-defined radio.
[23:26.000 --> 23:30.000]  I have flow graphs where I have data processing blocks that are running in software.
[23:30.000 --> 23:35.000]  On an mpsox R64 core.
[23:35.000 --> 23:44.000]  What I want to do is I want to take a block and implement it in some RTL and get it to run on the fpga part.
[23:44.000 --> 23:50.000]  How does that work? I have some part of software that I want to be accelerated by an fpga accelerator.
[23:51.000 --> 23:54.000]  These tools you mean?
[23:54.000 --> 23:56.000]  Yes.
[23:56.000 --> 24:01.000]  In that case you could...
[24:01.000 --> 24:06.000]  Yes, how...
[24:06.000 --> 24:13.000]  Random acceleration implementation of software.
[24:13.000 --> 24:22.000]  How do I go from software acceleration to hardware implementation?
[24:22.000 --> 24:25.000]  I know how to write.
[24:25.000 --> 24:27.000]  Yes, yes.
[24:27.000 --> 24:34.000]  I have to admit that I myself am not an expert hardware engineer.
[24:34.000 --> 24:40.000]  I think that the way I would have done it is just to go ahead and create the world of code.
[24:40.000 --> 24:48.000]  With this tool it's very sweet because you can connect it to the QMU system.
[24:48.000 --> 24:52.000]  Just as a library and say, okay, I have this XE stream.
[24:52.000 --> 24:54.000]  Yes.
[24:54.000 --> 24:57.000]  Put it in there and I call C functions in the end, right?
[24:57.000 --> 25:02.000]  You can launch your real software in QMU that interacts with it.
[25:03.000 --> 25:07.000]  How do I exchange data with the library?
[25:07.000 --> 25:08.000]  What's the interfaces?
[25:08.000 --> 25:11.000]  I see internally it's here and it's called the system C, right?
[25:11.000 --> 25:12.000]  Yes, yes.
[25:12.000 --> 25:13.000]  You don't have to choose that.
[25:13.000 --> 25:16.000]  But what's on the surface? How do I get data in and out?
[25:16.000 --> 25:19.000]  How do you get data in and out, the simulators?
[25:19.000 --> 25:21.000]  Yeah.
[25:21.000 --> 25:29.000]  Perhaps I would have needed a better overview picture, but if you can get...
[25:29.000 --> 25:32.000]  How you get data in into your system C application.
[25:32.000 --> 25:35.000]  That's...
[25:35.000 --> 25:39.000]  We don't have any magic frills, but...
[25:49.000 --> 25:58.000]  So the remote port protocol is just a protocol that transfers...
[25:58.000 --> 26:04.000]  Transactions from QMU into the system C side or to another QMU.
[26:04.000 --> 26:09.000]  And so it's not really a way to...
[26:09.000 --> 26:14.000]  That will allow you to load in a bunch of data into the system C application.
[26:14.000 --> 26:16.000]  But...
[26:28.000 --> 26:31.000]  Any more questions?
[26:31.000 --> 26:34.000]  Did I answer your question?
[26:34.000 --> 26:38.000]  Yes, I think afterwards and I can...
[26:38.000 --> 26:41.000]  Okay, we don't have time.
[26:41.000 --> 26:42.000]  Thank you very much.
[26:42.000 --> 26:44.000]  Thank you.