[00:00.000 --> 00:29.960] Okay, ready for our next talk? [00:29.960 --> 00:31.960] The next talk is by Maryam, and she's going to [00:31.960 --> 00:34.960] give us a hybrid networking stack demo. [00:34.960 --> 00:35.960] Thank you. [00:35.960 --> 00:36.960] Hi, everyone. [00:36.960 --> 00:37.960] My name is Maryam Tahhan. [00:37.960 --> 00:39.960] I'm a software engineer at Red Hat. [00:39.960 --> 00:41.960] And today I'm going to talk to you about a concept [00:41.960 --> 00:44.960] I've been researching, which I've coined hybrid networking stacks. [00:44.960 --> 00:46.960] So if anybody has better names as well, [00:46.960 --> 00:48.960] I'm open to suggestions. [00:48.960 --> 00:52.960] So what I'm going to do is I'm actually going to introduce [00:52.960 --> 00:54.960] what a hybrid networking stack is. [00:54.960 --> 00:58.960] We're going to talk a little bit about an open source project [00:58.960 --> 01:00.960] called Cloud Native Data Plane, or CNDP, [01:00.960 --> 01:03.960] that gives us an example of such a networking stack, [01:03.960 --> 01:05.960] or at least some components of it. [01:05.960 --> 01:07.960] We're going to have a live demo with a star there, [01:07.960 --> 01:09.960] because we're going to cross our fingers and toes [01:09.960 --> 01:11.960] and pray that it all goes to plan. [01:11.960 --> 01:14.960] After that, I will try and sum up what we discussed, [01:14.960 --> 01:19.960] and hopefully there will be some time for Q&A at the end. [01:19.960 --> 01:22.960] Okay, so what is a hybrid networking stack? [01:22.960 --> 01:25.960] Well, it's actually a networking stack for applications [01:25.960 --> 01:29.960] that want to take advantage of the XDP hook, and AF_XDP in particular, [01:29.960 --> 01:32.960] without having to reimplement the full networking stack [01:32.960 --> 01:37.960] in user space, but rather lean on the existing Linux stack. [01:37.960 --> 01:41.960] It relies very heavily on the concept of control plane [01:41.960 --> 01:43.960] and user plane separation. [01:43.960 --> 01:48.960] So parts of the stack can run in user space [01:48.960 --> 01:50.960] and other parts of the stack can run in the kernel. [01:50.960 --> 01:52.960] And even for the parts of the control plane, [01:52.960 --> 01:55.960] they can run either in the kernel or in user space, [01:55.960 --> 01:57.960] and the same goes for the user plane aspect: [01:57.960 --> 01:59.960] you can run things either in the kernel [01:59.960 --> 02:04.960] or in user space as part of that networking stack concept. [02:04.960 --> 02:08.960] This concept relies very heavily on the principle [02:08.960 --> 02:13.960] of classifying traffic into application-specific traffic [02:13.960 --> 02:15.960] and non-application-specific traffic. [02:15.960 --> 02:18.960] Application-specific traffic is redirected [02:18.960 --> 02:21.960] to the user plane, and non-application-specific traffic [02:21.960 --> 02:24.960] is redirected to the control plane to be handled. [02:24.960 --> 02:26.960] So in that way, applications only really need to process [02:26.960 --> 02:29.960] the types of traffic that they're interested in. [02:29.960 --> 02:34.960] And what's really important then is that you filter [02:34.960 --> 02:36.960] this type of traffic as early as possible [02:36.960 --> 02:38.960] in your networking stack. [02:38.960 --> 02:40.960] So if your NIC hardware supports that filtering, [02:40.960 --> 02:42.960] you can take advantage of that.
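As a concrete illustration of this early classification, here is a minimal sketch of an XDP program that splits traffic the way described: UDP frames are redirected to an AF_XDP socket registered in an XSKMAP (the user plane), and everything else is passed up to the kernel stack (the control plane). The map name, its size, and the UDP-only match are illustrative assumptions, not the program used later in the demo.

```c
// SPDX-License-Identifier: GPL-2.0
/* Illustrative XDP classifier (not the demo's actual program):
 * UDP -> AF_XDP socket via XSKMAP (user plane), everything else -> kernel
 * stack (control plane). Map name and size are assumptions. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int classify(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;                /* non-IP -> control plane */

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;

	if (iph->protocol == IPPROTO_UDP)       /* application traffic */
		return bpf_redirect_map(&xsks_map, ctx->rx_queue_index,
					XDP_PASS /* fallback action */);

	return XDP_PASS;                        /* ICMP, TCP, ... -> kernel */
}

char LICENSE[] SEC("license") = "GPL";
```

This is the same kind of split the demo relies on later, where ICMP stays on the kernel path while the iperf UDP flows show up in the CNET graph statistics.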
[02:42.960 --> 02:45.960] If it doesn't, then you can always rely on [02:45.960 --> 02:48.960] eBPF at the XDP hook to be able to do [02:48.960 --> 02:50.960] that level of filtering for you. [02:50.960 --> 02:55.960] So in the example I'm showing here on the slide, [02:55.960 --> 03:00.960] you can probably consider FRR and the Linux networking stack [03:00.960 --> 03:01.960] the control plane. [03:01.960 --> 03:04.960] FRR is just an open-source routing protocol suite [03:04.960 --> 03:06.960] for Linux. [03:06.960 --> 03:09.960] And then on the user plane side, [03:09.960 --> 03:14.960] you would consider the CNET graph from CNDP [03:14.960 --> 03:19.960] your data plane, or user plane, for this demo. [03:19.960 --> 03:24.960] The CNET stack that comes with CNDP, [03:24.960 --> 03:26.960] and I'll just talk about it for a minute before we dive into [03:26.960 --> 03:32.960] the next topic, is based on the graph architecture from VPP. [03:32.960 --> 03:36.960] So with VPP, the concept was that you could build [03:36.960 --> 03:38.960] your whole application, or the parts of the stack [03:38.960 --> 03:41.960] that you want to leverage, using a graph. [03:41.960 --> 03:43.960] And then your packets are processed by traversing [03:43.960 --> 03:45.960] each node in this graph. [03:45.960 --> 03:47.960] And they're processed in batches as well to keep [03:47.960 --> 03:49.960] your instruction cache relatively warm, [03:49.960 --> 03:51.960] and you get all the performance benefits from doing [03:51.960 --> 03:53.960] all of that good stuff. [03:53.960 --> 04:00.960] So the CNET stack is based on the exact same concept as that. [04:00.960 --> 04:05.960] And obviously, as your packets traverse the nodes, [04:05.960 --> 04:08.960] they're either terminated as part of that stack, [04:08.960 --> 04:10.960] forwarded on, or dropped, [04:10.960 --> 04:14.960] depending on the decision that was previously determined [04:14.960 --> 04:18.960] by the control plane piece for your application. [04:18.960 --> 04:22.960] So let me introduce CNDP to you folks. [04:22.960 --> 04:25.960] CNDP, or Cloud Native Data Plane, [04:25.960 --> 04:29.960] is an open source framework for cloud native packet [04:29.960 --> 04:31.960] processing applications. [04:31.960 --> 04:34.960] It's actually built on the performance principles of VPP [04:34.960 --> 04:37.960] and DPDK, but it doesn't have any of the resource [04:37.960 --> 04:40.960] demands or constraints, as it's completely abstracted [04:40.960 --> 04:43.960] from the underlying infrastructure. [04:43.960 --> 04:45.960] It's also written completely [04:45.960 --> 04:47.960] using standard Linux libraries. [04:47.960 --> 04:51.960] So what CNDP gives you is really three things. [04:51.960 --> 04:55.960] The first thing it gives you is a set of user space libraries [04:55.960 --> 04:58.960] for accelerating packet processing for your [04:58.960 --> 05:00.960] cloud application or service. [05:00.960 --> 05:05.960] The second thing that CNDP gives you is the CNET graph [05:05.960 --> 05:07.960] as part of the hybrid networking stack, [05:07.960 --> 05:11.960] and also a Netlink agent that's capable of communicating [05:11.960 --> 05:13.960] with the kernel to retrieve relevant information, [05:13.960 --> 05:15.960] like routing information and so on.
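To picture the graph and batch idea described above, here is a tiny standalone sketch; the types and node names are hypothetical and are not the CNDP or VPP APIs. The point is simply that each node processes a whole burst of packets before the burst moves on to the next node, which is what keeps a node's instructions warm in the cache.

```c
/* Toy graph traversal (hypothetical types, not the CNDP/VPP API): each node
 * processes a whole burst before the next node runs, keeping the node's
 * instructions warm in the i-cache. */
#include <stdint.h>
#include <stdio.h>

#define BURST 32

struct pkt {
	uint8_t proto;  /* e.g. 17 = UDP */
	uint8_t ttl;
};

/* A "node" processes a burst in place and returns how many packets survive. */
typedef int (*node_fn)(struct pkt *burst, int n);

static int ip4_input(struct pkt *b, int n)      /* drop expired packets */
{
	int kept = 0;

	for (int i = 0; i < n; i++)
		if (b[i].ttl > 1) {
			b[i].ttl--;
			b[kept++] = b[i];
		}
	return kept;
}

static int ip4_forward(struct pkt *b, int n)    /* route lookup would go here */
{
	for (int i = 0; i < n; i++)
		(void)b[i].proto;
	return n;
}

int main(void)
{
	struct pkt burst[BURST];
	node_fn graph[] = { ip4_input, ip4_forward };   /* a two-node "graph" */
	int n = BURST;

	for (int i = 0; i < BURST; i++)
		burst[i] = (struct pkt){ .proto = 17, .ttl = (uint8_t)(i % 4) };

	/* Traverse: every node sees the whole burst before the next node runs. */
	for (unsigned s = 0; s < sizeof(graph) / sizeof(graph[0]); s++)
		n = graph[s](burst, n);

	printf("%d of %d packets made it through the graph\n", n, BURST);
	return 0;
}
```

The real CNET graph works on the same principle, with nodes such as the ip4 input and ip4 forward nodes that show up in the demo output later.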
[05:15.960 --> 05:19.960] And the last thing that CNDP gives you [05:19.960 --> 05:23.960] are the Kubernetes components to be able to provision [05:23.960 --> 05:26.960] and manage, actually, more so an AF_XDP deployment [05:26.960 --> 05:28.960] than just a CNDP one. [05:28.960 --> 05:32.960] Those components are the AF_XDP device plugin, [05:32.960 --> 05:38.960] which provisions the netdevs that you want to use [05:38.960 --> 05:41.960] for AF_XDP and advertises them up to Kubernetes [05:41.960 --> 05:44.960] as a resource pool that your pods can then request [05:44.960 --> 05:46.960] when they come up. [05:46.960 --> 05:49.960] And then you have the AF_XDP CNI, [05:49.960 --> 05:52.960] which essentially plumbs your AF_XDP netdev [05:52.960 --> 05:55.960] from the host network namespace [05:55.960 --> 05:58.960] into the pod network namespace. [05:58.960 --> 06:02.960] So just one last point on CNDP before we move on [06:02.960 --> 06:06.960] is that it actually supports multiple packet I/O backends, [06:06.960 --> 06:08.960] not just AF_XDP, [06:08.960 --> 06:11.960] but for the purposes of this hybrid networking stack [06:11.960 --> 06:15.960] we've focused in on AF_XDP itself. [06:15.960 --> 06:17.960] Okay, so it's nearly demo time. [06:17.960 --> 06:21.960] So, excuse me. [06:21.960 --> 06:23.960] So what am I going to show you? [06:23.960 --> 06:28.960] I'm actually going to show you a CNDP FRR vRouter [06:28.960 --> 06:30.960] that we built. [06:30.960 --> 06:33.960] Originally, I set out to see, you know, [06:33.960 --> 06:37.960] could I build some sort of hybrid networking stack application [06:37.960 --> 06:40.960] that could accomplish, you know, DPDK-like speeds, [06:40.960 --> 06:44.960] but completely leverage, you know, the kernel smarts. [06:44.960 --> 06:47.960] And so the scenario we came up with was that we would have [06:47.960 --> 06:50.960] two clients, client one and client two, [06:50.960 --> 06:52.960] residing in two different networks, [06:52.960 --> 06:55.960] network one and network three, [06:55.960 --> 06:58.960] and they're interconnected via a pair of vRouters, [06:58.960 --> 07:02.960] which learn routes using OSPF. [07:02.960 --> 07:06.960] So what the demo is going to be [07:06.960 --> 07:09.960] is we're actually going to bring up four Docker containers: [07:09.960 --> 07:13.960] client one, CNDP FRR one, [07:13.960 --> 07:16.960] and a second container we actually call CNDP FRR two, [07:16.960 --> 07:18.960] but for the purposes of the demo, [07:18.960 --> 07:21.960] I'm only going to run FRR in it, just to show the full interworking. [07:21.960 --> 07:26.960] And client two will then be our last Docker container. [07:26.960 --> 07:29.960] At the start of the demo, we're just going to bring everything up. [07:29.960 --> 07:32.960] No FRR will be running, no CNET stack will be running. [07:32.960 --> 07:35.960] And so when we try to ping from client two to client one, [07:35.960 --> 07:37.960] we're going to see nothing happen. [07:37.960 --> 07:40.960] And then we're going to bring up all the components in turn, [07:40.960 --> 07:42.960] see the routes being learned, [07:42.960 --> 07:45.960] hopefully have a successful ping, [07:45.960 --> 07:48.960] and maybe even, you know, run an iperf session [07:48.960 --> 07:50.960] between client one and client two also. [07:50.960 --> 07:54.960] So if we just zoom into this CNDP FRR node for one second, [07:54.960 --> 07:58.960] I just want to show you one thing, I guess.
[07:58.960 --> 08:01.960] So we can see here it's going to have two veth interfaces, [08:01.960 --> 08:03.960] one connected to net one and the other connected to net two, [08:03.960 --> 08:04.960] and these are here. [08:04.960 --> 08:07.960] We're going to inject an eBPF program on the XDP hook [08:07.960 --> 08:11.960] that's going to filter all UDP traffic to the CNET graph [08:11.960 --> 08:14.960] and non-UDP traffic to the Linux networking stack. [08:14.960 --> 08:17.960] So actually one of the other things I'm going to show you [08:17.960 --> 08:20.960] is that we're not going to see ICMP traffic [08:20.960 --> 08:22.960] traverse through CNET. [08:22.960 --> 08:24.960] And then when we run iperf with UDP traffic, [08:24.960 --> 08:27.960] we're going to see the actual traffic flow through CNET also. [08:27.960 --> 08:32.960] So here we go. [08:32.960 --> 08:36.960] Let's just check that we have nothing running. [08:36.960 --> 08:37.960] Yep, that's fine. [08:37.960 --> 08:40.960] And I presume everybody can see the text. [08:40.960 --> 08:41.960] Okay, cool. [08:41.960 --> 08:46.960] Okay. [08:46.960 --> 08:48.960] So all the script is doing is setting up the four containers [08:48.960 --> 08:53.960] and the relevant networking between them right now. [08:53.960 --> 08:55.960] We can ignore the permission denied; [08:55.960 --> 08:58.960] let's not worry about that for now. [08:58.960 --> 09:01.960] So we actually see we have four Docker containers here: [09:01.960 --> 09:06.960] Client 1, Client 2, CNDP FRR1, and CNDP FRR2. [09:06.960 --> 09:18.960] And if we try to ping Client 1 from Client 2, [09:18.960 --> 09:22.960] essentially nothing happens. [09:22.960 --> 09:29.960] Okay, so let's start up our CNDP application on CNDP FRR1 [09:29.960 --> 09:56.960] as well as the CNET graph. [09:56.960 --> 10:07.960] So, sorry about the formatting. [10:07.960 --> 10:09.960] It looked a lot better when I was presenting. [10:09.960 --> 10:13.960] But the key part here is if we try and check the routes, [10:13.960 --> 10:20.960] what we see is the two netdevs that are attached to CNDP, [10:20.960 --> 10:23.960] or the CNDP FRR1 vRouter; [10:23.960 --> 10:26.960] but most importantly, we just see Network 1 and Network 2. [10:26.960 --> 10:43.960] So let's start up the FRR agent on this node. [10:43.960 --> 10:51.960] So if we have a look at the information that's been set up so far, [10:51.960 --> 10:53.960] we can see this vRouter has an IP address. [10:53.960 --> 10:57.960] It's adding Network 1 and Network 2 to the same OSPF area. [10:57.960 --> 11:01.960] And if we try "show ip ospf neighbor" at this point, [11:01.960 --> 11:03.960] it hasn't learned anything [11:03.960 --> 11:06.960] because we haven't started FRR on the other vRouter. [11:06.960 --> 11:22.960] So let's go ahead and do that. [11:22.960 --> 11:24.960] And here this vRouter has its IP address [11:24.960 --> 11:29.960] and is adding Network 2 and Network 3 to the same OSPF area. [11:29.960 --> 11:39.960] And if we show the OSPF neighbors, it's picked up the vRouter [11:39.960 --> 11:41.960] at its opposite end. [11:41.960 --> 11:46.960] And if we do the same on CNDP FRR1, [11:46.960 --> 11:52.960] it's also learned about the other route via OSPF as well. [11:52.960 --> 11:58.960] So at this point, if we actually try to ping again from client 2 to client 1, [11:58.960 --> 12:00.960] we can ping. [12:00.960 --> 12:04.960] And actually, if we check the routes on CNDP, [12:04.960 --> 12:08.960] we have the new Network 3 added in.
[12:08.960 --> 12:13.960] And just to show you that no traffic is flowing through CNDP yet, [12:13.960 --> 12:16.960] these are the eth0 stats for RX and TX. [12:16.960 --> 12:20.960] We see they're still 0, and the same for eth1. [12:20.960 --> 12:23.960] So let's kill that off for the moment [12:23.960 --> 12:27.960] and try and run an iperf UDP session between client 1 and client 2. [12:27.960 --> 12:39.960] And this time we should see traffic flow through the CNET graph. [12:39.960 --> 12:47.960] And if we check here, you can see an increment in the stats. [12:47.960 --> 12:52.960] And this doesn't show as nicely as I hoped. [12:52.960 --> 13:01.960] And this kills the app. [13:01.960 --> 13:05.960] Let's try it one more time. [13:05.960 --> 13:09.960] Unfortunately, I won't be able to get this right just yet. [13:09.960 --> 13:14.960] Oh, there we go. [13:14.960 --> 13:18.960] OK, let's try and run it one more time. [13:18.960 --> 13:24.960] OK, folks, bear with me. [13:24.960 --> 13:29.960] So we can see the ip4 input node at the top here, [13:29.960 --> 13:33.960] and an ip4 forward node, [13:33.960 --> 13:36.960] and they're passing UDP traffic through those nodes. [13:36.960 --> 13:39.960] Now, we're not going to the UDP nodes that are listed there, [13:39.960 --> 13:42.960] because obviously the traffic isn't destined for the CNDP [13:42.960 --> 13:48.960] FRR vRouter itself; it's destined for the client attached to it. [13:48.960 --> 13:52.960] And that's why it's forwarded on. [13:52.960 --> 13:55.960] Applications can also hook on to the CNET graph [13:55.960 --> 13:58.960] via a socket-like architecture. [13:58.960 --> 14:01.960] All the function calls look exactly the same as for a socket, [14:01.960 --> 14:03.960] except it's just called a channel, [14:03.960 --> 14:07.960] and you prefix all of your normal socket calls with channel_ [14:07.960 --> 14:10.960] before hooking up into the CNET graph. [14:10.960 --> 14:16.960] So that's the demo. [14:16.960 --> 14:21.960] So the next step was to essentially take that CNDP FRR vRouter [14:21.960 --> 14:24.960] and put it through a heck of a lot of permutations [14:24.960 --> 14:28.960] in terms of the interfaces that we hooked it up to, [14:28.960 --> 14:30.960] leveraging things like XDP redirects [14:30.960 --> 14:32.960] between the two vRouter instances and so on, [14:32.960 --> 14:35.960] to try and see what kind of levels of performance [14:35.960 --> 14:37.960] we could push this to. [14:37.960 --> 14:40.960] And so what we noticed was that for AF_XDP, [14:40.960 --> 14:45.960] the performance is completely dependent on the deployment scenario. [14:45.960 --> 14:49.960] So for north-south traffic that was coming in on a physical interface [14:49.960 --> 14:53.960] or out of a physical interface with AF_XDP in native mode, [14:53.960 --> 14:55.960] so hooked in at the XDP hook, [14:55.960 --> 15:01.960] this example actually yielded comparable performance to DPDK. [15:01.960 --> 15:05.960] However, when we moved to something that was completely local to a node, [15:05.960 --> 15:09.960] so east-west type traffic with all virtual interfaces and AF_XDP, [15:09.960 --> 15:13.960] while the performance was still better than vanilla veth [15:13.960 --> 15:15.960] for AF_XDP in native mode, [15:15.960 --> 15:18.960] it wasn't what we had expected it to be. [15:18.960 --> 15:21.960] So there's definitely some level of optimization [15:21.960 --> 15:25.960] that we need to look into on that front.
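For context on what native mode means here, and the generic mode that comes up next: the difference is where the kernel runs the XDP program, and it is chosen when the program is attached. Below is a minimal libbpf sketch of that choice; the object file name is a placeholder, and a real deployment (including this demo) may well attach its program differently, for example via libxdp or the AF_XDP device plugin.

```c
/* Sketch: attaching an XDP program in native (driver) mode and falling back
 * to generic (skb) mode with libbpf. "xdp_filter.bpf.o" is a placeholder. */
#include <net/if.h>
#include <linux/if_link.h>
#include <stdio.h>
#include <bpf/libbpf.h>

int main(int argc, char **argv)
{
	const char *ifname = argc > 1 ? argv[1] : "eth0";
	int ifindex = if_nametoindex(ifname);
	struct bpf_object *obj;
	struct bpf_program *prog;
	int err;

	obj = bpf_object__open_file("xdp_filter.bpf.o", NULL);
	if (!ifindex || !obj || bpf_object__load(obj))
		return 1;

	prog = bpf_object__next_program(obj, NULL); /* first program in the object */
	if (!prog)
		return 1;

	/* Native mode: the driver runs the program on raw RX descriptors. */
	err = bpf_xdp_attach(ifindex, bpf_program__fd(prog),
			     XDP_FLAGS_DRV_MODE, NULL);
	if (err) {
		/* Generic mode: the kernel runs it later, on an skb. */
		fprintf(stderr, "native attach failed (%d), trying generic\n", err);
		err = bpf_xdp_attach(ifindex, bpf_program__fd(prog),
				     XDP_FLAGS_SKB_MODE, NULL);
	}
	return err ? 1 : 0;
}
```

Generic mode works on any interface, which is why it makes a convenient fallback, but the program then sees an skb-based copy of the packet rather than the raw driver buffer.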
[15:25.960 --> 15:29.960] And then we tried one other thing, which is AF_XDP in generic mode, [15:29.960 --> 15:32.960] so that's your program hooked in at the TC hook, [15:32.960 --> 15:36.960] and that actually yielded better performance than native mode. [15:36.960 --> 15:42.960] But again, that points to some optimization work that's needed on that front. [15:42.960 --> 15:44.960] So just to sum up, I guess, [15:44.960 --> 15:49.960] we set out to see whether it was possible to build some sort of hybrid networking stack. [15:49.960 --> 15:52.960] I think the building blocks are there for sure. [15:52.960 --> 15:57.960] I think we've demonstrated that it is possible to do something like that, [15:57.960 --> 16:02.960] especially for these high-performance use cases that want to take advantage [16:02.960 --> 16:08.960] of in-kernel fast paths, essentially XDP and AF_XDP. [16:08.960 --> 16:12.960] There's obviously an opportunity as well to make sure that we hook [16:12.960 --> 16:16.960] eBPF a lot more into the puzzle, [16:16.960 --> 16:18.960] especially from the user plane aspect; [16:18.960 --> 16:21.960] not everything has to go into user space and so on. [16:21.960 --> 16:25.960] So I just want to summarize in terms of the generic challenges [16:25.960 --> 16:30.960] that we have noted for AF_XDP. [16:30.960 --> 16:34.960] The first one is that we still can't take advantage of hardware offloads. [16:34.960 --> 16:41.960] It's been great to see the XDP hints kfunc support getting merged into the Linux kernel, [16:41.960 --> 16:45.960] or at least agreed on as a model and then merged, which has been fantastic, [16:45.960 --> 16:48.960] and it will form a great cornerstone for a lot of this work. [16:48.960 --> 16:54.960] The only thing that I would ask is that we make sure that for the containerized environment, [16:54.960 --> 16:59.960] we put the onus on the infrastructure to lifecycle manage the BPF programs [16:59.960 --> 17:05.960] and to take that level of responsibility and privilege out of the scope of the application. [17:05.960 --> 17:09.960] So the application doesn't need to know any special formats, [17:09.960 --> 17:13.960] especially if it's using AF_XDP, [17:13.960 --> 17:18.960] or have to do special compilations of BPF programs or anything along those lines. [17:18.960 --> 17:22.960] That should all be managed on the infrastructure side. [17:22.960 --> 17:28.960] The next thing that's been a gap is really jumbo frame, or multi-buffer, support for AF_XDP, [17:28.960 --> 17:33.960] but we've seen lots of activity on that on the mailing list in the last couple of months, [17:33.960 --> 17:37.960] so hopefully that's something that we can take off the list very, very soon. [17:37.960 --> 17:42.960] And lastly, there's going to be some need for some level of optimization of AF_XDP [17:42.960 --> 17:48.960] in native mode for veths. And there are just some links here for folks, if they're interested, [17:48.960 --> 17:53.960] to some of the stuff that we used for this talk. [17:53.960 --> 17:56.960] So thank you very much, folks, for your time, I really appreciate it. [17:56.960 --> 17:59.960] And it's been a pleasure presenting on my first podcast. [17:59.960 --> 18:03.960] It's a bucket list item I can tick off there, so thanks a lot. [18:03.960 --> 18:13.960] Thank you for the talk. We do have ample time for questions. [18:25.960 --> 18:27.960] Thank you for the presentation.
[18:27.960 --> 18:35.960] I have a question about XDP: does it run on hardware, or is it in software? [18:35.960 --> 18:40.960] The XDP hook itself is typically supported by the drivers, [18:40.960 --> 18:45.960] so actually I think there's a good host of drivers that support it right now, [18:45.960 --> 18:49.960] most of the Intel ones, and a good few of the Mellanox ones as well. [18:49.960 --> 18:55.960] The thing with AF_XDP is, if the hook isn't natively supported by the driver, [18:55.960 --> 19:00.960] it automatically falls back to the TC hook, which is what we call generic mode, [19:00.960 --> 19:04.960] and it'll still work there, except that you don't get the raw buffer from the driver; [19:04.960 --> 19:08.960] you will essentially be working with the equivalent of an skb. [19:08.960 --> 19:14.960] So there's some level of allocation and copy that happens there before you can process the packet. [19:14.960 --> 19:16.960] Okay, understood, thank you. [19:16.960 --> 19:25.960] More questions? [19:31.960 --> 19:34.960] Okay then, thank you for the talk. [19:34.960 --> 19:36.960] Thank you for being here. [19:36.960 --> 19:46.960] Thank you very much, really appreciate it.
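Following on from that answer, one way to check which mode an interface actually ended up in is to ask the kernel. A small libbpf sketch (the interface name is a placeholder, and this is only an illustration, not part of the demo tooling):

```c
/* Sketch: query which XDP attach mode is active on an interface (libbpf).
 * "eth0" is a placeholder interface name. */
#include <net/if.h>
#include <linux/if_link.h>
#include <stdio.h>
#include <bpf/libbpf.h>

int main(int argc, char **argv)
{
	const char *ifname = argc > 1 ? argv[1] : "eth0";
	int ifindex = if_nametoindex(ifname);
	LIBBPF_OPTS(bpf_xdp_query_opts, opts);

	if (!ifindex || bpf_xdp_query(ifindex, 0, &opts))
		return 1;

	switch (opts.attach_mode) {
	case XDP_ATTACHED_DRV:  /* native mode: program runs in the driver */
		printf("%s: native (driver) mode, prog id %u\n", ifname, opts.prog_id);
		break;
	case XDP_ATTACHED_SKB:  /* generic mode: program runs on an skb */
		printf("%s: generic (skb) mode, prog id %u\n", ifname, opts.prog_id);
		break;
	case XDP_ATTACHED_HW:   /* offloaded to the NIC itself */
		printf("%s: hardware offload, prog id %u\n", ifname, opts.prog_id);
		break;
	case XDP_ATTACHED_NONE:
		printf("%s: no XDP program attached\n", ifname);
		break;
	default:
		printf("%s: attach mode %u\n", ifname, opts.attach_mode);
	}
	return 0;
}
```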