[00:00.000 --> 00:14.600]  Alright, thanks everybody for joining me today to talk about CNI automatic, making some use
[00:14.600 --> 00:21.000]  of some of this bike discovery for semantic network attachment in Kubernetes.
[00:21.000 --> 00:22.000]  My name is Doug.
[00:22.000 --> 00:28.120]  I maintain something called multi-CNI, which is a way to attach multiple network interfaces
[00:28.120 --> 00:38.120]  to causing Kubernetes, and I'm really interested in telco use cases for Kubernetes, especially.
[00:38.120 --> 00:44.120]  And I am going to talk a little bit about some mappings to make this semantic.
[00:44.120 --> 00:50.120]  So I thought I would show you a map of where I'm from, which is Vermont in the United States.
[00:50.120 --> 00:56.120]  If you're not familiar with it, you might be familiar with our two most famous exports,
[00:56.120 --> 01:00.120]  like our Bernie Sanders and Ben and Jerry's ice cream.
[01:00.120 --> 01:06.120]  I'll also put in a little size reference here of the Adirondacks, which will come in as a trivia later,
[01:06.120 --> 01:11.120]  and belt them, because they're all sort of similar sizes.
[01:11.120 --> 01:18.120]  So, yeah, what we're going to look at today is a problem statement for what I discovered.
[01:18.120 --> 01:25.120]  A tour of CNI plug-in that I developed to address that problem.
[01:25.120 --> 01:32.120]  And along with that, we're going to look at kind of like it's what I made is really a proof of concept
[01:32.120 --> 01:37.120]  to try to address that problem and what are the kind of limitations.
[01:37.120 --> 01:46.120]  But most of all, I want to show how this kind of thought process that I used plays into what I think
[01:46.120 --> 01:52.120]  is kind of some bigger picture of things that need to happen for the next version of CNI.
[01:52.120 --> 01:59.120]  So as we look at the problem statement, keep that in mind that there's this problem
[01:59.120 --> 02:04.120]  and there's a solution that I've got. It's a kernel of a solution.
[02:04.120 --> 02:15.120]  But we also want to look at this thought process because I think that it is more important than the solution that I've got.
[02:16.120 --> 02:25.120]  So, a lot of times in networking, we have these kind of ideals that everything's going to be mapped out
[02:25.120 --> 02:28.120]  and it's going to be perfect, but once you get on the ground floor,
[02:28.120 --> 02:37.120]  or maybe you've had this job before, sometimes your network really looks like this in the end.
[02:37.120 --> 02:47.120]  And at least for me, it's usually that we try to make everything as uniform and perfect as we can.
[02:47.120 --> 02:53.120]  In these diagrams, it'll look symmetrical. Everything will be like by a textbook.
[02:53.120 --> 02:59.120]  However, once you go to implement it, you discover that not everything is the same.
[02:59.120 --> 03:02.120]  Sometimes you have legacies in your systems.
[03:02.120 --> 03:08.120]  You might need to have some kind of non-sequitur type of things in your network,
[03:08.120 --> 03:15.120]  like a jump host, for example, or maybe you've got some vendor equipment that you've bought
[03:15.120 --> 03:21.120]  that just doesn't exactly match everything that was going to be part of your plan.
[03:21.120 --> 03:28.120]  And this is really where the problem starts for me is that I have people come to me and say,
[03:28.120 --> 03:32.120]  yeah, well, this would work if it was the same on every machine,
[03:32.120 --> 03:40.120]  but I've got one or two or 25 out of a thousand that just aren't the same.
[03:40.120 --> 03:50.120]  So really what it is is, okay, hey, I'm adding a secondary network to my pods,
[03:50.120 --> 03:54.120]  and I've got this definition of what it's supposed to be.
[03:54.120 --> 03:59.120]  And it references specific interfaces on my hosts,
[03:59.120 --> 04:05.120]  but in a non-uniform environment, it might just not match.
[04:05.120 --> 04:15.120]  So in this particular CNI configuration, which is tiny on there, I'm sorry,
[04:15.120 --> 04:23.120]  we have CNI configs that want to reference a specific interface that this is going to be created on.
[04:23.120 --> 04:27.120]  And that's, say, for example, ETH0.
[04:27.120 --> 04:34.120]  And sometimes we want to know, okay, yeah, it's ETH0, but what is actually the network that's behind that?
[04:34.120 --> 04:41.120]  Because as much as we can reference that interface, there's like a greater world beyond that.
[04:41.120 --> 04:46.120]  It's how it's connected to the rest of the network.
[04:46.120 --> 04:57.120]  So if I have a non-uniform environment and I'm going to have these CNI configurations where I've got, say,
[04:57.120 --> 05:06.120]  node1 has an ETH1, node2 has ENS2, ENS4, and those are all connected to green.
[05:06.120 --> 05:15.120]  The way that, like, a network attachment definition is that's what multi-CNI uses for a secondary network.
[05:15.120 --> 05:20.120]  When someone comes to me and says, I want to use this on this non-uniform network,
[05:20.120 --> 05:26.120]  I've got to tell them to make a configuration for each thing that's different.
[05:26.120 --> 05:32.120]  And then additionally, on top of that, again, it's too small and I apologize,
[05:32.120 --> 05:37.120]  but I've got to tell them to, yeah, well, make one for each node,
[05:37.120 --> 05:42.120]  but then make a pod that references that one and then uses a node selector.
[05:42.120 --> 05:49.120]  And then that way you can get the right configuration for the right pod associated with the right node.
[05:49.120 --> 05:54.120]  And it's just not a very Kubernetes way of doing things.
[05:54.120 --> 06:06.120]  Like, we want to express intent at a higher level and get away to get these things attached in a, like, easier to express way
[06:06.120 --> 06:11.120]  and not have to, like, baby it every little thing and say,
[06:11.120 --> 06:17.120]  it's like this stack of three things and they have to be associated to this node and I have to label this node.
[06:17.120 --> 06:21.120]  And I just wasn't happy with that.
[06:21.120 --> 06:29.120]  So really what I wanted is to say, instead of all of this stuff that I'm configuring with a CNI config in the pod
[06:29.120 --> 06:35.120]  and the nodes selector with the labeled node, I just want to say, I want it attached to the green network.
[06:35.120 --> 06:38.120]  That's really what I want.
[06:38.120 --> 06:48.120]  So I'm like, I want to, like, give some meaning to these network interfaces
[06:48.120 --> 06:53.120]  and make it so that I can scale on a non-uniform environment.
[06:53.120 --> 06:56.120]  So we use Kubernetes for scale.
[06:56.120 --> 07:00.120]  It's a great way to deploy workloads at scale.
[07:00.120 --> 07:06.120]  And we use CNI for plumbing our network interfaces in our pods.
[07:06.120 --> 07:16.120]  So as a CNI developer and as a Kubernetes developer, this is how I approach this problem.
[07:16.120 --> 07:21.120]  So I made something that I call Surveyor CNI.
[07:21.120 --> 07:30.120]  And it essentially maps devices to network names using CRDs, which are custom resource definitions.
[07:30.120 --> 07:41.120]  It's essentially a way to extend the Kubernetes API and to store data and have data that works in a way
[07:41.120 --> 07:45.120]  that, like, other Kubernetes applications can talk to.
[07:45.120 --> 07:50.120]  It's sort of like a lingua franca for CRDs.
[07:50.120 --> 08:02.120]  And also, I have, like, a number of projects that I've, like, named after, kind of, like, outdoor related things in my area.
[08:02.120 --> 08:09.120]  And I really was thinking of, like, this topographic engineer and, like, rad adventure guy named Verplank Colvin
[08:09.120 --> 08:12.120]  that was famous in the Adirondacks for the first thing.
[08:12.120 --> 08:17.120]  But the name Verplank Colvin doesn't flow off the tongue.
[08:17.120 --> 08:20.120]  So I'm like, all right, I'll call it Surveyor CNI.
[08:20.120 --> 08:26.120]  It essentially works in two phases.
[08:26.120 --> 08:33.120]  When the, when it's installed, it starts up a daemon set.
[08:33.120 --> 08:41.120]  And that daemon set has some go language scripts that just go and onto that node.
[08:41.120 --> 08:48.120]  And I say, all right, give me the network interfaces that are on that particular node.
[08:48.120 --> 09:02.120]  Then it creates an empty CRD that we can use to create a mapping association to go from network device name to network name itself.
[09:02.120 --> 09:10.120]  So in essence, what it does is I can have those two nodes with these all different names for the green network.
[09:10.120 --> 09:14.120]  And I can say, each one is green on node one.
[09:14.120 --> 09:17.120]  ENS2 is green on node two.
[09:17.120 --> 09:20.120]  And on node three, ENS4 is green.
[09:20.120 --> 09:29.120]  So that way, I can just say, thank you, I can say, all right, we want to attach to network green.
[09:29.120 --> 09:35.120]  In lieu of actually having a demo of this,
[09:35.120 --> 09:40.120]  I will challenge you to bring up the code and run it yourself.
[09:40.120 --> 09:48.120]  And I've got like a do it yourself kind of tutorial on the read me and I'll share the links with you.
[09:48.120 --> 09:54.120]  Otherwise, you'll just see the frustration of how to do it.
[09:54.120 --> 10:03.120]  So really, in a lot of ways, this is just sort of a like rolling chassis of a car
[10:03.120 --> 10:06.120]  that doesn't have an engine in it yet.
[10:06.120 --> 10:16.120]  Because it will make the custom resource for you and it will let you fill out those associations yourself.
[10:16.120 --> 10:19.120]  But I don't think that that really scales either.
[10:19.120 --> 10:32.120]  So I think that something that could really be improved here and because it uses custom resources is to have really an engine behind it,
[10:32.120 --> 10:41.120]  to have other Kubernetes controllers that know something further about your network.
[10:41.120 --> 10:46.120]  So for you to program some kind of intelligence into this, be able to do it.
[10:46.120 --> 10:52.120]  So like a working group that I'm part of that we call Kubernetes Network Plumbing Working Group,
[10:52.120 --> 10:59.120]  that comes up with all kinds of ideas about how to plumb your networks in Kubernetes.
[10:59.120 --> 11:05.120]  One of our like sort of, what's the word for it?
[11:05.120 --> 11:10.120]  I guess it's like the like holy grail kind of question is to ask the question,
[11:10.120 --> 11:13.120]  what network am I attaching to?
[11:13.120 --> 11:19.120]  And I think about that a lot and it makes me think of the question of,
[11:19.120 --> 11:25.120]  okay, hey, if I have this messy physical network environment with cables all over the place
[11:25.120 --> 11:33.120]  and I unplug a cable from one interface and into another in the physical world,
[11:33.120 --> 11:36.120]  then everything changes.
[11:36.120 --> 11:44.120]  And something that keeps coming up for me is this idea of can we listen to Netlink in Linux
[11:44.120 --> 11:53.120]  and build some more intelligence that happens like over the lifetime of a pod.
[11:53.120 --> 12:02.120]  And IPv6 usually comes up when we talk about this because in IPv6 you can have Slack,
[12:02.120 --> 12:11.120]  you can have router advertisement, so your IP addresses can be assigned kind of on the fly,
[12:11.120 --> 12:17.120]  your routes can change, your network can be changing, like these things are considered in IPv6.
[12:17.120 --> 12:25.120]  And I don't think that they're very well covered in Kubernetes in general and I think about that.
[12:25.120 --> 12:32.120]  I also was like, hey, if I have sort of this like high level way to express this,
[12:32.120 --> 12:44.120]  I'm like, could I train something that might know about this to sort of figure it out for me like an AI?
[12:44.120 --> 12:51.120]  If you haven't heard everyone talking about like the leaps and bounds in AI in the last like whatever,
[12:52.120 --> 12:56.120]  bring up social media and just see everyone talking about it.
[12:56.120 --> 13:00.120]  So I was like, yeah, maybe I could train like Chad GPT to do this.
[13:00.120 --> 13:05.120]  I realized that at this point it will probably just confidently make up something about my network.
[13:05.120 --> 13:09.120]  That's not actually true, but it seemed kind of sweet.
[13:09.120 --> 13:16.120]  Another question that came up when I was socializing this idea was,
[13:16.120 --> 13:23.120]  hey, why don't you just apply aliases to your network interfaces on the host?
[13:23.120 --> 13:27.120]  And I'm like, yeah, that's probably fine.
[13:27.120 --> 13:33.120]  And in fact, I would say if you have this problem today, this is probably what you should do
[13:33.120 --> 13:41.120]  instead of using my demo software because it's a really reliable way to do it.
[13:41.120 --> 13:46.120]  However, as a like CNI developer and as a Kubernetes developer,
[13:46.120 --> 13:50.120]  it's just not the way that I approach the problem.
[13:50.120 --> 14:01.120]  And it didn't really for me approach it in a way that exposes that data to something that can really be more of an engine behind this.
[14:01.120 --> 14:03.120]  So I think it's a good approach.
[14:03.120 --> 14:09.120]  I think that there's still something to be asked about scale here.
[14:10.120 --> 14:18.120]  So given that, let's think about some of the possible pitfalls here.
[14:18.120 --> 14:28.120]  Something in my space that gets brought up a lot when we talk about this variety of network tools is,
[14:28.120 --> 14:35.120]  are we going to wind up creating some kind of network manager like a neutron in OpenStack?
[14:35.120 --> 14:43.120]  And I think that there's a lot of problems that that exposes.
[14:43.120 --> 14:52.120]  And I think that it might also sort of like give some tunnel vision to the problems.
[14:52.120 --> 15:02.120]  And what I really like about how we approach networking in Kubernetes with CNI is we do this in a modular fashion.
[15:02.120 --> 15:07.120]  So with something like this idea of my survey or CNI,
[15:07.120 --> 15:14.120]  maybe it is a more of a modular way to approach it that's like, hey, here's one tool for this specific case.
[15:14.120 --> 15:20.120]  So if you don't encounter this problem, well then don't use this thing, right?
[15:20.120 --> 15:25.120]  And or if you do and you have a non-uniform environment,
[15:25.120 --> 15:29.120]  use something like Surveyor and get it to work for you or whatever,
[15:29.120 --> 15:35.120]  create Ansible scripts and make the aliases to your interfaces.
[15:35.120 --> 15:51.120]  Another pitfall of this as well is it doesn't cover how the workloads would be scheduled to nodes that have those devices available, right?
[15:51.120 --> 15:57.120]  So, you know, in my thing where I say network green,
[15:57.120 --> 16:05.120]  I really assume you're going to have a device mapping and association to the green network on every node.
[16:05.120 --> 16:16.120]  So if you have nodes that aren't attached to the green network, well, you need to know a way to approach that.
[16:16.120 --> 16:28.120]  And the way that we approach this problem today with a resource being available on a specific node is with something called device plugins.
[16:28.120 --> 16:38.120]  And what device plugins do is they give the Kubernetes scheduler awareness of consumable resources on a particular host.
[16:38.120 --> 16:44.120]  So if you've got, say, SROV network interfaces that are for high performance,
[16:44.120 --> 16:58.120]  your device plugin can know about that and tell Kubernetes that, hey, there are 15 virtual functions on this SROV card on this node.
[16:58.120 --> 17:01.120]  And that's how we approach that.
[17:01.120 --> 17:11.120]  So you could definitely extend my idea with adding device plugins, which are not super, super intuitive to use.
[17:11.120 --> 17:15.120]  And I think it's an area that also needs help in this space.
[17:15.120 --> 17:20.120]  But it would be a solution to work through that.
[17:20.120 --> 17:32.120]  So all of that being said, I'm kind of on a campaign to try to give people more awareness of what's going to happen with CNI 2.0.
[17:32.120 --> 17:46.120]  And if you are interested in this space at all, I encourage you to keep an eye on what's going on in the next version of CNI.
[17:46.120 --> 17:51.120]  We've got a number of problems that we really want to address.
[17:51.120 --> 18:00.120]  One of those is what happens to networking during a pod lifecycle.
[18:00.120 --> 18:05.120]  So, you know, I was mentioning, like, hey, what happens if I unplug this network and plug it in somewhere else?
[18:05.120 --> 18:09.120]  Like, can I detect that? Can I do something about that?
[18:09.120 --> 18:16.120]  Because right now, CNI works in essentially two different operations, which is add and delete.
[18:16.120 --> 18:23.120]  So when your network is added, CNI runs.
[18:23.120 --> 18:26.120]  When your pod is added, CNI runs.
[18:26.120 --> 18:29.120]  When your pod is deleted, CNI runs to clean it up.
[18:29.120 --> 18:35.120]  We want to have something that goes further through the lifecycle there.
[18:35.120 --> 18:40.120]  So, like, with IPv6 and ever-changing things, that can be covered.
[18:40.120 --> 18:47.120]  Maybe you can improve how, like, a cleanup happens because CNI delete isn't necessarily guaranteed.
[18:47.120 --> 18:59.120]  But the number one thing I think should happen with CNI 2.0 is that we have, like, more of a Kubernetes awareness for CNI 2.0.
[18:59.120 --> 19:04.120]  CNI is Container Orchestration Agnostic, which is an awesome start for it.
[19:04.120 --> 19:09.120]  But we only have one Container Orchestration Engine left, essentially, and it's Kubernetes.
[19:09.120 --> 19:13.120]  And I think it's important.
[19:13.120 --> 19:18.120]  So, yes, that being the case, this is how I approached it, right?
[19:18.120 --> 19:23.120]  I took this problem and I approached it with Kubernetes and with CNI, right?
[19:23.120 --> 19:33.120]  And I needed to kind of figure out how to, like, interact with all of these objects again when I made a new CNI plug-in.
[19:33.120 --> 19:37.120]  So I sort of, like, took a look at where I spent my time.
[19:37.120 --> 19:45.120]  And I spent it a lot during the, like, integration into Kubernetes.
[19:45.120 --> 19:50.120]  So, you know, I'm a CNI developer, so I spent a good chunk of the time on that.
[19:50.120 --> 19:56.120]  I spent a half-decent chunk on design, and I spent most of the time integrating it with Kubernetes.
[19:56.120 --> 20:00.120]  And I think that that could be improved.
[20:00.120 --> 20:06.120]  The X factor here was that I also used chatGBT to generate a bunch of my code.
[20:06.120 --> 20:09.120]  So maybe that also wasted my time.
[20:09.120 --> 20:14.120]  So there's that because it isn't always truthful of you.
[20:14.120 --> 20:19.120]  But, yeah, I want CNI 2.0 to communicate with Kubernetes.
[20:19.120 --> 20:21.120]  I think it's going to be a revolution in this phase.
[20:21.120 --> 20:28.120]  And I want everyone here that's interested in it to, like, get involved in this effort.
[20:28.120 --> 20:33.120]  And I think it's going to be, like, the next big thing for networking and Kubernetes.
[20:33.120 --> 20:35.120]  So, yeah, if you want, try it out.
[20:35.120 --> 20:37.120]  And that concludes what I've got.
[20:37.120 --> 20:39.120]  I can open it up for questions.
[20:39.120 --> 20:41.120]  Thank you.
[20:49.120 --> 20:54.120]  Have you considered using Linklayer Detection Price Go?
[20:54.120 --> 20:55.120]  Or Discovery, I can't remember.
[20:55.120 --> 20:57.120]  It actually stands for...
[20:57.120 --> 20:58.120]  Yeah, that's the one.
[20:58.120 --> 20:59.120]  Love it.
[20:59.120 --> 21:00.120]  I love that idea.
[21:00.120 --> 21:06.120]  I hadn't thought of it, but I am going to put that right in line with a Netlink thing.
[21:06.120 --> 21:08.120]  Yeah, I was just thinking about that as well.
[21:08.120 --> 21:13.120]  But isn't the problem that it finds layer 2 devices, your adjacency with the layer 2 device, not the layer 3?
[21:13.120 --> 21:18.120]  So I was wondering, would you be better to multi-cast from each, to also discover,
[21:18.120 --> 21:22.120]  have each device multi-cast what it thought and what name it gave to this network?
[21:22.120 --> 21:26.120]  The question then is how you ever assign the colors if you want to completely automate it.
[21:26.120 --> 21:30.120]  And maybe you have chat GPT assigned the colors for you.
[21:30.120 --> 21:31.120]  I like it.
[21:31.120 --> 21:33.120]  No, I think that's really good.
[21:33.120 --> 21:38.120]  Also, yeah, good consideration to the layer 2 versus layer 3.
[21:38.120 --> 21:44.120]  And, yeah, I think depending on your use case, it might end with your network.
[21:44.120 --> 21:49.120]  But I know in the telco world, there's a lot of LTE networks, for sure.
[21:49.120 --> 21:50.120]  Yes.
[21:51.120 --> 21:56.120]  Yeah, I think the next device T is going to block the LDP.
[21:56.120 --> 21:58.120]  So you'll only see that neighbor.
[21:58.120 --> 22:00.120]  You won't see the other nodes on that subnet.
[22:00.120 --> 22:01.120]  That was more my issue.
[22:01.120 --> 22:03.120]  That's what you want to find out, right?
[22:03.120 --> 22:04.120]  Yeah.
[22:04.120 --> 22:08.120]  It could give you the B line if that's your network separation.
[22:08.120 --> 22:09.120]  Yeah, yeah, it could do.
[22:09.120 --> 22:15.120]  Yeah, I think it merits consideration for sure.
[22:16.120 --> 22:18.120]  Any other questions?
[22:20.120 --> 22:21.120]  No?
[22:21.120 --> 22:22.120]  Okay.
[22:22.120 --> 22:23.120]  Thank you very much.
[22:23.120 --> 22:24.120]  Thank you all.
[22:24.120 --> 22:25.120]  Thank you.