All right, good afternoon, everyone. Before I get started, how many of you attended the session about service mesh this afternoon? Sorry about that; if you're fed up with me, you'll have to stay with me for another session. I will repeat some parts of it to introduce the topic. So welcome to this session, Golden Signals with Cilium and Grafana. My name is Raymond De Jong, I'm Field CTO for EMEA at Isovalent. Isovalent is the company that originated Cilium, and today I'm going to talk about Grafana and how Cilium enables you to get golden signals out of the box. I'll start with an introduction to the technology, a little bit about eBPF and Cilium, then cover observability challenges and how we can tackle them, a bit on monitoring and day-2 operations and the default dashboards we provide, and then, hopefully with the demo gods on my side, a small demo where we actually get layer 7 metrics and can see return codes, monitor application response times, and so on.

To start this topic, I want to introduce Cilium and eBPF. How many of you know about eBPF? Quite a lot, awesome. How many of you know about Cilium and are using Cilium? Cool, great. So Isovalent is the company behind Cilium and eBPF; we maintain them together with the community. To start with eBPF: what is eBPF and how does it work? We like to say that what JavaScript is to the browser, eBPF is to the kernel. What we mean by that is that we make the kernel extensible in a dynamic way. We're not changing the kernel, but at kernel speed we can run programs triggered by kernel events. For the context of this session, getting metrics and useful information from your applications, what matters is that whenever a process opens a socket or a packet is sent on a network device on the node, we can export metrics using eBPF, and we can use tools like Grafana to display them in a useful way, so you can do day-2 operations and see how your application or cluster is performing.

Cilium runs on eBPF. The good thing about Cilium is that you don't need to be an eBPF engineer to run and operate it: you just set certain configuration options, and the eBPF programs are mounted on the nodes and run when they need to run. At a high level, Cilium provides a number of capabilities: networking, observability, and service mesh, and with the Tetragon component we also get runtime visibility and security based on process events such as file opens, process execution, and so on. Today we'll focus on the observability part.
So besides networking, we can get rich visibility and metrics on your cluster. As you may know, Google's Dataplane V2 actually uses Cilium under the hood, and Microsoft has moved default AKS clusters to Cilium, so the cloud providers see that Cilium is a powerful technology for running clusters at scale and getting useful metrics while running them.

So let's start with common challenges we see. One of the main challenges, when we talk about the performance of your applications or your clusters at scale, is the finger-pointing problem. What I mean by that is that network connectivity is layered, and when you run into issues, especially at scale, you need to be able to quickly see where the problem could be. It can sit in multiple layers: looking at the OSI model, it can be in the data link layer, the network layer, the transport layer, or the application layer. So the goal of Cilium with Grafana is to give you the tools to quickly inspect what's going on and to be more efficient in troubleshooting your cluster and your applications.

Another common issue, especially at scale, is the signal-to-noise problem. You may run in the cloud, you get data from your nodes, you see IP addresses communicating with each other, but IP addresses by themselves mean nothing in Kubernetes clusters. They come and go, and at scale it's impossible to track and trace what's going on with service-to-service communication in your applications.

Also, existing mechanisms fall short. First of all, network devices and centralized monitoring or firewall solutions, think of Splunk, are great for alerts and dashboards, but at scale they can be very costly or become a bottleneck, and these devices have no awareness of the identities of the applications running in your Kubernetes clusters. Cloud provider network logs are nice for node-to-node communication but don't provide identity either. You can monitor the hosts and look at Linux host statistics, but that only gives you visibility at the node level, and again, a Linux node has no awareness of the identities of the services running in your Kubernetes cluster. You may want to modify application code, and this applies a bit to the service mesh session: you can install libraries that give you visibility into the application, but then you don't have visibility into the networking layer.
As for service mesh, its main goal is obviously visibility of the network and of service-to-service communication, getting metrics out of that, but a sidecar-based implementation comes with a footprint and added latency, plus the operational complexity of maintaining the service mesh solution on top of your Kubernetes clusters.

So this is where Cilium comes around the corner: we provide identity-aware observability and security. What that means is that based on the labels you set on your workloads, we create unique identities, and we attach those identities in the data plane using eBPF. Using an identity we can do things like secure the connectivity, in this example a frontend is allowed to talk to a backend based on the network policies we set and the identities we are aware of, and these identities are a cluster-wide property. In terms of observability this also means we can use the identity to get rich metrics and data for that identity and monitor it effectively. You're no longer looking at IP addresses, you're looking at identities, so at whole sets of workloads and the service-to-service communication from a frontend to a backend.

Hubble is our observability solution built on top of Cilium. The way it works is that Cilium runs as a DaemonSet, an agent on each of your cluster nodes, and Hubble can retrieve data from those agents through a CLI or a UI, and we can export metrics about your workloads. So there are three parts. First, the Hubble UI gives you a service dependency map: on a namespace level you can see what's deployed, what is communicating with what, which protocols are being used, and what crosses namespace boundaries. For example, if there is inter-namespace communication you can identify the source namespace, if you use Cluster Mesh you can even identify the source cluster, and you can also identify egress and ingress traffic at the namespace level. The Hubble CLI is more of a power-user tool that gives you detailed flows; you can export to JSON and do a lot of filtering based on labels. Hubble metrics, mostly the topic for today, is the part where you export metrics and use Grafana, for example, to observe the performance of your cluster and applications. This is all fueled by eBPF: think of a network device sending a packet, that's a kernel event, an eBPF program is attached to it, collects the metrics, and it's done.
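To make that concrete, here is a minimal sketch of the Cilium Helm values that switch on these Hubble pieces. This assumes a recent Cilium chart; option names can differ slightly between versions, and the metrics side is shown in more detail later in the demo.

```yaml
# values.yaml -- minimal sketch for enabling Hubble via Cilium's Helm chart
hubble:
  enabled: true
  relay:
    enabled: true     # hubble-relay aggregates flow data from all node agents
  ui:
    enabled: true     # per-namespace service dependency map
  metrics:
    enabled:          # basic flow-derived Prometheus metrics
      - dns
      - drop
      - tcp
      - flow
      - icmp
```

Applied with the usual helm upgrade against the cilium/cilium chart, this gives you the UI, the relay that the CLI talks to, and a first set of Hubble metrics.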
This is a small screenshot of the CLI. It gives you flow visibility using Hubble observe commands; you can follow flows based on a label, in this case X-Wing, so we're following all the workloads labeled X-Wing: no IP addresses, just labels. Highlighted in purple are the identities we use: each unique set of labels gets a unique cluster-wide identity, and based on those identities we can track, by label, what communication is going on. There's a lot of metadata you can filter on: headers, ports, protocols, labels, Kubernetes pod names, services, worker nodes, DNS. We also have Cilium network policies with toFQDN rules, which means we can inspect queries to external domains and filter on those, plus Cilium-specific identities such as world, ingress, egress, host and so on, and policy verdict matches such as dropped or allowed.

This is the Hubble UI service map I mentioned before. It gives you a namespace-level view; in this case we have a jobs app, the same app I'm using in the demo a bit later. Here you're looking at a namespace-level view of all the service-to-service communication of your application running in that namespace. In this case it's only intra-namespace communication, and you can see, for example, that the recruiter is talking to the core API, the core API is talking to Elasticsearch, we have a ZooKeeper component, we identify Kafka, and we also identify the Kafka protocol, as there are a number of protocols we can inspect. We also see layer 7 information, in this case HTTP calls to a specific URI or URL with a specific method and return codes, and this is triggered through a construct as simple as a Cilium network policy: if you just allow, say, intra-namespace traffic and accept HTTP, that already triggers this visibility for you.

Now, using this data we can also export metrics to Grafana. We've been working with Grafana a lot more lately, which means we're building a lot more useful dashboards and also integrating with things like Tempo to get transparent tracing in Grafana, all powered by Cilium and eBPF. This allows us to see not only network-level metrics about performance on the node, but also, for service-to-service communication, the golden signals: things like HTTP request rate, latency, response codes and error codes, which as an application engineer allow you to quickly see which component of the application is not responding as it should.
But we can also detect transient network-layer issues. This is more network related: we can see retransmissions, bytes sent and received, and indicators like round-trip time that point to a network-layer problem. Maybe in a data center you have a specific top-of-rack switch not performing as it should, so nodes connected to that switch will have reduced performance and you will see latency increasing.

With the latest dashboards we're also able to see programmatic API requests using transparent tracing. This goes to the integration with Grafana. At the moment your application needs to support it, so you need to be able to inject traces, but we are working on supporting HTTP traces out of the box as well. You then get exemplars, I'm pointing at a small exemplar here, which you can inspect with Tempo to see the span of a specific request and where the problem may reside.

A bit more on monitoring, this is more day-2 ops. I want to highlight that if you run Cilium and you're also using Grafana, make sure you install the agent, Hubble and operator metrics dashboards. These are out-of-the-box dashboards we provide through the Grafana marketplace that you can download, and they give you visibility into the performance of your cluster. First, agent metrics cover everything on the node level: how the node is performing, how much throughput it is processing, how much memory BPF is using, and so on. Hubble metrics give you visibility across your cluster in terms of applications, layer 7 return codes, and policy verdicts, allows versus drops, so you can monitor the performance of your applications at the cluster level. And in some cases you run an operator, so you may want to track the number of identities, how the cluster in general is behaving, API responses and such.

Finally, what we released just a few days ago, thanks to Raphael who is also here, is the Cilium policy verdict metrics dashboard, which gives you meaningful graphs when workloads are actually hitting the network policies you set. What I mean by that is that when we work with customers on Cilium, they want to move to a micro-segmentation, zero-trust model, and you can obviously use Hubble to monitor service-to-service communication and see whether traffic is allowed or denied. But this dashboard is also a very useful tool to confirm whether you have ingress or egress policies that match your traffic. In this case we see green graphs, which means that on ingress and egress we have matching traffic.
The purple represents DNS-matching traffic, but if there's some yellow, that's traffic matched only by an allow-all rule, which is too broad and should trigger you to write tighter network policies, so that those flows are actually matched by a specific policy and both ingress and egress traffic are secured. If you do so, all the graphs turn green and you can confirm that the given namespace is secured.

All right, I've prepared a little demo. It runs the tenant-jobs application I mentioned before, on kind, so just a simple kind cluster on my laptop. I'm showing here the components of my application, a number of workloads that I've shown before on the screenshot. To help me through this demo I've created a little script; all it does is update the Helm chart for this application, which makes my workflow a lot easier since I don't have to type commands, but we should see things changing in a Grafana dashboard.

Before I start, let me highlight the metrics, so I need to log in. I've installed Grafana, Tempo and Prometheus, and configured Cilium to export those metrics. This is currently the performance of my application running on my kind cluster on my laptop: as you can see we have a 100% success rate, incoming is 100%, and we have good information in Grafana about the performance of the application.

Okay, so let me start the script. Yes, I mentioned before that the Hubble metrics become available as soon as you configure some kind of layer 7 Cilium network policy, because that triggers the collection of those layer 7 metrics; I'll show this better in a different window in a moment. What I'm going to do now is increase the request volume, so I'm configuring the crawler component to generate more requests in my application. As you can see, it's redeploying the crawler component, and this is something we should see in the Grafana dashboard. While it redeploys I can show the Helm chart I'm using. You need to be a bit patient with me, because it takes a minute before the Grafana dashboards are updated and you can actually see the impact of this new version of the application.

So, typically you configure Cilium through a Helm values file. In this case, on the operator component I've enabled metrics and Prometheus. On the Hubble side I've configured Hubble Relay, and enabled metrics and Prometheus as well. This part is very interesting, because if you want layer 7 visibility you need to enable specific metrics.
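In Helm-values form, that looks roughly like the following. This is a sketch based on the standard Cilium chart options for httpV2 metrics with exemplars and label context, not the exact file from the demo, and option strings may differ between Cilium versions.

```yaml
# values.yaml -- sketch of the metrics-related options described here
operator:
  prometheus:
    enabled: true              # expose operator metrics to Prometheus
prometheus:
  enabled: true                # expose cilium-agent metrics to Prometheus
hubble:
  enabled: true
  relay:
    enabled: true
  metrics:
    enableOpenMetrics: true    # OpenMetrics format is required for exemplars
    enabled:
      - dns
      - drop
      - flow
      # layer 7 HTTP metrics with exemplars and source/destination label context
      - "httpV2:exemplars=true;labelsContext=source_ip,source_namespace,source_workload,destination_ip,destination_namespace,destination_workload,traffic_direction"
```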
This will be documented on the cilium.io website once we have the new release ready. So as you can see, we are matching httpV2, we have enabled exemplars, and we are adding labels for context: source IP, source namespace, and so on. These are important sets of labels to set, and on the Prometheus side we've enabled it so we can build the graphs.

The Cilium network policy I mentioned before is just a simple example: we allow everything within the namespace. We have enabled DNS visibility, so we're inspecting all DNS traffic to kube-dns, which gives us visibility of the DNS queries. We've enabled both ingress and egress for the purpose of the demo, so we can see that traffic as well. And what's important is that we have an empty, open HTTP rule: it allows all traffic, but it triggers the collection of layer 7 metrics.

All right, back to the demo. It has deployed the new version of my application, and looking at my metrics I can see the incoming request volume increasing, so you already see an increase in volume. We also see requests by source and response codes increasing, still 200s, always fine, always good, just an increase in requests per second, and the same on the incoming side. Okay, all good.

Now let's deploy a new configuration of our app, and this one has an error, so let's see what we can see. It's redeploying the core API component, and now we should be able to see the error rate increase as a result of the core API configuration change. This will take a minute. Here I can select the destination workload, so I can switch between the core API or the loader component to see how the traffic for that destination is matching and how it's performing. Let's give it a few seconds to actually show up. What I'm looking for is the incoming request success rate: obviously this application version has an error, so the success rate will be lower than before. It's running. It always takes a bit longer than I'd like. There we have it. In the meantime I'll already start the next step, so I don't keep you waiting.

So here you see that the success rate is decreasing because of this faulty version of my application, so I can really see there's something wrong with my application, and as the application developer, or as the owner of this namespace, I should now investigate what's going on. It also means that here, on incoming requests by source and response code, I would see the resumes component returning 500 and 503 error codes, which triggers me to check that component and the communication between those components. The same on the destination side. All right. So now I've introduced another new version, and what this one does is change the request duration.
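As a reference for the policy described a moment ago, here is a rough sketch of what such an L7 visibility policy can look like. It assumes HTTP on port 80 and a namespace named tenant-jobs; it is not the exact policy from the demo.

```yaml
# Sketch: allow everything within the namespace, but route DNS and HTTP
# through Cilium's layer 7 proxy so Hubble can collect L7 metrics.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-visibility
  namespace: tenant-jobs          # assumed namespace name
spec:
  endpointSelector: {}            # applies to all pods in the namespace
  ingress:
    - fromEndpoints:
        - {}                      # any pod in the same namespace
      toPorts:
        - ports:
            - port: "80"          # assumed application port
              protocol: TCP
          rules:
            http: [{}]            # empty rule: allow all HTTP, enable L7 metrics
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*" # DNS visibility: inspect all queries
    - toEndpoints:
        - {}                      # any pod in the same namespace
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http: [{}]
```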
So again, a new version of the application, and let's see how we can monitor its performance in Grafana. Let's check for the request duration increase as a result of the core API configuration change. Okay, so let's look here: I'm monitoring HTTP request duration by source and destination, so if the demo works well we'll see an increase there. Okay, it's taking a bit longer than I'm comfortable with, but it should appear any moment. Let me just... In the meantime I will deploy a new version of the application which also introduces tracing. Again, at this moment your application needs to support tracing, so in this case I'm deploying a new version of the application that supports it, using OpenTelemetry. Let's deploy that in the meantime. That's deploying. Meanwhile I can check how the request duration is doing. Okay, this part is not working yet, but we should see a request duration increase. Oh, thanks, that doesn't help, I clicked on something. Ah, yes, thank you so much. Yes, here you can see the request duration increasing.

And I just deployed a new version of my application which supports tracing using OpenTelemetry, and you can already see these exemplars appearing. So now I can not only inspect HTTP request duration, I can also inspect specific traces through exemplars. If you click on one of these little boxes you get this window with valuable information about the trace point, and then you can query it with Tempo. Let's leave this aside. Here you can see a specific trace ID, and you can see a node graph, which is also nice: you can see what's going on between nodes, and whatever has high latency is highlighted in red. And here we can see that this specific API call has an error: a POST call with an exception event, a random error. So something is wrong with my application, and this enables me, as the application owner, to troubleshoot it effectively. So that concludes the demo.

Let me quickly move back here. All right, so if you want to know more about how to configure Cilium to enable metrics, and how to configure Cilium with the right values for layer 7 monitoring, I recommend reading the documentation on cilium.io. If you're using Cilium or planning to use Cilium and you have questions, come to our Slack channel; we're happy to help you there, and the community is out there and very helpful in answering questions. If you want to know more about eBPF, go to ebpf.io. We have also released, or are close to releasing, a lab with these kinds of dashboards as well, so feel free to check it out at isovalent.com.
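A side note on the exemplar-to-Tempo jump shown in the demo: that link is configured on the Grafana side, in the Prometheus data source. A minimal provisioning sketch, with assumed in-cluster service names, UIDs and exemplar label, might look like this:

```yaml
# Grafana data source provisioning -- sketch; URLs, UIDs and label name are assumptions
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus.monitoring.svc:9090
    jsonData:
      exemplarTraceIdDestinations:
        - name: traceID          # exemplar label assumed to carry the trace ID
          datasourceUid: tempo   # jump from the exemplar straight into Tempo
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo.monitoring.svc:3100
```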
And if you want to know more about Isovalent, or maybe you want to contribute, we also have open positions in engineering, so please check us out. I'm happy to take questions. Any questions?

Hello, thank you for your talk. Is it possible, in the service graph of the Hubble UI, to show transitive dependencies of services when tracing is enabled?

So with the Hubble UI, the open source version, you will see the service connectivity only; that trace-related information is not integrated in Hubble as such, so you would switch between Hubble and Grafana to get that information. In the enterprise edition we have built-in dashboards for specific areas of monitoring, let's say application, node or cluster-wide performance, which should quickly highlight performance issues there. Okay, thank you.

Any other questions? Hello. Did you measure the impact of collecting these metrics on network performance?

Yes, we do have some performance-related reports on cilium.io. So yes, it comes with a price; using eBPF we keep it as low as possible. It's a very hard question to answer precisely, because it depends on which flags you configure: if you have full layer 7 visibility across all workloads in your cluster, of course it will have some performance impact. But it's a multi-dimensional question; it depends on the amount of traffic, the number of applications, how big your cluster is, and so on. We have some performance reports you can check, for example 500 nodes, 1,000 network policies, Hubble enabled, so you get a feel for memory consumption and processing with Cilium. Feel free to check them out on the cilium.io website, but in practice it's a multi-dimensional story. You're welcome.

Any other questions in the back? Hi, thanks for the talk. A couple of questions about the integration of Cilium on AKS and GKE: is there anything specific regarding those implementations, or do all the tools work natively on those kinds of clusters? And a second question regarding the Hubble UI: is it possible to see inter-namespace traffic flows, or is it limited to intra-namespace?

Okay, good questions. To answer the second question first: yes, you can see that. If there is communication from a different namespace towards the namespace you're monitoring, you will see it, you will see those labels and those workloads, even across clusters if you enable Cluster Mesh. So yes, that works out of the box.
On the cloud provider side: if you run AKS with Azure CNI Powered by Cilium, you have a limited set of metrics enabled, and that's obviously for support reasons, so Microsoft can support that solution out of the box. However, you can also choose to bring your own CNI on AKS, and the same applies to GKE and EKS, and manage Cilium yourself. Then you have the freedom to configure the flags I've just shown and to configure Cilium as you like. Keep in mind that you are then responsible for monitoring and managing Cilium, while the cloud provider manages the cluster.

Any other questions? Cool. We have to cut it here. Oh, yes, sorry. Okay, thank you very much. Thank you.