So, up next, we have Christos and Alex with "Unifying Observability: The Power of a Common Schema".

Okay, thanks everyone, and welcome to our talk. In this presentation we will talk about the convergence story of two schemas: OpenTelemetry's semantic conventions and the Elastic Common Schema. But let's first introduce ourselves. My name is Alex. I'm leading the OpenTelemetry initiative at Elastic, and I'm a co-maintainer of the OpenTelemetry semantic conventions project.

Hi, I'm Christos. I work at Elastic as well. I'm a software engineer focusing on observability, and specifically OpenTelemetry, where I am a contributor and an approver on the semantic conventions project.

Okay, we would like to start with a quite simple question. How many of you know exactly what OpenTelemetry is? That's great. I can skip some slides later. How many of you know what semantic conventions are about? That's what I expected. And how many of you know what the Elastic Common Schema is? Okay, thanks everyone. So let's dive a bit into the history of open source tools and standards in observability, to give us a picture of where the standards come from. Let me... Okay. No. Does that work? Okay, do you hear me? That works well. Okay.

A bit more than 10 years ago, when microservices emerged, that also changed the observability market and industry. That's when big tech companies started building their own open source tools for collecting observability data. So tools like Zipkin and Jaeger for distributed traces emerged, the ELK stack for logging, and Prometheus for metrics. We heard a lot about this in previous talks. Based on these de facto standard tools, actual standards then emerged: OpenTracing and, later, OpenCensus for distributed tracing (OpenCensus also covered metrics), and OpenMetrics as a derivative of the Prometheus format. And Elastic has its own ECS, which defines the semantics of structured logging data.
Since we will talk a bit more about ECS, here is a quick introduction to what it is. ECS stands for the Elastic Common Schema, and it's basically just a definition of a set of fields that describe the semantics of structured logging data. For example, if you're collecting a service name with your observability data, the common schema tells you to put this value into a field called service.name, not app.name or application.name. So you have common names that you can search for later on, and this also allows you to correlate data across different signals.

Now, as you can see, we already have at least four standards here that are partially competing and partially complementary. Plus we have all the tools that also create some de facto standards for collecting data. So it's ridiculous to have so many standards, right? We need one more that covers all of them. And what usually happens is that we end up with one more that is competing with all the others. And yes, we have one more standard for observability: OpenTelemetry. We will come back to the comic later.

This is the slide that I can skip, based on the poll. OpenTelemetry provides not just a standard but a full ecosystem and framework for observability: for collecting data and sending it over a protocol. One thing that I want to highlight here: there is a specification in OpenTelemetry that defines what data you can collect, like traces, metrics, and logs. An OpenTelemetry working group is also working on a profiling signal. What we will talk more about in this presentation is the semantic conventions. Semantic conventions are very similar to what I've shown for ECS: they basically define attribute names and their semantics. Let's look at a concrete example of what the data structure in OpenTelemetry looks like, here with some logging data. This is a very simplified view; it's a bit more complex in reality. But let's say we have a set of log records, right?
The OpenTelemetry protocol, OTLP, defines the core structure of that signal with fields like severity text, which is basically the log level, and body, which is basically the log message. In addition, you can collect additional context information with your observability data. This is usually represented in so-called attributes, and that's where semantic conventions come into play. The semantic conventions define which attributes exist, their names, their types, and the semantics behind them. For example, if you're collecting an HTTP access log and you want to capture the HTTP request method, this is the attribute name that you would use for it. Now, observability data is usually also captured in a broader context, for some resource like a concrete service, a host, or other resources. That's why OTLP wraps the actual observability data into a resource wrapper, and a resource again has a set of attributes, so-called resource attributes, that describe the resource: something like the service name, host name, and so on. So this is the structure in OpenTelemetry for collecting observability data, and semantic conventions are basically just about the attributes and their meaning in this data.

Now let's come back to our timeline of standards. There's one important thing I didn't mention before. OpenTelemetry, and we heard this in the previous talk, is actually the result of a merger between OpenTracing and OpenCensus. OpenTelemetry also supports Prometheus metrics and OpenMetrics, which we have heard about in some of the previous talks, and just last year Elastic also announced the donation of ECS to OpenTelemetry. So, coming back to this, the question is: do we really have one more competing standard? I would say actually not. With OTel we have fewer competing standards, and OTel really succeeds in reducing the number of competing standards and becoming the one single standard for observability.
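As an illustration, the resource-wrapped log structure described above can be sketched in Python. This is a minimal sketch and not the actual OTLP wire format, which has more levels and typed values. The class names ResourceLogs and LogRecord, the helper with_resource_context, and all sample values are assumptions made for this example; service.name, host.name, and http.request.method are attribute names from the semantic conventions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class LogRecord:
    severity_text: str  # basically the log level
    body: str           # basically the log message
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ResourceLogs:
    # Attributes describing the resource (service, host, ...) that emitted the logs.
    resource_attributes: Dict[str, Any]
    log_records: List[LogRecord] = field(default_factory=list)


def with_resource_context(resource_logs: ResourceLogs) -> Iterator[Dict[str, Any]]:
    """Yield one dict per log record, enriched with the shared resource attributes."""
    for record in resource_logs.log_records:
        doc = dict(resource_logs.resource_attributes)
        doc.update(record.attributes)
        doc["severity_text"] = record.severity_text
        doc["body"] = record.body
        yield doc


# Sample data: values are made up for illustration.
access_logs = ResourceLogs(
    resource_attributes={"service.name": "checkout", "host.name": "web-01"},
    log_records=[
        LogRecord("INFO", "GET /cart 200", {"http.request.method": "GET"}),
        LogRecord("ERROR", "POST /pay 500", {"http.request.method": "POST"}),
    ],
)

docs = list(with_resource_context(access_logs))
for doc in docs:
    print(doc)
```

Every record carries its own attributes, while the resource attributes are stated once and shared by all records under that resource.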
Now, as I said before, Elastic announced the donation of ECS to OTel's semantic conventions project. Why? Because there are great benefits to this. First of all, there are complementary parts and strengths in both schemas that we now merge into one single schema. And second, by merging we bring two different communities together, which provides a bigger network effect. So I think it's a huge win for the community. But there are not only benefits, there are also challenges, right? First of all, the overlap between the two schemas is a potential source of schema conflicts, and resolving these conflicts might mean that we need breaking changes in one schema or the other.

We have seen the structure of observability data in OpenTelemetry, which consists of the protocol with its nested structure plus the semantic conventions. That is quite different from how ECS defines fields, because ECS is just a plain definition of fields without nested structures. Resolving that difference is a bit of a challenge.

Another interesting thing that we discovered when we started merging ECS is that in OpenTelemetry, before the merger, attributes were often defined in a concrete context. For example, we have here an HTTP server span, and the attribute http.route is basically defined under the semantic conventions for HTTP server spans. The problem comes when I want to use the same attribute in a different context, let's say HTTP access logs. There was always a way to just reference the other attribute, but it feels sort of weird, because in one context it is a first-class attribute and in the other it is just a reference that overrides some semantics.
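One way to avoid this asymmetry is to define each attribute exactly once and let every context only reference it. The following is a minimal sketch of that idea, not the format the semantic conventions project actually uses: AttributeDef, REGISTRY, resolve, and the two convention lists are hypothetical names, and the brief texts are paraphrased for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AttributeDef:
    name: str
    type: str
    brief: str


# Single place where attributes are defined (hypothetical structure).
REGISTRY: Dict[str, AttributeDef] = {
    "http.request.method": AttributeDef(
        "http.request.method", "string", "HTTP request method."
    ),
    "http.route": AttributeDef(
        "http.route", "string", "The matched route (path template)."
    ),
}

# Each context only references attribute names; none of them owns a definition.
HTTP_SERVER_SPAN: List[str] = ["http.request.method", "http.route"]
HTTP_ACCESS_LOG: List[str] = ["http.request.method", "http.route"]


def resolve(convention: List[str]) -> List[AttributeDef]:
    """Look up every referenced attribute; raises KeyError for unknown names."""
    return [REGISTRY[name] for name in convention]


span_attrs = resolve(HTTP_SERVER_SPAN)
log_attrs = resolve(HTTP_ACCESS_LOG)
print(span_attrs == log_attrs)
```

Both contexts resolve to the same definitions, so neither the span nor the log convention is the "owner" of http.route.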
So, learning from ECS, what we already achieved with the merger is that we now have a dedicated attributes registry in OpenTelemetry. It serves the purpose of just defining attributes with their types and their meaning, and in the different semantic conventions and their use cases we are just referencing those attributes. So we have a clear separation between defining attributes and using them in a concrete context.

And finally, another challenge is metrics. Metrics in OpenTelemetry follow the TSDB model. So we have a concrete metric name, like system.disk.io in this case, with a type and a unit, and we have a set of dimensions modeled as attributes; in this case direction, for example, with read or write for disk I/O. In ECS, metrics were previously modeled basically as numerical fields on documents, and you can have multiple numerical fields in a document, so you can have multiple metrics. That's the reason why some of the dimensions that we have in OpenTelemetry are often just encoded into the metric name on the ECS side. So we have things like disk read bytes or disk write bytes. This is quite a big difference in modeling. This is a case where we are basically learning from OpenTelemetry and adopting this at Elastic, now that Elasticsearch also supports TSDB. So we are learning from both sides, which is a great thing, and we are coming to the best possible solution for the community. And Chris will tell you how this merger is actually happening in practice.

Thank you. Can you hear me? Okay. So, as Alex mentioned, there are a lot of things going on, so the question is: when is it time to celebrate that the merger has been completed? The truth is that we are not there yet. There are things that need to be done. Actually, everyone believed in the beginning that once the merger was announced, that was all; I mean, that there was nothing left to add. But the truth is that the actual work started right after the merger was announced.
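Returning to the metric modeling difference Alex described, the two styles can be contrasted in a short Python sketch. It is only an illustration under stated assumptions: the flat field names disk.read.bytes and disk.write.bytes, the mapping table, the function to_data_points, and the sample values are hypothetical, and the exact key of the direction attribute in the current semantic conventions may differ from the plain "direction" used here.

```python
from typing import Any, Dict, List, Tuple

# Document style: the dimension (read/write) is encoded into the field name.
# Field names and values are illustrative.
flat_doc: Dict[str, int] = {"disk.read.bytes": 1024, "disk.write.bytes": 2048}

# Hypothetical mapping: flat field name -> (metric name, dimension attributes).
FIELD_TO_METRIC: Dict[str, Tuple[str, Dict[str, str]]] = {
    "disk.read.bytes": ("system.disk.io", {"direction": "read"}),
    "disk.write.bytes": ("system.disk.io", {"direction": "write"}),
}


def to_data_points(doc: Dict[str, int], unit: str = "By") -> List[Dict[str, Any]]:
    """Turn flat numeric fields into data points of one metric with dimensions."""
    points = []
    for field_name, value in doc.items():
        if field_name not in FIELD_TO_METRIC:
            continue  # not a metric field we know how to map
        metric_name, attributes = FIELD_TO_METRIC[field_name]
        points.append(
            {
                "name": metric_name,
                "unit": unit,
                "attributes": dict(attributes),
                "value": value,
            }
        )
    return points


data_points = to_data_points(flat_doc)
for point in data_points:
    print(point)
```

In the TSDB-style result there is a single metric name, and read versus write is a queryable dimension instead of part of the name.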
So let's see some examples of how the merger is happening and how things are moving forward. I have some real examples here from the upstream repository on GitHub, with issues and pull requests. This one, for example, is trying to add some new resource attributes for container images, specifically the digest of the image. As we can see, that PR was filed on the 4th of July, I think, yes, and it took some time to get it in, right? It took us many review cycles, with more than 20 blocker comments actually, so lots of back and forth and lots of discussions, but that one was actually merged after almost two months.

Another example is about a very important attribute, the IP of the host, host.ip as we call it. This one was really unique and really interesting, because this PR was filed by a non-ECS contributor. That contributor worked for a company that is, I would say, completely unrelated to the ECS project. It was quite nice, because in that case the existence of the ECS project was taken into account, there were very interesting conversations, and it took us almost three months to get it in.

So it's quite obvious from these examples that the merger was not something trivial or straightforward that can happen from one day to the next, for example by writing a script that transfers everything from one project to the other. So we decided to take the approach of moving, let's say, not so fast, paying attention to the details, and having the right people work on specific areas so as to leverage their expertise. That way we can be sure that what we are merging upstream into the final project, which is the semantic conventions of OpenTelemetry, will stay there and that everyone will be happy with it in the future. So these are, more or less, the areas of the semantic conventions.
We have areas for databases, cloud, containers, Kubernetes, HTTP, system metrics, system resource attributes, and many others. We have started focusing on specific areas. One example is the effort on the system metrics area: we have a working group there focusing on the stability of the area. We are in a really good position now, and we are moving towards stability really soon. The same goes for the process namespace, that is, the process area and the process resource attributes. The same goes for the container area: we are close to achieving 100 percent convergence there, and there is an ongoing PR that will add the final attributes, final metrics, excuse me. The same goes for the HTTP and network areas, where we have good coverage. The HTTP semantic conventions were declared stable really recently, so we are now adding on top of that, which is quite nice. And we have work in progress in the database, mobile, cloud, and Kubernetes areas, so we have working groups getting started and focusing on these areas.

Over the past months we have been focusing on making the project as good as possible in a community-driven way. So we, as ECS contributors donating this project, are not only focusing on the merger itself; we also want to ensure that the semantic conventions project will be there and can serve us in the future. So we are focusing on other things as well, like improving the tooling of the project and working on the guidelines. This is quite important, because many times the guidelines of one project are in conflict with the guidelines of the other. In that case we need to take a step back, reconsider the guidelines, and see what we want to have there as a final result. We are also working on restructuring the project. Before, the semantic conventions within the project were grouped by signal: logs, metrics, traces, and so on. Now we have a better organization there, and we group the attributes by topic. And, as Alex
mentioned already, we have introduced the global attributes registry. It's actually a very big list with all the attributes, and within the actual specification you can reference the attributes from there, so that's quite useful. We're also working on adding a new concept from ECS, which is attribute nesting, or reusing namespaces. That means that if you have a namespace, for example os.whatever, you can nest it, attaching it as it is under the host namespace for example, and you don't need to redefine it. So these are some examples from upstream. Most of them are closed, and some of them are really close to being completed but have some small blockers. But the work is moving forward, that's the point.

Now, how is the community organized around this? As I mentioned before, we want to have the right people working on specific areas and leveraging their expertise, so we have working groups for each area. We are trying to first declare the areas of the semantic conventions as stable, which means that all the semantic conventions we have there will be stable, and then we can use them in the actual implementations. So the next step is to tune the implementations accordingly, which essentially means the OpenTelemetry Collector and the language SDKs. Some examples: the system metrics working group, the working group around databases, and a security semantic conventions working group, which is getting started now. We also have approver areas for mobile, containers, Kubernetes, and many others that I don't mention here.

The process looks like this. When you want to create a working group or a specific project, you first propose the working group area and mention what issues you want to work on. Then you will have people expressing their interest in joining this effort, and you will need to find a sponsor from the technical committee. Once
everything is decided, we have a specific project board, we have regular meetings, and we have people getting assigned to the issues. That's how the work happens.

Regarding the merger itself, technically it happens like this. When we have to either introduce some new fields, some new semantic conventions, or move something from ECS to the semantic conventions of OpenTelemetry, we first check, obviously, what we have in these two projects. We also check what the implementations have so far, essentially the OpenTelemetry Collector or the SDKs, because there are cases where, for example, the Collector already uses some metrics or some resource attributes that are not yet part of the semantic conventions of OpenTelemetry. So we also check what is there, because we might find something interesting that we can use. Once we have considered everything and have a final proposal, we raise an issue or a pull request directly, and we start the discussion within the community. We particularly focus on minimizing the breaking changes, because, as you can imagine, we want to avoid bringing frustration to our users on both sides. So that's a really important thing to consider. We go through the review process, and once we have a conclusion, we merge. Then of course we need to handle the breaking changes, because they are there most of the time.

The summary for today is that the merger is happening. Feel free to join us; contributors are more than welcome. Everything happens upstream, so if you are interested, please join, and you will find that you can have real impact from day one. The goal for everyone is to make the semantic conventions of OpenTelemetry the one unique and straightforward standard for observability and security that will be there for the future. So,
with that, you can find us on the CNCF Slack channels or through our GitHub handles. There are also some project meetings: on Mondays we have the semantic conventions working group meeting; at the same hour the next day, Tuesdays, we have the specification SIG meeting; and on Thursdays we have the system metrics working group at 5:30 central time. And with that, I think we're out of time. Do we have any questions?

Hi, thank you for the talk. This was really interesting and clarified some things for me. I have one question: what are the benefits of these semantic conventions in terms of the front-end tooling that we are using? I know there's this idea in the OpenTelemetry project that you have semantic conventions and common attributes for different signals, and then we collect all this data from all these different signals in some observability tools. I imagine that in the front end we could automatically correlate different signals if we have these common attributes. I'm not up to date with the current state of this area, so this is my question: what are the main benefits of following these semantic conventions?

I would say there are two, actually. One is that OpenTelemetry is an open source standard, right, and there are many vendors adopting it, so we need common semantics for what the data represents in order to build higher-level features on top. This is the first thing. The other one is correlation, as you already mentioned, across different signals, for example through the resource attributes, so you can basically drill down from different signals into the same resource. So I would say these two things. And there is also cross-signal correlation not only through resources but through things like the trace ID, to have it on both logs and traces, and later maybe in profiling data, this kind of thing.

Okay, thank you. So are you doing something like that at Elastic, in the front end, at the moment? Is
there any work going on in this area, like correlation of different signals?

Yeah, of course. I think that's the goal for every observability vendor, to bring all these different signals together.

Okay, great, thank you very much.

Any other questions? Going once... Okay, cool. Then a big applause. Okay.