Hi everyone, my name is Robert Toyonaga and I work at Red Hat. Today I'll be talking a little bit about JDK Flight Recorder in GraalVM Native Image; from now on I'll just refer to JDK Flight Recorder as JFR.

As a high-level breakdown, I've broken this presentation into two sections. The first section is a high-level overview of JFR in Native Image, and then we'll go into a low-level deep dive of JFR in Native Image and talk about some comparisons between Substrate VM and HotSpot. I want to note that even if you're not interested in GraalVM Native Image at all, you may still be interested in the second half of this presentation, because the JFR details we'll be talking about there extend beyond just Native Image and apply to HotSpot more generally as well.

As a very quick refresher, JFR is an event-based monitoring and profiling tool. It's built directly into the JDK, and it can give you some really valuable insights into what your application is doing, both at a high level and at the VM level.

Foivos already talked about this a little bit, but GraalVM Native Image is essentially a technology that allows you to convert your Java applications into binary executables. The appeal is that you get much faster startup and use fewer resources, and a big reason for that is that you don't have to warm up a traditional JVM alongside your application code. How it works is that you compile your Java application to bytecode like you normally would, and then you run the native-image tool to convert that bytecode into an executable which you can later run.

So why is JFR different in Native Image than in OpenJDK? The reason is that a native image executable doesn't require a traditional JVM to run. However, it still requires certain runtime components that your Java code expects, such as GC and synchronization constructs like monitors, and what provides those in Native Image is something called Substrate VM, which you can think of as a scoped-down replacement for HotSpot. It does a lot of the things your Java code requires, but strips out a lot of the dynamic stuff HotSpot does that we don't really need in this environment. The key point is that since a lot of the JFR code is embedded within HotSpot, when we bring JFR over to Native Image, where we're using Substrate VM, it has to be re-implemented in that VM instead. That involves everything from the low-level JFR event instrumentation to the actual infrastructure that carries the JFR data from the point of instrumentation to the point where it's later consumed by a user.

In terms of the current state of JFR support in Native Image: you can start and stop recordings from the command line or from within your application code via the Recording API. Several events are implemented, especially at the VM level; we have events for threads, monitors, allocations, GC, safepoints, etc. You can dump snapshots to disk and inspect them with tools such as VisualVM or JDK Mission Control, as you normally would. The custom event API is also working, so you can create your own custom application-level events. Stack traces and CPU profiling are also possible, and event streaming has recently been added as well. You can even connect via remote JMX to the FlightRecorderMXBean, which practically means you can interact with JFR recordings from within the JMC UI, starting and managing them on the fly.
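As a quick illustration of the Recording API and the custom event API just mentioned, here is a small example using the standard jdk.jfr package. The event name and fields are made up for the example, but the API usage is the same whether you run it on HotSpot or in a native image:

```java
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;

// A custom application-level event. The name and fields here are
// hypothetical, chosen only for this example.
@Name("com.example.HttpRequest")
@Label("HTTP Request")
class HttpRequestEvent extends Event {
    @Label("Path")
    String path;

    @Label("Status Code")
    int status;
}

public class JfrExample {
    public static void main(String[] args) throws Exception {
        // Start a recording programmatically via the Recording API.
        try (Recording recording = new Recording()) {
            recording.start();

            // Emit one instance of the custom event.
            HttpRequestEvent event = new HttpRequestEvent();
            event.begin();
            event.path = "/hello";
            event.status = 200;
            event.commit();

            // Stop and dump a snapshot to a .jfr file, which can then be
            // opened in JDK Mission Control or VisualVM.
            recording.stop();
            recording.dump(Path.of("recording.jfr"));
        }
    }
}
```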
How you might first interact with JFR in Native Image: at build time you specify the enable-monitoring flag, requesting JFR specifically, and that builds the JFR components into your executable. Then at runtime you can use the normal StartFlightRecording option and pass all of the usual parameters, such as a file name to dump the recording to, a duration, and so on.

There are still quite a few limitations to JFR in Native Image. Not all events are implemented yet; it's an ongoing effort to keep up with OpenJDK in that area. Specifically, events relying on bytecode instrumentation are not yet supported, and of course new JDK events keep appearing, so we're trying to keep pace with those as well. Event streaming doesn't yet support stack traces, so that's one limitation there. And we have a couple of things in the review pipeline that are not yet available in any release.

With that said, we've reached the deep dive, which is going to take up the majority of the presentation. So let's take a deep breath. This roadmap essentially represents a very high-level, zoomed-out view of the flow of JFR data through the system. From now on, each slide is going to contain this roadmap, and the highlighted part will indicate the part we're currently talking about, just for convenience and easy reference.

Firstly, the points of instrumentation. These are the various points where JFR events are emitted, at either the application level or the VM level. The screenshot on the slide is just from JDK Mission Control; I'm using it to show the content an event may contain. You can see there are a bunch of fields and corresponding values, and this is just one example; it will vary by event. You can think of JFR events as the primary thing we're concerned with, really. The rest of the slides are basically just about the piping that gets that JFR data from the point of instrumentation to the chunk file, where it can be consumed later.

Speaking of chunk files, we're jumping all the way to the end of the roadmap. Chunk files are essentially the resting place of the JFR data, as far as this presentation is concerned. They must contain the same information in the same format regardless of whether OpenJDK or Native Image is generating them, and they can be dumped to snapshots (the .jfr file format), which is usually how people interact with them, via JMC, VisualVM, or the jfr command-line tool.

Chunk files are self-contained, and they have four distinct sections, as you can see in the diagram: a header, which contains pointers and other metadata; the event data section, which contains the core JFR event data; the metadata section, which describes the format and layout of the events in the event data section; and the constant pools, which contain constants referenced from the event data section.

About those constants: in order to reduce the size of JFR data, we use a referencing-ID scheme to increase compactness. How this works is that entries in the event data section of the chunk file use unique IDs to reference into the constant pool section of the chunk file. This helps with deduplicating the actual constants used by the JFR events.
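To make the referencing-ID scheme concrete, here is a simplified sketch of the idea. This is just an illustration, not the actual HotSpot or Substrate VM implementation; it shows how each distinct constant gets a compact ID that event entries can store, while the constant itself is persisted only once:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration of the constant-pool idea: events store a
// small numeric ID, and the constant itself is written to the chunk's
// constant pool section only once, no matter how many events reference it.
class ConstantPoolSketch {
    private final Map<String, Long> ids = new LinkedHashMap<>();
    private long nextId = 1;

    // Deduplicate: the same constant always maps to the same ID.
    long idFor(String constant) {
        return ids.computeIfAbsent(constant, c -> nextId++);
    }

    // When the chunk is written, each distinct constant is persisted once.
    void writePool() {
        ids.forEach((constant, id) ->
                System.out.println("pool entry " + id + " -> " + constant));
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        // Two events referencing the same thread name share one pool entry.
        System.out.println("event A threadId=" + pool.idFor("worker-1"));
        System.out.println("event B threadId=" + pool.idFor("worker-1"));
        pool.writePool();
    }
}
```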
On this slide you can see an example of one event entry, which uses the unique ID 12; that ID is then used to index into the thread constant pool and reference the actual thread data residing there. All of this increases the compactness of the JFR data, which reduces overhead when dealing with it while it's in flight and when writing it to disk, and it reduces the overall chunk file size as well. However, the downside of this increased compactness and the referencing-ID scheme is a tight coupling of the event data and the constant pool data: if they're ever separated and not found in the same self-contained chunk file, we can't decode the event data section and it's basically unreadable. So that's one downside.

Right, so now that we've talked about the very beginning and the end of the roadmap, we'll jump back and fill in the middle. After event emission, the JFR data splits: the core event data goes to the JFR thread-local buffers, while the constant data goes to the constant pools. In both HotSpot and Substrate VM, the JFR thread-local buffers have essentially the same purpose and the same structure. They're structured in a segmented way that allows for concurrent writing and reading of data, and there are various pointers that define the sections. There's the write position pointer, which determines where new data is written into the buffer; when an event write is in progress, that's the pointer in use. Then there's the committed position pointer, which marks the end of the committed data section, meaning data that has been fully written (so it's not an in-progress write) but hasn't migrated anywhere else yet. The flushed data section is committed data that has already been migrated somewhere else, so it can be overwritten at the earliest convenience. Eventually the buffers fill up with committed data and have to be flushed elsewhere, and at that point all the pointers reset back to the start position.

HotSpot is a little different in that it uses buffer pools to recycle buffers. There's a live list and a free list, and when a new thread requires a thread-local buffer from JFR, one is taken off the free list and put on the live list, and vice versa when that thread goes away. In Substrate VM we have it a little simpler: we just allocate a thread-local buffer in native memory when it's required, and when the thread goes away we free that memory, so we don't have to manage access to buffer pools and maintain them. In the case of virtual threads, multiple virtual threads may share the same thread-local buffer of the carrier thread, and that's not really an issue, because each one has exclusive access at any point in time and the JFR data is eventually going to the same place anyway.

After the thread-local buffers fill up, the data is migrated to a set of global buffers. The global buffers essentially act as overflow storage, and that's more efficient than increasing the size of all the thread-local buffers, because not all threads will be equally busy with respect to JFR events.
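To make the pointer scheme a bit more concrete, here is a toy model of a segmented thread-local buffer. The real buffers live in native memory and the layout details differ; this sketch only shows how the three positions move on write, commit, and flush:

```java
// Toy model of a JFR thread-local buffer with the three positions
// described above. Real buffers live in unmanaged native memory;
// this only illustrates how the pointers move.
class ThreadLocalBufferSketch {
    private final byte[] data = new byte[4096];
    private int flushedPos = 0;   // data before this was already migrated; reusable
    private int committedPos = 0; // end of fully written (committed) event data
    private int writePos = 0;     // where an in-progress event write appends

    // Append bytes of an in-progress event write.
    boolean tryWrite(byte[] payload) {
        if (writePos + payload.length > data.length) {
            return false; // buffer full: data must be flushed to the global buffers
        }
        System.arraycopy(payload, 0, data, writePos, payload.length);
        writePos += payload.length;
        return true;
    }

    // Commit the in-progress write: the event is now complete.
    void commit() {
        committedPos = writePos;
    }

    // Migrate committed-but-unflushed data elsewhere (e.g. the global
    // buffers), then reset all positions so the space can be reused.
    byte[] flush() {
        byte[] out = java.util.Arrays.copyOfRange(data, flushedPos, committedPos);
        flushedPos = committedPos = writePos = 0;
        return out;
    }

    public static void main(String[] args) {
        ThreadLocalBufferSketch buf = new ThreadLocalBufferSketch();
        buf.tryWrite(new byte[]{1, 2, 3}); // in-progress event write
        buf.commit();                      // event complete
        System.out.println("flushed " + buf.flush().length + " bytes");
    }
}
```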
Right, so constant pools. We mentioned earlier that constant pools use a referencing-ID scheme to reduce the size of JFR data, and that essentially works by deduplicating constants. In HotSpot, one way the deduplication works is by using JFR-specific bits in the metaspace data for certain constant types, such as klass (with a "k") and methods. These JFR-specific bits act essentially as boolean toggles: when event data in a JFR thread-local buffer somewhere references a constant, that bit in the constant is flipped to indicate it's referenced. That way, when it's time to actually persist the constants to disk, we only have to persist the ones that are actually referenced, not all of them. Additionally, if multiple events reference the same constant, the bit is only flipped once, so the constant only needs to be written once; that's where the deduplication happens. There are some constant types, such as stack traces, that don't have metaspace data, and in those cases a lookup table is used for the deduplication and tracking instead. An interesting thing is that in Substrate VM there is no metaspace at all, so we have to rely on the lookup-table approach for all the various constant types.

After enough JFR data has been generated, a chunk rotation must be requested; this is essentially how JFR data is continually persisted to disk. The current chunk file on disk is sealed and a new chunk file is opened, and in that process all the in-flight, in-memory data is flushed to the chunk file before it's sealed. The thread performing the chunk rotation must flush the thread-local buffers of other threads, and to do that safely we have to request a safepoint.

The order of operations at a chunk-rotation safepoint is shown on the slide, and I want to note that it's pretty similar in OpenJDK and in Substrate VM. The recording time between chunk-rotation safepoints is called an epoch. You can see in the green safepoint box that that's where we actually flush the JFR buffers, both local and global, to disk. But the most interesting thing here is that we write the constant pools to disk outside the safepoint, when we've already started epoch 2. What that means is that we're simultaneously writing the constants from epoch 1 to disk while recording constants belonging to epoch 2, so they're mingling inside the constant pools and we need to keep them isolated. We want to avoid writing constants belonging to epoch 2 into the chunk file for epoch 1; otherwise we'd have a mismatch and wouldn't be able to decode the event data for epoch 2, the same issue I explained a few slides back. So what we do is tag each constant according to its respective epoch to keep them isolated. The moral of the story is that this lets us reduce safepoint pause time by writing the constant pools outside the safepoint. Another way we reduce safepoint pause time is by having a dedicated JFR thread flush the global buffers to disk periodically throughout the epoch, so that work isn't happening in the safepoint either; there's less work to be done when we actually stop the world.
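Extending the earlier constant-pool sketch, here is a rough illustration of the epoch-tagging idea. This is again a toy model, not Substrate VM's actual code, and it omits details such as re-registering a constant that gets referenced again in a later epoch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of keeping constants from two epochs isolated in one pool.
// Entries are tagged with the epoch they were registered in, so flushing
// epoch N to disk (which happens while epoch N+1 is already recording)
// writes only the entries belonging to epoch N.
class EpochTaggedPoolSketch {
    record Entry(long id, long epoch) {}

    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private long nextId = 1;
    private long currentEpoch = 1;

    // Note: a real implementation would also re-register a constant when
    // it is referenced again in a later epoch; omitted here for brevity.
    long idFor(String constant) {
        return entries.computeIfAbsent(constant,
                c -> new Entry(nextId++, currentEpoch)).id();
    }

    // Called after the chunk-rotation safepoint: start the next epoch,
    // then persist only the previous epoch's constants.
    void rotate() {
        long previous = currentEpoch++;
        entries.forEach((constant, e) -> {
            if (e.epoch() == previous) {
                System.out.println("epoch " + previous + " pool: "
                        + e.id() + " -> " + constant);
            }
        });
    }

    public static void main(String[] args) {
        EpochTaggedPoolSketch pool = new EpochTaggedPoolSketch();
        pool.idFor("worker-1"); // registered during epoch 1
        pool.rotate();          // start epoch 2, persist epoch 1's constants
        pool.idFor("worker-2"); // registered during epoch 2
        pool.rotate();
    }
}
```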
One related note on safepointing is the question: can a chunk-rotation safepoint interrupt concurrent event emission happening in other threads? The scenario to worry about is a safepoint and epoch transition interrupting an event emission and separating the constant data and the event data into different epochs and different chunk files, at which point the event data would be unreadable, as we saw earlier. In OpenJDK, in HotSpot, the JFR code is written in C++; it's native code, so it can't be interrupted by a safepoint, and this isn't really an issue at all. However, Substrate VM is Java-on-Java: the VM code is written in Java, so the JFR code is Java code and could potentially safepoint at a very inopportune moment.

So how do we prevent that from happening in Substrate VM? We have an annotation called @Uninterruptible, and what it does is prevent, at build time, the insertion of safepoint checks, so that code annotated with @Uninterruptible doesn't safepoint at all. You'll find that a lot of the JFR code in the VM is sprinkled with this annotation all over the place, especially the code dealing with buffers, constant pools, and event writes. But this has pretty big consequences for the implementation itself, because uninterruptible code that can't safepoint can only call other uninterruptible code that can't safepoint, which means a lot of the JDK code written in Java is off limits. We can't use things like the normal hash tables, reentrant locks, and so on; we have to roll our own uninterruptible versions of those. Another thing is that we can't even use managed memory on the Java heap, because allocation can induce a garbage collection, which requires a safepoint, and that's not uninterruptible. So we have to use unmanaged native memory to craft our own data structures for a lot of these things. It's a fair bit of work dealing with that.

The last difference I want to mention between JFR in Substrate VM and in HotSpot relates to how the Java-level JFR code interfaces with the VM-level JFR code. In OpenJDK this happens in the jdk.jfr.internal.JVM class, which you can see on the right side of the slide; these are basically the points where the Java-level JFR code in the JDK calls down into HotSpot at the VM level using JNI. In Native Image we reuse that Java-level JFR code from the JDK, but there's no underlying HotSpot implementation to call into. So how do we resolve that mismatch? We use substitutions, which Foivos talked about a little bit, but I'll mention them again: they essentially allow us, at build time, to specify redirects from these Java methods to our own implementations in the VM-level JFR code. On the slide, markChunkFinal is highlighted in the JVM class, and we're grabbing that method and redirecting it to our own Substrate VM-based implementation. That's how we resolve the mismatch.
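To give a feel for what such a substitution looks like, here is a minimal sketch using the @TargetClass and @Substitute annotations from the GraalVM SDK. The method body and the SubstrateJfr helper are placeholders of my own, not Substrate VM's real JFR code, and the exact signature of markChunkFinal varies across JDK versions:

```java
import com.oracle.svm.core.annotate.Substitute;
import com.oracle.svm.core.annotate.TargetClass;

// Sketch of the substitution mechanism: at image build time, calls to
// jdk.jfr.internal.JVM.markChunkFinal() are redirected to the method
// below instead of the (nonexistent) HotSpot native implementation.
@TargetClass(className = "jdk.jfr.internal.JVM")
final class Target_jdk_jfr_internal_JVM {

    @Substitute
    public void markChunkFinal() {
        // Delegate to the VM-level JFR code, here represented by a
        // hypothetical helper; the real Substrate VM code differs.
        SubstrateJfr.markChunkFinal();
    }
}

// Hypothetical stand-in for Substrate VM's JFR infrastructure.
final class SubstrateJfr {
    static void markChunkFinal() {
        // Seal the current chunk in the Substrate VM implementation.
    }
}
```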
With that said, that basically concludes my presentation. If you're interested, there are links for further reading, some documentation and some blog posts as well, and you can always approach me outside afterwards if you have more questions. How are we doing for time, Chris? Okay, if there are any questions, I'm happy to answer them now.

Q: You just did such a good job explaining it! Thanks. On Substrate VM, did you measure the impact on time-to-safepoint? Because all this uninterruptible code trades off against time-to-safepoint.

A: Yeah, I could imagine. I'm not really sure of the exact figures, so I can't give you a number, but I know what you're saying; it could potentially be an issue. I'm not really aware of it being one, but it's definitely a concern. That said, it's not just the JFR code that's marked as uninterruptible: a lot of the GC code and a lot of the low-level operations must also be uninterruptible, so it's not just JFR.

Q: Understood, thanks.

A: Actually, to tag onto that: a lot of the JFR code is really just instrumenting other low-level code that's already uninterruptible, so it's sort of collateral damage. It's not really an issue to add a little bit more onto code that's already uninterruptible, such as the JFR GC event handling and the slow-path allocation code; you can't safepoint there anyway.

Q: Thank you.

A: Okay. Thank you for listening.