Hi everyone, my name is Robert Toyonaga and I work at Red Hat. Today I'll be talking a little bit about JDK Flight Recorder in GraalVM Native Image; from now on I'll just refer to JDK Flight Recorder as JFR.

As a high-level breakdown, I've broken this presentation into two sections. The first section is a high-level overview of JFR in Native Image, and then we'll go into a low-level deep dive of JFR in Native Image and talk about some comparisons between Substrate VM and HotSpot. I want to note that even if you're not interested in GraalVM Native Image at all, you may still be interested in the second half of this presentation, because the JFR details we'll be talking about there extend beyond just Native Image and apply to HotSpot more generally as well.

As a very quick refresher, JFR is an event-based monitoring and profiling tool. It's built directly into the JDK, and it can give you some really valuable insights into what your application is doing, both at a high level and at the VM level.

Foivos already talked about this a little bit, but GraalVM Native Image is essentially a technology that allows you to convert your Java applications into binary executables. The appeal is that you get much faster startup and use fewer resources, and a big reason for that is that you don't have to warm up a traditional JVM alongside your application code. How it works is that you compile your Java application to bytecode like you normally would, and then you run the native-image tool to convert that bytecode into an executable which you can later run.

So why is JFR different in Native Image than in OpenJDK? The reason is that a native image executable doesn't require a traditional JVM to run. However, it still requires certain runtime components that your Java code expects, such as GC and synchronization constructs like monitors, and what provides those in Native Image is something called Substrate VM, which you can think of as a scoped-down replacement for HotSpot. It does a lot of the things your Java code requires, but strips out a lot of the dynamic stuff HotSpot does that we don't really need in this environment. The key point is that since a lot of the JFR code is embedded within HotSpot, when we bring JFR over to Native Image, where we're using Substrate VM, it has to be re-implemented in that VM instead. That involves everything from the low-level JFR event instrumentation to the actual infrastructure that carries the JFR data from the point of instrumentation to the point where it's later consumed by a user.

In terms of the current state of JFR support in Native Image: you can start and stop recordings from the command line or from within your application code via the Recording API. Several events are implemented, especially at the VM level; we have events for threads, monitors, allocations, GC, safepoints, etc. You can dump snapshots to disk and inspect them with tools such as VisualVM or JDK Mission Control, as you normally would. The custom event API is also working, so you can create your own custom application-level events. Stack traces and CPU profiling are also possible, and event streaming has recently been added as well. You can even connect via remote JMX to the FlightRecorderMXBean, which practically means you can interact with JFR recordings from within the JMC UI, starting and managing them on the fly.
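As a quick illustration of the Recording API and the custom event API just mentioned, here is a small example using the standard jdk.jfr package. The event name and fields are made up for the example, but the API usage is the same whether you run it on HotSpot or in a native image:

```java
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;

// A custom application-level event. The name and fields here are
// hypothetical, chosen only for this example.
@Name("com.example.HttpRequest")
@Label("HTTP Request")
class HttpRequestEvent extends Event {
    @Label("Path")
    String path;

    @Label("Status Code")
    int status;
}

public class JfrExample {
    public static void main(String[] args) throws Exception {
        // Start a recording programmatically via the Recording API.
        try (Recording recording = new Recording()) {
            recording.start();

            // Emit one instance of the custom event.
            HttpRequestEvent event = new HttpRequestEvent();
            event.begin();
            event.path = "/hello";
            event.status = 200;
            event.commit();

            // Stop and dump a snapshot to a .jfr file, which can then be
            // opened in JDK Mission Control or VisualVM.
            recording.stop();
            recording.dump(Path.of("recording.jfr"));
        }
    }
}
```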
How you might first interact with JFR in Native Image: at build time you specify the enable-monitoring flag, requesting JFR specifically, and that builds the JFR components into your executable. Then at runtime you can use the normal StartFlightRecording option and pass all of the usual parameters, such as a file name to dump the recording to, a duration, and so on.

There are still quite a few limitations to JFR in Native Image. Not all events are implemented yet; it's an ongoing effort to keep up with OpenJDK in that area. Specifically, events relying on bytecode instrumentation are not yet supported, and of course new JDK events keep appearing, so we're trying to keep pace with those as well. Event streaming doesn't yet support stack traces, so that's one limitation there. And we have a couple of things in the review pipeline that are not yet available in any release.

With that said, we've reached the deep dive, which is going to take up the majority of the presentation. So let's take a deep breath. This roadmap essentially represents a very high-level, zoomed-out view of the flow of JFR data through the system. From now on, each slide is going to contain this roadmap, and the highlighted part will indicate the part we're currently talking about, just for convenience and easy reference.

Firstly, the points of instrumentation. These are the various points where JFR events are emitted, at either the application level or the VM level. The screenshot on the slide is just from JDK Mission Control; I'm using it to show the content an event may contain. You can see there are a bunch of fields and corresponding values, and this is just one example; it will vary by event. You can think of JFR events as the primary thing we're concerned with, really. The rest of the slides are basically just about the piping that gets that JFR data from the point of instrumentation to the chunk file, where it can be consumed later.

Speaking of chunk files, we're jumping all the way to the end of the roadmap. Chunk files are essentially the resting place of the JFR data, as far as this presentation is concerned. They must contain the same information in the same format regardless of whether OpenJDK or Native Image is generating them, and they can be dumped to snapshots (the .jfr file format), which is usually how people interact with them, via JMC, VisualVM, or the jfr command-line tool.

Chunk files are self-contained, and they have four distinct sections, as you can see in the diagram: a header, which contains pointers and other metadata; the event data section, which contains the core JFR event data; the metadata section, which describes the format and layout of the events in the event data section; and the constant pools, which contain constants referenced from the event data section.

About those constants: in order to reduce the size of JFR data, we use a referencing-ID scheme to increase compactness. How this works is that entries in the event data section of the chunk file use unique IDs to reference into the constant pool section of the chunk file. This helps with deduplicating the actual constants used by the JFR events.
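To make the referencing-ID scheme concrete, here is a simplified sketch of the idea. This is just an illustration, not the actual HotSpot or Substrate VM implementation; it shows how each distinct constant gets a compact ID that event entries can store, while the constant itself is persisted only once:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration of the constant-pool idea: events store a
// small numeric ID, and the constant itself is written to the chunk's
// constant pool section only once, no matter how many events reference it.
class ConstantPoolSketch {
    private final Map<String, Long> ids = new LinkedHashMap<>();
    private long nextId = 1;

    // Deduplicate: the same constant always maps to the same ID.
    long idFor(String constant) {
        return ids.computeIfAbsent(constant, c -> nextId++);
    }

    // When the chunk is written, each distinct constant is persisted once.
    void writePool() {
        ids.forEach((constant, id) ->
                System.out.println("pool entry " + id + " -> " + constant));
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        // Two events referencing the same thread name share one pool entry.
        System.out.println("event A threadId=" + pool.idFor("worker-1"));
        System.out.println("event B threadId=" + pool.idFor("worker-1"));
        pool.writePool();
    }
}
```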
On this slide you can see an example of one event entry, which uses the unique ID 12; that ID is then used to index into the thread constant pool and reference the actual thread data residing there. All of this increases the compactness of the JFR data, which reduces overhead when dealing with it while it's in flight and when writing it to disk, and it reduces the overall chunk file size as well. However, the downside of this increased compactness and the referencing-ID scheme is a tight coupling of the event data and the constant pool data: if they're ever separated and not found in the same self-contained chunk file, we can't decode the event data section and it's basically unreadable. So that's one downside.

Right, so now that we've talked about the very beginning and the end of the roadmap, we'll jump back and fill in the middle. After event emission, the JFR data splits: the core event data goes to the JFR thread-local buffers, while the constant data goes to the constant pools. In both HotSpot and Substrate VM, the JFR thread-local buffers have essentially the same purpose and the same structure. They're structured in a segmented way that allows for concurrent writing and reading of data, and there are various pointers that define the sections. There's the write position pointer, which determines where new data is written into the buffer; when an event write is in progress, that's the pointer in use. Then there's the committed position pointer, which marks the end of the committed data section, meaning data that has been fully written (so it's not an in-progress write) but hasn't migrated anywhere else yet. The flushed data section is committed data that has already been migrated somewhere else, so it can be overwritten at the earliest convenience. Eventually the buffers fill up with committed data and have to be flushed elsewhere, and at that point all the pointers reset back to the start position.

HotSpot is a little different in that it uses buffer pools to recycle buffers. There's a live list and a free list, and when a new thread requires a thread-local buffer from JFR, one is taken off the free list and put on the live list, and vice versa when that thread goes away. In Substrate VM we have it a little simpler: we just allocate a thread-local buffer in native memory when it's required, and when the thread goes away we free that memory, so we don't have to manage access to buffer pools and maintain them. In the case of virtual threads, multiple virtual threads may share the same thread-local buffer of the carrier thread, and that's not really an issue, because each one has exclusive access at any point in time and the JFR data is eventually going to the same place anyway.

After the thread-local buffers fill up, the data is migrated to a set of global buffers. The global buffers essentially act as overflow storage, and that's more efficient than increasing the size of all the thread-local buffers, because not all threads will be equally busy with respect to JFR events.
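To make the pointer scheme a bit more concrete, here is a toy model of a segmented thread-local buffer. The real buffers live in native memory and the layout details differ; this sketch only shows how the three positions move on write, commit, and flush:

```java
// Toy model of a JFR thread-local buffer with the three positions
// described above. Real buffers live in unmanaged native memory;
// this only illustrates how the pointers move.
class ThreadLocalBufferSketch {
    private final byte[] data = new byte[4096];
    private int flushedPos = 0;   // data before this was already migrated; reusable
    private int committedPos = 0; // end of fully written (committed) event data
    private int writePos = 0;     // where an in-progress event write appends

    // Append bytes of an in-progress event write.
    boolean tryWrite(byte[] payload) {
        if (writePos + payload.length > data.length) {
            return false; // buffer full: data must be flushed to the global buffers
        }
        System.arraycopy(payload, 0, data, writePos, payload.length);
        writePos += payload.length;
        return true;
    }

    // Commit the in-progress write: the event is now complete.
    void commit() {
        committedPos = writePos;
    }

    // Migrate committed-but-unflushed data elsewhere (e.g. the global
    // buffers), then reset all positions so the space can be reused.
    byte[] flush() {
        byte[] out = java.util.Arrays.copyOfRange(data, flushedPos, committedPos);
        flushedPos = committedPos = writePos = 0;
        return out;
    }

    public static void main(String[] args) {
        ThreadLocalBufferSketch buf = new ThreadLocalBufferSketch();
        buf.tryWrite(new byte[]{1, 2, 3}); // in-progress event write
        buf.commit();                      // event complete
        System.out.println("flushed " + buf.flush().length + " bytes");
    }
}
```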
Right, so constant pools. We mentioned earlier that constant pools use a referencing-ID scheme to reduce the size of JFR data, and that essentially works by deduplicating constants. In HotSpot, one way the deduplication works is by using JFR-specific bits in the metaspace data for certain constant types, such as klass (with a "k") and methods. These JFR-specific bits act essentially as boolean toggles: when event data in a JFR thread-local buffer somewhere references a constant, that bit in the constant is flipped to indicate it's referenced. That way, when it's time to actually persist the constants to disk, we only have to persist the ones that are actually referenced, not all of them. Additionally, if multiple events reference the same constant, the bit is only flipped once, so the constant only needs to be written once; that's where the deduplication happens. There are some constant types, such as stack traces, that don't have metaspace data, and in those cases a lookup table is used for the deduplication and tracking instead. An interesting thing is that in Substrate VM there is no metaspace at all, so we have to rely on the lookup-table approach for all the various constant types.

After enough JFR data has been generated, a chunk rotation must be requested; this is essentially how JFR data is continually persisted to disk. The current chunk file on disk is sealed and a new chunk file is opened, and in that process all the in-flight, in-memory data is flushed to the chunk file before it's sealed. The thread performing the chunk rotation must flush the thread-local buffers of other threads, and to do that safely we have to request a safepoint.

The order of operations at a chunk-rotation safepoint is shown on the slide, and I want to note that it's pretty similar in OpenJDK and in Substrate VM. The recording time between chunk-rotation safepoints is called an epoch. You can see in the green safepoint box that that's where we actually flush the JFR buffers, both local and global, to disk. But the most interesting thing here is that we write the constant pools to disk outside the safepoint, when we've already started epoch 2. What that means is that we're simultaneously writing the constants from epoch 1 to disk while recording constants belonging to epoch 2, so they're mingling inside the constant pools and we need to keep them isolated. We want to avoid writing constants belonging to epoch 2 into the chunk file for epoch 1; otherwise we'd have a mismatch and wouldn't be able to decode the event data for epoch 2, the same issue I explained a few slides back. So what we do is tag each constant according to its respective epoch to keep them isolated. The moral of the story is that this lets us reduce safepoint pause time by writing the constant pools outside the safepoint. Another way we reduce safepoint pause time is by having a dedicated JFR thread flush the global buffers to disk periodically throughout the epoch, so that work isn't happening in the safepoint either; there's less work to be done when we actually stop the world.
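Extending the earlier constant-pool sketch, here is a rough illustration of the epoch-tagging idea. This is again a toy model, not Substrate VM's actual code, and it omits details such as re-registering a constant that gets referenced again in a later epoch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of keeping constants from two epochs isolated in one pool.
// Entries are tagged with the epoch they were registered in, so flushing
// epoch N to disk (which happens while epoch N+1 is already recording)
// writes only the entries belonging to epoch N.
class EpochTaggedPoolSketch {
    record Entry(long id, long epoch) {}

    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private long nextId = 1;
    private long currentEpoch = 1;

    // Note: a real implementation would also re-register a constant when
    // it is referenced again in a later epoch; omitted here for brevity.
    long idFor(String constant) {
        return entries.computeIfAbsent(constant,
                c -> new Entry(nextId++, currentEpoch)).id();
    }

    // Called after the chunk-rotation safepoint: start the next epoch,
    // then persist only the previous epoch's constants.
    void rotate() {
        long previous = currentEpoch++;
        entries.forEach((constant, e) -> {
            if (e.epoch() == previous) {
                System.out.println("epoch " + previous + " pool: "
                        + e.id() + " -> " + constant);
            }
        });
    }

    public static void main(String[] args) {
        EpochTaggedPoolSketch pool = new EpochTaggedPoolSketch();
        pool.idFor("worker-1"); // registered during epoch 1
        pool.rotate();          // start epoch 2, persist epoch 1's constants
        pool.idFor("worker-2"); // registered during epoch 2
        pool.rotate();
    }
}
```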
One related note on safepointing is the question: can a chunk-rotation safepoint interrupt concurrent event emission happening in other threads? The scenario to worry about is a safepoint and epoch transition interrupting an event emission and separating the constant data and the event data into different epochs and different chunk files, at which point the event data would be unreadable, as we saw earlier. In OpenJDK, in HotSpot, the JFR code is written in C++; it's native code, so it can't be interrupted by a safepoint, and this isn't really an issue at all. However, Substrate VM is Java-on-Java: the VM code is written in Java, so the JFR code is Java code and could potentially safepoint at a very inopportune moment.

So how do we prevent that from happening in Substrate VM? We have an annotation called @Uninterruptible, and what it does is prevent, at build time, the insertion of safepoint checks, so that code annotated with @Uninterruptible doesn't safepoint at all. You'll find that a lot of the JFR code in the VM is sprinkled with this annotation all over the place, especially the code dealing with buffers, constant pools, and event writes. But this has pretty big consequences for the implementation itself, because uninterruptible code that can't safepoint can only call other uninterruptible code that can't safepoint, which means a lot of the JDK code written in Java is off limits. We can't use things like the normal hash tables, reentrant locks, and so on; we have to roll our own uninterruptible versions of those. Another thing is that we can't even use managed memory on the Java heap, because allocation can induce a garbage collection, which requires a safepoint, and that's not uninterruptible. So we have to use unmanaged native memory to craft our own data structures for a lot of these things. It's a fair bit of work dealing with that.

The last difference I want to mention between JFR in Substrate VM and in HotSpot relates to how the Java-level JFR code interfaces with the VM-level JFR code. In OpenJDK this happens in the jdk.jfr.internal.JVM class, which you can see on the right side of the slide; these are basically the points where the Java-level JFR code in the JDK calls down into HotSpot at the VM level using JNI. In Native Image we reuse that Java-level JFR code from the JDK, but there's no underlying HotSpot implementation to call into. So how do we resolve that mismatch? We use substitutions, which Foivos talked about a little bit, but I'll mention them again: they essentially allow us, at build time, to specify redirects from these Java methods to our own implementations in the VM-level JFR code. On the slide, markChunkFinal is highlighted in the JVM class, and we're grabbing that method and redirecting it to our own Substrate VM-based implementation. That's how we resolve the mismatch.
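To give a feel for what such a substitution looks like, here is a minimal sketch using the @TargetClass and @Substitute annotations from the GraalVM SDK. The method body and the SubstrateJfr helper are placeholders of my own, not Substrate VM's real JFR code, and the exact signature of markChunkFinal varies across JDK versions:

```java
import com.oracle.svm.core.annotate.Substitute;
import com.oracle.svm.core.annotate.TargetClass;

// Sketch of the substitution mechanism: at image build time, calls to
// jdk.jfr.internal.JVM.markChunkFinal() are redirected to the method
// below instead of the (nonexistent) HotSpot native implementation.
@TargetClass(className = "jdk.jfr.internal.JVM")
final class Target_jdk_jfr_internal_JVM {

    @Substitute
    public void markChunkFinal() {
        // Delegate to the VM-level JFR code, here represented by a
        // hypothetical helper; the real Substrate VM code differs.
        SubstrateJfr.markChunkFinal();
    }
}

// Hypothetical stand-in for Substrate VM's JFR infrastructure.
final class SubstrateJfr {
    static void markChunkFinal() {
        // Seal the current chunk in the Substrate VM implementation.
    }
}
```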
With that said, that basically concludes my presentation. If you're interested, there are links for further reading, some documentation and some blog posts as well, and you can always approach me outside afterwards if you have more questions. How are we doing for time, Chris? Okay, if there are any questions, I'm happy to answer them now.

Q: You just did such a good job explaining it! Thanks. On Substrate VM, did you measure the impact on time-to-safepoint? Because all this uninterruptible code trades off against time-to-safepoint.

A: Yeah, I could imagine. I'm not really sure of the exact figures, so I can't give you a number, but I know what you're saying; it could potentially be an issue. I'm not really aware of it being one, but it's definitely a concern. That said, it's not just the JFR code that's marked as uninterruptible: a lot of the GC code and a lot of the low-level operations must also be uninterruptible, so it's not just JFR.

Q: Understood, thanks.

A: Actually, to tag onto that: a lot of the JFR code is really just instrumenting other low-level code that's already uninterruptible, so it's sort of collateral damage. It's not really an issue to add a little bit more onto code that's already uninterruptible, such as the JFR GC event handling and the slow-path allocation code; you can't safepoint there anyway.

Q: Thank you.

A: Okay. Thank you for listening.