[00:00.000 --> 00:12.400]  Yes. Hi. I'm Johannes Bechberger. As I was already introduced, I work on SapMachine.
[00:12.400 --> 00:19.240]  It's another great distribution of the OpenJDK. So I have worked since the beginning of last year
[00:19.240 --> 00:25.000]  on my new project, AsyncGetStackTrace. It's essentially an improved version of the
[00:25.000 --> 00:32.160]  AsyncGetCallTrace API. And I think many of you probably don't know this API. I didn't know it
[00:32.160 --> 00:39.040]  before I started this project. But essentially, it's related to profiling. So how does profiling
[00:39.040 --> 00:45.920]  work? Some of you might have already seen flame graphs. If not, there are some other talks on
[00:45.920 --> 00:53.040]  profiling in the Mozilla devroom that you can look up. But essentially, what profiling is:
[00:53.040 --> 00:59.320]  you want to see which parts of your application are slow. So, for example, here, I can
[00:59.320 --> 01:05.000]  see that some JDK stuff is probably a thing that takes time. But essentially, how it works under
[01:05.000 --> 01:09.720]  the hood is that we have a selection of threads, like for example, here, five threads. Then we
[01:09.720 --> 01:17.760]  randomly select three threads because we cannot usually sample all threads because it would be
[01:17.760 --> 01:23.600]  too costly. Then we pre-allocate some traces. That's just a data structure where we store the
[01:23.600 --> 01:29.840]  stack frame information. And then we ping the first thread. And with ping, I mean, we send it
[01:29.840 --> 01:35.360]  a signal. And then, in the signal handler, we walk the stack, because in the signal handler, the thread
[01:35.360 --> 01:41.960]  is stopped. So we can walk the stack. We do the same with thread two and with thread five. And
[01:41.960 --> 01:49.600]  we have the traces. And then we store them. And then we do some post-processing. That's
[01:49.600 --> 01:56.080]  essentially how async-profiler works, but just in a loop. So in a loop, we always do this. And
[01:56.080 --> 02:04.760]  so we need an API for this. It's called AsyncGetCallTrace. We could
[02:04.760 --> 02:12.360]  use the JVMTI methods, but they are safepoint-biased. So they let the threads wait till they're ready,
[02:12.360 --> 02:18.800]  till they're at a safepoint. But we want to have the call trace at the exact point where we
[02:18.800 --> 02:23.800]  want it. And so AsyncGetCallTrace is quite a cool API. So how it works: here we have the stack,
[02:23.800 --> 02:31.400]  how it is on your system. We have at the bottom the pthread start. That's on a Unix system. And on
[02:31.400 --> 02:39.200]  top, we have some Java frames. And then it goes up to the top, to the writeBytes method,
[02:39.200 --> 02:47.600]  because it writes to a buffered output stream. It's essentially Hello World; it just prints some
[02:47.600 --> 02:53.720]  strings. And in the signal handler, we get the top frame. That's where the ucontext from the
[02:53.720 --> 02:59.040]  signal handler points to. And then we do some stack walking. And AsyncGetCallTrace does it
[02:59.040 --> 03:07.920]  for us. And essentially it returns us in a preallocated data structure, the frames. And the
[03:07.920 --> 03:12.520]  number of frames that we got. And it also stores an error code in the number of frames if there was an
[03:12.520 --> 03:18.480]  error. And so what we get for every frame is the line number. So it's called line number,
[03:18.480 --> 03:25.680]  but it's essentially the bytecode index. I don't know; it's historically this way because this API
[03:25.680 --> 03:33.400]  is from around 2003. And we get a method ID. But we only get this information for Java frames.
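The result structure just described can be sketched in C as follows. Because the API ships without a header, profilers typically declare these structures themselves; the layout below follows that common declaration. The JNI types are replaced by stand-in typedefs so the sketch compiles without `jni.h`, and `usable_frame_count` is a hypothetical helper added only for illustration, not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the JNI types (assumption: a real profiler includes jni.h). */
typedef int32_t jint;
typedef struct MethodStub *jmethodID;
typedef struct EnvStub JNIEnv;

/* One Java frame as filled in by AsyncGetCallTrace. */
typedef struct {
    jint lineno;          /* despite the name, this is the bytecode index */
    jmethodID method_id;  /* identifies the Java method of this frame */
} ASGCT_CallFrame;

/* The preallocated trace that the profiler hands to AsyncGetCallTrace. */
typedef struct {
    JNIEnv *env_id;           /* JNI environment of the sampled thread */
    jint num_frames;          /* number of frames, or an error code if not positive */
    ASGCT_CallFrame *frames;  /* caller-provided buffer for the frames */
} ASGCT_CallTrace;

/* Hypothetical helper: how many entries of trace->frames are valid.
 * A zero or negative num_frames means no frames were collected. */
static jint usable_frame_count(const ASGCT_CallTrace *trace) {
    assert(trace != NULL);
    return trace->num_frames > 0 ? trace->num_frames : 0;
}
```

A profiler would allocate the `frames` buffer before installing its signal handler, since the handler itself cannot allocate memory, and would check `num_frames` before reading any frame.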
[03:33.400 --> 03:41.680]  So what are its problems? Don't get me started; I have worked on it for a long enough time. So
[03:41.680 --> 03:49.760]  it's unofficial. It was there in 2003 for about three months. And then Oracle, or rather
[03:49.760 --> 03:56.800]  Sun at the time, removed it. It's now just lying around as an exported symbol, but it doesn't have
[03:56.800 --> 04:05.720]  its own header. It's unsupported. So if there's a change in another part of the JVM that potentially
[04:05.720 --> 04:13.920]  breaks it, nobody notices it, because there's only a single test, which doesn't test that much. So
[04:13.920 --> 04:21.480]  there's also missing information. So it only gives us information on the stack frames of the
[04:21.480 --> 04:27.600]  Java stack, of the Java frames, but not on anything else. And it misses information like inlining,
[04:27.600 --> 04:34.960]  which isn't that great. And so in the beginning of last year, I started to work on a new API
[04:34.960 --> 04:43.400]  because this, I think, is the best we have. And maybe we could do something better. And
[04:43.400 --> 04:52.960]  so I started to work on AsyncGetStackTrace. It's now a JEP candidate, JEP 435. So if you
[04:52.960 --> 04:59.560]  want to see the JEP in its entirety, just go to the OpenJDK website or read the blog post
[04:59.560 --> 05:07.520]  for this talk and you get a picture of what it does. And so the idea was to create a better API
[05:07.520 --> 05:13.760]  that gives us more information and is far more supported, so with lots of tests with its own
[05:13.760 --> 05:23.680]  header. And so again, we have the stack, our stack, but we then get more information. For
[05:23.680 --> 05:30.320]  example, at the most basic level, we also get the kind of thread that we're running on.
[05:30.320 --> 05:39.160]  So is this thread in Java mode, or is it in GC mode, or what kind of thread is it, which is quite
[05:39.160 --> 05:44.840]  neat. And we get more information. For example, we get the BCI. It's now called BCI because, yeah,
[05:44.840 --> 05:51.160]  it's the bytecode index. We get the method ID. We also get the type. Is it inlined? Is it native?
[05:51.160 --> 06:02.160]  With native, I mean not C/C++, but these boundary methods that are declared in Java, but whose code
[06:02.160 --> 06:08.880]  is implemented in C/C++. And we also get a compilation level. So is it C1-compiled, C2-compiled,
[06:08.880 --> 06:15.880]  or not compiled at all? So this is quite neat because we get more information. But the cool
[06:15.880 --> 06:23.560]  thing is we have options now. With this API, we can specify in an integer: hey, we want to have
[06:23.560 --> 06:32.200]  non-Java frames and we also want to walk non-Java threads, which leads us to this situation where
[06:32.200 --> 06:40.200]  we also get information on these C/C++ frames, which is quite nice. Because for
[06:40.200 --> 06:46.640]  these frames, we also get the type, so that it's C/C++, and we also get a program counter. So we can
[06:46.640 --> 06:57.400]  then go back, do some of our own analysis, and use dlsym or other methods of the dl family to get
[06:57.400 --> 07:03.760]  the method name. And with these options, we can also walk non-Java threads. So we see more
[07:03.760 --> 07:09.960]  information. It essentially makes the life of a profiler developer far easier because we can now
[07:09.960 --> 07:16.560]  just use this API. It will be supported if it gets in. It will be supported. I'm working on lots
[07:16.560 --> 07:27.440]  and lots of tests. And yeah, I hope it gets in. And as a bonus, what I also introduced is new
[07:27.440 --> 07:32.600]  methods for OpenJDK developers to walk stacks, because currently the code is spread
[07:32.600 --> 07:37.640]  between a few different places. Some of them are copies of others. So it's quite hard: when you
[07:37.640 --> 07:45.800]  change some part, you have to change other parts too. So it's essentially technical debt. There
[07:45.800 --> 07:55.760]  were good reasons in the years before, but still I want to make stack walking easier. So the new
[07:55.760 --> 08:04.640]  API that I used in the implementation of my JEP proposal allows us to just give a stack walker
[08:04.640 --> 08:10.640]  some options like, hey, I want to walk stacks, I want to skip some frames, I also want to walk non-Java
[08:10.640 --> 08:16.120]  frames. And I can just go over it and say, oh, give me the next frame. And on this next frame,
[08:16.120 --> 08:21.520]  we can ask for all the information. Is this a Java frame? Is this a native frame? What is its
[08:21.520 --> 08:29.640]  compilation level? And this makes it far easier to walk stacks and hopefully makes it easier to
[08:29.640 --> 08:36.760]  combine all the stack walking, from error-related stack traces, from AsyncGetCallTrace,
[08:36.760 --> 08:46.440]  from JFR, using one API. And so when you make an improvement in one of these APIs and implementations,
[08:46.440 --> 08:54.440]  you get an improvement in all. So what I've done is that I improved AsyncGetCallTrace with the
[08:54.440 --> 09:04.080]  help of my colleagues to be much safer. So I wrote checking code that uses SafeFetch, so that it
[09:04.080 --> 09:12.200]  checks the pointer. So it kind of checks the pointer before it accesses it. So it's far safer. Then I did,
[09:12.200 --> 09:18.640]  here for AsyncGetStackTrace, lots of testing. For example, I did some fuzzing. So I called AsyncGetStackTrace
[09:18.640 --> 09:26.040]  with random ucontexts, so with randomized frame pointers and stack pointers. And it doesn't crash,
[09:26.040 --> 09:34.480]  like for hours on a large machine, which is quite cool. And so this covers what async-profiler
[09:34.480 --> 09:44.200]  does when it modifies the frame and stack pointers, and it alleviates some concerns about when the VM is
[09:44.200 --> 09:50.640]  in an undefined state. It needs a lot of convincing, so I'm still in the process where I
[09:50.640 --> 09:56.000]  have to talk with all the people from Oracle, all the JFR people. It's a long, drawn-out process,
[09:56.000 --> 10:03.120]  but I hope I can convince them, because clearly the people on the profiler
[10:03.120 --> 10:10.280]  side are really happy to have this because it has many advantages for them. And of course,
[10:10.280 --> 10:15.800]  again, testing because the whole point of this API is that you get more information,
[10:15.800 --> 10:22.200]  but also that it's a better tested API. Currently, I have six tests, and I'm working on more.
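The fuzzing idea mentioned in the talk, walking a stack from randomized frame pointers while checking every pointer before reading it, can be illustrated with a small toy model in C. This is only a sketch under stated assumptions: the "stack" is an array of indices rather than real memory, `safe_read` is a bounds check standing in for SafeFetch, and all names (`safe_read`, `walk`, `fuzz`, `next_random`) are hypothetical and unrelated to the actual OpenJDK tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_WORDS = 64, MAX_FRAMES = 16 };

/* Stand-in for SafeFetch: refuses to read outside the toy stack
 * instead of crashing. Returns 1 on success, 0 on a bad index. */
static int safe_read(const uint32_t *stack, uint32_t index, uint32_t *out) {
    if (index >= STACK_WORDS) return 0;
    *out = stack[index];
    return 1;
}

/* Toy frame-pointer walk: stack[fp] holds the caller's frame pointer.
 * Returns the number of frames visited, at most MAX_FRAMES. */
static int walk(const uint32_t *stack, uint32_t fp) {
    int frames = 0;
    while (frames < MAX_FRAMES) {
        uint32_t next;
        if (!safe_read(stack, fp, &next)) break;  /* unreadable frame pointer */
        frames++;
        if (next <= fp) break;  /* no progress toward the stack base: stop */
        fp = next;
    }
    return frames;
}

/* Small deterministic xorshift32 generator so runs are reproducible. */
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills the toy stack with random contents, starts the walk at a random
 * frame pointer, and counts results outside the allowed range. */
static int fuzz(uint32_t seed, int rounds) {
    uint32_t stack[STACK_WORDS];
    uint32_t state = seed ? seed : 1;  /* xorshift must not start at zero */
    int violations = 0;
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < STACK_WORDS; i++) {
            /* About half of the values point outside the stack. */
            stack[i] = next_random(&state) % (2 * STACK_WORDS);
        }
        int n = walk(stack, next_random(&state) % (2 * STACK_WORDS));
        if (n < 0 || n > MAX_FRAMES) violations++;
    }
    return violations;
}
```

In the real setting the interesting outcome is simply that the process does not crash; the violation count here is a secondary sanity check on the walker's bounds.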
[10:22.200 --> 10:37.080]  So I hope that it gets in. Till then, you can look on GitHub: there's a draft PR for the
[10:37.080 --> 10:44.840]  JEP. Just search in the PRs for a draft PR with ASGST in the name. And then, yeah,
[10:44.840 --> 10:52.200]  you can follow me on Twitter, or our team at SweetSapMachine. And that's all. Oh, yes, yes,
[10:52.200 --> 10:59.000]  yeah. And I'm also blogging on Mostly nerdless, and all the blog posts I also put
[10:59.000 --> 11:08.120]  on Foojay. But yeah, you can follow me there and read up on all the topics that I talked about today. So, thanks.
[11:08.120 --> 11:32.040]  The question was, can SafeFetch be called from signal handlers, because it uses signals? I think
[11:32.040 --> 11:36.840]  it uses different signals, because I didn't have any problems using it from signal handlers. So I
[11:36.840 --> 11:44.840]  have tests. To use AsyncGetStackTrace, you have to use signal handlers. So I didn't see any
[11:44.840 --> 11:49.240]  problems so far. Signal handlers are even a bit weird, because from signal handlers
[11:49.240 --> 11:54.840]  you cannot do any malloc, so you have to preallocate, but you can call fork. So it's quite,
[11:54.840 --> 12:01.960]  quite interesting. So any other questions? Does it handle invokedynamic specially,
[12:01.960 --> 12:08.760]  because within that stack, you get like the whole stack of deciding how to dispatch the call?
[12:11.880 --> 12:18.200]  So the question was, does it handle invokedynamic specially? No, it just uses,
[12:18.200 --> 12:24.600]  it is just based on the frame stack walking, the internal mechanism of stack walking. So
[12:24.600 --> 12:32.120]  it doesn't handle it differently than, for example, AsyncGetCallTrace. And therefore, yeah,
[12:32.120 --> 12:35.480]  that's all Java frames. So that's, that's probably fine.
[12:40.120 --> 12:45.080]  Do you have to change the native parts? Or does it go on all platforms?
[12:46.120 --> 12:51.480]  So the question was, does it work on all platforms? It's known that it doesn't really work on Windows,
[12:51.480 --> 12:56.520]  just because Windows doesn't really have a concept of signals. If you have any ideas on getting
[12:56.520 --> 13:03.400]  something like this to work on Windows, feel free to drop me a message. So no,
[13:03.400 --> 13:09.000]  I didn't have to change any native parts. I had to change some, I had to create some native parts
[13:09.000 --> 13:15.160]  for testing, to modify the ucontext, because this is highly operating
[13:15.160 --> 13:23.320]  system specific. So the changes to the whole OpenJDK are fairly minimal. So they aren't that
[13:23.320 --> 13:29.880]  large, besides passing through some booleans to configure stuff. And the code itself is just
[13:29.880 --> 13:35.880]  a couple hundred lines. So it's also quite simple to understand. And there's a blog post that
[13:35.880 --> 13:46.200]  describes the reasoning behind it. Any other questions? Yes? Is it already in SapMachine?
[13:46.200 --> 13:53.400]  No, it's not yet in SapMachine because I'm still in the process of testing it. So there's
[13:53.400 --> 14:01.080]  of course a podcast. You can already use the JVM when you compile it yourself. I'm in the process
[14:01.080 --> 14:08.120]  of updating my demo repository, which contains a modified async-profiler that uses it. So you can
[14:08.120 --> 14:15.960]  try it out yourself. It should be ready in the next few weeks. It still has some bugs. Yeah.
[14:15.960 --> 14:28.680]  Anything else? Thank you very much.