[00:00.000 --> 00:12.600] Alright, we are going to start the next talk. Next up, we have Kiran. Kiran will give us [00:12.600 --> 00:24.200] an update on Flang. Hello everyone. As Christoph introduced, I'm Kiran. I work at Arm in [00:24.200 --> 00:29.160] the Fortran compiler team, and my task today is to give you all an update [00:29.160 --> 00:36.640] on the progress we have made with Flang development. This slide shows the contents [00:36.640 --> 00:41.480] of my presentation today. It's fairly simple. First, I start with an overview of the Flang [00:41.480 --> 00:47.960] compiler. Then I give you a summary of the story so far. Then [00:47.960 --> 00:55.280] I describe the current status of the compiler. Finally, I go over a few of the major development [00:55.280 --> 01:05.600] efforts going on currently. This slide gives a brief overview of the Flang compiler. [01:05.600 --> 01:10.600] Flang is a new Fortran frontend developed from scratch. It has a traditional compiler [01:10.600 --> 01:16.680] flow. It's an LLVM-based Fortran frontend; it's actually the Fortran frontend of LLVM. [01:16.680 --> 01:22.760] It generates LLVM IR, but there is a difference from the other frontend in the LLVM project, [01:22.800 --> 01:30.120] Clang. While Clang lowers from the AST directly to LLVM IR, Flang uses a high-level intermediate [01:30.120 --> 01:37.640] representation called Fortran IR, or FIR. That's what Flang generates. It uses the [01:37.640 --> 01:46.040] MLIR infrastructure for FIR, and MLIR interfaces with LLVM through the LLVM dialect. FIR is [01:46.040 --> 01:52.840] converted to the LLVM dialect, and then the LLVM pipeline kicks in. This is basically the [01:52.840 --> 01:57.680] diagram that I have on the left-hand side. Given a Fortran program, parsing [01:57.680 --> 02:02.360] and semantic checks happen first. From that you get a Flang parse tree that's fairly [02:02.360 --> 02:09.400] well defined. Then that code is lowered into Fortran IR and calls to the runtime. Then [02:09.400 --> 02:19.280] the Fortran IR is converted to the LLVM dialect. This slide summarizes the story so far with [02:19.280 --> 02:26.680] the Flang compiler. Looking at the slide, this project started in 2018. It was during [02:26.680 --> 02:32.200] EuroLLVM 2018 that news about this compiler started to come out, that there was a new Fortran [02:32.200 --> 02:40.000] frontend being written from scratch. One year later, in April 2019, the project was [02:40.000 --> 02:46.480] accepted as the Fortran frontend of LLVM. Again one year later, in 2020, it was merged [02:46.480 --> 02:54.400] into the LLVM project as LLVM Flang. When this happened, there was some code that was [02:54.400 --> 03:03.400] left behind. The project actually split into two repositories. The first one was in the LLVM [03:03.400 --> 03:09.880] project, where the parsing and semantic checks and the code for the runtime lived. All [03:09.880 --> 03:15.080] the code that lowers from the parse tree to Fortran IR got left behind because it was [03:15.080 --> 03:22.440] not ready at that time. It began to take on a life of its own. People now had to keep these [03:22.440 --> 03:28.120] two repositories in sync, sometimes commit to both repositories, and carry all the overhead [03:28.120 --> 03:35.480] of maintaining a downstream project. Fortunately, sometime in April 2022, people decided to [03:35.480 --> 03:40.920] freeze all the downstream development and push all the code upstream.
Sometime [03:40.920 --> 03:48.680] in July 2022, this whole code base landed in the LLVM project repository. Since then, the project [03:48.680 --> 03:56.000] has mostly followed the LLVM contribution process and all the social guidelines [03:56.000 --> 04:05.080] associated with it. When it was merged into LLVM, it was mostly a Fortran 95 compiler, [04:05.080 --> 04:10.440] but there were still a few missing pieces. The code was stabilized further, and all unimplemented [04:10.440 --> 04:16.480] features were marked with TODOs, so that if you try to compile an unsupported feature, [04:16.480 --> 04:23.080] it gives a message saying that the feature is not supported rather than crashing. [04:23.080 --> 04:28.600] At the same time, development has continued to support newer Fortran standards, [04:28.600 --> 04:35.960] with features from Fortran 2003, Fortran 2008, and so on. [04:35.960 --> 04:40.200] A lot of bug fixes went in as well, and people also started to look at some performance [04:40.200 --> 04:52.600] work. This slide summarizes the current status of the compiler. The compiler is not [04:52.600 --> 04:58.720] yet ready for general use, but it is already fairly advanced in its support for various [04:58.720 --> 05:04.480] Fortran constructs. The driver is temporarily called flang-new, and executables can be [05:04.480 --> 05:09.200] generated, but you have to use an option called -flang-experimental-exec. [05:09.200 --> 05:14.160] Feature development for Fortran 95 is mostly complete, and as I mentioned before, development [05:14.160 --> 05:19.800] of Fortran 2003 and later features is in progress. The compiler has been tested with [05:19.800 --> 05:27.520] various commercial and free test suites. It has also been verified with some HPC benchmarks [05:27.520 --> 05:36.200] like SNAP and CloverLeaf. We have also used the SPEC benchmarks to test it. We are continuing [05:36.200 --> 05:42.880] to test it with other benchmarks like the OpenMP version of SPEC and other open-source [05:42.880 --> 05:47.360] applications like OpenRadioss, and things like that. [05:47.360 --> 05:55.080] The driver's name, flang-new, and the need for the experimental flag are currently [05:55.080 --> 06:02.000] being discussed. It is possible that those requirements will go away soon, and then people [06:02.000 --> 06:09.720] can just type flang to compile their applications. There is a Discourse thread on that, currently [06:09.720 --> 06:17.920] under discussion. This slide summarizes the support level of [06:17.920 --> 06:25.000] Flang for the various Fortran standards. Fortran is a living language. As you can see, it has [06:25.000 --> 06:29.640] gone through a lot of revisions. There is another revision coming this [06:29.640 --> 06:35.880] year and one later in this decade. It is a living language, and it is continuing to [06:35.880 --> 06:40.560] make progress, adding new features and so on, but that makes the job of the compiler [06:40.560 --> 06:45.920] engineers much harder, because you are always trying to catch up. [06:45.920 --> 06:51.040] The earliest standard that we track is Fortran 77, and that work is complete. Fortran [06:51.040 --> 06:58.400] 95, as I mentioned before, is complete except for a few bits here and there. Work on Fortran 2003 [06:58.400 --> 07:04.960] is in progress, particularly on polymorphic types, but the parsing, semantics, and runtime [07:04.960 --> 07:11.680] mostly work.
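As a rough illustration of the driver usage described above, here is a minimal sketch; the file name is made up, and the flag requirement may have changed since this talk:

    ! hello.f90 -- any small Fortran program will do
    program hello
      print *, 'Hello from Flang'
    end program hello

    ! Compile and link; at the time of the talk the experimental option was
    ! needed for the driver to produce an executable:
    !   flang-new -flang-experimental-exec hello.f90 -o hello
    !   ./hello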
Similarly for Fortran 2008: parsing, semantics, and runtime work, [07:11.680 --> 07:18.160] but some of the features are still in progress. When you come to Fortran 2018, none [07:18.160 --> 07:23.480] of the lowering or code generation work has happened yet, whereas the parser, semantics, and [07:23.480 --> 07:28.080] runtime should work fine for that code, modulo any bugs. [07:28.080 --> 07:36.320] Now, I have said that the compiler is able to compile a lot of Fortran code. What does [07:36.320 --> 07:42.960] the performance look like? This slide gives you a summary of where this compiler stands [07:42.960 --> 07:51.040] with respect to other compilers. The benchmark I have used for this slide is SPEC CPU [07:51.080 --> 07:57.800] 2017, taking all the Fortran benchmarks from it, either pure Fortran or mixed Fortran. [07:57.800 --> 08:02.880] I have compared it with two compilers. One is GFortran 12, and the other [08:02.880 --> 08:07.980] one is the compiler called Classic Flang. That is a compiler that was previously open [08:07.980 --> 08:14.440] sourced by PGI, and it is actually the Fortran frontend of many of the existing [08:14.440 --> 08:19.960] commercial compilers like AMD's, Arm's, and Huawei's. [08:19.960 --> 08:26.960] So I have compared it with both of these compilers, and what I have at the bottom here is the [08:26.960 --> 08:32.760] geometric mean over the performance of all the benchmarks in this suite. [08:32.760 --> 08:40.760] You can see that the runtime with Flang is around 1.5 times that of GFortran, [08:40.760 --> 08:49.080] whereas compared to Classic Flang it is around 1.38 times. For some of the benchmarks [08:49.080 --> 08:54.600] we are mostly on par, but for some of the others there is still some work [08:54.600 --> 09:01.480] to be done. The comments column summarizes what are probably the issues [09:01.480 --> 09:05.360] that cause this performance difference. [09:06.040 --> 09:13.520] Some of them are familiar things like alias analysis; others are things like intrinsic [09:13.520 --> 09:21.360] inlining and function specialization. Fortran has a lot of intrinsic functions, and generally [09:21.360 --> 09:26.360] these are all implemented in the runtime. Because there are so many of these intrinsics, [09:26.360 --> 09:31.640] the runtime is often written in a generic fashion, so you might not get the best performance [09:31.760 --> 09:36.420] if you call the runtime function. Also, for simple arrays it does not make much [09:36.420 --> 09:40.560] sense to incur that overhead, particularly if the function is being called in a loop [09:40.560 --> 09:47.560] or something like that, so many times it is better to inline that code. In benchmarks [09:47.560 --> 09:55.060] like exchange2, there are a lot of calls to intrinsics like count, sum, any, minloc, and so on [09:55.060 --> 10:00.920] that could possibly be inlined to get more performance. And exchange2 happens to be [10:00.920 --> 10:07.240] one benchmark where, if you perform function specialization, you get a lot of benefit. [10:07.240 --> 10:12.720] Function specialization is a process where, if you know the arguments to a function, [10:12.720 --> 10:18.000] you can generate a specialized version of that function based on the known parameter [10:18.000 --> 10:19.500] values. [10:19.500 --> 10:26.960] There are also other issues, mostly associated with how arrays are handled in Fortran.
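As a concrete illustration of the intrinsic-inlining point, here is a small made-up sketch (not taken from the benchmark) of code where calling a generic runtime routine for every intrinsic is costly and inlining the operation into the surrounding loop pays off:

    ! Hypothetical example: intrinsics called inside a hot loop.
    ! Each count() or sum() call may end up in a generic runtime routine;
    ! for small arrays of known shape it is often faster to inline the
    ! equivalent loop instead of paying the call overhead every iteration.
    subroutine hot_loop(grid, steps, total)
      integer, intent(in)  :: grid(9, 9), steps
      integer, intent(out) :: total
      integer :: i
      total = 0
      do i = 1, steps
        total = total + count(grid == i) + sum(grid(:, 1))
      end do
    end subroutine hot_loop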
The way [10:26.960 --> 10:32.600] the standard interprets arrays, or rather array expressions, [10:32.600 --> 10:38.480] in Fortran is that there is, conceptually, always a temporary associated with them. But when we generate code, [10:38.480 --> 10:43.760] if there are a lot of temporaries, a lot of time is consumed just [10:43.760 --> 10:48.600] copying these arrays around. In many cases these copies [10:48.600 --> 10:53.760] can be optimized away, but we do not yet do a good job of it, and that is what is causing [10:53.760 --> 10:55.120] this performance issue. [10:55.120 --> 11:00.240] A few engineers have been working on this for some time now. A few months back we were [11:00.240 --> 11:09.880] around 2x, but we are closing the gap as fast as possible. Now, in this [11:09.880 --> 11:15.080] slide and the following slides, I summarize some of the major development efforts. I have [11:15.080 --> 11:21.440] probably missed some of the efforts, but what I am going to cover is this: the first one [11:21.440 --> 11:27.000] is high-level FIR, a new dialect that is being written and that sits just above [11:27.000 --> 11:34.480] Fortran IR; I will come to the reason for that. Second, I will have one [11:34.480 --> 11:40.280] or two slides about polymorphic types and how they are handled in Flang. I will look at some [11:40.280 --> 11:46.480] of the performance work that is being done. And I will briefly summarize the work already done, [11:46.480 --> 11:53.640] and the work still going on, in OpenMP as well as the driver. [11:53.640 --> 12:02.440] When the compiler work started, the IR that we had was Fortran IR, which represents [12:02.440 --> 12:08.080] a lot of the Fortran constructs. But what we found is that, although it does model a [12:08.080 --> 12:14.640] lot of the Fortran constructs, there is still some gap between the Fortran source program [12:15.000 --> 12:20.040] and the Fortran IR. Some information is lost, like what the variables [12:20.040 --> 12:24.440] in the source program are and what annotations were on those variables, [12:24.440 --> 12:31.440] and losing that meant we might not be able to do some performance optimizations. [12:31.440 --> 12:37.520] That was one issue. The other issue was [12:37.520 --> 12:41.880] that the lowering was also proving to be a bit difficult because of the big gap [12:41.880 --> 12:47.720] between the source and the IR. That is the reason why a new intermediate representation [12:47.720 --> 12:54.640] was introduced between the source and Fortran IR: HLFIR, or high-level [12:54.640 --> 13:01.040] FIR. As I mentioned before, it enables optimizations, and because it carries more information from [13:01.040 --> 13:04.800] the source, it is likely to produce better debug information. [13:04.800 --> 13:11.320] This IR basically introduces two major concepts. One is that it models expressions [13:11.440 --> 13:16.760] that are not buffered. As I mentioned before, array expressions in Fortran generally involve [13:16.760 --> 13:23.760] temporary arrays, and whenever we introduce those arrays, or the buffers associated [13:24.560 --> 13:31.560] with them, the code starts to look very low level. Expressions that are not associated [13:33.520 --> 13:39.800] with these buffers remain higher-level concepts, so if you have chains of intrinsic [13:40.160 --> 13:44.600] functions that operate on arrays, it is easy to do some processing there to simplify [13:44.600 --> 13:48.920] those expressions.
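A tiny illustrative sketch of such a chain (not from the slides): evaluated naively, each intermediate array expression is materialized into its own temporary buffer, whereas kept as high-level expressions the whole chain can be fused into a single loop with no buffers at all.

    subroutine chain_example(a, b, r)
      real :: a(:), b(:), r
      ! a - b and abs(a - b) are intermediate array expressions; buffering
      ! each one costs an allocation and a copy, while fusing the whole
      ! chain reduces it to a single loop accumulating into r.
      r = sum(abs(a - b))
    end subroutine chain_example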
It also introduces [13:48.920 --> 13:55.120] the concept of variables. There is an operation in high-level FIR, hlfir.declare, which [13:55.120 --> 13:59.400] collects information about each variable in a single place, so that you know this [13:59.400 --> 14:03.960] is the variable and what its properties are. Some of these might be modeled by attributes: [14:03.960 --> 14:08.760] if you say that a Fortran variable is a pointer or allocatable, that is marked as [14:08.760 --> 14:15.720] an attribute. It also identifies the shape of the array, if it is an array, [14:15.720 --> 14:22.720] along with the memory associated with it. The initial lowering is then from [14:23.120 --> 14:29.200] the Fortran source to a mix of high-level FIR and FIR, then the high-level FIR is [14:29.200 --> 14:35.000] converted again to FIR, and then the rest of the pipeline kicks in as always. I will [14:35.080 --> 14:40.560] not go into the details, but if people are interested there is a detailed RFC [14:40.560 --> 14:47.560] in the Flang documentation. But I will try to show you an example. [14:47.560 --> 14:53.760] This is the Fortran source code, and what we have is a declaration of two arrays. The [14:53.760 --> 14:57.520] first array, called Y, is a two-dimensional array; the second one is a one-dimensional [14:57.520 --> 15:04.520] array. Then we are summing the rows or columns of the first array, Y, and storing [15:05.680 --> 15:12.680] the result in the result array, the array called S.
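From that description, the source would have roughly this shape (a reconstruction; the routine name and the dimension being summed over are assumptions):

    subroutine row_sums(y, s)
      real :: y(:, :)      ! two-dimensional input array
      real :: s(:)         ! one-dimensional result array
      ! Sum y along one dimension and store the result in s.
      s = sum(y, dim=1)
    end subroutine row_sums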
I tried to put the FIR code for this on [15:13.720 --> 15:19.280] a slide, but it turned out to be too much, and it would not fit on one slide or even two. [15:19.280 --> 15:25.080] So I have just left the comments here, whereas the generated code is completely gone. You [15:25.120 --> 15:30.920] can see that there is a fir.alloca which allocates that array, and the name of the variable [15:30.920 --> 15:36.120] is part of that alloca. You can also see that there is a call to the runtime for the sum [15:36.120 --> 15:43.120] function. You can see from the comments that the runtime call [15:45.280 --> 15:50.440] allocates a buffer on the heap and returns it, then that heap buffer is copied [15:50.480 --> 15:57.480] to the real variable, and the original heap buffer is deallocated. Not much more to say here, but if you [15:57.680 --> 16:04.680] come to the HLFIR version, you can actually fit it on a single slide because it models [16:05.280 --> 16:10.400] things at a higher level. The important difference to notice here is that there are [16:10.400 --> 16:15.320] two hlfir.declare operations that declare the variables that [16:15.320 --> 16:20.920] are in your program. You can see that the two arrays, S and Y, have [16:20.920 --> 16:26.600] a representation, and instead of a runtime call you have an operation called [16:26.600 --> 16:32.240] hlfir.sum that returns something called an HLFIR expression. There is no [16:32.240 --> 16:36.840] buffer associated with it, the runtime is not called, and there is no memory allocated as [16:36.840 --> 16:42.760] of now. Then that expression is assigned to the original variable, that is, [16:42.760 --> 16:49.440] the result array S. So basically I just want to show that [16:49.440 --> 16:55.160] this is at a higher level: it has the concept of variables, and it also has [16:55.160 --> 17:01.960] the concept of expressions. I will now move on to polymorphic types. Polymorphic [17:01.960 --> 17:08.960] types came in as part of the Fortran 2003 standard. The types are only known at runtime; Fortran [17:09.080 --> 17:14.160] has the class keyword for specifying such a type. If you have a class type, it can [17:14.160 --> 17:19.440] refer either to that type or to any type that extends from it. Extension is Fortran's [17:19.440 --> 17:25.680] name for the inheritance concept found in some other languages. Again, I will not [17:25.680 --> 17:29.400] go into much detail; I have an example on the next slide, but if people are interested [17:29.400 --> 17:34.320] they can check the RFC. In the example that I have on the left-hand [17:34.400 --> 17:41.000] side, there is a type called point; we call these derived types in Fortran. Then [17:41.000 --> 17:45.480] you have a three-dimensional point type that extends point and basically just adds another [17:45.480 --> 17:52.480] field to it. We have a class variable called p3d, and then we call a subroutine [17:56.400 --> 18:03.400] called foo, and this subroutine accepts its argument as a class of the base type. [18:04.680 --> 18:10.580] Then there is a construct called select type in Fortran that can, at runtime, [18:10.580 --> 18:15.040] identify which type it is. So if its type is the 3D point type, one thing is printed; if [18:15.040 --> 18:20.520] its type is point, something else is printed. The modeling mostly follows what is [18:20.520 --> 18:27.520] in this code. We have something called fir.type, and of those types the ones in green [18:27.560 --> 18:33.160] are the extended type and the ones in red are the base type. You can see that there is a [18:33.200 --> 18:40.200] fir.class that has a fir.type inside it, and then you have the fir.select_type construct, [18:41.960 --> 18:48.960] which tries to match against the base type or the extended type and then branches [18:50.000 --> 18:55.280] off to the basic blocks that handle each case: basic block one for the extended type and [18:55.280 --> 19:00.920] basic block two for the base type. At runtime, when you generate further, [19:00.960 --> 19:06.480] lower-level code like LLVM IR, there will be some comparison instructions that compare whether [19:06.480 --> 19:11.320] it is that type. Types will probably be represented as structures that are global, so you can [19:11.320 --> 19:15.080] compare against those to know what the real type is.
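Based on that description, the example would look roughly like this in Fortran (a reconstruction, not the exact slide code; the component names, the extended type's name, and the messages are assumptions):

    module points
      type :: point
        real :: x, y
      end type point

      type, extends(point) :: point_3d
        real :: z            ! the extended type just adds one more field
      end type point_3d
    contains
      subroutine foo(p)
        class(point) :: p    ! polymorphic: p may be a point or any extension of it
        select type (p)
        type is (point_3d)
          print *, 'got a point_3d'
        type is (point)
          print *, 'got a point'
        end select
      end subroutine foo
    end module points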
Next, I move on to alias analysis. [19:15.080 --> 19:22.080] Alias analysis is important both to distinguish between [19:24.200 --> 19:30.240] different arrays that can potentially alias and to say that two arrays definitely cannot [19:30.240 --> 19:36.520] alias. The rules for aliasing in Fortran are different from those in C, so we [19:36.520 --> 19:43.520] cannot directly reuse what is there for C. In general, with exceptions [19:43.720 --> 19:49.680] and a lot of other special cases, arrays do not overlap unless you specify that [19:49.680 --> 19:55.640] one array is a pointer and another array is a target; then the two can overlap. [19:56.440 --> 20:00.760] Ideally we should benefit from the restrict patches that are being worked on, but that [20:00.760 --> 20:06.320] work is not yet complete. We also have some issues with pointer escapes. That is [20:06.320 --> 20:13.320] all captured in an RFC by Slava. But we still need to do some kind of alias analysis, [20:14.440 --> 20:19.040] because otherwise, as we saw in some of the earlier slides where I showed the performance [20:19.040 --> 20:25.200] results, performance is hampered by the lack of alias information in the LLVM [20:25.240 --> 20:30.000] optimizer. We probably have some more information at the FIR level, but much of the optimization [20:30.000 --> 20:36.000] is currently delegated to LLVM, so LLVM still needs that information to do the optimizations. [20:36.000 --> 20:42.720] As a first step, what we have done is to distinguish between [20:42.720 --> 20:49.640] accessing the descriptor and accessing other memory. As I mentioned before, [20:49.640 --> 20:55.000] Fortran has arrays, and it is very good at arrays. Sometimes, [20:55.000 --> 20:59.720] to pass an array you cannot just pass a pointer; you might need to [20:59.720 --> 21:06.720] pass additional information like its rank, its extents, [21:07.320 --> 21:13.600] its lower and upper bounds, and whether [21:13.600 --> 21:18.160] the array has a stride. All of this information can be passed [21:18.160 --> 21:22.440] in the descriptor. The descriptor is generally modeled as a structure at the LLVM level, [21:22.440 --> 21:28.200] so you have to go and fetch its contents by loading from it. Now, these loads [21:28.200 --> 21:33.200] can potentially alias with other arrays if you load from them directly. So we are trying [21:33.200 --> 21:39.800] to distinguish these accesses using alias analysis information, using TBAA. That is what we [21:39.800 --> 21:46.800] have on this slide. I do not know how clear it is, but this is in the MLIR LLVM dialect, [21:47.800 --> 21:53.320] not in the LLVM IR representation. You can see the TBAA information being generated [21:53.320 --> 22:00.320] here. If you know TBAA, it is mostly trees: if one node is an ancestor of another [22:00.640 --> 22:05.880] node, then they may alias, but if they are in separate subtrees, they do not alias. You can see [22:05.880 --> 22:12.720] that the one in gray is any data access, and the one in yellow is for accessing [22:12.720 --> 22:16.360] a descriptor member. [22:16.360 --> 22:20.200] And if you go back to the source code, what I have here is a simple program [22:20.200 --> 22:26.200] with a subroutine; a subroutine is a procedure that does not return a value in Fortran. [22:26.200 --> 22:32.360] It has two arguments, A and B; A is an array and B is a scalar, both of integer type. I am loading [22:32.360 --> 22:38.200] the value at the tenth location and putting it in the variable B. You [22:38.200 --> 22:44.840] can see that whenever we access the descriptor, which in this case [22:44.840 --> 22:51.640] is accessed possibly for a stride, we use the TBAA tag in yellow, and whenever [22:51.640 --> 22:57.880] we access something related to B, we use the gray one. So you can establish [22:57.880 --> 23:04.720] that these do not alias, and when this happens in loops, we sometimes get performance benefits.
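Reconstructed from that description, the source is roughly the following (the subroutine name is an assumption):

    subroutine load_tenth(a, b)
      integer :: a(:)   ! assumed-shape: passed via a descriptor holding bounds
                        ! and strides, so reading a(10) first needs loads from
                        ! the descriptor (for example, the stride)
      integer :: b      ! plain scalar: a direct data access
      b = a(10)
    end subroutine load_tenth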
[23:05.720 --> 23:13.560] The next one is code generation for assumed-shape array arguments. As I have mentioned before, [23:13.560 --> 23:17.920] Fortran has different kinds of arrays, and one kind is assumed-shape. Assumed-shape means [23:17.920 --> 23:23.560] that a dummy argument takes the shape of the array that you pass to it. [23:23.560 --> 23:31.000] So you can pass it an array of a [23:31.000 --> 23:39.960] known size or an unknown size, and it will accept both. This causes [23:39.960 --> 23:45.240] some issues, particularly because the arrays can also be strided. If the [23:45.240 --> 23:49.240] array is strided and you have a loop working on that array, then to fetch [23:49.240 --> 23:53.480] consecutive elements from the array you have to increment by [23:53.480 --> 24:00.520] the stride, and usually you have to load from the descriptor to find the stride. Then [24:00.560 --> 24:07.720] you use that stride to find the next element. Sometimes this can be handled with [24:07.720 --> 24:13.520] gather and scatter loads and stores, but sometimes it is not possible. In many cases, though, the [24:13.520 --> 24:17.960] stride is actually one: you are actually passing a contiguous array. So we can do some [24:17.960 --> 24:24.240] versioning. Represented at the source level, if you have some input code [24:24.240 --> 24:30.640] like this, with an array called x that you are looping over, then you can [24:30.640 --> 24:37.640] create versions of it: if the stride is one, run this version; otherwise, run that one. And [24:37.640 --> 24:42.440] the unit-stride side of the versioned loop becomes easier to optimize [24:42.440 --> 24:49.880] and vectorize.
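Here is a sketch of that versioning idea written at the source level, purely for illustration: Flang performs this on the IR, and the is_contiguous intrinsic below merely stands in for the unit-stride check that the compiler would derive from the descriptor.

    subroutine scale(x)
      real :: x(:)                 ! assumed-shape, possibly strided
      integer :: i
      if (is_contiguous(x)) then
        ! Fast version: the compiler may assume unit stride here,
        ! which makes the loop straightforward to vectorize.
        do i = 1, size(x)
          x(i) = 2.0 * x(i)
        end do
      else
        ! General version: element addressing goes through the stride
        ! loaded from the descriptor.
        do i = 1, size(x)
          x(i) = 2.0 * x(i)
        end do
      end if
    end subroutine scale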
I just have two more slides, and I will probably [24:50.040 --> 24:55.480] just run through them. We are nearing OpenMP 1.1 completion; there are still a few items [24:55.480 --> 25:01.600] to complete, like privatization, atomic, reduction, and detailed testing, but a lot of other things [25:01.600 --> 25:06.240] are also going on in parallel: there is basic support for the task and SIMD constructs. We have been [25:06.240 --> 25:12.480] able to run it with some SPEC Speed benchmarks and it works. Things in progress include target [25:12.480 --> 25:19.280] offloading, which has just started, task dependencies, and new loop-related constructs. We have also made [25:19.280 --> 25:26.320] a lot of progress with the driver. It can now generate executables, and what is new is that [25:26.320 --> 25:31.280] we now handle target specification, fast math, and MLIR-level optimizations; previously only [25:31.280 --> 25:38.040] LLVM optimizations were controllable, and now we can control MLIR optimizations as well as LLVM pass plugins. [25:38.040 --> 25:43.200] People are continuing to work on LTO, saving optimization records, and supporting something [25:43.200 --> 25:50.400] called stack arrays. This final slide just says that you are all welcome to contribute [25:50.400 --> 25:54.520] to this project, and the details are here. Thank you. [26:13.200 --> 26:23.040] Yeah, as of now we do not have that. Basically, the question was, when you traverse [26:23.040 --> 26:29.520] the various layers in MLIR and LLVM IR, is the debug information preserved? [26:29.520 --> 26:35.280] The whole point of the hlfir.declare operation is to put that information somewhere [26:35.280 --> 26:41.280] in MLIR at the highest level, and then we plan to propagate it further down. The hlfir.declare [26:41.280 --> 26:46.400] has a corresponding fir.declare it will lower to, and from the fir.declare, [26:46.400 --> 26:51.080] when we convert to LLVM, we will just pass on that debug information. But debug support [26:51.080 --> 26:58.560] is quite early in Flang; as of now only function names and line numbers are supported, and [26:58.560 --> 27:05.360] maybe not even by default: it is just a pass run separately over the code to add that, and the [27:05.360 --> 27:12.360] driver support is still pending. Yes. [27:12.360 --> 27:27.280] Everything that is in the standard, plus some well-known extensions, is there, and a lot [27:27.280 --> 27:31.840] of dusty-deck code was tested with it, but I do not know whether the specific thing [27:31.840 --> 27:37.120] that you have in mind is supported or not; you would have to try it out or look at the documentation. [27:37.120 --> 27:45.360] So, the question was whether old Fortran 77 code or legacy extensions are supported. [27:45.360 --> 27:53.520] So, I am trying to understand what needs to be done for OpenMP, because OpenMP is a completely [27:53.520 --> 27:59.600] separate project, right? Yeah. So, the question is what needs to be done separately for OpenMP. [27:59.680 --> 28:07.680] As far as OpenMP is concerned, all the OpenMP work is mainly represented at [28:07.680 --> 28:13.960] the MLIR level by a separate dialect called the OpenMP dialect. That dialect sits [28:13.960 --> 28:21.400] in the main MLIR repository, and when we generate code from the source program, we create [28:21.400 --> 28:26.280] these additional operations from the OpenMP dialect, and they have regions in MLIR, so they [28:26.320 --> 28:32.320] can capture things like a parallel directive much better compared to LLVM. [28:32.320 --> 28:38.320] So, roughly speaking, OpenMP is a set of intrinsics, more or less? [28:38.320 --> 28:45.820] I mean, not quite: whenever you say intrinsic, it implies some kind [28:45.820 --> 28:50.360] of function, but these are MLIR operations. So, if you have something like a [28:50.400 --> 28:56.400] parallel directive, there is an operation in MLIR called omp.parallel, and it might have [28:56.400 --> 29:02.400] a lot of clauses, like what the threading model is and things like that. [29:02.400 --> 29:08.000] All of that information is captured at that level, along with the code that comes in that [29:08.000 --> 29:12.760] parallel region. And what we do now is that we are trying to share code with Clang: [29:12.760 --> 29:17.560] there is some code that was refactored out of Clang and put into something called the OpenMP [29:17.640 --> 29:24.640] IRBuilder. When we lower from this dialect to LLVM, we use that to generate the [29:24.640 --> 29:28.160] LLVM IR. Thank you.
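To connect that last answer to source code, a parallel region like the following generic example (not taken from the talk) is the kind of construct that gets represented as an omp.parallel operation in the OpenMP dialect, with the body of the construct carried as the operation's region:

    subroutine hello_threads()
      use omp_lib, only: omp_get_thread_num
      !$omp parallel
      ! Everything inside the parallel construct becomes the region of the
      ! omp.parallel operation when Flang lowers this to MLIR.
      print *, 'hello from thread', omp_get_thread_num()
      !$omp end parallel
    end subroutine hello_threads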