We'll get started.
Sylwester will introduce us to PyPartMC.
Thank you for coming.
I'm Sylwester Arabas.
I work at the AGH University in Kraków in Poland.
And this is a project carried out together with a team from the University of Illinois
Urbana-Champaign in the US.
So PyPartMC is the title here.
But from the perspective of this conference, I should probably read the subtitle, namely:
how to engineer a Python-to-Fortran binding in C++ for use in Julia and MATLAB, and why
to do it.
So the package that this tool interfaces with is called PartMC.
It's a Monte Carlo simulation package for aerosols, that is, for example, particles floating
in the air.
It's an open source tool developed for more than 20 years at Urbana-Champaign.
And just one line about the physics.
Usually it's a box model, so it studies processes without a spatial context.
But it also has an option to be coupled with the WRF weather simulation and forecasting model.
That is where the HPC context comes in.
It simulates things like air pollution, the evolution of particles due to collisions,
condensation, chemical reactions, et cetera.
On the technical side, it's an object-oriented code base written using a quite
classic subset of Fortran, but still in a very much object-oriented manner.
And despite 20 years of heritage, it has a very comprehensive test suite.
I would say it could serve as an example of best practices in Fortran.
However, its usage poses several challenges, for example, for students who intend to start
using it from a Jupyter notebook.
These challenges are related, first of all, to multiple dependencies.
The need to compile it.
Getting updates doesn't really have a ready workflow.
Automating simulations, analysis, et cetera, usually involves shell scripting.
Input and output are handled through multiple text files.
And to analyze output from these simulations, one usually needs to use
some of the Fortran code the simulation is based on.
So the question we posed when we started was how to bring together these two seemingly
separate worlds.
On the right-hand side is the simulation package, PartMC, with its Fortran code base,
a bit of C code, and different dependencies.
And then there is the perspective of a modern student, let's say, who starts with Jupyter and expects
basically everything to be importable and interoperable with other libraries: SciPy,
NumPy, et cetera.
So the goals were to lower the entry threshold for installation and usage,
to ensure the same experience on different operating systems,
and to streamline the dissemination of studies based on the simulation tool, for
example, for peer review in scientific journals.
As for the current status of the project: after two years of development, we
released version 1 of PyPartMC, these Python bindings, and it's on PyPI.
We also published a description of the package in the SoftwareX journal.
So we are kind of ready for a rollout.
And today I will talk more about the internals.
And the internals start with pybind11.
Even though we are talking about Python and Fortran, we picked pybind11,
which is a C++ tool for developing Python packages, as our backbone.
So here are some highlights.
For those who are new to it, the project is quite a remarkable success,
I would say, with over 300 contributors on GitHub, 2,000 forks and 14,000 stars.
Congratulations to pybind11.
And it's very useful.
Here is where it fits into the picture.
Essentially, we developed, in C++, C and Fortran (so it's a triple-language project),
something that uses pybind11 and a few other components to automate building of PartMC
and offering it as a Python package.
What's probably also worth mentioning here is that most of the work on PyPartMC was
around substituting the text-file input/output with a JSON-like, Python-native,
or let's say Pythonic, input/output layer.
As I mentioned, the original project has an object-oriented structure, so we also tried
to couple Python's garbage collector with the Fortran functions that are provided
for creating and deallocating objects.
The project has many, many dependencies in Fortran, C and C++.
Let me just mention that we picked Git submodules as the tool to pin versions of
these dependencies, which is useful because the pip install command is able to grab packages
from a Git repository, and this includes all the submodules at their pinned versions.
Let me now present a bit of code and how it looks from the user perspective.
For this example, please don't look particularly at the lines of code, maybe just at the bulk
of the code and the type of code.
On the left, we have the Fortran Hello World for using the PartMC package, and on
the right, three text files that would be the minimum to start the simplest simulation.
And this is the end result that uses the PyPartMC layer: essentially the same can
be obtained with a single file, starting with an import from the PyPartMC wrapper, and then
using this kind of JSON-like notation, essentially lists and dictionaries that are wrapped.
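To illustrate why lists and dictionaries can carry the same information as the text files, here is a small sketch that renders an ordered, JSON-like Python structure into "key value" lines. The text format is a simplified assumption, the key names are only illustrative, and `to_spec_lines` is a hypothetical helper, not part of PyPartMC.

```python
def to_spec_lines(spec):
    """Render an ordered, JSON-like Python structure as 'key value' text lines.

    Hypothetical, simplified format: each item is a one-key dict, so the order
    in which a sequential reader expects the keys is preserved.
    """
    lines = []
    for item in spec:
        (key, value), = item.items()  # exactly one key per item
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        lines.append(f"{key} {value}")
    return lines


# Illustrative input; a tuple of one-key dicts keeps the order explicit.
spec = (
    {"run_type": "particle"},
    {"t_max": 86400},
    {"del_t": 60},
)
```

Using a sequence of one-key dicts rather than one big dict makes the expected reading order explicit, which matters when the consumer is a sequential reader.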
One achievement, and one big advantage of using Python, is that by providing
Python wrappers you are also catering to Julia users: for example, through the
PyCall.jl package, essentially the same code and the same logic can be obtained for Julia
users using PyPartMC.
And finally, an example using MATLAB, which ships with a built-in Python bridge that
allows using PyPartMC to access the Fortran code from MATLAB as well.
These three examples I've shown are actually part of our CI: we have them in the README
file, and on CI we execute the Julia, the Python, the Fortran and the MATLAB examples,
upload the output as artifacts, and an assert stage checks whether the outputs
from all these languages match.
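An assert stage of this kind can be sketched in a few lines of Python. This is an assumption about how such a check might look, not the project's actual CI script: the hypothetical function `outputs_match` takes values already parsed from each language's artifact and compares them element-wise against the first one, with a relative tolerance chosen here arbitrarily.

```python
import math


def outputs_match(outputs, rel_tol=1e-9):
    """Check that every language's output matches the first one, element-wise.

    `outputs` maps a label (e.g. "fortran", "python") to a list of floats,
    assumed to be parsed from the CI artifacts; the tolerance is an assumption.
    """
    labels = list(outputs)
    reference = outputs[labels[0]]
    for label in labels[1:]:
        values = outputs[label]
        if len(values) != len(reference):
            return False
        if not all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(reference, values)):
            return False
    return True
```

A tolerance rather than exact equality is used because formatting and parsing of floating-point output in different languages may differ in the last digits.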
By the way, the timings here are essentially compilation and setup. It's not that Fortran
runs much faster: the execution is always done through the Fortran code base and binary,
but clearly compiling just the Fortran code is faster than setting up the Python, Julia
or MATLAB environment. Now, how does it actually work in practice when looking at the code?
This diagram might not be perfectly visible, but the right column is the C++ layer,
here is the C layer, here is the Fortran layer, and here is the user code in Julia,
MATLAB or Python.
The different color depicts the package that we are interfacing with.
So if we start with this README code here, the user's Python code, we have
some import and the instantiation of a single object of the AeroData class as an example.
What happens when we call it? First, it goes through (barely visible, I guess)
the outer layer of the C++-implemented Python package.
Now I hope it's more visible.
This is how one works with pybind11.
This is the C++ code where we define a module for Python; creating a Python class
from C++ code looks roughly like this, with some templates defining the class that we
interface, how to handle memory allocation, and particular methods.
Here there is an init method, a kind of constructor, and this constructor, when called,
goes through the C++ code of the AeroData class that we wrap, but on our way
to Fortran we quickly need to go through what is written here at the top: C-bound signatures for the
Fortran functions.
These cannot propagate exceptions: exception handling across these languages is essentially
undefined behavior, depending on the compiler.
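Since exceptions must not cross the language boundary, a common pattern is to report errors as plain values (status codes) at the boundary and raise only in the outermost layer. The Python sketch below is an analogy of that general pattern, not PyPartMC's actual mechanism; `_foreign_read_value`, `read_value` and `SimulationError` are hypothetical names, and the "foreign" routine is simulated in Python.

```python
class SimulationError(RuntimeError):
    """Raised on the Python side when a foreign call reports failure."""


def _foreign_read_value(text):
    """Hypothetical stand-in for a bind(C) routine.

    It never lets an error escape: failures are reported as a nonzero status.
    """
    try:
        return 0, float(text)
    except ValueError:
        return 1, 0.0


_MESSAGES = {1: "could not parse value"}


def read_value(text):
    """Python-facing wrapper: turn a nonzero status code into an exception."""
    status, value = _foreign_read_value(text)
    if status != 0:
        raise SimulationError(_MESSAGES.get(status, f"error code {status}"))
    return value
```

Keeping the boundary function free of exceptions means only the layer that owns the exception machinery (here Python, in a real binding C++) ever raises.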
This is how it looks from the C++ perspective.
When we look at the C signatures here at the top, they match what is later defined
in Fortran using the built-in Fortran ISO_C_BINDING module.
Whenever you see bind(C) or the c_ types, these ensure within the Fortran code that
it can be accessed from C. Each of these routines is written for our wrapper
and is essentially a thin wrapper around the original Fortran routines that
we wanted to wrap.
For example, the one below, spec_file_read_aero_data.
So now we finally get to the wrapped code.
This is the unmodified code that we access, and it sits in a Git submodule of the PyPartMC
project.
Now, the fun starts when this Fortran code calls its input/output layer:
a simulation usually takes something like 20 different text files to be read,
and these text files are nested.
So what we did is replace one of the components of the original Fortran package
with our implementation that starts in Fortran, then goes through a C layer back to C++, which
then uses JSON for Modern C++.
This is a C++ library that makes for very readable C++ code for handling JSON, and it
was our solution for replacing the multiple text files with what, from the user perspective,
are essentially in-memory MATLAB, Julia or Python objects.
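The idea of serving a sequential, nested text-file reader from in-memory objects can be sketched in Python. This is a hypothetical illustration, not the PyPartMC implementation (which is written in Fortran, C and C++): `InMemorySpecReader` hands out values in order, checks that each requested key is the one expected next, and treats a nested list as the equivalent of opening a nested file.

```python
class InMemorySpecReader:
    """Serve values to a sequential, key-checking reader from nested Python data.

    Hypothetical sketch: `spec` is a list of one-key dicts; a value that is itself
    such a list plays the role of a nested text file the original reader would open.
    """

    def __init__(self, spec):
        self._items = list(spec)
        self._pos = 0

    def _next(self, name):
        if self._pos >= len(self._items):
            raise KeyError(f"expected '{name}', but input is exhausted")
        (key, value), = self._items[self._pos].items()
        if key != name:
            raise KeyError(f"expected '{name}', found '{key}'")
        self._pos += 1
        return value

    def read_real(self, name):
        return float(self._next(name))

    def read_integer(self, name):
        return int(self._next(name))

    def read_string(self, name):
        return str(self._next(name))

    def open_nested(self, name):
        """Equivalent of following a file name to a nested spec file."""
        return InMemorySpecReader(self._next(name))
```

Because the reader only changes where values come from, the calling code that expects keys in a fixed order can stay unmodified.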
We also have online documentation for the project generated from the source code, and
as you can see here, for example, the types are hinted correctly.
So even though, in principle, on the Fortran side the parameter ordering is what matters, we do inform Python users
about the types of the arguments.
So, to start a summary: what we achieved with the PyPartMC wrapper is that we have wrapped a list
of different types, and we have a single-command pip installation on Windows,
Linux and macOS, with the exception of Apple Silicon, where we are still struggling to get
it done. Help is welcome if any of you is a Fortran hacker who could help us produce
universal binaries.
We provide access to the unmodified internals of the underlying PartMC package from Python,
MATLAB and also C++.
So as a by-product of the goal of providing a Python interface, we also got
Julia, MATLAB and C++ layers.
Something that was probably not obvious from the original plan, and that we ended up
using extensively, is that this provides us with a nice tool for developing other
Python packages, because we can use PartMC in test suites to verify against an established
simulation package.
It's maybe a non-trivial way to use pip, but since C and Fortran are
not the technologies where you see mainstream
package managers being established, we managed to ship Fortran code to
users of Windows, macOS and Linux as different variants of binary packages through pip.
So it's probably one way of thinking of the PyPI.org platform.
And, from the point of view of what I mentioned earlier, it provides students or researchers
using this package with a tool to disseminate their research workflows, including input
data, output data and analysis workflow, in a single file, for example a Jupyter notebook, for peer
review of a paper.
And finally, PyPartMC allows extending the Fortran code with some Python logic.
Since we expose the internals of the package, the time stepping of a simulation
can actually be done from Python.
So if you have, let's say, 10 different steps of the simulation done
in Fortran, you can add an 11th one that is in Python, Julia or whatever.
And the final point, probably one of the key things here, is that having statically
linked all the dependencies, we can use the package on platforms such as Colab
or the JupyterHubs of various institutions by just doing pip install and importing what
would otherwise require getting a lot of dependencies and a lot of compile-time tooling available.
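A time loop driven from Python, with user-defined processes appended to the wrapped ones, might look like the sketch below. The function names and the state layout are hypothetical: `condensation` merely stands in for a process routine exposed by the wrapper, and `dilution` plays the role of the extra "11th" step written in Python.

```python
def run(state, n_steps, dt, fortran_steps, python_steps=()):
    """Drive the time loop from Python.

    `fortran_steps` stand in for process routines exposed by the wrapper
    (hypothetical); `python_steps` are user-defined extra processes.
    Each step takes (state, dt) and updates the state in place.
    """
    for _ in range(n_steps):
        for step in fortran_steps:
            step(state, dt)
        for step in python_steps:
            step(state, dt)
    return state


def condensation(state, dt):
    """Placeholder for a wrapped Fortran process: linear mass growth."""
    state["mass"] += 1.0e-3 * dt


def dilution(state, dt):
    """Extra process written in Python: relative mass loss per time step."""
    state["mass"] *= 1.0 - 1.0e-4 * dt
```

For example, `run({"mass": 1.0}, 2, 10.0, [condensation], [dilution])` applies condensation and then dilution in each of the two steps.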
Take-home messages.
I wanted to
underline that pybind11, despite being a C++ tool, is actually a valuable thing
for interfacing Fortran with Python.
This is linked to the fact that pybind11 offers CMake integration.
So your C++ project can have build automation in CMake, and CMake handles Fortran well;
this was the key thing here.
The glue-language role of Python is, I think, nicely exemplified here with Julia and MATLAB,
including CI.
Static linkage of the dependencies was essential for us, for example, because
there is no standardized ABI for Fortran: different
versions, even of the same compiler,
have binary incompatibilities, and static linkage was essential to get it working on
platforms such as Colab or other JupyterHubs.
But this prevented us from publishing the package on conda, due to conda's policy of no static linkage.
We used more than 10 Git submodules for
tracking our dependencies from the GitHub repo.
As I mentioned, help is welcome in getting universal binaries
generated with gfortran.
CI
using MATLAB is possible thanks to the
MATLAB actions: MathWorks, the producer of MATLAB, offers
GitHub Actions for CI that do not require any MATLAB license.
So if one wants to run MATLAB code on GitHub, this is important, and I just wanted to thank them. And finally,
a fun fact, or a positive thing: when we submitted the paper about the project to the SoftwareX journal,
the reviewers indeed tried the code during the peer review and
provided us with feedback that also helped. So it was positive that
it did work. Let me acknowledge funding from the US National Science Foundation and the Polish National Science Centre,
and thank you for your attention.
Any questions?
Yes, thank you for that presentation. My question is what exactly did you keep in Fortran and what did you pass to the
Python side? Is it arrays or just single values?
So the question, if I understand correctly, is what kind of data we are
passing during the simulation.
The Monte Carlo simulations here track particles in a kind of attribute space that
describes their physical and chemical properties. It's usually a 20- or 30-dimensional attribute space that is randomly
sampled. So we have vectors of these particles in this attribute space, usually
from thousands to
hundreds of thousands of particles, each of which has something like 30 attributes.
From the Python perspective, the user usually does not use the raw data of the simulation, the state vector, just some aggregate information,
which is passed back to Python as iterables that can be used with NumPy, but we don't assume that it must be NumPy. So one can use plain lists if they are enough.
I hope that answers.
My question is just because we need some raw data from the Fortran side on the Python side, and it's just some two-dimensional matrix. Here we have some problems: we need to know where we keep the data.
We are not exposing particle locations in memory. They are always returned to Python as new objects, because it is never the state vector of the simulation, just
some aggregate information that characterizes it in a simpler way. So usually we have just a one-dimensional iterable.
For you it's much simpler. Thank you.
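The answer above, that aggregate information is returned as new one-dimensional objects rather than as views into the simulation state, can be sketched as follows. The particle layout (dicts with only `diameter` and `mass`, instead of some 30 attributes) and the function `total_mass_per_bin` are hypothetical simplifications.

```python
def total_mass_per_bin(particles, bin_edges):
    """Aggregate per-particle data into a fresh 1-D list (one value per size bin).

    Hypothetical layout: each particle is a dict with 'diameter' and 'mass'.
    The result is a new list, so changing it cannot alter the simulation state.
    """
    totals = [0.0] * (len(bin_edges) - 1)
    for p in particles:
        for i in range(len(totals)):
            if bin_edges[i] <= p["diameter"] < bin_edges[i + 1]:
                totals[i] += p["mass"]
                break
    return totals
```

The result is a plain list; if NumPy is installed, `numpy.asarray(result)` turns it into an array, but nothing in the sketch depends on NumPy.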
Time for one more question.
If there is one.
Okay, if not we'll wrap up here because apparently there's a queue outside to get in for the next talks. Thank you. Thank you very much.