Thank you for the opportunity to present our project, numba-mpi. Let me first acknowledge the co-authors. My name is Sylwester Arabas, and I am here with Oleksii Bulenok and Kacper Derlatka from the Jagiellonian University in Kraków, Poland. Maciej Manna from the same university contributed to this project, and we will also be presenting some work by David Zwicker from the Max Planck Institute for Dynamics and Self-Organisation in Göttingen.

So let's start with a maybe controversial, provocative question: Python and HPC. And let's try to look for answers to this question in a very respected journal. Maybe you have some guesses what's written there. 2019: in scripting languages such as Python, users type code into an interactive editor line by line. It doesn't sound like HPC. The next year: a level of computational performance that Python simply couldn't deliver. Same year, same journal: Numba runs on machines ranging from embedded devices to the world's largest supercomputers, with performance approaching that of compiled languages. Same year, Nature Astronomy: astronomers should avoid interpreted scripting languages such as Python; in principle, Numba and NumPy can lead to an enormous increase in speed, but please reconsider teaching Python to university students. Same year, Nature Methods: for implementing new functionality into SciPy, Python is still the language of choice, and the full test suite should pass with the PyPy just-in-time compiler as of SciPy 1.0.

Are they talking about the same language? No. The left-hand side are papers about Rust and Julia; the right-hand side are papers about Python. So maybe that's the reason.

Just to set the stage, let me present what I think is an apt way of thinking about Python. Python as a language lacks any support for multi-dimensional arrays or number crunching, because it leaves that to packages. Python also leaves it to implementations to actually interpret its syntax. CPython is, of course, the main implementation, but it is not the only one, and solutions exist that streamline, for example, just-in-time compilation of Python code. Moreover, NumPy, while the de facto standard, is not the only implementation of the NumPy API: alternatives are embedded in just-in-time compilation frameworks and GPU frameworks for Python, and they leverage typing and concurrency.
So probably the highlight here is that Python lets you glue these technologies together and package them together, leveraging the Python ecosystem, its popularity, et cetera. And arguably, I would say, that's an advantage. I'm not saying please use Python for HPC instead of Julia — probably vice versa, actually — but it is still an interesting question to see how well it can perform.

OK, so let's check it. I will present a brief benchmark, a very tiny one, that we came up with in relation to this project, and it uses Numba. Numba is a just-in-time compiler that translates a subset of Python and NumPy into machine code, compiled at runtime using LLVM.

So here is the story behind the super-simple benchmark problem. It is related to numerical weather prediction. You can imagine a grid of numbers representing the weather. The integration part of numerical weather prediction involves solving equations for the hydrodynamics — that is, the transport of such a pattern in space and time — and, of course, for the thermodynamics that tell you what is happening in the atmosphere. This is a super-simplified picture — I'm not saying that's the whole story about NWP — but for benchmarking Numba, let's simplify it down to a simple two-dimensional problem: a grid in x and y, and some signal. If we look at just the transport part — a partial differential equation for advective transport — we can see what happens when we move such a signal, which could be humidity, temperature, whatever, around in the atmosphere.
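To give a flavour of what Numba does here, a minimal sketch — not the PyMPDATA code, just an illustrative loop-based one-dimensional upwind advection step — could look as follows:

```python
# Illustrative only (not the PyMPDATA code): a loop-based 1D upwind
# advection step, JIT-compiled to machine code via LLVM by Numba.
import numba
import numpy as np

@numba.njit(parallel=True)
def upwind_step(psi, courant):
    out = np.empty_like(psi)
    for i in numba.prange(psi.size):        # multi-threaded loop
        left = psi[(i - 1) % psi.size]      # periodic boundary condition
        out[i] = psi[i] - courant * (psi[i] - left)
    return out

psi = np.zeros(1000)
psi[450:550] = 1.0                          # a square signal to be advected
for _ in range(100):
    psi = upwind_step(psi, 0.5)             # first call triggers compilation
```

The first call compiles the function for the given argument types; subsequent calls run the compiled machine code.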
So we have a sample problem. Here I'm showing results from a three-dimensional version of what was just shown. Let's start with the right-hand-side plot: on the x-axis, the size of the grid — if it's 8, it means 8 by 8 by 8, super tiny; if it's 128, it's 128 by 128 by 128 — and wall time per time step on the y-axis. Green: a C++ implementation of one particular algorithm for this kind of problem. Orange: PyMPDATA, numerically the same algorithm, but a Python implementation. Here you can see that the Numba JIT-compiled version actually outperformed C++, maintaining even better scaling for the tiny grids, although those are rather irrelevant for the problem. And please note that in both cases we used multi-threading. On the left-hand side you can see the number of threads on the x-axis and wall time per time step on the y-axis. Again, the green line is the C++ implementation, and these two are the two variants of the Python one, JIT-compiled with Numba — almost an order of magnitude, well, five times faster execution. What is probably most interesting for now is that when you compare with just setting the environment variable that disables Numba's JIT (NUMBA_DISABLE_JIT), we jump more than two orders of magnitude up in wall time. So this is how the Numba timing compares with plain-Python timing.

But there are two important caveats to mention here. First, the Python package was written with Numba in mind — everything is loop-based — which is the reason why plain CPython with NumPy performs badly; that line is essentially irrelevant, shown just as a curiosity. Second, the C++ version is somewhat legacy: it is based on the Blitz++ library (back when it was developed, Eigen didn't have support for multiple dimensions), and its object-oriented array processing was reported and measured to be roughly five times slower than Fortran 77 for these kinds of small domains — it's not the same for larger domains.

Anyhow, we can achieve high performance with Python. But what if we need MPI — message passing — in our code? How would we use it? Let's say we do domain decomposition and split our domain into two parts: the same problem, same setup, just half of the domain computed by one process, or node, or anything that has a different memory address space than the other worker.

Why do we want to use MPI? Well, because, despite the expansion in parallel computation, both in the number of machines and the number of cores, no other parallel programming paradigm has replaced MPI — at least as of 2013. And already in 2013 people were writing this even though, as they put it, it is universally acknowledged that MPI is a rather crude way of programming these machines. Anyhow, let's try it, and let's try it with Python.

So here is a seven-line snippet of code where we import Numba to get JIT compilation of Python code, and we use mpi4py, the Python interface to MPI. What do we do? We define some number-crunching routine and try to use MPI from it, and then we try to njit it — @njit being the highest-performance variant of Numba's JIT compilation. We JIT-compile this function and straight away execute it. What happens? It doesn't work. It cannot compile, because Numba cannot determine the type of mpi4py.MPI.Intracomm — it's a class, and classes do not work with Numba, at least not ordinary Python classes. So something doesn't work.
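A minimal reconstruction of that kind of failing snippet — with illustrative names, not the exact code from the slide — looks like this:

```python
# Illustrative reconstruction of the failing pattern: mpi4py used
# inside an @njit-compiled function (names are not from the slide).
import numba
from mpi4py import MPI

@numba.njit()
def number_crunching(comm):
    return comm.Get_rank()   # any mpi4py call inside JIT-compiled code

number_crunching(MPI.COMM_WORLD)
# Numba raises a TypingError: it cannot infer a type for the
# mpi4py.MPI.Intracomm argument, which is an ordinary Python class,
# so the function cannot be compiled in nopython mode.
```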
So the problem is that we have Numba, which is one of the leading solutions for speeding up Python, and MPI, which is clearly the de facto standard for distributed-memory parallelization, and when we try to make them work together, it doesn't work.

So: Stack Overflow — nothing. Let's Google it — nothing. Let's Qwant it — nothing. Wrong search engine, right? Someone must have solved this problem. Nothing. Let's ask the Numba and mpi4py developers. In 2020: you will not be able to use mpi4py's Cython code; it was not designed for such low-level usage. Well, okay, it's Cython. But I mean, it must be doable, right? We have two established packages, and the aim is solid and makes sense. So it must be doable.

And 30 months, 120 comments and 50 PRs later, with five contributors on a totally unplanned side project, we are introducing numba-mpi. numba-mpi is an open-source, rather small Python project which allows you — let's jump here to the hello-world example — to use the Numba @njit decorator on a function that calls rank, size, or any of the other wrapped MPI routines from within the Python code. As of now we cover size, rank, send, recv, allreduce, bcast and barrier. The numba-mpi API is based on NumPy arrays. We have auto-generated documentation, and we are on PyPI and conda-forge.

A few words about how it is implemented. Essentially, we use ctypes, which is built into Python, to address the MPI C API. There are some things related to passing addresses, memory, void pointers, et cetera — super interesting. Probably the key message here is that the send function we offer is itself njit-ed, which means you can call it from other njit-ed functions. We handle non-contiguous NumPy arrays, so we try to be user-friendly. We then call the underlying C function, and that's essentially all. But there really is a key line, number 30 — this one. It looks like nothing, but in principle, without it, Numba optimizes out all of our code.
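For illustration, a minimal hello-world along the lines just described — assuming the size, rank, send and recv wrappers with the buffer-based argument order from the numba-mpi documentation — could look like this (run with, e.g., mpiexec -n 2 python hello.py):

```python
# Sketch of the hello-world usage: numba-mpi calls from @njit-compiled
# code; the send/recv argument order (data, peer, tag) is assumed from
# the numba-mpi documentation.
import numba
import numpy as np
import numba_mpi as mpi

@numba.njit()
def hello():
    buf = np.empty(1, dtype=np.float64)
    if mpi.rank() == 0:
        buf[0] = 42.0
        mpi.send(buf, 1, 0)   # send the NumPy buffer to rank 1, tag 0
    elif mpi.rank() == 1:
        mpi.recv(buf, 0, 0)   # receive into a buffer of the same shape
    return mpi.rank(), mpi.size(), buf[0]

print(hello())
```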
Anyhow, hacks like that key line are the kind of thing you run into when implementing such a package, and unfortunately there are quite a few more of them inside numba-mpi. The next slide is the kind of thing you would prefer never to see, but it cannot be unseen once you work with it. So please just think of it as a picture of some of the problems we have faced: we essentially wrote to the Numba developers asking how it could be done, and we got these kinds of tools for handling void pointers from ctypes within Numba, with Python, NumPy, et cetera. Well, that's utils.py, and that's it — and it works. How do we know it works? Because we test it. Let me hand the microphone over to Oleksii to tell you more about testing.

Okay, it's focused. So I'm going to tell you about the CI that we have set up for our project, for numba-mpi. The CI is set up on GitHub Actions, and this is a screenshot of the workflow. We start by running pdoc, pre-commit and pylint: pdoc for automatic documentation generation, pylint for static code analysis, and pre-commit for styling. If these steps succeed, we move on to the main part, where we run our unit tests — this is the actual workflow file that we use. As you can see, we run the CI against multiple operating systems, different Python versions and different MPI implementations. Here we should say a big thank-you to the mpi4py team for providing the setup-mpi GitHub Action, because it has saved us a lot of time — so thank you, mpi4py. As for operating systems and MPI implementations: on Linux we test against OpenMPI, MPICH and Intel MPI; on macOS, OpenMPI and MPICH; and on Windows, of course, the MS-MPI implementation.

Speaking of MPICH, there is a problem that has occurred recently: starting from MPICH version 4, it fails on Ubuntu in our CI for Python versions below 3.10. If anyone has ideas how to fix it, please contact us — we will appreciate any help.

Okay, so we run the unit tests on different systems and so on. Let's look at a sample unit test. In this test we check the logic of the wrapper around MPI's broadcast function, and the main thing to remember from this slide is that we test this function both in its plain-Python form and JIT-compiled by Numba.
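A rough sketch of such a parametrized test — assuming a bcast(data, root) wrapper and that, as with send, the numba-mpi functions can be called from @njit-compiled code — might look like this, to be run under mpiexec:

```python
# Sketch of a parametrized unit test: the broadcast wrapper is exercised
# both when called from plain Python and from @njit-compiled code
# (the bcast(data, root) signature is assumed from the documentation).
import numba
import numpy as np
import pytest
import numba_mpi as mpi

@numba.njit()
def bcast_njit(data, root):
    return mpi.bcast(data, root)   # call from inside LLVM-compiled code

@pytest.mark.parametrize("bcast", (mpi.bcast, bcast_njit))
def test_bcast(bcast):
    data = np.empty(3, dtype=np.float64)
    if mpi.rank() == 0:
        data[:] = (1.0, 2.0, 3.0)  # only the root rank fills the buffer
    bcast(data, 0)                 # broadcast from rank 0 to all ranks
    np.testing.assert_array_equal(data, (1.0, 2.0, 3.0))
```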
We have also set up an integration test, which lives in a separate project; this is just a scheme of the test. We start by providing the initial conditions for a PDE solver, and these initial conditions are written to an HDF5 file. After that we do three runs: the first with only one process, the second with two processes and the third with three. In the first one we do not divide the domain, while in the other ones we divide the domain accordingly. In the assert stage we simply compare the results — which are also written to HDF5 files — and require them to be the same across the different runs.

An interesting fact: everything works on Windows except installing the HDF5 package with concurrent file access enabled — h5py with MPI-IO — which we had trouble setting up on Windows; everything else works fine.

There is also an independent use case: the py-pde project, which uses our package and is not developed by us, so there is an external user. py-pde is a Python package for solving partial differential equations; it focuses on finite differences, and the PDEs are provided as strings. The solution strategy is as follows: the grid is partitioned across the nodes using numba-mpi, the partial-derivative expressions are compiled using SymPy and Numba, and then the PDE is iterated in time, exchanging boundary information between the nodes using numba-mpi.

Take-home messages. There is a common mismatch between the Python language and the Python ecosystem: the language itself may be slow, but we should also consider the ecosystem around it — the libraries that are available and the different implementations. Python has a range of HPC solutions to glue together, such as just-in-time compilation, GPU programming, multi-threading and MPI, and numba-mpi is the package that glues MPI with LLVM-compiled, Numba-jitted Python code. It is tested on CI on GitHub Actions, we are aiming for 100% unit-test coverage, and there are already two projects that depend on the package. Here you can find the links for numba-mpi: the GitHub repository and the packages on PyPI and Anaconda.

We also welcome contributions — for example to the two issues I mentioned earlier. We would also welcome a logo for numba-mpi, as well as support for further MPI functions. We are aiming to drop the dependency on mpi4py in our project, and we also plan to benchmark the performance of the package. Finally, we want to acknowledge the funding: the project was funded by the National Science Centre, Poland. Thank you for your attention, and we probably now have time for questions.
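Returning briefly to the integration test described above, its idea can be sketched roughly as follows — the script name, file names and dataset layout here are hypothetical placeholders, not the ones from the actual project:

```python
# Rough sketch of the integration-test idea: run the same solver under
# 1, 2 and 3 MPI processes and require identical results; "solver.py",
# the file names and the "state" dataset are hypothetical placeholders.
import subprocess
import h5py
import numpy as np

def run(n_processes):
    out = f"result_{n_processes}.h5"
    subprocess.run(
        ["mpiexec", "-n", str(n_processes), "python", "solver.py", out],
        check=True,
    )
    with h5py.File(out, "r") as f:
        return f["state"][:]

reference = run(1)           # single-process run, no domain decomposition
for n in (2, 3):             # decomposed runs must reproduce the reference
    np.testing.assert_array_equal(run(n), reference)
```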
Thank you very much. Any questions? A question from an MPI expert?

Hello, thank you for the talk. The interface you are proposing is very close to, let's say, the C MPI interface — when you do a send, you work with a buffer. Do you also try to provide a somewhat higher-level interface, for example one that serializes Python objects? That could be very useful.

Yeah, the interface is as thin as possible, very close to the C MPI one. One of the reasons is that within Numba-jitted code, things like serialization might not be that easy to do. There is no problem in combining mpi4py and numba-mpi in one code base: outside of the jitted code you can use mpi4py, which has high-level features such as serialization, et cetera, and within the LLVM-compiled blocks you can use numba-mpi for simple send, recv, allreduce — without the higher-level functionality. Having said that, we do, for example, transparently handle non-contiguous slices of arrays, so there are some things that are higher-level than the C interface; but in general we try to provide a wrapper around the C routines.

Okay, thank you.
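A sketch of the combination mentioned in this answer — mpi4py for high-level, object-serializing communication outside the jitted code, numba-mpi inside it; the allreduce(send_buf, recv_buf) signature and its default sum operator are assumptions based on the numba-mpi documentation — could look like this:

```python
# Sketch: numba-mpi inside the @njit-compiled kernel, mpi4py for
# pickle-based, high-level communication outside of it.
import numba
import numpy as np
import numba_mpi
from mpi4py import MPI

@numba.njit()
def jitted_kernel(chunk):
    send_buf = np.empty(1, dtype=np.float64)
    recv_buf = np.empty(1, dtype=np.float64)
    send_buf[0] = chunk.sum()
    numba_mpi.allreduce(send_buf, recv_buf)   # buffer-based, low-level call
    return recv_buf[0]

comm = MPI.COMM_WORLD
local = np.arange(comm.Get_rank() + 3, dtype=np.float64)
total = jitted_kernel(local)

# high-level, serializing communication outside the LLVM-compiled block
gathered = comm.gather({"rank": comm.Get_rank(), "total": total}, root=0)
if comm.Get_rank() == 0:
    print(gathered)
```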
Any other questions?

Thanks for a great talk — what you are working on seems really interesting. I have a couple of questions, probably born out of ignorance, but I wonder if you could help me with them. Firstly, I was wondering why you went with making a separate package rather than trying to build this functionality on top of mpi4py — would it have been possible to add the feature of making things JIT-compilable to mpi4py itself? And secondly, regarding the MPI-IO issue you were looking at on Windows: if that requires concurrent file access from separate processes, is it just a complete no-go on Windows? Because I understand that is something the Windows kernel doesn't support. Thank you.

Thanks — let me start with the second one. Essentially, it is a fun fact that everything else worked on Windows; we do not really target Windows, but it was nice to observe that it all works. It is one of the advantages of Python that you write code and you don't really need to take too much care about the target platforms, because the underlying packages are meant to work on all of them. Here, everything works with Microsoft MPI; the only real problem for us was installing h5py on Windows with MPI support at all. So we don't really know what the true bottleneck is, but even the h5py documentation advises against trying.

As for the first question — why did we develop a separate package instead of adding this on top of mpi4py? On the slide with the story of the package there was a link, in the bottom footnote, to an mpi4py issue where we asked whether it would be possible to add it there. The goal of mpi4py is to provide a very high-level API for MPI in Python, and in the discussion with its developers we realized that this is probably not within the scope of such a high-level interface, so we started off with a small separate project. It is a great idea, though — it could be glued together. As of now we aim to drop the dependency on mpi4py, which we use only for some utility routines, not for the communication and not for anything used from within Numba-compiled code. And that might actually be an advantage, because eventually you should be able to install numba-mpi with very few other dependencies: numba-mpi is written purely in Python, so to install it you do not need to build any C extension code, and you can do it quite easily.

Okay, thank you very much.