[00:00.000 --> 00:12.800] All right, we'll get started with the next talk. Julien Adam from ParaTools is going to talk about testing and validation.

[00:12.800 --> 00:18.800] Hello, everyone. Thanks again for the introduction. So, yes, I'm Julien Adam, and I'm going to talk to you about testing. As you may notice, I'm quite happy to have a mic today, so please let me know if I become inaudible.

[00:18.800 --> 00:59.440] First of all, a bit of background. Some years ago we had (in fact, we still have) a team developing an MPI runtime, and while developing this runtime, one of our major stakes was to develop a validation system, both to assess our software quality and to be able to compare our implementation to others.

[00:59.440 --> 01:15.720] So, like everybody in the HPC field at that time, we started to build our own ultra-specific shell scripts to validate our implementation, because we considered our implementation too specific to be able to use any mainstream tool. So, we started with shell scripts. Great idea.

[01:15.720 --> 02:05.040] The fact is that with the team growing, with people working in separate places on multiple heterogeneous machines, we had a huge issue getting people to keep using this validation system, because it was slow, not really efficient, and hard to make grow. Maintenance especially: whenever we wanted to add anything to the validation process, it was just a nightmare, really costly in terms of time. And as our software evolved and grew, merely adding a single test to this non-regression base was a nightmare as well.

[02:05.040 --> 02:31.120] So we started to consider: why not create a validation system able to take care of the HPC environment and everything it implies, like validating on multiple architectures, on multiple machines, by multiple users, with multiple benchmarks? At some point, we simply thought about having a generic tool able to handle all of that.

[02:31.120 --> 03:02.120] So, what do we want in that scenario? We are here in the case of validating an MPI implementation, which means having a standard API. There are a lot of well-known MPI benchmarks already in existence; we don't have to rewrite a whole non-regression base, we have benchmarks, we have proxy applications, and so on. So the question becomes: how do we scale these benchmarks to any runtime, to any project using them, to build a proper validation process?

[03:02.120 --> 03:16.600] So, what do people, I mean the users of this validation system, want? A simple tool. You don't want a really complex Qt, GTK, any kind of complex architecture to deploy just to run tests.
[03:16.600 --> 04:07.480] What we want is basically just a command-line interface, with really, really few setup steps and really, really little configuration to deploy before the tests work. Because (maybe it's not a generality here) a lot of people hate tests, basically, so we have to convince them that testing is good for software quality with something really simple, but also something able to handle any really complex scenario we may find in HPC. Running an MPI application is not that hard, but it can become really, really complex, and we don't want to have to rewrite our whole test suite every two years because a new technology, a new approach, a new paradigm gets implemented and the validation process would have to be rewritten.

[04:07.480 --> 04:20.120] So, from that state, we decided to write a tool to fit our needs, but also to be usable by people meeting the same requirements.

[04:20.120 --> 04:42.240] Before going further, I would like to point out that testing is not a brand-new field, and some other projects have been tackling this kind of issue up to now. So please take a look at them if you think that PCVS does not completely fit your needs; they are really, really powerful tools in the field.

[04:42.240 --> 05:48.320] So today, I'm here to talk to you about PCVS. It's a tool maintained by the CEA, and it's written in Python (yes, everything is in Python now), with a simple command-line interface and a few configuration files, obviously in YAML, it's the trend. The design of this testing framework is to bring simplicity to users when writing test logic: we want tests to be simple to write and easy to port. And these benchmarks, written by the user, should be able to run in multiple environments; we don't want to bind a test suite to a given application. Consider an MPI runtime: we don't want our Lulesh or our IMB benchmark bound to a specific MPI implementation or to a specific kind of architecture. This is what we call, in PCVS, the retargeting approach.

[05:48.320 --> 06:42.840] The other approach we want to focus on is dealing with heterogeneous test environments: given some benchmarks, how do we automatically scale them to the actual test environment? Consider, for example, users wanting to validate the IMB while working on their workstation: we don't want to launch up to, I don't know, 100 MPI processes on their workstation; they are not going to be happy with you squashing their machine. But once this validation system is run on a real HPC cluster, you want the test suite to automatically scale to the supercomputer's resources without having to rewrite hundreds of configuration files. Even one file is already too much, okay?
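To give a rough idea of what this retargeting and auto-scaling can look like in practice, here is a minimal sketch: the same test suite paired with two different environment profiles, so that only the profile changes between a workstation and a cluster. The key names (criterion, n_mpi, values) echo the examples discussed later in the talk, but they are approximations rather than the exact PCVS schema.

    # Workstation profile: keep runs tiny so we don't squash the machine.
    criterion:
      n_mpi:
        values: [1, 2]

    ---
    # Cluster profile: the same benchmarks automatically scale up;
    # nothing in the test suite itself has to change.
    criterion:
      n_mpi:
        values: [1, 2, 4, 8, 16, 32, 64, 128]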
[06:42.840 --> 07:06.520] So, this work is maintained by the CEA, which is a French research administration, and we are collaborating with them to make this tool more generic and to attempt to reach as many users as possible.

[07:06.520 --> 08:12.560] At a glance, here are the features and specificities it provides. The idea is to split the testing effort into specific fields. The first one is the specification of tests: what a benchmark needs to expose to build the tests. This kind of information is carried by the benchmark, obviously: how to build it, what the program is, what the parameters are, and so on. The testing environment, on the other hand, is carried by the people deploying or providing the testing resources; most of the time that's just a team policy for scheduling jobs, or the system admins. All of this pursues two goals: again, the automatic retargeting of tests when the user invokes the benchmark, depending on the compiler and runtime you want to target, and the auto-scaling of tests to the test environment.

[08:12.560 --> 09:07.200] Obviously, as PCVS is a kind of test framework, we had to add some functionality around testing, like in-place reporting, because most users run their tests on HPC clusters where the set of available services can be restricted, so they don't have access to all their GitLab, GitHub, Jira and so on. We added some basic tools to answer those needs. And beyond a single validation run, we added a way to build the history of the validation of a given application by storing results over time, allowing the user to run simple analyses, still in Python, to produce statistics over time.

[09:07.200 --> 09:38.960] So, quickly, the architecture of PCVS: it is file-based, so besides the CLI it relies on the file system; it parses user inputs and runs them through a dedicated connector, mainly SLURM currently, but we are working on supporting other batch managers.

[09:38.960 --> 10:25.520] The idea is that the benchmark expresses job descriptions and resource requirements, and the environment provides resources. Consider, for example, a basic component: the number of MPI tasks a job takes. The job will say "okay, my job only runs with up to two processes", or "it only takes, I don't know, cubic numbers of MPI processes" (users of the Lulesh proxy application will know what I mean by cubic). So the job describes constraints. And then we have environment configuration files, called profiles in PCVS, where the admin will say "okay, in this context, I have up to 100 nodes, so you will be able to spawn up to 100 MPI processes to run your application."
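As a sketch of that crossing between the two sides, this is roughly what the pair could look like. The iterate override on the job side and the exact key names are my approximation of the schema, not a quote from the PCVS documentation.

    # Environment side (profile): what the site offers.
    criterion:
      n_mpi:
        values: [1, 2, 4, 8, 27, 64, 100]

    ---
    # Benchmark side: a LULESH-style job that only accepts cubic rank counts.
    # Only the intersection of the two lists (1, 8, 27, 64) would actually run.
    lulesh:
      run:
        program: "lulesh2.0"
        iterate:
          n_mpi:
            values: [1, 8, 27, 64]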
[10:25.520 --> 10:45.400] Based on that, PCVS will cross this information and say: okay, I have MPI jobs, and I can run up to 1000 MPI processes based on this specification, so why not run the user benchmark a hundred times, once for each combination?

[10:45.400 --> 11:07.760] PCVS has an opt-out approach: it considers that every combination provided by the environment will be used to scale your tests. So if your test is not able to run up to 1000 processes, it's up to you to specify that it can't reach this limit.

[11:07.760 --> 11:54.080] Here is a quick example. I don't know if you can see it up there, but we have a really, really basic environment configuration where you can see what we call an operator; this is a variadic component, and here it can take up to 4 values for the n_mpi attribute. When describing a simple job, we just say "okay, my job just consists in running a program", and PCVS will unroll up to 4 command lines, 4 independent tests to execute. So PCVS automatically builds your test scenarios based on your specification.

[11:54.080 --> 12:36.200] So, how to write a test, and more specifically a compilation test. There are a lot of things to customize in how you build your tests, so it looks complicated, but as you may see on the slide, not all the keys in that YAML are mandatory, apart from the files one at least, since you obviously have to specify what kind of application you want to compile. The framework will try to auto-detect your language to select the proper compiler, and obviously there is a manual approach as well.

[12:36.200 --> 13:02.880] If you are not compiling plain source files but already using a well-known build system, we also have an interface to invoke the build system directly to build your framework. I'm thinking, for example, about Lulesh, which uses a Makefile, the IMB using a Makefile, or even the MPICH test suite using a configure/make/make install style approach. So you have many options, and all of them are optional.

[13:02.880 --> 13:52.200] What I would like to highlight is the variant. A variant is a capability in PCVS for a job, for a benchmark, to expose a specificity or requirement under which this job has to be run. In this case, the openmp keyword probably means something to everyone here, but it's just a token saying that to run this job, the component "openmp" is required to be scheduled. And in the environment, the user will say: okay, what is my variant openmp? For a GCC-like compiler, it will be -fopenmp; if it's Intel, it's not the same option. You see the idea? PCVS will add the flavor depending on what you have specified in your environment.
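Here is what the two halves of such a variant could look like side by side. This is a minimal sketch where the key names (build, files, variants, args) are illustrative rather than guaranteed to match the actual PCVS schema.

    # Benchmark side: a compilation test requiring the "openmp" variant.
    omp_hello:
      build:
        files: ["omp_hello.c"]   # language auto-detected from the extension
        variants: ["openmp"]

    ---
    # Environment side: what "openmp" translates to for a GCC-like compiler.
    # An Intel-based profile would map the same variant to a different flag.
    compiler:
      variants:
        openmp:
          args: "-fopenmp"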
[13:52.200 --> 14:23.360] Now, how to write a run test. This is where the iterative component takes place. We haven't ported it to the compilation model yet, because we have issues with race conditions when tests reach the same file on the file system, but we are planning to containerize, to encapsulate, to isolate this model, to be able to support compilation tests as well.

[14:23.360 --> 14:58.240] So, what do you, as a user, have to do to integrate such a test into your workflow? Just write a pcvs.yml file and put it anywhere you want in your benchmark path. It consists of just a run node, and everything below it is totally optional. Here you can see that we restrict our a.out program to specific MPI values, because as the tester, I know that my test has this constraint.

[14:58.240 --> 15:36.760] You can even create such variadic components for your own application, if you want to programmatically generate a list of scenarios that PCVS should integrate into its own process to build multiple scenarios. In that case, with these three simple lines in the program node, it will build three times the number of scenarios you initially expected, right?

[15:36.760 --> 16:12.840] And what is a test without a way to express how to validate it, obviously? So for any test, not only run tests, you have a YAML description to say: okay, I want my job to exit with this particular return code, to have an execution time within this range, to match or not match this kind of pattern. Or even: here is my script, feed it the raw output of my test, and it will tell you whether it's okay or not.

[16:12.840 --> 16:59.600] Okay, so I have just written my tests; how do I run them now? It's straightforward: you just have to call pcvs run. But before running my benchmark, I have to create a profile to express the resources my environment has. Obviously, in the case of MPI, we provide some basic templates to bootstrap the testing process. Here, we are just creating a profile named my_mpi, based on the plain MPI template. By quickly running that, I will have a full profile based on MPI, running tests for one, two, three, and four MPI processes, and from there you can expand the profile to fit your needs.

[16:59.600 --> 17:17.920] The whole build of PCVS relies on a single directory, and in that directory you will find everything required to analyze the results, and even to rerun the tests under the same conditions, for reproducibility.

[17:17.920 --> 17:36.800] On the right, you can see a repository we provide alongside PCVS, called pcvs-benchmarks, in which we attempt to gather many well-known MPI benchmarks, PCVS-enabled, right?
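As a sketch of the run descriptor just described (again with approximate key names, since I am reconstructing the slide rather than quoting the schema), a minimal pcvs.yml could look like this:

    # Minimal run descriptor: a single job node; everything below "run"
    # beyond the program itself is optional.
    my_run_test:
      run:
        program: "a.out"
        iterate:
          n_mpi:
            values: [2, 4]   # the tester knows the program only accepts these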
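And for the validation side, here is a hedged sketch of what such a YAML description might express: return code, time window, output pattern, or a custom checker script. The key names are illustrative, and check_output.sh is a hypothetical script.

    my_run_test:
      run:
        program: "a.out"
      validate:
        expect_exit: 0               # required return code
        time:
          mean: 10.0                 # expected runtime in seconds...
          tolerance: 2.0             # ...within this range
        match:
          success: "All ranks OK"    # pattern the output must match
        script:
          path: "./check_output.sh"  # fed the raw output, decides pass/fail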
[17:36.800 --> 18:36.000] So here is a fancy view of PCVS. Obviously, there are many options when running a validation. You can have an interactive approach or a non-interactive one, SLURM-enabled, I mean batch-manager-enabled, running inside an allocation or outside of one. And once the whole configuration has been generated, we have commands, especially pcvs exec, to interact independently, individually, with your benchmarks. For instance, what people are used to doing is running their validation, and after maybe 10 seconds some failures appear; they would then like, without interrupting the non-regression run, to rerun those tests in an isolated environment to see why they failed. So we have some extra commands to rerun such a specific pattern, okay?

[18:36.000 --> 19:35.840] Obviously, everything is not always perfect, and sometimes the static approach of the YAML file is not what you need; you would like something more dynamic, because you have some stuff to interpret, to read, to process before knowing what you want to test, even within a benchmark. So we have a dynamic approach: instead of providing a static YAML file, you provide an executable script, an executable file, whatever it is, and it will produce the actual YAML file by itself. This way, you can programmatically generate your benchmark suite without having to know it in advance, without knowing in which environment your non-regression base will be run. Consider the NAS benchmarks, where the number of MPI processes is encoded in the very name of the binary. Cool.

[19:35.840 --> 20:17.680] So, we have run our tests, but we would like to see what the results look like, okay? We have a test framework here, not just a job runner. So obviously, we added some tools to report results to the user. Spoiler: they cannot be compared with the real tools for that, okay? The idea is just to offer users a way to grab their results directly on their machine, in place. Essentially, it's a way to look at tests at a glance, and to be able to rerun them if necessary.

[20:17.680 --> 20:49.360] So, as I said, we can isolate and rerun jobs independently, which is pretty convenient when a failure needs to be explored right away. And finally, we are using a web server to report directly into a web browser, offering more interactivity for your results.

[20:49.360 --> 21:03.320] So here is what it looks like. For example, here, grouped by label, you can see that there is some red, so some failures. Let's dive into it. You can see some trouble with MPI-IO, what a surprise.
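To make the dynamic approach concrete: for the NAS case mentioned above, the executable shipped with the benchmark could emit a descriptor like the one below, one entry per binary it discovers (an NPB binary such as bt.A.4 runs on 4 ranks, bt.A.16 on 16, and so on). The names and keys are hypothetical, in the same approximate schema as the earlier sketches.

    # Output of a generator script, not a hand-written file: one job per
    # discovered NAS binary, with the rank count parsed from its name.
    bt_A_4:
      run:
        program: "bt.A.4"
        iterate:
          n_mpi:
            values: [4]
    bt_A_16:
      run:
        program: "bt.A.16"
        iterate:
          n_mpi:
            values: [16]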
[21:03.320 --> 21:21.040] And when clicking on a job, you will see the complete log of that run: the command line and the actual error (sorry, I truncated the actual error here), so you can dive directly into the error without leaving your SSH terminal.

[21:21.040 --> 22:16.760] Now, a quick overview of how to configure your site, so the test environment configuration part. This is also YAML: you define the compilers, the runtimes, and the special variadic component we saw earlier. It is split into five different modules. Why? Because this whole profile can be split into up to five independent blocks, which can be distributed over a cluster, because it's not always the same team that is responsible for a particular block. Consider, for example, the variadic component: it's up to the team to build this list, while the compiler and the runtime may be the responsibility of the sysadmins of the test environment machine.
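Here is a sketch of such a profile, split by responsibility. The talk explicitly names the compilers, the runtimes and the variadic criterion; the remaining block names below (machine description and job grouping) are my assumption about what the other modules cover, so treat the whole layout as illustrative.

    # One profile, assembled from independently maintained blocks.
    compiler:            # sysadmin-owned: toolchain and variant flags
      cc: "mpicc"
    runtime:             # sysadmin-owned: how jobs are launched
      program: "mpirun"
    criterion:           # team-owned: the variadic components to iterate on
      n_mpi:
        values: [1, 2, 4, 8]
    machine:             # assumed block: resources of the site
      nodes: 100
    group:               # assumed block: shared defaults for groups of jobs
      default: {}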
[22:16.760 --> 22:49.400] And beyond a single run, what I would like to see is a trend over multiple runs: how does my whole test suite behave? So in PCVS we integrated a way to stack multiple runs over time and then run analyses on them, to build trends, to have more than just a test result: a test result over time.

[22:49.400 --> 23:29.040] Here is an example of what you can do afterwards, running analyses directly on this stored history. This is enabled thanks to what I would like to call a DSL, but which is actually just a Python API to interact with it, and you can build beautiful graphics like these to see the success rate inside your test benchmark over time.

[23:29.040 --> 23:53.800] So finally, just a quick glance at Spack plus PCVS. We are not in Spack, but we do support Spack, especially to do things like this: by specifying a simple Spack spec for a package, we are able to check any combination of building this package, to see if there are some curiosities in your package recipe.

[23:53.800 --> 24:17.160] For future work, we have many things scheduled, and one of the most interesting is metrics capture: the capability for PCVS to directly capture some metadata, to be able to then run analyses on it. And many other things, but I think I'm running out of time.

[24:17.160 --> 24:32.600] Thanks for your attention.

[24:32.600 --> 24:46.680] I have two questions. From your configuration file, I assume you already have control of the cluster, or at least you have allocated some nodes or something. Do you have some step that then allocates and deallocates these resources on the fly for each one of the tests?

[24:46.680 --> 25:13.840] So actually, currently, most of our test scenarios are run through an mpirun with SLURM enabled, or through srun commands directly, so they are taking care of the resource allocations. Some other users just run the whole of PCVS inside a given resource allocation, just a salloc, for example, and then any test doing an srun does not pay the cost of waiting for the actual allocation.

[25:13.840 --> 25:25.080] If some tests need one type of CPU and other tests need another type of CPU, and one of them is unavailable because another user is using it, then you have to wait instead of failing.

[25:25.080 --> 25:38.520] Yes, this is something we still don't have a solution for: being able to put a job aside while we hold the allocation. This is something we are currently investigating, absolutely.

[25:38.520 --> 25:45.520] Do we have any other questions?

[25:45.520 --> 26:25.480] Yeah, so one thing I wanted to ask: for your future work, you had mentioned building out a graphical front end using Textualize. I was wondering how much assessment you have done on that, because I've done some work trying to build GUIs with Textualize, and while I do think it's a very interesting framework and it's great for making textual GUIs, I think it still has a bit of a way to go before it can really make a standalone or comprehensive CLI or textual interface. So I was just wondering what your thoughts were on that.

[26:25.480 --> 27:07.040] I'm not sure I understand the whole question, but you mean: why did we choose Textualize? We discovered it just recently, because we were using Rich to highlight the output of PCVS in the console and we were looking for a solution to present things graphically in a terminal. We are still looking for the ideal framework, and as Rich is from Textualize (I'm sorry, Textual is based on Rich), we are considering Textualize. But if you have any other option to propose, I would be happy to discuss that with you.

[27:07.040 --> 27:28.840] Thank you.