[00:00.000 --> 00:11.440] Hello. How's it going, everybody? A lot of people in this room. We didn't really expect [00:11.440 --> 00:15.640] so many. This is wonderful. Thank you for coming to see us. I just want to say that [00:15.640 --> 00:19.640] we want to talk today about mutation testing. That's what we're here for. If you like this [00:19.640 --> 00:27.480] penguin, does anyone not like this penguin? Just you. Okay. Personal vendetta noted. [00:27.480 --> 00:33.200] This is a penguin generated by DALL·E. Hopefully, it's friendly enough because this is going [00:33.200 --> 00:36.640] to be part of our talk. We're going to see a lot of penguins in this talk. If anyone [00:36.640 --> 00:40.640] has a personal objection to penguins, please speak now. Otherwise, if you like penguins, [00:40.640 --> 00:44.280] can I get a hand up just to see if we're cool with that? Awesome. I've never seen so many [00:44.280 --> 00:47.880] people want to put their hands up but not really be sure. I absolutely love the energy [00:47.880 --> 00:55.240] in this room. My name's Max. I'm this guy. As you can tell, I'm also this guy. I'm here [00:55.240 --> 00:59.440] to talk to you about mutation testing. I work for a company called Vonage and I'm a Python [00:59.440 --> 01:04.600] developer advocate there. Now, what that means is that I maintain our Python tooling. I'm [01:04.600 --> 01:07.760] here to talk about mutation testing because I've just kind of gone through this process [01:07.760 --> 01:11.760] myself of understanding all this stuff and applying it to my own work. I want to show [01:11.760 --> 01:16.160] you kind of how that went. But with me, not only do I have the tallest person in the room. [01:16.160 --> 01:22.360] Stand up straight. Stand up straight. This person is 196 centimeters tall. I'm like 177. [01:22.360 --> 01:28.040] I'm not sure. I promise, I'm average in Britain. In this place, right? 
This person knows a [01:28.040 --> 01:31.000] lot more about mutation testing than me. I'm really not the expert here but I just want [01:31.000 --> 01:36.520] to say this is Paco. Yes. I'm Paco. I work for OpenValue, a small consultancy company [01:36.520 --> 01:41.880] in the Netherlands. I got into mutation testing via my thesis. When I wrote my thesis on test [01:41.880 --> 01:47.360] effectiveness, I wanted to learn more about mutation testing. Also, after that, I got into speaking [01:47.360 --> 01:51.440] at conferences and spreading the word about this, which is quite awesome, too. I hope that [01:51.440 --> 01:57.680] at the end of the talk, you have another cool tool in your toolbox to write better code. [01:57.680 --> 02:02.440] Awesome. If we're cool with that, we do have to do the obligatory sponsor bit. These companies paid [02:02.440 --> 02:06.680] for us to come here and paid for our flights and stuff. What my company does, I'll just [02:06.680 --> 02:11.640] quickly tell you. We do communications APIs as a service, basically. Things like SMS, [02:11.640 --> 02:16.000] like voice calls, like video chats, like two-factor authentication, all via API. That's kind [02:16.000 --> 02:20.200] of what we do. That's really just what I want to say. It is relevant because I will show [02:20.200 --> 02:24.120] you what I actually applied this to, which was one of our SDKs. [02:24.120 --> 02:27.960] For me, we don't actually have a product to sell. Also, I definitely didn't fly here from [02:27.960 --> 02:34.200] the Netherlands, just to make that clear. It's just a two-hour car drive. No, we're [02:34.200 --> 02:38.000] just a consultancy company, and we really like to share knowledge. That's mostly the [02:38.000 --> 02:41.520] reason why I'm here, to tell you more and teach you more. It's quite simple. [02:41.520 --> 02:47.120] Yeah. He doesn't have the funding crutch that I do, unfortunately. Luckily, we're all good. [02:47.120 --> 02:51.240] There's two of us on this talk. 
There's two of us here, and actually, there is a third [02:51.240 --> 02:55.000] person in this talk. We've seen a hint about this person already, but this person's really [02:55.000 --> 02:58.680] the thing that's going to tie this whole talk together, and it's going to get us all feeling [02:58.680 --> 03:03.280] good about mutation testing. This person's very important, so say hello to Henry. This [03:03.280 --> 03:05.520] is Henry. Look at his little face. [03:05.520 --> 03:11.920] Thank you. Hands up if you think Henry's a cute AF penguin. [03:11.920 --> 03:17.280] That's all of you, thank you very much. Yes, I'm glad we agree. I'm glad we're on the same page. [03:17.280 --> 03:21.200] Now, just some quick audience participation, because if you can't tell, we're quite big [03:21.200 --> 03:25.720] on audience participation. So, quick question here. Who has heard of this stock photo, but [03:25.720 --> 03:29.600] more importantly, who's heard of testing? This is just a check to see if we found the [03:29.600 --> 03:35.040] right room. Thank you very much. Great stuff. Okay. Who's heard about code coverage? A lot of [03:35.040 --> 03:37.680] people, maybe not everybody, and that's okay if you haven't. We're going to talk about [03:37.680 --> 03:41.680] code coverage, so please don't worry if you haven't. But yeah, it's awesome to know that [03:41.680 --> 03:46.200] some people have. That's a good starting point too. Okay, final one. I'm going to say, other [03:46.200 --> 03:51.880] than by knowing about this talk, who's heard of mutation testing? Oh, quite a few. Yeah. [03:51.880 --> 03:59.520] And now, quick follow-up. Who was actually already using mutation testing? Ah, nice. There are [03:59.520 --> 04:03.160] enough quick wins here, and hopefully you have some good experiences. [04:03.160 --> 04:07.680] Yeah. 
So, really nice to see that people are familiar with the concept, but if you're not, [04:07.680 --> 04:10.200] it's also okay, because we're going to go through this like you don't know anything [04:10.200 --> 04:13.440] at all, because when I started doing this, you know, a few months ago, I didn't know [04:13.440 --> 04:16.160] anything at all, and so I want to take you through that journey as well, and that's what [04:16.160 --> 04:21.280] we're going to do. But before that, what I want to do first is give us some background, [04:21.280 --> 04:23.920] and what I actually really want to do is pass to Paco, who knows a lot more about this than [04:23.920 --> 04:28.560] me, so I'm going to pass to you right now. Yes. This is going to be some improvising. [04:28.560 --> 04:34.080] Good work. Good luck. I'm going to drink water with this. I'll feed you. Yeah. Nice. Great. [04:34.080 --> 04:37.720] So, yeah, we're first going to talk a bit about testing in general, and then we're going [04:37.720 --> 04:43.040] to more specifically talk about unit testing. So, just a quick check. Does anybody know [04:43.040 --> 04:47.920] what a unit test is? That's great. I don't have to explain that part. For those who don't [04:47.920 --> 04:52.040] know, it's the smallest possible test you can write in your code base, just in one method, [04:52.040 --> 04:56.280] and you write one test for it to test the outcome of that method. Now, there are many [04:56.280 --> 05:01.920] different reasons why we're writing unit tests, and I think one of them, my favorite or the [05:01.920 --> 05:06.800] most used one is for maintenance. We write tests because we want to be confident in the [05:06.800 --> 05:10.600] changes we make to our code base. So, whenever we make a small change, we add a new field [05:10.600 --> 05:15.120] to some endpoint that we know that we didn't completely break the database integration [05:15.120 --> 05:21.680] because it can happen at times. 
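To make the definition above concrete, here is a minimal sketch of a unit test in that sense: one small function, one test asserting on its outcome. The function and test names are made up for illustration; they are not from the talk's slides.

```python
def add(a, b):
    """The smallest possible unit under test: a single method."""
    return a + b

def test_add_returns_sum():
    # A unit test checks the outcome of just this one method.
    assert add(2, 3) == 5
```

A test runner such as pytest would discover and run `test_add_returns_sum` automatically.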
So, yeah, that's very important: maintenance, regression [05:21.680 --> 05:28.200] testing. But there are more reasons. One I also like a lot is that tests can actually serve [05:28.200 --> 05:35.440] as documentation. You can use tests to describe certain scenarios [05:35.440 --> 05:39.640] in your code base, so that when you have a specific test for that, it already makes clear this [05:39.640 --> 05:46.240] is intended behavior. I have an example for this. I worked for a company where [05:46.240 --> 05:52.920] we had an endpoint that returned warehouses, and these warehouses, just a domain object, [05:52.920 --> 05:56.320] had a soft delete. So, there was a flag in there that indicated whether it was deleted [05:56.320 --> 06:02.960] or not. So, this endpoint returned both deleted and non-deleted warehouses, and [06:02.960 --> 06:08.080] at some point over time, as we were working on it, a new guy came in and looked at it [06:08.080 --> 06:12.320] and said, hmm, that's strange. Why are we returning deleted warehouses? Why would you [06:12.320 --> 06:17.360] want that? It was a fair question because we had also forgotten, and there was only one test, [06:17.360 --> 06:21.440] which tested the success flow, and you can already kind of guess here a bit. So, the [06:21.440 --> 06:25.960] success flow in this case meant the test only covered non-deleted warehouses. [06:25.960 --> 06:30.480] So he made the changes, and we all thought, oh, this makes sense. It looks broken. Of [06:30.480 --> 06:34.120] course, he didn't check with product management or the product team, deployed it, and then you [06:34.120 --> 06:39.240] can guess: of course, this was broken, so we had to revert it. 
And the whole lesson here [06:39.240 --> 06:44.040] was that just one test which also included a negative scenario, with warehouses that were [06:44.040 --> 06:48.120] deleted, could have already been a trigger to think, hey, this behavior is intended. [06:48.120 --> 06:53.400] And that's how tests can serve a sort of documentation purpose. [06:53.400 --> 06:57.080] They're also very useful in getting to learn a new code base. So, whenever you're in a new code [06:57.080 --> 07:01.720] base and you have this very complicated method, a test can help you step through the method [07:01.720 --> 07:05.760] to sort of explain what's going on, for example, while debugging it. [07:05.760 --> 07:11.640] Now, another one, and this one is here for the consultants. So, who here works as a consultant? [07:11.640 --> 07:18.480] Oh, not that many. Wow. Because we're sort of the root of all evil, always. We tend to [07:18.480 --> 07:24.400] run to the next project, and we don't have to maintain our own code, often, not always. [07:24.400 --> 07:32.200] So I have this nice quote that's mostly also for us. Keep in mind that you're not doing [07:32.200 --> 07:39.160] this only for yourself. A colleague once told me: you always [07:39.160 --> 07:42.280] have this point in your development process where you think, okay, should I write a unit [07:42.280 --> 07:45.960] test for this? It's going to be a painful unit test. I know that it works. Do I really [07:45.960 --> 07:50.720] have to document it? We all know how it works. Yeah, sure, we all know how it works, but [07:50.720 --> 07:56.520] we also leave the project and then go on to another project. We as consultants. [07:56.520 --> 08:01.680] And I ask myself, what would I do if I were the next person? So what would [08:01.680 --> 08:06.480] I do if I were the next John or Jane Doe working on this project? 
So tests are not there just [08:06.480 --> 08:11.040] for you, but also for the next person working on the code. I would actually like to jump in here, because [08:11.040 --> 08:15.920] I've been that person. Thank you. I've been the person who works on a project after someone's [08:15.920 --> 08:20.440] left it. And honestly, if you have good documentation, or if you don't have that, if you have good [08:20.440 --> 08:25.120] testing, thank you, you do your water break. So if you have good testing, it can really [08:25.120 --> 08:29.600] help you understand what a project does. And so when I came to a certain project recently, [08:29.600 --> 08:33.160] I didn't have necessarily the kind of testing that I would have liked to really document [08:33.160 --> 08:37.000] my code that well. And so like, honestly, if I'd had someone like Paco, who actually [08:37.000 --> 08:40.160] was a bit more conscientious with what they tested, that would have really helped me get [08:40.160 --> 08:43.560] on board with the project quickly. But as it was, this was a real problem for me. And [08:43.560 --> 08:46.920] it was something that we want to hopefully avoid other people having to deal with as well. [08:46.920 --> 08:50.720] Like, quick question, actually. Has anybody ever taken over a code base that they may be [08:50.720 --> 08:56.000] looking at and gone, what the heck is this? Okay, so you know the point of this slide, [08:56.000 --> 08:58.800] right? You know why we're saying this. We know this is important. Now, let's stop that [08:58.800 --> 09:03.400] from happening to the next generation of very pained developers, right? Let's stop that happening. [09:03.400 --> 09:09.320] Yes, so write tests. And if all these reasons haven't convinced you, there's often maybe [09:09.320 --> 09:13.960] a team lead or a boss or somebody else who's telling you to write tests. In most cases; [09:13.960 --> 09:21.120] there are always, of course, exceptions. Ah, okay. Wow. This is annoying. 
So at the end [09:21.120 --> 09:26.240] of the day, we're all writing tests; if it's not for ourselves, it's for someone else. [09:26.240 --> 09:29.600] And even though we're now sort of happily all adding tests, we also have to [09:29.600 --> 09:35.200] sort of sketch a problem scenario here. And this problem is that as projects evolve and [09:35.200 --> 09:41.200] grow, our tests also evolve and grow. But the thing is that we reflect a lot on, [09:41.200 --> 09:44.640] and spend a lot of time, keeping our production code clean and well monitored. We [09:44.640 --> 09:49.000] have lots of metrics, where on the other hand for tests, what you can see on long-living [09:49.000 --> 09:52.800] projects is that sometimes you just get tests that are nothing more than a blank setup and [09:52.800 --> 09:59.720] teardown and some mocking going on, because the functionality already moved long ago. [09:59.720 --> 10:03.120] It gets to the point that test code is often not monitored. Test code is sort of [10:03.120 --> 10:08.920] the kid that didn't get all the attention it needed. So there is still one metric [10:08.920 --> 10:15.920] for testing. What do you think is the most used metric for test code? Yes, [10:15.920 --> 10:20.920] yeah, we sort of gave it away already in the intro, but yes: code coverage. [10:20.920 --> 10:25.680] Code coverage tells you how much of the code is executed when you run the test suite. And [10:25.680 --> 10:30.120] I personally really like code coverage because it already helps you write more and better [10:30.120 --> 10:36.600] tests. And I want to go through a simple example here to show you how it can already help you. [10:36.600 --> 10:42.160] So here we have a submit method. So this is the Python guy. I'm the Java guy. Yeah, he [10:42.160 --> 10:48.280] said simple example, but I don't, I don't think so. Yeah. 
So the context is you are at [10:48.280 --> 10:54.680] the conference and you have a service where you can submit proposals. You can only have, [10:54.680 --> 10:59.080] you can't have more than three, three or more over proposals and you can submit after the [10:59.080 --> 11:05.240] deadline. If you do that, there will be a failure and otherwise you will get success. So quite [11:05.240 --> 11:09.360] a simple method with everything as a parameter just to make it easy to explain. So if you [11:09.360 --> 11:13.680] would take method coverage, method coverage is the simplest coverage metric we can get [11:13.680 --> 11:19.480] which checks is this method coverage as or no, we can add one simple test called a test [11:19.480 --> 11:25.480] X which submits a proposal. There are no open proposals, which is good. And we have a deadline [11:25.480 --> 11:32.160] that's 999 seconds in the future. So great. Now we can get a step further. We can get [11:32.160 --> 11:36.760] into statement coverage and with statement coverage, we check, well, if each statement [11:36.760 --> 11:42.400] was executed and now we see, hey, we didn't cover our unhappy flow. So we need to add [11:42.400 --> 11:47.840] another test. In this case, we add another test which has five over proposals, which [11:47.840 --> 11:54.160] means this check evaluates the true and we have a negative scenario. Now we can even [11:54.160 --> 11:59.440] go one step further through, for example, condition coverage. And with condition coverage, [11:59.440 --> 12:05.320] we check if each Boolean sub-expression has been evaluated to both true and false because [12:05.320 --> 12:09.680] what we don't know now is whether our deadline check is actually working. We just know that [12:09.680 --> 12:14.640] it returns false, but we haven't seen it return true yet. So we add one more test now with [12:14.640 --> 12:21.440] a deadline that is 999 seconds in the past. And now we have three tests. 
And this is already [12:21.440 --> 12:26.080] why I like code coverage so much, because it really helps you write proper tests. [12:26.080 --> 12:31.120] Let me get on to the good parts here. As I said, it helps you write [12:31.120 --> 12:36.160] better and more tests. Code coverage is really easy and cheap to measure. In, I think, [12:36.160 --> 12:40.200] most of the languages, it's just a matter of instrumenting the code. You run the test [12:40.200 --> 12:43.920] suite and you get a nice report out of it that everybody can quickly see, and you can [12:43.920 --> 12:51.280] quickly see the pain points of where you're lacking in testing. But to go a bit further: [12:51.280 --> 12:56.640] as I mentioned, it shows you what you didn't test. But the only guarantee, [12:56.640 --> 13:00.560] and I'm going to get to the bad parts next, is that what [13:00.560 --> 13:06.160] you did test didn't crash. It doesn't actually guarantee anything about functionality, because [13:06.160 --> 13:12.080] code coverage can actually be quite misleading. It doesn't guarantee any test quality. So [13:12.080 --> 13:16.400] if I take this method, for example: this is a unit test, a valid unit test, and this test [13:16.400 --> 13:21.160] generates coverage. It calls a method, but there is no assertion on the result, which [13:21.160 --> 13:26.240] makes this test, for example, generate 80% coverage, yet the test actually only guarantees [13:26.240 --> 13:30.520] that the method doesn't crash. It doesn't tell us whether it returned true, false [13:30.520 --> 13:36.760] or anything. And this is the pain point of code coverage, which brings us to something [13:36.760 --> 13:40.240] nice which Max told me about, which is called Goodhart's law. So can you maybe explain [13:40.240 --> 13:47.120] a bit about that? Can I grab your clicker? Can I explain about [13:47.120 --> 13:55.240] Goodhart's law? 
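The misleading case described above might look like this sketch: a test that executes the code, so coverage tools count every line, but never asserts on the result. The names here are hypothetical, not from the talk's slides.

```python
def is_adult(age):
    # Production code under test.
    return age >= 17

def test_is_adult_no_assertion():
    # This call executes every line of is_adult, so coverage tools
    # count it, yet it only proves the call doesn't crash. There is
    # no assertion on the returned value, so a broken implementation
    # that returned the wrong Boolean would still pass.
    is_adult(21)
```

A coverage report would show `is_adult` as fully covered, which is exactly why coverage alone says nothing about test quality.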
No, sorry. I can't. Just kidding. Okay, so when a metric becomes a target, it [13:55.240 --> 13:59.240] ceases to be a good metric. So quick question, has anyone ever written a unit test just to [13:59.240 --> 14:03.920] get coverage up rather than because the test was useful? Come on, let's be honest. This [14:03.920 --> 14:10.600] is the safe space. Okay. Microphone, okay. Hello, everybody. Welcome to the live stream. [14:10.600 --> 14:17.040] This is our radio announcer voice. Right. So this is something, I'll be honest, I've [14:17.040 --> 14:20.800] done this. We now know a lot of people in the room have done this. But that's what we don't [14:20.800 --> 14:23.920] want to have with code coverage. It's supposed to tell us something about our code. But if [14:23.920 --> 14:27.840] instead we turn that into a target, that can really limit, you know, [14:27.840 --> 14:31.920] the kind of useful tests that we actually create. And that leads to a few quite big questions [14:31.920 --> 14:37.720] that we do genuinely care about. So I'll wait for that photo if you. Cool. Sorry, I'm very [14:37.720 --> 14:41.680] audience participation. I'm very sorry. So the next question that we ask there is how [14:41.680 --> 14:45.480] do we know if our tests are high quality? How do we know if these tests are actually [14:45.480 --> 14:50.620] good quality tests? We test them. We test them. Great, great answer. I've got a further [14:50.620 --> 14:56.000] follow-up question for you. How can we understand what our tests are really doing? Same answer, [14:56.000 --> 15:04.680] if anyone, I see a hand. I literally had a code base where I could delete half the tests [15:04.680 --> 15:09.320] and nothing changed. And they all, yeah. So, delete the other half then. Hello. [15:09.320 --> 15:29.200] Yes. So just for the live stream, I'll just repeat that because that's a really good [15:29.200 --> 15:32.240] point. 
I won't repeat the swearing, but I do understand and appreciate the, you know, [15:32.240 --> 15:37.080] the emotion behind it. If you, if you end up, you know, shipping some code that does, [15:37.080 --> 15:40.520] does not do what it's supposed to do, you end up with users getting very angry at you. [15:40.520 --> 15:44.240] And yeah, that's a problem, right? That's going to be an issue. And that is a way of [15:44.240 --> 15:47.680] finding out, but I guess the real question we're asking here is how do we know if we [15:47.680 --> 15:52.240] can trust our tests? That's really the crux of this, this problem, right? And so as it [15:52.240 --> 15:57.880] turns out, the very, the very famous Roman poet Juvenal, actually in 100 AD after he'd [15:57.880 --> 16:02.680] had a few drinks, he was able to summarize this in such a beautiful way. And this was [16:02.680 --> 16:06.280] something that maybe wasn't appreciated at the time because, you know, obviously he [16:06.280 --> 16:10.720] was talking about mutation testing 2,000 years before it was relevant. But I will mention [16:10.720 --> 16:15.800] it here. It's "who watches the watchers," right? And this is the question. Who's testing our [16:15.800 --> 16:20.000] tests? Who cares about that? How do we actually gain trustworthiness for our tests? [16:20.000 --> 16:23.080] And I see there's, there's people who've had bugs in production, people who understand [16:23.080 --> 16:28.000] here that this is a really big deal. Luckily, we have a two-word answer for you, which is [16:28.000 --> 16:36.640] the reason we're all in this room. Mutation testing. So, spot the odd one out. You might [16:36.640 --> 16:40.280] see here, that's, that's Henry. He's having a great time, but maybe he shouldn't be sitting [16:40.280 --> 16:44.160] in a row of pigeons. 
But more importantly right now, I'll just explain the basic premise [16:44.160 --> 16:48.040] and then Paco here will explain in a little more detail how it's actually kind of done. [16:48.040 --> 16:51.400] So first of all, mutation testing, this is a really quick summary. What you do is you [16:51.400 --> 16:54.520] introduce some faults in your code, so just a few little things that you change. And for [16:54.520 --> 17:00.280] each of those little changes, that's a mutant version of your code. Once you've got that, [17:00.280 --> 17:04.960] you run your test suite against those mutant versions of your code. And if they fail, awesome, [17:04.960 --> 17:10.240] because that means that, awesome, because that means that your, your tests have actually [17:10.240 --> 17:13.880] picked up that change. And that's a good thing, right? That's, that's good. We want those tests [17:13.880 --> 17:18.680] to fail if our code changes, right? But if they don't fail, that's a bad time, because [17:18.680 --> 17:23.000] that means those tests didn't test that change. It didn't test for that. And so that's something [17:23.000 --> 17:27.600] that could have made it to production. So what mutation testing kind of gives you is [17:27.600 --> 17:32.080] a way to evaluate that test quality. But this is very abstract. So let's look at penguins. [17:32.080 --> 17:36.200] I like penguins. So Henry here, he's a great example and he's going to, he's going to bring [17:36.200 --> 17:40.720] all this home. So I was kind of unfamiliar with the topic, so I kind of created some analogies [17:40.720 --> 17:44.000] with penguins that really helped me. So I'll share those with you. So the way I kind of [17:44.000 --> 17:47.880] imagine my software is: we do lots of stuff with messaging. And so I imagine software [17:47.880 --> 17:52.240] that works properly to be like a pigeon or a dove, like a bird that can fly. 
I've used [17:52.240 --> 17:56.040] a dove here because Paco has a deadly fear of pigeons. He's terrified of them. [17:56.040 --> 17:57.360] Not fear. Vendetta. [17:57.360 --> 18:00.960] He has a personal vendetta against pigeons. Sorry. He doesn't like them. So I've used [18:00.960 --> 18:04.960] a dove here. But ideally we want something where I can tie a message to the bird's leg [18:04.960 --> 18:08.600] and it can go and deal with that message for me, right? So it can go, it can go do something [18:08.600 --> 18:14.080] like that. Now, one of the key features of penguins is that they're not very good at flying, [18:14.080 --> 18:18.320] right? I think we, can we all agree that that's probably not the best. If you want to tie [18:18.320 --> 18:22.120] a message to a bird's leg and get it to deliver it, a penguin might not be the bird you choose, [18:22.120 --> 18:26.800] unless you're maybe delivering something underwater. So this is the kind of example here where [18:26.800 --> 18:30.160] we've got a bird, but it's not the kind of thing that performs the way we expect [18:30.160 --> 18:33.720] it to. And this would cause some serious problems if we tried to use this kind of thing in production. [18:33.720 --> 18:37.520] If we wanted to send a message via a penguin, we're going to have a tough time, right? [18:37.520 --> 18:41.240] So Paco, I'd like you, if possible, to explain this in a way that makes more sense than what [18:41.240 --> 18:42.240] I just did. [18:42.240 --> 18:43.240] Good luck. [18:43.240 --> 18:50.720] We have one mic. It's a bit, it's a bit, yeah. So let's get into the process of mutation testing. [18:50.720 --> 18:55.240] The first step of mutation testing: what Max just taught you is about introducing [18:55.240 --> 19:00.960] faults. So you can introduce faults manually, but doing it manually [19:00.960 --> 19:04.160] means it's a lot of work and it's usually also not that reproducible. 
You don't [19:04.160 --> 19:08.200] want to do it manually. We want to do this in an automated manner. And this is where [19:08.200 --> 19:12.120] mutation testing comes in. In the first step of mutation testing, we're going to generate [19:12.120 --> 19:18.160] mutants. And each mutant is just a very slightly changed version of the production code. Mutation testing [19:18.160 --> 19:22.760] works with the concept of mutators. And mutators are the ones making these very small [19:22.760 --> 19:30.520] changes. So what we have in this case: we have a perfectly fine dove, which is the production [19:30.520 --> 19:36.640] code. And then at the end of it, we have a mutator, which makes a tiny change [19:36.640 --> 19:40.160] that kind of transforms this into Henry, our penguin who can't fly, and we want our [19:40.160 --> 19:44.800] software to fly. So this would be a bad thing. So how does it look? Because this is still [19:44.800 --> 19:50.720] a bit abstract. I'm going to give you some examples. This would be an example here. [19:50.720 --> 19:55.200] So for the Dutch, and I think for other countries as well, you have to be 17 years or older [19:55.200 --> 20:00.080] to apply for a driving licence. This could be code that's in your code base, which will [20:00.080 --> 20:06.040] fly, which is good. Now, for the mutant, the entire code base stays the same, and [20:06.040 --> 20:09.800] just this little piece changes. So here we inverted the logic. This is, of course, [20:09.800 --> 20:15.120] a bug. This is something we don't want to make it into production. And actually, [20:15.120 --> 20:20.240] just from this single line, we can already generate quite some mutants, because we can [20:20.240 --> 20:26.160] not only invert the conditional operator, we can also change the conditional boundaries. 
[20:26.160 --> 20:31.520] So this means that we now have "age larger than 17", which is a very nice bug that would [20:31.520 --> 20:36.520] force us to test the edge cases, the, the famous off-by-one errors, whether we forgot [20:36.520 --> 20:42.360] the equals in our conditional check. This, this will help you find that one. But [20:42.360 --> 20:45.920] it can also just always return true or false. We can generate quite some mutants for this, [20:45.920 --> 20:49.440] and we can do the same for, for example, mathematical operations. We can turn each [20:49.440 --> 20:57.040] plus into a minus, each multiplication into a division, etc. Furthermore, we also have [20:57.040 --> 21:01.760] the ability to remove statements. So in this case, we have a method that adds a published [21:01.760 --> 21:06.680] date to some object, and we can also just remove the whole setter. And now this means [21:06.680 --> 21:12.040] that we have a bug in which we don't set this attribute anymore, which is something that, [21:12.040 --> 21:16.760] of course, we don't want to make it to production. What's important to note here is that with [21:16.760 --> 21:20.680] mutation testing it's always important that the code actually compiles, because we're not testing [21:20.680 --> 21:25.720] the compiler. We're testing the code. The compiler is definitely out of scope here. Now at the [21:25.720 --> 21:31.960] end of step one, we have a lot of Henrys. We have a lot of mutants. And now Henry is [21:31.960 --> 21:41.120] going to try to fly. So he already got his wings ready to try to fly. And now for each [21:41.120 --> 21:46.040] Henry, we're going to run the test suite. And if this test suite fails, as Max already [21:46.040 --> 21:49.960] mentioned, then it's good, because then we exposed Henry for what he is, which [21:49.960 --> 21:55.240] is just a penguin, something that can't fly. So this is great. 
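The mutators described above can be illustrated with tiny hand-written Python variants of the driving-licence check. The exact mutants a real tool produces depend on the tool; these are made up for illustration.

```python
# Original production code: you may apply at 17 or older.
def may_apply(age):
    return age >= 17

# Mutant: inverted conditional operator.
def may_apply_inverted(age):
    return age < 17

# Mutant: changed conditional boundary (the classic off-by-one).
# Only a test at exactly age 17 can kill this one.
def may_apply_boundary(age):
    return age > 17

# Mutant: condition replaced by a constant.
def may_apply_always_true(age):
    return True
```

A suite that checks the boundary (`may_apply(17)` is true, `may_apply(16)` is false) kills all three mutants; a suite that only tries ages far from 17 would let the boundary mutant survive.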
The not-so-happy scenario [21:55.240 --> 22:00.120] is where the tests passed, which means that Henry made it into production. And, [22:00.120 --> 22:06.280] well, assuming that it also got through the PR, of course, we have more than just tests, [22:06.280 --> 22:10.320] that is a problem, because Henry is not supposed to fly, and then we have a bug in production. [22:10.320 --> 22:16.000] So this is something that you don't want. So this is the theory of mutation testing. [22:16.000 --> 22:19.880] And now, Max, you can tell a bit more about the frameworks. [22:19.880 --> 22:27.520] Sure. It works for me. Alrighty. So first of all, I just want to say I'm so proud of [22:27.520 --> 22:31.360] this prompt. I don't know why DALL·E chose this, but I'm really happy. Like, I think I typed [22:31.360 --> 22:37.480] in "penguin trying to be a pigeon". And it came up with this. And I'm very happy. Okay. So [22:37.480 --> 22:42.360] moving on, yeah, frameworks. So this is going to get a little bit more specific to, you [22:42.360 --> 22:47.960] know, to actually implementing this stuff. So anyone here is a Python developer? Heck [22:47.960 --> 22:53.200] yeah. All right. Awesome. So I'm going to show you what I did in Python. So as you [22:53.200 --> 22:57.160] can see, you know, Paco's a Java developer, he'll explain Java in a sec. But I'll just [22:57.160 --> 23:01.200] show you the kind of basic concepts, but using my code and using what I did. So there's [23:01.200 --> 23:05.600] two kind of main supported packages that you can use in Python. It's not like, you know, [23:05.600 --> 23:08.560] in Java, where there's like an enterprise thing you can get. In Python, it's very community [23:08.560 --> 23:12.760] supported. So you're not, you know, you're not going to get big products. But what we [23:12.760 --> 23:17.280] do have are these kind of like nice and supported repos for mutation testing, which have just [23:17.280 --> 23:21.800] these packages. 
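The killed-versus-survived logic described above can be sketched as a toy loop. Real mutation testing tools automate all of this; every name in this sketch is made up, and the "weak" test suite is deliberately missing a boundary check so that one mutant survives.

```python
# Production behaviour: may apply for a licence at 17 or older.
original = lambda age: age >= 17

# Two hand-made mutants of that one-liner.
mutants = {
    "inverted operator": lambda age: age < 17,
    "boundary change":   lambda age: age > 17,
}

# A weak test suite: it never checks the boundary value 17.
tests = [
    lambda impl: impl(30) is True,
    lambda impl: impl(5) is False,
]

def run_tests(impl):
    """True if every test passes against the given implementation."""
    return all(test(impl) for test in tests)

for name, mutant in mutants.items():
    # If any test fails, the mutant is killed; if all pass, it survived.
    status = "survived" if run_tests(mutant) else "killed"
    print(f"{name}: {status}")
```

The inverted-operator mutant is killed, but the boundary-change mutant survives, which is exactly the signal that the suite is missing a test at `age == 17`.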
So I am not a professional, you know, in this, I'm not a doctor, I'm not [23:21.800 --> 23:25.720] a lawyer, I'm not a professional financial advisor. I'm just a person who, you know, has [23:25.720 --> 23:29.920] a certain opinion. And so, my opinion of those two frameworks I showed you, there's mutmut [23:29.920 --> 23:36.200] and Cosmic Ray. And personally, I prefer mutmut, because it's easy to get going. Oh, angry, [23:36.200 --> 23:44.240] angry face, shaking heads. You don't like mutmut. We will talk later. [23:44.240 --> 23:52.000] So if we have time, we'll have a third presenter very shortly. So for now, while I've still [23:52.000 --> 23:57.000] got the mic, while I'm still, you know, while I'm still here, we'll talk about mutmut. [23:57.000 --> 24:01.520] And so this framework is quite simple to use. You know, the reason I kind of like it [24:01.520 --> 24:04.720] is because it's very much: you install it and you run it. You know, there's a bit of config [24:04.720 --> 24:07.960] you can do. But really, it's quite simple just to get an idea of your code base and [24:07.960 --> 24:12.360] what's going on. So I want to show you this slide. This is the SDK that [24:12.360 --> 24:16.240] I maintain. And I'm showing you this because it's what I've applied my mutation testing [24:16.240 --> 24:21.880] to, so it's where my examples come from. But basically, what we do is when we go here, [24:21.880 --> 24:27.280] I had this locally, first of all. So I installed mutmut with pip install. It's that simple. [24:27.280 --> 24:30.080] It's a Python package. It's what we do. If you went to my talk on malware earlier, you [24:30.080 --> 24:36.040] know why that's a bad idea, but I did it. So after we do that, we've got mutmut run, [24:36.040 --> 24:40.560] which just runs those tests for you. So when we do that, I'll show you what my output was. [24:40.560 --> 24:44.520] So when I ran this myself, I actually got a whole lot of this output.
But really what's [24:44.520 --> 24:48.280] important here is that, first of all, it ran my entire test suite. And the reason it ran [24:48.280 --> 24:52.560] my entire test suite is just to check how long that's supposed to take and just to make [24:52.560 --> 24:55.880] sure everything does work as expected, because there's various types of mutants to do with [24:55.880 --> 25:01.160] timeouts as well that we might want to consider. After it's done that, what it will do is it [25:01.160 --> 25:06.520] will generate mutants based on lines of code in my code base. That's what it will do. And [25:06.520 --> 25:11.120] once it's done that, it will run my tests against those. So there's a few different types, and [25:11.120 --> 25:14.920] it can characterize them like this. So the first type is mutants that we've caught, not [25:14.920 --> 25:18.720] killed. We never kill a penguin. We love penguins. We catch them. We've caught them and put them [25:18.720 --> 25:23.680] back into the zoo. In this case, we've managed to say, yep, our test failed. [25:23.680 --> 25:28.760] That's great. But it could be the case where the mutant's timed out. So it's taken way [25:28.760 --> 25:32.240] too long for this code to run, or it's taken enough time that we're not [25:32.240 --> 25:37.320] feeling so great about that code. Alternatively, we might end up in a situation where the mutant [25:37.320 --> 25:41.520] survived and made it through our test code. In that case, it corresponds to a bug that [25:41.520 --> 25:46.920] might make it to production. So when I ran this on my particular SDK, what I saw was [25:46.920 --> 25:53.000] this: it created 682 mutants, versions of my code with changes in them. [25:53.000 --> 26:00.080] And of those, it managed to catch 512, but it missed 170 of them. Now, whether [26:00.080 --> 26:03.280] that's a good number or a bad number, we'll talk about later.
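Those numbers translate into a mutation score with simple arithmetic:

```python
caught = 512      # mutants where at least one test failed
survived = 170    # mutants that slipped through the suite
total = caught + survived
score = caught / total
print(f"{total} mutants, mutation score {score:.1%}")  # 682 mutants, about 75%
```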
But what's important now [26:03.280 --> 26:08.360] is let's just look at some of those mutants. So first of all, the ones that we actually [26:08.360 --> 26:12.120] did catch. Here's a couple of examples. So here's a line where basically we say: here [26:12.120 --> 26:15.840] are some valid message channels. So for our messages API, here are some valid [26:15.840 --> 26:20.640] ways you can send messages, right? But what's important here is that this mutant basically removed the ability [26:20.640 --> 26:25.440] to send an SMS. And so when I tried to test that, it failed, which is what we want to [26:25.440 --> 26:30.960] see. Here's another one. Again, this is Python. So if you're a Java dev, don't worry, we'll [26:30.960 --> 26:36.920] look after you soon. And here's another one. We've got a decorator here, which basically [26:36.920 --> 26:41.200] runs this method. And we can see when we remove that, that will never happen. This is actually [26:41.200 --> 26:45.320] through Pydantic, if anyone has used that before. But basically, it means that we're [26:45.320 --> 26:49.040] not going to round a number anymore. And so when we test for that, a number doesn't get [26:49.040 --> 26:54.880] rounded and we catch that. But that is not really very interesting. That doesn't tell [26:54.880 --> 26:57.960] us anything. That tells us about this much, right? It doesn't tell us much at all. And [26:57.960 --> 27:01.400] the reason for that is that we kind of know that our tests work. We kind of know that [27:01.400 --> 27:05.720] our tests work for that. Thank you very much. I'll do the M&M thing. So we kind of know [27:05.720 --> 27:09.800] that our tests work for that. And so what's kind of useful is to see, if we do mutmut [27:09.800 --> 27:16.400] show, we can see the mutants that we didn't catch. We can also do mutmut html, which [27:16.400 --> 27:21.560] shows us essentially an HTML coverage output as well.
So we can see in a list all of the [27:21.560 --> 27:26.400] mutants that we didn't catch. So with mutmut show, on that code base that I just [27:26.400 --> 27:30.800] showed you, we can see the 170 mutants that survived. It shows you the indices of these, [27:30.800 --> 27:36.600] and then we can manually specify the ones we want to look at. So here we can see, for [27:36.600 --> 27:39.880] example, that we changed the authentication method to fail. And we can see in this case [27:39.880 --> 27:44.760] we caught that, because we did a test for authentication and it failed, so that's great. [27:44.760 --> 27:48.680] But more important is that you get this HTML output, which you can then explore. You [27:48.680 --> 27:53.440] can explore every method, every sort of module that you have. You can explore all the methods [27:53.440 --> 28:00.040] inside of there and which ones were and weren't caught. And you do that with the html command. [28:00.040 --> 28:03.160] So to do that, I'll just show you: this is a mutant that we did not catch. And I want [28:03.160 --> 28:06.080] to show you why we didn't catch it and what it's going to do. And I'll just do that for [28:06.080 --> 28:11.200] a few, just so you get some context, if that's cool. So first of all, what this mutant did [28:11.200 --> 28:16.040] was it renamed the logger. Now, I think logging is out of scope of my test code, so personally [28:16.040 --> 28:19.880] I don't care too much about anything related to logging. So I don't mind if this one slips [28:19.880 --> 28:26.840] through. Here's another one. In this case, we've slightly changed the value [28:26.840 --> 28:30.880] of a constant. This is just part of a function signature. And again, we don't care about [28:30.880 --> 28:36.040] this that much. It isn't something that I really mind about. What's more important, though, [28:36.040 --> 28:40.800] is this mutant here.
Because this is from our client class, where we instantiate all [28:40.800 --> 28:45.920] of our different API classes. And you can see we actually set voice to None, so we completely [28:45.920 --> 28:51.640] remove that instantiation. And our tests are still passing. So the reason that actually [28:51.640 --> 28:56.520] still works, our code base still works even though this isn't testing that case, is because [28:56.520 --> 29:01.440] our tests actually test the voice API separately. They call it manually. But if our client [29:01.440 --> 29:04.560] is calling it like this, maybe we should have a test for this as well. So this tells [29:04.560 --> 29:09.320] me, hey, maybe my test suite does need to be expanded. Does that make sense? I'm seeing [29:09.320 --> 29:14.040] some very, very like, yeah, yeah, that makes sense. I like it. Awesome. Okay, so if you [29:14.040 --> 29:16.680] are a Python dev, this isn't the end of the talk, by the way. You know, we've [29:16.680 --> 29:19.760] got some more context and we'll show you about CI. But if you are interested, then, you know, [29:19.760 --> 29:25.360] feel free to scan this. You've got like four seconds before I move slides. And as I move [29:25.360 --> 29:30.640] slides, in very slow motion I'll be passing over this microphone. Because this was just [29:30.640 --> 29:36.720] Python, of course. And I think there are more non-Python devs here. Just not Python. [29:36.720 --> 29:42.280] Let's see. We, of course, have more frameworks. There are frameworks for more languages out there, [29:42.280 --> 29:46.760] but I think these are the most important ones, that I like personally. And pretty much the [29:46.760 --> 29:52.240] only really good one for Java is Pitest. And we also have Stryker. And Stryker is one that [29:52.240 --> 29:57.080] supports quite some languages. It supports JavaScript, C#, Scala. Of course, it doesn't [29:57.080 --> 30:02.880] do this in one tool.
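A minimal sketch of that surviving mutant, with class names simplified from the real SDK (`Voice` and `Client` here are illustrative stand-ins): the voice API is tested directly, so nulling out the client's attribute survives until a client-level test is added.

```python
class Voice:
    def call(self, number):
        return f"calling {number}"

class Client:
    def __init__(self):
        self.voice = Voice()   # the mutant replaced this with: self.voice = None

# Tests that exercise Voice directly still pass against that mutant...
def test_voice_directly():
    assert Voice().call("123") == "calling 123"

# ...so add a test that goes through the client, which the mutant cannot survive:
def test_client_wires_up_voice():
    client = Client()
    assert client.voice is not None
    assert client.voice.call("123") == "calling 123"

test_voice_directly()
test_client_wires_up_voice()
```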
Each one has its own dependencies, because you can't have one solution [30:02.880 --> 30:07.800] for all. And what I particularly like about it is that it supports JavaScript. Mutation testing brings [30:07.800 --> 30:13.640] this kind of back-end-heavy tooling to the front end. Front-end code, I think, can [30:13.640 --> 30:18.760] often use some love when it comes to testing. And this also brings the testing frameworks [30:18.760 --> 30:23.200] and the testing quality more to the front end. So that's what I really, really like. [30:23.200 --> 30:28.400] But we wanted to discuss a bit more. Max already sort of introduced it. So what is [30:28.400 --> 30:35.440] a good mutation score? We had Goodhart's law, where we sort of saw that code coverage [30:35.440 --> 30:41.560] can also lead to people implementing tests just to improve coverage, which sort of [30:41.560 --> 30:45.440] defeats the purpose. You're doing it just for the metric, not for the actual purpose. [30:45.440 --> 30:51.920] So how does this work with the mutation score? Now, first, here's a picture of how a Pitest [30:51.920 --> 30:57.920] report looks. Not to bash on Python, but it's much prettier and much clearer. [30:57.920 --> 31:01.640] What is particularly interesting about this one is that it shows you both the line coverage [31:01.640 --> 31:05.960] and the mutation coverage. We can ignore the test strength column. And this shows us the sweet spots [31:05.960 --> 31:10.840] in a report. Because at the end, we have generated a lot of mutants. We have a lot of classes. [31:10.840 --> 31:13.760] And we only have very little time. So where are we going to look when investigating this [31:13.760 --> 31:18.640] report to see where the hotspots are? And the one that's the least interesting here is the [31:18.640 --> 31:23.160] notification service. The notification service also doesn't have any coverage.
And if there [31:23.160 --> 31:26.280] is no coverage, then the mutants are also not interesting, because you have a bigger problem [31:26.280 --> 31:31.000] here, which is that you don't have tests at all for this. Then you have a choice. You have [31:31.000 --> 31:34.280] the proposal service and proposal service 2. Now, the fact that they are named similarly [31:34.280 --> 31:39.480] is because they're from another example. But proposal service 2 is the one that has 100% [31:39.480 --> 31:43.600] coverage and yet it didn't kill a single mutant. And this is the sweet spot, because this means [31:43.600 --> 31:48.200] that we have code that looks well tested. Or at least there are tests covering this [31:48.200 --> 31:52.520] piece of code, but not a single bug was caught. So this deserves some attention, [31:52.520 --> 31:56.600] because it means that we didn't fully test this. So these are the hotspots when you [31:56.600 --> 32:00.960] open a report. The ones with high line coverage and low mutation coverage, those are [32:00.960 --> 32:04.760] the ones you really want to go through. Those are the ones that give you the findings to [32:04.760 --> 32:08.960] go to your team and say, hey, see, we need mutation testing, because here, just these [32:08.960 --> 32:14.440] two classes alone already showed me that we need to improve our quality. Now, back [32:14.440 --> 32:25.440] to the score. So in the example we had, we managed to kill 512 out of 682 mutants, which is about [32:25.440 --> 32:36.520] a 75% score. Now, the question is, is this a good score? Yes, yes, the golden answer: [32:36.520 --> 32:41.960] it depends. I love that answer. We already saw that 100% doesn't make sense. Things like [32:41.960 --> 32:46.600] logging, and there are more things like generated code, et cetera, things that you don't necessarily [32:46.600 --> 32:50.920] want to test, even though there are mutants generated for them.
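The hotspot rule from that report boils down to: high line coverage, low mutation coverage. A small sketch with invented numbers (the class names mirror the example; the thresholds are arbitrary illustrations, not part of any tool):

```python
# (class name, line coverage %, mutation coverage %) -- invented example values
report = [
    ("NotificationService", 0, 0),     # no tests at all: a different problem
    ("ProposalService", 80, 70),       # reasonably healthy
    ("ProposalService2", 100, 0),      # the sweet spot: covered but toothless
]

hotspots = [
    name for name, line_cov, mut_cov in report
    if line_cov >= 80 and mut_cov < 50   # well covered, yet mutants survive
]
print(hotspots)   # ProposalService2 is where to spend your review time
```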
Now, there are a couple [32:50.920 --> 32:54.600] of things you can, of course, do. Depending on the language and the framework [32:54.600 --> 32:59.600] you use, you can tweak the mutation testing framework quite a bit. For example, Pitest [32:59.600 --> 33:04.600] actually, out of the box, already ignores and doesn't mutate any logging lines. [33:04.600 --> 33:10.400] All the big logging frameworks are known to the tool. So anything that goes to SLF4J, it doesn't [33:10.400 --> 33:14.840] mutate. So it also doesn't appear in your report, which is quite nice. And you can easily [33:14.840 --> 33:19.440] add things: if you have a custom metrics facade somewhere, also typically something [33:19.440 --> 33:24.960] you don't want to cover in unit tests, you can add that as well. So the thing here is that [33:24.960 --> 33:28.480] mutation testing is not really about a score you want to achieve. It's more that the report can [33:28.480 --> 33:32.840] be interesting to look at and gives you sort of the interesting spots. And once you've completely [33:32.840 --> 33:36.160] set it up nicely and you're familiar with the report, you can maybe start looking at the [33:36.160 --> 33:41.200] score, but it definitely shouldn't become an 80% goal or something, like it was with code [33:41.200 --> 33:49.800] coverage. Just go through the report instead. So now we've sort of discussed all [33:49.800 --> 33:54.880] the tools you need. We have discussed the frameworks. We have discussed the [33:54.880 --> 34:03.200] technology. And now it's time, of course, for you to fly. So, how would you [34:03.200 --> 34:08.240] get started on this? And the thing I think that's important here is if you want to start, [34:08.240 --> 34:13.920] so you now think, oh, this was a great talk, I want to start with mutation testing: depending [34:13.920 --> 34:18.360] on the size of your project, it might be wise to just start with a single package.
I've done this [34:18.360 --> 34:23.520] on projects that are a couple of, say, a thousand lines big. And even though [34:23.520 --> 34:27.640] in Max's example we had 682 mutants, this can, depending on the kind of code you [34:27.640 --> 34:31.880] have there, easily grow to tens of thousands of mutants, which can be quite slow. It can [34:31.880 --> 34:35.080] also be that there's something weird in your code base that doesn't really work well with [34:35.080 --> 34:41.320] mutation testing, or something that's just extremely slow. An example that I ran into [34:41.320 --> 34:48.360] is good to keep in mind, so let me take a quick sidestep. The mutation testing framework [34:48.360 --> 34:53.800] also measures, at the beginning, for each individual test which code it covers. So there's a nice [34:53.800 --> 34:59.520] mapping from production code to the tests. This helps us optimize, because if we were [34:59.520 --> 35:03.440] to run the entire test suite, all the tests, for every single mutant, it's going to take [35:03.440 --> 35:08.000] forever. Instead, because we know the coverage, we can see: if we mutate this one line, [35:08.000 --> 35:13.280] we know which tests cover it. So we only need to execute those few tests. But what [35:13.280 --> 35:16.880] if you have tests that actually cover half your code base? For example, one of the [35:16.880 --> 35:21.440] things you can do in Java, if you're doing things with Spring, is you can actually boot [35:21.440 --> 35:25.520] up the entire Spring application and start doing acceptance tests from your unit tests, [35:25.520 --> 35:30.080] which is not necessarily the worst thing to do, but you now have a [35:30.080 --> 35:34.640] very slow test that does cover half your code base and that will be executed for every single [35:34.640 --> 35:38.360] mutant. So these are things you want to get rid of.
You want to exclude this acceptance [35:38.360 --> 35:44.120] test, because otherwise you're going to be waiting endlessly. So my point about starting [35:44.120 --> 35:47.920] locally and starting small was: start with just one package. Start with the utility package [35:47.920 --> 35:52.080] to see if it works, see if the report works for you. And then from there, you can [35:52.080 --> 35:58.160] add more packages, and you can also see, oh, now it's taking 10 times as long. Why is this? [35:58.160 --> 36:03.960] And you can find the painful packages there. So as I mentioned, you can exclude some tests, [36:03.960 --> 36:07.960] and there are also often candidates, certain pieces of code, you might want to exclude. [36:07.960 --> 36:13.320] For example, there's no use in testing generated code, but it might also be that you have certain [36:13.320 --> 36:19.640] domain packages that contain just all your domain objects, your POJOs, which are just [36:19.640 --> 36:23.880] setters and getters, something that you also typically want to exclude in your coverage [36:23.880 --> 36:31.440] report. You might want to exclude these from mutation testing as well. [36:31.440 --> 36:36.680] And now that's done. So we talked about running it on your machine. We [36:36.680 --> 36:43.720] can also do this in the cloud, of course. Thank you. So as you can see, there's a pigeon [36:43.720 --> 36:47.760] on the slide, and Paco, as we've said, has a personal vendetta, so I've taken over this [36:47.760 --> 36:53.760] section. So here we can see that we're going to run off our machine. So why would you want [36:53.760 --> 36:57.200] to run off your machine rather than on your machine? Any questions? Any ideas? [36:57.200 --> 36:59.800] What happens in the background? [36:59.800 --> 37:03.440] Yes. So what happens in the background is what was said there. Any other reason you [37:03.440 --> 37:08.640] might want to run non-locally? No. I've got a couple.
Oh, oh, hand. [37:08.640 --> 37:09.640] CI. [37:09.640 --> 37:12.760] CI. Yeah, you might want it in your CI system. In fact, that's what we'll be showing you. [37:12.760 --> 37:19.040] So, foreshadowing. I like it. So yeah, it takes some time. And if you're using a CI [37:19.040 --> 37:22.960] system, you get to use those cloud resources. And what's also important is that, [37:22.960 --> 37:26.920] if you've got code which is maybe dependent on different OSes and might behave differently, [37:26.920 --> 37:31.400] you can specify different versions and platforms to run on as well. [37:31.400 --> 37:35.680] So, stop talking, I hear you cry. Well, I'm afraid this is what we're here for, so unfortunately [37:35.680 --> 37:40.200] I will keep talking. But what I will do is show you a bit of an example. So I applied [37:40.200 --> 37:45.160] this to my own code base myself, in my CI system. So you can see here, this [37:45.160 --> 37:49.640] is GitHub Actions. And I've got a piece of YAML, essentially. I've got this mutation [37:49.640 --> 37:55.520] test .yaml file. And what that does is set up an action for me to use. So this is something [37:55.520 --> 38:00.760] that I manually run. And I can do this here. So I manually run that, and what it will do is [38:00.760 --> 38:05.840] do the mutation test non-locally, and it will produce some HTML output for me to look at. [38:05.840 --> 38:10.320] I'll go a little bit into what the YAML does, but it seems like something [38:10.320 --> 38:14.400] that everyone should be able to do themselves if they want to. So GitHub Actions, the reason [38:14.400 --> 38:17.760] I show that is partly because it's what we use, but also it's free for open-source projects. [38:17.760 --> 38:21.920] So, you know, it's been useful for me because I've not had to pay for it. So, you know, [38:21.920 --> 38:25.920] just a heads up. So, I'll be showing you this with GitHub Actions really quickly.
And [38:25.920 --> 38:29.040] I'll show you the YAML, I'll show you what I did. Hopefully by the end of the next [38:29.040 --> 38:33.400] couple of slides, you will see how easy it is actually to do this, and why this [38:33.400 --> 38:37.000] is all good, and maybe you'll want to try this yourself when you get home. [38:37.000 --> 38:42.160] So here's some YAML. First of all, this is our mutation test YAML. It's got one job. [38:42.160 --> 38:45.600] It's pretty simple. All we're doing is running on Ubuntu. We're running one specific [38:45.600 --> 38:50.360] Python version to do this, depending on what your test base is. Oh, they're having a great [38:50.360 --> 38:56.240] time in there. Oh, there's thunder. So basically, yeah, we're testing on one version, [38:56.240 --> 38:59.760] for me, because my code just doesn't vary enough between versions and OSes. So for me, [38:59.760 --> 39:04.320] it's not relevant to do that. But if we look at this next slide, I'll actually show you [39:04.320 --> 39:09.040] the workflow it goes through when I actually run this action. So first of all, we check [39:09.040 --> 39:14.520] out the code. Then we set up a version of Python with it. Once we've done that, we [39:14.520 --> 39:18.320] install our dependencies, now including mutmut as well as our regular dependencies. So now [39:18.320 --> 39:21.840] we've got the mutation testing framework installed here as well on this kind of test [39:21.840 --> 39:27.280] runner. Then what we do is we run the mutation test. So we do that with mutmut run. But because [39:27.280 --> 39:31.240] we're running in a CI system, we don't want insanely long logs, and due to how it's outputted, [39:31.240 --> 39:34.560] we use a no-progress flag there so that we're not seeing every line of output. [39:34.560 --> 39:39.360] We just see the important parts.
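A stripped-down sketch of such a workflow file. The action versions, Python version, and report path are illustrative assumptions; the no-progress and CI flags are the ones described in the talk:

```yaml
# .github/workflows/mutation-test.yaml -- illustrative sketch, not the exact file
name: Mutation test
on: workflow_dispatch        # run manually, not on every push or PR

jobs:
  mutation-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: pip install -r requirements.txt mutmut
      - name: Run mutation tests
        run: mutmut run --no-progress --CI
      - name: Generate HTML report
        run: mutmut html
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: mutation-report
          path: html/
```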
We also have the CI flag, which is one of my only contributions [39:39.360 --> 39:45.000] to open source, but I added that and I'm kind of proud of myself. So that basically [39:45.000 --> 39:50.000] means that you get a good, sensible output, like a sensible return code, when you run in a CI system. [39:50.000 --> 39:52.840] Because by default, depending on the types of mutants that occur, mutmut will [39:52.840 --> 39:57.800] give you a different non-zero exit code. So you kind of need to consider that, or [39:57.800 --> 40:02.120] suppress it with some scary, scary bash. That's why I wrote [40:02.120 --> 40:06.800] the flag. So once we've done that, we save it as HTML and we upload it so that you can [40:06.800 --> 40:11.400] access that yourself as well. So that's it. That's the whole piece of YAML. It's 35 lines, [40:11.400 --> 40:14.720] and that set up the entire mutation test for my suite. So you can see, hopefully, does [40:14.720 --> 40:19.080] this seem kind of easy? I think it seems pretty gentle to do, at least in this sort of scope. [40:19.080 --> 40:23.480] If you're a Java dev with a 20,000-line project, you might want to be a bit more careful, but [40:23.480 --> 40:28.480] if you've got like a Python hobby thing, try it out, right? Try it out. What I would say is, [40:28.480 --> 40:33.000] there are some more concerns. So first of all, I chose to run this manually, when I want [40:33.000 --> 40:38.240] to run it. I chose not to run this on push or PR. I chose to run this manually. And the [40:38.240 --> 40:41.720] reason for that is that I don't expect my code base to sufficiently change between [40:41.720 --> 40:45.960] small commits. And what I really don't want is to use the mutation test score, you [40:45.960 --> 40:49.920] know, that 75%, as a metric that [40:49.920 --> 40:53.840] I've just turned into a target.
I want it to stay as just a good idea, an indicator [40:53.840 --> 40:57.680] of what my tests are doing and what I could be doing better. So for me, I don't want to [40:57.680 --> 41:01.040] run it every time, partly because it takes a blooming long time, especially if I'm using [41:01.040 --> 41:06.360] multiple versions, which we'd also have to factor in. So you might want to do that. I didn't. [41:06.360 --> 41:09.440] I just ran on Ubuntu, and that was fine for me. But yeah, depending on what your code [41:09.440 --> 41:12.680] is, you might want to run on different platforms, right? So do factor that in, and that will [41:12.680 --> 41:17.240] help you a lot if you're in a CI system. So the other question there is just: should [41:17.240 --> 41:20.480] we run on push or PR? My opinion is no. I think there'll be people in this room who disagree [41:20.480 --> 41:24.040] with me, who maybe say on a PR you should run that, or maybe there's some kind of metric [41:24.040 --> 41:28.080] you want to associate with the score that you then want to look at in some way. For me, [41:28.080 --> 41:31.360] that's not how I use mutation testing. And I think what I want to get out of this is: [41:31.360 --> 41:35.560] we don't want a situation where mutation testing becomes a new target, where we've [41:35.560 --> 41:38.360] got to get a certain score, because then we're just recreating that problem of code [41:38.360 --> 41:41.480] coverage targets. We're just doing that all over again, right? So we're trying to avoid [41:41.480 --> 41:47.720] that. So the final question here is one I'll ask Paco to answer: Paco, do you think [41:47.720 --> 41:51.120] I should use mutation testing, you know, in my role as an audience member right now? [41:51.120 --> 41:58.320] What do you reckon? Yes. Well, as we said already: it depends. There are some things you [41:58.320 --> 42:03.880] can ask yourself, because whether you need it is a real question.
So mutation testing is of course definitely [42:03.880 --> 42:09.000] not a silver bullet. The reports take quite some time to go through, [42:09.000 --> 42:16.800] and of course it's quite computationally expensive to run the process. So a couple of questions [42:16.800 --> 42:22.640] you can ask yourself that are quite obvious: is this a project with a really high [42:22.640 --> 42:26.920] quality bar, where people could die, or a lot of money could be lost, or a combination of those [42:26.920 --> 42:32.440] two? So just to check, how many of you are working on a project that fits in these three? [42:32.440 --> 42:41.000] Okay, then you need this yesterday. Yes. But for the rest of the room, including me, there [42:41.000 --> 42:45.240] are some other questions you can ask yourself. And I think one of the important ones is: are [42:45.240 --> 42:48.960] you using code coverage? Because if you're not using code coverage, let's start with [42:48.960 --> 42:53.760] that, and let's first get coverage and see how many tests you have. Then the next [42:53.760 --> 42:58.640] question is: how much value do you put into this? How much value do you get out of [42:58.640 --> 43:04.160] this code coverage? And what I mean by that is: do you make decisions based on it? Is [43:04.160 --> 43:09.120] it like a definition of done in your sprint, or will a build fail if there's less than 80% coverage? Or there's the case of due diligence, [43:09.120 --> 43:14.880] when you're selling a company: then you would also want to know how good [43:14.880 --> 43:19.440] the software you're buying is, or how good the software you're working on is. [43:19.440 --> 43:24.440] So here I would say, if you're using code coverage [43:24.440 --> 43:28.360] and you're making decisions based on that code coverage, then yes, you should at least [43:28.360 --> 43:33.240] have a look at mutation testing to see what the state is.
You don't have to do this always. [43:33.240 --> 43:37.640] You don't have to put it in CI; just once a year, go home and run it on your computer [43:37.640 --> 43:40.680] once, just to see what the current state of your team is. Because it can very well be [43:40.680 --> 43:44.680] that you're on a high-performing team which already has its PRs and everything set up so well [43:44.680 --> 43:49.520] that it's maybe not worth the time. The mutation testing report [43:49.520 --> 43:54.920] might even confirm that, with the fact that you killed all the mutants. So that would be great. [43:54.920 --> 44:00.760] And there's another question that I like: what's the cost of fixing a bug? And I have [44:00.760 --> 44:05.280] two stories for this. My first example is from the first company I worked for. [44:05.280 --> 44:09.760] This was an enterprise company that built software that was running on-premise at the [44:09.760 --> 44:16.800] customer, and the customer was government. And then you're in line with all these [44:16.800 --> 44:20.880] big integrators, which means you have feature freezes and set moments where you can actually [44:20.880 --> 44:24.560] go to the customer and deploy your software, which is quite expensive. It also means [44:24.560 --> 44:31.040] that if you get a bug after this feature freeze, or after this upgrade window, you have a serious [44:31.040 --> 44:34.840] issue, because you need to go to the customer and you need to explain what went wrong. It's [44:34.840 --> 44:39.880] a very costly issue. So here, definitely, mutation testing [44:39.880 --> 44:44.040] can be quite interesting, because a lot of money and reputation can be involved. [44:44.040 --> 44:49.240] The other example that I had was more of a greenfield project, which had more of the [44:49.240 --> 44:54.320] startup vibes, where it was really a fail fast and fix fast mentality.
So this was a [44:54.320 --> 45:00.160] project where, rather than focusing on getting our quality monitoring up to speed, we were [45:00.160 --> 45:07.160] mostly focusing on making sure that we could very quickly fix bugs as well. It was of course [45:07.160 --> 45:11.680] not running on-premises but in the cloud, so we could control it. And the most important goal there [45:11.680 --> 45:15.720] was to just click a button and be in production again in 10 minutes, and have active monitoring [45:15.720 --> 45:20.200] to see if anything goes wrong. Here the cost of fixing a bug is already a lot lower, which [45:20.200 --> 45:25.800] means that the reason to consider it might be a bit weaker, especially if you're, again, [45:25.800 --> 45:30.080] in, for example, a high-performing team, where you're all well attuned to each other, you [45:30.080 --> 45:33.560] know what you're doing, and you know you can trust each other because you're really [45:33.560 --> 45:38.200] all professionals. Then maybe it's not worth spending half a day going through a [45:38.200 --> 45:41.520] mutation testing report if you already know what the outcome is probably going to be. [45:41.520 --> 45:45.640] Then again, still do it once. These are the things to consider when deciding whether you want to use it. [45:45.640 --> 45:50.760] So the thing I want to leave you with is: don't go into it blindly; just [45:50.760 --> 45:57.200] ask yourself, should I really use it? And then, yeah, for the last part, I'd just like [45:57.200 --> 46:02.720] to sum up. So hopefully, if we've gotten here, we've shown you what [46:02.720 --> 46:06.240] mutation testing is, why you might want to consider using it, how you could [46:06.240 --> 46:10.280] get going with running it, and also why you should. So now that we're here, I just [46:10.280 --> 46:14.520] want to summarize. First of all, I'm sorry I used this penguin as an evil penguin earlier.
[46:14.520 --> 46:18.040] It is adorable. I just like the DALL-E image. When I asked it to give it some fake wings, it [46:18.040 --> 46:23.080] gave it three. It gave it this extra flipper here. I'm not sure what that was for. But [46:23.080 --> 46:28.200] what I'd like to do is just quickly summarize what we talked about today. First of all, [46:28.200 --> 46:33.760] mutation testing is a way to test your tests. It helps you beat the Goodhart's-law problem [46:33.760 --> 46:37.400] with coverage, right? It saves you from turning [46:37.400 --> 46:41.400] coverage into a target. You don't want to have [46:41.400 --> 46:45.320] "code coverage has got to be above this threshold or we don't merge." That's not where we want [46:45.320 --> 46:52.080] to be. What we want to do is write good tests. So if you are going to do this yourself, an [46:52.080 --> 46:56.360] important part is to start small. So start locally on your machine. If you've got a big [46:56.360 --> 47:00.400] code base, then what you really need to do is run on a subset of that code base. If you've [47:00.400 --> 47:04.600] got a smaller code base like me, you're probably okay. But either way, start locally on your [47:04.600 --> 47:09.840] machine. You can also run it in CI: if you want asynchronous reports, if you want [47:09.840 --> 47:14.800] to use the resources available on a CI system, you can run mutation testing there. So do [47:14.800 --> 47:20.560] consider that if your stuff is in CI. And finally, I just want to say that, hopefully, [47:20.560 --> 47:23.800] we've demonstrated that mutants are like adorable penguins, right? They're valuable and they [47:23.800 --> 47:26.920] are wonderful, right? They're really great to use. They can tell you so much about your [47:26.920 --> 47:31.920] code. They're extremely useful. So don't fear them, because you should love them.
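That Goodhart's-law point about coverage is easy to demonstrate with a small, self-contained sketch (the function and tests below are invented for illustration): a test can execute every line without asserting anything, so coverage reports 100% while every mutant survives.

```python
# A function under test, plus two tests: one with full line coverage but
# no assertions, and one that would actually kill mutants.

def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by the given percentage."""
    return price * (1 - percent / 100)

def test_weak():
    # Executes every line of apply_discount, so line coverage is 100%...
    apply_discount(100.0, 20.0)  # ...but the result is never checked.

def test_strong():
    # Pins down the value, so mutants such as `1 + percent / 100` or
    # `price * percent / 100` make this test fail and get killed.
    assert apply_discount(100.0, 20.0) == 80.0

test_weak()
test_strong()
```

A coverage tool rates both tests identically; only a mutation run reveals that `test_weak` proves nothing about the code's behavior.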
Thank you [47:31.920 --> 47:32.920] very much. [47:32.920 --> 48:02.160] If there are any questions, comments, objections, love mail, hate mail, anything, shout at me. [48:02.160 --> 48:06.440] So the question there was just if we can give some more examples of the kind of range of [48:06.440 --> 48:10.880] things that are possible to mutate. So essentially, the short answer is anything that will still [48:10.880 --> 48:16.240] make the code run. So in the Java case, the code has to compile; in my case, the code has to run. So [48:16.240 --> 48:21.120] in this situation, I'll give you some Python examples. For example, changing [48:21.120 --> 48:25.120] a variable from a certain type to another, so you might typecast something. [48:25.120 --> 48:28.520] With a mathematical expression, you might add extra terms to that expression. You might [48:28.520 --> 48:34.480] change return types, error types. You might set things to None at any given point. You might [48:34.480 --> 48:38.760] call something and, yeah, remove parts of it, set things to zero. There's other stuff. [48:38.760 --> 48:41.280] Paco, can you think of any mutation testing Java examples? [48:41.280 --> 48:44.960] Yeah. So I think the examples you gave, it sort of depends on the mutators [48:44.960 --> 48:49.000] you use. For each framework, you can also go through the list [48:49.000 --> 48:53.040] of mutators to see what kind of mutators are out there. What's good to keep in mind is that [48:53.040 --> 48:57.960] it does use some basic, fundamental strategies to determine if something can be mutated. Because, [48:57.960 --> 49:01.880] for example, if you have a stream, and in this stream you do some operations which [49:01.880 --> 49:07.640] you could in theory cut out, you're still using the return value, which means that the [49:07.640 --> 49:11.080] mutation testing framework thinks, okay, let's keep that intact.
The same goes for if you're [49:11.080 --> 49:16.280] using the Spring Reactor framework. You could do lots and lots of smart mutations in there, [49:16.280 --> 49:20.840] but it's not really there yet. It's really the rudimentary things, the conditional logic and [49:20.840 --> 49:26.120] the mathematical logic, I think, that are the two main things you'll see. And actually, those also [49:26.120 --> 49:34.320] account for what are often the most typical programming errors, I would say. Awesome. I mean, anything [49:34.320 --> 49:37.240] you'd like to mutate, you know, because I guess a lot of these things are open source, [49:37.240 --> 49:42.240] you know, anything that you think would be good if it did exist. Any other questions? [49:42.240 --> 49:57.240] So, two questions. The first one: could you comment on some frameworks for C and C++? And [49:57.240 --> 50:08.240] the second: what do you think about the idea to require developers [50:08.240 --> 50:15.240] to mutation-test only the code which they have actually changed, just to save computational power [50:15.240 --> 50:18.240] on their own machine and on the server side? [50:18.240 --> 50:22.680] Okay. So, the question there, just for the live stream, was two things. One is, are there [50:22.680 --> 50:27.720] any mutation testing frameworks for C or C++? I will say, personally, I don't know. I haven't [50:27.720 --> 50:31.480] used C++ since my physics degree, so I couldn't tell you. I don't know if you know anything [50:31.480 --> 50:32.680] about that, Paco. [50:32.680 --> 50:39.800] I just did a quick Google search. That's all. So, I see there are some frameworks available [50:39.800 --> 50:40.800] for you. [50:40.800 --> 50:45.800] There is a project by the University of Luxembourg, which is called FAQAS. [50:45.800 --> 50:46.800] FAQAS. [50:46.800 --> 50:58.280] And it's not quite there yet, but it's something for C and also a bit for C++.
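To make the mutator examples above concrete, here is a hand-written Python sketch of the kinds of rewrites mentioned (a relational-operator swap, a constant change, and returning None). A real framework such as mutmut or PIT generates rewrites like these automatically; the mutant functions here are purely illustrative.

```python
# Hand-written illustrations of typical mutants for one small function.
# A mutation testing tool would generate these rewrites automatically.

def is_adult(age: int) -> bool:
    return age >= 18

# Mutant 1: relational operator swapped (>= becomes >).
def is_adult_mut_op(age: int) -> bool:
    return age > 18  # survives unless a test checks age == 18 exactly

# Mutant 2: constant nudged (18 becomes 19).
def is_adult_mut_const(age: int) -> bool:
    return age >= 19

# Mutant 3: return value replaced with None.
def is_adult_mut_none(age: int):
    return None

def test_boundary():
    # Testing exactly at the boundary kills all three mutants,
    # because each mutant disagrees with the original on age == 18.
    assert is_adult(18) is True
    assert is_adult(17) is False

test_boundary()
```

Running `test_boundary` against each mutant in turn would fail for all three, which is exactly what a "killed" mutant means in the reports discussed earlier.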
Regarding [50:58.280 --> 51:03.280] your other question, by the way: should you do it as a git hook? [51:03.280 --> 51:04.280] That was the question, right? [51:04.280 --> 51:11.280] Yeah, the idea was basically to require developers to run those mutation tests, but not the full [51:11.280 --> 51:17.280] set, only the mutation tests touching the change, the unit tests [51:17.280 --> 51:20.280] testing the code which was modified in this commit. [51:20.280 --> 51:24.000] Yeah. So, actually, depending on the framework, some have incremental-report features, [51:24.000 --> 51:28.600] where they can just store the last state, then do [51:28.600 --> 51:33.800] a diff and use the results from your last execution to not execute all mutants and not [51:33.800 --> 51:37.480] generate all mutants, because it knows: I only changed these production lines, so I only [51:37.480 --> 51:42.040] need to generate mutants for these, and I only changed these tests, so I only need to rerun [51:42.040 --> 51:47.280] the tests for these mutants, which can tremendously speed it up. But still, using it as a git [51:47.280 --> 51:52.800] hook, I'm not sure. You can, by the way, use the same logic in CI as well, to use the incremental [51:52.800 --> 51:55.680] reporting, which saves a bit; PIT also supports this. [51:55.680 --> 52:00.640] Yeah. So, with what you have, you have caching, so you can cache those tests that you've run [52:00.640 --> 52:05.480] already, and if those cases aren't touched, then you're sort of good, if the changes to [52:05.480 --> 52:10.920] your code don't affect them. So, that is an option. I would say, yeah, thank you.
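The "only mutate what changed" idea described above can also be approximated by hand: feed only the changed files to the tool. The sketch below builds such a command from `git diff` output. The `--paths-to-mutate` option follows mutmut's CLI, but treat the exact flag, the base branch name, and the repo layout as assumptions and check your framework's documentation.

```python
# Sketch: approximate incremental mutation testing by restricting the
# run to files changed relative to a base branch.
import subprocess

def changed_python_files(base: str = "main") -> list[str]:
    """List .py files changed relative to `base` (empty list on error)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, "--", "*.py"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:  # not a repo, unknown branch, etc.
        return []
    return [line for line in result.stdout.splitlines() if line]

def build_mutmut_command(files: list[str]) -> list[str]:
    """Build a mutmut invocation restricted to the given files."""
    if not files:
        return []  # nothing changed, nothing to mutate
    return ["mutmut", "run", "--paths-to-mutate", ",".join(files)]

# Example: only mutate two touched modules.
cmd = build_mutmut_command(["pkg/client.py", "pkg/models.py"])
# cmd == ["mutmut", "run", "--paths-to-mutate", "pkg/client.py,pkg/models.py"]
```

A wrapper like this could run in a pre-push hook or a CI job, with the framework's own caching still deduplicating work across runs.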
My [52:10.920 --> 52:16.800] opinion is, again, that maybe you don't want to explicitly mandate this on every run, and [52:16.800 --> 52:20.040] the reason for that is that it can then become a metric that you [52:20.040 --> 52:23.320] try to optimize for, whereas really, I think the nice way to [52:23.320 --> 52:27.000] use it is every now and then, I would say. I think if you've got a super critical [52:27.000 --> 52:31.320] project where that's really important, you may want to run it like that. For me, I don't [52:31.320 --> 52:35.360] need to, but I think that's really up to you as an implementer, what you want [52:35.360 --> 52:38.520] to do, and I think there's definitely a use case to do it in that way if that was important [52:38.520 --> 52:39.520] to you. [52:39.520 --> 52:56.080] Hand over here, hello. [Inaudible audience question about excluding code from mutation.] Yes, yes. The short answer is yes. The long answer is, depending on the actual [52:56.080 --> 52:59.880] framework, it might be that you add a comment to ignore it. Alternatively, there is a config [52:59.880 --> 53:04.280] file setup as well in Python where you can say: only mutate these paths, only do these [53:04.280 --> 53:14.680] things. What language do you use? That would be Stryker. I would say yes. I haven't looked [53:14.680 --> 53:18.680] that much into Stryker, but I think they make quite some nice stuff. It's quite generic [53:18.680 --> 53:23.880] across frameworks. Excluding code from mutation, definitely yes. Depending on the framework, [53:23.880 --> 53:29.240] some even have nice things like "do not mutate any calls to these classes," which [53:29.240 --> 53:32.760] is interesting for logging, for example. Do not mutate any calls to this logging class. [53:32.760 --> 53:36.520] The same you can do for packages, class paths, et cetera. [53:36.520 --> 53:41.840] I'd say so with Stryker as well.
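As a concrete illustration of the comment-marker exclusion mentioned for Python: mutmut recognizes a `# pragma: no mutate` comment. That marker is mutmut-specific, and other frameworks use their own annotations or config entries, so verify it against your tool's documentation.

```python
# Sketch: excluding a single line from mutation with a comment marker.
# "# pragma: no mutate" follows mutmut's convention; other frameworks
# use different markers or configuration entries.
import logging

logger = logging.getLogger(__name__)

def withdraw(balance: float, amount: float) -> float:
    if amount > balance:
        raise ValueError("insufficient funds")
    # Mutating log messages only produces noise, so skip mutants here:
    logger.debug("withdrawing %s", amount)  # pragma: no mutate
    return balance - amount
```

This mirrors the class-level exclusions described for the Java side: calls whose behavior you deliberately don't test, like logging, are the usual candidates.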
One of my colleagues uses Stryker because he maintains our .NET SDK, [53:41.840 --> 53:45.480] and he's actually also got mutation testing there with Stryker. It does seem very performant, [53:45.480 --> 53:49.080] and it seems like it does have a lot of those features as well. Honestly, if you're interested [53:49.080 --> 53:51.200] in TypeScript, I think there is something for you there. [53:51.200 --> 53:52.200] Cool. [53:52.200 --> 53:54.400] I think it might be free on open source repos. Sorry, another question. [53:54.400 --> 54:09.720] Yeah. Are specific mutation runs reproducible for debugging purposes? If you have a run [54:09.720 --> 54:14.720] and you see something you don't expect, can you reproduce that specific run with a given [54:14.720 --> 54:15.720] seed or something? [54:15.720 --> 54:18.000] Well, so the question is, how reproducible are the mutants? If you find one, then next [54:18.000 --> 54:22.640] run, is it still there? As far as I know, there shouldn't be any randomness in the [54:22.640 --> 54:26.280] mutant generation. It just goes over the code. Any condition that it finds that it [54:26.280 --> 54:31.680] can mutate, it will mutate. So the next time you run it, the same mutant should be there [54:31.680 --> 54:35.800] at the same place. So you could also see whether you killed it the next time. So yes, it's [54:35.800 --> 54:36.800] reproducible. [54:36.800 --> 55:00.800] I think that's the person who was first, sorry. [55:00.800 --> 55:14.200] That's a good question. I'll repeat that one. That's a good one. So the question there [55:14.200 --> 55:18.000] was: so, mutation testing, we've talked a big game, we've come up here and been [55:18.000 --> 55:21.080] like, hey, look, this is important, right? That's what we've talked about. And the question, [55:21.080 --> 55:23.680] which is a very valid question, is: hey, if it's so important, why is no one supporting [55:23.680 --> 55:27.120] this in Python?
Why is this all open source stuff, right? And you know what? I agree. [55:27.120 --> 55:31.080] That's a really good question. It's one I asked as well, to be honest. So no, I totally [55:31.080 --> 55:35.160] support the question. And to put the question properly: yeah, why aren't employers [55:35.160 --> 55:40.680] supporting this? The short answer, I think, is to do with ROI, unfortunately. And that [55:40.680 --> 55:45.480] sucks, honestly, because I would like us to invest more time in certain things. And I [55:45.480 --> 55:49.880] think it's just to do with company priorities, right? So I would like to spend more time. [55:49.880 --> 55:52.560] Honestly, I had quite a lot of fun adding the one feature I did get to add. I'd quite [55:52.560 --> 55:57.920] like to do some more. But again, I've got this API to implement, so do I have time? Well, [55:57.920 --> 56:01.800] no one's funding me to do it. So unfortunately, unless there's an obvious [56:01.800 --> 56:04.680] ROI, this just seems to be the way things go. Unfortunately, that's the way we've kind [56:04.680 --> 56:10.400] of structured our platforms and so on. So I gave a talk earlier on PyPI and malware. [56:10.400 --> 56:14.600] And the reason that that kind of thing is so prevalent and so possible on [56:14.600 --> 56:21.040] PyPI is because PyPI hasn't really implemented many ways to actually protect against malware [56:21.040 --> 56:24.520] being uploaded. So currently, I've uploaded some malware to PyPI that you can get yourself. [56:24.520 --> 56:29.560] It's not real malware, to be clear, it's a rickroll. [56:29.560 --> 56:35.240] But you saw that.
But basically, what I'm trying to say here is that that project kind [56:35.240 --> 56:38.840] of didn't really get off the ground in terms of protecting users, just because I think originally [56:38.840 --> 56:42.560] Facebook were funding it, and they stopped funding, and it just didn't continue. [56:42.560 --> 56:45.440] So unfortunately, yeah, this is just kind of the way that things are in open source right [56:45.440 --> 56:49.120] now. And yeah, I do feel your pain. I do understand. But that's all I can really say, I'm afraid. [56:49.120 --> 56:54.400] Yeah, to quickly add to this, by the way: Stryker, for example, is actually funded, [56:54.400 --> 56:59.480] is backed by a company who, for example, lets interns work on it as well. So some [56:59.480 --> 57:03.120] frameworks actually are backed, and there are people already investing in it. So it's not always [57:03.120 --> 57:04.920] bad. But sorry, next, let's go to that side. [57:04.920 --> 57:09.920] So you showed some HTML reports for the results of the mutation tests. [57:09.920 --> 57:10.920] Yes. [57:10.920 --> 57:15.920] We all know managers and product teams love their KPIs. So I'm wondering, is there any [57:15.920 --> 57:22.880] integration or plugins to export the mutation test results to SonarCloud or other platforms? [57:22.880 --> 57:26.720] That's a really good question. So I'll answer quickly for Python and then I'll pass it over. [57:26.720 --> 57:31.320] Because in Python, the answer is quite short. The answer is unfortunately no. The maintainer [57:31.320 --> 57:35.880] of mutmut is not really a big fan of the CI system stuff and the report stuff. [57:35.880 --> 57:38.960] I think the premise there is, you know, I like running this locally, [57:38.960 --> 57:43.040] and, you know, that's fair. And that is really how you can get started and get an idea. So [57:43.040 --> 57:46.720] in Python, unfortunately, the answer is no. But I think that Paco might have a more positive [57:46.720 --> 57:47.720]
But I think that Paco might have a more positive [57:46.720 --> 57:47.720] answer for you. [57:47.720 --> 57:51.480] Yeah. So let's also ask the, you were the maintainer of the other framework. So how [57:51.480 --> 57:54.480] does it go for the other part of the framework? [57:54.480 --> 58:03.480] So, okay. So I talked about not having that facility, that feature in Cosmic Ray. Is that [58:03.480 --> 58:13.480] a bit un-maintained? I don't want to say names, but it is a very, very large, 450 maybe vendor [58:13.480 --> 58:18.480] that uses it. And we asked them, can you fund development? They said, you know, no. And [58:18.480 --> 58:19.480] yeah. [58:19.480 --> 58:23.480] So they have shown this around at large events, like in front of thousands, thousands of people. [58:23.480 --> 58:28.480] But yeah. They're like, okay, we keep all the data stores there for whatever we find [58:28.480 --> 58:29.480] as it is. [58:29.480 --> 58:32.480] Yeah. So for the Python frameworks. [58:32.480 --> 58:33.480] Yeah. [58:33.480 --> 58:34.480] Yeah. [58:34.480 --> 58:35.480] Yeah. [58:35.480 --> 58:36.480] Yeah. [58:36.480 --> 58:47.480] So for the Python frameworks, there's not really CI plug-in support. I do know that, for [58:47.480 --> 58:52.680] example, for PyTest, there is support for Jenkins and Sonar. And I'm not sure about Stryker, [58:52.680 --> 58:57.880] but I know it's there. And usually these things are relatively easy to build yourself here [58:57.880 --> 59:02.280] as well, because all you have to do is, if there is a report in some JSON file, you can [59:02.280 --> 59:06.680] quite easily parse it and make a nice HTML form about this. Because again, they're all [59:06.680 --> 59:10.280] open for contributions. Do we have time for one last? [59:10.280 --> 59:12.280] I want to just add to that a little bit. [59:12.280 --> 59:13.280] Okay. [59:13.280 --> 59:17.840] Okay. Really quickly. 
First of all, with your question, yeah, when I originally implemented [59:17.840 --> 59:21.120] my mutmut thing, I did do it on PRs. And in that case, I got, you know, an action [59:21.120 --> 59:24.720] that would comment my coverage in a nice metric-y way. So it's quite simple to [59:24.720 --> 59:25.720] do. [59:25.720 --> 59:29.120] So about Cosmic Ray: first of all, that sucks. And I'm sorry. That's [59:29.120 --> 59:34.360] blumming awful. Like, yeah, sadly, it does seem that a lot of what we've kind [59:34.360 --> 59:36.920] of been discussing on this side of the room is just like, man, you [59:36.920 --> 59:39.600] know, we all agree this is important, right? And it's useful for a lot of things. [59:39.600 --> 59:43.600] It'd be great if someone funded it. So I think, unfortunately, with Python, that is the state [59:43.600 --> 59:49.120] of play. And it does suck. But yes, I get you. Any other questions? Finally, I think [59:49.120 --> 59:50.120] one, yes, hello. [59:50.120 --> 59:56.800] Can you write custom mutations to mutate your code with custom logic? [59:56.800 --> 01:00:02.480] That's a really good question. So, sorry, I will now repeat your really good [01:00:02.480 --> 01:00:08.040] question. The question was: if I have a certain type of mutant [01:00:08.040 --> 01:00:14.080] that I want to make, can I do that? So I would say, with the stuff that I used in Python, [01:00:14.080 --> 01:00:18.520] the answer is you'd need to actually, you know, take the version you've downloaded, [01:00:18.520 --> 01:00:21.720] edit it yourself and add that stuff. So sadly, there's not an easy customizable way. That [01:00:21.720 --> 01:00:25.320] would be an awesome enhancement, though, that I would like to see, you know, that [01:00:25.320 --> 01:00:28.040] would be cool. On other platforms, Paco, any other?
[01:00:28.040 --> 01:00:32.280] I do know that I think PIT did have some extension points. So it really depends. I [01:00:32.280 --> 01:00:36.760] know that the company I currently work for, called Picnic, they're also working on extending [01:00:36.760 --> 01:00:42.960] it, for example, for reactive code. So there are often some extension points. So in short, [01:00:42.960 --> 01:00:47.440] it depends on the framework and how easy it is. [01:00:47.440 --> 01:00:50.960] Are we... we're done. Okay. We're at time. Thank you so much. This has been a really nice discussion [01:00:50.960 --> 01:00:51.960] as well. So thank you for sharing this. [01:00:51.960 --> 01:00:55.960] Thank you.