Let me do a quick survey. Who has a JavaScript background? Okay, maybe like 10%. Who has a C background? C++? Holy hell. It's like 80%, for the people on stream. Who has a Python background? What are you, polyglots? What's going on? 70% or so. Any other languages? Just scream out. I heard something like, it was something like, oh, but I can't really remember.

Does anyone own this book? I found this book in my attic and it was kind of peculiar because it had some arcane incantations in it and it looked like magic, but it certainly had something to do with Rust. I was really excited, really enticed by this book, and this is why I want to talk about it. It was pretty old. There was one section in there which I really liked, and it was called the Four Horsemen of Bad Rust Code. This is what this talk is about.

Before we get into what the Four Horsemen are, I would like to introduce myself. I'm Matthias. I live in Düsseldorf in Germany. I've been doing Rust since around 2015. I do Rust for a living as a consultant. I did a Rust YouTube channel a long, long time ago called Hello Rust. Only 10 episodes, but well, what can you do? And lately I started a podcast called Rust in Production. If you like what I say in this talk, maybe you also want to subscribe to the podcast later on. That's it for the advertisement. Going back to the Four Horsemen.

I thought about this title a lot. Why would you talk about bad Rust code? From my experience as a Rust consultant, I see patterns evolving over time. I see people doing the same things in Rust that they do in other languages. They repeat the same mistakes, and I saw that no one really talked about those problems. That is an issue when you come from a different language and you try to learn the Rustic way, the idiomatic way to write Rust code. This is what this talk is about. Let me present to you the antagonists. While I do that, try to picture yourself.
Imagine who you are and what you think your role would be in this talk. The first horseman is this. Actually, let me show all of them. The first one is ignorance. What is ignorance? Magical little term. We will get to that on the next slide. And we have excessive abstraction, premature optimization, and omission. Of course, you could add your own personal Rust horseman. These are very subjective, but these are the things that I see in the real world.

Now that we have introduced the antagonists, let's go through their anti-patterns and what they are famous for, one by one, starting with ignorance. The horseman behind this pattern is someone who uses stringly-typed APIs. You have seen it before. Someone uses a string where they could have used an enum, or they don't really embrace pattern matching. That makes APIs brittle. If you refactor something, you run the risk of forgetting that you changed something, or maybe you make a typo and then your string is incorrect, so it doesn't represent what you want to represent.

They also freely mutate variables. They go and say, yeah, this is state and I can change it. Rust has the mut keyword for this, but they use it liberally across the entire code base, which makes reasoning on a local scope very, very hard. They also use bad or no error handling. We will get to that in a second. They use unwrap a lot and they don't really think about the error conditions of their application. They also have a lack of architecture in their applications, and they use a general prototype-style way of writing Rust code.

Where do they come from? Usually those are people who were administrators before, or they write shell scripts, or they come from scripting languages. This is what they know. Nothing wrong with that, but they haven't fully embraced what Rust has to offer. How do you discover that you belong to this group in the code?
Well, if you do things like this: you have highly imperative code. You go through the code and you tell the program, hey, do this, do that, do this, do that, instead of using, for example, a declarative way of describing what the state should be. They also use magic return values like minus one or an empty string to represent a certain special value instead of using errors. Everything is a string. Unwrap is used freely. You clone all the things and you use the mut keyword.

Why is cloning a bad thing? I don't think it is. But the problem with clone is that you maybe don't buy into the Rust model of ownership and borrowing. That means you bring what you learned from other languages to Rust, and at some point you run into issues with your architecture which you cannot easily resolve anymore. This is why clone is kind of a stop sign. It's not a warning sign, but it should make you think for a moment. It's an indicator of structural problems in your code, if you like.

Okay. With that out of the way, let's make it a little more practical. How could we put this into practice and improve our code step by step? Imagine you wanted to calculate prices for different cities for a bunch of hotels that you have in these cities. For example, imagine this was a map. This is an actual map, by the way. Africa does not look like this. And also, Jerusalem is not the center of the world. I mean, we can debate about that, but certainly geographically there are some issues with this map.

Imagine your input looked something like this. It's a CSV file. You get a hotel name, a city, a date, a room type, and a price. You go through this file line by line and you try to parse it into something that looks like that. For Brussels, you have a minimum hotel price of 40 bucks, a mean price of 80, and a maximum price of 150. Fun fact, I arrived yesterday not having a hotel room because I thought I booked a hotel, but it was last year.
So I was in the upper range here. Thanks, Walshbeng, by the way, for sharing your room with me. Otherwise, it would have been a nightmare.

If you wanted to parse the input file and create a result like this, all you have to do is write this code. That's the entire code. Nothing really big going on here. There are some peculiarities, but this is usually what someone would write for whom Rust is not their first language. Maybe they just try to port whatever they had in another language to Rust. This is code that I see them writing. You read the CSV file, then you create a hash map of cities, then you iterate over each hotel, you try to parse the data by splitting each line, you extract fields from it, you parse the price, and then you update the city. Updating the city happens somewhere at the lower end. At the end of it, you print the mean, the max, and the minimum. That's it. That's the entire code.

You know, it's working. Technically, you could run this code and it will produce the result that you expect. Prices for different cities, we're done, right? Unless we think about the bigger picture and the demons and the monsters that are out there in the ocean, and they can haunt us and bite us. There are dangerous beasts out there, killer animals. I think what you want to do is improve that code a little bit.

How can we make this code a little more idiomatic? This is the same code. Now, let's look at some parts that I personally wouldn't want to have. Consider this block. There are some things going on, but overall, it's a very manual way, a very imperative way of going through the list of hotels. We literally have a couple of if conditions here. If price is smaller than city_data.0 and so on, we update the price, yada, yada, yada. There are patterns that make that a little nicer to read in Rust. This is the same code. It's something very similar, but we managed to shrink it down a little bit.
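The slides are not part of this transcript, so the following is only a rough sketch of the kind of change being described here. It assumes the per-city data is a (mean, min, max) tuple of f64 values; the function names are made up for illustration, and the mean slot is deliberately left alone because the transcript does not say how the first version computed it.

```rust
use std::collections::HashMap;

// (mean, min, max) per city: the tuple layout described in the talk.
type CityData = (f64, f64, f64);

// Positional access: the reader has to remember what .1 and .2 stand for.
fn update_indexed(cities: &mut HashMap<String, CityData>, city: &str, price: f64) {
    let city_data = cities
        .entry(city.to_string())
        .or_insert((price, price, price));
    if price < city_data.1 {
        city_data.1 = price;
    }
    if price > city_data.2 {
        city_data.2 = price;
    }
}

// Same behavior with tuple destructuring: the slots get names.
fn update_destructured(cities: &mut HashMap<String, CityData>, city: &str, price: f64) {
    let (_mean, min, max) = cities
        .entry(city.to_string())
        .or_insert((price, price, price));
    *min = f64::min(*min, price);
    *max = f64::max(*max, price);
}

fn main() {
    let mut a = HashMap::new();
    let mut b = HashMap::new();
    for price in [80.0, 40.0, 150.0] {
        update_indexed(&mut a, "Brussels", price);
        update_destructured(&mut b, "Brussels", price);
    }
    println!("{:?} {:?}", a["Brussels"], b["Brussels"]);
}
```

Both functions do the same work; the second one just lets the code talk about min and max by name.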
In comparison to what we had before, we get city_data and then we use some sort of tuple extraction to get the mean, the minimum and the max. That makes things a little easier. We can suddenly talk about mean instead of city_data.0, for example.

That's not the major problem with this code. There are unwraps in here too. For a first prototype, that might work fine, but later on, maybe you don't want to have that. What if you cannot open the hotels CSV file? What if you cannot parse a price? In this case, the entire program just stops. It's a question of design, but I would say if there's a single line that is invalid, you probably don't want to stop the execution right away.

Another problem is that we index into the memory right away. Who tells us that a line has that many entries, five entries? It might have three. It might have zero. Who knows? But if we index into something that doesn't exist, the program will panic, and that is kind of a bad thing. The underscores mean that the variables are not used, so we can remove them. We have a slightly cleaner structure, and a simple way to check that a line is valid would be to just have this manual check in there. I know it's not very sophisticated, but it helps us along the way. Now we check if the hotel data length is five, and if it is not, we just skip the entry.

Let's look at parsing for a second. How do we want to handle parsing? I said that maybe we don't want to stop the execution when we run into an issue, and we can do that in Rust by matching on the parse result. A very simple way to do that would be to say match price.parse(), and if we have an Ok value, we take it, and if we have an error, we don't really care about the error. We just print an error on standard error and then we continue with the rest of the parsing. Looking at the input, one thing we can do as well is apply a similar pattern and introduce a result type. Now we use a Box for representing a result type.
This is because you don't need any external library to have a result type whose error type can be literally anything. It can be a string, or anything that implements the Error trait. It's a very simple way to improve your Rust code. It's a good first step. What we do instead now is we say read_to_string and then, in case we have an error, we map the error to something that a user could understand and act on. The code is already a little cleaner. We handled a few error cases already, and this is something that might pass a first iteration of a review cycle.

Now of course there are certain other issues with this code. For example, CSV handling. CSV is tricky. Proper handling of delimiters is very hard. For example, you might have an entry which has semicolons, like on the left side here, or you have something that has quotes around a semicolon, and you probably want to handle that. So a simple string split does not suffice. Same with encodings. What platform are we operating on? Do we know the encoding right away? Does the CSV file contain headers or no headers? There are many, many caveats like that. If you're interested, there's a talk called Stop Using CSV. I don't say you should stop using CSV, but I say you should start watching this talk because it's really good.

Right. How can we introduce types? I talked about types a lot and Rust is great with types. We should use more of them. Here's a simple way. I already talked about the result type. In the first line we just create an alias for our result, and we say it's anything that has a T, where T is generic, and the error type is Box<dyn std::error::Error>. Then we can use the Result in our code to make it a little easier to read. As well, we introduce a Hotel struct and we have a couple of fields, just strings and floating points at this point. But this helps us make the code a little more idiomatic already. We will combine those things on the next slides.
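The steps just described (length check, matching on the parse result, a boxed error alias, a Hotel struct) could be combined roughly as follows. This is a sketch under assumptions, not the code from the slides: the field order name;city;date;room;price, the function names parse_hotel and parse_all, and the reduced set of struct fields are all illustrative.

```rust
use std::error::Error;

// The alias from the talk: any T, with a boxed error that can hold
// anything implementing the Error trait (including plain strings).
type Result<T> = std::result::Result<T, Box<dyn Error>>;

#[derive(Debug, PartialEq)]
struct Hotel {
    name: String,
    city: String,
    price: f64,
}

// Parse one "name;city;date;room;price" line (assumed field order).
fn parse_hotel(line: &str) -> Result<Hotel> {
    let fields: Vec<&str> = line.split(';').collect();
    // Check the length first so the indexing below cannot panic.
    if fields.len() != 5 {
        return Err(format!("expected 5 fields, got {}: {line}", fields.len()).into());
    }
    let price = fields[4]
        .trim()
        .parse::<f64>()
        .map_err(|e| format!("invalid price {:?}: {e}", fields[4]))?;
    Ok(Hotel {
        name: fields[0].to_string(),
        city: fields[1].to_string(),
        price,
    })
}

// One bad line is reported on standard error and skipped, not fatal.
fn parse_all(input: &str) -> Vec<Hotel> {
    let mut hotels = Vec::new();
    for line in input.lines() {
        match parse_hotel(line) {
            Ok(hotel) => hotels.push(hotel),
            Err(e) => eprintln!("skipping line: {e}"),
        }
    }
    hotels
}

fn main() {
    let input = "Grand;Brussels;2024-02-03;double;80.0\nnot a csv line\n";
    for hotel in parse_all(input) {
        println!("{hotel:?}");
    }
}
```

Note that this still splits on a bare semicolon, so it inherits the quoting and encoding caveats mentioned above.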
But first let's look at the CSV parsing. There's a csv crate. I advise you to use it. It's pretty solid. What you can do is create a builder, and a builder pattern allows you to modify a struct and add members or modify members dynamically. In this case we decide that our CSV file has no headers and the delimiter is a semicolon. The way you can use it is like this. You now say for hotel in hotels.deserialize(). No more string splitting. And now we match on the hotel, because this returns a result, and we need to make sure that the hotel that we parse is in fact correct. After this step we don't have to deal with edge cases anymore because we know that the struct is valid. That means it has the required number of fields and prices are also floats. Which is great, it makes the code much more readable already, and it was very simple to do.

Now I want to quickly talk about this part. There's a cities hash map. It has a string, which is the city name, and then it has three floats, which are the mean, the min and the max price. I don't think this is particularly idiomatic. The way it was used before was something like this, and we kind of managed to work our way around it. But a better way, I would say, would be to introduce a type for this as well. If we're talking about prices, and pricing seems to be something that is very central to what we do in this application, maybe we should have a notion of a price.

It's very simple to do that. You just introduce a Price type. Now you might be confused why we suddenly don't have a mean anymore, but instead we have a sum and a count. The reason is that when we parse the file we update the sum, and later on at the end we can calculate the mean. That has some favorable mathematical properties, because now we don't run into rounding issues anymore. This is an aggregation that we can do whenever we want to get a mean on the fly.
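A minimal sketch of such a Price type, assuming f64 prices and a u32 counter. The field names follow the description above; the add and mean methods are hypothetical helpers introduced here for illustration, and the Default values are one reasonable choice (largest float for min, smallest for max) so that the first real price replaces them.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Price {
    min: f64,
    max: f64,
    sum: f64,
    count: u32,
}

impl Default for Price {
    fn default() -> Self {
        // Any real price is <= f64::MAX and >= f64::MIN, so the first
        // call to `add` overwrites both placeholders.
        Price { min: f64::MAX, max: f64::MIN, sum: 0.0, count: 0 }
    }
}

impl Price {
    // Fold one more observed price into the aggregate.
    fn add(&mut self, price: f64) {
        self.min = f64::min(self.min, price);
        self.max = f64::max(self.max, price);
        self.sum += price;
        self.count += 1;
    }

    // The mean is derived on demand from sum and count;
    // None when nothing has been added yet.
    fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / f64::from(self.count))
        }
    }
}

fn main() {
    let mut cities: HashMap<String, Price> = HashMap::new();
    for (city, price) in [("Brussels", 40.0), ("Brussels", 50.0), ("Brussels", 150.0)] {
        cities.entry(city.to_string()).or_default().add(price);
    }
    let brussels = cities["Brussels"];
    println!("{} {:?} {}", brussels.min, brussels.mean(), brussels.max);
}
```

Storing sum and count instead of a running mean means the division happens once, when the mean is requested.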
And at the same time we have a default. Now the default is not really idiomatic either, I would say. But the great part about it is that we can later reuse it and make our code a little more readable. In this case we set the min price to the maximum float. Then whenever we introduce a new price it will overwrite the maximum, because by definition it's smaller than or equal to the maximum. Same for the max, and sum and count are set to zero to begin with.

Just before we bring it all together, here's one more thing that we should do, which is have a notion of a display for Price. In this case we implement the Display trait and we say, if ever you want to print a price, this is the structure that you should use: the min, the mean and the max. This way we can make our code way more readable. Now you can see that instead of using a tuple of floats here we use a Price. And when we update the prices we can talk about this object. We can tell the object, hey, update your min, for example. Here we say price.min.min with the hotel's price and we automatically get the min price as well. We update those price fields, and we can even introduce a price.add method. I don't show it here, but technically, why not? We can add a new hotel price. Prices could be added over time. That depends on your taste, your flavor of Rust.

This is the entire code. It's a little longer, but you saw all the parts. And now you have something that I would say is in a workable state. It's not great, but we did one thing. We considered Rust. We fought the ignorance. We started to embrace the Rust type system. We started to lean into ownership and borrowing, which are fundamental concepts in Rust. We lean into design patterns and we learn how to improve our architecture. And I would also say, if you want to improve this part, try to learn a different programming paradigm. Rust is not the only language. Try Roc, or try a functional language like Haskell.
It might make you a better Rust programmer too. This is how you fight ignorance. Now if you see that none of these horsemen fit you, by the way, just think of your colleagues and how you would want to introduce them to Rust, because this is the code you have to review and also probably maintain in the future. So it's time well invested. If you want to learn more about idiomatic Rust specifically, there is a website. I just put it there. It's an open source repository. It has some resources. This is a rendered version of it. You can sort by difficulty, so that's your experience, and then you can sort by interactivity, if you want to have a workshop or not. There are free resources on there and paid resources too.

Right, let's go on and look at the next horseman. Excessive abstraction. Everyone in this audience knows someone like that. They try to over-engineer solutions because Rust leans into that. It allows you to do that. It's a nice language to write abstractions in. Everyone likes to do that. But then you add layers of indirection that people don't necessarily understand if they come from a different background. They use traits excessively, and generics, and lifetimes, and all of these concepts are great in isolation. It is the combination that makes the programs hard to read and understand for newcomers. If you find yourself in this camp, try to fight this as well.

Common symptoms are things like this, where you have a FileBuilder which takes a T: AsRef<str> and a lifetime 'a, and this makes sure that you can pass any type and that it has no hidden allocations, because of the lifetimes. So this might be fast, and it might also to some extent be idiomatic, but it is something that your colleagues also have to understand. Other symptoms: "I might use this again." "Let's make it generic." Or traits everywhere. How do you get to that mindset? It's very simple. After you wrote your CSV parser it's natural that you want other parsers too.
Of course you want JSON. Of course you want to read and write into a database. You start thinking that you'll need all of those formats at some point, and "at some point" is the important part. Then you end up with something like this. It's a trait definition for a HotelReader. It has a single method called read, and it takes a self, that's why it's a method, but it also takes a read which implements Read. That means you can pass anything that implements the Read trait, and it returns a Box<dyn Iterator<Item = Result<Hotel>>> with a lifetime of 'a. No allocations except for the Box, the iterator itself is a very idiomatic way to say a result of hotel, so parsing errors are considered, and it's applicable to all of the reader types that you could possibly want.

Let's say you wanted to use that trait and implement it for our hotel reader. Now suddenly we blow up the code into something that is harder to understand, or if it is easy for you to understand, please reconsider whether your abstractions are too much. Maybe you ain't going to need it. Right. So we have a hotel reader and it owns a ReaderBuilder, and inside of our new method we initialize the CSV hotel reader, and we implement HotelReader down here. The single method is called read, and we say self.reader_builder, this is the code that we saw before, we just put it here, this is the initialization of our CSV parser. Then we return reader.into_deserialize for Hotel with a map, and this is where we map the errors.

Right. Does it look great? I don't know, depends. Someone's nodding. We need to talk. But it's certainly nice to use, I guess. Now we can say for hotel in hotels.read(file). Should hotels know about files? Maybe not. But it's great if you go one step further and you implement Iterator on it, and now you can say for hotel in hotels. Alright, we're getting somewhere. From a user's perspective that is really great. But remember we're talking about application code.
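To make the trade-off concrete, here is a sketch of the trait as it is described above, next to a plain function that does the same job. It is not the code from the slides: the talk's implementation wraps the csv crate's ReaderBuilder, while this version swaps in std line splitting so that it is self-contained. The names CsvHotelReader, parse_line and read_hotels, the delimiter field, and the assumed field order are all illustrative.

```rust
use std::error::Error;
use std::io::{BufRead, BufReader, Read};

type Result<T> = std::result::Result<T, Box<dyn Error>>;

#[derive(Debug)]
struct Hotel {
    city: String,
    price: f64,
}

// The abstraction-heavy version: a trait with one generic method that
// returns a boxed iterator tied to the lifetime of the reader.
trait HotelReader {
    fn read<'a, R: Read + 'a>(&self, read: R) -> Box<dyn Iterator<Item = Result<Hotel>> + 'a>;
}

struct CsvHotelReader {
    delimiter: char,
}

impl HotelReader for CsvHotelReader {
    fn read<'a, R: Read + 'a>(&self, read: R) -> Box<dyn Iterator<Item = Result<Hotel>> + 'a> {
        let delimiter = self.delimiter;
        let lines = BufReader::new(read).lines();
        Box::new(lines.map(move |line| -> Result<Hotel> { parse_line(&line?, delimiter) }))
    }
}

// Assumed field order: name, city, date, room type, price.
fn parse_line(line: &str, delimiter: char) -> Result<Hotel> {
    let mut fields = line.split(delimiter);
    let city = fields.nth(1).ok_or("missing city")?;
    let price = fields.nth(2).ok_or("missing price")?.trim().parse::<f64>()?;
    Ok(Hotel { city: city.to_string(), price })
}

// The plain version: one function, no trait, no explicit lifetime.
fn read_hotels(input: impl Read) -> Result<Vec<Hotel>> {
    BufReader::new(input)
        .lines()
        .map(|line| -> Result<Hotel> { parse_line(&line?, ';') })
        .collect()
}

fn main() -> Result<()> {
    let input = "Grand;Brussels;2024-02-03;double;80.0\nInn;Ghent;2024-02-03;single;40.5\n";
    let reader = CsvHotelReader { delimiter: ';' };
    for hotel in reader.read(input.as_bytes()) {
        println!("{:?}", hotel?);
    }
    println!("{:?}", read_hotels(input.as_bytes())?);
    Ok(())
}
```

For a single CSV format in application code, the plain function covers the same ground with far less for a newcomer to decode; the trait only starts to pay for itself once a second implementation actually exists.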
It's probably code that you earn money with. It's not a library function that is used by thousands of people. It's your simple CSV parser, and now we just blew it up into something that is harder to understand. Do you really need this? Well, I don't think so. I don't know what this person on the bull does, but it certainly looks confusing to me, and this is what people think when they see the top signature. I know you wanted to optimize it a bit, but at what cost? Whenever you sit here and you think, oh, I should implement JSON support, and you don't do it for fun, start thinking about whether you really need those abstractions, because they can haunt you. Most of the time you don't have any need for them. I don't know what sort of animal this is. Is it a lion, a cat or something? But it's kind of strapped to a cannon and it doesn't look too happy to me. I don't want this. Probably you're not going to need it.

As a side note, another thing you probably shouldn't do too often is macros. There are crates out there that use macros excessively. What do I mean by macros? macro_rules, but also derive macros, and these are great, but they come at a cost, and the cost could be compile times. Just yesterday I talked to Daniel Kerkman, who, I don't know, is he here? He's not here. But thanks for the tip. He has a situation at work where compile times just blow up because of macros. For you it might be easy to write, but for other people it might be hard to use. Maybe you want to prefer traits over macros if you can.

That was the second horseman. Fighting excessive abstraction: how can it be done? If you find yourself in this situation, keep it simple. Avoid unnecessary complexity. Just think that the person who will maintain the code is not a mass murderer but your best friend. Do you treat friends like this? Watch newcomers use your code. That can be humbling. Ensure that abstractions add value. Yes, you can add a layer of abstraction. Does it add value? That's up to you.
Decide, and don't add abstractions that you might need in the future. Add them when you need them. Right. Two off the list, we have two more to go.

Next one is premature optimization. This is for a lot of people in here because you are C and C++ programmers. I'm looking at you right now because 90% of you raised your hand. I see a lot of people from C and C++ come to Rust with this mindset, with these patterns. What are the patterns? They optimize before it's necessary. This is importantly different from adding too many layers of abstraction. Optimization in this case means profiling is not done, but instead you try to outsmart the compiler and you think about performance optimizations way too early, before you even need them. Did I even tell you how big that CSV file was in the beginning? How many entries does it have? You don't know. Maybe you should not optimize for it right away.

They use complex data structures where simple ones would suffice. For example, we saw the hash map with the three tuple elements. These are things that kind of unravel, and then it ends up being a mess, not very idiomatic and arguably not even faster. And they also have a tendency to neglect benchmarks.

Some red flags. Quotes you might have heard. "Without a lifetime this needs to be cloned." Ignore that. If you know that you have a performance problem, then you can think about lifetimes. It's fine to clone. "Let me help the compiler here." "The Box is so much overhead." "I use BTreeMap because it's faster than HashMap." "No need to measure, I've got years of experience." They love the term zero-cost abstraction, or zero copy. Actually it should be zero cost in here. And they hate allocations. Whenever they look at an allocation they feel terrified, and they bend over backwards to make that program faster. So whether this is the developer or the compiler, and vice versa, is up to you. I've been in both situations.
They turn a completely simple Hotel struct with a couple of string fields, which are owned, yes, they live on the heap, into something that lives on the stack and has a lifetime. And every time you use a Hotel you have to carry the weight of the lifetime. Well, does it matter for this one particular case? Probably not. But then you look at other places in the code base and you see that they kind of reverted your changes. They took what you introduced, your hard-won knowledge about the abstractions, and they took it away. Now we start to index into our data structure again. We use string split again. We've been there before. It is super fragile. We are going backwards.

Now let me play a little game here. Since there are so many C and C++ programmers in here, I expect you to answer this. What is the bottleneck? This is a very famous medieval game, Who Wants to Be a Millionaire. What is the bottleneck? Is it CSV parsing, the deserialization of our entries? Is it string object creation after we deserialized it, when we put it into a Hotel struct? Is it floating point operations when you parse the price? Or is it hash map access? Who's for A? Some shy hands? Don't be shy. Who's for B? Okay. Nice. Who's for C? No one. And who's for D? The hash map. Nice. The correct answer is: you forgot to run with --release.

How do you find the actual performance improvements? There's just one correct answer, and it is measure. Profile. Use the tools. cargo flamegraph. Cool thing. You will see that in a second. Use benchmarks. There's Criterion. Is Nick still in the room? Nikolai? No. His benchmarking tool, Divan. Pretty great. Use it.

Okay. I will give you one example. Let's look at a flame graph of our initial program. The one that a junior developer could write in two hours. What is the bottleneck? There is no bottleneck. This is the setup of our flame graph itself. This is the profiler setup. The code itself is negligible, I guess.
And why is that? Again, because I didn't tell you how big the file was. Do you think I can come up with thousands of alliterations for hotels? No. So I added 100 entries. There is no bottleneck here. Okay. You might say, but okay, what if the file grows? Let's add a million entries. Okay. Oh, this is still 120 records. So let's add more. This is a million. You probably ain't going to need it. Let's increase it to 10 million. And indeed, deserialization of the struct takes most of our time. If we look a little closer, it says serde deserialize, deserialize_struct. We have some memory movement going on.

Let's take a baseline. That is our baseline. This is what it takes: 34 seconds. Now, let's say we want to prove our C and C++ developer wrong. Does this abstraction that we added for the Hotel struct really add that much overhead? No. It's the same. It's like 34 seconds still. Oh, actually, this is the part where we remove the unnecessary fields. But we can go further. Here we have a slightly safer version. We don't index, but we say nth(1). And we have 32 seconds.

Now our bottleneck is appending to a string. I think that's something we can fix. Well, okay, maybe this is not really that readable. But what we do is we split by a string now, and instead of doing an allocation where we append to our string over and over again, we use this pattern matching here. This reduces the runtime by 30% already, because we save on allocations.

Now, if we profile this code again, where's the bottleneck? read_until. What is that about? We have a lot of memory movement going on, and we reach a point where the disk becomes the bottleneck. We can use an mmap for this. Now, remember, we are talking about performance, and maybe you should not do those optimizations, but this proves the C and C++ programmers wrong, and their intuition. You see that the bottleneck might be solved elsewhere.
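As an illustration of the string-splitting step described above, here is a sketch of pulling the city and price out of a line without building intermediate Strings or a Vec. The slide code is not in the transcript, so the function names and the assumed field order (name;city;date;room;price) are illustrative only. It shows both the nth-based access and a pattern-matching variant.

```rust
// Borrow the city straight out of the line: no Vec, no String.
// Iterator::nth(n) consumes items up to and including index n.
fn city_and_price(line: &str) -> Option<(&str, f64)> {
    let mut fields = line.split(';');
    let city = fields.nth(1)?; // skips the name, yields the city
    let price = fields.nth(2)?; // skips date and room type, yields the price
    Some((city, price.trim().parse::<f64>().ok()?))
}

// The same idea with pattern matching on the first five fields.
fn city_and_price_match(line: &str) -> Option<(&str, f64)> {
    let mut fields = line.split(';');
    match (
        fields.next(),
        fields.next(),
        fields.next(),
        fields.next(),
        fields.next(),
    ) {
        (Some(_name), Some(city), Some(_date), Some(_room), Some(price)) => {
            Some((city, price.trim().parse::<f64>().ok()?))
        }
        _ => None,
    }
}

fn main() {
    let line = "Grand;Brussels;2024-02-03;double;80.0";
    println!("{:?}", city_and_price(line));
    println!("{:?}", city_and_price_match(line));
}
```

Whether this is actually faster for a given input is exactly the kind of claim the talk asks you to check with a profiler rather than assume.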
Now we are at 30 seconds by changing like four or five lines of the program, not the entire thing. We can keep using our abstractions. That's the main point. Here we use an mmap. That's a memory map in the kernel. We save on allocations. 30 seconds. Okay. What if we wanted to do more? It's hard to read, but now we reach the point where the hash map is in fact the bottleneck. One more step to improve the code would be to split it up into multiple chunks. You can use rayon. You can now finally use a better hash map. And we are down to 3.5 seconds. And we did that not by guessing, but by profiling. Now if we run a profile, it looks different again. Very different. These are the individual chunks that we managed to split up. We went from 40 seconds to three or four seconds in a couple of slides and with few changes.

And the point is: don't guess, measure. This is the worst part that C developers bring into Rust. They think everything is a performance overhead. And if this challenge, by the way, looked very similar to the One Billion Row Challenge, that is because it was inspired by it. Read up on it. It's kind of fun. We did something similar for hotel data.

But the more important point here is: how can we fight premature optimization? Measure, don't guess. Focus on algorithms and data structures, not micro-optimizations. More often than not, if you change from a vector to a hash map, this will be way, way more efficient than if you remove your little struct and add lifetimes everywhere. You can get carried away pretty quickly, and Rust encourages you to do so, but it also has the tooling to fight it. Be more pragmatic. Focus on readability and maintainability first and foremost. Use profiling tools to make informed decisions.

You covered all of that. Your code is idiomatic. It is fast. You didn't overdo it. What is missing? Well, the entire rest. Do you have tests? Do you have documentation?
Is your API too large? Does your code lack modularity and encapsulation? These are things that I see from people who are the lone wolf coders. They know all about Rust, but what they are not really good at is the rest. Explaining the differences to the maintainers of their code. And writing documentation. Not about the how, but the what. What does your program do? Some things they say: "It compiles. My work is done here." "The code is documentation." "Let's just make it all pub." "I'll refactor that later," which never happens.

Let's look at that code again. This is our first version. Junior programmer, three hours. Okay. How do we test that? It's kind of impossible because this is one big binary, one main. How would we test that? Well, I guess the question is, what do we want to test? First off, I would say let's add a test for parsing. The entire thing can be a very simple, crude test. If we refactor it such that we have a function that parses cities, we can start to introduce a path here and do the parsing. This is where the parsing logic is, by the way. We split it up into a main and parse_cities. Great. This is our first test. Very crude, but we get to a point where suddenly we can test our changes. We create a temporary directory. We have a path, and then we write into a file, and that's it. The parsing is done. Great.

If we wanted to make it a little better, instead of passing in a path, we pass in something that is impl Read. Now we don't need to create files like here. Instead, we can have our input as a binary blob. These are simple things. Add some documentation, add some tests. It's not that hard. And in order to fight omission, what you need to do is write more documentation, write unit tests, use tools like Clippy and cargo-udeps, set up CI/CD so that you can handle your changes, create releases, use release-plz (Marco, greetings go out to you), and keep a changelog of what you changed. Right.
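A sketch of the refactoring described above: the parsing logic moves out of main into a parse_cities function that accepts anything implementing Read, so a test can feed it bytes from memory instead of a temporary file. The return type (a min/max pair per city), the assumed field order, and the decision to silently skip bad lines are simplifications made for this example, not details taken from the slides.

```rust
use std::collections::HashMap;
use std::io::{BufRead, BufReader, Read};

/// Parses "name;city;date;room;price" lines (assumed field order) and
/// returns the (min, max) price per city. Malformed lines are skipped.
fn parse_cities(input: impl Read) -> HashMap<String, (f64, f64)> {
    let mut cities = HashMap::new();
    // Stop at the first I/O error instead of panicking.
    for line in BufReader::new(input).lines().map_while(|l| l.ok()) {
        let mut fields = line.split(';');
        let (Some(city), Some(price)) = (fields.nth(1), fields.nth(2)) else {
            continue;
        };
        let Ok(price) = price.trim().parse::<f64>() else {
            continue;
        };
        let entry = cities.entry(city.to_string()).or_insert((price, price));
        entry.0 = f64::min(entry.0, price);
        entry.1 = f64::max(entry.1, price);
    }
    cities
}

// No temporary directory needed: a byte slice implements Read.
#[test]
fn parses_cities_from_in_memory_bytes() {
    let input = "A;Brussels;2024-02-03;double;40\nB;Brussels;2024-02-03;single;150\n";
    let cities = parse_cities(input.as_bytes());
    assert_eq!(cities["Brussels"], (40.0, 150.0));
}

fn main() {
    // In the real program this would be a std::fs::File; the in-memory
    // input here just keeps the example self-contained.
    let cities = parse_cities("A;Brussels;2024-02-03;double;80\n".as_bytes());
    println!("{cities:?}");
}
```

Because a File also implements Read, main can keep passing a file while the tests pass a string.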
We're getting towards the end. We have seen the anti-patterns. You know them now. I hope that you will be able to, you know, see them in your code. If you want to learn more, there are some other talks that were given here at FOSDEM and other places. You might want to check them out. Maybe I can put the slides somewhere. And that is all I have to say. Thank you.