Let me do a quick survey. Who has a JavaScript background? Okay, maybe like 10%. Who has
a C background? C++? Holy hell. It's like 80% for the people on stream. Who has a Python
background? What are you, polyglots? What's going on? 70% or so. Any other languages?
Just scream out. I heard something like, it was something like, oh, but I can't really
remember. Does anyone own this book? I found this book in my attic and it was kind of peculiar
because it had some arcane incantations in it and it looked like magic, but it certainly
had something to do with Rust. And I was really excited. I was really enticed by this book.
This is why I want to talk about that book. It was pretty old. There was one section in
there which I really liked and it was called the Four Horsemen of Bad Rust Code. This is
what this talk is about. Before we get into what the Four Horsemen are, I would like to
introduce myself. I'm Matthias. I live in Düsseldorf in Germany. I've been doing Rust
since around 2015. I do Rust for a living as a consultant. I did a Rust YouTube channel
a long, long time ago called Hello Rust. Only 10 episodes, but well, what can you do? And
lately I started a podcast called Rust in Production. If you like what I say in this
talk, maybe you also want to subscribe to the podcast later on. That's it for the advertisement,
going back to the Four Horsemen. I thought about this title a lot. Why would you talk
about Bad Rust Code? I think from my experience as a Rust consultant, I see patterns evolving
over time. I see people doing the same things in Rust that they do in other languages. They
repeat the same mistakes and I saw that no one really talked about those problems. That
is an issue when you come from a different language and you try to learn the rustic way,
the idiomatic way to write Rust code. This is what this talk is about. Let me present to you
the antagonists. While I do that, try to picture yourself. Imagine who you are and what you think
your role would be in this talk. The first horseman is this. Actually, let me show all
of them. And the first one is ignorance. What is ignorance? Magical little term. We will
get to that in the next slide. And we have excessive abstraction, premature optimization,
and omission. Of course, you could add your own personal Rust horseman. And these are just
very subjective, but these are the things that I see in the real world. Now that we
introduced the antagonists, let's go through their anti-patterns and what they are famous
for one by one, starting with ignorance. The horseman that is behind this
pattern is someone that uses stringly typed APIs. You have seen it before. Someone uses a string
where they could have used an enum or they don't really embrace pattern matching. And
that makes APIs brittle. You are in a situation where if you refactor something, you might
run risk of forgetting that you changed something or maybe you make a typo and then your string
is incorrect. And so it doesn't represent what you want to represent. They also freely
mutate variables. They go and say, yeah, this is state and I can change it. Rust has the
mut keyword for this, but they do that liberally across the entire code base, which makes reasoning
on a local scope very, very hard. They also use bad or no error handling. We will get
to that in a second. They use unwraps a lot and they don't really think about the error
conditions of your application. They also have a lack of architecture in their applications.
And they use a general prototype style of writing Rust code. And where do they come
from? Usually those are people that were administrators before or they write shell scripts or they
come from other languages like scripting languages. And this is what they know. Nothing wrong
with that, but they haven't fully embraced what Rust has to offer. How do you discover
that you belong to this group in the code? Well, if you do things like this, you have
highly imperative code. You go through the code and then you tell the program, hey, do
this, do that, do this, do that, instead of using, for example, a declarative way of describing
what the state should be. They also use magic return values like minus one or an empty string
to represent a certain special value instead of using errors. Everything is a string. Unwrap
is used freely. You clone all the things and you use the mut keyword. Why is cloning a bad
thing? I don't think it is. But the problem with clone is that you maybe don't buy into
the Rust model of ownership and borrowing. And that means that you bring what you learned
from the past from other languages to Rust and at some point you run into issues with
your architecture which you cannot easily resolve anymore. And this is why clone is kind
of a stop sign. It's not a warning sign, but it should make you think for a moment.
It's an indicator of structural problems in your code, if you like.
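To make two of these symptoms concrete, here is a minimal sketch of replacing a string with an enum and a magic return value with an Option. The RoomType variants and the cheapest function are assumptions made up for illustration; they are not taken from the slides.

```rust
use std::str::FromStr;

// Hypothetical room categories; the variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RoomType {
    Single,
    Double,
    Suite,
}

impl FromStr for RoomType {
    type Err = String;

    // Turn free-form text into a closed set of variants once,
    // at the boundary of the program.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_lowercase().as_str() {
            "single" => Ok(RoomType::Single),
            "double" => Ok(RoomType::Double),
            "suite" => Ok(RoomType::Suite),
            other => Err(format!("unknown room type: {other}")),
        }
    }
}

// Instead of returning -1.0 for "no data", say so in the type.
fn cheapest(prices: &[f64]) -> Option<f64> {
    prices.iter().copied().reduce(f64::min)
}

fn main() -> Result<(), String> {
    let room: RoomType = "Double".parse()?;
    // The compiler checks that every variant is handled here.
    let beds = match room {
        RoomType::Single => 1,
        RoomType::Double | RoomType::Suite => 2,
    };
    println!("{room:?} sleeps {beds}");
    println!("{:?}", cheapest(&[80.0, 40.0, 150.0]));
    println!("{:?}", cheapest(&[]));
    Ok(())
}
```

A typo such as "duoble" now fails at the parsing boundary instead of silently travelling through the program as a string.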
Okay. With that out of the way, let's make it a little more practical. How could we maybe
put this into practice and improve our code step by step? Imagine you wanted to calculate
prices for different cities for a bunch of hotels that you have in these cities. For
example, imagine this was a map. This is an actual map, by the way. Africa does not look
like this. And also, Jerusalem is not the center of the world. I mean, we can debate
about that, but certainly geographically there are some issues with this map. Imagine your
input looked something like this. It's a CSV file. You get a hotel name, a city, a date,
a room type, and a price. And you go through this file line by line and you try to parse
it into something that looks like that. For Brussels, you have a minimum hotel price of
40 bucks, a mean price of 80, and a maximum price of 150. Fun fact, I arrived yesterday
not having a hotel room because I thought I booked a hotel, but it was last year. So
I was in the upper range here. Thanks, Walshbeng, by the way, for sharing your room with me.
Otherwise, it would have been a nightmare. If you wanted to parse the input file and
create a result like this, all you have to do is write this code. That's the entire code.
Nothing really big going on here. There are some peculiarities, but this is usually what
someone would write who would say Rust is not their first language. Maybe they just
try to port whatever they had in another language to Rust. This is code that I see them doing.
What you do is you read the CSV file, then you create a hash map of cities, then you iterate
over each hotel, you try to parse the data by splitting each line, you extract fields
from it, you parse the price, and then you update the city. Updating the city happens
somewhere in the lower end. At the end of it, you print the mean, the max, and the minimum.
That's it. That's the entire code. You know, it's working. Technically, you could run this
code and it will produce the result that you expect. Prices for different cities, we're
done, right? Unless we think about the bigger picture and the demons and the monsters that
are out there out in the ocean, and they can haunt us and bite us. There's dangerous beasts
out there, killer animals. I think what you want to do is improve that code a little bit.
How can we make this code a little more idiomatic? This is the same code. Now, let's look at
some parts that I personally wouldn't want to have. Consider this block. There's some
things going on, but overall, it's a very manual way, a very imperative way of going
through the list of hotels. We literally have a couple if conditions here. If price is smaller
than city data zero and so on, we update the price, yada, yada, yada. There are patterns
that make that a little nicer to read in Rust. This is the same code. It's just something
very similar, but we kind of manage to shrink it down a little bit. In comparison to what
we had before, we get city data and then we use some sort of tuple extraction to get the
mean, the minimum and the max. That makes things a little easier. We can suddenly talk
about mean instead of city data zero, for example. That's not the major problem with
this code. There are unwraps in here too. Well, for a first prototype, that might work fine,
but later on, maybe you don't want to have that. What if you cannot open the hotel's
CSV file? What if you cannot parse a price? In this case, the entire program just stops.
A question of design, but I would say if there's a single line that is invalid, you probably
don't want to stop the execution right away. Another problem is that we index into the
memory right away. Who tells us that a line has that many entries, five entries? It might
have three. It might have zero. Who knows? But if we index into something that doesn't
exist, the program will panic and that is kind of a bad thing. The underscores mean
that the variables are not used, so we can remove them. We have a little bit of a cleaner
structure and a simple way to check that a line is valid would be to just have this manual
check in there. I know it's not very sophisticated, but it helps us along the way. Now we check
if the hotel data length is five and if it is not, we just skip the entry. Let's look
at parsing for a second. How do we want to handle parsing? I said that maybe we don't
want to stop the execution when we run into an issue and we can do that in Rust by matching
on the parse result. A very simple way to do that would be to say match price dot parse
and if we have an okay value, we take it and if we have an error, we don't really care
about the error. We just print an error on standard error and then we continue with the
rest of the parsing. Looking at the input, one thing we can do as well is apply a similar
pattern and introduce a result type. Now we use a box for representing a result type.
This is because you don't need anything, any external library to have a result type that
has an error type which can be literally anything. So it can be a string, anything that implements
Error, the Error trait. In this case, it's a very simple way to improve your Rust code.
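The checks described so far (a field-count check, matching on the parse result, and a boxed error for file access) could be sketched roughly like this. The file name hotels.csv, the function names, and the (min, max) tuple are assumptions for illustration; the slide code may differ.

```rust
use std::collections::HashMap;
use std::error::Error;
use std::fs;

// Parse semicolon-separated lines into (min, max) prices per city.
// Malformed lines are reported on stderr and skipped, not fatal.
fn min_max_by_city(input: &str) -> HashMap<String, (f64, f64)> {
    let mut cities: HashMap<String, (f64, f64)> = HashMap::new();
    for line in input.lines() {
        let fields: Vec<&str> = line.split(';').collect();
        // Manual validity check: never index into fields that may not exist.
        if fields.len() != 5 {
            eprintln!("skipping malformed line: {line}");
            continue;
        }
        // Match on the parse result instead of calling unwrap().
        let price = match fields[4].trim().parse::<f64>() {
            Ok(price) => price,
            Err(e) => {
                eprintln!("skipping invalid price {:?}: {e}", fields[4]);
                continue;
            }
        };
        let (min, max) = cities
            .entry(fields[1].to_string())
            .or_insert((f64::MAX, f64::MIN));
        *min = min.min(price);
        *max = max.max(price);
    }
    cities
}

// Boxed error: anything implementing the Error trait (or a String) fits,
// and no external library is needed.
fn load(path: &str) -> Result<String, Box<dyn Error>> {
    let text = fs::read_to_string(path)
        .map_err(|e| format!("cannot read {path}: {e}"))?;
    Ok(text)
}

fn main() {
    // "hotels.csv" is an assumed file name.
    match load("hotels.csv") {
        Ok(input) => println!("{:?}", min_max_by_city(&input)),
        Err(e) => eprintln!("{e}"),
    }
}
```

The error message now names the file that could not be read, which is something a user can act on.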
It's a good first step. What we do instead now is we say read to string and then we map
the error in case we have an error to something that a user could understand and act on. Then
yeah, the code is already a little cleaner. We handled a few error cases already and this
is something that might pass a first iteration of a review cycle. Now of course there are
certain other issues with this code. For example, CSV handling. CSV is tricky. Proper handling
of delimiters is very hard. For example, you might have an entry which has semicolons like
on the left side here or you have something that has quotes around a semicolon and you
probably want to handle that. So a simple string split does not suffice. Same with encodings.
What platform are we operating on? Do we know the encoding right away? Does the CSV
file contain headers or no headers? And there's many, many caveats like that. If you're
interested, there's a talk called stop using CSV. I don't say you should stop using CSV,
but I say you should start watching this talk because it's really good. Right. How can
we introduce types? I talked about types a lot and Rust is great with types. We should
use more of them. Here's a simple way. I already talked about the result type and in the first
line we just create an alias for our result and we say it's anything that has a T where
T is generic and the error type is of type Box<dyn std::error::Error>. And then we can use the
result in our code to make it a little easier to read. As well, we introduce a hotel struct
and we have a couple fields, just strings and floating points at this point. But this
helps us make the code a little more idiomatic already. We will combine those things on the
next slides. But first let's look at the CSV parsing. There's a csv crate. I advise you
to use it. It's pretty solid. And what you can do is you create a builder and a builder
pattern allows you to modify a struct and add members or modify members dynamically.
And in this case we decide that our CSV file has no headers and the delimiter is a
semicolon. And the way you can use it is like this. You now say for hotel in hotels.deserialize().
No more string splitting. And now we match on the hotel because this returns a result.
And now we need to make sure that the hotel that we parse is in fact correct. And after
the step we don't have to deal with edge cases anymore because we know that the struct is
valid. That means it has the required number of fields and prices are also floats. Which
is great; it makes the code much more readable already. And it was very simple to do so.
Now I want to quickly talk about this part. There's a cities hash map. It has a string
which is the city name. And then it has three floats which are the mean, the min and the
max price. I don't think this is particularly idiomatic. The way it was used before was
something like this. And we kind of managed to work our way around it. But a better way
I would say would be to introduce a type for this as well. Because if we're talking about
prices and pricing seems to be something that is very central to what we do in this application
maybe we should have a notion of a price. It's very simple to do that. You just introduce
a price type. Now you might be confused why we suddenly don't have a mean anymore. But
instead we have a sum and a count. And the reason being that when we parse the files
we update the sum and later on at the end we can calculate the mean. Which has some mathematical
properties which are favorable because now we don't really have, we don't run into rounding
issues anymore. This is an aggregation that we can do whenever we want to get kind of
a mean on the fly. And at the same time we have a default. Now the default is not really
idiomatic too I would say. But the great part about it is that we can later reuse it and
make our code a little more readable. In this case we set the min price to the maximum float.
But then whenever we introduce a new price it will overwrite the maximum because I guess
by definition it's smaller than the maximum or smaller or equal. And same for the max
and sum and count are kind of set to zero to begin with. And just before we bring it
all together here's one more thing that we should do which is have a notion of a display
for price. In this case we implement the Display trait and we say yeah if ever you want to
print a price this is the structure that you should use. The min, the mean and the max.
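Put together, the price type described here might look like the following sketch. The field names follow the talk (min, max, sum, count); the add and mean methods, the Option return type, and the exact output format are assumptions.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, Clone, Copy)]
struct Price {
    min: f64,
    max: f64,
    sum: f64,
    count: u32,
}

impl Default for Price {
    // Start min at the largest float and max at the smallest, so the
    // first real price always replaces both.
    fn default() -> Self {
        Price { min: f64::MAX, max: f64::MIN, sum: 0.0, count: 0 }
    }
}

impl Price {
    fn add(&mut self, price: f64) {
        self.min = self.min.min(price);
        self.max = self.max.max(price);
        self.sum += price;
        self.count += 1;
    }

    // The mean is derived on demand from sum and count.
    fn mean(&self) -> Option<f64> {
        (self.count > 0).then(|| self.sum / f64::from(self.count))
    }
}

impl fmt::Display for Price {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.mean() {
            Some(mean) => write!(
                f,
                "min: {:.2}, mean: {:.2}, max: {:.2}",
                self.min, mean, self.max
            ),
            None => write!(f, "no prices recorded"),
        }
    }
}

fn main() {
    let mut cities: HashMap<String, Price> = HashMap::new();
    for (city, price) in [("Brussels", 40.0), ("Brussels", 50.0), ("Brussels", 150.0)] {
        cities.entry(city.to_string()).or_default().add(price);
    }
    // prints "Brussels: min: 40.00, mean: 80.00, max: 150.00"
    println!("Brussels: {}", cities["Brussels"]);
}
```

Because Price implements Default, the hash map update shrinks to a single entry(...).or_default().add(price) call.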
And then this way we can make our code way more readable. Now you can see that instead
of using a tuple of floats here we use a price. And when we update the prices we can talk
about this object. We can tell the object hey update your min for example. Here we say
price.min.min(hotel.price) and we automatically get the min price as well. We update those
price fields and yeah we can even introduce a price.add method. I don't show it here but
technically why not. We can add a new hotel price. Prices could be added over time.
Now that depends on I guess your taste, your flavor of rust. This is the entire code. It's
a little longer but you saw all the parts. And now you have something that I would say
is in a workable state. It's not great but we did one thing. We considered Rust. We fought
the ignorance. We started to embrace the rust type system. We started to lean into ownership
and borrowing which are fundamental concepts in rust. We lean into design patterns and we
learn how to improve our architecture. And I would also say if you want to improve this
part try to learn a different programming paradigm. Rust is not the only language. Try
Roc or try a functional language like Haskell. It might make you a better rust programmer
too. This is how you fight ignorance. Now if you see that none of these horsemen fit
to you by the way just think of your colleagues how you would want to introduce them to rust
because this is the code you have to review and also probably maintain in the future. So
it's time well invested. If you want to learn more about idiomatic rust specifically there
is a website. I just put it there. It's an open source repository. It has some resources.
This is a rendered version of it. You can sort by difficulty so that's your experience
and then you can sort by interactivity if you want to have a workshop or not. For example
there are free resources on there and paid resources too. Right let's go on and look
at the next horseman. Excessive abstraction. Everyone in this audience knows someone like
that. They try to over engineer solutions because rust leans into that. It allows you
to do that. It's a nice language to write abstractions. Everyone likes to do that. But
then you add layers of indirection that maybe people don't necessarily understand if they
come from a different background. They use traits excessively and generics and lifetimes
and all of these concepts are great in isolation. The combination of which makes the programs
hard to read and understand for newcomers. Now if you find yourself in this camp try
to fight this as well. Common symptoms of this are things like this where you have a
file builder which takes a T: AsRef<str> and a lifetime 'a and this makes sure that
you can pass any type and that it has no allocations that are not visible because of the lifetimes.
So this might be fast and it might also to some extent be idiomatic but it is something
that your colleagues also have to understand. Another thing is I might use this again. Let's
make it generic or traits everywhere. And how do you get to that mindset? It's very
simple. After you wrote your CSV parser it's natural that you want other parsers too. Of
course you want JSON. Of course you want to read and write into a database. You
start thinking that you'll need all of those formats at some point and this is the part
that is important at some point. And then you end up with something like this. It's
a trait definition for a hotel reader and it has a single method called read and it
takes a self, that's why it's a method, but it also takes a reader which implements the
Read trait. That means you can pass anything that implements the Read trait and it returns a
Box<dyn Iterator<Item = Result<Hotel>> + 'a>. No allocations except
for the box but the iterator itself is a very idiomatic way to say a result of hotel so
parsing errors are considered and it's very applicable for all of the reader types that
you could possibly want. Let's say you wanted to use that trait and implement it for our
hotel reader. Now suddenly we blow up the code to something that is harder to understand
or if it is easy for you to understand please reconsider if your abstractions are too much.
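For reference, a trait of the shape described above might look like this sketch. The talk's implementation wraps the csv crate; to keep the example free of external dependencies, a plain line splitter (SemicolonHotelReader, a made-up name) stands in for it here, and the Hotel fields are reduced to three.

```rust
use std::error::Error;
use std::io::{BufRead, BufReader, Read};

type Result<T> = std::result::Result<T, Box<dyn Error>>;

#[derive(Debug, PartialEq)]
struct Hotel {
    name: String,
    city: String,
    price: f64,
}

// One method, generic over any reader, returning a boxed iterator of results.
trait HotelReader {
    fn read<'a, R: Read + 'a>(&self, reader: R) -> Box<dyn Iterator<Item = Result<Hotel>> + 'a>;
}

// Stand-in implementation: naive splitting instead of a real CSV parser.
struct SemicolonHotelReader;

impl HotelReader for SemicolonHotelReader {
    fn read<'a, R: Read + 'a>(&self, reader: R) -> Box<dyn Iterator<Item = Result<Hotel>> + 'a> {
        let lines = BufReader::new(reader).lines();
        Box::new(lines.map(|line| -> Result<Hotel> {
            let line = line?;
            let fields: Vec<&str> = line.split(';').collect();
            match fields.as_slice() {
                [name, city, _date, _room, price] => Ok(Hotel {
                    name: name.to_string(),
                    city: city.to_string(),
                    price: price.trim().parse()?,
                }),
                _ => Err(format!("expected 5 fields, got {}", fields.len()).into()),
            }
        }))
    }
}

fn main() {
    let input = "Ritz;Brussels;2024-02-03;double;150.0\nnot a hotel line";
    for hotel in SemicolonHotelReader.read(input.as_bytes()) {
        match hotel {
            Ok(hotel) => println!("{hotel:?}"),
            Err(e) => eprintln!("skipping: {e}"),
        }
    }
}
```

It works, and the call site is pleasant, but the signature alone already asks the reader to understand generics, trait objects, and lifetimes for what is still one CSV file.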
Maybe you ain't going to need it. Right. So we have a hotel reader and it owns a reader
builder and inside of our new method we initialize the CSV hotel reader and we implement hotel
reader down here. The single method called read and we say self.reader builder this is
the code that we saw before we just put it here this is our CSV parser the initialization
of it and then we return reader.into_deserialize::<Hotel>().map(...) and this is where we
map the errors. Right. Does it look great? I don't know, it depends. Someone's nodding.
We need to talk. But it's certainly nice to use I guess. Now we can say for hotel in
hotels.read(file). Should hotels know about files? Maybe not. But it's great if you go one step further
and you implement iterator on it and now you can say for hotel in hotels. Alright we're
getting somewhere from a user's perspective that is really great. But remember we're talking
about application code. This is probably code that you earn money with. It's not a
library function that is used by thousands of people. It's your simple CSV parser and
now we just blew it up into something that is harder to understand. Do you really need
this? Well I don't think so. I don't know what this person on the bull does but it certainly
looks confusing to me and this is what people think when they see the type signature. I know
kind of you wanted to optimize it a bit but at what cost? Right whenever you sit here
and you think oh I should implement JSON support and you don't do it for fun, start thinking
if you really need those abstractions because they can haunt you. Most of the time you
have no need for them. I don't know what sort of animal this is. Is it a lion cat or
something but it's kind of strapped to a cannon and it doesn't look too happy to me.
I don't want this. Probably you're not going to need it. As a side note another thing probably
you shouldn't do too often are macros. There are crates out there that excessively use
macros. What do I mean by macros? macro_rules but also derive macros, and these are great
but they come at a cost and the cost could be compile times. Just yesterday I talked
to Daniel Kerkman who I don't know is he here? He's not here. But thanks for the tip.
He has a situation at work where compile times just blow up because of macros and for you
it might be easy to write but for other people it might be hard to use. Maybe you want to
prefer traits over macros if you can. That was the second horseman fighting excessive
abstraction. How can it be done? If you find yourself in this situation keep it simple.
Avoid unnecessary complexity. Just think that the person that will maintain the code is
not a mass murderer but your best friend. Do you treat friends like this? Watch newcomers
use your code. That can be humbling. Ensure that abstractions add value. Yes you can add
a layer of abstraction. Does it add value? That's up to you. Decide and don't add indirections
that you might need in the future. Add them when you need them. Right. Two off the list
we have two more to go. Next one is premature optimization. This is for a lot of people
in here because you are C and C++ programmers. I'm looking at you right now because 90%
of you raised your hand. I see a lot of people from C and C++ come to Rust with this mindset
with these patterns. What are the patterns? They optimize before it's necessary. This
is importantly different from adding too many layers of abstraction. Optimization in this
case means profiling is not done but instead you kind of try to outsmart the compiler and
you think about performance optimizations way too early before you even need it. Did
I even tell you how big that CSV file was in the beginning? How many entries does it
have? You don't know. Maybe you should not optimize for it right away. They use complex
data structures where simple ones would suffice. For example we saw the hash map with the three
tuple elements. These are things that kind of unravel, and then it ends up being
a mess not very idiomatic and arguably not even faster. And they also have a tendency
to neglect benchmarks. Some red flags. Quotes you might have heard. Without a lifetime this
needs to be cloned. Ignore that. If you know that you have a performance problem then you
can think about lifetimes. It's fine to clone. Let me help the compiler here. The box is so
much overhead. I use BTreeMap because it's faster than HashMap. No need to measure, I've got
years of experience. They love the term zero cost abstraction or zero copy. Actually it
should be zero cost in here. And they hate allocations. Whenever they look at an allocation
they feel terrified and they bend over backwards to make that program faster. So whether this
is the developer or the compiler and vice versa is up to you. I've been in both situations.
They turn a completely simple hotel struct with a couple string fields which are owned,
yes they live on the heap, into something that lives on the stack and has a lifetime. And
every time you use a hotel you have to carry the weight of the lifetime. Well does it
matter for this one particular case? Probably not. But then you look at other places of
the code base and you see that they kind of reverted your changes. They took what you
introduced, your hard-won knowledge about the abstractions, and they threw it away. Now we
start to index into our data structure again. We use string split again. We go backwards.
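The difference between the owned struct and its lifetime-carrying replacement can be sketched as follows. HotelRef, Listing, and the fields helper are names made up for this illustration.

```rust
// Owned fields: each String is a heap allocation, but the struct is
// self-contained and can be stored or returned freely.
#[derive(Debug, PartialEq)]
struct Hotel {
    name: String,
    city: String,
    price: f64,
}

// Borrowed fields: no String allocations, but the struct cannot outlive
// the input buffer, and every type that holds one must name 'a as well.
#[derive(Debug, PartialEq)]
struct HotelRef<'a> {
    name: &'a str,
    city: &'a str,
    price: f64,
}

// Hypothetical helper shared by both constructors.
fn fields(line: &str) -> Option<(&str, &str, f64)> {
    let mut parts = line.split(';');
    let name = parts.next()?;
    let city = parts.next()?;
    // Two fields are already consumed, so nth(2) is the fifth field.
    let price = parts.nth(2)?.trim().parse::<f64>().ok()?;
    Some((name, city, price))
}

impl Hotel {
    fn parse(line: &str) -> Option<Hotel> {
        let (name, city, price) = fields(line)?;
        Some(Hotel { name: name.to_string(), city: city.to_string(), price })
    }
}

impl<'a> HotelRef<'a> {
    fn parse(line: &'a str) -> Option<HotelRef<'a>> {
        let (name, city, price) = fields(line)?;
        Some(HotelRef { name, city, price })
    }
}

// The lifetime spreads: a container of borrowed hotels needs it too.
struct Listing<'a> {
    hotels: Vec<HotelRef<'a>>,
}

fn main() {
    let line = String::from("Ritz;Brussels;2024-02-03;double;150.0");
    let owned = Hotel::parse(&line);
    let listing = Listing { hotels: HotelRef::parse(&line).into_iter().collect() };
    println!("{owned:?} / {:?}", listing.hotels);
}
```

Whether the saved allocations matter is exactly the kind of question that only a measurement can answer.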
We've been there before. It is super fragile. Again we are going backwards. Now let me play
a little game here. Since there are so many C and C++ programmers in here I expect you to
answer this. What is the bottleneck? This is a very famous medieval game, Who Wants to
Be a Millionaire. What is the bottleneck? Is it CSV parsing? The deserialization of our
entries. Is it string object creation after we deserialized it? We put it into a hotel
struct. Is that the bottleneck? Is it floating point operations when you parse the price?
Or is it hash map access? Who's for A? Some shy hands? Don't be shy. Who's for B? Okay.
Nice. Who's for C? No one. And who's for D? The hash map. Nice. The correct answer is
you forgot to run with --release. How do you find the actual performance improvements?
There's just one correct answer and it is measure. Profile. Use the tools. cargo flamegraph.
Cool thing. You will see that in a second. Use benchmarks. There's Criterion. Is
Nick still in the room? Nikolai? No. His benchmarking tool, Divan. Pretty great. Use it. Okay. I
will give you one example. Let's look at a flame graph of our initial program. The one that a
junior developer could write in two hours. What is the bottleneck? There is no bottleneck. This
is the setup of our flame graph itself. This is the profiler setup. The code itself is
negligible. Negligible, I guess. And why is that? Again, because I didn't tell you how big the
file was, do you think I can come up with thousands of alliterations for hotels? No. So I added
100 entries. There is no bottleneck here. Okay. You might say, but okay. What if the file grows?
Let's add a million entries. Okay. Oh, this is still 120 records. So let's add more. This is a
million. You probably ain't going to need it. Let's increase it to 10 million. And indeed, deserialization
of the struct takes most of our time. Okay. If we look a little closer, it says,
serde deserialize deserialize struct. Okay. We have some memory movement going on. Let's take a baseline.
That is our baseline. This is what it takes. 34 seconds. Okay. Now, let's say we kind of want to prove our
C and C++ developer wrong. Does this other abstraction that we added for the hotel struct really add that much
overhead? No. It's the same. It's like 34 seconds still. Oh, actually, this is the part where we remove the
unnecessary fields. But we can go further. We can say, yeah. Here we have a little safer version. We don't index,
but we say nth(1). And we have 32 seconds. Now, our bottleneck is append string. String appending. Okay. I think there's
something that we can fix. Well, okay. Maybe this is not really that readable. But what we do is we split now by a
string. And instead of doing an allocation where we append to our string over and over again, we use this pattern
matching here. And this reduces the runtime by 30% already because we save on allocations. Now, if we try to profile this
code again, where's the bottleneck now? read_until. Okay. What is that about? We have a lot of memory movement going on.
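The index-free, allocation-free field access mentioned a moment ago could look something like this. It is a sketch under the assumption that city is the second field and price the fifth; the function name is made up.

```rust
// Extract city (field 1) and price (field 4) without collecting the split
// into a Vec and without indexing.
fn city_and_price(line: &str) -> Option<(&str, f64)> {
    let mut fields = line.split(';');
    let city = fields.nth(1)?;
    // nth() consumes the items it skips, so nth(2) here is original field 4.
    let price = fields.nth(2)?.trim().parse::<f64>().ok()?;
    Some((city, price))
}

fn main() {
    println!("{:?}", city_and_price("Ritz;Brussels;2024-02-03;double;150.0"));
    println!("{:?}", city_and_price("too;short"));
}
```

A short or malformed line yields None instead of a panic, and the city is a slice of the input line, not a new String.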
And now we reach a point where the disk becomes the bottleneck. We can use an mmap for this. Now, remember, we are talking about
performance and maybe you should not do those optimizations, but you can prove C and C++ programmers and their intuition wrong. And then you see that
the bottleneck might be solved elsewhere. Now we are at 30 seconds by changing like four or five lines from the entire program, not the
entire thing. We can keep using our abstractions. That's the main point. Here we use an mmap. That's a memory map in the kernel. We save on allocations.
30 seconds. Okay. What if we wanted to do more? It's hard to read, but now we reach the point where in fact the hash map is the bottleneck. And one more step to improve the
code would be to split it up into multiple chunks. You can use rayon. You can now finally use a better hash map like aHash. And we are down to 3.5 seconds.
And we did that not by guessing, but by profiling. Now if we want to run a profile, it looks different again. Very different. These are the individual chunks that we managed to split up.
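The chunking idea can be sketched without any dependencies. The talk uses rayon and a faster hash map; this version swaps in std::thread::scope and the standard HashMap as stand-ins, and the Stats type and function names are assumptions.

```rust
use std::collections::HashMap;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

impl Stats {
    fn new(price: f64) -> Self {
        Stats { min: price, max: price, sum: price, count: 1 }
    }

    fn merge(&mut self, other: Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
}

// Sequential aggregation of one chunk; malformed lines are skipped.
fn aggregate(lines: &[&str]) -> HashMap<String, Stats> {
    let mut cities: HashMap<String, Stats> = HashMap::new();
    for line in lines {
        let mut fields = line.split(';');
        let (Some(city), Some(price)) = (fields.nth(1), fields.nth(2)) else { continue };
        let Ok(price) = price.trim().parse::<f64>() else { continue };
        cities
            .entry(city.to_string())
            .and_modify(|s| s.merge(Stats::new(price)))
            .or_insert_with(|| Stats::new(price));
    }
    cities
}

// Split the input into chunks, aggregate each on its own thread, then merge.
fn aggregate_parallel(input: &str, workers: usize) -> HashMap<String, Stats> {
    let lines: Vec<&str> = input.lines().collect();
    let workers = workers.max(1);
    let chunk_size = ((lines.len() + workers - 1) / workers).max(1);

    let partials: Vec<HashMap<String, Stats>> = thread::scope(|scope| {
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || aggregate(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    let mut total: HashMap<String, Stats> = HashMap::new();
    for partial in partials {
        for (city, stats) in partial {
            total.entry(city).and_modify(|s| s.merge(stats)).or_insert(stats);
        }
    }
    total
}

fn main() {
    let input = "A;Brussels;d;r;40\nB;Brussels;d;r;50\nC;Brussels;d;r;150\nD;Paris;d;r;90";
    println!("{:?}", aggregate_parallel(input, 3).get("Brussels"));
}
```

Keeping sum and count per chunk is what makes the merge step trivial; a running mean could not be combined this easily.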
We went from 40 seconds to three or four seconds in a couple slides and with few changes. And the point is don't guess, measure. This is the worst part that C developers bring into Rust.
They think everything is a performance overhead. And if this challenge, by the way, looked very similar to the one billion row challenge, that is because it was inspired by it. And it is very similar. Read up on it. It's kind of fun.
We did something similar for hotel data. But the more important point here is how can we fight premature optimization? Measure, don't guess. Focus on algorithms and data structures, not micro-optimizations.
More often than not, if you change from a vector to a hash map, this will be way, way more efficient than if you remove your little struct and add lifetimes everywhere.
You can get carried away pretty quickly and Rust encourages you to do so, but it also has the tooling to fight it. Be more pragmatic. Focus on readability and maintainability first and foremost. Use profiling tools to make informed decisions.
You covered all of that. Your code is idiomatic. It is fast. You didn't overdo it. What is missing? Well, the entire rest. Do you have tests? Do you have documentation? Is your API too large? Does your code lack modularity and encapsulation?
These are things that I see from people that are like the lone wolf coders. They know all about Rust, but what they are not really good at is the rest. Explaining the differences to their code maintainers.
And writing documentation. Not about the how, but about the what. What does your program do?
Some things they say. It compiles. My work is done here. The code is documentation. Let's just make it all pub. I'll refactor that later, which never happens. Let's look at that code again. This is our first version junior programmer. Three hours.
Okay. How do we test that? It's kind of impossible because this is one big binary, one main. How would we test that?
Well, I guess the question is what do we want to test? Well, first off, I would say let's add a test for parsing. The entire thing can be a very simple, crude test.
But if we refactor it such that we have a function that parses cities, now we can start to introduce a path here and do the parsing. And this is where the parsing logic is, by the way.
We split it up into a main and a parse_cities function. Great. This is our first test. Very crude, but we get to a point where suddenly we can test our changes.
We create a temporary directory. We have a path and then we write into a file and that's it. The parsing is done. Great.
If we wanted to make it a little better, instead of passing in a path, we pass in something that implements Read. Now we don't need to create files like here.
Instead, we can have our input as a binary blob. And these are simple things. Add some documentation, add some tests. It's not that hard.
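A minimal sketch of that refactoring: a parse_cities function that accepts anything implementing Read, plus a test that feeds it bytes from memory. The return type (min and max per city) and the sample data are assumptions; only the function name and the Read parameter come from the talk.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, BufReader, Read};

/// Reads semicolon-separated hotel records and returns the (min, max)
/// price per city. Lines that cannot be parsed are skipped.
fn parse_cities(input: impl Read) -> io::Result<HashMap<String, (f64, f64)>> {
    let mut cities: HashMap<String, (f64, f64)> = HashMap::new();
    for line in BufReader::new(input).lines() {
        let line = line?;
        let mut fields = line.split(';');
        let (Some(city), Some(price)) = (fields.nth(1), fields.nth(2)) else { continue };
        let Ok(price) = price.trim().parse::<f64>() else { continue };
        let (min, max) = cities.entry(city.to_string()).or_insert((price, price));
        *min = min.min(price);
        *max = max.max(price);
    }
    Ok(cities)
}

fn main() -> io::Result<()> {
    // In the real program this would be a std::fs::File; any reader works.
    let input = "Ritz;Brussels;2024-02-03;double;150\nInn;Brussels;2024-02-03;single;40\n";
    let cities = parse_cities(input.as_bytes())?;
    println!("{cities:?}");
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_min_and_max_per_city() {
        // No temporary directory, no file: the input is a byte slice.
        let input = b"Ritz;Brussels;2024-02-03;double;150\nInn;Brussels;2024-02-03;single;40\n";
        let cities = parse_cities(&input[..]).unwrap();
        assert_eq!(cities["Brussels"], (40.0, 150.0));
    }
}
```

The same function serves production and tests; only the reader passed in changes.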
And in order to fight omission, what you need to do is write more documentation, write unit tests, use tools like Clippy and cargo-udeps, set up CI/CD so that you can handle your changes, create releases, use release-please, Marco, greetings go out to you, and keep a changelog of what you changed.
Right. We're getting towards the end. We have seen the anti patterns. You know them now. I hope that you will be able to, you know, see them in your code.
If you want to learn more, there are some other talks that were given here at FOSDEM and other places. You might want to check them out. Maybe I can put the slides somewhere. And that is all I have to say. Thank you.
Thank you.