Good afternoon.
We have now Mach-Andre Leimburg.
He's the CEO and founder of E-Gennex.
He's not only that, but he's a Python C-Python core developer.
He's also one of the organizers of the EuroPython.
He's a EuroPython Society Fellow,
and he's been making many contributions to Python.
So, yes, we have this pop star here.
Now, he's going to talk about Match of Things Python,
parsing structured content on Python's new match statement.
Thank you very much, Mar.
Thank you.
And thank you all for coming.
The reason why I'm doing a talk about the match statement
is that I'm getting a feeling that it doesn't receive enough traction.
So, I wanted to know from you how many of you know the match statement?
How many of you have actually used the match statement?
A lot less.
Yeah, that's what I thought.
So, maybe a short introduction.
Tatiana already mentioned a couple of things.
I did a lot of stuff in Python.
I've been working with Python since 1994, so a very long time.
I did lots of things in the core development,
Unicode, Db, API, the platform module.
I'm based in Germany.
If you have a need for, I don't know,
a senior software architect, then please contact me.
But that's not the point of this talk.
The point of this talk is to show you this.
So, this is the match statement that you have in Python.
And it's actually a very, very useful thing,
especially if you want to parse structured data.
Now, the match statement itself is actually quite complex
if you look at all the details.
And I'm going through all the details in this talk.
There are so many details that I have to rush a bit, unfortunately.
And I'm not going to be able to show you live demos or anything
because I simply don't have the time for that.
So, let's just head right in.
So, what's the motivation behind the match statement?
People wanted to have something like a switch statement,
as you probably know from maybe C,
your other languages, for a very, very long time.
I just, I wrote a pep a very long time ago,
which basically suggested adding something like that to Python.
It was rejected at the time,
so it took another 20-something years
to actually something like this to make it into Python.
What we now have with the match statement
is a lot more powerful than the switch statement.
So, you can do not only matching on literals, for example,
but you can also do matching on types.
You can do matching all kinds of things,
including conditions that you apply to these things.
You can combine all of these things.
You can also do parsing and matching at the same time,
which is quite useful, so you don't have to have two passes.
First, to figure out whether something is actually valid,
and then in the second pass to then figure out
how to actually use the data that you have there.
It all started in Python 3.10.
That's more than two years ago.
But, like I said, it hasn't received that much traction yet.
So, what you see here, or maybe you cannot see it,
it's a graph from py-code.org,
which is a very nice site.
If you don't know that one, you should go there and have a look.
It basically scans all the PyPI code
and then does analysis on that.
The maintainer did an analysis in July last year
and looked at various features of the language,
whether they were being used in the packages on PyPI or not.
As you can see, in July, there were only 2,600-something packages
on PyPI using the match statement.
That's two years after the release,
and it's only 0.55% of all the packages, so it's next to nothing.
So, I guess one of the reasons for that
is that the documentation for this match statement
is not all that great.
I'm talking about the official Python documentation.
There are many blog posts about it,
and many other resources that you can tap into and overviews.
But the Python documentation for the match statement is not ideal.
What you have is these three PEPs,
and this is basically the best that you have
in the official documentation for Python.
If you want to get into these things,
then I would suggest to go with the PEP 636,
which is a very nice introduction,
a tutorial kind of introduction to the match statement,
and then you can go to the other PEPs to have more detail.
So, how does it all work?
We're going to have a look at this example,
and I'm going to go through the various different parts of it.
So, the first part is the match object itself.
This is what you want to match, this is what you want to analyze.
The next thing is what you have behind the case statements in there.
Those are called match patterns, and there are quite a few of those.
I'm going to go through a list of many other patterns that exist.
Then, of course, you have the match code.
This gets executed in case, one of those case statements,
the case patterns actually do match.
And then you have something called capturing variables.
I'm not going to explain what that is now,
because I have a few slides on those.
This is a way basically to store the data that's being matched in a variable.
Plus, you have something that's a bit strange, which is just the underscore.
These are non-capturing wild cards.
So, it's basically like an ELTS in an if-else statement.
So, if the matching goes down, and you have a, as the last case,
you have one of these wild card things, then this will always match.
So, this is a way to do the ELTS in the match statement.
Matching itself is always tried from top to bottom, and the first match wins.
So, the order in which you list these match statements,
the case statements, is actually very important.
There's no fall through, like in C. How many of you know C?
Well, quite a lot. That's good.
So, you don't have that, because in C you can often make a mistake.
If you forget a break, for example, in one of these,
the code that comes behind the case, then it just falls through,
and then you execute code that you probably don't want to execute it.
So, let's have a look at these pattern types that we have.
Like I said, there are quite a few.
I'm going to go through them rather quickly.
So, the first one is the literal.
So, you can just write a little bit of string, a little bit of number,
an integer, a float.
It can also handle a couple of special singletons, like true, false, or none.
Not many more.
If you have something else that you want to actually match,
and you don't want to write this down as a literal,
you can use a variable kind of notation for that.
So, if you have some other value, you put that into a variable
that's accessible to the match statement.
And what's very important is that you have a dot in that reference.
The reason for that is a bit strange,
because the match statement also works on types.
And in order to differentiate between type names and variable names,
the match statement and the parser, they need to have some kind of hint for this,
so that they know what they're dealing with.
And the dot is that hint.
Now, the next two types are sequences and mappings.
They look very natural to a Python programmer for sequences.
You just use like the square brackets or the round brackets,
and then you match a sequence.
What's not necessarily intuitive about this is that
this actually matches sequences, not just lists or tuples.
So, if you write something like, for example, in the tuple notation,
and then you pass in a list as an object that gets matched,
the tuple case will still match in your match statement.
So, that's a bit like a gutcher.
You have to watch out for that.
And it's similar for mappings.
For mappings, you write them like the, like a dict kind of notation.
It actually matches all kinds of mappings,
not just dictionaries.
There are ways to, you know, just match dictionaries.
I'm going to show them.
You can also match, like I said, different types.
The very, you know, very simple ones are like all the built-in types
that you have there.
You can have support for user-defined classes.
You have to pay some attention in user-defined classes
about the order of the arguments that you have in there.
I'm going to talk about that in a bit.
What's very important are these parentheses.
If you don't have parentheses behind this,
then the match statement is going to basically treat this,
the name that you have there as a variable,
and very often as a capturing variable.
So that's going to, that's another gutcher
you need to be careful with.
Of course, you can nest all these things.
You can combine all these things that I just mentioned in various ways.
There's an OR combination with a pipe character.
And to make things even more complex,
you can add guards to these match patterns that you have.
So you can say, OK, for example, down here,
if you can see that it's a sequence AB,
and then this should only match if the value A in that sequence
is above 10.
So you can write very complex things in those match statements.
And then finally, you have these white-card patterns.
I mentioned those already.
There are two types of these white-card patterns.
One is the anonymous one, a non-binding one, which is the underscore.
And the second one is one where you basically put something
at the bottom of your match statement,
and you just assign a variable to that.
I often use unknown for this because it just makes sense.
If you read that, it's very, you can easily comprehend that.
If you read the code, you can easily understand
that this is actually something that matches anything
a bit unlike the underscore.
I'm not too much of a fan of this underscore thing.
Right, so now let's have a look at the capturing variables.
Like I mentioned in the beginning,
the nice thing about the match statement
is that you can actually combine the matching and the parsing.
So whenever something matches, Python
will put the matched value into a variable that you define,
which is very much like, for example, the ass notation
that you have with context managers.
There are two forms for this.
One is an explicit form.
So I put an example here.
So what happens is it matches a list.
And then if the list type matches,
it will put the value into the variable sublist.
And then you can use that variable in your other matching code
that you have or in the actual code
that you want executed for that particular case.
Very easy to understand.
It's a bit more verbose, but it always works, which is nice.
And then there's an implicit form.
This can cause some problems because it introduces
some of these gotchas.
The way that this works is that instead of putting
literals in these, for example, sequence notations
or mapping notations, you put variables in there.
And what happens there is that implicitly, for example,
in the first example up there, the first entry
in that sequence will go into A, and the second entry
will go into B. And then you can immediately use A and B,
for example, in guards that you have on the code that
comes afterwards.
And these things are actually bound variables in your code.
This works very well if you have well-defined variable names.
If you don't, you can get into lots of trouble.
So using short names is probably not a good idea.
They should be very explicit.
This does also work with some of the built-in types,
not all of them.
So there is a, I think this is actually
a full list of all of the ones that support this.
It does work with classes that you define,
but you need to have a look at this pep for the details.
There are some special attributes
that you have to define in order for the parser
to know in which kind of order these variables should
be assigned.
Unfortunately, it doesn't work with ABCs,
but there are workarounds for that.
So if you work with ABCs, for example,
if you want to test whether something is a float or an int,
and you want to put that kind of logic into an ABC,
then there are ways to still make that happen.
There are some things that don't work with the match statement.
Some are a bit unfortunate, because, for example,
if you use a scripting shell language, like bash,
for example, a very, very common use case for matching
is regular expressions.
So basically, you have a case, and then you
put a regular expression there to match
a particular regular expression, kind of like how
the string should look like.
This is not supported directly.
There are ways to work around this.
I'm going to show you a reference later on,
where you can basically find how to do this.
Something else that doesn't work well is a set member matching.
There are ways, again, to work around this.
You can use a guard to kind of do this set matching.
So the guard works by having the wild card,
so it always matches.
And then it uses the guard to do the actual check
whether something is in a value set,
or you can use the OR pattern.
But the OR pattern is sequential,
so it's not really efficient.
Optimizations haven't been done yet,
which is a very common theme that you always have in Python.
First, something gets implemented
to have something to work with.
And then, in the next couple of releases,
people then worry about performance and add better
performance.
So that has happened a lot in Python in the history.
It's probably going to happen for this as well.
So I talked a bit about the guard trust.
I just want to reiterate some of them.
This I already mentioned.
If you use the tuple notation or the list notation,
and you think that, OK, this is just
going to match a tuple or just a list,
you can easily get this wrong.
So if you want to do this explicitly,
then you actually have to use the type notation for this.
So you have to write list or tuple,
and then the sequence that you want to match.
The same issue you have with the mapping types.
So you have to pay attention to that as well.
Another gotcha is the wildcard pattern.
So you can only use the wildcard pattern
at the very end of the list if you put something up
at the top of the list.
For example, if you start with case and then wrong values,
because wrong values is a capturing variable,
it's regarded as a wildcard case.
And so it will match anything.
And the parser will actually complain about this.
So this is not valid Python.
However, if you put a guard with it, then you can use it.
Which is probably in order to make certain workarounds
possible.
I don't really know what the reason is
why this works.
It's a bit strange.
And then the parentheses.
If you look at this code, if I wouldn't have put an error there,
you probably wouldn't have seen this.
What I did there is I put a dict there,
meaning that I want properties to have a dict,
like a dictionary value.
And they want to match that.
But I forgot the parentheses.
So what's going to happen is the parser
is going to regard this as a binding, sorry, capturing
variable.
So it's going to put the value into a dict.
And then it's not only going to not parse correctly,
because it will just put any kind of value
that you have there into this dict capturing variable.
But it will also bind dict to this value
that you have in there, possibly breaking code that
comes afterwards, because you can no longer access
the built-in dict.
So this is something to watch out for.
And finally, this is the talk that I wanted to mention.
Raymond Hettinger.
Who knows, Raymond Hettinger?
Not that many people.
That's strange.
You should definitely look him up.
I mean, he has done so many good talks.
It's just incredible.
I mean, if you want to learn something deep about how
Python works, he has all the talks in his stack.
So definitely have a look at that.
He did a great talk at PyCon Italia 2022,
also on the pattern matching.
And he shows a lot of tricks on how
to work some of the deficiencies that you currently
have in the match statement.
So I was actually faster than I thought.
So I'm done.
So yeah, this is always my last slide.
Never stop to learn.
Always learn new things.
Never always try out new stuff that comes out in Python.
And I hope this talk will kind of make you have a look
at the match statement and maybe use it more,
because it's actually quite useful.
Thank you.
Thank you, Mark.
Thank you, Mark.
So now it's time for questions.
So I can say a few people with the hands raised.
I will start here, and we will go up.
So we have four people, at least.
One of your first examples, you first
had to check whether this is a list, like with the list
in the parentheses.
And then two cases later, you are
trying to catch against the sequence.
That means that this will only match if it's a sequence,
but it's not a list, I guess.
Like on your first slide, literally.
The first one, like this one?
Yes, this one.
So on the third case, it will match
if the thing is a sequence with three elements,
but that sequence is not a list, because otherwise it
would have gotten into the first case.
Is that correct?
Given this one, yeah?
Yes.
Since you have a case list, oh, yeah.
Yeah, so you're right.
What happens here is that this will always match for lists.
So if you put in a real, like a true Python list,
then you will always go in here.
If you have defined your own kind of sequence,
that's not a Python list.
Only then it will get in the top.
Then it will drop down here, and we'll parse here.
And as Heckelman and Laska mentioned for me,
what happens if you put a generator in there?
Can you match against generators?
Because then you will kind of mutate the element
while casing the case.
Would that work?
This is a good question.
I think if you put a generator in there,
it will actually match the generator type and nothing
much else.
It won't actually call the generator
to give back any values.
But it's a good question.
I'm not really sure.
It probably works like that.
Hi.
Thanks for the great talk.
I had a question regarding the caveat you gave at the end
regarding the dict.
Is there a proper way to do it, like putting parenthesis,
or is it not possible to match a type inside of a hash map
like that?
Let me just find the slide.
This one, right?
Yeah, that one.
So what was the question?
So here you put the dict, and you
said that, of course, if it will overwrite,
let's say, the Python dict, would it
be possible in that case to put parenthesis to match the type
here?
Yes, of course.
And that was the code is actually
written in a way that this would have been intended, right?
So the intention was that properties,
well, it's matching a mapping, right?
So if you put in a mapping that has,
as one of the keys has properties,
and as a value has a dictionary, then this will match, right?
Without the parenthesis, it won't match
any mapping that has a key that has properties,
but not actually look at the value,
and simply just put the literal value
into the variable dict.
That's what happens.
OK, I think I see you up there, right?
Yes, hello.
I was wondering, with this capturing variable,
it can sometimes lead into ambiguity.
So I was working how well this would
work with the existing typing system, where you would,
for example, have an object that, like,
dict that represents the type.
So that is something that I did not really cover in here,
but perhaps you noticed the syntax that's being used here
is actually somewhat different from the type annotations
that you have in Python, right?
So those are two distinct kind of, basically, systems
working here.
These types that you have here are actual Python type objects
that you work with, whereas the type annotations are being used
by, for example, MyPy or other tools,
other static code analysis tools to figure out
whether something is correct or not.
So this actually happens at runtime.
I don't know if that answers your question, so.
Well, sort of, I guess.
So you can't really put the typing types in here, let's say,
because there is generics in there, of course,
that would be highly convenient for matching.
Right, right.
I think that, I mean, in typing, you do have some actual
Python type objects.
Those you can use in here, right?
But you cannot use the type annotation kind of syntax,
for example, for matching an integer or something, yeah?
No, it doesn't make sense, of course.
That doesn't work.
Thank you.
Do we have any more questions?
We have time for one last one.
Yes, we do.
Oh, my God, we have two.
I'm going to the right side, because we haven't had
many questions from there.
I'm coming.
Let's go.
Thank you.
So, yeah, maybe this is wishful thinking, but how difficult
would be to implement or to provide, like, a match that
will match not in order, but it will give me the best match?
Would be that possible?
Because, for example, I'm working in code generators for
wrapping CAP from wrapping C into Python, and sometimes you
can't do that.
And from C++ goes over, function overload.
So I can think, OK, I can have function overload to Python
and translate that to a single function with match for
different signatures.
However, I will have to, I don't know, I need to know which
is the best match for each case in order to order the match
statement.
Will it be possible to have that kind of logic embedded in
Python, or that's too wishful thinking?
You can try to do this by ordering the cases from, you
know, the longest match to the shortest match.
But apart from that, I think it's, this is actually a hard
problem that you're describing there.
Because if you want to, if you want to figure out what's the
best match that you have, then you actually have to go
through all the different cases that you have in here, and
that's going to have different semantics than what you have
now in the match statement.
Usually the problem is like to know which is the most
concrete type.
Usually the problem that I have the most is like to know
which is the most concrete type to the base type, so to
that it matches the most concrete one instead of the base
one, because it's like it can match us both.
But in C or C++, it will always match the most concrete
one.
And if, and it's not there, it will get to the base.
So, and for example, for now, it's like right now in Python
I have no idea how I will solve that when I'm wrapping
APIs.
You can do that by ordering, like I said, you can order the
case statement that you have here from the most, let's say,
abstract one to the most concrete one, and sorry, the
other way around, from the most concrete one to the most
abstract one, and then like in the example I just gave where
you have a list, yeah, when, if you pass in the Python list
object, then it will match the first one.
If you pass in, in this other example that I had here, if you
pass in, let's say, a user defined sequence, then it will
drop down and then match that one.
So that's more abstract, right?
Thank you very much, Mark.
Another round of applause, Mark.
Thank you.
Thank you.