[00:00.000 --> 00:12.000] So, we have Samuel here to talk about Pydantic 2 and how it leverages Rust superpowers. [00:12.000 --> 00:18.000] Thank you very much. Can you hear me at the back? [00:18.000 --> 00:26.000] Great. A bit about me. I'm Samuel. I've been a software developer for 10 years, among other things. [00:26.000 --> 00:35.000] I've been doing open source quite a lot for the last five years, mostly Python projects, but moving a bit into Rust over the last few years. [00:35.000 --> 00:47.000] The most high-profile Python project that I maintain is Pydantic, which I started back in 2017 and which has subsequently kind of taken over my life. [00:47.000 --> 00:52.000] I've been working on it full time for the last year. [00:52.000 --> 01:04.000] So, what I'm going to talk about today: I'm going to give you a bit of an introduction to Pydantic, some hype numbers for some vanity, but also for some context of why making Pydantic better is worthwhile. [01:04.000 --> 01:09.000] I'm going to explain why I decided to rebuild Pydantic completely. [01:09.000 --> 01:17.000] I'm going to talk a bit about how I've done that with Rust, and I guess most importantly why doing it in Rust is the right choice. [01:17.000 --> 01:22.000] I'm kind of preaching to the converted, but hey. What I'm not going to do is a "hello world": [01:22.000 --> 01:28.000] this is how you would build a Python extension in Rust. There were lots of other talks on that. They're great. [01:28.000 --> 01:41.000] And also the PyO3 documentation is amazing, so I think it's more interesting to go into a bit of depth on the challenges and the advantages than just to do the hello world example again. [01:41.000 --> 01:46.000] What is it? Well, Pydantic is a data validation library in Python. [01:46.000 --> 01:52.000] It's not the first. It's definitely not the last. It started off as a side project like so many open source projects. [01:52.000 --> 01:59.000] Nothing special. 
I maintained it in my spare time. People came along occasionally, said nice things, reported bugs. [01:59.000 --> 02:05.000] Occasionally they said not very nice things, and then something weird happened, and its usage went crazy. [02:05.000 --> 02:15.000] So the first thing that happened, which you can't really see on this graph, was that in 2018 my friend Sebastian Ramirez started the FastAPI project, which is a web framework in Python, [02:15.000 --> 02:22.000] which uses Pydantic and has now got, I don't know how many thousand stars, 60,000 stars or something. [02:22.000 --> 02:31.000] It's got a lot of attention. You can see FastAPI's growth there. That got a lot of people, I think, to first find out about Pydantic, [02:31.000 --> 02:38.000] but something else happened at the beginning of 2021 to cause Pydantic's download numbers to go crazy. [02:38.000 --> 02:48.000] Now, I'm well aware that Armin's talk earlier kind of pre-trolled me before I'd even made my talk, saying that download numbers are a terrible metric, [02:48.000 --> 03:00.000] but they are the only metric, so that's what we have to use. It's also worth saying that I have actually looked at Pydantic's downloads split between downloads as a dependency and direct downloads. [03:00.000 --> 03:08.000] It's not that easy to do with PyPI, but it looks like about 15 million downloads a month are as a dependency of another package, [03:08.000 --> 03:17.000] and the remaining 25 or so million are people installing Pydantic directly, so it seems like people are using it not just as a dependency of another library. [03:17.000 --> 03:25.000] I've included Django on there as the middle line, because it's the most high-profile, most well-known web framework in Python. [03:25.000 --> 03:36.000] Not to be critical of it, it's amazing, it's changed my life. I mean no disrespect by saying we've overtaken it, just that Pydantic's usage has gone mad. 
[03:36.000 --> 03:46.000] In terms of how it's used, it's used by lots of organizations you would expect, all the FAANG companies, something like 19 out of the top 25 companies in NASDAQ, [03:46.000 --> 03:54.000] but also by organizations which you wouldn't expect, like JPMorgan, who use it quite a lot, I don't know in what regard. [03:54.000 --> 03:59.000] But it's quite interesting: if you have an open source project and you look in analytics at the referrers, [03:59.000 --> 04:06.000] lots of those big, very security-centric companies forget to turn off the referrer header from their internal systems, [04:06.000 --> 04:16.000] and they name their internal systems things like github.jpmorgan.net, so you can see which companies are using your dependencies by looking at those referrers. [04:16.000 --> 04:25.000] So, for example, Apple have no public demonstration of using Pydantic at all, but six different enterprise instances of GitHub within Apple use Pydantic, [04:25.000 --> 04:36.000] and you can even see which ones they are. They're like maps.github.maps.apple.com, github.serie.apple.com, etc. [04:36.000 --> 04:46.000] It's also used by some cool organizations. It's used by NASA for processing imagery from James Webb, and it's used by the Intergovernmental Panel on Climate Change [04:46.000 --> 04:55.000] for processing the data that they give to the UN on climate change every month, which is the stuff that I'm most proud of and why I want to make Pydantic better. [04:55.000 --> 05:03.000] So, what's so great about Pydantic? Why are so many people using it? The short answer is I don't know, because you can't go and ask those people. [05:03.000 --> 05:11.000] You can look at a graph, but it can't really tell you. But we can look at Pydantic and what people say, and we can kind of guess at what's made it popular. [05:11.000 --> 05:19.000] So, I know we're in the Rust room and we've got some Python code. Don't worry, we'll get to Rust later. 
This is some Python code that demonstrates what Pydantic does. [05:19.000 --> 05:28.000] So, we have a model which represents a talk, which has four fields in this case. Obviously, title is a string; attendance is an integer, [05:28.000 --> 05:41.000] the number of people who came; when, which is a datetime or None, and has a default value of None; and then the mistakes I make, which is a list of two-tuples with the time they were made at and a description. [05:41.000 --> 05:52.000] And then lastly, on the last line, we instantiate an instance of Talk using that data, and if there was a mistake, we'd get an error; if there wasn't a mistake, we'd obviously get the instance. [05:52.000 --> 05:59.000] The first thing that makes Pydantic special, and the reason that people like it, is that we use Python type hints to define the types. [05:59.000 --> 06:09.000] That's become reasonably commonplace now. There is a whole suite of different libraries that do the same thing, either because it's obvious or because they're copying Pydantic, but Pydantic was the first to do that, [06:09.000 --> 06:24.000] because type hints were kind of new in 2017. And obviously, the main advantage is it's easy to learn: you don't need to learn a new kind of DSL to define stuff. But it's also compatible with static type checking, with all the rest of your code, with your IDE. [06:24.000 --> 06:32.000] Once you've defined your model, and if Pydantic's worked correctly, then you know you've got a proper instance of Talk. [06:32.000 --> 06:46.000] The frustration that caused me to create Pydantic was that type annotations existed, they sat there in the code, you could read them, but they did nothing at runtime. And so, effectively, could we try and make them work? [06:46.000 --> 06:55.000] The second and slightly more controversial thing that Pydantic does, which I think is one of the reasons that people find it easy to use, is that we default to coercion. 
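To make the frustration described above concrete, here is a minimal standard-library sketch (not Pydantic code) showing that annotations on a plain dataclass are not checked at runtime. The `Talk` fields follow the talk; the `type_mismatches` helper is hypothetical and only copes with annotations that are plain classes such as `str` and `int`.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Talk:
    title: str
    attendance: int


# The annotations are not enforced: both wrong types are accepted silently.
talk = Talk(title=123, attendance="lots")


def type_mismatches(obj):
    """Return the names of fields whose value does not match its annotation.

    Hypothetical helper: only works when every annotation is a plain class.
    """
    hints = get_type_hints(type(obj))
    return [
        name
        for name, expected in hints.items()
        if not isinstance(getattr(obj, name), expected)
    ]


mismatches = type_mismatches(talk)
```

Here `mismatches` lists both fields, even though constructing `talk` raised nothing. Closing that gap between annotation and runtime behaviour is the job a validation library takes on.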
[06:55.000 --> 07:16.000] So, you can see attendance there: although it needs to be an integer, it's given as a string. Pydantic will automatically coerce from, for example, a valid string to an integer, but it'll also do other coercions that are a bit more commonplace, like coercing a string in ISO date format into a datetime object, and the same for durations. [07:16.000 --> 07:33.000] Some people hate that, some people complain about it a lot. I suspect there are lots of people who don't even realize they're using it: they process environment variables, or JSON, or URL arguments, and they're always strings, and Pydantic just works and they don't even see it. [07:33.000 --> 07:43.000] A few other reasons I think we're quite popular: we're fast-ish, we're friendly-ish on the bug tracker, I don't promise not to ever be cross with people, and we're reasonably feature-complete. [07:43.000 --> 07:55.000] So, that was Pydantic. It's great, lots of people are using it, what's the problem? Well, it started off as a side project for me, it wasn't designed to be successful, and the internals stink. [07:55.000 --> 08:04.000] I'm very proud of what Pydantic is doing in terms of how it's being used. I'm not proud of what's under the hood, and so I've been keen for a long time to fix the internals. [08:04.000 --> 08:22.000] Also, the second way in which Armin kind of trolled me before my talk was talking about API compatibility. We're going to have to break a lot of things in Pydantic V2 to get it right, but that's the right thing to do, I think, to get the future API to be correct and stable and not break again. 
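As a rough illustration of the default coercion described a moment ago, here is a minimal standard-library sketch. The helpers `lax_int` and `lax_datetime` are hypothetical names, not Pydantic's API, and they cover far fewer cases than a real validation library would.

```python
from datetime import datetime


def lax_int(value):
    """Accept an int, or a string holding a valid integer (hypothetical helper)."""
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value)  # raises ValueError for strings such as "lots"
    raise ValueError(f"cannot coerce {type(value).__name__} to int")


def lax_datetime(value):
    """Accept a datetime, or an ISO 8601 formatted string (hypothetical helper)."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    raise ValueError(f"cannot coerce {type(value).__name__} to datetime")


# Strings, as they would arrive from an environment variable or a URL argument
attendance = lax_int("21")
when = lax_datetime("2023-02-04T15:30:00")
```

A strict variant would simply drop the `str` branches, which is the trade-off the talk returns to later when it discusses strict mode.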
[08:22.000 --> 08:37.000] And while we're building V2, why don't we do some other stuff? So, make it even faster. It's already quite fast, but if you think about that number of downloads, and you think about the number of CPU cycles globally every day devoted to doing validation with Pydantic, [08:37.000 --> 08:56.000] that's currently all in Python. That's probably quite a lot of carbon dioxide that's being released, effectively unnecessarily, because we could make Pydantic significantly faster. Strict mode I already talked about: while often you don't need it, there are legitimate cases where you want strict mode. [08:56.000 --> 09:09.000] We have functional validators, which effectively run some Python code to validate a field. They're useful, but they would be more useful if they could operate like an onion, so like middleware, where you take both a value and a handler, [09:09.000 --> 09:17.000] and call the handler if you want to, once you've done some processing of the value. That would be super valuable. Another thing we could add is composability. [09:17.000 --> 09:33.000] So Pydantic, as I showed you earlier, is based on the Pydantic model. Often your root type doesn't need to be a Pydantic model or shouldn't be a Pydantic model: it might be a list, it might be a tuple, it might be a list of models, it might be a TypedDict, which is a common new type in Python. [09:33.000 --> 09:52.000] And then lastly, maintainability. Since I maintain Pydantic, I want maintaining it to be fun. So about a year ago, last March, I started as a kind of experiment: could I rebuild some of it in Rust? A year later, I'm still working on it full time, and we're nearly there. [09:52.000 --> 10:13.000] So what does it mean to validate Python data in Rust? What's the process? 
Well, phase one, we need to take a Pydantic model and convert it to a Rust structure. So unlike libraries like Serde, we're not compiling models in Rust; [10:13.000 --> 10:22.000] the compiled Rust code doesn't know anything about the models it's going to receive, because obviously Python developers don't want to be compiling Rust code to get their model to work. [10:22.000 --> 10:28.000] So we have to have a, in Rust terms, dynamic definition of our schema, which we can then use for validation. [10:28.000 --> 10:43.000] The way we build that is effectively these validators, which are structs that contain both characteristics of what they're going to validate and also other validators, recursively, such that you can define complex structures. [10:43.000 --> 11:07.000] So in this case, our outermost validator is a model validator, which effectively just instantiates an instance of Talk and sets its attributes from a dictionary. It contains another validator, which is a typed-dict validator, which contains the definition of all the fields, each of which has effectively the key that it's going to look for and then a validator that it's going to run. [11:07.000 --> 11:28.000] The first two are reasonably obvious. I've added a few constraints to show how you would manage those constraints. And then the third one, the when field, is obviously a union, which in turn contains a Vec of validators to run effectively in turn to try and find the value. [11:28.000 --> 11:47.000] And then the last one, which is the more complex type, contains this list validator, which contains a tuple validator, which contains two more validators. And we can build up effectively infinitely complex schemas from a relatively simple, I say relatively simple, principle at the outset, which is: we have a validator. [11:47.000 --> 11:54.000] It's going to contain some other stuff. So what does that look like in code? [11:54.000 --> 12:03.000] I said I was going to show you some Rust code. 
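Before the Rust, the "validators within validators" idea described above can be sketched in a few lines of plain Python. This is an illustration of the principle only, not pydantic-core's code: every class name is hypothetical, the `when` union field is left out for brevity, and the time of each mistake is assumed to be an integer number of seconds.

```python
class StrValidator:
    def validate(self, value):
        if isinstance(value, str):
            return value
        raise ValueError(f"expected str, got {value!r}")


class IntValidator:
    def validate(self, value):
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise ValueError(f"expected int, got {value!r}")


class TupleValidator:
    def __init__(self, item_validators):
        self.item_validators = item_validators  # one validator per position

    def validate(self, value):
        if not isinstance(value, (tuple, list)) or len(value) != len(self.item_validators):
            raise ValueError(f"expected {len(self.item_validators)} items, got {value!r}")
        return tuple(v.validate(item) for v, item in zip(self.item_validators, value))


class ListValidator:
    def __init__(self, item_validator):
        self.item_validator = item_validator  # a single nested validator

    def validate(self, value):
        if not isinstance(value, list):
            raise ValueError(f"expected list, got {value!r}")
        return [self.item_validator.validate(item) for item in value]


class TypedDictValidator:
    def __init__(self, fields):
        self.fields = fields  # key to look for -> validator to run

    def validate(self, value):
        missing = [key for key in self.fields if key not in value]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return {key: v.validate(value[key]) for key, v in self.fields.items()}


talk_validator = TypedDictValidator({
    "title": StrValidator(),
    "attendance": IntValidator(),
    "mistakes": ListValidator(TupleValidator([IntValidator(), StrValidator()])),
})

result = talk_validator.validate(
    {"title": "Pydantic V2", "attendance": 21, "mistakes": [[90, "wrong slide"]]}
)
```

Each validator only knows about its own type and the validators it holds, so arbitrarily deep schemas fall out of composing a handful of small pieces.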
I'm going to show you some Rust code because I think this is the clearest way of explaining what it is that we do. [12:03.000 --> 12:23.000] So the root of everything is this trait, Validator, which contains effectively three things. It contains a const, a static string, which is used for defining, as I'll show you later, which validator we're going to use for a given bit of data; build, which is a simple function to construct an instance of itself in the generic sense; [12:23.000 --> 12:28.000] and then the validate function that goes off and does the validation. [12:28.000 --> 12:42.000] We then implement that trait for all of the common types that we want. So I think we have 58, 48 or so different validators, and then we bang all of them into one massive enum. [12:42.000 --> 12:55.000] Then the magic bit, which is provided by enum_dispatch, which is a Rust crate that effectively implements a trait on an enum if every member of that enum implements that trait. [12:55.000 --> 13:06.000] Effectively, it uses a big procedural macro to create an implementation of the function, which is just a big match, choosing which function to call. [13:06.000 --> 13:12.000] But it's significantly faster than dyn. [13:12.000 --> 13:22.000] And in fact, in some cases, it can abstract away everything and be as fast as just calling the implementation directly. [13:22.000 --> 13:26.000] So I said earlier that we needed to use this constant, the expected type. [13:26.000 --> 13:36.000] We use that in another effectively big enum to go through: we take the type attribute out of this schema, which is a Python dictionary, [13:36.000 --> 13:40.000] and we use that to effectively look up which validator we're going to go and build. [13:40.000 --> 13:43.000] And again, I've shown a few here, but there's obviously a bunch more. [13:43.000 --> 13:48.000] This in real life is not implemented as a big match statement like this. 
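The lookup just described, where a constant on each validator is matched against the `type` entry of the schema dictionary, can be sketched in Python as follows. This is a hedged analogue of the idea rather than the Rust implementation: `EXPECTED_TYPE`, `VALIDATORS`, `build_validator` and the `items_schema` key are names chosen for this sketch.

```python
class IntValidator:
    EXPECTED_TYPE = "int"  # plays the role of the trait's constant string

    @classmethod
    def build(cls, schema):
        return cls()

    def validate(self, value):
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise ValueError(f"expected int, got {value!r}")


class ListValidator:
    EXPECTED_TYPE = "list"

    def __init__(self, item_validator):
        self.item_validator = item_validator

    @classmethod
    def build(cls, schema):
        # Recursively build the validator for the list's items
        return cls(build_validator(schema["items_schema"]))

    def validate(self, value):
        if not isinstance(value, list):
            raise ValueError(f"expected list, got {value!r}")
        return [self.item_validator.validate(item) for item in value]


# One entry per validator, keyed by its expected type string
VALIDATORS = {cls.EXPECTED_TYPE: cls for cls in (IntValidator, ListValidator)}


def build_validator(schema):
    """Pick the validator class from the schema's 'type' entry and build it."""
    schema_type = schema["type"]
    try:
        validator_cls = VALIDATORS[schema_type]
    except KeyError:
        raise ValueError(f"unknown schema type: {schema_type!r}") from None
    return validator_cls.build(schema)


validator = build_validator({"type": "list", "items_schema": {"type": "int"}})
values = validator.validate([1, 2, 3])
```

A Python dictionary lookup stands in here for the match statement on the slide. In Rust the set of validators is closed at compile time, which is what allows the enum-based dispatch the speaker describes.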
[13:48.000 --> 13:53.000] It's a macro that builds this function, but it's clearer here so you get the idea. [13:53.000 --> 13:58.000] So I showed you earlier this validate function and I kind of skipped over the input argument. [13:58.000 --> 14:02.000] So the input argument is just an implementation of a trait. [14:02.000 --> 14:08.000] That trait, Input, the beginnings of which are defined here, [14:08.000 --> 14:13.000] effectively gives you all the things that the validation functions are going to need on a value. [14:13.000 --> 14:26.000] So is none, strict string, lax string, int, float, et cetera, et cetera, but also more complex types like date, datetime, dictionary, et cetera, et cetera. [14:26.000 --> 14:33.000] And then we implement that trait on both a Python value and on a JSON value, [14:33.000 --> 14:39.000] which means that we can parse JSON in Rust directly without having to go via Python. [14:39.000 --> 14:41.000] That's super valuable for two reasons. [14:41.000 --> 14:43.000] One, for performance reasons. [14:43.000 --> 14:52.000] So if our input is a string and if we were to then parse it into Python objects and then take that into our validator and run all of that, [14:52.000 --> 14:58.000] that would be much slower than parsing in Rust and then running the validator in Rust straight away. [14:58.000 --> 15:02.000] The other big advantage is to do with strict mode. [15:02.000 --> 15:07.000] So I said earlier that people want strict mode, but they say they want strict mode, and often they don't. [15:07.000 --> 15:11.000] So what people will say is, I want totally strict Python, why isn't it strict? [15:11.000 --> 15:13.000] And then you'll say, well, do you want to load data from JSON? [15:13.000 --> 15:15.000] And they say, yeah, of course I do. [15:15.000 --> 15:18.000] And you say, well, how are you going to define a date? [15:18.000 --> 15:21.000] And they're like, oh, well, obviously I'll use a standard date format. 
[15:21.000 --> 15:23.000] But that's not strict then. [15:23.000 --> 15:25.000] You're parsing a string. [15:25.000 --> 15:28.000] And they're like, oh, that's fine, because it should know in that case it's coming from JSON. [15:28.000 --> 15:30.000] Well, obviously, how are we going to do that? [15:30.000 --> 15:38.000] By parsing JSON directly, and in future potentially other formats, we can implement our strict date method [15:38.000 --> 15:41.000] on that JSON input. [15:41.000 --> 15:44.000] We can say, well, we're in JSON. [15:44.000 --> 15:45.000] We don't have a date type. [15:45.000 --> 15:46.000] So we're going to have to do something. [15:46.000 --> 15:48.000] So we're going to parse a string. [15:48.000 --> 15:53.000] And effectively, the strict date implementation for JSON will parse a string. [15:53.000 --> 15:56.000] And therefore, we can have a strict mode that's actually useful, [15:56.000 --> 16:00.000] which we couldn't have had in Pydantic V1, [16:00.000 --> 16:04.000] where the validation logic doesn't know anything about where the data is coming from. [16:04.000 --> 16:06.000] Even if we have a parse JSON function, [16:06.000 --> 16:12.000] all it's doing is parsing JSON to Python and then doing validation. [16:12.000 --> 16:14.000] So then that's all very well. [16:14.000 --> 16:16.000] That defines effectively how we do our validation. [16:16.000 --> 16:17.000] What's the interface to Python? [16:17.000 --> 16:22.000] So that's where we have this SchemaValidator Rust struct, [16:22.000 --> 16:29.000] which, using the pyclass decorator, is also available as a Python class. [16:29.000 --> 16:32.000] And all it really contains is a validator, [16:32.000 --> 16:37.000] which of course can in turn contain other validators, as I said earlier. [16:37.000 --> 16:43.000] And its implementation, the functions of which are all then exposed as Python methods, has new, [16:43.000 --> 16:44.000] which just constructs it. 
[16:44.000 --> 16:49.000] So we call build validator and get back an instance of our validator, [16:49.000 --> 16:53.000] which we then store and return the type. [16:53.000 --> 16:55.000] Actually, this is much more complicated. [16:55.000 --> 17:01.000] One of the cleverest and most infuriating bits of pydantic-core is that, [17:01.000 --> 17:05.000] as you can imagine, this schema for defining validation becomes quite complex. [17:05.000 --> 17:07.000] It's very easy to make a mistake. [17:07.000 --> 17:11.000] So we validate it using pydantic-core itself, [17:11.000 --> 17:13.000] which when it works is magic, [17:13.000 --> 17:16.000] and when it doesn't work leads to impossible errors, [17:16.000 --> 17:21.000] because obviously all of the things that you're looking at as members of dictionaries [17:21.000 --> 17:24.000] are in turn the names of bits of validation. [17:24.000 --> 17:26.000] So it's complete hell, but it works, [17:26.000 --> 17:33.000] and it makes it very hard to build an invalid validator, or to not build the validator that you want. [17:33.000 --> 17:35.000] And then we have these two implementations, [17:35.000 --> 17:38.000] two functions: one which validates Python objects, [17:38.000 --> 17:40.000] as I said earlier, which calls validate, [17:40.000 --> 17:45.000] and the same with JSON, where we parse the JSON using Serde to a JSON value [17:45.000 --> 17:48.000] and then call validate again with that input. [17:48.000 --> 17:52.000] This code is obviously heavily simplified so that it fits. [17:52.000 --> 17:54.000] It doesn't fit on the page, but nearly fits on the page. [17:54.000 --> 17:56.000] So not everything is exactly as it really is, [17:56.000 --> 18:02.000] but I think that kind of gives you an idea of how we build up these validators. [18:02.000 --> 18:03.000] The other thing missed here: [18:03.000 --> 18:06.000] we also do the whole thing again for serialization. 
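Pulling together the input abstraction and the two entry points described above, here is a small Python sketch of the design. It is a toy stand-in, not pydantic-core: `PythonInput`, `JsonInput`, `DateValidator` and `ToySchemaValidator` are hypothetical names, and only a strict date check is modelled.

```python
import json
from datetime import date


class PythonInput:
    def __init__(self, value):
        self.value = value

    def strict_date(self):
        # Python has a real date type, so strict mode can insist on it.
        if isinstance(self.value, date):
            return self.value
        raise ValueError("expected a date instance")


class JsonInput:
    def __init__(self, value):
        self.value = value

    def strict_date(self):
        # JSON has no date type, so even strict mode has to parse a string.
        if isinstance(self.value, str):
            return date.fromisoformat(self.value)
        raise ValueError("expected an ISO 8601 date string")


class DateValidator:
    def validate(self, input_):
        # The validator is the same for both sources; the input decides
        # what "strict" means for it.
        return input_.strict_date()


class ToySchemaValidator:
    def __init__(self, validator):
        self.validator = validator

    def validate_python(self, value):
        return self.validator.validate(PythonInput(value))

    def validate_json(self, json_data):
        return self.validator.validate(JsonInput(json.loads(json_data)))


schema_validator = ToySchemaValidator(DateValidator())
from_json = schema_validator.validate_json('"2023-02-04"')
from_python = schema_validator.validate_python(date(2023, 2, 4))
```

With this split, `validate_python("2023-02-04")` is rejected while the same string inside JSON is accepted, which is the "strict mode that's actually useful" behaviour the speaker describes.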
[18:06.000 --> 18:12.000] So the serialization both from a Pydantic model to a Python dictionary [18:12.000 --> 18:16.000] and from a Pydantic model straight to JSON is all written in Rust, [18:16.000 --> 18:21.000] and it does useful things like filtering out elements as you go along, [18:21.000 --> 18:26.000] and it's effectively the same structure, uses the same schema, [18:26.000 --> 18:32.000] but it's just dedicated to serialization rather than validation. [18:32.000 --> 18:36.000] So what does the Python interface then look like? [18:36.000 --> 18:39.000] So what I didn't explain earlier is that Pydantic V2, [18:39.000 --> 18:42.000] which is going to be released, fingers crossed, in Q1 this year, [18:42.000 --> 18:44.000] is made up of two packages. [18:44.000 --> 18:47.000] We have pydantic itself, which is a pure Python package, [18:47.000 --> 18:51.000] and then we have pydantic-core, which is almost all Rust code. [18:51.000 --> 18:54.000] We have a little bit of a Python shim to explain what's going on, [18:54.000 --> 18:57.000] but it's really just the Rust code I've been showing you. [18:57.000 --> 19:00.000] So what pydantic now does, [19:00.000 --> 19:04.000] all that pydantic effectively takes care of, is converting those type annotations [19:04.000 --> 19:08.000] I showed you earlier into a pydantic-core schema [19:08.000 --> 19:10.000] and then building a validator. 
[19:10.000 --> 19:12.000] So looking at an example here, [19:12.000 --> 19:14.000] we obviously import SchemaValidator, [19:14.000 --> 19:18.000] which I just showed you, from, that's come up at the wrong time, [19:18.000 --> 19:21.000] from pydantic-core, [19:21.000 --> 19:27.000] and then the base level, [19:27.000 --> 19:32.000] the schema for the base validator, is model, [19:32.000 --> 19:34.000] and it contains, as I said earlier, a class, [19:34.000 --> 19:36.000] which is the Python class to instantiate, [19:36.000 --> 19:42.000] and another schema, which in turn defines the fields. [19:42.000 --> 19:47.000] And that inner validator is then defined by a typed-dict validator, [19:47.000 --> 19:49.000] as I said earlier. [19:49.000 --> 19:51.000] So this is completely valid Python code. [19:51.000 --> 19:53.000] This will run now. [19:53.000 --> 19:57.000] So yeah, we have a typed-dict validator which contains fields, [19:57.000 --> 20:00.000] which in turn are those fields which I showed you earlier. [20:00.000 --> 20:05.000] So title; attendance is of type int; when, which I talked about earlier. [20:05.000 --> 20:07.000] The most interesting thing here is, [20:07.000 --> 20:10.000] if you look at the when validator, [20:10.000 --> 20:12.000] it gets a bit confusing. [20:12.000 --> 20:15.000] Its schema is of type default, [20:15.000 --> 20:19.000] which in turn contains another schema, [20:19.000 --> 20:21.000] which is of type nullable, [20:21.000 --> 20:24.000] which is the simplest union, either a value or None. [20:24.000 --> 20:29.000] The default validator contains another member, [20:29.000 --> 20:32.000] which is the default value, in this case None, [20:32.000 --> 20:35.000] and the inner schema is then nullable, [20:35.000 --> 20:37.000] which in turn contains another inner schema, [20:37.000 --> 20:39.000] which is then the datetime. [20:39.000 --> 20:42.000] So that's how we define effectively default values [20:42.000 --> 20:44.000] and null or nullable. 
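The slide is not part of the transcript, so the following is only a reconstruction of the shape of schema the speaker describes, written as a plain Python dictionary. The key names (`cls`, `fields`, `items_schema`) and type strings are assumptions based on the description; the dictionary is not passed to the real `SchemaValidator` here, and released versions of pydantic-core may use a different layout.

```python
class Talk:
    """Placeholder for the Python class the model validator would instantiate."""


# Assumed layout: a "model" schema wrapping a "typed-dict" schema of fields.
talk_schema = {
    "type": "model",
    "cls": Talk,
    "schema": {
        "type": "typed-dict",
        "fields": {
            "title": {"schema": {"type": "str"}},
            "attendance": {"schema": {"type": "int"}},
            "when": {
                "schema": {
                    "type": "default",  # outer layer: supplies the default
                    "default": None,
                    "schema": {
                        "type": "nullable",  # middle layer: a value or None
                        "schema": {"type": "datetime"},  # innermost layer
                    },
                }
            },
            "mistakes": {
                "schema": {
                    "type": "list",
                    "items_schema": {
                        "type": "tuple",
                        "items_schema": [{"type": "timedelta"}, {"type": "str"}],
                    },
                }
            },
        },
    },
}


def type_chain(schema):
    """Follow nested 'schema' entries and collect each layer's type."""
    chain = [schema["type"]]
    while isinstance(schema.get("schema"), dict):
        schema = schema["schema"]
        chain.append(schema["type"])
    return chain


when_chain = type_chain(talk_schema["schema"]["fields"]["when"]["schema"])
```

Walking the `when` field gives the three layers in order (default, nullable, datetime), which shows the default value and nullability being expressed as two separate wrappers.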
[20:44.000 --> 20:47.000] So one of the other mistakes in Pydantic in the past [20:47.000 --> 20:49.000] was that we kind of conflated, [20:49.000 --> 20:51.000] effectively Python made a mistake about 10 years ago [20:51.000 --> 20:56.000] where they had an alias for a union of something and None, [20:56.000 --> 20:58.000] that they called Optional, [20:58.000 --> 21:01.000] which then meant that I didn't want to have a thing [21:01.000 --> 21:03.000] that was called optional but was not optional. [21:03.000 --> 21:07.000] And so we conflated nullable with optional in Pydantic, [21:07.000 --> 21:09.000] and rightly it confused everyone. [21:09.000 --> 21:12.000] And so the solution from Python [21:12.000 --> 21:15.000] was to start using the pipe operator for unions [21:15.000 --> 21:17.000] and to just basically ignore Optional. [21:17.000 --> 21:19.000] They can't really get rid of it, [21:19.000 --> 21:22.000] but they just pretend it didn't really happen. [21:22.000 --> 21:26.000] My solution is to define default [21:26.000 --> 21:28.000] and nullable as completely separate things, [21:28.000 --> 21:32.000] and we're not going to use the Optional type anywhere in our docs. [21:32.000 --> 21:35.000] We're just going to use a union of the thing and None [21:35.000 --> 21:38.000] to avoid that confusion. [21:38.000 --> 21:41.000] And then mistakes, I think, I hope it kind of makes sense to you. [21:41.000 --> 21:44.000] Again, it's this schema within schema within schema, [21:44.000 --> 21:47.000] which becomes validator within validator. [21:47.000 --> 21:51.000] And we take our code, as I showed you earlier, [21:51.000 --> 21:52.000] and run validation. [21:52.000 --> 21:54.000] In this case, we call validate Python. [21:54.000 --> 21:55.000] We've got some Python code, [21:55.000 --> 21:57.000] but we could just as well have a JSON string [21:57.000 --> 21:59.000] and call validate JSON. 
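The confusion around `Optional` mentioned above can be shown with the standard `typing` module alone. In this small sketch the two function names are hypothetical; the point is that `Optional[X]` only means "X or None" and says nothing about whether a value has to be supplied.

```python
from typing import Optional, Union

# Optional[int] is just another spelling of Union[int, None]
same_type = Optional[int] == Union[int, None]


def nullable_but_required(when: Optional[int]):
    # No default value: the caller must still pass the argument.
    return when


def nullable_with_default(when: Optional[int] = None):
    # Only the default value makes the argument optional to pass.
    return when


explicit_none = nullable_but_required(None)
omitted = nullable_with_default()
```

Calling `nullable_but_required()` with no argument raises `TypeError`, so "nullable" and "has a default" are independent properties, which matches the separate default and nullable layers the speaker describes.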
[21:59.000 --> 22:00.000] And then we have a Talk instance, [22:00.000 --> 22:08.000] which lets us access the members of it as you normally would. [22:08.000 --> 22:11.000] So where does Rust excel in these applications? [22:11.000 --> 22:14.000] Why build this in Rust? [22:14.000 --> 22:17.000] There are a bunch of obvious reasons to use Rust, [22:17.000 --> 22:19.000] performance being the number one, [22:19.000 --> 22:21.000] multi-threading and not having the global interpreter lock [22:21.000 --> 22:24.000] in Python is another one. [22:24.000 --> 22:29.000] The third is using high-quality existing Rust libraries [22:29.000 --> 22:32.000] to build libraries in Python instead of implementing it yourself. [22:32.000 --> 22:35.000] So I maintain two other Python libraries written in Rust: [22:35.000 --> 22:40.000] watchfiles, which uses the notify crate to do file watching, [22:40.000 --> 22:42.000] and then rtoml, which, as you can guess, [22:42.000 --> 22:47.000] is a TOML parser using the toml library from Rust. [22:47.000 --> 22:54.000] And the rtoml library is the fastest Python TOML parser [22:54.000 --> 22:57.000] out there. [22:57.000 --> 22:59.000] And actually watchfiles is becoming more and more popular. [22:59.000 --> 23:00.000] It's the default now with uvicorn, [23:00.000 --> 23:03.000] which is one of the web servers. [23:03.000 --> 23:07.000] But perhaps less obviously, in terms of where Rust fits in best: [23:07.000 --> 23:09.000] deeply recursive code, as I've just shown you, [23:09.000 --> 23:11.000] with these validators within validators. [23:11.000 --> 23:12.000] There's no stack. [23:12.000 --> 23:15.000] And so we don't have a penalty for recursion. [23:15.000 --> 23:17.000] We do have to be very, very careful, [23:17.000 --> 23:19.000] because, as I'm sure you all know, [23:19.000 --> 23:21.000] if you have recursion in Rust and you don't catch it, [23:21.000 --> 23:22.000] you just get a segfault. 
[23:22.000 --> 23:26.000] And that would be very, very upsetting to Python developers [23:26.000 --> 23:28.000] who've never seen one before. [23:28.000 --> 23:31.000] So there's a significant amount of code [23:31.000 --> 23:34.000] in pydantic-core dedicated to catching recursion. [23:34.000 --> 23:37.000] We have to have, is it two or three different sorts of guard [23:37.000 --> 23:40.000] to protect against recursion in all possible different situations, [23:40.000 --> 23:43.000] because effectively the worst thing that we can have [23:43.000 --> 23:45.000] is that there is some data structure [23:45.000 --> 23:46.000] that you can pass to Pydantic [23:46.000 --> 23:50.000] which causes your entire Python process to segfault, [23:50.000 --> 23:53.000] and you wouldn't know where to even start looking. [23:53.000 --> 23:55.000] So that's a blessing. [23:55.000 --> 23:58.000] The lack of a stack is a blessing and a curse. [23:58.000 --> 24:00.000] And then the second big advantage, [24:00.000 --> 24:03.000] I think, of where Rust excels, [24:03.000 --> 24:05.000] is in the small modular components. [24:05.000 --> 24:07.000] So where I was showing you before [24:07.000 --> 24:11.000] these relatively small, in terms of code footprint, validators, [24:11.000 --> 24:14.000] which in turn hold other ones, [24:14.000 --> 24:16.000] there's obviously almost no performance penalty [24:16.000 --> 24:18.000] for having these functions in Rust. [24:18.000 --> 24:21.000] I say almost, because we actually have to use Box [24:21.000 --> 24:23.000] around validators because they hold themselves. [24:23.000 --> 24:28.000] So there is a bit of an overhead of going into the heap, [24:28.000 --> 24:31.000] but it's relatively small, particularly compared to Python. 
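The talk does not show how pydantic-core's recursion guards work, so the following is only a generic Python sketch of one common technique for the problem raised above: remembering which containers are currently being visited and refusing to re-enter one. The function name and the choice of nested lists of integers are assumptions made for the example.

```python
def validate_nested_ints(value, _active=None):
    """Validate arbitrarily nested lists of ints, rejecting cyclic data.

    `_active` holds the ids of the lists currently being visited, so a list
    that (directly or indirectly) contains itself is detected instead of
    recursing until the stack is exhausted.
    """
    active = set() if _active is None else _active
    if isinstance(value, list):
        marker = id(value)
        if marker in active:
            raise ValueError("recursive reference detected")
        active.add(marker)
        try:
            return [validate_nested_ints(item, active) for item in value]
        finally:
            # Leaving the list: it may legitimately appear again elsewhere.
            active.discard(marker)
    if isinstance(value, int):
        return value
    raise ValueError(f"expected int or list, got {value!r}")


nested = validate_nested_ints([1, [2, [3]]])

cyclic = [1, 2]
cyclic.append(cyclic)  # the list now contains itself
```

Passing `cyclic` to the function raises `ValueError` rather than recursing without end, while a list that is merely shared in two places is still accepted because its id is removed again on the way out.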
[24:31.000 --> 24:34.000] And then lastly, complex error handling. [24:34.000 --> 24:37.000] Obviously in Python, you don't know what's going to error [24:37.000 --> 24:41.000] and what exceptions you're going to get. In Rust, [24:41.000 --> 24:43.000] putting to one side the comment about panic earlier, [24:43.000 --> 24:45.000] you can in general know what errors you're going to get [24:45.000 --> 24:49.000] and catch them and construct validation errors [24:49.000 --> 24:52.000] in the case of Pydantic, which is a great deal easier [24:52.000 --> 24:56.000] than it would ever have been to write that code in Python. [24:56.000 --> 25:00.000] So the way I want to think about the future development [25:00.000 --> 25:03.000] of Python is not as Python versus Rust, [25:03.000 --> 25:06.000] but effectively as Python as the user interface for Rust, [25:06.000 --> 25:10.000] or the application developer interface for Rust. [25:10.000 --> 25:12.000] So I'd love to see more and more libraries [25:12.000 --> 25:14.000] do what we've done with pydantic-core [25:14.000 --> 25:20.000] and effectively implement their low-level components in Rust. [25:20.000 --> 25:23.000] So my dream is a world in which, [25:23.000 --> 25:26.000] thinking about the lifecycle of an HTTP request, [25:26.000 --> 25:29.000] but you could think the same about some ML pipeline [25:29.000 --> 25:31.000] or many other applications, [25:31.000 --> 25:35.000] effectively the vast majority of the execution [25:35.000 --> 25:37.000] is Rust or C, [25:37.000 --> 25:40.000] but then all of the application logic can be in Python. [25:40.000 --> 25:43.000] So effectively we get to a point where we have 100% [25:43.000 --> 25:45.000] of developer time spent in high-level languages, [25:45.000 --> 25:50.000] but only 1% of CPU dedicated to actually running Python code, [25:50.000 --> 25:52.000] which is slower and is always going to be slower. 
[25:52.000 --> 25:54.000] I don't think there's ever a world in which [25:54.000 --> 25:56.000] someone's going to come up with a language that is as fast [25:56.000 --> 25:59.000] and as safe as Rust, but also as quick to write as Python. [25:59.000 --> 26:02.000] So I don't think it should be one versus the other. [26:02.000 --> 26:05.000] It should be building the low-level, building the rails, [26:05.000 --> 26:08.000] perhaps a bad term, but the rails in Rust, [26:08.000 --> 26:12.000] and building the train in Python. [26:12.000 --> 26:15.000] It doesn't work, but you get where I'm coming from. [26:15.000 --> 26:19.000] Anyway, on that note, thank you very much. [26:19.000 --> 26:22.000] A few links there, particularly thanks to the PyO3 team [26:22.000 --> 26:25.000] who built the Rust bindings for Python, which are amazing. [26:25.000 --> 26:29.000] And if you want a laugh, there's a very, very funny issue [26:29.000 --> 26:31.000] on GitHub where a very angry man says [26:31.000 --> 26:33.000] why we should never use Rust. [26:33.000 --> 26:36.000] So if you want to read that, I then took some time [26:36.000 --> 26:38.000] to take them to pieces, which was quite satisfying, [26:38.000 --> 26:39.000] although a waste of time. [26:39.000 --> 26:41.000] So have a look at that. [26:41.000 --> 26:42.000] Questions? [26:42.000 --> 27:04.000] First, especially for the sanitation, [27:04.000 --> 27:07.000] are you thinking of publishing a Rust library? [27:07.000 --> 27:09.000] The job is already done, [27:09.000 --> 27:13.000] and you could have a public API in a library [27:13.000 --> 27:16.000] ready to validate Rust data. [27:16.000 --> 27:19.000] I don't understand quite what... [27:19.000 --> 27:22.000] So you wrote the library in Rust, [27:22.000 --> 27:27.000] so could you publish just an API to validate JSON, [27:27.000 --> 27:30.000] for example, from Rust, instead of through Python?
[27:30.000 --> 27:33.000] Absolutely, you could, and it would be useful [27:33.000 --> 27:36.000] if you wanted to somehow construct the schema [27:36.000 --> 27:39.000] at runtime fast, but it's never going to be anywhere [27:39.000 --> 27:41.000] near as performant as Serde, [27:41.000 --> 27:45.000] because you're not compiling... [27:45.000 --> 27:47.000] We can't do anything at compile time. [27:47.000 --> 27:50.000] Secondly, it's currently all completely intertwined [27:50.000 --> 27:52.000] with the PyO3 library and the Python types. [27:52.000 --> 27:56.000] So there is a future nascent possible project, [27:56.000 --> 27:58.000] Tidantic, which is Pydantic for TypeScript, [27:58.000 --> 28:00.000] where we take the PyO3 types, [28:00.000 --> 28:02.000] we effectively replace them with a new library [28:02.000 --> 28:05.000] which has a compile time switch between the Python bindings [28:05.000 --> 28:08.000] and the JavaScript bindings or the Wasm bindings, [28:08.000 --> 28:10.000] and then we can build Tidantic. [28:10.000 --> 28:12.000] That's a future plan, but a long way off. [28:12.000 --> 28:14.000] Right now, it wouldn't really be worth it, [28:14.000 --> 28:16.000] because you would get lots of slowdown from Python [28:16.000 --> 28:17.000] and from compile time. [28:17.000 --> 28:19.000] So we need a completely different library, [28:19.000 --> 28:21.000] just for Rust, like you're saying. [28:21.000 --> 28:22.000] Yeah, Serde is amazing. [28:22.000 --> 28:25.000] I don't think I'm going to go and try and compete with that. [28:25.000 --> 28:30.000] At least, it's great for that application. [28:30.000 --> 28:32.000] Thanks for the talk. [28:32.000 --> 28:34.000] Recently, I think the Python library cryptography [28:34.000 --> 28:37.000] introduced Rust, and had some complaints [28:37.000 --> 28:39.000] from people using obscure build processes [28:39.000 --> 28:41.000] where Rust didn't work.
[28:41.000 --> 28:43.000] Are you expecting anything from that? [28:43.000 --> 28:46.000] So I will actually bring up now. [28:46.000 --> 28:48.000] Now I'm going to get into how to... [28:48.000 --> 28:50.000] Effectively, go and read that issue, [28:50.000 --> 28:53.000] where, among other things, I... [28:53.000 --> 28:58.000] Oh, how do I get out of this mode? [28:58.000 --> 29:01.000] So, rant, rant, rant, rant from him. [29:01.000 --> 29:05.000] Effectively, I went through the... [29:05.000 --> 29:07.000] just over a quarter of a billion downloads [29:07.000 --> 29:09.000] over the last 12 months of Pydantic, [29:09.000 --> 29:11.000] and I worked out looking at the distribution [29:11.000 --> 29:13.000] of the different operating systems [29:13.000 --> 29:17.000] and, like, libc implementations, et cetera, [29:17.000 --> 29:21.000] that 99.9859% of people would have got a binary [29:21.000 --> 29:23.000] if they had installed Pydantic Core then. [29:23.000 --> 29:24.000] That number will be higher now, [29:24.000 --> 29:27.000] because there will be fewer esoteric operating systems. [29:27.000 --> 29:30.000] Most of the other ones, most of the failed ones, [29:30.000 --> 29:32.000] if you look, are actually installing, [29:32.000 --> 29:35.000] say, they're installing Python onto iOS. [29:35.000 --> 29:36.000] I don't know what that means, [29:36.000 --> 29:39.000] or whether it could ever work, but... [29:39.000 --> 29:41.000] Also, the other thing I would say is, [29:41.000 --> 29:43.000] Pydantic Core is already compiled to WebAssembly, [29:43.000 --> 29:45.000] so you can already run it in the browser. [29:45.000 --> 29:47.000] So I understand why people complained, [29:47.000 --> 29:49.000] but I think it's not a concern for... [29:49.000 --> 29:51.000] it's a straw man for most people. [29:51.000 --> 29:53.000] So that's why you slapped down. [29:53.000 --> 29:54.000] Yeah. [29:54.000 --> 29:56.000] And, again, if there's another... 
[29:56.000 --> 29:58.000] if there's a distribution that we don't... [29:58.000 --> 30:00.000] we release 60 different binaries, [30:00.000 --> 30:03.000] if there's another one, we'll try and compile for it [30:03.000 --> 30:05.000] and release the binary. [30:08.000 --> 30:11.000] There's a question right at the back, I think, just to... [30:13.000 --> 30:15.000] I'll get back to the talk rather than... [30:15.000 --> 30:17.000] where are we? [30:17.000 --> 30:30.000] Is there a way to use the Django models as Pydantic models? [30:30.000 --> 30:33.000] Say again? [30:33.000 --> 30:36.000] To use the Django to have, like, a binding [30:36.000 --> 30:41.000] or to translate the Django model directly into a Pydantic model? [30:41.000 --> 30:43.000] There's no way at the moment. [30:43.000 --> 30:45.000] There are a number of different ORMs [30:45.000 --> 30:47.000] I know of, built on top of Pydantic, [30:47.000 --> 30:49.000] which effectively allow that... [30:49.000 --> 30:51.000] if you were wanting specifically Django, [30:51.000 --> 30:53.000] there's a project called Django Ninja [30:53.000 --> 30:55.000] that makes extensive use of Pydantic. [30:55.000 --> 30:57.000] I don't know that much about it, [30:57.000 --> 30:59.000] but if you actually wanted Pydantic models, [30:59.000 --> 31:02.000] you'd probably want some kind of code reformat to convert them. [31:02.000 --> 31:04.000] So I'd look at Django Ninja, [31:04.000 --> 31:07.000] I'm sure what they're doing is the best of what's possible right now. [31:07.000 --> 31:09.000] Okay, thank you. [31:13.000 --> 31:17.000] If you had additional time, say, after finishing Pydantic, [31:17.000 --> 31:21.000] are there any other projects where you'd like to follow this vision [31:21.000 --> 31:25.000] of, like, a Rust core with Python user space or, like, API? [31:25.000 --> 31:28.000] Yeah, there are a number of ones.
[31:28.000 --> 31:31.000] So there's already orjson, which is a very, very fast, [31:31.000 --> 31:36.000] if unsafe (in the sense of littered with unsafe), JSON parser, [31:36.000 --> 31:38.000] which is very, very fast. [31:38.000 --> 31:41.000] The obvious one is a web framework where you do, [31:41.000 --> 31:43.000] like I kind of showed here, [31:43.000 --> 31:47.000] like the HTTP parsing, the routing, all in Rust. [31:47.000 --> 31:51.000] That's not very easy using ASGI. [31:51.000 --> 31:54.000] There are already a few projects doing that. [31:54.000 --> 31:57.000] So that would be the obvious one, but there's no winner yet. [31:57.000 --> 32:01.000] Currently, the best low-level web framework is Starlette, [32:01.000 --> 32:03.000] which FastAPI is built on, [32:03.000 --> 32:06.000] but I think it does use Rust for, it uses a Rust library [32:06.000 --> 32:10.000] for HTTP parsing or a C library. [32:10.000 --> 32:14.000] So some of it's already happening, but no obvious candidate right now. [32:16.000 --> 32:19.000] What I would say, though, is libraries like Rich, [32:19.000 --> 32:22.000] no criticism of Will, but, like, Rich is incredibly complicated. [32:22.000 --> 32:24.000] It's for terminal output. [32:24.000 --> 32:26.000] It's not so much performance critical, [32:26.000 --> 32:28.000] but it's really quite involved in complex logic. [32:28.000 --> 32:31.000] I would much prefer to write that logic in Rust than Python. [32:31.000 --> 32:34.000] Yeah, I think there are lots of candidates. [32:39.000 --> 32:41.000] Tonya online is asking, [32:41.000 --> 32:45.000] what do you mean by Python as the application layer? [32:45.000 --> 32:50.000] So I guess I could have added some example code here, [32:50.000 --> 32:55.000] but you can imagine a Python function, [32:55.000 --> 32:58.000] which is a view endpoint in a web framework, [32:58.000 --> 33:01.000] which takes in some validated arguments, [33:01.000 --> 33:03.000] with the validation done by Pydantic.
[33:03.000 --> 33:07.000] You then decide in Python to make a query to the database [33:07.000 --> 33:09.000] to get back the user's name from the ID, [33:09.000 --> 33:13.000] and then you return a JSON object containing data about the user. [33:13.000 --> 33:15.000] If you think about that, [33:15.000 --> 33:18.000] all of the code outside the Python functions, [33:18.000 --> 33:21.000] excuse me, could be written in a faster language, [33:21.000 --> 33:23.000] whether it be the database query accessing the database, [33:23.000 --> 33:27.000] TLS termination, HTTP parsing, routing, validation, [33:27.000 --> 33:31.000] but effectively using Python as a way [33:31.000 --> 33:35.000] to effectively configure Rust code or configure compiled code. [33:47.000 --> 33:48.000] Yes, hello. [33:48.000 --> 33:52.000] I have a question, just a Pydantic one. [33:52.000 --> 33:55.000] Is there any support, or are you planning any support, [33:55.000 --> 33:57.000] for alternative schema types like Protobuf, [33:57.000 --> 34:00.000] or gRPC, or Avro? [34:00.000 --> 34:02.000] Possibly in future. [34:02.000 --> 34:04.000] What I have a plan for is, [34:04.000 --> 34:06.000] I don't want to build them into Pydantic. [34:06.000 --> 34:08.000] Pydantic's already big, but there is a, [34:08.000 --> 34:10.000] obviously you can parse them to Python now, [34:10.000 --> 34:12.000] parse them and then validate them as a Python object. [34:12.000 --> 34:16.000] There is a plan effectively to take the, [34:16.000 --> 34:32.000] this, that's the one. [34:32.000 --> 34:34.000] Which you would then construct in Rust, [34:34.000 --> 34:37.000] pass as a Python value into Pydantic Core, [34:37.000 --> 34:40.000] which would then extract the raw underlying Rust instance [34:40.000 --> 34:42.000] and then validate that.
[34:42.000 --> 34:44.000] And that would allow you to get basically [34:44.000 --> 34:46.000] a Python validation effectively, [34:46.000 --> 34:48.000] but without [34:48.000 --> 34:50.000] us having to either have compile time dependencies [34:50.000 --> 34:54.000] or build it all into Pydantic Core. [34:54.000 --> 34:57.000] I think that's our last question that we have time for. [34:57.000 --> 35:01.000] One comment we did get from Matrix was that [35:01.000 --> 35:04.000] this code is a bit small on the, [35:04.000 --> 35:06.000] on the display, so if you upload the, [35:06.000 --> 35:08.000] I will do, yeah. [35:08.000 --> 35:09.000] Perfect. [35:09.000 --> 35:11.000] So if you're watching the stream, [35:11.000 --> 35:13.000] the slides will be uploaded and you can read the code. [35:13.000 --> 35:15.000] Oh, I'll put them on Twitter as well, but yeah, definitely. [35:15.000 --> 35:16.000] I'll upload them as well. [35:16.000 --> 35:17.000] Awesome. [35:17.000 --> 35:46.000] Thank you very much.