[00:00.000 --> 00:11.960] And this is the last talk of the day in this room, and Mackenzie Morgan is going to talk to us
[00:11.960 --> 00:14.000] about dealing with a monster query.
[00:14.000 --> 00:15.000] Give it up.
[00:15.000 --> 00:16.000] So, hello.
[00:16.000 --> 00:17.000] I'm Mackenzie.
[00:17.000 --> 00:22.000] I actually work at Nextro now, and I did not put that on the slide, because it's kind of
[00:22.000 --> 00:29.000] weird, since this is about something that happened at a previous job.
[00:30.000 --> 00:31.000] Let's see.
[00:31.000 --> 00:33.000] Let's go over here and hit the spacebar.
[00:33.000 --> 00:35.000] There we go.
[00:35.000 --> 00:42.000] So, back in 2020, I learned Elixir because the company that I was working at, which is Axios,
[00:42.000 --> 00:51.000] it's a news company, they launched our new mobile app, and then it promptly crashed every
[00:51.000 --> 01:02.000] morning at 6 a.m., and I am not a morning person, so I did not want those pings.
[01:02.000 --> 01:06.000] And so, we needed to do something about this, so there was a quick rewrite into Elixir.
[01:06.000 --> 01:08.000] I was not involved in the rewriting.
[01:08.000 --> 01:10.000] I didn't know Elixir yet.
[01:10.000 --> 01:14.000] They grabbed a couple of contractors and said, hey, learn Elixir, because we're going to
[01:14.000 --> 01:15.000] be handing this off to you.
[01:15.000 --> 01:16.000] Okay.
[01:22.000 --> 01:27.000] And everything worked out really great, except that there was this one query.
[01:35.000 --> 01:44.000] So, we had this one query that had a whole lot of OR clauses in it, and
[01:44.000 --> 01:47.000] this was responsible for the majority of our database load.
[01:47.000 --> 01:56.000] And we also had the biggest day in U.S. politics coming up, the U.S. presidential election.
[01:56.000 --> 02:02.000] If you have dealt with news organizations, you know that big political events
[02:02.000 --> 02:05.000] mean a ton of traffic, right?
[02:05.000 --> 02:09.000] And so, that is a huge day working in a news org, and this was the second newspaper I'd
[02:09.000 --> 02:13.000] worked for, so I knew how this went.
[02:13.000 --> 02:19.000] Usually, the advice for optimizing stuff is to move as much computation as possible out
[02:19.000 --> 02:22.000] of the code and into the database, right?
[02:22.000 --> 02:27.000] But this is the story of how refactoring in the opposite direction was what actually saved us.
[02:35.000 --> 02:40.000] So, it's pretty standard in a CMS to have a structure that looks kind of like this, right,
[02:40.000 --> 02:45.000] where you've got a post, and it can be in a category, and it can be
[02:45.000 --> 02:49.000] tagged, and it can be this, and it can be that, and you're trying to find posts in
[02:49.000 --> 02:52.000] any of these different ways.
[02:52.000 --> 02:56.000] So, we had four different taxonomies that we were using to decide what we were going to
[02:56.000 --> 02:58.000] show you in the mobile app.
[02:58.000 --> 03:01.000] You could subscribe to a channel, a tag, or whatever.
[03:01.000 --> 03:06.000] And so, we had all these four ORs where you go get the post through taxonomy one,
[03:06.000 --> 03:09.000] one's the aggregator, then two, three, four, all those things.
[03:09.000 --> 03:14.000] And that's where our four big ORs came in.
[03:14.000 --> 03:18.000] Which looked like this.
[03:18.000 --> 03:20.000] And that's the simplified version.
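The query on the slide isn't captured in the transcript. As a rough sketch only, a four-taxonomy OR query of the shape being described might look something like this in Ecto; the module, schema, and association names (Post, Repo, channels, tags, authors, sections) are hypothetical stand-ins, not taken from the talk, and the sorting and time limits mentioned next are omitted just as they were on the slide.

```elixir
defmodule Feed.MonsterQuery do
  import Ecto.Query

  # One query, four left joins, and one big OR spanning all four taxonomies.
  # This is the "simplified version": no ordering, no time window.
  def posts_for_user(channel_ids, tag_ids, author_ids, section_ids) do
    from(p in Post,
      left_join: c in assoc(p, :channels),
      left_join: t in assoc(p, :tags),
      left_join: a in assoc(p, :authors),
      left_join: s in assoc(p, :sections),
      where:
        c.id in ^channel_ids or
          t.id in ^tag_ids or
          a.id in ^author_ids or
          s.id in ^section_ids,
      distinct: p.id,
      select: p
    )
    |> Repo.all()
  end
end
```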
[03:20.000 --> 03:23.000] That doesn't have the sorting, that doesn't have the time limits.
[03:23.000 --> 03:25.000] That's the simplified version.
[03:25.000 --> 03:27.000] But that's what that looks like, and it's ridiculous.
[03:27.000 --> 03:33.000] And so, AWS stats told us that this was going absolutely bonkers.
[03:34.000 --> 03:42.000] I did a Postgres EXPLAIN ANALYZE on the query that Ecto generated.
[03:42.000 --> 03:50.000] And Postgres said the cost was over 3,600, and that it would take
[03:50.000 --> 03:53.000] eight milliseconds to execute.
[03:53.000 --> 03:59.000] But just computing what it needed to run was the huge problem for it.
[03:59.000 --> 04:03.000] So, I'm going to go through how we refactored this to be super fast.
[04:12.000 --> 04:16.000] So, okay, we had four taxonomies, so four smaller queries.
[04:16.000 --> 04:19.000] Really, they each look like that.
[04:19.000 --> 04:21.000] That's very simple.
[04:21.000 --> 04:25.000] And those each take one-eighth of a millisecond.
[04:25.000 --> 04:28.000] So, this is a good start.
[04:28.000 --> 04:31.000] It's still kind of ugly if you write that four times, though.
[04:34.000 --> 04:40.000] But what if we take advantage of atoms and the pin operator in Elixir,
[04:40.000 --> 04:44.000] because Elixir's got some pretty cool syntax features.
[04:47.000 --> 04:52.000] And we make it into a query function that we can call four times,
[04:52.000 --> 04:53.000] passing those in.
[04:53.000 --> 04:57.000] And that's a bit better, but we're still calling it four times separately.
[04:57.000 --> 05:01.000] And so, if we go a little bit further and we take advantage of the concurrency
[05:01.000 --> 05:06.000] that we all know the BEAM has, we can pass in the list of the taxonomies
[05:06.000 --> 05:09.000] that we're going through and use Task.async_stream.
[05:11.000 --> 05:15.000] And now, we can have all four queries running at the exact same time,
[05:15.000 --> 05:21.000] running simultaneously, just by passing in that list, which makes it really easy
[05:21.000 --> 05:23.000] to, instead of copying and pasting more and more code,
[05:23.000 --> 05:26.000] just add to the list when we add another taxonomy.
[05:26.000 --> 05:27.000] And guess what?
[05:27.000 --> 05:30.000] By the time I left the company, yes, there were five.
[05:36.000 --> 05:42.000] So, what did changing over from that big nasty block to this get us?
[05:42.000 --> 05:46.000] The database CPU utilization went down from 50% to 40%,
[05:46.000 --> 05:49.000] so that's a 20% drop, because math.
[05:49.000 --> 05:55.000] I know it looks like 10, but 40 is 80% of 50, so it's a 20% reduction.
[05:57.000 --> 06:02.000] And remember I said the Postgres cost estimate was over 3,600?
[06:02.000 --> 06:03.000] It was 16 after that.
[06:03.000 --> 06:06.000] That was a much happier database server.
[06:06.000 --> 06:11.000] And the execution time, like I said, went from eight milliseconds to one-eighth of a millisecond each,
[06:11.000 --> 06:16.000] so a total of half a millisecond if you were to run them back to back.
[06:16.000 --> 06:22.000] So, yeah, so much faster, and our database overhead down by 20%.
[06:22.000 --> 06:23.000] Great.
[06:26.000 --> 06:33.000] We also had a seven times increase in the number of requests per second that we could handle, according to our benchmarking scripts.
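Pulling the refactor together: here is a rough sketch, under the same hypothetical names as the earlier sketch, of the per-taxonomy query plus the Task.async_stream fan-out described above. Passing the association name into assoc/2 as a pinned atom is an assumption that requires a reasonably recent Ecto; if it isn't available, a case over the taxonomy atoms achieves the same thing.

```elixir
defmodule Feed.SplitQueries do
  import Ecto.Query

  # Hypothetical taxonomy list; by the end of the talk there were five.
  @taxonomies [:channels, :tags, :authors, :sections]

  # One small query per taxonomy. The taxonomy is just an atom, and the
  # subscribed ids are pinned in with ^, so one function covers all of them.
  defp posts_for_taxonomy(taxonomy, ids) do
    from(p in Post,
      join: x in assoc(p, ^taxonomy),
      where: x.id in ^ids,
      select: p
    )
    |> Repo.all()
  end

  # Fan the small queries out concurrently on the BEAM and merge the results.
  # `subscriptions` is a map like %{channels: [1, 2], tags: [7], ...}.
  def posts_for_user(subscriptions) do
    @taxonomies
    |> Task.async_stream(fn taxonomy ->
      posts_for_taxonomy(taxonomy, Map.get(subscriptions, taxonomy, []))
    end)
    |> Enum.flat_map(fn {:ok, posts} -> posts end)
    |> Enum.uniq_by(& &1.id)
  end
end
```

Adding another taxonomy then means adding one atom to the list rather than another OR clause, and the de-duplication (plus the sorting and time limits left out of the simplified slide) moves into application code, which is the "opposite direction" trade the talk describes.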
[06:42.000 --> 06:45.000] So, we got to have a stress-free election night.
[06:45.000 --> 06:52.000] I did not have to be restarting servers at two o'clock in the morning as we waited and waited for results.
[06:53.000 --> 06:55.000] So, that's it.
[06:55.000 --> 06:57.000] That's all I'm going to show you.
[06:57.000 --> 07:00.000] And that's how to find me.