Before we get started, these folks in the blue vest, the ones that just left the ones here, give them a big round of applause. They have a lot of time and effort. They've been doing something like napping or eating. They're doing something more fun than this. They've got to do this for the lovely youth in the community. So today we're talking about building your own Postgres DBA out of available MySQL DBA. And Dave Stokes, I'm a technology evangelist for Percona. Two years ago I joined Percona. Before that I was on the community team for MySQL. Stars MySQL AD, you don't know through sun and the more flax acquisition. Origin story. 2007 I hired to join MySQL AB, which was then a cute little startup company. And became the, well, what's it mean a PHP programmer to be running the certification group? If you want to be certified to be a ZBA or developer, I was the guy who signed off on your application. The big trouble I had was that hiring managers would call me and they'd say, it's hard to find a MySQL DBA. And I said, well, you have a list of folks who recently certified on the website. And they're like, yeah, by the way, it's impossible to find Postgres DBAs. And over the past couple of years, or several years, I've noticed it's still that problem with getting DBAs. So an economics comes down to a make versus buy decision. If you go out in the free market trying to buy a DBA, it can be expensive. You're also not quite sure what you're going to get. But if you have folks who run MySQL databases and they're fairly tolerant at it, you might have a good chance of converting them over to running Postgres. This is the definition of make versus buy for those who are interested in that. So why MySQL DBAs? There's a lot of them out there. They have basic knowledge of what a database does, what you do to it, what you don't do to it. They know the care in feeding, watering, and taking care of the basics. They're usually Postgres curious. For years they've heard, yeah, MySQL is okay, doesn't do this, doesn't do this right. And Postgres is so much better. And they can go, yeah, well sure, eventually the curiosity gets to them. Also there's a lot of similarities between MySQL and Postgres. Both were started by guys named Michael who tend to piss people off by the things they say. Also, when you show them some of the goodies that Postgres has, it becomes very, very attractive to them. Also in the past couple of years we've had a lot of folks who weren't DBAs for businesses who have seen their data taken from underneath them and pushed up in the cloud. What they're doing now is either boring or redundant for now if they want to do. And they're looking for another opportunity to be database administrators. So how do you recognize MySQL DBAs? Now I recognize some folks in this room. It's a couple of them are kind of scooting down in their chairs. First sense are t-shirts and coffee cups. You'll see a whole bunch of wide variety of them. And some of them are kind of cute, some of them are clever, others are kind of scary. But if you know that they're not a Postgres DBA but you know they're the admin of database, you might see some of these signs around. So Postgres versus MySQL differences. Both are relational database management systems. Both open source. Both very popular and both are technically old enough to drink. And since that's turned... Uh-oh, did I not turn it on? Yeah. Oh, there we go. So Postgres. Postgres has better support of the SQL standards. It's governed by a mailing list, which I always thought was kind of crazy until I've seen in actions the past over years. It actually works fairly well. Active community as you've probably seen today in this room alone. MySQL has seen this easier. It's governed by Oracle. If you're one of the folks who spent Thursday and Friday in the MySQL Belgium days, you've seen a lot of the stuff that is coming out from Oracle that may not impress you. But they also have an active community. So someone once said the devil is in the details. Okay, so you found your candidate to make your brand new Postgres DBA. What do you do with them? Well, you're going to mention to them that they're going to, at the end of the process, have better skills, be cross-trained, have better job opportunities, and now complain about knowing two databases. How many folks here run more than one database? Okay. And by the way, what we start is you're going to tell them you have different approaches to the same problem. Different isn't better, isn't worse, it's just different. They're going to learn a whole bunch of new tools. The basics are still the basics. You still have to do backups. You still have to do restores. Account administration is very similar. Tuning is wildly different, but not in theory. Query tuning is a big difference in Postgres. It's a little more complicated. Then there's the really neat stuff. Postgres has two JSON data types versus just one. The merge operator, more indexes as you can shake a stick at. But unfortunately there's some stuff that I call the, oh my god, why do we still have this stuff in 2023? It should be 2024 that you need to warn them about. Let me get to that in a bit. First steps. Give them an environment that's similar to what they're used to working with. The way to do this is go down and download the video rental database. Now for you folks without gray hair, 20 years ago if you wanted to watch a movie in your house, you had to leave the house, go to a store where they had these things made out of plastic, either VHS or Sony beta format that had the movie, and you take it home and you put it in a special player. You actually had to go out and get that. And they may not have your movie in stock. Now 15 years ago, rather than DVDs, rather than having video rentals, they actually had DVDs, which is a little bit easier because that was a standardized format. MySQL used the Secura database, which is also a DVD or video rental database in their documentation, training, and blogs for years and years and years. So you're going to give them a similar environment to what they're used to. And you'll talk them through how to do a simple create database, show them how to use PG restore to load in the data. Real simple. By the way, if you've never seen one of these, this is what a video store used to look like. Okay, now that they have that DVD and rental up there, have them log in. Well, they're still logged in to use your Postgres and create a user. Now, I advise you to do it as a super user. You might have heard Mr. Booze earlier having great ideas of why you don't want to do that. MySQL DVAs are used to having God privileges, and if they screw it up, you go back and you show them how to reinstall the database. As I mentioned, this is dangerous. You bypass a lot of security stuff, but you want them screwing up at this level, so you learn not to do it at a higher level, which is much more expensive for you. Okay, back in their user account, they type in PSQL minus D DVD rental. That tells them that they're talking to the database of DVD, DVD, rental, and they get that lovely little prompt. The prompt with the equal sign OctoThorpe is warning them that they have super user privilege. You'll have to point that out to them. So, at this point, we have a Sekela-like database, something they're used to, and you can have them do assignments, play around, do similar stuff that they're used to. And the great thing about this is it's a familiar to them, lots of stuff to join. It's easy to do, and up to now, it's been dirt cheap for you. Now, the first thing they're going to do is they're going to type in the command show tables. No, show tables. Oh, my God, this thing's broken. What's going on? This is where you tell them different, equal, isn't worse, or better, it's just different. So, you'll have to tell them how to use the slash D commands. Turn them out a cheat sheet, walk them through them, show them tables, show them indexes, show them sequences, and you'll have to explain what a sequence is. But it's just different, and it's going to take them a while to get used to this. Have them use cheat sheets, there's nothing wrong with that. So, if there's no show or create table, you show them how to do a slash D actor, and they get the information they're used to looking. Now, the format, I see one person who's a MySQL DBA that I know, is noticing that the format is a lot different than what they're used to, but it's the same information, column name, type, whether it's nullable or not. This is going to blow them away, but you're going to explain them what a sequence is. It tells them where the indexes are. The information's there. It's just in a different format. Once again, different isn't better or worse, it's just different. Now, have them do a simple query. That works like what they're used to seeing. Hooray. About this point, they're going, you know, this isn't so bad, I can get used to this. And at this point, you have them hooked. Have them do a simple backup. PG underscore dump, very much like MySQL dump. I know you probably use PG restore or something else, but this is a very simple, fairly generic, fairly common tool they can find just about anywhere. And show them that you're piping it to a file. And explain to them, you know, what's going on. By the way, if you're going to be at Confu in Montreal in two weeks, I'm giving the other side of this talk where it's the MySQL DBAs learning this information. This talk is mainly for folks who run Postgres, who want to see how they steer the boat of learning around the shoals and reefs and rocks that are out there for these folks who want to learn. Simple restore. They're used to doing this in MySQL. Now you've shown them how to get around not having show tables, how to do a backup and restore. For a lot of companies, that's 50% of what a DBA does when they get started. Whoops. Once again, print out this cheat sheet. The slides, I'll show you how to get that later. It's, once again, different, and they'll get used to it. The other thing is, when they start looking at data types, they're going to notice that auto increment has disappeared and it's been replaced by serial. There's a serial data type in MySQL. It's not heavily used. There's some issues with using it. But it's the same idea, two bytes, four bytes, eight bytes. That's great. Just call something different. Now we get to sneak in sequences. Now MariaDB has sequences MySQL doesn't. And to get around this, MySQL has something called auto increment. Every time you add a record, and don't specify that column name or specify null as the value for that column name, it auto-mentally increments. Now, you show them this and they're going to be happy. They're going to... We create a simple table and we tell them, okay, this is going to use a function called nextVal. It's going to go out to the sequence and pull the next one off in the stack. It's real simple. And they'll catch that. I give them a quick demo, point out that we're not giving any values for column X into table X, but the system automatically generates them. As you see over here, they're going to be happy with that. They're used to seeing that sort of that work. And you can tell them that if they go out to a slash D, they're going to see the entry for the table and the sequence that supports it. And you're telling them that this X here is the X same X here, this is the column name, this is to say that it's a sequence. So about this time, they realize that things work the same. They just...at the end, but the intermediate function, ality is a lot different. Something else you can do for them that will amaze them is create a table and have it populate itself with a generateSeries function. Can't do this in mySQL. I don't know how many hours I've wasted in my life generating test data in tables, and to have this suddenly unleashed was a big revelation. Wrapping sequences. This is the part where you're going to have to toss them very slowly because it takes a while to catch this. We create a sequence. Minimum value is one, maximum value is two, we tell it to cycle. As we do the next value function, it goes between one and two, one and two. Now there are some edge cases where this is extremely handy. I used to work in a place where we had product that came in a part one and a part two, and there was always a pain in the butt to generate the data for that. Show them how to check the details on sequences. Here's the one we just created, minimum one, maximum two. It increments and it cycles. Sticking points. This is where you're going to have to be patient. In some cases it's like talking to a five-year-old, other times it's like talking to a 15-year-old, and other times it's like talking to a 35-year-old. Explain is a lot different. The MySQL addition that they're used to shows them column, table, partitions, key length, references, but possible keys, gives them the query plan down here. Now, using the explain that's found in Postgres, it's the same rough material. It's just a different format. You'll have to explain to them why you want to put Analyze in here, or Buffers, or whatever else you want in there, and tell them that it's the same rough information. It gives us the query plan that doesn't actually show the SQL. The one thing that they're going to freak out at is they're going to see cost, the startup cost, and the overall cost. They're not used to seeing that information. Then you can explain to them that when you have things like indexes, there's some set up time before you use the index, and that's where this cost will pop up. Once again, different, not better, not worse, just different, they'll get used to the format soon enough. Also, my SQL is not used to seeing YAML or XML output from this. Now, talk them through how to read this, how the nodes are selected and mentioned here. It's a different format than what they're used to, and they will learn how to pick out the various information from the various tables. In this case, I put Film in Magenta, Film ID in Blue, and Actor in Red. Show them that the various types of joins that are available to them, and it's just in a different format. I don't want to calmly discuss. This is the hard part on you because you're used to this, they're not. I mentioned sequences already, they'll pick that up. Materialized views. There is a way to get materialized views in my SQL, but it's a third-party software. It doesn't always work the way you want. That is something that's going to pique their interest because they might have worked in a company where they have something like delayed stock quotes, where a materialized view would have been handy for them if they've had it. Explain, as I showed you, it's different for them. It's just formatted in a way they're not used to. They're used to connecting to the server and getting a thread. They're not used to the overhead of actually getting a Linux process. That's something that's a little different. They also need to be taught that everyone's using some sort of pooler. Not a big problem, but it's there. Now, vacuum. Be very careful when you mention vacuum. They're not used to calling rows tuples, and the idea of the heap will really throw them. That's the hardest part when I start talking to folks about this one-on-one. It's the equivalent of a teenager throwing all his laundry in the middle of the bedroom, and he knows exactly where it is, and he can pull out whatever he wants, but no one else can. Then tell him, yeah, when the stuff gets dirty, you actually run a vacuum over it and cleans it up or cleans up the tables. That will freak them out. There's lots of good documentation out there on the web about that. The next thing to do is teach them automatically about auto-vacuum. That will save a lot of problems. Toast. MySQL has something similar to toast. If you have things that don't neatly fit into a block, that's how it extends it calmly, and we'll actually tell them about that. Then buy them an adult drink and talk about wrap-around XIDs. For the MySQL folks who are in here, every transaction has a unique ID, and it's a 32-bit number. Unfortunately, in lots of possible... It's possible to wrap those around, and once you wrap those around, the older numbers, the data that's tied to them, you can't get to without doing a whole bunch of really nasty mechanical work to get that, and then you lost the new stuff. There's ways to modify... Well, ways to monitor how that goes, but the first couple of times you run into that, or here I mentioned of it, it's frightening. The other thing that you're going to show them that's going to really pique their interest is tricks like this. MySQL, you can't do a filter like this. This isn't a thing where we're going through movies, and we're trying to get the summary of the links for the R-rated and the PG movies. To do this in MySQL, you're going to have to write some CTEs and do some other nasty stuff or some winnowing functions. Here it's just a simple query. Now there's some reading out there for them in the watch. I produced and I'm about to reproduce a series of stuff on MySQL, teaching MySQL DBAs postgres. Halfway through my production of that, the company changed their logo, so I need to go back, update some of the material, change the logo. Checking for bloat and tables and how to scale postgres. The other thing I recommend is this book. You'll hear that it's postgres 14. We're on 16, 17. We'll come up with this later. This gentleman has done a marvelous job on documenting everything. It's been available free as a PDF for a long time. You can actually get print copies, recommend print copies. They make a nice impressive thump when you throw it when you get frustrated, but the material is there. I think it costs 20, 25 bucks to get the printed copy delivered to your house. Check out scaling postgres. That's an interesting weekly website. It gives you a lot of great information. These are the videos that I mentioned earlier. I'm redoing those. Follow my ex information and you'll see that. Now, if you want a longer version of this presentation, I have it there. I gave an early version of this at Percona Live last year. That was mainly from the MySQL folks learning postgres. I'm here right now to beg you for your help. Percona, the company that employs me, is working on transparent data encryption for postgres. Other databases have that natively. Postgres doesn't. We have it in alpha, about to go beta. Unfortunately, the guy who was here, who was the expert on it, is unfortunately recovering from his bad dinner last night. I was going to have him answer questions. Download the code. You can run it off Docker. I run it off Ubuntu's version of postgres. It encrypts your data. Someone just casually going through your data can't read what you have in there. It's very handy and we need people to try it. With that, I want to answer your questions. For folks who have suggestions on how to teach postgres, I'd love to hear that too. Thank you. Any hints? Any questions, please? Thank you. Any questions? Yes? How would you introduce my SQL DBA as a sort of postgres replication cluster, etc.? How would I introduce my SQL DBAs into the clustering models found in the postgres world? Replication. If they have a Galera background, it's a little easier. If they're used to an inner-DV cluster, it's a little more complex. You're going to have to speak to the options between running FCD to control everything. That's similar to kind of like my SQL router. Something like Petroni that does the coordination. There are some articles out there that people have them, but there's really no easy correlation. I think five years from now, postgres will kind of be on par with inner-DV cluster, and we won't have those problems. Right now, it's kind of a mess because you have third-party software all over the place. By the way, is booking.com seriously thinking of switching over? Yes, it's a postgres. For here. First of all, thanks a lot for the talk. I'm actually quite young, a SQL DBA. It's just two years since I've worked on that. It's actually pretty interesting to see the differences that you are showing up. One of the things that you mentioned, like it felt like having emphasis on is the explain, how could it be quite different? Actually, the latest version is not so different. There is this theme of the execution plan that is slightly different. But I guess that one of the key points for me, at least, while checking postgres recently, is about the amount of operations and operators that it actually has. That is like more than 50% of my time spent with engineers, helping them to fix their queries and to find creative ways to actually get around to the limitations. Postgres has an embarrassing amount of great features out there. There's like eight different types of indexes out there. In MySQL, you used to hashes and... I guess the index is the only one that works. So there's a lot of really neat stuff out there for MySQL DBA to discover. Getting it mastered and getting it running right, it's going to take a little bit of time learning in elbow grease. But it's a wonderful opportunity if you're in MySQL and you don't like the heat wave pressure that MySQL is pushing on everything, and you want to try something different. Thank you. Thank you. Thank you. Any other questions, please? Thank you. I was wondering if you could... I mean, I've worked a lot on Over the Years on Oracle and Postgres, but not so much on MySQL. And the thing which always intrigues me, and I was working with MySQL DBA, is the indexing. Now, you have... originally you had MyISAM, then you had INODB, but these are not traditional sort of heap and B-tree on top style indexes. It works differently, but I don't know. Can you just... is that a conversation? You did allude to it in your talk. Yeah. I mean, different isn't better or worse. MySQL, especially with the INODB storage engine, would love to have every table have a primary key on it, and they store everything in the primary key. When you update a row, race that row, put in the new stuff, write the old stuff off in the undo log, and away you go. You're not having an old version of that sitting out there in the heap. That's kind of the 10-cent version of how that works. Thank you. I'm very curious. I'm going to look into this more. It sounds expensive. Well, that's the great thing about open-source software. You can get, like the demonstration there, you can take an old laptop, put on a copy of Ubuntu for free, put down your favorite version of Postgres for free, have them download the DVD rental tar for free. Opportunity cost is doing things out the door. If you had to teach someone to be a DB2 or Oracle DBA and had to give them their own environment to get started up, you've got licensing costs, you've got a whole bunch of other stuff that's out the door. So... Thank you. Thank you. So thanks for coming.