What's new in 11.0? It is basically a new optimizer release; at the very least, the cost model is totally new. From the optimizer's point of view I think this is one of the biggest milestones we have had; the only time we did something comparable was in MariaDB 5.3.

The big change is what a cost unit means. Before, one cost unit (what you saw in last_query_cost) was nominally one I/O, but that "one I/O" was not really exact: it could be one I/O, one key read, one row read, or one access to some file. That also meant things were not balanced, so some costs just looked good and the optimizer would say "this sounds good, let's use that", and MySQL still has that even now. So I decided to base costs on something you can actually measure, and that also makes it very easy to fix the optimizer: if you see that a cost, which is now in milliseconds, is off, you adjust it accordingly.

We decided to call this 11.0 because when you change the optimizer as drastically as we have done, some plans may change. Hopefully they always change for the better, because now we actually have proper costs, and it is really easy to adjust things because almost all costs are available for the user to change. I think this will be a good foundation for all future MariaDB releases.

With the optimizer, the idea is to get better join orders and better plans. The old optimizer was actually pretty good at simple decisions: if it found a good index it would use it, and so on. But when it had to decide between this index and that index, where one index could be combined with a lookup and the other could not, things started to fall apart, and cost differences between storage engines were not taken into account. The only engine with some cost information of its own was the heap (MEMORY) engine, and even those numbers were more or less the same as everything else. I wanted to fix that, and also to allow people to tune the optimizer.

So what should the new optimizer be able to do? It should be able to use the different methods of accessing rows (table scan, index scan, index merge, and hash) and choose correctly which of them is optimal. Only those who have complex queries should see a big difference.

I don't know how many of you use the optimizer trace that was added in MariaDB 10.4, but I could not have done this work without it, because it shows exactly what the planner is doing. I have used it to find out where the optimizer calculates things wrong, and as part of 11.0 a lot has been added to it, so it is now very easy to look at a plan and see whether the optimizer is doing something wrong.
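For anyone who has not used the optimizer trace, here is a minimal sketch of how it is typically enabled and read from a client session; the table and the query are placeholders.

```sql
-- Enable the optimizer trace for this session (available since MariaDB 10.4).
SET optimizer_trace='enabled=on';

-- Run the statement you want to inspect (placeholder query).
SELECT * FROM t1 WHERE a = 10 ORDER BY b;

-- The trace of the last traced statement shows how the plan was chosen,
-- including the access methods that were considered and their costs.
SELECT * FROM information_schema.OPTIMIZER_TRACE;

-- Turn it off again when you are done.
SET optimizer_trace='enabled=off';
```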
There are lots and lots of optimizer-related bug fixes, for example around selectivity: you basically could not use selectivity level 4 (optimizer_use_condition_selectivity=4) at all before, and sometimes the computed selectivity would become bigger than one, which means the optimizer would assume that adding a condition gives you more rows instead of fewer. All of that should be fixed now.

We also added lots of new optimizations. For example, if there are several indexes you could use and one of them is faster, but we notice that the slower index would actually produce a smaller set of rows, we now use that smaller number as the row estimate, something we did not do before, and it helps a lot of different plans. Another change: the temporary tables we create internally, for derived tables and similar, were not using unique keys before (I don't really know why that decision was made), but now most of them do use unique keys, which are faster and let the optimizer estimate better how many rows we will actually get.

Cost calculations: there are lots of different places where we calculate costs. As far as I know, I have gone through every single one of them and made sure the costs are comparable with each other and close to the real time things take, down to microseconds. For example, we did not really have a good estimate before for the cost of a filesort; now we do. The same goes for filters, selectivity, materialization, and using an index for GROUP BY. One big difference is that all these access costs are now based on SSDs, not hard disks as before. I think most people run their database on SSDs, but it is something you can change just by changing one variable.

We also had a cost problem when you have a big table and a small lookup table. Before, we basically assumed that every read from the lookup table would be a disk access, but in practice, if the table is small, everything is in memory after you have read a couple of rows. Now we take that into account.

Here you can see some of the costs, and you can retrieve them for every engine; I just want to show the difference between InnoDB and Aria. InnoDB uses a clustered index; Aria uses direct access to rows, cached. With InnoDB, the key lookup cost and the row lookup cost are roughly the same, which means that whether you search for a key or for a row, both go through an index, so the cost is about equal. With Aria, by contrast, the row lookup cost is notably smaller. This is one example of why it is important to do this at a very, very low level. All engine costs, and most SQL-level costs, are now available to the user.
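As a rough illustration of what "available" means in practice, the server-level cost variables can be listed from any client; the LIKE patterns below are just one way of finding them.

```sql
-- List the optimizer cost variables and their current values.
SHOW GLOBAL VARIABLES LIKE 'optimizer%cost%';

-- The disk-related settings follow the same naming scheme.
SHOW GLOBAL VARIABLES LIKE 'optimizer_disk_read%';
```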
So, for example, optimizer_disk_read_cost is the time to read a 4K block from the device, and the default corresponds to a typical SSD. If you have a hard disk, you only have to change that one cost. And the disk read ratio, optimizer_disk_read_ratio, is how often we actually have to go to the device.

Audience: Is there a way to run something on your system so that these values are populated automatically?

The only one you really need to populate is optimizer_disk_read_cost. The others are basically there so that, if something goes really wrong, you can adjust them, but I don't expect that; they describe engine behaviour, not your hardware. So the three things you would normally want to change are: the disk read cost, if you have a fast SSD; the disk read ratio, which I plan to set automatically at some point based on engine statistics (I did not want to do that from the start, because doing it automatically means you run a query, then run the same query again, and the plan changes, which confuses people even if the new plan may be better); and the where cost, which is the cost added to processing each row, so if you want to make sure you get the minimal number of row accesses, you just increase that one.

The way I checked all of this is with a program that is part of the server, check_costs. You can run it with any engine, and it performs a lot of different checks (table scan, index scan, key lookup and so on), and for each of them you get the calculated cost and the measured time in milliseconds. If things are correct, the ratio between the two is as close to one as possible, and I have verified that for all engines.
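A minimal sketch of the kind of tuning described here; the numbers are purely illustrative (the user-visible unit is microseconds, as mentioned later in the talk), so measure your own storage and check the documented ranges before changing anything.

```sql
-- If random 4K reads on your storage are much slower than the SSD default,
-- raise the read cost. Illustrative value only; measure your own device.
SET GLOBAL optimizer_disk_read_cost = 1000;

-- Raise the disk read ratio if most reads really do hit the device
-- (little caching). Illustrative value; check the documented range first.
SET GLOBAL optimizer_disk_read_ratio = 0.50;

-- Make per-row handling relatively more expensive, nudging the optimizer
-- towards plans that touch as few rows as possible (session-level here).
SET SESSION optimizer_where_cost = 1.0;
```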
So that means I have a way to verify that the costs are OK, or almost OK; it is nearly impossible to get them exactly right, because even on the same machine the timings change from run to run, but we are talking about differences within a millisecond.

Audience: There were lots of things in the optimizer that were cost-based, but also a lot of things that were rule-based?

Now basically everything, everything I found, is cost-based, which makes it easier for the optimizer to make choices. For example, there were patches in InnoDB, which are still in MySQL, where, because MySQL preferred table scans, somebody decided to cut all index scan costs in half just to force the optimizer to use indexes instead of table scans. That is of course a disaster for the optimizer, because then it gets wrong data and cannot make good decisions. All of those hacks are removed, so now InnoDB gives the optimizer the best estimates it can, and that helps a lot.

I also spent a lot of time improving things from a performance point of view: the optimizer is probably 50% faster than before, with more caching and simplified code. I had not worked on the optimizer since the first versions of MySQL; I worked on it between roughly 1995 and 2000, and since then a lot of other people have worked on it and done amazing jobs in different parts of the server, but nobody took the time to make sure that this piece relates correctly to that piece and that piece, especially when it comes to costs. All of that is now done. I also fixed smaller things; for example, we now try to use longer indexes when they are available.

One thing that is a problem, especially for the test suite, is that now that we have proper costs, table scans are preferred for most queries in the test system, because a table scan, in InnoDB as well as other engines, is really, really fast: one disk seek and you get a hundred rows, compared to individual index lookups. So there is an optimizer scan setup cost, which I think is 10 by default, just to encourage the optimizer to use indexes for small tables. That is mostly because if you don't, you confuse both the test system and users, and for small tables nobody notices whether it takes a tenth of a millisecond or a little more. Basically this affects tables with fewer than about 20 rows, and unfortunately most of the tests in MariaDB use tables with 10 rows or fewer.
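This small-table nudge is exposed as a cost variable too. A minimal sketch, assuming I recall the variable name (optimizer_scan_setup_cost) correctly; check SHOW VARIABLES on your own server.

```sql
-- The setup cost charged to every table scan; it mostly matters for tiny
-- tables, where it tips the choice towards using an index.
SHOW VARIABLES LIKE 'optimizer_scan_setup_cost';

-- Setting it to 0 shows the pure cost-based choice (table scans win more
-- often on very small tables); raising it favours indexes on them.
SET SESSION optimizer_scan_setup_cost = 0;
```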
Audience: What I see with concurrency is that it does not work very well, because it does a lot of full scans at the end of the plan even when an index is there and it could choose that.

Yes, but now those things are cost-based, except for the very small penalty for table scans I mentioned.

Audience: But that is more about getting to the point where you do not need indexes any more, because the hash algorithms are much faster.

Hash is really, really slow if you are only going to fetch a small number of rows; it pays off when you are in the range of 10,000 or 20,000 rows that you have all in cache, and it also depends a lot on the queries and on the concurrency.

Audience: Hashing takes a lot of memory.

And that also means you get less concurrency, because the CPU is just busy moving things around in memory. So hashing is good in some cases, especially if you want to access a lot of rows directly or indirectly; if you only need to access a few rows, then hashing is really a disaster. And if you look at banking, transactions and that kind of workload, hashing would not work for any of those, because usually you just want everything for a small set of customers.

So, from the user's point of view, these are the only variables I think you may need; those who create storage engines may need more. One thing to be aware of is that the user-visible variables are in microseconds, not milliseconds. I first had them in milliseconds, but the numbers got so small that they were very hard to read, so internally the code uses milliseconds, but from the user's point of view the costs are in microseconds. You have these variables here; all of these are for work done in memory, and then you have the cost for fetching blocks from disk. So this is all memory, and optimizer_disk_read_cost is the one that is an I/O.

Audience: If you run MariaDB on a bare-metal machine with a very fast SSD, an I/O is maybe a tenth of a millisecond, but in the cloud the I/Os go over the network, so I guess you need to tune this variable?

Probably; I have not done that. I have basically been focused on getting this to work, so everything has been about getting the memory part right, but the disk part is there, and it is only two variables, so it is very hard to get those totally wrong.

Audience: If you run a managed database in the cloud they might tune it for you, but it would be interesting to see how wrong things get if MariaDB thinks you are on fast SSDs while you are actually on network SSDs.

And every engine has its own version of this variable, so you can change it per engine if you want, for example if different engines run on different devices. Changing cost variables is easy: you just set them per session or globally. The engine costs are all global, but the where cost and things like that are session-local. Let me see, I have a couple of minutes left.

So, the question: why does this matter to you? If you have ever had to say FORCE INDEX, or had to tweak queries in some way, or had to use ANALYZE (ANALYZE TABLE is still useful, because that is how you get the statistics), I would say the main thing is that there is much less tweaking of queries: they should just work.
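One practical way to check on your own queries whether the estimates match reality, and therefore whether you still need any FORCE INDEX hints, is MariaDB's ANALYZE statement. A small sketch with placeholder table names:

```sql
-- ANALYZE executes the statement and reports both the optimizer's estimates
-- (rows, filtered) and the measured values (r_rows, r_filtered), which makes
-- badly wrong estimates easy to spot.
ANALYZE FORMAT=JSON
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'FI';
```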
Things are also noticeably better if you use the MEMORY engine, for example, or other engines, and now the server even knows whether internal temporary tables are in Aria or in MEMORY (heap), so it can take that cost into account.

State of things: basically everything is done. QA has been testing this for about a month now and found some bugs; most of those bugs exist in older releases as well, but in this release I am basically fixing everything related to the optimizer. There is one issue left that I will push a fix for this week, and then the release should basically be done. We have a bb-11.0 branch that includes everything. I think this is one of the most tested releases we have ever done internally, just because I have been working so closely with our QA team, so I expect it to be almost stable from the start. And for anybody who wants to help: if you have a replica where you can put 11.0, put it on, send feedback, file your bug reports; anything related to the optimizer will be fixed immediately.

As for existing queries, behaviour should be the same, except that things like rowid filters are faster. I do not think anything will be slower in the optimizer, but the plans should be better, so the end result should be faster, and there are a couple of things that are now clearly better.

A rowid filter means that if there are two indexes you could use, we use the faster one, but if it makes sense we also take the other one, fetch all the row ids (primary keys) it matches, and then, when we read rows through the first index, we only need to consider rows whose row id is in that set. So basically, where there is a win, we do not have to fetch the row for things we can filter out. The algorithm is essentially a lookup of all acceptable row ids, and we check against that set, which is actually pretty fast. (There is a small sketch of the kind of query this helps with at the end of this section.)

So, state of things: basically ready. This will be released, I think, in February; yes, so this month. Future plans: I will start working on parallel query. There are still some optimizer cleanups to be done, and I also want to enable all optimizations by default; for example, MariaDB has supported hash joins forever, but it is actually very hard to get them enabled. Bushy plans are something I would like to do, because we have big users who need them: a bushy plan is basically when you have two big tables, each with lots of related tables, and then a join between the big tables, directly or indirectly, and our optimizer currently cannot handle that very efficiently. So that is something I would like to do, but parallel query is the next big task I will start on, and I have some plans and ideas for how to do that.
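To make the rowid filter description above concrete, here is the kind of query shape it can help with. The table and index names are made up, and whether a filter is actually built depends on the statistics, so check EXPLAIN or ANALYZE on real data rather than taking this as a guaranteed plan.

```sql
-- Hypothetical table with two usable secondary indexes.
CREATE TABLE orders (
  id          BIGINT PRIMARY KEY,
  customer_id INT,
  order_date  DATE,
  total       DECIMAL(10,2),
  INDEX idx_customer (customer_id),
  INDEX idx_date (order_date)
);

-- Both conditions are selective. The optimizer may read through one index
-- and build a filter of row ids from the other, so rows failing the second
-- condition are rejected before the full row is ever fetched.
EXPLAIN
SELECT *
FROM orders
WHERE customer_id = 42
  AND order_date BETWEEN '2023-01-01' AND '2023-01-31';
```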
Throughout this work I have been working with Sergei Petrunia, who is now the lead of the optimizer team, and he has had some help from Andrew as well. So, that was about 20 minutes.

Host: Okay, thanks. I don't think we have time for questions, as our next speaker needs to set up, but if you have questions you can chat to him in the hallway. We have a stand downstairs where you can meet some of the other MariaDB team as well; it's the one without any banner or anything like that, because the person who was supposed to bring all of that could not.

Let me put it this way: I am really happy with this work, and I think that for those who have complex plans, especially the kinds of things we see coming from Oracle customers, with lots of stored procedures and really big queries, queries that are thousands of lines for a single query, this optimizer will really help. So, thank you.