Hi everyone.
How many Django users in here?
Raise your hands.
Keep your hands up if you are dealing with Django projects with a lot of migrations, wasting
time and continuous integration minutes.
Okay, this talk is for you.
Perfect.
You are in the right room.
Now, I am Denny.
I am on your right side of the photo.
I work again in JavaScript, Python, Vue.js, Django, everything.
It's pain-me-stuff.
So let's start with Django migrations.
They are Django's way to propagate changes from your models to a database schema and to keep
track of them.
Let's quickly recap the migration commands.
You can use makemigrations, migrate, showmigrations, and sqlmigrate.
The first one, makemigrations, creates new migrations based on your model changes.
You can use different parameters there.
For example, you can create an empty migration that you can customize.
You can give a migration a specific name, and you can restrict the creation of a migration
to a specific application.
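
As a rough sketch of the options just mentioned, the helper below only builds the command line for makemigrations; it does not run Django. The function name `makemigrations_args` and the app and migration names are made up for illustration, while `--empty` and `--name` are the makemigrations options the talk refers to.

```python
def makemigrations_args(app_label=None, name=None, empty=False):
    """Build the argv list for `manage.py makemigrations` (illustrative helper)."""
    if empty and app_label is None:
        # Django refuses --empty unless at least one app label is given.
        raise ValueError("--empty needs an app label")
    args = ["python", "manage.py", "makemigrations"]
    if empty:
        args.append("--empty")  # empty migration to customize by hand
    if name:
        args += ["--name", name]  # explicit migration name
    if app_label:
        args.append(app_label)  # restrict creation to one application
    return args


# "twitter" and "add_likes" are hypothetical names.
print(" ".join(makemigrations_args("twitter", name="add_likes")))
```

The resulting list can be passed to `subprocess.run(args, check=True)` from the directory that contains manage.py.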
The model, for example, if you want to recreate Twitter (we know the reason for that), is this
one.
You create a class for the model, and then creating the migration with the command will
create a new file in your project, in the migrations folder, with this content:
initial = True, if it's the first migration in your project;
a list of dependencies, if you are using something like, for example, authentication, or, if you
are on the second migration in the project, the first dependency is the first migration;
and a list of operations performed during the migration.
Then you can apply your migrations, of course, using the migrate command, specifying an application
or not, and optionally a migration name.
So if you want to move to a specific point in the history of your migrations, you can specify
it.
On a new project, you can migrate everything using manage.py migrate, and everything is
at the latest version of your database schema.
Then, if you want to roll back every migration of an application, you can migrate it to the zero
migration, and everything is rolled back.
You can move to the second migration by specifying it, and without specifying a
migration number, you can migrate everything to the latest version.
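
To make the three cases concrete (latest, zero, a specific migration), here is a deliberately simplified model of what a migrate target means for one app with a linear history. It is not Django's executor; the function `plan` and the migration names are assumptions made for this sketch.

```python
def plan(all_migrations, applied, target=None):
    """Return the steps needed to reach `target` for one app.

    all_migrations: ordered list of migration names (linear history assumed).
    applied: set of names already applied.
    target: None for "latest", "zero" to roll everything back, or a name.
    """
    if target is None:
        keep = len(all_migrations)
    elif target == "zero":
        keep = 0
    else:
        keep = all_migrations.index(target) + 1
    # Apply what is missing up to the target, oldest first.
    forwards = [m for m in all_migrations[:keep] if m not in applied]
    # Unapply what lies beyond the target, newest first.
    backwards = [m for m in reversed(all_migrations[keep:]) if m in applied]
    return [("apply", m) for m in forwards] + [("unapply", m) for m in backwards]


# Hypothetical migration names for the Twitter-like example.
history = ["0001_initial", "0002_tweet_author", "0003_tweet_likes"]
print(plan(history, set()))                            # fresh database, migrate to latest
print(plan(history, set(history), "zero"))             # roll everything back
print(plan(history, set(history), "0002_tweet_author"))  # move to the second migration
```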
Now, how does this work under the hood? You have in your database a django_migrations table with
content like this: the application name, the name of the migration, and the date and time
when the migration was applied to your database. So everything is in your database.
There is a better way to show this: using showmigrations, you get a view of the
list of migrations in your project, with a tick if the migration has
already been applied to your database.
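
The bookkeeping described here can be imitated with SQLite from the standard library. This is only a sketch: the column types are simplified, the rows and timestamps are invented sample data, and `show_migrations` is a hypothetical helper that mimics the ticked listing rather than Django's own implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns as the table described in the talk: app, name, applied.
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT NOT NULL, "
    "name TEXT NOT NULL, applied TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    [
        ("twitter", "0001_initial", "2024-01-01 10:00:00"),       # sample row
        ("twitter", "0002_tweet_author", "2024-01-01 10:00:01"),  # sample row
    ],
)


def show_migrations(conn, app, on_disk):
    """List migrations found on disk, ticking those recorded as applied."""
    rows = conn.execute("SELECT name FROM django_migrations WHERE app = ?", (app,))
    applied = {row[0] for row in rows}
    lines = [app]
    for name in on_disk:
        mark = "X" if name in applied else " "
        lines.append(f" [{mark}] {name}")
    return lines


listing = show_migrations(
    conn, "twitter", ["0001_initial", "0002_tweet_author", "0003_tweet_likes"]
)
print("\n".join(listing))
```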
And then with sqlmigrate, you can print the SQL statements for a specific migration.
So with our example, we can display the SQL code for it.
Let's take a look at this.
A transaction will be opened, every command will be applied to your database, and then
the transaction will be committed if there are no errors.
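
The begin, apply, commit pattern can be tried out with SQLite. The sketch below is an illustration of the all-or-nothing behaviour, not the SQL that Django generates; the table names and the helper `apply_in_transaction` are assumptions.

```python
import sqlite3


def apply_in_transaction(conn, statements):
    """Run all statements inside one transaction; roll back on any error."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)

first_ok = apply_in_transaction(conn, [
    'CREATE TABLE "twitter_tweet" ("id" integer PRIMARY KEY, "text" varchar(280) NOT NULL)',
])
# The second statement is invalid on purpose, so the whole batch is rolled back.
second_ok = apply_in_transaction(conn, [
    'CREATE TABLE "twitter_like" ("id" integer PRIMARY KEY)',
    "THIS IS NOT SQL",
])
print(first_ok, second_ok, table_names(conn))
```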
Now if you need to make further changes to your model, you can apply those changes and
then create another migration.
The migration will depend on the first one, and the SQL code will again be a transaction,
the SQL commands, and a commit.
And again and again, you can apply migrations to your production database this way.
What if you need to make further changes, for example adding retweets, likes, and a lot
of other stuff? Then you make a change in your models and create a single migration, because
of course I like to be well organized and structured, so every single change for me
means a single migration.
Then you end up having a lot of migrations like this.
But even worse, if you need to create, for example, a shop app for a customer, then you
need to create a model, and then during the lifetime of your application you need to make
a lot of changes to your model structure.
Okay, I won't list them all, but we had to make a lot of changes: for example, adding tables,
moving data from one table to another, from a main table to a detail table, and a lot
of other stuff, changing data along the way.
So changes can be a lot of pain, and when the migrations become many,
your performance during tests can decrease a lot. During deployment this is perfect,
you can move forward and backward with ease, but in tests it's not that simple, because
you need to wait for every migration to be applied before running the tests.
And if you are paying for your testing time on GitHub workflows or other platforms, then
that can be painful.
As a disclaimer, the timings in this talk may change from laptop to laptop, so keep this
in mind. This laptop is brand new, so it's faster, hopefully, but on my old
laptop these were the timings.
I ran the tests on 20 apps like shop; I just copy-pasted it 20 times in the example
repository.
The tests took less than a second to run, and that was perfect, so there's
no need to do this talk.
Well, not exactly, because creating the test database took 20 seconds.
So one second of tests for this project, and 20 seconds for database creation.
And that was not optimal, because we were on the verge between the Team license and the
Enterprise license for the timing of workflow runs, so around 3,000 minutes monthly.
We wanted to remain on the Team license, because it was cheap, so
we wanted to optimize that time.
The first possible workaround is to use --keepdb when running tests. This parameter preserves
the test database between runs, and that's perfect, because the first run applies the
migrations, and then the database is kept somewhere, on your localhost,
for example.
If the database does not exist, of course, it will first be created and migrated, and
when there are other changes in other pull requests, for example, those migrations will also be applied,
so everything is okay, hopefully.
So this approach saved us 20 seconds after the first test run.
The problem was configuring our CI/CD. A solution could be using caches or artifacts
in GitHub workflows, but it takes time to create and store artifacts in GitHub. Another option is,
for example, using an external test database from inside the GitHub workflow, but that wasn't
optimal either. A friend of mine suggested this package to me, django-migrations-ci,
which allows you to simply configure an external test database, so you can consider
it and save 20 seconds if you have an external database.
Another possible workaround, a one-line workaround, is to set MIGRATE to False in the test
settings of your database. If you use this, migrations won't run during tests. This is similar to
setting None as a value in MIGRATION_MODULES, but for every app in your project, so it's
better this way: a single-line change. This has pros and cons. The pros, of
course: it's a single-line change, and it doesn't run migrations during tests.
The problem is that it's like running makemigrations plus migrate before every test run, so in
our example repository this added five seconds, and that was the opposite of what
I wanted to obtain.
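
A settings fragment for this workaround might look like the sketch below. The engine and database name are placeholders chosen for illustration; the relevant part is the MIGRATE key inside the TEST dictionary.

```python
# settings.py fragment (sketch): ENGINE and NAME are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "shop",
        "TEST": {
            # Do not run migrations when creating the test database.
            "MIGRATE": False,
        },
    }
}

# Per-app alternative mentioned in the talk, shown only as a comment:
# MIGRATION_MODULES = {"shop": None}
```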
So diving into the Django documentation, I discovered this great, great command, squashmigrations.
It squashes an existing set of migrations into a single one. You specify the application,
the migration name, and optionally a start migration name, and it will squash every migration
into a single one.
This was pretty good. I tried it on the shop application, and I decided to squash
every migration into a single one.
It was good, not perfect for us, but it was good.
The problem is that we needed to do some manual porting, because for example we used a lot
of manually written functions, from one migration to another, from one version to another, and
those weren't migrated or automatically squashed, so we had to copy-paste the function code
into the squashed migration and make some adjustments.
And if we inspect the squashed migration file, we can see at the top of the class
definition a list of tuples in the replaces variable.
The first item of each tuple is shop, the application name, and the second one is the migration
name, for every one of the 26 migrations.
The recommended process is: first squash, keep the old files, commit, and release to
every environment, from staging and demo up to production. Then wait until all systems are upgraded to
the new release. Then you can remove the old migration files, commit, and do a second release.
Then, last but not least, you need to transition your squashed migration to a normal migration:
delete all the old migration files that have been replaced, update all
migrations that depend on the deleted ones to point to the new squashed migration, and after everything
you can remove the replaces attribute in the squashed migration, and everything is fine.
Then if you want to clean up your database, you can prune references, so in your database
there won't be references to old migrations.
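
The step of updating migrations that depend on the deleted ones can be pictured as a small list transformation. This is a sketch on plain tuples, not a tool that edits migration files; `retarget_dependencies` and all migration names except the app label shop are hypothetical.

```python
def retarget_dependencies(dependencies, replaces, squashed):
    """Point dependencies on replaced migrations at the squashed migration."""
    replaced = set(replaces)
    result = []
    for dep in dependencies:
        new_dep = squashed if dep in replaced else dep
        if new_dep not in result:  # avoid listing the squashed migration twice
            result.append(new_dep)
    return result


# Hypothetical names: two of the old shop migrations and the squashed one.
replaces = [("shop", "0001_initial"), ("shop", "0002_add_price")]
squashed = ("shop", "0001_squashed")

# Dependencies as they might appear in a migration of another app.
old_dependencies = [
    ("auth", "0001_initial"),
    ("shop", "0002_add_price"),
    ("shop", "0001_initial"),
]
new_dependencies = retarget_dependencies(old_dependencies, replaces, squashed)
print(new_dependencies)
```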
Let's test performance after squashing, after spending a week on my work project doing that,
and oh, no changes. So I lost a week doing that without results, and don't tell my chief.
So what's the point?
Well, the point of squashing migrations is to move back from having several hundred migrations
to just a few. For example, if you create a separate branch where you are
working alone, you can squash the migrations and propose just a single migration file in
your pull request.
I know, I know you wanted to speed up tests, so let's do it.
Are you ready?
It's not that easy. First you need to recreate the migrations.
So let's note down the migrations for a single specific application with showmigrations,
and copy-paste all the names of your migration files. Then you need to manually
create a replaces list, you remember this one from a moment ago: you need to recreate the
replaces list with the application name and migration file name, and store it somewhere on your
computer. Then move your migrations to a temporary directory, so out of the way, and
make sure that showmigrations doesn't show anything.
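
Building the replaces list by hand is error-prone, so one possible shortcut is to derive it from the file names in the migrations folder. The sketch below assumes the usual numbered file names (so alphabetical order matches migration order) and uses a temporary directory with made-up files in place of a real app; `build_replaces` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def build_replaces(app_label, migrations_dir):
    """Return [(app_label, migration_name), ...] for every migration file."""
    names = sorted(
        path.stem
        for path in Path(migrations_dir).glob("*.py")
        if path.name != "__init__.py"
    )
    return [(app_label, name) for name in names]


# Stand-in for shop/migrations/, populated with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("__init__.py", "0002_add_price.py", "0001_initial.py"):
        (Path(tmp) / filename).touch()
    replaces = build_replaces("shop", tmp)

print(replaces)
```

The printed list can then be pasted into the recreated migration as its replaces attribute.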
Now it's time to recreate the migrations using makemigrations with your application name and a specific
name, for example init squash, so you remember that this is the squashed migration. That
will create the first migration at your latest model version.
Then open the migration file and paste the replaces list you created a moment
ago inside the class. Then you can restore your old migration files to the original directories,
check for missing or overwritten files, and then remove the temporary directory.
Now with showmigrations you need to check that everything is there, so in this case all
26 migrations are there, and the first one, the squashed migration, is there but has not been
applied. Then apply your squashed migration and check again with showmigrations
that everything has been squashed and you have just a single migration. Then you
can go back to your post-squash tasks: commit and release to production, upgrade all systems,
of course staging, demo, production, everything else, update all migrations that depend on
the deleted migrations, remove the replaces attribute, and if you want, prune references
to the deleted migrations, and everything is perfect, right?
Well, not exactly. If you have migrations providing initial data, you need to create
a new migration for that, because recreating migrations from scratch doesn't recreate those
insertions for your models. Or, even better, you can use fixtures, and in the docs you can
see how to use fixtures both in database migrations and in testing, and that's perfect.
Then you need to be aware of circular dependencies, because if your project is big
and grows over time, you could have circular dependencies from one app to another
and back. This problem requires you to remove all the foreign keys causing the circular
dependency, create the first migration, restore the foreign keys, and create a second migration,
and this way you will hopefully solve it.
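
To see why the foreign keys have to come out first, the dependency problem can be modelled as a graph of apps. This sketch uses the standard library graphlib module (Python 3.9 or later); the app names and the helper `find_cycle` are assumptions, and it only detects the cycle, it does not touch any models.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(app_dependencies):
    """Return the cycle reported by graphlib, or None if there is none.

    app_dependencies maps each app to the set of apps it has foreign keys to.
    """
    try:
        TopologicalSorter(app_dependencies).prepare()
    except CycleError as exc:
        return exc.args[1]  # list of nodes forming the cycle
    return None


# shop and billing (hypothetical apps) point at each other: no valid order.
with_cycle = {"shop": {"billing"}, "billing": {"shop"}}
# After temporarily removing the foreign key from shop to billing.
without_fk = {"shop": set(), "billing": {"shop"}}

cycle = find_cycle(with_cycle)
order = list(TopologicalSorter(without_fk).static_order())
print(cycle, order)
```

Once the first migrations exist in that order, restoring the foreign key and creating a second migration for shop closes the loop, as described above.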
Now, let's test performance after all of this, after another week spent on the
project trying to tell your chief, oh, I'm working on something useful, I promise.
And yeah, of course, after recreating everything from scratch, our database creation
task took five seconds instead of 20. That was perfect.
Yeah, it was perfect, but does this apply to everyone?
It depends. If you have really big projects and you are paying by the minute for
your CI/CD workflows, and you are on the verge of going from paying $3-4 per user per month
to 20-something dollars per user per month, then maybe you want to stay on the
cheaper tier, so that could be a solution. But if you just want to tidy up
your migration files, then just use squashmigrations without everything else. Or if you
want to speed up tests on your localhost, you just need to use --keepdb, and everything
is fine, without having to spend, in my case, two weeks working on this just to save maybe
a couple of seconds on your project. So it depends on your use case. And we are done.
If you want to see the example repository, it's there with three different branches,
if you want to compare them on your local machine, and I uploaded the slides to the
FOSDEM website, so they are there if you want to take a look at them.
Thank you very much.
Okay, we have time for quite a few questions, I see one up there.
Given your salary, and these two weeks of work you've done, how many years of enterprise
licenses did you avoid?
That's a nice question. Hopefully my chief won't ask me that, but I think we could have
paid for maybe a year, I don't know, one year of this. But yeah, it was fun to play with
this, for me at least, spending two weeks trying new stuff, or trying to discover hidden
stuff in Django.
More questions?
Good question.
Yeah, thanks for the great talk. I was wondering if you looked into using seed databases
for CI, so that...
Sorry.
Yeah, you don't hear it?
No, I didn't hear you, sorry.
If you looked into seed databases for CI, so that you run your migrations locally,
then dump the database, and then use that database during CI to start off with a pre-migrated
database.
No, I didn't think about that, it's a good idea, so you just upload your database dump,
and then on your...
Yeah, so you just set up your CI script to use that database when it initializes.
That could be a good idea, I need to try that, thank you.
So you restore the database and it just applies your latest migrations, without having to apply
everything.
Yeah, exactly.
Yeah, that's a good idea, thank you.
Thank you.
I was also wondering: if you're using Postgres, for example, you can disable fsync, which will
just keep the database in memory, so that would probably be a solution for the build time.
So locally we kept the database in memory. The problem was on our CI/CD: we created
a service in the workflow files, and that was creating a database from scratch.
So it's just a configuration you can add on your Postgres side on the CI...
We had to consider the time for storing and restoring the database with that configuration
from the cache.
So it took a little bit of time, but yeah, that was an option I tried to...
More questions?
So very cool talk.
I like your method.
I basically came up with the same method about five years ago for this approach.
Do you think there's an opportunity to create a tool to automate some of this process?
Well, that's a good question.
Maybe implementing that in squashmigrations in some way, I don't know.
We could, we can try to do it, just to save another two weeks of salary for other people.
Okay, I think we're done with questions, so we're going to have another five minute
break and then continue with the next talk.
Thank you.