[00:00.000 --> 00:10.000]  And welcome to Sam for his talk on Music Recommendations in Python.
[00:10.000 --> 00:11.000]  Welcome.
[00:11.000 --> 00:12.000]  Thank you.
[00:12.000 --> 00:13.000]  Can you hear me?
[00:13.000 --> 00:14.000]  Yes.
[00:14.000 --> 00:15.000]  Good.
[00:15.000 --> 00:16.000]  Okay.
[00:16.000 --> 00:28.000]  So I'm a system software developer, actually, and this is a hobby that I like to make music
[00:28.000 --> 00:30.000]  playlists and play around with Python.
[00:30.000 --> 00:33.000]  I'm also a musician and a music fan.
[00:33.000 --> 00:35.000]  And I also used to be a teacher.
[00:35.000 --> 00:36.000]  I think that's relevant.
[00:36.000 --> 00:38.000]  That's going to come into play later.
[00:38.000 --> 00:45.000]  Thanks to my employer, Codethink, for sponsoring the travel and allowing me to be here.
[00:45.000 --> 00:48.000]  So as a music fan, I used to make a lot of playlists.
[00:48.000 --> 00:49.000]  I still do.
[00:49.000 --> 00:53.000]  And I'm quite old, so when I first started making playlists, they looked like this.
[00:53.000 --> 00:55.000]  And very convenient to share.
[00:55.000 --> 00:59.000]  Just give someone else the piece of plastic and they have a machine that plays it.
[00:59.000 --> 01:04.000]  But quite difficult to make because you have to, you remember, you have to line up all the songs,
[01:04.000 --> 01:07.000]  just right, press record, press play.
[01:07.000 --> 01:10.000]  The 2000s came and we all moved to digital music.
[01:10.000 --> 01:13.000]  If you were cool, you had like a Winamp skin.
[01:13.000 --> 01:16.000]  If you were really cool, you had XMMS.
[01:16.000 --> 01:20.000]  And these playlists, much easier to make, you just drag and drop.
[01:20.000 --> 01:24.000]  But they were more difficult to share because nobody else had the same MP3s as you.
[01:24.000 --> 01:28.000]  So you couldn't give the playlist to your friend anymore quite so easily.
[01:28.000 --> 01:33.000]  But if you ask someone now to make a playlist, probably they're going to think of this.
[01:33.000 --> 01:37.000]  They're going to make you a playlist on Spotify or YouTube and send you a link.
[01:37.000 --> 01:40.000]  And that's even better, right?
[01:40.000 --> 01:41.000]  It's super easy to make.
[01:41.000 --> 01:42.000]  You drag and drop.
[01:42.000 --> 01:43.000]  It's easy to share.
[01:43.000 --> 01:44.000]  You send someone the link.
[01:44.000 --> 01:48.000]  And it even recommends you songs to put on the list.
[01:48.000 --> 01:50.000]  So what's not to like?
[01:50.000 --> 01:52.000]  Honestly, I don't actually want that song on the list.
[01:52.000 --> 01:57.000]  So the recommendations aren't always helpful.
[01:57.000 --> 01:59.000]  Spotify is fine.
[01:59.000 --> 02:00.000]  You can use it.
[02:00.000 --> 02:01.000]  It has a great team of researchers.
[02:01.000 --> 02:03.000]  There are some negative things about the company.
[02:03.000 --> 02:05.000]  I mean, it's a private company.
[02:05.000 --> 02:09.000]  Their duty to their investors is to minimize the amount that they pay out to musicians
[02:09.000 --> 02:12.000]  and pay that to investors instead.
[02:12.000 --> 02:16.000]  And they've been steadily doing that and reducing the rates they pay to musicians.
[02:16.000 --> 02:18.000]  And they kind of focus on passive listening, right?
[02:18.000 --> 02:23.000]  So you put on an album, it finishes, but they put on more songs for you.
[02:23.000 --> 02:27.000]  People actually now adapt their music to fit the Spotify algorithm.
[02:27.000 --> 02:30.000]  So the first 10 or 20 seconds are very important.
[02:30.000 --> 02:33.000]  So songs don't have long intros anymore.
[02:33.000 --> 02:36.000]  That's been done to please the Spotify algorithm.
[02:36.000 --> 02:39.000]  So I started to think, what would the opposite look like?
[02:39.000 --> 02:43.000]  And I came up with, it would have to be something DIY,
[02:43.000 --> 02:46.000]  something that doesn't have a profit motive behind it.
[02:46.000 --> 02:52.000]  It would focus on having local music and going to artist websites,
[02:52.000 --> 02:56.000]  buying music from Bandcamp, or paying them on Patreon.
[02:56.000 --> 02:59.000]  It would also involve working with open data.
[02:59.000 --> 03:04.000]  So when I say open data, I don't necessarily mean public data that everyone can see,
[03:04.000 --> 03:08.000]  but data where it's hosted by you or by an entity you trust.
[03:08.000 --> 03:11.000]  And you can choose if it's open or private.
[03:11.000 --> 03:15.000]  You can download it, export it, et cetera.
[03:15.000 --> 03:18.000]  So I have no idea really what I'm doing,
[03:18.000 --> 03:23.000]  but back in 2016 I started experimenting with some ideas.
[03:23.000 --> 03:26.000]  And I was inspired by one or two other projects.
[03:26.000 --> 03:30.000]  So has anyone heard of Dynamicland?
[03:30.000 --> 03:32.000]  That's a shame.
[03:32.000 --> 03:34.000]  So note that down if you have a notebook.
[03:34.000 --> 03:37.000]  It's something very interesting to research about.
[03:37.000 --> 03:39.000]  Out of scope for this talk.
[03:39.000 --> 03:42.000]  It's a project where the room, the whole room is a computer.
[03:42.000 --> 03:46.000]  Each of these pieces of paper has a program on it or some data,
[03:46.000 --> 03:50.000]  and you can interact with them by moving them around physically.
[03:50.000 --> 03:52.000]  Now, I can't create that myself,
[03:52.000 --> 03:56.000]  but I like the idea of having a program that fits on a sheet of A4 paper.
[03:56.000 --> 03:59.000]  You know, the philosophy is if your program doesn't fit on the paper,
[03:59.000 --> 04:02.000]  then it's too big and it needs to become smaller.
[04:02.000 --> 04:04.000]  And I like that as a philosophy.
[04:04.000 --> 04:07.000]  I feel like the playlist generators that I want to write
[04:07.000 --> 04:11.000]  should also fit on a piece of A4 paper or on a slide deck.
[04:11.000 --> 04:15.000]  And it should be a process that people can participate in.
[04:15.000 --> 04:18.000]  Okay, another thing that really inspired me was Git.
[04:18.000 --> 04:21.000]  That might seem counterintuitive,
[04:21.000 --> 04:26.000]  but Git, Linus Torvalds recently said he's better known actually for Git than for Linux,
[04:26.000 --> 04:31.000]  despite having basically created Git in a month.
[04:31.000 --> 04:33.000]  And quite an achievement, right?
[04:33.000 --> 04:36.000]  So there were a few key ideas.
[04:36.000 --> 04:39.000]  Git's data model is really well defined.
[04:39.000 --> 04:42.000]  It's simple. You have refs and commits.
[04:42.000 --> 04:44.000]  You work with those directly.
[04:44.000 --> 04:48.000]  And then your commits are made of trees and your trees have blobs.
[04:48.000 --> 04:52.000]  And you work with this directly. Like, you get your hands dirty.
[04:52.000 --> 04:54.000]  Git is also a multi-call binary,
[04:54.000 --> 04:58.000]  which has a really nice advantage that you can write one part of it in Perl
[04:58.000 --> 05:02.000]  and then another part of it in Tcl and then another part of it in C.
[05:02.000 --> 05:04.000]  So you don't have to keep rewriting.
[05:04.000 --> 05:08.000]  You can have different people working on small components.
[05:08.000 --> 05:12.000]  And it has this idea of having the user interface commands,
[05:12.000 --> 05:15.000]  they call the porcelain, and the innards, like the plumbing.
[05:15.000 --> 05:17.000]  But it's all available, right?
[05:17.000 --> 05:19.000]  So if you have Git on your laptop,
[05:19.000 --> 05:23.000]  you can build a commit using the lowest level commands that you want.
[05:23.000 --> 05:26.000]  And that's a huge advantage in getting people involved.
[05:26.000 --> 05:28.000]  Git is a real DIY project.
[05:28.000 --> 05:31.000]  It's not some shiny thing that just magically works.
[05:31.000 --> 05:33.000]  You push a button and have a nice day.
[05:33.000 --> 05:35.000]  It's something that you really have to get involved with.
[05:35.000 --> 05:37.000]  It'll break. You have to learn how it works.
[05:37.000 --> 05:40.000]  And that's the secret to its success, I think.
[05:40.000 --> 05:44.000]  And of course, Git, the interface to Git is the command line, right?
[05:44.000 --> 05:46.000]  So you can build a website around it in Ruby.
[05:46.000 --> 05:49.000]  You can build a website around it in Python.
[05:49.000 --> 05:51.000]  You can build extensions.
[05:51.000 --> 05:52.000]  Very inspiring.
[05:52.000 --> 05:58.000]  I set out to build a similar tool, but for playlists.
[06:01.000 --> 06:05.000]  And the first thing I thought about was the data model.
[06:05.000 --> 06:08.000]  And I realized that actually everything is a playlist.
[06:08.000 --> 06:11.000]  You know, a music collection is just a playlist
[06:11.000 --> 06:14.000]  where the order doesn't really matter.
[06:14.000 --> 06:17.000]  Metadata can be stored as metadata in the playlist.
[06:17.000 --> 06:20.000]  So everything is a playlist.
[06:20.000 --> 06:22.000]  I wanted to write a multi-call binary.
[06:22.000 --> 06:24.000]  This is called cpl.
[06:24.000 --> 06:26.000]  The tool I wrote is called Calliope, by the way.
[06:26.000 --> 06:29.000]  I'm not really here to show off about the tool, actually.
[06:29.000 --> 06:31.000]  You can look at it, and it's fun,
[06:31.000 --> 06:34.000]  but the ideas are the thing I'm more excited about.
[06:34.000 --> 06:36.000]  I'd like people to re-implement this in other languages
[06:36.000 --> 06:39.000]  and go forth with the ideas and do stuff I never thought of,
[06:39.000 --> 06:42.000]  or contribute to the project itself.
[06:42.000 --> 06:44.000]  So it has a multi-call binary.
[06:44.000 --> 06:46.000]  Currently everything's written in Python.
[06:46.000 --> 06:52.000]  That could change if somebody decides to write a new tool in Haskell or whatever.
[06:52.000 --> 06:54.000]  The main interface is the command line.
[06:54.000 --> 07:00.000]  So you can create a recommendation pipeline as a shell pipeline.
[07:00.000 --> 07:04.000]  Or you can do stuff in Python directly for greater control.
[07:04.000 --> 07:06.000]  And it's optimized for ease of maintenance, right?
[07:06.000 --> 07:07.000]  Because I'm lazy.
[07:07.000 --> 07:09.000]  I have one hour a weekend to spend on this,
[07:09.000 --> 07:14.000]  so it has to be easy to maintain.
[07:14.000 --> 07:18.000]  Okay, so the data model, as simple as possible.
[07:18.000 --> 07:19.000]  Here's a playlist item.
[07:19.000 --> 07:22.000]  It's a Python dictionary, which we can represent as JSON,
[07:22.000 --> 07:26.000]  and it has key value pairs.
[07:26.000 --> 07:30.000]  And then a playlist is a list of playlist items.
[07:30.000 --> 07:32.000]  One quite key decision is that,
[07:32.000 --> 07:36.000]  notice I haven't represented this as a JSON list.
[07:36.000 --> 07:38.000]  It's a JSON lines document.
[07:38.000 --> 07:42.000]  So that's JSON objects separated by a new line.
[07:42.000 --> 07:43.000]  And this is really cool,
[07:43.000 --> 07:46.000]  because you can process it with shell pipeline tools.
[07:46.000 --> 07:47.000]  You can process it with JSON tools,
[07:47.000 --> 07:53.000]  but you can also process it with line-based processing tools.
[07:53.000 --> 07:55.000]  Think if we had a JSON list,
[07:55.000 --> 07:57.000]  and this playlist was 10,000 items long,
[07:57.000 --> 07:58.000]  then we stream it,
[07:58.000 --> 08:00.000]  and you have to wait for the closing parenthesis
[08:00.000 --> 08:02.000]  before the next process can read it.
[08:02.000 --> 08:05.000]  But this way, the processes can read a line at a time,
[08:05.000 --> 08:08.000]  and you can have an infinite-length playlist
[08:08.000 --> 08:11.000]  and start processing the beginning of it
[08:11.000 --> 08:14.000]  before it's even, before it's finished.
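A minimal sketch of the streaming idea described above, using only the Python standard library. The function names `read_playlist` and `write_playlist` and the example songs are made up for illustration and are not taken from Calliope; only the `creator` and `title` keys come from the talk.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_playlist(stream: TextIO) -> Iterator[dict]:
    """Yield one playlist item per non-empty line, without reading ahead."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_playlist(items: Iterable[dict], stream: TextIO) -> None:
    """Write each item as a single-line JSON object."""
    for item in items:
        stream.write(json.dumps(item, ensure_ascii=False) + "\n")


# Made-up example data: one JSON object per line, no enclosing list.
EXAMPLE = (
    '{"creator": "Artist A", "title": "Song One"}\n'
    '{"creator": "Artist B", "title": "Song Two"}\n'
)

# The first item is available as soon as the first line has been read.
first_item = next(read_playlist(io.StringIO(EXAMPLE)))
```

Because `read_playlist` is a generator, a consumer can start work on the first item while the producer is still writing later ones, which is the property a single enclosing JSON list would not give you.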
[08:14.000 --> 08:18.000]  Okay, so that's the data model.
[08:18.000 --> 08:20.000]  Those key value pairs, creator and title,
[08:20.000 --> 08:21.000]  those aren't arbitrary,
[08:21.000 --> 08:24.000]  those come from an existing playlist format called XSPF,
[08:24.000 --> 08:29.000]  which has been around since 2006 and is almost perfect.
[08:29.000 --> 08:31.000]  Like, they got the design almost perfect.
[08:31.000 --> 08:32.000]  One of the flaws was choosing XML,
[08:32.000 --> 08:35.000]  which was a good idea in 2006.
[08:35.000 --> 08:39.000]  And the other tweak I made was representing it as JSON lines,
[08:39.000 --> 08:45.000]  but the data model is effectively the same as XSPF.
[08:45.000 --> 08:48.000]  So we can already do some fun stuff with this playlist, right?
[08:48.000 --> 08:52.000]  Let me quickly show you what you can do.
[08:52.000 --> 08:54.000]  Here's a playlist.
[08:54.000 --> 08:59.000]  These songs aren't real, obviously.
[08:59.000 --> 09:01.000]  We can shuffle it.
[09:01.000 --> 09:03.000]  I have to give it a file name,
[09:03.000 --> 09:04.000]  and the file name is standard in.
[09:04.000 --> 09:07.000]  Okay, so now it's shuffled.
[09:07.000 --> 09:11.000]  I can export it to a different playlist format.
[09:11.000 --> 09:14.000]  So now I've converted it into an actual XSPF playlist,
[09:14.000 --> 09:18.000]  so you can put it into Rhythmbox.
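As a rough illustration of what such an export involves, here is a sketch that turns playlist items into an XSPF document with the standard library. It is not Calliope's exporter: `to_xspf` is a hypothetical helper, and it copies only the `location`, `title` and `creator` fields.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"


def to_xspf(items):
    """Build an XSPF document string from playlist item dictionaries."""
    # Serialise the XSPF namespace as the default namespace (no prefix).
    ET.register_namespace("", XSPF_NS)
    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", {"version": "1"})
    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for item in items:
        track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
        # Copy only the fields this sketch knows about, in a fixed order.
        for field in ("location", "title", "creator"):
            if field in item:
                ET.SubElement(track, f"{{{XSPF_NS}}}{field}").text = item[field]
    return ET.tostring(playlist, encoding="unicode")


# Made-up example item using the field names from the talk.
xspf_text = to_xspf([{"creator": "Artist A", "title": "Song One"}])
```

The mapping is almost one to one because the item keys were borrowed from XSPF in the first place.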
[09:18.000 --> 09:20.000]  But we don't even need to use Calliope tools, right?
[09:20.000 --> 09:26.000]  I could use head to get the first item.
[09:26.000 --> 09:30.000]  I could shuffle it using sort.
[09:30.000 --> 09:33.000]  Okay.
[09:33.000 --> 09:35.000]  And I can use data-oriented tools as well.
[09:35.000 --> 09:36.000]  So this is actually Nushell,
[09:36.000 --> 09:38.000]  which is a data-oriented shell.
[09:38.000 --> 09:43.000]  So I can also load it into Nushell,
[09:43.000 --> 09:45.000]  and now I have JSON,
[09:45.000 --> 09:49.000]  and now I can sort it by the artist's name or by the title.
[09:49.000 --> 09:51.000]  So just by defining a data format,
[09:51.000 --> 09:53.000]  you get all this stuff for free.
[09:53.000 --> 09:55.000]  Like, I didn't even have to write any code yet,
[09:55.000 --> 09:59.000]  and we can already shuffle a playlist.
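The same three operations (first item, shuffle, sort by a field) can be sketched in a few lines of plain Python, which shows how little the format demands. The playlist text is made up, and the fixed random seed is there only to make the sketch repeatable.

```python
import json
import random

# Made-up playlist: one JSON object per line.
PLAYLIST_TEXT = """\
{"creator": "Artist C", "title": "Third Song"}
{"creator": "Artist A", "title": "First Song"}
{"creator": "Artist B", "title": "Second Song"}
"""

lines = PLAYLIST_TEXT.splitlines()

# Like `head -n 1`: the first item is simply the first line.
first_line = lines[0]

# Line-based shuffle: no JSON parsing is needed at all.
rng = random.Random(0)
shuffled = lines[:]
rng.shuffle(shuffled)

# Sorting by a field is the only step that has to parse the JSON.
by_creator = sorted(
    (json.loads(line) for line in lines),
    key=lambda item: item["creator"],
)
```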
[09:59.000 --> 10:02.000]  So what's next?
[10:02.000 --> 10:05.000]  Well, these aren't even real songs, right?
[10:05.000 --> 10:06.000]  You can't play them.
[10:06.000 --> 10:08.000]  There's no content.
[10:08.000 --> 10:10.000]  So the next step is get some content
[10:10.000 --> 10:14.000]  so we can actually listen to the playlist.
[10:14.000 --> 10:16.000]  The developers of the XSPF format have thought of this,
[10:16.000 --> 10:21.000]  and they designed XSPF with a portable design
[10:21.000 --> 10:23.000]  where when you go to play the music,
[10:23.000 --> 10:25.000]  you resolve it at that moment.
[10:25.000 --> 10:29.000]  So you search based on the metadata, like creator and title,
[10:29.000 --> 10:33.000]  and then you find a URL where you can actually play it.
[10:33.000 --> 10:36.000]  So I implemented that,
[10:36.000 --> 10:38.000]  and I can demo that as well.
[10:38.000 --> 10:40.000]  Okay, so here's three.
[10:40.000 --> 10:42.000]  These are real songs now,
[10:42.000 --> 10:50.000]  and if I pipe it to the Spotify sub-command,
[10:50.000 --> 10:53.000]  they get resolved to actual tracks on Spotify.
[10:53.000 --> 10:55.000]  So over here somewhere is a URL,
[10:55.000 --> 10:59.000]  and you can click it and listen to the track.
[10:59.000 --> 11:01.000]  This is all done using the Spotify API,
[11:01.000 --> 11:03.000]  so you need a Spotify API key to do that.
[11:03.000 --> 11:06.000]  You can get it for free, but it's a little bit of an effort.
[11:06.000 --> 11:10.000]  And it works by searching based on creator, title,
[11:10.000 --> 11:12.000]  and ranking the results.
[11:12.000 --> 11:15.000]  Or I can resolve it to tracks on my local machine.
[11:15.000 --> 11:17.000]  So I'm a GNOME developer,
[11:17.000 --> 11:20.000]  so I have the Tracker search engine installed,
[11:20.000 --> 11:23.000]  and Tracker can match against my local music collection
[11:23.000 --> 11:26.000]  and return the URL.
[11:26.000 --> 11:29.000]  Let me make that pretty.
[11:29.000 --> 11:33.000]  Okay, so it's resolved to URLs on my local machine.
[11:33.000 --> 11:36.000]  This one, I seem to have deleted the Madonna album,
[11:36.000 --> 11:38.000]  but the other two are here.
[11:38.000 --> 11:41.000]  And then you see here I exported it as an M3U playlist as well,
[11:41.000 --> 11:44.000]  now that we have URLs.
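The resolve step can be sketched as a lookup from metadata to a URL. This is a simplified stand-in, not how the Spotify or Tracker resolvers work: the `INDEX` dictionary, the `normalise` and `resolve` helpers, and the file path are all hypothetical, and a real resolver would query a search engine or an API and rank the candidates.

```python
def normalise(text):
    """Lower-case and collapse whitespace so small differences still match."""
    return " ".join(text.lower().split())


def resolve(items, index):
    """Yield copies of items, adding 'location' when the index has a match."""
    for item in items:
        key = (normalise(item["creator"]), normalise(item["title"]))
        resolved = dict(item)
        if key in index:
            resolved["location"] = index[key]
        yield resolved


# Hypothetical index standing in for a local music collection.
INDEX = {("artist a", "song one"): "file:///music/artist-a/song-one.mp3"}

playlist = [
    {"creator": "Artist A", "title": "Song  One"},
    {"creator": "Artist B", "title": "Song Two"},
]
resolved = list(resolve(playlist, INDEX))
```

Items with no match pass through unchanged, like the deleted album in the demo, so later stages can decide what to do with unresolved tracks.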
[11:44.000 --> 11:49.000]  So this is the basics of how you can make playlists in Python, right?
[11:49.000 --> 11:52.000]  What's next?
[11:52.000 --> 11:54.000]  So I promised music recommendations, right,
[11:54.000 --> 11:57.000]  and we haven't actually done any recommendations yet.
[11:57.000 --> 12:02.000]  So the next thing I'm going to talk about is a program I made
[12:02.000 --> 12:05.000]  that generates me a playlist every day.
[12:05.000 --> 12:08.000]  And that's as far as I've got with this,
[12:08.000 --> 12:11.000]  because actually I quite like the playlists it generates,
[12:11.000 --> 12:14.000]  so I haven't needed to make any other recommenders yet.
[12:14.000 --> 12:16.000]  I'm still happy with this one.
[12:16.000 --> 12:18.000]  Soon I shall look at some more.
[12:18.000 --> 12:20.000]  But a recommendation algorithm is basically this.
[12:20.000 --> 12:23.000]  You have a very big playlist on the left,
[12:23.000 --> 12:25.000]  which is all the possible music you could listen to,
[12:25.000 --> 12:28.000]  and then some sort of algorithm happens,
[12:28.000 --> 12:30.000]  and on the right you have a shorter playlist,
[12:30.000 --> 12:35.000]  which is hopefully better, and that's the one you listen to.
[12:35.000 --> 12:40.000]  So the algorithm I came up with, I called it the Special Mix,
[12:40.000 --> 12:45.000]  and its goal is to create a one-hour playlist of music that I already know,
[12:45.000 --> 12:47.000]  and there's three ingredients for that.
[12:47.000 --> 12:49.000]  All of these are Python libraries.
[12:49.000 --> 12:51.000]  One is pylistenbrainz,
[12:51.000 --> 12:54.000]  which is an interface to the ListenBrainz database.
[12:54.000 --> 12:56.000]  One is the beets music organiser,
[12:56.000 --> 12:59.000]  which is a great tool for maintaining a local music collection.
[12:59.000 --> 13:02.000]  And one is the Python simpleai module,
[13:02.000 --> 13:05.000]  which gives you really basic AI algorithms
[13:05.000 --> 13:08.000]  that let you do constraint solving.
[13:08.000 --> 13:10.000]  So I'll go through those one at a time.
[13:10.000 --> 13:12.000]  I'll go have a little drink first.
[13:16.000 --> 13:20.000]  So if you want to do music recommendations,
[13:20.000 --> 13:23.000]  it's a good idea to save the history of what you listen to.
[13:23.000 --> 13:25.000]  Spotify already does that,
[13:25.000 --> 13:29.000]  although they make it a little difficult for you to then get at the data.
[13:29.000 --> 13:32.000]  Last.fm does that, and ListenBrainz,
[13:32.000 --> 13:35.000]  which is the solution I recommend because it's open.
[13:35.000 --> 13:37.000]  It's an open source platform.
[13:37.000 --> 13:39.000]  It's open data.
[13:39.000 --> 13:42.000]  So you can get a browser extension,
[13:42.000 --> 13:44.000]  or phone apps and music players
[13:44.000 --> 13:46.000]  that will save everything you listen to
[13:46.000 --> 13:48.000]  into the ListenBrainz database,
[13:48.000 --> 13:51.000]  and then ListenBrainz gives you charts and graphs
[13:51.000 --> 13:53.000]  to show what great taste you have.
[13:53.000 --> 13:57.000]  And pylistenbrainz and the Calliope ListenBrainz command
[13:57.000 --> 13:59.000]  let you access the data.
[14:01.000 --> 14:03.000]  So...
[14:05.000 --> 14:08.000]  I would run the ListenBrainz history command,
[14:08.000 --> 14:10.000]  put my username,
[14:10.000 --> 14:12.000]  and fetch all the listens.
[14:12.000 --> 14:14.000]  This does something kind of dumb.
[14:14.000 --> 14:17.000]  It just syncs all of the listens into a local SQLite database.
[14:17.000 --> 14:19.000]  And then I've dumped the first one here
[14:19.000 --> 14:21.000]  to show the kind of metadata you get.
[14:21.000 --> 14:24.000]  So you get a timestamp, you get an ID for the track,
[14:24.000 --> 14:27.000]  and then you get the creator and the title and the album.
[14:27.000 --> 14:29.000]  And in this case, the URL of where I listened to it,
[14:29.000 --> 14:31.000]  because it came from the web scrobbler.
[14:31.000 --> 14:33.000]  So that's useful.
[14:33.000 --> 14:36.000]  And then because it's saved in a local SQLite database,
[14:36.000 --> 14:39.000]  we can do things like calculate a histogram
[14:39.000 --> 14:42.000]  of which years I actually listened to music.
[14:42.000 --> 14:45.000]  We can select tracks based on,
[14:45.000 --> 14:48.000]  okay, it was first listened to in 2019,
[14:48.000 --> 14:51.000]  or it was first listened to in 2020.
[14:51.000 --> 14:53.000]  So that's what I did.
[14:53.000 --> 14:55.000]  And now I have a playlist, right?
[14:55.000 --> 14:57.000]  So the first thing the special mix does
[14:57.000 --> 15:01.000]  is it chooses one year out of this histogram,
[15:01.000 --> 15:04.000]  so a year where I did actually listen to music,
[15:04.000 --> 15:07.000]  and then it selects all the tracks
[15:07.000 --> 15:09.000]  that I listened to
[15:09.000 --> 15:11.000]  where the first listen is in that year.
[15:11.000 --> 15:13.000]  So songs I discovered in 2019
[15:13.000 --> 15:16.000]  or discovered in 2021, for example.
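A sketch of those two queries with Python's `sqlite3` module. The `listens` table and its columns are a made-up schema for illustration, not the one Calliope's ListenBrainz sync actually creates, and the four rows are invented.

```python
import random
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per listen, timestamp in Unix seconds.
db.execute("CREATE TABLE listens (listened_at INTEGER, creator TEXT, title TEXT)")
db.executemany(
    "INSERT INTO listens VALUES (?, ?, ?)",
    [
        (1546300800, "Artist A", "Song One"),  # 2019-01-01 UTC
        (1577836800, "Artist A", "Song One"),  # 2020-01-01 UTC
        (1577923200, "Artist B", "Song Two"),  # 2020-01-02 UTC
        (1609459200, "Artist B", "Song Two"),  # 2021-01-01 UTC
    ],
)

# Histogram: number of listens per calendar year.
histogram = db.execute(
    "SELECT strftime('%Y', listened_at, 'unixepoch') AS year, COUNT(*) "
    "FROM listens GROUP BY year ORDER BY year"
).fetchall()


def discovered_in(year):
    """Tracks whose earliest listen falls in the given year."""
    rows = db.execute(
        "SELECT creator, title FROM listens "
        "GROUP BY creator, title "
        "HAVING strftime('%Y', MIN(listened_at), 'unixepoch') = ? "
        "ORDER BY creator, title",
        (str(year),),
    )
    return [{"creator": creator, "title": title} for creator, title in rows]


# Pick one year that has listens, then take the tracks discovered in it.
chosen_year = random.choice([year for year, _count in histogram])
candidates = discovered_in(chosen_year)
```

Note that a year can have listens without any discoveries (2021 in this data), so the candidate list may be empty for some choices.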
[15:16.000 --> 15:19.000]  And now we have a playlist, but it's very long.
[15:19.000 --> 15:23.000]  So the next step is...
[15:23.000 --> 15:26.000]  The next step would be to select tracks from it.
[15:26.000 --> 15:29.000]  However, we want to know a bit more about the songs first.
[15:29.000 --> 15:31.000]  So actually the next step here
[15:31.000 --> 15:35.000]  is to resolve the tracks to local content.
[15:35.000 --> 15:37.000]  This is where I keep my music collection.
[15:37.000 --> 15:39.000]  It's a hi-tech home server.
[15:39.000 --> 15:42.000]  And I manage it with a Python program called beets.
[15:42.000 --> 15:45.000]  This is a command line tool that lets you
[15:46.000 --> 15:48.000]  take music that you've got from Bandcamp or wherever
[15:48.000 --> 15:50.000]  and import it into a database
[15:50.000 --> 15:52.000]  and it fixes the tags using MusicBrainz.
[15:52.000 --> 15:54.000]  So it's always correct.
[15:54.000 --> 15:56.000]  It has nice apostrophe characters.
[15:56.000 --> 16:00.000]  You can use any content resolver in theory.
[16:00.000 --> 16:02.000]  You can generate playlists against Spotify
[16:02.000 --> 16:04.000]  and upload them to Spotify,
[16:04.000 --> 16:06.000]  but that's not my goal here.
[16:06.000 --> 16:10.000]  So using beets, I can resolve the tracks in the playlist
[16:10.000 --> 16:14.000]  to actual MP3 files on my music server.
[16:14.000 --> 16:16.000]  That's actually a lie.
[16:16.000 --> 16:18.000]  I haven't implemented that yet and I use Tracker.
[16:18.000 --> 16:20.000]  But because beets is written in Python,
[16:20.000 --> 16:22.000]  we'll pretend that I use beets.
[16:22.000 --> 16:25.000]  Either way, now I have a playlist
[16:25.000 --> 16:28.000]  where the track location is set to a file
[16:28.000 --> 16:31.000]  and we also know the duration of every song.
[16:31.000 --> 16:34.000]  So we have a bit more metadata.
[16:34.000 --> 16:38.000]  And now we can select
[16:38.000 --> 16:41.000]  which tracks go in the playlist.
[16:41.000 --> 16:43.000]  Okay, so here's the fun part.
[16:43.000 --> 16:45.000]  All the parts are fun,
[16:45.000 --> 16:47.000]  but maybe this is the most fun part.
[16:49.000 --> 16:54.000]  The approach I took was to try and do constraint solving.
[16:54.000 --> 16:57.000]  Now, this is a pretty old area of AI, right?
[16:57.000 --> 16:59.000]  People have been looking at constraint solving
[16:59.000 --> 17:01.000]  for years and years, so the fashion at the moment
[17:01.000 --> 17:03.000]  is to solve everything with machine learning
[17:03.000 --> 17:05.000]  and lots and lots of GPUs.
[17:05.000 --> 17:07.000]  And that works. It produces nice results,
[17:07.000 --> 17:09.000]  but it's hard to reproduce in an hour
[17:09.000 --> 17:12.000]  on the weekend on your old laptop,
[17:12.000 --> 17:15.000]  whereas the constraint solving approach is pretty simple.
[17:15.000 --> 17:19.000]  You can run it on, you know, a Raspberry Pi with no issues.
[17:21.000 --> 17:24.000]  This was inspired by a paper that I found,
[17:24.000 --> 17:26.000]  which, again, is from 2008.
[17:26.000 --> 17:28.000]  It's nothing too futuristic.
[17:28.000 --> 17:32.000]  And in this paper, they define a constraint model.
[17:32.000 --> 17:35.000]  So this looks kind of academic if you're not a mathematician.
[17:35.000 --> 17:40.000]  But these are ways of making constraints on a playlist,
[17:40.000 --> 17:42.000]  such as I want the playlist for whatever reason
[17:42.000 --> 17:45.000]  to be 20% or more Stevie Wonder songs.
[17:45.000 --> 17:50.000]  And they can express that as a function.
[17:50.000 --> 17:54.000]  And the key is that you can apply this function to a playlist.
[17:54.000 --> 18:00.000]  So let's say I have a playlist and it has 100 Stevie Wonder songs.
[18:01.000 --> 18:05.000]  This constraint function would return a score of one
[18:05.000 --> 18:08.000]  for that playlist because every song is Stevie Wonder,
[18:08.000 --> 18:10.000]  so the constraint is completely satisfied.
[18:10.000 --> 18:12.000]  Now let's say I have another playlist,
[18:12.000 --> 18:14.000]  which is entirely Death Metal.
[18:14.000 --> 18:17.000]  This constraint function would return zero
[18:17.000 --> 18:20.000]  because none of the tracks are by Stevie Wonder.
[18:20.000 --> 18:23.000]  And if we had a playlist that was 10% Stevie Wonder songs,
[18:23.000 --> 18:27.000]  then we would assume this function would return 0.5
[18:27.000 --> 18:31.000]  because the playlist kind of half satisfies the constraint.
[18:31.000 --> 18:34.000]  So the first step in constraint solving like this
[18:34.000 --> 18:39.000]  is to define a constraint function that can score any playlist.
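The Stevie Wonder example can be written as a small scoring function. This is a sketch under one stated assumption: partial credit grows linearly up to the threshold, which reproduces the 0, 0.5 and 1 values mentioned above but is not necessarily the formula used in the paper. The name `fraction_constraint` and the sample playlists are made up.

```python
def fraction_constraint(playlist, predicate, minimum):
    """Score from 0.0 to 1.0 for 'at least `minimum` of the items match'."""
    if not playlist:
        return 0.0
    fraction = sum(1 for item in playlist if predicate(item)) / len(playlist)
    # Assumed scoring rule: linear partial credit below the threshold.
    return min(1.0, fraction / minimum)


def is_stevie(item):
    return item["creator"] == "Stevie Wonder"


all_stevie = [{"creator": "Stevie Wonder"}] * 100
death_metal = [{"creator": "Some Death Metal Band"}] * 100
ten_percent = [{"creator": "Stevie Wonder"}] + [{"creator": "Someone Else"}] * 9

scores = [
    fraction_constraint(playlist, is_stevie, 0.2)
    for playlist in (all_stevie, death_metal, ten_percent)
]
```

What matters for the search that follows is that the function returns a graded score rather than a yes or no answer, so a playlist that is nearly right ranks above one that is far off.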
[18:39.000 --> 18:42.000]  And then we use a local search algorithm
[18:42.000 --> 18:46.000]  to find a playlist that best matches the constraints.
[18:46.000 --> 18:49.000]  So local search is a kind of try it,
[18:49.000 --> 18:52.000]  try, try, try, try again until you get bored
[18:52.000 --> 18:55.000]  and then pick the best solution that you found.
[18:55.000 --> 18:58.000]  You set a fixed number of iterations like 10,000
[18:58.000 --> 19:00.000]  and you kind of go, OK, well, this works,
[19:00.000 --> 19:02.000]  this one's a bit better, this one's worse,
[19:02.000 --> 19:05.000]  and choose the best one that you found.
[19:05.000 --> 19:09.000]  So I'm going to do a quick worked example of this
[19:09.000 --> 19:11.000]  with two constraints.
[19:11.000 --> 19:13.000]  And the constraints I'm going to put are
[19:13.000 --> 19:15.000]  the songs must be,
[19:15.000 --> 19:18.000]  each song must be two to four minutes long
[19:18.000 --> 19:21.000]  and the playlist as a whole must be 10 minutes.
[19:21.000 --> 19:29.000]  And here's a demo of solving that constraint problem.
[19:29.000 --> 19:31.000]  As promised, it fits on a sheet of A4 paper.
[19:31.000 --> 19:33.000]  This is the whole program.
[19:33.000 --> 19:36.000]  So here I've defined two constraints.
[19:36.000 --> 19:39.000]  One of them is an item duration constraint
[19:39.000 --> 19:41.000]  saying that the duration of each item
[19:41.000 --> 19:44.000]  should be between two and four minutes.
[19:44.000 --> 19:46.000]  And here's a playlist duration constraint
[19:46.000 --> 19:48.000]  saying that the overall duration should be
[19:48.000 --> 19:50.000]  between 10 and 10 minutes.
[19:50.000 --> 19:52.000]  I know exactly what I want.
[19:52.000 --> 19:55.000]  And then here is the input,
[19:55.000 --> 19:58.000]  which is a playlist made up of four fake songs.
[19:58.000 --> 20:03.000]  I haven't used real songs here because it's too complicated.
[20:03.000 --> 20:05.000]  Notice they have vastly different lengths,
[20:05.000 --> 20:07.000]  so it's quite hard to solve this problem.
[20:07.000 --> 20:10.000]  There's not an obvious solution.
[20:10.000 --> 20:13.000]  And we're going to use the Calliope select module,
[20:13.000 --> 20:16.000]  which internally uses simpleai.
[20:16.000 --> 20:21.000]  So the only thing that's required in this playlist,
[20:21.000 --> 20:23.000]  the only required piece of metadata is an ID.
[20:23.000 --> 20:26.000]  In this case, I've put emojis because they're pretty,
[20:26.000 --> 20:28.000]  but normally you'd have an integer or something.
[20:28.000 --> 20:31.000]  And these constraints will look at the duration field.
[20:31.000 --> 20:33.000]  So the duration field is also required
[20:33.000 --> 20:36.000]  because otherwise we can't calculate the score, right?
[20:36.000 --> 20:39.000]  Because we need to know the duration of each item.
[20:39.000 --> 20:42.000]  So if I've got time, how are we doing for time?
[20:42.000 --> 20:45.000]  Okay, I think I have time to show you
[20:45.000 --> 20:49.000]  a live demo of solving this constraint problem.
[20:49.000 --> 20:52.000]  And the good news is,
[20:52.000 --> 20:54.000]  simpleai, the simpleai module,
[20:54.000 --> 20:58.000]  has a web viewer that can give us a kind of graphical view
[20:58.000 --> 21:00.000]  of what's going on here.
[21:00.000 --> 21:04.000]  So with luck, yeah, with luck this will load,
[21:04.000 --> 21:07.000]  and it's going to step through
[21:07.000 --> 21:09.000]  each step of the local search algorithm
[21:09.000 --> 21:11.000]  to find the best playlist.
[21:11.000 --> 21:16.000]  So in the beginning we have an empty playlist
[21:16.000 --> 21:18.000]  and the score is zero because it doesn't satisfy
[21:18.000 --> 21:21.000]  either of the constraints.
[21:21.000 --> 21:24.000]  We take a step.
[21:24.000 --> 21:27.000]  Each step it can choose to do different actions
[21:27.000 --> 21:29.000]  that will change the playlist.
[21:29.000 --> 21:33.000]  So here it's chosen actions of adding a song
[21:33.000 --> 21:35.000]  because that's all it can do.
[21:35.000 --> 21:37.000]  So we can add this song which was quite long
[21:37.000 --> 21:40.000]  and the score is now 0.4 because one of the songs is too long.
[21:40.000 --> 21:43.000]  Or if we have the really short song, the score is lower.
[21:43.000 --> 21:48.000]  Or if we add this seven-minute ambient tune,
[21:48.000 --> 21:52.000]  then, ah, it's very difficult to get the scale right.
[21:52.000 --> 21:56.000]  There we go.
[21:56.000 --> 21:58.000]  So this one has the highest score,
[21:58.000 --> 22:00.000]  no, this one has the highest score actually.
[22:00.000 --> 22:03.000]  So probably it's going to choose this one.
[22:03.000 --> 22:05.000]  Yeah, so the next step we have a playlist
[22:05.000 --> 22:07.000]  which is just this song.
[22:07.000 --> 22:08.000]  What song was that?
[22:08.000 --> 22:10.000]  Amazing tune.
[22:10.000 --> 22:12.000]  So we have amazing tune in the list
[22:12.000 --> 22:14.000]  and now the score is 0.6.
[22:14.000 --> 22:17.000]  Let's take the next step.
[22:17.000 --> 22:20.000]  Okay, so now one of the possible actions
[22:20.000 --> 22:22.000]  is removing the song again,
[22:22.000 --> 22:25.000]  but we won't do that because the score is back down to zero.
[22:25.000 --> 22:27.000]  Or we can add one of these other songs
[22:27.000 --> 22:30.000]  and this one seems to be the playlist
[22:30.000 --> 22:34.000]  that best matches the constraint.
[22:34.000 --> 22:37.000]  Okay, so now we've got a playlist of two items.
[22:38.000 --> 22:40.000]  Now we can take some more actions.
[22:40.000 --> 22:44.000]  We can add another song or remove either of those songs.
[22:44.000 --> 22:47.000]  And we've added another song.
[22:47.000 --> 22:50.000]  Probably at this point it's going to say,
[22:50.000 --> 22:53.000]  okay, so it can't find any actions that increase the score,
[22:53.000 --> 22:55.000]  so the algorithm has said, right, it's done.
[22:55.000 --> 22:57.000]  That's the best playlist you're going to get.
[22:57.000 --> 23:00.000]  And that is how you can create a playlist
[23:00.000 --> 23:03.000]  in a page full of Python.
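
The walkthrough above can be sketched as a greedy local search. This is a minimal illustration, not the project's actual code: the `Song` class, the two scoring functions, the target length of 15 minutes, the 2 to 8 minute track range and the sample library are all assumptions made up for this sketch. It shows the same loop as the demo: start from an empty playlist with a score of zero, try every add or remove action, take the highest-scoring one, and stop when no action increases the score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    minutes: float


def duration_score(playlist, target=15.0):
    """Hypothetical constraint: 1.0 when the total length hits the target,
    falling linearly to 0.0 as the total moves away from it."""
    total = sum(song.minutes for song in playlist)
    return max(0.0, 1.0 - abs(total - target) / target)


def track_length_score(playlist, low=2.0, high=8.0):
    """Hypothetical constraint: fraction of songs between low and high minutes.
    An empty playlist scores 0.0."""
    if not playlist:
        return 0.0
    in_range = sum(1 for song in playlist if low <= song.minutes <= high)
    return in_range / len(playlist)


def score(playlist):
    """Average of the two constraint scores, between 0.0 and 1.0."""
    return (duration_score(playlist) + track_length_score(playlist)) / 2


def neighbours(playlist, library):
    """Every playlist reachable with one action: add a song or remove a song."""
    for song in library:
        if song not in playlist:
            yield playlist + [song]
    for i in range(len(playlist)):
        yield playlist[:i] + playlist[i + 1:]


def local_search(library):
    """Take the best-scoring action each step until nothing improves the score."""
    playlist = []
    best = score(playlist)
    while True:
        candidates = list(neighbours(playlist, library))
        if not candidates:
            return playlist, best
        candidate = max(candidates, key=score)
        if score(candidate) <= best:
            return playlist, best
        playlist, best = candidate, score(candidate)


# Made-up library for illustration only.
library = [
    Song("Long Song", 12.0),
    Song("Short Song", 0.5),
    Song("Amazing Tune", 7.0),
    Song("Another Song", 4.0),
    Song("Third Song", 4.0),
]

playlist, best = local_search(library)
print([song.title for song in playlist], round(best, 2))
```

Because every accepted step strictly increases the score, the loop always terminates, but it can stop at a local optimum rather than the best possible playlist.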
[23:03.000 --> 23:06.000]  So pretty short on time.
[23:07.000 --> 23:09.000]  We can export this playlist.
[23:09.000 --> 23:12.000]  And I have a Jellyfin music player set up
[23:12.000 --> 23:15.000]  and that's how I listen to it.
[23:15.000 --> 23:18.000]  That's a recap of what we've just seen.
[23:18.000 --> 23:20.000]  So what's next? I don't know, really.
[23:20.000 --> 23:22.000]  We have a couple of minutes for questions.
[23:22.000 --> 23:25.000]  Maybe you can answer the question of what's next.
[23:26.000 --> 23:29.000]  APPLAUSE
[23:41.000 --> 23:43.000]  Thank you, Sam.
[23:43.000 --> 23:46.000]  We have two minutes for questions.
[23:46.000 --> 23:49.000]  I will repeat the questions then.
[23:49.000 --> 23:54.000]  Yeah, so if I want to actually use this project
[23:54.000 --> 23:56.000]  like, for example, for instance,
[23:56.000 --> 24:00.000]  how much time would it take me to set up the project
[24:00.000 --> 24:05.000]  and find constraints that actually match and replace music
[24:05.000 --> 24:08.000]  given that I'm fairly familiar with Python
[24:08.000 --> 24:10.000]  or setting up some projects?
[24:10.000 --> 24:13.000]  So how much time would it take to set up the whole project
[24:13.000 --> 24:15.000]  and find the constraints and so on?
[24:15.000 --> 24:17.000]  Yeah, interesting question, actually.
[24:17.000 --> 24:19.000]  So, I mean, setting up the project is fairly easy.
[24:19.000 --> 24:21.000]  You pip install.
[24:22.000 --> 24:24.000]  After that...
[24:24.000 --> 24:26.000]  I don't know, it wouldn't be quick.
[24:26.000 --> 24:28.000]  I'll tell you that.
[24:28.000 --> 24:31.000]  You would have to enjoy getting your hands dirty a bit at this stage.
[24:31.000 --> 24:35.000]  My general goal is to create a folder of example recommenders.
[24:35.000 --> 24:37.000]  So hopefully in future you'd be able to...
[24:37.000 --> 24:40.000]  And you can actually run the examples as modules as well.
[24:40.000 --> 24:43.000]  So hopefully in the future you'd be able to kind of run
[24:43.000 --> 24:46.000]  an existing example and get started fairly quickly
[24:46.000 --> 24:48.000]  and just tweak a few values.
[24:52.000 --> 24:55.000]  One more over there.
[24:55.000 --> 24:57.000]  Or one over here.
[24:57.000 --> 25:01.000]  So depending on the number of times you have used
[25:01.000 --> 25:03.000]  this recommendation system,
[25:03.000 --> 25:06.000]  how often has it repeated the same set of music,
[25:06.000 --> 25:10.000]  same set of songs or very similar tasting songs?
[25:10.000 --> 25:12.000]  Yeah, how often has it repeated?
[25:12.000 --> 25:15.000]  It's never come up with two playlists the same, actually.
[25:15.000 --> 25:17.000]  What it does sometimes do, though, is it'll take an album
[25:17.000 --> 25:20.000]  and kind of give me four or five songs of the same album
[25:20.000 --> 25:22.000]  in one playlist.
[25:22.000 --> 25:24.000]  So maybe I need to tweak the constraints a bit there.
[25:24.000 --> 25:27.000]  But there's infinite possibilities, really.
[25:27.000 --> 25:30.000]  Yeah, I haven't got bored of it so far.
[25:30.000 --> 25:33.000]  If your input is very short, then it will get repetitive.
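
One way the "four or five songs of the same album" issue might be addressed is with an extra constraint that penalises over-represented albums. The sketch below is an assumption for illustration, not part of the project: the function name `album_variety_score` and the `max_per_album` parameter are hypothetical, and the input is simply a list of album names, one per song in the playlist.

```python
from collections import Counter


def album_variety_score(albums, max_per_album=2):
    """Hypothetical constraint: 1.0 when no album appears more than
    max_per_album times, reduced by the fraction of songs over the limit."""
    if not albums:
        return 1.0
    counts = Counter(albums)
    excess = sum(max(0, n - max_per_album) for n in counts.values())
    return 1.0 - excess / len(albums)


# Four songs from one album and one from another: two songs are over the limit.
print(album_variety_score(["Album A", "Album A", "Album A", "Album A", "Album B"]))
```

A score like this could be combined with the other constraint scores so that the search prefers playlists that spread across more albums.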
[25:33.000 --> 25:35.000]  One more question?
[25:41.000 --> 25:43.000]  Thanks for the very interesting talk.
[25:43.000 --> 25:45.000]  Just a quick question.
[25:45.000 --> 25:47.000]  How easy would it be to implement?
[25:47.000 --> 25:52.000]  So say if you wanted to search for different performances
[25:52.000 --> 25:55.000]  or different interpretations of a particular piece of music,
[25:55.000 --> 25:58.000]  so if you had classical music, some symphony
[25:58.000 --> 26:00.000]  with a bunch of different recordings,
[26:00.000 --> 26:02.000]  how easy would it be to implement that in the current?
[26:02.000 --> 26:05.000]  How familiar are you with MusicBrainz?
[26:05.000 --> 26:06.000]  Not at all.
[26:06.000 --> 26:08.000]  So the tool you would use would be MusicBrainz.
[26:08.000 --> 26:10.000]  Yeah, talk to this guy.
[26:10.000 --> 26:12.000]  He'll bring you up to speed.
[26:12.000 --> 26:14.000]  Thank you again, Sam.
[26:14.000 --> 26:16.000]  Thank you.