[00:00.000 --> 00:11.080]  Hello, everyone. It's good to be back. It's been a while. This is my first time giving
[00:11.080 --> 00:18.960]  a talk here. I'm really pleased to be here. My name's Joe. I am a coder. I work in London
[00:18.960 --> 00:26.720]  for local government. I work a lot with geospatial data, and I am a Python programmer. Have we
[00:26.720 --> 00:38.360]  got any Python coders in today? Anyone using Jupyter? Cool. Right. So let's go. So in lockdown
[00:38.360 --> 00:46.720]  in 21, we had a census in England and Wales, and the data is coming now. Most of the data,
[00:46.720 --> 00:52.960]  all of the data, sorry, is spatial data. So we want to look at this on a map. Why? Most
[00:52.960 --> 00:58.960]  of the data is geospatial. In local government, everything that we do generally happens somewhere,
[00:58.960 --> 01:04.520]  whether it's collecting a bin, looking after young people, looking after old people, cleaning
[01:04.520 --> 01:11.480]  the streets. We always have to think about where this is happening. Apparently, 60% of
[01:11.480 --> 01:18.920]  all data is geospatial data. So I spent a lot of my time making maps in terms of data
[01:18.920 --> 01:26.240]  of this. Now, I'm going to be focusing on one part of the census data set today, and
[01:26.240 --> 01:33.280]  that's the east end of London in an area called Tower Hamlets. This may be familiar to some
[01:33.280 --> 01:43.320]  people if you've ever seen places like Columbia Road, Bethnal Green, Canary Wharf. These are
[01:43.320 --> 01:47.880]  all parts of the east end of London, and this is the main area I'm going to be talking
[01:47.880 --> 01:57.600]  about. So where is Tower Hamlets in London? So what you can see here is a very small area.
[01:57.600 --> 02:04.040]  It's 20 square kilometers, but this is quite a special area because in the whole of England
[02:04.040 --> 02:10.040]  and Wales, it has the highest population density. It has the most people packed into a small
[02:10.040 --> 02:16.880]  area. It also has the fastest growing population, so it's becoming more and more dense. So
[02:16.880 --> 02:23.360]  in terms of providing services for residents, we need to have a big think about where all
[02:23.360 --> 02:31.120]  the people are and how they fit in. Now, when we make maps, the first thing we usually do
[02:31.120 --> 02:40.240]  is we make a coropleth map. However, the data set for population density in our area, and
[02:40.240 --> 02:47.440]  I do apologize, I couldn't fit it all on screen. It doesn't appear very well as a coropleth.
[02:47.440 --> 02:53.360]  The reason is because the data set is not very evenly distributed. There is, as we will
[02:53.360 --> 03:02.120]  see, some areas with extremely high population density. So over here you've got Whitechapel.
[03:02.120 --> 03:07.840]  We have very high population density in Whitechapel. Over here we have a new development which used
[03:07.840 --> 03:13.440]  to be industrial land. Again, very, very high density developments, big, big towers full
[03:13.440 --> 03:19.680]  of people. And then we also have, just to the south of the financial sector, some areas
[03:19.680 --> 03:25.280]  of very high population density with a lot of people packed into a small place. But in
[03:25.280 --> 03:30.720]  terms of the data viz, this map doesn't really help very much. So the coropleth data viz
[03:30.720 --> 03:36.080]  didn't work for us. So we began to think, what else can we try? And we checked the data
[03:36.080 --> 03:41.520]  distribution, and sure enough, we've got some serious outliers. This is why the coropleth
[03:41.520 --> 03:48.240]  map didn't work very well for us. So what did we do next? We tried to log-transform
[03:48.240 --> 03:55.440]  the data. And yeah, you can see, you know, this area here. You can begin to see the density
[03:55.440 --> 04:01.320]  there. There's quite a few large developments with a lot of people squeezed in. Whitechapel,
[04:01.320 --> 04:06.200]  you don't see so much happening there. But you do see, just to the south of the financial
[04:06.200 --> 04:13.280]  sector, high density of population. The areas with low density, this is where all the banks
[04:13.280 --> 04:19.560]  are. So obviously, there's no people living in there. This is an old dock near to the
[04:19.560 --> 04:23.880]  Tower of London. There's no people living there. There's some very nice pubs, though.
[04:23.880 --> 04:29.680]  If you ever find yourself in that area, the Dickens Inn is excellent. I can recommend
[04:29.680 --> 04:36.360]  that to everybody. And then up here in the north, we have Victoria Park, which is where
[04:36.360 --> 04:43.200]  the East End borders with Hackney. And obviously, there's no people there, at least having their
[04:43.200 --> 04:51.080]  address registered there. Log-transform data looks better on a coropleth map. However,
[04:51.080 --> 04:57.760]  you can see the legend. You lose the data. So you can try to fix the legend. But we want
[04:57.760 --> 05:02.240]  to write as little code as we possibly can. We don't want to keep fixing legends and things
[05:02.240 --> 05:08.800]  like that. So we began to think about other ways to visualize our data set. So what did
[05:08.800 --> 05:17.440]  we do? I am a Python coder, but there's a really nice package in R called Cartogram.
[05:17.440 --> 05:27.240]  And this is a technique called a density equalization algorithm that basically turns your data set
[05:27.240 --> 05:35.240]  into a Voronoi first, and then it rescales the polygons from the Voronoi relative to
[05:35.240 --> 05:43.480]  an attribute of the data. This technique is quite popular. There's a wonderful geographer
[05:43.480 --> 05:50.480]  called Danny Dooling, who has an amazing website called World Mapper, which I strongly recommend
[05:50.480 --> 05:57.880]  you have a look at. And they do things like showing poverty, inequality, food pressure
[05:57.880 --> 06:04.400]  all around the world. And they size the geographies relative to the attributes of the geospatial
[06:04.400 --> 06:10.720]  data. So this is a great technique. There is one issue here, though, is that if you
[06:10.720 --> 06:16.040]  want to overlay different layers, then it becomes difficult. And also, the map does
[06:16.040 --> 06:22.120]  look a little bit unfamiliar as well. But it does show particularly where you have like
[06:22.120 --> 06:28.480]  clustering, where you have a number of census areas, and I'm going to say a little bit more
[06:28.480 --> 06:35.320]  about census areas, where you have a few together that have high data attribute value, then
[06:35.320 --> 06:41.040]  they all get bigger together. So what we can see here is just to the south of the financial
[06:41.040 --> 06:46.280]  sector, you can see there's a lot of worker bees all crammed into this place, and then
[06:46.280 --> 06:51.040]  it increases the volume on the map. So it's a nice data vis, but still we have a small
[06:51.040 --> 06:57.160]  challenge if we want to add more data over the top. And also, it's a bit unfamiliar for
[06:57.160 --> 07:07.160]  people that don't use cartograms. So this is a map made using Data Rapper. It's a very
[07:07.160 --> 07:14.320]  nice website, and they have something called a symbol plot. And what this does is it just
[07:14.320 --> 07:21.800]  basically shows little mountains, little peaks, that show the value of the data attribute
[07:21.800 --> 07:28.120]  that you're interested in at the place where that data is happening. And so again, we can
[07:28.120 --> 07:35.240]  see over here, you've got Whitechapel, lots of people packed in there. Just to the south
[07:35.240 --> 07:41.560]  of the financial sector, lots of people packed in there. The new developments here by the
[07:41.560 --> 07:47.400]  river in Blackwell, and here by the river in the old industrial zone. So this is quite
[07:47.400 --> 07:54.000]  interesting. It gives us some context, and it gives us the data. I really like this data
[07:54.000 --> 08:01.880]  vis, but it's Data Rapper, so it's not FOS, and it's not Python, and I like to use Python.
[08:01.880 --> 08:10.200]  So it was great, but it helped, but it didn't do everything that we needed it to do. The
[08:10.200 --> 08:15.800]  other thing that you will notice, and I'll try to explain this briefly, is that we have
[08:15.800 --> 08:22.360]  one really high value here. And there's a reason for this. It's an outlier, because
[08:22.360 --> 08:31.400]  actually it's this value here. It's an outlier, because, and the reason why it's an outlier
[08:31.400 --> 08:37.400]  is because the actual census area is really, really small. And the thing about the people
[08:37.400 --> 08:44.520]  who produce the census data is that they have to create census areas using roughly 100 to
[08:44.520 --> 08:49.200]  600 people. Generally speaking, it's about 300 people, but they have to make it all fit
[08:49.200 --> 08:54.600]  together like a big jigsaw puzzle. So sometimes, you know, it's hard for them to make it work
[08:54.600 --> 09:01.040]  really well. So in this case, this census area with really high density is actually
[09:01.040 --> 09:08.200]  just one building. And so it's not a particularly big building, but everyone squeezed in there.
[09:08.200 --> 09:14.560]  So yes, so the data is quite hard to work with, but it is interesting. So when I was
[09:14.560 --> 09:19.960]  working with Data Rapper, I really liked it, and it did remind me of when I was young and
[09:19.960 --> 09:24.640]  I was reading Lord of the Rings books, I used to really like the map at the front of all
[09:24.640 --> 09:35.240]  these mountains, showing the misty mountains in those books. And so I was thinking, I could
[09:35.240 --> 09:42.480]  probably make a mountain with Python. How hard can it be? It turns out it's really easy.
[09:42.480 --> 09:51.000]  This is the essence of the library. It's just one function. You take a point on a map, you
[09:51.000 --> 09:57.800]  turn that point into a line. The line has a start point, which is just a couple of points
[09:57.800 --> 10:04.000]  of longitude, a tiny little bit of longitude to the west of your point. Then you convert
[10:04.000 --> 10:10.560]  your point to a latitude, which is kind of like a proxy for the height of the mountain,
[10:10.560 --> 10:19.400]  using some kind of algorithm that you choose. In my case, I'm just like using a range. So
[10:19.400 --> 10:25.840]  I take the minimum and maximum value of the input range, which is a separate function
[10:25.840 --> 10:33.320]  here. And range one is essentially the minimum population density and the maximum. And then
[10:33.320 --> 10:40.960]  I convert that to latitude values. And then the third point on the line is just a little
[10:40.960 --> 10:48.040]  bit of longitude to the east of my point. And then you use that to create a small triangle,
[10:48.040 --> 10:51.960]  really easy, really easy, and a lot of fun as well.
[10:51.960 --> 10:59.840]  So this is what I made with Python. And it's very similar to the data wrapper map, but
[10:59.840 --> 11:05.040]  I was going for like a kind of hand drawn kind of a look to make it look like something
[11:05.040 --> 11:10.560]  from Lord of the Rings. And, you know, it's the same thing. You've got Whitechapel here.
[11:10.560 --> 11:18.360]  You've got the financial sector here, and so on and so on. So that was fun. But, you know,
[11:18.360 --> 11:23.800]  population density, we were just talking about the reasons why it's a messy data set. There's
[11:23.800 --> 11:31.280]  one place in Chelsea, which has a population density of two million people per square kilometer.
[11:31.280 --> 11:38.320]  So this is a very difficult data set to represent using any tools available. So, you know, it's
[11:38.320 --> 11:43.720]  interesting. The other thing about Kensington and Chelsea is this is where Grenfell Tower
[11:43.720 --> 11:51.600]  is, if anybody knows about that story. This is where it happened.
[11:51.600 --> 11:56.560]  So let's try some other data sets to see if they're really messy. This is people that
[11:56.560 --> 12:03.240]  live in one bedroom homes. So this is tiny little flats, you know, filled with people.
[12:03.240 --> 12:07.200]  And so you can see all the worker bees for the financial sector. A lot of those are living
[12:07.200 --> 12:13.560]  in one bedroom flats. And actually, the new builds. This is a very new development here.
[12:13.560 --> 12:17.440]  And this is a very new development here. So it looks like people who are building homes
[12:17.440 --> 12:23.600]  now are building a lot of one bedroom homes. Two bedroom homes. Generally, everything is
[12:23.600 --> 12:30.680]  kind of the same. Nothing really jumps out here. Three bedroom homes. What you can start
[12:30.680 --> 12:36.520]  to see with three bedroom homes is that, yeah, it's generally even. But actually, in this
[12:36.520 --> 12:44.160]  area here, which is a bow, which is near the bow bow's church, which is used to decide
[12:44.160 --> 12:48.800]  if someone's a traditional East End cockney or not. That's kind of this area, really.
[12:48.800 --> 12:54.880]  So the cockneys seem to have three bedroom homes, generally. And then four or more. And
[12:54.880 --> 13:00.400]  what you see here is in the areas where the financial workers live, there's still quite
[13:00.400 --> 13:05.320]  a lot of four bedroom homes. But in some of these new build areas, there's very, very
[13:05.320 --> 13:12.560]  few relative to the rest of the area. So let's look at another slightly more famous area.
[13:12.560 --> 13:20.080]  This is Westminster in central London. And so you can see this is where Hyde Park is.
[13:20.080 --> 13:26.480]  There's no one living there. Again, this is the population density dataset. And then you've
[13:26.480 --> 13:33.880]  got an open street map based map just to help with orientation. And then in a future version
[13:33.880 --> 13:40.640]  of the module, I think I might do some more stuff with open street map. And then if you
[13:40.640 --> 13:47.480]  look at some of the outer London areas, and this is where I live, you can see like areas
[13:47.480 --> 13:53.520]  of urban density, but you can also see some very suburban areas where the population density
[13:53.520 --> 13:58.240]  is lower. This is like where most people are living in houses, basically. And you can also
[13:58.240 --> 14:04.080]  see green space. So we're nearly finished. I just want to give a massive shout to NB
[14:04.080 --> 14:12.360]  Dev. It's really good if you use Jupiter. Just check it out. Number one, if you're trying
[14:12.360 --> 14:19.600]  to do version control on Jupiter notebooks, it helps you with any clashes, any merge conflicts
[14:19.600 --> 14:24.800]  because it removes the metadata in the JSON that sometimes causes conflicts. If you have
[14:24.800 --> 14:30.000]  a team of people working on the same notebook, this is a real lifesaver. And also it just
[14:30.000 --> 14:36.960]  bakes in good practice. So it means that your code gets shared on GitHub really easily.
[14:36.960 --> 14:42.600]  It helps you or encourages you at least to write good documentation for your team and
[14:42.600 --> 14:49.800]  the community. It also encourages you to write good tests. And it enables you to publish
[14:49.800 --> 14:59.680]  modules. So big shout to them. I'd also like to thank Jarek, who has produced a wonderful
[14:59.680 --> 15:06.240]  PWA for FOSSTEM called Sejourner OX. Do check it out. It's a really good way of looking
[15:06.240 --> 15:12.800]  at the schedule for FOSSTEM and you can watch the videos with Sejourner OX. And also Ed,
[15:12.800 --> 15:19.160]  who's going to be giving a really cool talk on OSM and Wikidata. And finally, I'd like
[15:19.160 --> 15:24.360]  to thank all the council coders everywhere. Thanks for having me.