[00:00.000 --> 00:11.080] Hello, everyone. It's good to be back. It's been a while. This is my first time giving [00:11.080 --> 00:18.960] a talk here. I'm really pleased to be here. My name's Joe. I am a coder. I work in London [00:18.960 --> 00:26.720] for local government. I work a lot with geospatial data, and I am a Python programmer. Have we [00:26.720 --> 00:38.360] got any Python coders in today? Anyone using Jupyter? Cool. Right. So let's go. So in lockdown [00:38.360 --> 00:46.720] in 2021, we had a census in England and Wales, and the data is coming now. Most of the data, [00:46.720 --> 00:52.960] all of the data, sorry, is spatial data. So we want to look at this on a map. Why? Most [00:52.960 --> 00:58.960] of the data is geospatial. In local government, everything that we do generally happens somewhere, [00:58.960 --> 01:04.520] whether it's collecting a bin, looking after young people, looking after old people, cleaning [01:04.520 --> 01:11.480] the streets. We always have to think about where this is happening. Apparently, 60% of [01:11.480 --> 01:18.920] all data is geospatial data. So I spend a lot of my time making maps, in terms of data [01:18.920 --> 01:26.240] viz. Now, I'm going to be focusing on one part of the census data set today, and [01:26.240 --> 01:33.280] that's the East End of London, in an area called Tower Hamlets. This may be familiar to some [01:33.280 --> 01:43.320] people if you've ever seen places like Columbia Road, Bethnal Green, Canary Wharf. These are [01:43.320 --> 01:47.880] all parts of the East End of London, and this is the main area I'm going to be talking [01:47.880 --> 01:57.600] about. So where is Tower Hamlets in London? So what you can see here is a very small area. [01:57.600 --> 02:04.040] It's 20 square kilometers, but this is quite a special area because in the whole of England [02:04.040 --> 02:10.040] and Wales, it has the highest population density. 
It has the most people packed into a small [02:10.040 --> 02:16.880] area. It also has the fastest growing population, so it's becoming more and more dense. So [02:16.880 --> 02:23.360] in terms of providing services for residents, we need to have a big think about where all [02:23.360 --> 02:31.120] the people are and how they fit in. Now, when we make maps, the first thing we usually do [02:31.120 --> 02:40.240] is we make a choropleth map. However, the data set for population density in our area, and [02:40.240 --> 02:47.440] I do apologize, I couldn't fit it all on screen, doesn't appear very well as a choropleth. [02:47.440 --> 02:53.360] The reason is because the data set is not very evenly distributed. There are, as we will [02:53.360 --> 03:02.120] see, some areas with extremely high population density. So over here you've got Whitechapel. [03:02.120 --> 03:07.840] We have very high population density in Whitechapel. Over here we have a new development which used [03:07.840 --> 03:13.440] to be industrial land. Again, very, very high density developments, big, big towers full [03:13.440 --> 03:19.680] of people. And then we also have, just to the south of the financial sector, some areas [03:19.680 --> 03:25.280] of very high population density with a lot of people packed into a small place. But in [03:25.280 --> 03:30.720] terms of the data viz, this map doesn't really help very much. So the choropleth data viz [03:30.720 --> 03:36.080] didn't work for us. So we began to think, what else can we try? And we checked the data [03:36.080 --> 03:41.520] distribution, and sure enough, we've got some serious outliers. This is why the choropleth [03:41.520 --> 03:48.240] map didn't work very well for us. So what did we do next? We tried to log-transform [03:48.240 --> 03:55.440] the data. And yeah, you can see, you know, this area here. You can begin to see the density [03:55.440 --> 04:01.320] there. 
There's quite a few large developments with a lot of people squeezed in. Whitechapel, [04:01.320 --> 04:06.200] you don't see so much happening there. But you do see, just to the south of the financial [04:06.200 --> 04:13.280] sector, high density of population. The areas with low density, this is where all the banks [04:13.280 --> 04:19.560] are. So obviously, there's no people living in there. This is an old dock near to the [04:19.560 --> 04:23.880] Tower of London. There's no people living there. There's some very nice pubs, though. [04:23.880 --> 04:29.680] If you ever find yourself in that area, the Dickens Inn is excellent. I can recommend [04:29.680 --> 04:36.360] that to everybody. And then up here in the north, we have Victoria Park, which is where [04:36.360 --> 04:43.200] the East End borders with Hackney. And obviously, there's no people there, at least having their [04:43.200 --> 04:51.080] address registered there. Log-transformed data looks better on a choropleth map. However, [04:51.080 --> 04:57.760] you can see the legend. You lose the original data values. So you can try to fix the legend. But we want [04:57.760 --> 05:02.240] to write as little code as we possibly can. We don't want to keep fixing legends and things [05:02.240 --> 05:08.800] like that. So we began to think about other ways to visualize our data set. So what did [05:08.800 --> 05:17.440] we do? I am a Python coder, but there's a really nice package in R called cartogram. [05:17.440 --> 05:27.240] And this is a technique called a density equalization algorithm that basically turns your data set [05:27.240 --> 05:35.240] into a Voronoi first, and then it rescales the polygons from the Voronoi relative to [05:35.240 --> 05:43.480] an attribute of the data. This technique is quite popular. There's a wonderful geographer [05:43.480 --> 05:50.480] called Danny Dorling, who has an amazing website called Worldmapper, which I strongly recommend [05:50.480 --> 05:57.880] you have a look at. 
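The log transform and the legend problem mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the slides: the density values are made up, and the choice of base-10 logs and four legend ticks is an assumption. The idea is to colour the map by the log values but label the legend in the original units by inverting the transform.

```python
import math

# Made-up population densities (people per square kilometre), for illustration only.
densities = [1200.0, 8000.0, 15000.0, 110000.0]

# Log-transform so that a few extreme areas no longer dominate the colour scale.
log_values = [math.log10(d) for d in densities]

# Evenly spaced legend ticks in log space (four ticks is an arbitrary choice).
low, high = min(log_values), max(log_values)
n_ticks = 4
ticks = [low + i * (high - low) / (n_ticks - 1) for i in range(n_ticks)]

# Label each tick in the original units by inverting the transform.
labels = [f"{10 ** t:,.0f}" for t in ticks]
print(labels)
```

The map is drawn from `log_values`, while `labels` restores readable numbers on the legend, which is the extra fixing-up step the talk wanted to avoid.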
And they do things like showing poverty, inequality, food pressure [05:57.880 --> 06:04.400] all around the world. And they size the geographies relative to the attributes of the geospatial [06:04.400 --> 06:10.720] data. So this is a great technique. There is one issue here, though, which is that if you [06:10.720 --> 06:16.040] want to overlay different layers, then it becomes difficult. And also, the map does [06:16.040 --> 06:22.120] look a little bit unfamiliar as well. But it does show particularly where you have like [06:22.120 --> 06:28.480] clustering, where you have a number of census areas, and I'm going to say a little bit more [06:28.480 --> 06:35.320] about census areas, where you have a few together that have a high data attribute value, then [06:35.320 --> 06:41.040] they all get bigger together. So what we can see here is just to the south of the financial [06:41.040 --> 06:46.280] sector, you can see there's a lot of worker bees all crammed into this place, and then [06:46.280 --> 06:51.040] it increases the volume on the map. So it's a nice data viz, but still we have a small [06:51.040 --> 06:57.160] challenge if we want to add more data over the top. And also, it's a bit unfamiliar for [06:57.160 --> 07:07.160] people that don't use cartograms. So this is a map made using Datawrapper. It's a very [07:07.160 --> 07:14.320] nice website, and they have something called a symbol plot. And what this does is it just [07:14.320 --> 07:21.800] basically shows little mountains, little peaks, that show the value of the data attribute [07:21.800 --> 07:28.120] that you're interested in at the place where that data is happening. And so again, we can [07:28.120 --> 07:35.240] see over here, you've got Whitechapel, lots of people packed in there. Just to the south [07:35.240 --> 07:41.560] of the financial sector, lots of people packed in there. 
The new developments here by the [07:41.560 --> 07:47.400] river in Blackwall, and here by the river in the old industrial zone. So this is quite [07:47.400 --> 07:54.000] interesting. It gives us some context, and it gives us the data. I really like this data [07:54.000 --> 08:01.880] viz, but it's Datawrapper, so it's not FOSS, and it's not Python, and I like to use Python. [08:01.880 --> 08:10.200] So it was great, and it helped, but it didn't do everything that we needed it to do. The [08:10.200 --> 08:15.800] other thing that you will notice, and I'll try to explain this briefly, is that we have [08:15.800 --> 08:22.360] one really high value here. And there's a reason for this. It's an outlier. [08:22.360 --> 08:31.400] Actually, it's this value here. And the reason why it's an outlier [08:31.400 --> 08:37.400] is because the actual census area is really, really small. And the thing about the people [08:37.400 --> 08:44.520] who produce the census data is that they have to create census areas using roughly 100 to [08:44.520 --> 08:49.200] 600 people. Generally speaking, it's about 300 people, but they have to make it all fit [08:49.200 --> 08:54.600] together like a big jigsaw puzzle. So sometimes, you know, it's hard for them to make it work [08:54.600 --> 09:01.040] really well. So in this case, this census area with really high density is actually [09:01.040 --> 09:08.200] just one building. And so it's not a particularly big building, but everyone's squeezed in there. [09:08.200 --> 09:14.560] So yes, so the data is quite hard to work with, but it is interesting. So when I was [09:14.560 --> 09:19.960] working with Datawrapper, I really liked it, and it did remind me of when I was young and [09:19.960 --> 09:24.640] I was reading Lord of the Rings books. I used to really like the map at the front with all [09:24.640 --> 09:35.240] these mountains, showing the Misty Mountains in those books. 
And so I was thinking, I could [09:35.240 --> 09:42.480] probably make a mountain with Python. How hard can it be? It turns out it's really easy. [09:42.480 --> 09:51.000] This is the essence of the library. It's just one function. You take a point on a map, you [09:51.000 --> 09:57.800] turn that point into a line. The line has a start point, which is just a couple of points [09:57.800 --> 10:04.000] of longitude, a tiny little bit of longitude to the west of your point. Then you convert [10:04.000 --> 10:10.560] your data value to a latitude, which is kind of like a proxy for the height of the mountain, [10:10.560 --> 10:19.400] using some kind of algorithm that you choose. In my case, I'm just using a range. So [10:19.400 --> 10:25.840] I take the minimum and maximum value of the input range, which is a separate function [10:25.840 --> 10:33.320] here. And range one is essentially the minimum population density and the maximum. And then [10:33.320 --> 10:40.960] I convert that to latitude values. And then the third point on the line is just a little [10:40.960 --> 10:48.040] bit of longitude to the east of my point. And then you use that to create a small triangle, [10:48.040 --> 10:51.960] really easy, really easy, and a lot of fun as well. [10:51.960 --> 10:59.840] So this is what I made with Python. And it's very similar to the Datawrapper map, but [10:59.840 --> 11:05.040] I was going for like a kind of hand-drawn kind of a look to make it look like something [11:05.040 --> 11:10.560] from Lord of the Rings. And, you know, it's the same thing. You've got Whitechapel here. [11:10.560 --> 11:18.360] You've got the financial sector here, and so on and so on. So that was fun. But, you know, [11:18.360 --> 11:23.800] population density, we were just talking about the reasons why it's a messy data set. There's [11:23.800 --> 11:31.280] one place in Chelsea, which has a population density of two million people per square kilometer. 
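The one-function idea described a moment ago (a point goes in, a small triangle comes out) might look roughly like the sketch below. It is an illustration under stated assumptions, not the library's actual code: the names `rescale` and `peak`, the `half_width` and `max_height` defaults, and the sample coordinates and densities are all hypothetical.

```python
def rescale(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] onto [out_min, out_max]."""
    if in_max == in_min:
        return out_min  # avoid dividing by zero when all inputs are equal
    ratio = (value - in_min) / (in_max - in_min)
    return out_min + ratio * (out_max - out_min)


def peak(lon, lat, value, value_range, half_width=0.001, max_height=0.01):
    """Return three (lon, lat) vertices: west foot, summit, east foot.

    half_width and max_height are in degrees and are illustrative defaults.
    """
    # The data value becomes a latitude offset, a proxy for the mountain height.
    height = rescale(value, value_range[0], value_range[1], 0.0, max_height)
    west = (lon - half_width, lat)  # a tiny bit of longitude to the west
    summit = (lon, lat + height)    # raised in proportion to the value
    east = (lon + half_width, lat)  # a tiny bit of longitude to the east
    return [west, summit, east]


# Hypothetical densities (people per square kilometre) at one sample location.
densities = [5000.0, 12000.0, 30000.0]
value_range = (min(densities), max(densities))
peaks = [peak(-0.03, 51.51, d, value_range) for d in densities]
```

Each list of three vertices can be drawn as an open line for the hand-drawn look, or closed into a polygon. A linear range mapping keeps the heights easy to compare, but a single extreme value will still flatten every other peak.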
[11:31.280 --> 11:38.320] So this is a very difficult data set to represent using any tools available. So, you know, it's [11:38.320 --> 11:43.720] interesting. The other thing about Kensington and Chelsea is this is where Grenfell Tower [11:43.720 --> 11:51.600] is, if anybody knows about that story. This is where it happened. [11:51.600 --> 11:56.560] So let's try some other data sets to see if they're really messy. This is people that [11:56.560 --> 12:03.240] live in one-bedroom homes. So this is tiny little flats, you know, filled with people. [12:03.240 --> 12:07.200] And so you can see all the worker bees for the financial sector. A lot of those are living [12:07.200 --> 12:13.560] in one-bedroom flats. And actually, the new builds. This is a very new development here. [12:13.560 --> 12:17.440] And this is a very new development here. So it looks like people who are building homes [12:17.440 --> 12:23.600] now are building a lot of one-bedroom homes. Two-bedroom homes. Generally, everything is [12:23.600 --> 12:30.680] kind of the same. Nothing really jumps out here. Three-bedroom homes. What you can start [12:30.680 --> 12:36.520] to see with three-bedroom homes is that, yeah, it's generally even. But actually, in this [12:36.520 --> 12:44.160] area here, which is Bow, which is near the Bow Bells church, which is used to decide [12:44.160 --> 12:48.800] if someone's a traditional East End Cockney or not. That's kind of this area, really. [12:48.800 --> 12:54.880] So the Cockneys seem to have three-bedroom homes, generally. And then four or more. And [12:54.880 --> 13:00.400] what you see here is in the areas where the financial workers live, there's still quite [13:00.400 --> 13:05.320] a lot of four-bedroom homes. But in some of these new-build areas, there's very, very [13:05.320 --> 13:12.560] few relative to the rest of the area. So let's look at another slightly more famous area. [13:12.560 --> 13:20.080] This is Westminster in central London. 
And so you can see this is where Hyde Park is. [13:20.080 --> 13:26.480] There's no one living there. Again, this is the population density dataset. And then you've [13:26.480 --> 13:33.880] got an OpenStreetMap-based map just to help with orientation. And then in a future version [13:33.880 --> 13:40.640] of the module, I think I might do some more stuff with OpenStreetMap. And then if you [13:40.640 --> 13:47.480] look at some of the outer London areas, and this is where I live, you can see like areas [13:47.480 --> 13:53.520] of urban density, but you can also see some very suburban areas where the population density [13:53.520 --> 13:58.240] is lower. This is like where most people are living in houses, basically. And you can also [13:58.240 --> 14:04.080] see green space. So we're nearly finished. I just want to give a massive shout to nbdev. [14:04.080 --> 14:12.360] It's really good if you use Jupyter. Just check it out. Number one, if you're trying [14:12.360 --> 14:19.600] to do version control on Jupyter notebooks, it helps you with any clashes, any merge conflicts, [14:19.600 --> 14:24.800] because it removes the metadata in the JSON that sometimes causes conflicts. If you have [14:24.800 --> 14:30.000] a team of people working on the same notebook, this is a real lifesaver. And also it just [14:30.000 --> 14:36.960] bakes in good practice. So it means that your code gets shared on GitHub really easily. [14:36.960 --> 14:42.600] It helps you or encourages you at least to write good documentation for your team and [14:42.600 --> 14:49.800] the community. It also encourages you to write good tests. And it enables you to publish [14:49.800 --> 14:59.680] modules. So big shout to them. I'd also like to thank Jarek, who has produced a wonderful [14:59.680 --> 15:06.240] PWA for FOSDEM called Sojourner. Do check it out. 
It's a really good way of looking [15:06.240 --> 15:12.800] at the schedule for FOSDEM and you can watch the videos with Sojourner. And also Ed, [15:12.800 --> 15:19.160] who's going to be giving a really cool talk on OSM and Wikidata. And finally, I'd like [15:19.160 --> 15:24.360] to thank all the council coders everywhere. Thanks for having me.
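On the earlier point about notebook merge conflicts: a Jupyter notebook is a JSON file, and fields such as execution counts and per-cell metadata change every time a cell is run, even when the code itself has not changed. The sketch below shows the general idea of clearing those fields before committing. It is not nbdev's implementation; the function name `strip_volatile_fields` and the choice of which fields to clear are assumptions made for illustration.

```python
import copy


def strip_volatile_fields(notebook):
    """Return a copy of a notebook dict with run-dependent fields cleared."""
    clean = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in clean.get("cells", []):
        cell["metadata"] = {}  # editor state, timings and similar noise
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
    return clean


# A tiny hand-written notebook structure used only to exercise the function.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {"collapsed": False},
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": "42"},
                    "metadata": {},
                }
            ],
            "source": ["6 * 7"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
cleaned = strip_volatile_fields(example)
```

With those fields cleared, two people who ran the same cells in a different order produce identical JSON, so version control only sees real changes to the code and text.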