[00:00.000 --> 00:10.440] So, hello. I'm Edward, and I'm going to be talking about some tools that I've been building [00:10.440 --> 00:15.720] for adding links between OpenStreetMap and Wikidata. I've been working on these for a [00:15.720 --> 00:22.960] few years. This is all a hobbyist project. I'm not being paid to work on this, but I [00:22.960 --> 00:28.000] thought I'd come here and share with you some of the work that I've been doing. So, [00:28.000 --> 00:33.400] I'm going to use as an example to talk about the software that I'm building, this building [00:33.400 --> 00:40.560] which is in Brussels, the Royal Palace of Brussels. It's in the city centre. So, you [00:40.560 --> 00:47.320] can see here, this is it in two different systems. You've got OpenStreetMap and you've [00:47.320 --> 00:56.720] got Wikidata, both showing the same building. So, I'll describe OpenStreetMap just for anyone [00:56.720 --> 01:03.800] who's not familiar with it. It's a collaborative map. It's been going since 2004, covers the [01:03.800 --> 01:10.240] whole world, and anyone can come in and edit the map. It's got revision history. You know, [01:10.240 --> 01:17.520] it works a lot like Wikipedia, but for maps. So, within OpenStreetMap, you've got three [01:17.520 --> 01:27.400] types of objects, nodes, ways and relations, in increasing complexity. And each of those objects [01:27.400 --> 01:42.640] can have tags. Tags are pairs of keys and values. I've got some examples here for my [01:42.640 --> 01:48.000] example in Brussels. And the tags are not controlled by the software. You can put anything [01:48.000 --> 01:54.240] you want in, but it won't get rendered on the map unless it's one of the standard tags [01:54.240 --> 01:59.760] that gets used on OpenStreetMap. So, there's a community process for discussing, you know, [01:59.760 --> 02:05.280] how things should be tagged in OpenStreetMap, and then it gets documented on the OpenStreetMap [02:05.280 --> 02:13.000] wiki.
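As a rough illustration of the object model just described (three object types, each carrying free-form key/value tags), here is a minimal Python sketch. The class name, the field names and the sample tags are assumptions made for illustration; they are not taken from the talk's slides or from the speaker's code.

```python
from dataclasses import dataclass, field

# The three object types named in the talk.
OSM_TYPES = ("node", "way", "relation")


@dataclass
class OsmElement:
    """Hypothetical container for one OpenStreetMap object."""

    osm_type: str
    osm_id: int
    # Tags are plain string pairs; the software does not restrict the keys.
    tags: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.osm_type not in OSM_TYPES:
            raise ValueError(f"unknown OSM type: {self.osm_type!r}")


# Made-up example object; the ID and tag values are placeholders.
cafe = OsmElement("node", 1, {"amenity": "cafe", "name": "Example Cafe"})
```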
So, everything in OpenStreetMap can be uniquely identified by the type and the [02:13.000 --> 02:19.240] ID. Like the ID on its own isn't enough. There's nodes and ways that have got the same [02:19.240 --> 02:25.400] ID. You have to have the type as well. So, in this example, the Royal Palace, you can [02:25.400 --> 02:31.120] see it's a relation. It's a complex polygon. You can see there's holes in the middle of [02:31.120 --> 02:35.640] the building, so you can't represent it as a way. And you can see there it's got an [02:35.640 --> 02:44.440] ID as well. So, what about the other system I'm talking about? Wikidata. So, Wikidata [02:44.440 --> 02:55.040] is part of the Wikimedia Foundation, like the same people that run Wikipedia. And it's [02:55.040 --> 03:02.240] a wiki for structured data. It's newer than OpenStreetMap. It launched in 2012, and it's [03:02.240 --> 03:08.080] big. Like, it's got 102 million items now. And for comparison, English Wikipedia has [03:08.080 --> 03:13.560] 6.6 million articles. Like, English Wikipedia is the biggest Wikipedia. And most of those [03:13.560 --> 03:19.920] articles have a Wikidata item as well. But then there's a lot more data, a lot more items [03:19.920 --> 03:28.440] in Wikidata than there are articles in English Wikipedia. So, if I take my example of the [03:28.440 --> 03:32.600] Royal Palace of Brussels, and you look it up on English Wikipedia, you can see there's [03:32.600 --> 03:38.880] a link in the sidebar that will take you to the Wikidata item. And you click that link, [03:38.880 --> 03:46.000] you end up on the, this is the Wikidata item for the Royal Palace. I'll talk you through [03:46.000 --> 03:52.120] some of the pieces on this page. So, you've got, down the side, the site links. These [03:52.120 --> 03:59.440] are links to Wikipedia articles in different languages.
Like, part of the reason for the [03:59.440 --> 04:07.240] existence of Wikidata is to store these, they call them inter-language links. They used [04:07.240 --> 04:13.320] to be stored in Wikipedia and had to be maintained across all the different languages. So, if [04:13.320 --> 04:20.320] there was a new article written in a new language, every existing article in one of the existing [04:20.320 --> 04:24.800] languages had to be updated with these links. So, much better to centralize them and store [04:24.800 --> 04:32.840] them in Wikidata instead. Other bits and pieces you get on this page, you get a label, [04:32.840 --> 04:40.200] description and aliases. So, by default, when I look at Wikidata, I just see them in English [04:40.200 --> 04:46.880] because that's the language I speak. But I can click the link to show me in more languages [04:46.880 --> 04:54.000] and you can see that there's names of the thing available in lots of languages and descriptions [04:54.000 --> 05:03.000] and so on. The other main part of this page you see is the list of statements. So, statements [05:03.000 --> 05:07.120] are a bit like tags in OpenStreetMap, but they're more controlled by the software. You [05:07.120 --> 05:14.760] can't just make up a property. You have to use ones that are already in the system. And [05:14.760 --> 05:20.200] again, there's a community process in Wikidata for determining new properties. And the [05:20.200 --> 05:25.200] other big difference is that there's different data types. Like in OpenStreetMap, everything [05:25.200 --> 05:30.360] is a string, but Wikidata has different data types of values. Here you can see there's [05:30.360 --> 05:38.360] an image and there's also a link to another item used as values in the statements. [05:38.360 --> 05:44.440] So the interesting thing in terms of maps is Wikidata has got coordinate locations. There's [05:44.440 --> 05:50.400] almost 10 million items with coordinates.
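To make the idea of typed statement values concrete, here is a small Python sketch that reads two kinds of values out of an entity in the JSON layout returned by the Wikidata API: an item link ("instance of", P31) and a coordinate location (P625). The helper names and the sample entity are assumptions for illustration; the QID and coordinates in the sample are placeholders, not the real values for the palace.

```python
def instance_of_ids(entity):
    """Return the QIDs used as values of 'instance of' (P31) statements."""
    qids = []
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        # Statements can also be 'novalue' or 'somevalue'; skip those.
        if snak.get("snaktype") == "value":
            qids.append(snak["datavalue"]["value"]["id"])
    return qids


def coordinates(entity):
    """Return (latitude, longitude) from the first P625 statement, or None."""
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            return (value["latitude"], value["longitude"])
    return None


# Hand-written sample in the shape of the API's entity JSON (placeholder values).
SAMPLE_ENTITY = {
    "claims": {
        "P31": [{"mainsnak": {"snaktype": "value",
                              "datavalue": {"value": {"id": "Q1"}}}}],
        "P625": [{"mainsnak": {"snaktype": "value",
                               "datavalue": {"value": {"latitude": 50.84,
                                                       "longitude": 4.36}}}}],
    }
}
```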
So those are the kinds of things that we're [05:50.400 --> 05:58.280] interested in and will probably be on OpenStreetMap as well. And there is a property for storing [05:58.280 --> 06:05.200] geo-shapes in Wikidata, but it's quite new and it's not used so much. There's only 29,000 [06:05.200 --> 06:15.160] odd items with a geo-shape. So, you know, it's mostly about the coordinates. So the [06:15.160 --> 06:19.960] thing that I'm interested in is adding links between the systems. So if we have another [06:19.960 --> 06:28.680] look at OpenStreetMap, I've got highlighted here one of the tags for the palace and it's [06:28.680 --> 06:36.440] the wikidata tag and it's got a Wikidata QID. This is the unique identifier for the [06:36.440 --> 06:45.240] Wikidata item. So now the two systems are linked. Like if you visit this object on OpenStreetMap, [06:45.240 --> 06:53.040] then the user interface has a hyperlink that will take you to the same thing on Wikidata. [06:53.040 --> 07:00.040] So why do I want to add links between Wikidata and OpenStreetMap? Well, it makes the [07:00.040 --> 07:08.680] data in OpenStreetMap a lot more useful. Like Wikidata tends to have labels in more languages. [07:08.680 --> 07:12.520] Like if you want the name of a thing in a different language, you can get it from [07:12.520 --> 07:22.040] Wikidata. You can link to the Wikipedia articles. You get images from Commons and identifiers [07:22.040 --> 07:29.640] from other catalogs, data catalogs. So Wikimedia Commons is the Wikimedia location [07:29.640 --> 07:36.960] for storing photos of things. So we get loads of photos of our building and we also get [07:36.960 --> 07:43.560] lots of identifiers in Wikidata. So you can think of Wikidata as a bit like the Rosetta [07:43.560 --> 07:52.040] Stone of linking different data catalogs. It makes sense to store all this information [07:52.040 --> 07:59.960] in one place.
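The link described here is just a tag whose key is `wikidata` and whose value is a QID. A minimal Python sketch of turning that tag into a hyperlink follows; the function name is an assumption, and `Q42` in the usage note is only a well-formed example QID, not the palace's identifier.

```python
import re

# A QID is the letter Q followed by a positive integer.
QID_RE = re.compile(r"Q[1-9][0-9]*")


def wikidata_url(tags):
    """Return the Wikidata URL for an OSM object's tags, or None if not linked."""
    qid = tags.get("wikidata", "")
    if QID_RE.fullmatch(qid):
        return f"https://www.wikidata.org/wiki/{qid}"
    return None
```

For example, `wikidata_url({"wikidata": "Q42"})` gives `https://www.wikidata.org/wiki/Q42`, while an object without the tag gives `None`.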
So why not use Wikidata as that place for storing this kind of info? [07:59.960 --> 08:06.040] So this is a good thing. We want to add links. The other thing that you get is Wikidata [08:06.040 --> 08:12.520] gets access to the shapes of things, the polygon outline of the building, which otherwise [08:12.520 --> 08:20.080] it wouldn't have without a link. So adding these links by hand is kind of laborious and [08:20.080 --> 08:27.280] time consuming. So better to write some software to do it instead. So the software I've written, [08:27.280 --> 08:38.520] I'm calling it OWL Places and the web address is osm.wikidata.link. So this is what the [08:38.520 --> 08:44.600] software looks like when you visit it. It asks you for a place name where you want to [08:44.600 --> 08:50.200] search for some matches. So you can put in the name of your town, somewhere you're familiar [08:50.200 --> 08:59.360] with and can check that the matches are valid. So I've done a search and I've found the place [08:59.360 --> 09:06.200] where the Royal Palace is located. And this is the page you see. You've got a map with [09:06.200 --> 09:13.440] some blue pins. And these blue pins represent Wikidata items where the software has found [09:13.440 --> 09:20.640] something that matches in OpenStreetMap. So if I scroll down this page, you can see some [09:20.640 --> 09:30.880] example matches. So I show you various bits of data that come from Wikidata and Wikipedia [09:30.880 --> 09:36.680] to help you try and identify if these matches are valid. Like sometimes the software doesn't [09:36.680 --> 09:42.600] get it right and will give you an invalid match. So it's important to look through this list [09:42.600 --> 09:50.240] and check that all the matches are correct. And to help you with that, I show you the [09:50.240 --> 09:58.120] first paragraph from the Wikipedia article and I show you any images that come from [09:58.120 --> 10:06.560] Wikidata.
You've got the wiki data description there. The paragraphs I show you, I'll talk [10:06.560 --> 10:14.960] later about how it decides which languages to use for showing those. But it supports [10:14.960 --> 10:21.560] various languages. And then it shows you some of the details from OpenStreetMap just so [10:21.560 --> 10:28.400] you can compare and make sure that they match. So if I click on one of those, then it will [10:28.400 --> 10:36.320] zoom in on the map and it shows you the polygon outline of the thing. You can see the red [10:36.320 --> 10:41.280] pin there is the selected thing. So that looks like a pretty good match. It's probably the [10:41.280 --> 10:47.600] same thing. So we can go ahead and save that. So we're interested in saving these matches [10:47.600 --> 10:55.800] to OpenStreetMap. So the software has a button that lets us log in via OpenStreetMap by OAuth. [10:55.800 --> 11:01.600] Just put in username and password and log in. And then you come back to the confirmation [11:01.600 --> 11:08.000] page where you just see the same list again but kind of abbreviated. These are things [11:08.000 --> 11:13.520] that I've checked and I've said yes, these are valid matches and I want to save them. [11:13.520 --> 11:18.520] You can put a change comment. So everything gets saved together as one change set like [11:18.520 --> 11:26.320] it goes in as a single edit. And the change comment on the change set is generated automatically [11:26.320 --> 11:35.000] based on the location but you can change it if you want to. So I'll carry on with describing [11:35.000 --> 11:40.720] the software and I'll show you some more features that I've built. So I've added a type filter [11:40.720 --> 11:47.240] like at the top here you can see it's a type filter and there's a list of different types [11:47.240 --> 11:57.280] of things that it's found that are possible matches. So it's got statues and buildings. 
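The automatically generated change comment could be produced along these lines. This is only a sketch: the function name and the exact wording are assumptions, and the real tool's comment text may differ.

```python
from typing import List, Optional


def change_comment(place: str, type_labels: Optional[List[str]] = None) -> str:
    """Build a default changeset comment; the user can still edit it."""
    if type_labels:
        # Mention the selected types when a type filter is active.
        return f"Add wikidata tags to {', '.join(type_labels)} in {place}"
    return f"Add wikidata tags in {place}"
```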
[11:57.280 --> 12:03.760] I can tick sculpture to say I just want sculptures. And then when I scroll down it [12:03.760 --> 12:10.320] will just show me things that it thinks are sculptures. So I can focus on one particular [12:10.320 --> 12:17.280] type of thing. Sometimes when you put in the name of a town you might get 200 matches and [12:17.280 --> 12:24.280] it's a bit overwhelming to do them all in one go. So it's useful to do them bit by bit, [12:24.280 --> 12:31.280] just specific types that you're interested in. And then when I go to the save page it [12:31.280 --> 12:36.480] generates a change comment that's based on the type filter that you've selected. So here [12:36.480 --> 12:46.200] it just says add wikidata tags to sculptures in this area. So I'm just going to talk about [12:46.200 --> 12:54.840] how it determines what is a match. So if we have a look at one of these examples, this [12:54.840 --> 13:03.040] is a sculpture. So if we have a look at the same thing on Wikidata you can see that there's [13:03.040 --> 13:08.280] a statement in Wikidata which is the instance of statement. So this is saying that this [13:08.280 --> 13:16.560] thing is a sculpture. So we can click through and have a look at the sculpture page. This [13:16.560 --> 13:24.680] is the Wikidata item for the concept of a sculpture. And then if we scroll down this [13:24.680 --> 13:34.720] page we get a Wikidata property which is for OpenStreetMap tag or key. So there's actually [13:34.720 --> 13:41.560] two values here and the second value is uninteresting, like it's shown in red because it's a deprecated [13:41.560 --> 13:46.720] value, like it's a kind of old value that used to be used in OpenStreetMap and it's being [13:46.720 --> 13:54.360] documented in Wikidata. So the interesting one is the top one which is Tag:artwork_type=sculpture. [13:54.360 --> 14:04.280]
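A value of the "OpenStreetMap tag or key" property, such as the one read out above, can be split into a key and an optional value and then compared against an object's tags. The following Python sketch shows one way to do that; the function names are assumptions and this is not the speaker's code.

```python
def parse_osm_tag_or_key(value):
    """Split 'Tag:key=value' or 'Key:key' into (key, value-or-None)."""
    kind, _, rest = value.partition(":")
    if kind == "Tag" and "=" in rest:
        key, _, tag_value = rest.partition("=")
        return key, tag_value
    if kind == "Key" and rest:
        return rest, None
    raise ValueError(f"unrecognised tag-or-key value: {value!r}")


def element_matches(tags, tag_or_key):
    """True if an OSM object's tags satisfy the given tag-or-key value."""
    key, tag_value = parse_osm_tag_or_key(tag_or_key)
    if tag_value is None:
        return key in tags          # 'Key:...' only needs the key present
    return tags.get(key) == tag_value
```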
So the information is stored in wiki data about what tags are [14:04.280 --> 14:10.800] used in OpenStreetMap to describe things. So using this information we can say that these [14:10.800 --> 14:18.640] two things are the same type of entity. So when it comes to matching things I'm looking [14:18.640 --> 14:24.880] for the coordinates to match like the two things and the two systems have to be close [14:24.880 --> 14:33.560] to each other. They're not necessarily a perfect match but within like 50 meters or something. [14:33.560 --> 14:38.360] And the entity type has to be the same like I just described. And then I'm also looking [14:38.360 --> 14:47.400] for a matching name or street address or identifier. So I pull names from all over the place in [14:47.400 --> 14:54.040] both systems like in wiki data there's a bunch of different fields or rather in OpenStreetMap [14:54.040 --> 14:59.960] there's a bunch of different fields where names can be stored and I look at all of those [14:59.960 --> 15:06.320] and then in wiki data there's different places to get the names. I look at the labels, wiki [15:06.320 --> 15:14.360] data, the aliases, the names of any wikipedia articles. I look at the file name of any images [15:14.360 --> 15:21.000] that are in wiki data just to get as much, you know, many possible names that I can use [15:21.000 --> 15:25.600] for matching. And then I normalize the names a lot so I lowercase them and I remove stop [15:25.600 --> 15:34.480] words and process them a lot to try and get as many name matches as I can. And then similarly [15:34.480 --> 15:40.840] with street addresses. So there's street addresses in OpenStreetMap and wiki data which I compare [15:40.840 --> 15:49.400] and the software also looks for street addresses in the first paragraph of wikipedia articles. 
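Two of the checks described above, proximity and name comparison, can be sketched in Python as follows. The 50-metre default comes from the talk's "within like 50 meters or something"; the stop-word list, the function names and the use of the haversine formula are assumptions for illustration, and the entity-type check is left out.

```python
import math
import re

# Illustrative stop words only; the real list is not given in the talk.
STOP_WORDS = {"the", "of", "de", "la", "le"}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def normalize_name(name):
    """Lowercase, drop punctuation and remove stop words."""
    words = re.findall(r"\w+", name.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def is_candidate(osm_names, wd_names, osm_coords, wd_coords, max_m=50.0):
    """True if the two things are close together and share a normalized name."""
    if haversine_m(*osm_coords, *wd_coords) > max_m:
        return False
    osm_norm = {normalize_name(n) for n in osm_names}
    wd_norm = {normalize_name(n) for n in wd_names}
    # Ignore names that normalize to nothing.
    return bool((osm_norm & wd_norm) - {""})
```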
[15:49.400 --> 15:56.520] And then in terms of matching identifiers there's lots of standardized OpenStreetMap [15:56.520 --> 16:04.280] tags for different identifiers and then there's also properties in wiki data for those same [16:04.280 --> 16:09.200] identifiers. So if, you know, I've got a railway station that's got the same station code in [16:09.200 --> 16:14.440] OpenStreetMap and wiki data I can be pretty sure that it's the same thing that I'm matching [16:14.440 --> 16:20.720] so I can be confident about that match. So one of the things I'm not using at the moment [16:20.720 --> 16:28.440] is the wikipedia tags which appear in OpenStreetMap. Like before wiki data came along there was [16:28.440 --> 16:36.640] lots of wikipedia tags added to OpenStreetMap and they're not completely consistent in their [16:36.640 --> 16:44.320] formatting for how they link to wikipedia and sometimes they're wrong. So, you know, I've [16:44.320 --> 16:50.640] left the work for now working on trying to match up using wikipedia tags for somebody [16:50.640 --> 16:57.160] else to have a look at. But I've been waiting for a few years now and no one has so I might [16:57.160 --> 17:05.600] have to have a go at this. So just in case anyone's interested in the technology behind [17:05.600 --> 17:15.240] this, the software is written in Python with Flask. I'm using Postgres as my database and [17:15.240 --> 17:21.200] then on the front end, you know, various bits of JavaScript. I'm not really a front end [17:21.200 --> 17:27.280] developer but, you know, I'm muddling my way through and it seems to be working quite [17:27.280 --> 17:35.960] well. I'm using a bunch of APIs to get this data. 
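The identifier comparison mentioned earlier in this part (for example, a railway station code present in both systems) could be sketched like this in Python. The key and property names in `IDENTIFIER_PAIRS` are deliberately fake placeholders; the real pairings the tool uses are not given in the talk.

```python
# Placeholder pairing of an OSM tag key with a Wikidata property ID.
# Both names are hypothetical and exist only for this illustration.
IDENTIFIER_PAIRS = [("ref:station_code", "P_STATION_CODE")]


def shared_identifier(osm_tags, wd_identifiers, pairs=IDENTIFIER_PAIRS):
    """True if any paired identifier has the same value in both systems.

    osm_tags: dict of OSM tag key -> value.
    wd_identifiers: dict of Wikidata property ID -> list of values.
    """
    for osm_key, prop in pairs:
        osm_value = osm_tags.get(osm_key)
        if osm_value and osm_value in wd_identifiers.get(prop, []):
            return True
    return False
```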
So in terms of searching for places [17:35.960 --> 17:42.320] to look for matches, I use the OpenStreetMap Nominatim API and then to grab more data I [17:42.320 --> 17:48.840] use the Overpass API and then on the Wikidata side, I do a lot of SPARQL queries [17:48.840 --> 17:56.720] against the Wikidata query service and I use the Wikidata MediaWiki API to get the [17:56.720 --> 18:03.400] details of the Wikidata items. So there's a bunch of things that don't work in my system [18:03.400 --> 18:10.880] at the moment. One of them is tunnels. Like I designed the software with the assumption [18:10.880 --> 18:16.120] that there would be a kind of one-to-one mapping between a thing in OpenStreetMap and [18:16.120 --> 18:22.640] a thing in Wikidata and that doesn't work for tunnels because tunnels tend to get represented [18:22.640 --> 18:29.600] as two ways in OpenStreetMap whereas in Wikidata there'll be a single item. And so, you [18:29.600 --> 18:35.160] know, my assumption was wrong and I need to change my software to say that you can add [18:35.160 --> 18:49.240] the Wikidata identifier to two ways in OpenStreetMap but I haven't done that yet. Incidentally, [18:49.240 --> 18:53.600] we don't have the same problem with bridges. Like the way that bridges get represented [18:53.600 --> 18:58.280] in OpenStreetMap is they are often two ways but then there's a relation across the whole [18:58.280 --> 19:06.920] bridge that represents the bridge itself. And for tunnels, there isn't a relation for representing [19:06.920 --> 19:12.160] the whole concept of the tunnel. So that's another possible approach. Maybe OpenStreetMap [19:12.160 --> 19:18.720] should change and start mapping the tunnels with a relation that contains the two ways, [19:18.720 --> 19:23.800] you know, for storing wikidata tags and any other information about the tunnel that [19:23.800 --> 19:33.720] is the same across both ways.
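For the APIs named at the start of this part, here is a hedged Python sketch that only builds the requests rather than sending them: a Nominatim search URL, a SPARQL query for Wikidata items with coordinates inside a bounding box, and an Overpass query for named objects without a wikidata tag. The function names and the choice of filters are assumptions; the speaker's actual queries are not shown in the talk.

```python
from urllib.parse import urlencode


def nominatim_search_url(place):
    """URL for a Nominatim place-name search returning JSON."""
    params = {"q": place, "format": "jsonv2"}
    return "https://nominatim.openstreetmap.org/search?" + urlencode(params)


def wikidata_bbox_sparql(south, west, north, east):
    """SPARQL for items whose coordinate location (P625) is inside a box."""
    return f"""
SELECT ?item ?location WHERE {{
  SERVICE wikibase:box {{
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:cornerSouthWest "Point({west} {south})"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point({east} {north})"^^geo:wktLiteral .
  }}
}}
""".strip()


def overpass_untagged_query(south, west, north, east):
    """Overpass QL for named objects in a box that lack a wikidata tag."""
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json];nwr["name"][!"wikidata"]({bbox});out tags center;'
```

Note the different coordinate orders: the WKT points in SPARQL are longitude then latitude, while the Overpass bounding box is south, west, north, east.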
So, another thing that I don't support are rivers because [19:33.720 --> 19:39.720] they are linear relations, and the software that I'm using to import data from OpenStreetMap, [19:39.720 --> 19:47.000] I'm using osm2pgsql, and it can't handle linear relations. It just, you know, expects [19:47.000 --> 19:53.280] relations to be polygons. So at the moment rivers don't work in the system. And then [19:53.280 --> 19:59.040] similarly for tram stops. Tram stops are kind of complex objects in OpenStreetMap. You've [19:59.040 --> 20:04.000] got, you know, stop positions of where the tram stops on either side of the road which [20:04.000 --> 20:11.920] are nodes, single points, and they're collected together into a relation and that isn't supported [20:11.920 --> 20:22.120] properly by osm2pgsql. So I can't handle tram stops properly. I'm going to talk about [20:22.120 --> 20:30.560] a few more features that are in the software. So again, this is the center of Brussels and [20:30.560 --> 20:36.160] I've got the language selector. So the software has figured out all the languages that get [20:36.160 --> 20:47.360] used for the labels of things and the OpenStreetMap objects that are in this area. You know, unsurprisingly [20:47.360 --> 20:54.520] for Brussels the most popular languages are French and then Dutch and English is the third [20:54.520 --> 21:01.120] most popular. Interestingly we've got Latin at the bottom there. There's 22 items that [21:01.120 --> 21:09.440] have got labels in Latin in Wikidata.
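The language selector can be thought of as two small steps: count which languages the labels in an area use, then show each label in the first preferred language that has one. The Python sketch below illustrates that idea; the function names and sample labels are assumptions, not the tool's code.

```python
from collections import Counter


def language_order(items):
    """Order language codes by how many items have a label in them.

    items: list of dicts mapping language code -> label.
    """
    counts = Counter(lang for labels in items for lang in labels)
    return [lang for lang, _ in counts.most_common()]


def pick_label(labels, preferred):
    """Return the label in the first preferred language that is available."""
    for lang in preferred:
        if lang in labels:
            return labels[lang]
    return None


# Made-up labels for three items in an area.
SAMPLE_ITEMS = [
    {"fr": "Palais", "nl": "Paleis", "en": "Palace"},
    {"fr": "Parc", "nl": "Park"},
    {"fr": "Place"},
]
```

With these samples, `language_order(SAMPLE_ITEMS)` puts French first, then Dutch, then English, and changing the order passed to `pick_label` changes which label is shown without fetching anything again.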
But so by default this page is opened in French [21:09.440 --> 21:15.840] and you can see the type filter is appearing in French but I can't read French very well [21:15.840 --> 21:23.680] so if I want to change it to Dutch I can reorder these languages by drag and drop or I can [21:23.680 --> 21:30.040] click on move to top and you can see the type filter is now switched into being in Dutch [21:30.040 --> 21:36.440] or if I want it in English then I can move English to the top of the list and it will [21:36.440 --> 21:42.280] show me the type filter in English, English labels and descriptions. And if I scroll down [21:42.280 --> 21:48.480] the page you can see that this is the page appearing in French. You've got titles in [21:48.480 --> 21:55.640] French and the extracts from wikipedia in French or again I can change it into Dutch [21:55.640 --> 22:01.720] if I want or I can have it in English. And this works without reloading the page. You [22:01.720 --> 22:08.680] just change the order that you prefer the languages to appear in and it does it all on the client [22:08.680 --> 22:18.160] and switches it over. So some statistics for you. People are using this tool. Well first [22:18.160 --> 22:24.840] of all there's more and more wiki data tags appearing in OpenStreetMap so not all of them [22:24.840 --> 22:30.800] are coming from my software. You know there's other people figuring out how to add wiki data [22:30.800 --> 22:41.120] tags to OpenStreetMap. So here's some more stats. 26% of the wiki data tags in OpenStreetMap [22:41.120 --> 22:48.600] were added using this tool and we're up to 400 people and there's been 23,000 change [22:48.600 --> 22:57.480] sets and we're getting close to 700,000 wiki data tags added. So I'm going to talk about [22:57.480 --> 23:06.920] the licensing. Wiki data is CC0 or public domain. 
You can do anything you want with [23:06.920 --> 23:12.720] wiki data and OpenStreetMap uses the open database license which is a license that was [23:12.720 --> 23:21.240] pretty much written for OpenStreetMap. So you can't copy any data from OpenStreetMap [23:21.240 --> 23:29.120] into wiki data because you'd be re-licensing it CC0 which is not allowed. But even more [23:29.120 --> 23:36.120] than just the licenses being different the intellectual property jurisdictions are different. [23:36.120 --> 23:42.480] So OpenStreetMap asserts database rights. Like the argument is that it's a lot of effort [23:42.480 --> 23:47.720] to go around collecting all this information and putting it in OpenStreetMap and they want [23:47.720 --> 23:55.480] to protect that whereas wiki data is part of the wiki media foundation which uses US [23:55.480 --> 24:02.440] intellectual property rules and so under US law facts are not copyrighted, not protected [24:02.440 --> 24:11.440] rather in law. So the two things don't mesh that well but it's fine because I'm not copying [24:11.440 --> 24:17.240] any data between the systems. I'm just adding links between them. Like in some cases it [24:17.240 --> 24:24.880] might be nice if we could tidy up the data in one system based on the other but I'm [24:24.880 --> 24:30.400] not doing that and there's you've got to think carefully about the intellectual property [24:30.400 --> 24:38.640] rules before you try and do that. And so also just while we're talking about licenses my [24:38.640 --> 24:43.680] software is GPL and code is on GitHub it's all open source. Anyone can have a look at [24:43.680 --> 24:51.680] the software behind it. 
So an important aspect for being able to add these links between [24:51.680 --> 24:59.760] the systems is to have stable identifiers and for a long time OpenStreetMap has talked [24:59.760 --> 25:08.320] about the identifiers not being stable and sometimes say a railway station might get [25:08.320 --> 25:15.240] mapped as a single point and then later on somebody comes along and traces the outline [25:15.240 --> 25:23.240] of the building and so it changes from being a node into a way or a relation and the identifier [25:23.240 --> 25:34.920] will have changed. So they aren't stable identifiers for concepts in OpenStreetMap. So the thinking [25:34.920 --> 25:42.680] is that makes it difficult to link into OpenStreetMap because the identifiers might change and [25:42.680 --> 25:49.720] there's been discussions within the OpenStreetMap community of having a permanent ID and the [25:49.720 --> 25:55.840] discussions have been going on since 2017 and they haven't come to a conclusion. There's [25:55.840 --> 26:00.920] been an argument that maybe the right thing to use in terms of stable identifiers would [26:00.920 --> 26:07.480] be wiki data IDs, just say anything that's important enough to need a stable identifier [26:07.480 --> 26:14.080] is probably on wiki data and so you could use the wiki data ID as a permanent ID. But [26:14.080 --> 26:19.680] another way to look at it is in reality most of the world is mapped now on OpenStreetMap [26:19.680 --> 26:28.480] and the IDs aren't changing that much. Things tend to be mapped as polygons like outlines [26:28.480 --> 26:32.720] of buildings and people aren't coming along and making changes that are destructive in [26:32.720 --> 26:38.240] destroying the IDs. So maybe the IDs that are in OpenStreetMap already, the IDs that [26:38.240 --> 26:44.200] I talked about earlier, maybe they're stable enough and maybe it's okay to just link to [26:44.200 --> 26:51.640] those and not worry about them changing. 
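To illustrate the stability problem and the suggestion of using Wikidata IDs as the permanent handle, here is a small Python sketch. It indexes OSM objects by their wikidata tag, so a lookup still works after a node has been replaced by a way. The function name and the sample data (IDs and QID) are placeholders made up for this example.

```python
def index_by_qid(elements):
    """Map each QID to the (type, id) pairs of OSM objects tagged with it.

    elements: iterable of (osm_type, osm_id, tags) tuples.
    """
    index = {}
    for osm_type, osm_id, tags in elements:
        qid = tags.get("wikidata")
        if qid:
            index.setdefault(qid, []).append((osm_type, osm_id))
    return index


# A station first mapped as a node, later redrawn as a building outline.
BEFORE = [("node", 100, {"name": "Station", "wikidata": "Q1"})]
AFTER = [("way", 200, {"name": "Station", "wikidata": "Q1"})]
```

The (type, id) pair changes between `BEFORE` and `AFTER`, but looking up `"Q1"` in either index still finds the station, provided the tag was carried over when the object was redrawn.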
Whereas we've got wiki data on the other hand and [26:51.640 --> 26:57.640] wiki data was designed always to have stable identifiers. That was a big part I think of [26:57.640 --> 27:07.440] the initial approach to wiki data. Wikipedia identifies things by article title and over [27:07.440 --> 27:14.000] time the article titles can change and then things get moved around and so they don't [27:14.000 --> 27:22.000] have long-term stable IDs and so the wiki data QIDs was an approach that gave you stable [27:22.000 --> 27:27.720] IDs. But it turns out that they're not completely stable. There's also redirects appearing in [27:27.720 --> 27:32.840] wiki data. Like with some of the work I've been doing, I find a lot of duplicates in [27:32.840 --> 27:38.280] wiki data. Things have been imported from different sources and say for example I found [27:38.280 --> 27:46.040] a lot of duplicate churches in wiki data. So when I go and I merge the churches, then [27:46.040 --> 27:55.400] the ID that represents one of those churches will change. So I've got on the slide here [27:55.400 --> 28:03.920] there's 10,000 OpenStreetMap objects that point to a redirect in wiki data and somebody [28:03.920 --> 28:11.800] needs to go through and resolve those redirects and fix OpenStreetMap. I will probably do [28:11.800 --> 28:20.480] that at some point if no one else does. So a recent change to wiki data is that there's [28:20.480 --> 28:28.920] a new property called OpenStreetMapElement and that is for storing OpenStreetMap IDs. [28:28.920 --> 28:37.360] So now it is possible to add the links in both directions. We can have links from wiki [28:37.360 --> 28:44.280] data to OpenStreetMap which we never used to be able to have. So I need to change my [28:44.280 --> 28:51.200] software to start adding these links in. When you save things at the moment it just uploads [28:51.200 --> 28:57.520] into OpenStreetMap it should be uploading them to wiki data as well. 
But to do that [28:57.520 --> 29:04.920] I need to make the user login to both systems which is possible but it will break the flow [29:04.920 --> 29:32.920] of it. So I am going to try and do a demo. Let's see. So this is the software I'm describing [29:32.920 --> 29:40.360] and I can say I want it in English. And you can see the type filter there and if I scroll [29:40.360 --> 29:46.000] down it shows matches that weren't very good at the start. So it's got some difficulty [29:46.000 --> 29:51.720] with this match and it can't handle it so we scroll past those. And here's the first [29:51.720 --> 30:04.400] match that the system can handle and if I click on it then it shows you the match. I [30:04.400 --> 30:16.360] can click toggle OSM tags. This is showing all of the tags from OpenStreetMap. The green [30:16.360 --> 30:25.440] ones are ones where it's found a match that's using those to figure out what the match is. [30:25.440 --> 30:31.320] I'll show you some more. Here's another one. You can see it appearing on the map. If I [30:31.320 --> 30:41.640] think this is not a correct match I can click here and it's deselected it. So I've got a [30:41.640 --> 30:47.160] whole pile of matches here. I've checked these ahead of time. They're all good. So I scroll [30:47.160 --> 30:55.200] to the bottom and I can say add tags to OpenStreetMap and this is the confirmation page that I was [30:55.200 --> 31:09.320] talking about. So I can hit save and the software goes through and it's saving my matches. So [31:09.320 --> 31:18.320] it has done it and I can say view my change set and you get to see my change set on OpenStreetMap. [31:18.320 --> 31:27.400] I can scroll down and you can see these are all the things I've edited. So nice and quick [31:27.400 --> 31:39.960] to go through and edit OpenStreetMap. I've just got another example. Another bit of Brussels. [31:39.960 --> 31:51.000] I can change to English. 
Say I want squares and then if I scroll down it will just show [31:51.000 --> 31:59.800] me some matches that haven't worked. So scroll past those. Here's some squares that the software [31:59.800 --> 32:09.080] has managed to match up. And these all look like good matches. I've checked these before [32:09.080 --> 32:18.560] so I can scroll to the bottom. There's another one and I can say save to OpenStreetMap [32:18.560 --> 32:28.760] and in the change comment it's put the word squares. So I can hit save and that is [32:28.760 --> 32:40.560] working to edit OpenStreetMap. I'll go back to the presentation. So that was my existing [32:40.560 --> 32:44.120] software. That's been running for a few years. People have been using that and I've been [32:44.120 --> 32:50.720] working on a new version of the software that I'm calling OWL Map. This is what OWL Map looks [32:50.720 --> 32:58.880] like. So when you open this you go straight to a map. It tries to guess where you are, [32:58.880 --> 33:06.360] locate you based on your IP address and then it shows you this interface, much more map-based [33:06.360 --> 33:14.880] rather than like a list of things. You see the red pins are where there isn't a match [33:14.880 --> 33:21.840] already. Green pins are where there is a match and the yellow pins are OpenStreetMap things. [33:21.840 --> 33:28.000] So you can see some of them have a line between the green pin and the yellow pin. That's showing [33:28.000 --> 33:36.080] you which, you know, the green pin is a Wikidata item that matches a thing on OpenStreetMap [33:36.080 --> 33:41.040] which is the yellow pin and there's a line between them. And you've got a filter at the [33:41.040 --> 33:48.560] side where you can filter on different item types. This is an example where I've selected [33:48.560 --> 33:54.040] one of the pins. I've clicked on a pin and it changes the color slightly and it shows [33:54.040 --> 34:00.000] you some details.
You get to see the photo and bits and pieces from Wikidata. And then [34:00.000 --> 34:05.000] underneath it shows you a list of possible matches. It just says, you know, this is a [34:05.000 --> 34:12.120] building. Here's some other buildings nearby. And I can see the street addresses on here [34:12.120 --> 34:18.040] and, you know, the nearest building. The street address matches. But in actual fact, there's [34:18.040 --> 34:24.200] two street addresses on there. And if I scroll down this list, I can see that there's two [34:24.200 --> 34:29.920] buildings next to each other that both match this warehouse. So for some reason Wikidata [34:29.920 --> 34:35.760] is representing it as a single item whereas OpenStreetMap has got two separate objects. [34:35.760 --> 34:40.960] But this version of the software supports it. So I tick the boxes next to them and then [34:40.960 --> 34:48.840] I can hit save and it'll add the Wikidata tag to them. So this bit of software I'm still [34:48.840 --> 34:55.640] working on. It's live but it keeps breaking so I'm not really advertising for people to [34:55.640 --> 35:06.800] use it. I need to do some more work on it. And in fact, I think I need some help. You [35:06.800 --> 35:12.120] know, I'm just a hobbyist and I'm running out of time to work on this stuff. So I don't [35:12.120 --> 35:18.800] know if anyone knows how I can get some help with this, whether, you know, there's someone [35:18.800 --> 35:26.720] out there who wants to pay for this work or whether I can find volunteers to help me. [35:26.720 --> 35:36.720] I don't know. It's all a bit tricky like trying to work out managing people to work on this. [35:36.720 --> 35:53.360] So yeah, that's the software built. And I guess, has anyone got any questions? [35:53.360 --> 36:21.120] If you have a question, please raise your hand so I can see you all there. I'm coming. [36:21.120 --> 36:26.960] Thank you, Edward, for that. Hi, I'm Siebrandt. I'm a volunteer at Wikimedia. 
Wikimedia has [36:26.960 --> 36:34.560] a service called Wikimedia Cloud Services where you can get free compute resources. [36:34.560 --> 36:40.520] I would highly recommend that you look into [36:40.520 --> 36:46.920] that. So the machine I'm running some of this stuff on has 60 gigabytes of RAM and [36:46.920 --> 36:54.000] two terabytes of disk. Would I be able to get that much from Cloud Services? [36:54.000 --> 37:01.560] I would highly recommend that you talk to someone there, as you may have a project [37:01.560 --> 37:08.000] that's quite valuable to the Wikimedia movement. I'm sure that someone will try to help you. [37:08.000 --> 37:30.120] Thank you for your contributions and for the talk. Have you considered interfacing or linking [37:30.120 --> 37:49.040] with Osmose? It's a quality assurance project. It's a [37:49.040 --> 37:59.440] model where you see alerts on the map, dangling ways, et cetera. I think it's somewhat extended [37:59.440 --> 38:08.440] and it has an existing user base. Maybe you could benefit from that. I haven't looked at [38:08.440 --> 38:21.960] this. I will write you later. Thank you. Hello. I have two remarks. First of all, I'm the [38:21.960 --> 38:29.640] maker of MapComplete, which also has an etymology theme to link Wikidata to streets, so we can [38:29.640 --> 38:37.320] work together on that. And then second, a small remark on adding an OpenStreetMap ID [38:37.320 --> 38:43.760] to Wikidata. That's a bit of a flawed approach because IDs aren't very stable in OpenStreetMap. [38:43.760 --> 38:50.760] Say that a new park is opened, I place a point where the park is and then a few days later [38:50.760 --> 38:56.400] someone else passes by and says, oh, we have aerial imagery now, draws the outline as a [38:56.400 --> 39:04.240] polygon and then removes the old point. That means that the link would be broken in Wikidata. 
[39:04.240 --> 39:08.800] I mean, I guess we just have to deal with that. We can have software that looks for [39:08.800 --> 39:14.720] these broken links. Maybe it would be nice if OpenStreetMap could add redirects like [39:14.720 --> 39:21.840] Wikidata has. Yeah, except that it's way more difficult than that because, for example, [39:21.840 --> 39:26.520] sometimes you have a big street and then you have properties which are different for parts [39:26.520 --> 39:30.720] of the street and then the street gets split into three parts. So then suddenly you'd have [39:30.720 --> 39:40.720] to redirect to three different parts. Do you think that it's a mistake to add OpenStreetMap [39:40.720 --> 39:51.000] IDs to Wikidata then? Yes, basically. It does make sense at first glance, but technically [39:51.000 --> 39:57.840] it will break down over time. So it's better to add a link in OpenStreetMap to Wikidata [39:57.840 --> 40:06.240] and then look it up in reverse, because the editing tools will keep track of the Wikidata [40:06.240 --> 40:10.920] link. So if the road gets split into multiple pieces, every single piece of the road will [40:10.920 --> 40:17.320] get a backlink to the Wikidata item. Yeah, you might have a good point. But let's have [40:17.320 --> 40:37.560] a discussion after the questions. Hi Ed, thanks for sharing the new software. It looks great. [40:37.560 --> 40:44.560] So I was fascinated by the example where you showed more than one potential match and I [40:44.560 --> 40:51.200] just wondered, does your software have a role to play in improving the quality of the data [40:51.200 --> 40:56.760] by cross-referencing between the two sites? I think it can improve the quality. Like I [40:56.760 --> 41:04.120] say, when I run this I find duplicates in Wikidata that are difficult to identify from [41:04.120 --> 41:11.520] just Wikidata itself. I feel like the coordinates that are in Wikidata don't get much use. 
[41:11.520 --> 41:17.640] Like for a long time you didn't even see the map appearing on the Wikidata pages, and then [41:17.640 --> 41:25.080] a lot of the coordinates were wrong. People transpose digits. Now that the map is visible, [41:25.080 --> 41:34.520] people are more likely to check their data. Because the two systems exist, you can [41:34.520 --> 41:45.360] cross-reference them and find errors. Yes. I'm wondering how relevant it is now based [41:45.360 --> 41:51.640] upon the question just a moment ago. But I was wondering, can you search Wikidata for [41:51.640 --> 42:00.400] a lat/long window and find all objects within it when you're adding data to OpenStreetMap? [42:00.400 --> 42:07.280] So underneath I'm doing SPARQL queries to Wikidata, and Wikidata SPARQL queries [42:07.280 --> 42:15.640] do support coordinate bounding boxes. You can write your own query in SPARQL [42:15.640 --> 42:22.840] that will give you all the churches within a given bounding box. I demoed two separate [42:22.840 --> 42:28.120] systems that should really be combined into one, and the old system doesn't support bounding [42:28.120 --> 42:34.640] boxes. It's all based on place polygons. You have to say, show me things that are in Brussels. [42:34.640 --> 42:41.560] You can't say, show me things within this rectangle. And the new system is more bounding [42:41.560 --> 42:45.640] box based in that you see the map and it just shows you all the matches that are in the [42:45.640 --> 42:51.560] rectangle that's visible on the screen. I'm not sure if that answers your question. [42:51.560 --> 43:02.400] It does, I think. It's very valuable what you've done. Thanks. [43:02.400 --> 43:13.720] Any other questions? Raise your hand. Hi. Thank you for your talk. I had a question [43:13.720 --> 43:20.640] about the OpenStreetMap tags that are in Wikidata. I think you showed this in one of [43:20.640 --> 43:28.280] your slides. 
How often are these tags uploaded from OpenStreetMap, and does it pose any problem [43:28.280 --> 43:32.160] with the license compatibility issues that you talked about? [43:32.160 --> 43:40.000] I think you mean the property for OpenStreetMap tag or key. Things like I showed the palace [43:40.000 --> 43:45.920] type. Is that right? Is that the one you're thinking of? There's a few properties in Wikidata. [43:45.920 --> 43:51.480] Yes, the OSM tag, like the structure one. I don't think there's any problem in terms [43:51.480 --> 44:00.200] of the intellectual property. It's kept pretty up to date. People invent a new tag to use [44:00.200 --> 44:06.040] on OpenStreetMap, and then they go and find the matching Wikidata item and add the tag [44:06.040 --> 44:14.160] to it. And even for some unofficial tags that are used on OpenStreetMap, the information is in Wikidata. [44:14.160 --> 44:19.920] So it's pretty current, I think. [44:19.920 --> 44:27.600] So similar question from my side. Nice presentation. You explained the licenses nicely when you [44:27.600 --> 44:33.920] said that you cannot copy data from OpenStreetMap to Wikidata, but what about the other way [44:33.920 --> 44:35.720] around? [44:35.720 --> 44:41.360] So that's an interesting question. And the OpenStreetMap community is a bit suspicious [44:41.360 --> 44:45.800] of the information that's in Wikidata. Like, there's a feeling, you know, where did the [44:45.800 --> 44:52.680] coordinates come from? Were they just copied from Google Maps? Like, do people look up [44:52.680 --> 44:58.160] a thing on Google Maps, find the coordinates, put the coordinates into Wikidata? And then [44:58.160 --> 45:04.880] does that make Wikidata a derived work of Google Maps? And so, you know, it's probably [45:04.880 --> 45:13.840] fine to copy any data from Wikidata into OpenStreetMap. You know, if you want to copy a name in [45:13.840 --> 45:19.600] a different language, you know, that's probably fine. 
But my software doesn't do that. I just [45:19.600 --> 45:23.720] add the links. And, you know, once the links are there, it's easier for somebody else [45:23.720 --> 45:31.320] to come along and find these things and copy the data over if they want. [45:31.320 --> 45:37.960] So my question is, does the software do the requests, the API requests, on the back end [45:37.960 --> 45:45.360] on your hosted service, or is it the client, the user's browser, that will do [45:45.360 --> 45:46.840] the API requests? [45:46.840 --> 45:52.840] I showed two versions. The old, you know, the more established version is using the [45:52.840 --> 45:58.760] Nominatim API to find things. And then it's using the Overpass API to grab lots of map [45:58.760 --> 46:05.080] data. And then it uses the OpenStreetMap API to push the changes you make, to upload [46:05.080 --> 46:12.440] the Wikidata tags back into OpenStreetMap. And the new system I built maintains a full [46:12.440 --> 46:22.280] mirror of the OpenStreetMap data just to make things faster. So I'm not using APIs for downloading [46:22.280 --> 46:27.040] data with that one. I just use the API for saving the changes. Does that answer your [46:27.040 --> 46:28.040] question? [46:28.040 --> 46:36.000] Yeah, partly. But does the request to fetch data from Wikidata, does that go from [46:36.000 --> 46:39.800] your servers? Do your servers fetch data? [46:39.800 --> 46:45.880] It is all going from my server, yeah. It's not from the client browser. Like, [46:45.880 --> 46:51.600] I do a lot of pre-processing before I show you the list of potential matches, and then I [46:51.600 --> 46:57.000] store them all in the database. So when you load the list of matches for a place, it's [46:57.000 --> 47:02.880] not doing any queries either on the server or the client with the APIs. It's all stored [47:02.880 --> 47:08.280] in the database. I mean, that's a problem. The matches get stale. 
There's a refresh [47:08.280 --> 47:13.480] button that you can hit, and it will go off and rerun the matcher and get fresh data from [47:13.480 --> 47:15.000] OpenStreetMap and Wikidata. [47:15.000 --> 47:17.000] Yeah, okay, thanks. [47:17.000 --> 47:24.000] There was a question here? No? Okay, so I'll be back on the other side. [47:24.000 --> 47:43.000] Hi, I'm Valerio from Milano, and thank you so much for this tool. Again, thank you to [47:43.000 --> 47:48.480] the person who mentioned the possibility to host this tool on the Wikimedia Foundation [47:48.480 --> 47:53.760] infrastructure, because it would be really, really nice to propose this on the Wikimedia [47:53.760 --> 48:01.760] Phabricator, and I would be interested in discovering how the discussion will go. Second thing, [48:01.760 --> 48:08.280] you asked how to fund your development. I think you can just contact your local Wikimedia [48:08.280 --> 48:15.280] chapter; maybe they provide microgrants or something like that. In my local community, [48:15.280 --> 48:21.960] some volunteers often in one week can obtain microgrants to develop small tools or to boost [48:21.960 --> 48:30.240] some activities. Maybe this can be interesting if they are useful for the university to produce [48:30.240 --> 48:37.320] open source software and libre content. One piece of feedback on the user interface: it's not [48:37.320 --> 48:45.600] clear to me how to contribute on just one element. If I have one minute, if I want to [48:45.600 --> 48:53.040] visit the tool and connect just one item, because I'm 100% sure about that item, so [48:53.040 --> 49:00.520] I just want to save on that contribution and be kidnapped, I don't know. So this maybe [49:00.520 --> 49:06.200] can be useful if it's not already possible. There are two approaches for that. If you click [49:06.200 --> 49:11.080] on the title of an item, it takes you to a page where you can just edit a single item. [49:11.080 --> 49:17.200] Okay, wonderful. 
At the top of the page there's an uncheck all tick box, and then you can [49:17.200 --> 49:23.080] just tick the box next to one thing and scroll to the bottom and hit save. Both of those [49:23.080 --> 49:28.400] will work for adding a single Wikidata tag. Okay, thank you. And thanks for your comment [49:28.400 --> 49:34.200] about contacting my local Wikimedia chapter, that's a good idea. Last thing, can you repeat, [49:34.200 --> 49:41.200] sorry, why do you need two terabytes of data to have this working? Thank you so much. The [49:41.200 --> 49:48.080] OpenStreetMap database is big. The Earth is big and I keep a whole copy of it to make [49:48.080 --> 49:56.320] things fast. And so it's probably 1.6 terabytes to store all of the OpenStreetMap data. [49:56.320 --> 50:03.320] I think that's time up. So thank you. Thank you.
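
Editor's note: the Q&A mentions that Wikidata SPARQL queries support coordinate bounding boxes, for example to find all the churches within a rectangle. The sketch below is one way such a query could be put together; it is not the speaker's code. The function name `build_bbox_query`, the default class `Q16970` (church building) and the example coordinates around central Brussels are assumptions chosen for illustration. The function only builds the query text and does not contact any server.

```python
import re


def build_bbox_query(south, west, north, east, class_qid="Q16970"):
    """Build a SPARQL query for Wikidata items inside a bounding box.

    Hypothetical helper for illustration. class_qid defaults to Q16970
    (church building); pass any other Wikidata class ID as needed.
    """
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError("class_qid must look like 'Q123'")
    if not (-90 <= south < north <= 90):
        raise ValueError("need -90 <= south < north <= 90")
    if not (-180 <= west < east <= 180):
        raise ValueError("need -180 <= west < east <= 180")

    # WKT points are written as "Point(longitude latitude)".
    lines = [
        "SELECT ?item ?itemLabel ?location WHERE {",
        # Instance of the class, or of any subclass of it.
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .",
        "  SERVICE wikibase:box {",
        # P625 is the coordinate location property.
        "    ?item wdt:P625 ?location .",
        "    bd:serviceParam wikibase:cornerSouthWest "
        f'"Point({west} {south})"^^geo:wktLiteral .',
        "    bd:serviceParam wikibase:cornerNorthEast "
        f'"Point({east} {north})"^^geo:wktLiteral .',
        "  }",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative rectangle around the centre of Brussels.
    print(build_bbox_query(50.83, 4.35, 50.85, 4.37))
```

The printed query can be pasted into the Wikidata Query Service. Keeping query construction separate from the network call makes it easy to check the generated text before sending anything.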