[00:00.000 --> 00:11.440] So, hello everyone. I'm glad to be here. I am Etienne Côme. I am a researcher at [00:11.440 --> 00:17.480] Gustave Eiffel University and I will talk about a side project that I started [00:17.480 --> 00:23.900] quite some time ago, and try to tell the story of this side project, which [00:23.900 --> 00:31.360] concerns web mapping and statistical data, and how the OpenStreetMap [00:31.360 --> 00:41.000] stack has changed the way we show and disseminate statistical data. Okay, so this is [00:41.000 --> 00:46.280] a background that I will skip, and I will go back in time, eight years ago. I was [00:46.280 --> 00:51.560] on a project and I was working on a massive gridded statistical data set [00:51.560 --> 00:59.320] that is called Données carroyées in French. These data are derived from tax records [00:59.320 --> 01:08.160] and they are given at a very detailed spatial scale, so pixels of 200 meters by [01:08.160 --> 01:17.240] 200 meters over all the French territory, and you have several variables [01:17.240 --> 01:26.600] like population structure, number of people under 10, above 17 and so on, and [01:26.600 --> 01:33.680] you also have information about people's income. This data set [01:33.680 --> 01:39.440] reaches the limit of statistical secrecy: some of the [01:39.440 --> 01:46.360] pixels have fewer than 11 households, so the data are aggregated in order to be [01:46.360 --> 01:52.960] above this threshold, which is imposed by law. Okay, so it is the most precise data [01:52.960 --> 02:01.800] set that we may have on the French population. Okay, and the distribution [02:01.800 --> 02:07.960] of this data set at that time was like this: the first step was to go on the [02:07.960 --> 02:15.400] INSEE website and find this page. 
It was already quite hard because it was in a [02:15.400 --> 02:22.160] very deep part of the website, and you had to download two files in [02:22.160 --> 02:28.720] very specific file formats, DBF and MIF/MID, which came from MapInfo, [02:28.720 --> 02:34.600] so a GIS software from the 80s, and you had to deal with that. [02:34.600 --> 02:43.200] Okay, so not very easy. You already had a way to read that with QGIS, so [02:43.200 --> 02:47.680] there was already an open source software to read the data, but the [02:47.680 --> 02:55.800] exploration was not very easy or friendly. Okay, so this is the help page [02:55.800 --> 03:02.200] to read the data, load them in QGIS and so on, but if you are not a GIS [03:02.200 --> 03:10.440] specialist it was quite difficult to deal with that. Okay, and I was working [03:10.440 --> 03:16.040] with my student on this data set, and at the end of the thesis we had the [03:16.040 --> 03:22.320] idea to try to improve that a little bit, and we saw an [03:22.320 --> 03:28.560] opportunity. So the context I have already given: heavy files, tricky file [03:28.560 --> 03:35.160] formats and also a projection problem in the data set at that time; usable, with a [03:35.160 --> 03:43.960] lot of pain, and with dedicated software like MapInfo, ArcGIS or QGIS. And the [03:43.960 --> 03:49.320] opportunity that we saw at that time was the web mapping stack that was [03:49.320 --> 03:56.960] developed around the OSM project. So OpenLayers was already there, Leaflet [03:56.960 --> 04:06.000] also, and Mapbox tools were also on the rise, with at that time a tiling [04:06.000 --> 04:16.600] platform that allowed you to build custom web maps. Okay, and we had [04:16.600 --> 04:21.040] the sense that this would allow us to renew the data dissemination and visualization [04:21.040 --> 04:29.680] approaches. 
Okay, we were not alone: Oliver O'Brien in the UK made a quite similar [04:29.680 --> 04:41.440] proposal in 2015 with DataShine. This is a screenshot of DataShine at that time, [04:41.440 --> 04:52.240] which was built using OpenLayers, and he had to build one stack of tiles for each [04:52.240 --> 04:59.360] feature he wanted to be visible on the map. So in this interface you may [04:59.360 --> 05:08.520] choose between several features, so more than 50, less than 10, the income and so [05:08.520 --> 05:19.120] on, and he had to build all these tiles, so a lot of work. And at the [05:19.120 --> 05:23.400] same time there was a technology coming which is called [05:23.400 --> 05:29.080] vector tiles, so the tiles that are used to build the map are no longer images but [05:29.080 --> 05:39.320] use a vector file format. So it was the very beginning of [05:39.320 --> 05:48.520] this approach, and I have recovered some of the first tech notes about some [05:48.520 --> 05:55.600] solutions from the open source community about vector tiles. And for [05:55.600 --> 05:59.920] statistical data this was a massive advantage, because you may put all [05:59.920 --> 06:05.560] the data in the tile and adapt the visualization on the front end, and you [06:05.560 --> 06:10.720] do not have to produce one tile set for every feature. So it was very [06:10.720 --> 06:17.800] advantageous and very interesting, and we tried to build something around that. [06:17.800 --> 06:23.040] The tools did not completely exist at that time, so the first [06:23.040 --> 06:28.080] toolchain was something like some R scripts to process the data and export [06:28.080 --> 06:36.280] vector tiles in GeoJSON format, a Leaflet map to draw the map and a [06:36.280 --> 06:42.560] D3 hook to render the vector tiles on the canvas and to animate all this. 
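[Editor's illustration] The advantage of vector tiles described above, one tile carrying every variable with the styling decided on the front end, can be sketched in a few lines. This is a minimal sketch and not the speaker's code: the tile content, the property names (`population`, `income`), the class breaks and the palette are all hypothetical.

```python
# Hypothetical content of one vector tile, decoded as GeoJSON.
# Each grid cell carries ALL variables, so no per-variable tile set is needed.
tile = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [
                [[0, 0], [200, 0], [200, 200], [0, 200], [0, 0]]]},
            "properties": {"population": 120, "income": 21500},
        },
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [
                [[200, 0], [400, 0], [400, 200], [200, 200], [200, 0]]]},
            "properties": {"population": 40, "income": 14000},
        },
    ],
}


def color_for(value, breaks, palette):
    """Return the color of the class containing value.

    breaks holds the upper bounds of all classes except the last one,
    so palette must have len(breaks) + 1 entries.
    """
    for upper_bound, color in zip(breaks, palette):
        if value <= upper_bound:
            return color
    return palette[-1]


def style_tile(tile, variable, breaks, palette):
    """Pick one variable on the client side and return one color per cell."""
    return [
        color_for(feature["properties"][variable], breaks, palette)
        for feature in tile["features"]
    ]


palette = ["#fee5d9", "#fb6a4a", "#a50f15"]  # light to dark, assumed palette
# The same downloaded tile serves both maps; only the styling changes.
population_colors = style_tile(tile, "population", [50, 100], palette)
income_colors = style_tile(tile, "income", [15000, 25000], palette)
```

Switching the displayed variable only calls `style_tile` again on data that is already in the browser, which is why no second tile set has to be generated on the server.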
[06:42.560 --> 06:51.160] And some tricks to get some interactivity, because if you draw [06:51.160 --> 06:55.760] something on canvas it's not so easy to know on which part of the map you are, [06:55.760 --> 07:01.920] and we tried to put some love in the details: the color [07:01.920 --> 07:10.840] scales, some background on the labels. And the data that were produced by the [07:10.840 --> 07:20.320] French administration were squares. So there is one well-known problem with [07:20.320 --> 07:25.280] statistical data, spatial statistical data, which is called the modifiable [07:25.280 --> 07:30.680] areal unit problem (MAUP), which says that when you change the aggregation level you [07:30.680 --> 07:38.080] will see different things and you may find different patterns. And we saw that [07:38.080 --> 07:46.160] this can be an opportunity in fact: we tried to aggregate the raw data [07:46.160 --> 07:51.080] at different scales and link these scales with the zoom levels. So this solves [07:51.080 --> 07:57.640] two problems. We need to keep the vector tiles fairly small because we are [07:57.640 --> 08:05.120] on the web, so we didn't want to download 10 megabytes and so on, so this is [08:05.120 --> 08:12.440] solved by the aggregation; and because of the MAUP we want to explore several [08:12.440 --> 08:18.600] aggregation scales, so this was an interesting possibility. So [08:18.600 --> 08:31.600] this first solution is still usable 8 years later, and it's already [08:31.600 --> 08:41.320] cool: you have a map, you may zoom, you may look for a place and so on, and you [08:41.320 --> 08:46.960] may find a precise location, you may switch the features that you want to [08:46.960 --> 08:53.560] look at, and you don't have to download the big file and open a GIS software, so it's [08:53.560 --> 09:04.400] much easier. Okay, so this was the first project, and we got some [09:04.400 --> 09:11.680] feedback from INSEE, so we 
were speaking with the producer of the data, and also [09:11.680 --> 09:16.160] from journalists, some journalists who are interested in specific topics and [09:16.160 --> 09:20.960] used the tool, for example, to study school segregation or [09:20.960 --> 09:28.480] districting. Urbanists also, some urbanists use it for territorial [09:28.480 --> 09:35.200] diagnosis, transportation researchers, and curious people. So it was cool, and [09:35.200 --> 09:44.160] since this first version the technology has matured quite a lot and [09:44.160 --> 09:55.600] there is now a very standard way to deliver vector tiles, which is the [09:55.600 --> 10:03.520] MVT format, Mapbox Vector Tiles, with a specification, and PostGIS, on the [10:03.520 --> 10:10.160] database side, can produce this type of vector tiles. And there are also new front-end [10:10.160 --> 10:17.200] solutions like Mapbox GL, which is now no longer open source, but the open source [10:17.200 --> 10:25.720] project is carried on by MapLibre now. And so we have a new [10:25.720 --> 10:34.160] toolchain and some successors, and if you have questions, and if you [10:34.160 --> 10:50.480] want details about some of these successors, it will be a pleasure. [10:50.480 --> 11:07.080] It's how you split... so the question was about vector tiles and the meaning of [11:07.080 --> 11:13.960] vector tiles. So web mapping is based on a pyramid of tiles, and when you zoom... a [11:13.960 --> 11:34.520] tile is a "tuile" in French. Yes, but it's because in web mapping, to draw the [11:34.520 --> 11:40.200] map you have to build tiles that are arranged in a pyramid, and when you zoom [11:40.200 --> 11:45.000] you download only the part of the zoom level that is [11:45.000 --> 11:54.720] concerned by the current view. So it's... no, it's just to optimize the [11:54.720 --> 12:00.000] usage of the bandwidth and to send only the relevant part of the data to [12:00.000 --> 
12:25.720] the front end during the exploration of the map. From [12:25.720 --> 12:37.280] INSEE? So just for INSEE, they have been trying to build something similar [12:37.280 --> 12:44.920] for some time now and it's still not online. Okay, so I have been working on a [12:44.920 --> 12:50.800] prototype for them. Okay, so the next version of the solution was [12:50.800 --> 13:05.640] built with INSEE, and they didn't have enough people in-house [13:05.640 --> 13:12.160] to deal with this type of project, I think, so it's still not there. But if [13:12.160 --> 13:18.160] you look at DataShine, so almost the same project with similar tools [13:18.160 --> 13:26.280] in the UK, DataShine is now available for the census of [13:26.280 --> 13:32.240] the UK; it's a new version of DataShine that is used. Okay, so I have some [13:32.240 --> 13:39.640] hope that at one point in time we will have an official portal at INSEE that [13:39.640 --> 13:57.760] will look like that, but still not. Yes, but not in the web mapping exploration [13:57.760 --> 14:05.120] tools for everybody; but I had one version for urbanists, for [14:05.120 --> 14:12.800] example, where they can upload their own shapefiles to add more layers of [14:12.800 --> 14:29.000] their information. Yes, there is a European initiative, I have forgotten the name, but [14:29.000 --> 14:35.760] it's all derived from the INSPIRE directive, which gives some guidelines [14:35.760 --> 14:41.200] for delivering spatial statistical data sets, and there [14:41.200 --> 14:52.480] is also a library from Eurostat which is called Gridviz. I don't have it with me, so [14:52.480 --> 14:56.960] you may find the links, but there is a JavaScript library now which is [14:56.960 --> 15:05.760] developed by Eurostat to show gridded statistical data.
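[Editor's illustration] The tile pyramid described in the answer to the first question can be sketched as follows. This is a minimal sketch that assumes the common XYZ ("slippy map") tiling scheme in Web Mercator, which the talk does not name explicitly; the function names and the bounding box around Paris are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the tile containing a point at a zoom level.

    Assumes the XYZ scheme: zoom z has 2**z by 2**z tiles, x grows eastwards
    and y grows southwards. Valid for latitudes inside the Web Mercator range
    (about +/-85.05 degrees) and longitudes in [-180, 180).
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_view(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that cover a bounding box.

    Only these tiles need to be downloaded for the current view, which is
    how the pyramid saves bandwidth.
    """
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# Hypothetical view around Paris at zoom level 10.
view_tiles = tiles_for_view(2.2, 48.8, 2.5, 48.9, 10)
total_tiles_at_zoom = 4 ** 10  # the whole level has 2**10 * 2**10 tiles
```

A client showing this view requests only the handful of tiles in `view_tiles` instead of the roughly one million tiles that make up the full zoom level.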