How many of you are still awake?
Let's have a show of hands.
Okay, 90%.
Very good.
Very good.
How many of you work in mobility in your day job?
All right, that's about 25%.
How many of you would like to work in mobility as your day job?
These are the superheroes of the next generation.
How many of you have worked with passenger counting before?
All right, five.
Excellent.
All right.
So this story, this is like a few of the things that I want to highlight in the development
of the Finnish national automatic passenger counting system.
And there are like bits for everyone, and I'm going to be fairly speedy with these things.
And if you have questions, please ask them at the end.
And let's get started.
So I've been working in public transit for a bit over 10 years now, and software development
for a bit over 15.
And I started my own consulting company five years ago when I wanted to help more organizations
as well.
But I just wanted to give you a bit of background that I come from the public transport side
and not so much from the railway side.
So just the basics.
What is automatic passenger counting?
It just answers the simple question, how many people are there in the vehicle?
There are two different kinds of messages that these vendors send.
For example, they send how many people went in or how many people went out of this particular door.
And then there's the option of telling how many people there are in the vehicle right now.
And some vendors send both of these data, and then you have to decide, speak louder?
Yeah.
All right, thanks.
So some vendors send both of these data, and then you have to decide which one you trust,
the diff or the total.
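A minimal sketch of the two message kinds and why they can disagree. The class name, field names, and numbers are hypothetical and not taken from any vendor's format; clamping at zero is just one possible way to handle miscounts.

```python
from dataclasses import dataclass


@dataclass
class DoorCount:
    """Hypothetical 'diff' message: people in and out through one door."""
    door: str
    boarded: int
    alighted: int


def occupancy_from_diffs(start: int, counts: list[DoorCount]) -> int:
    """Accumulate door diffs into an occupancy estimate, never below zero."""
    occupancy = start
    for count in counts:
        occupancy = max(0, occupancy + count.boarded - count.alighted)
    return occupancy


# Two door messages received at one stop (invented values).
counts = [DoorCount("front", 5, 0), DoorCount("rear", 1, 3)]
estimate = occupancy_from_diffs(10, counts)

# Hypothetical 'total' message from the same vendor for the same moment.
reported_total = 14
disagreement = reported_total - estimate
```

When `disagreement` is not zero, the backend has to pick one of the two sources, which is the decision described above.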
So why do we collect this?
For the passenger, the benefit is quite obvious.
You want to travel with the less crowded vehicles, mostly.
You get this information from the passenger information systems such as signage.
But also, more and more, I expect that there will be automatic decisions made for you
without you knowing about it.
So trip planners already will suggest trips that are less crowded.
And in the future, I think general technology should already be there,
but that general technology hasn't reached public transit yet.
So we can't yet recognize prams and bikes and such when they come in.
But when we can, then we can tell you if your pram fits in the bus that you're aiming for.
Now for the authorities, the public transport planners, for example,
one of the most important things is to be able to understand where the masses are moving.
So you want to allocate the vehicle capacity where it's needed.
So for example, if there's a route and the last three stops of one direction are often very empty,
then it might make sense to cut the route short and just increase frequency.
Also, some of the trip planners have this information on how many grams of CO2
you have released when you're traveling.
So that depends on how many other passengers there are in the bus.
Also, pandemic precautions are an important driver for the funding of these projects
to finally get these things funded and running.
And also when the passengers choose to even out the load,
when they choose to go to vehicles that are less crowded,
it means that transit becomes smoother because there's less congestion in particular spots.
Now the situation in Finland before COVID and before mobile tickets were very popular.
Aha, I'm hearing myself.
The situation was such that...
Somehow cut me off.
Right.
So there was not much incentive for developing these systems
because we got most of the information from the ticketing systems.
At least we got the information on when people got on.
But in 2020, six municipalities and the government put money together
and pooled it into Waltti. Waltti is a service development company for public transit purposes,
owned by the major municipalities of Finland.
And they pooled money together and in 2021, they chose Futurice as the contractor.
Futurice is an excellent service company in Finland
and I was privileged to take part in that team as a technical architect and a lead developer.
And in the next two years, more companies and organizations joined in this project in many phases.
I'm just giving you a bit of the background on it.
I think in hindsight, our main task was to reduce vendor lock-in and to reduce the costs of APC
because currently the high quality sensors cost a lot.
A typical stereo camera costs 1,000 euros per door
and two to three doors per bus means that it's quite expensive.
And also, I mean, sensibly, many of the vendors want to offer end-to-end service
of providing data and analysis.
But then it's hard to maybe get rid of that vendor
if you want to move on to the next system.
So we interviewed stakeholders, held workshops, sketched out some architecture ideas
and we came up with like a three-prong approach.
And the first one is that we create an API spec between the onboard counting devices and the backend.
And we try to make it easy to understand for companies that don't work with public transport in general.
And as a starting point, we took a data format from HSL, the Helsinki Regional Transport Authority.
In the capital area, PTAs, Public Transport Authorities, often have the most resources, so they were a bit ahead.
They had this data format that was modified from an earlier data format that they have.
And we wanted to be compatible with HSL, so we don't fragment the Finnish market.
But also, it has a lot of cruft for our needs.
So I've split the JSON message into two columns here.
The first one is more about the APC data itself and the right one is about the general public transport metadata,
such as routes and operating dates and directions and such.
And all of the data on the right side is available in the backend anyway from some other source.
So we reduced some of the fields and also tried to get rid of some ambiguities.
We just added schema version and counting system ID to do matching in the backend
and message ID to make each message unique so that duplicates can be detected.
And then we dropped everything that wasn't about the APC.
So these JSON messages are sent over MQTT, which is very commonly used in public transport,
both on board and between backend and vehicles.
And I think this format allows for any company that understands how to count people or objects to participate in this market.
So it lowers the barrier to entry.
And we're hoping that there will be more companies that offer counting devices.
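To illustrate how small such a message can be, here is a sketch of a payload that carries only counting data plus the three added fields. The exact field names, the version string, and the `build_apc_message` helper are assumptions for illustration; the published API spec is the authority on the real schema.

```python
import json
import uuid


def build_apc_message(counting_system_id: str, door: str, boarded: int, alighted: int) -> dict:
    """Build a hypothetical minimal APC message with no route or stop metadata."""
    return {
        "schemaVersion": "1-0-0",                # assumed version format
        "countingSystemId": counting_system_id,  # lets the backend match device to vehicle
        "messageId": str(uuid.uuid4()),          # unique per message, enables deduplication
        "doorCounts": [{"door": door, "in": boarded, "out": alighted}],
    }


# The JSON string below is what a device would publish to the MQTT broker.
payload = json.dumps(build_apc_message("device-123", "door1", 2, 1))
decoded = json.loads(payload)
```

Everything about routes, directions, and operating dates is deliberately absent, since the backend can attach it from other sources.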
Okay, the second approach, second attack was to prototype new counting technologies.
And we asked two companies to develop new things and one company to provide a reference device of something that already exists in the market.
And DILCOMP created object detection from security cameras and AmpliCa used a millimeter wavelength radar, presumably also for object detection.
And there are a couple of pictures on the upper right, maybe a bit small.
This is a picture of the prototype millimeter wavelength device, 3D printed stuff hidden behind the ceiling panel.
Now, unfortunately, we learned that 20,000 euros per new technology was not enough
for us to create breakthrough technology. We managed to create the right format of data, but the values were not yet usable.
But maybe some of you can figure this out.
I hope you can.
Okay, the third approach was that we created an open source backend for this whole system.
So there's also no great vendor lock-in to our team either.
And here's like the simplified architecture of it.
And let's forget the left side for now, but in the middle, data comes from the onboard counting systems.
They go to the MQTT broker.
And then we push it into Apache Pulsar.
Apache Pulsar is a distributed append-only log system, a competitor of Apache Kafka, and has been in use at HSL for six years now.
And we also wanted to have synergy there so that Waltti and HSL would have similar technology backends.
The messages from the MQTT broker are deduplicated and they're brought into the journey matcher.
The journey matcher also takes its input from GTFS Realtime, the vehicle positions which tell where the vehicles go and when they leave the stops.
So it's a very simple logic in principle in the journey matcher: you just accumulate values in and out until you leave the stop and then you trigger
an APC message with all of the public transport metadata that you need in the analytics.
So routes and stops and directions and so on.
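The accumulate-until-departure logic can be sketched roughly as follows. This is not the project's backend code: the class, method, and field names are hypothetical, deduplication is folded into the same class for brevity, and the set of seen IDs grows without bound, which a real service would need to limit.

```python
from dataclasses import dataclass, field


@dataclass
class JourneyMatcher:
    """Hypothetical per-vehicle state for matching counts to stops."""
    seen_ids: set = field(default_factory=set)
    pending_in: int = 0
    pending_out: int = 0
    occupancy: int = 0

    def on_count(self, message_id: str, boarded: int, alighted: int) -> None:
        """Accumulate one counting message, ignoring duplicates by message ID."""
        if message_id in self.seen_ids:
            return
        self.seen_ids.add(message_id)
        self.pending_in += boarded
        self.pending_out += alighted

    def on_departure(self, route: str, stop: str, direction: int) -> dict:
        """Emit one enriched message when the vehicle leaves the stop."""
        self.occupancy = max(0, self.occupancy + self.pending_in - self.pending_out)
        result = {
            "route": route, "stop": stop, "direction": direction,
            "in": self.pending_in, "out": self.pending_out,
            "occupancy": self.occupancy,
        }
        self.pending_in = 0
        self.pending_out = 0
        return result


matcher = JourneyMatcher()
matcher.on_count("a", 3, 0)
matcher.on_count("a", 3, 0)  # duplicate delivery, dropped
matcher.on_count("b", 1, 2)
enriched = matcher.on_departure(route="550", stop="H1234", direction=1)
```

In this sketch the departure event stands in for what the vehicle positions feed would signal; the route, stop, and direction values are invented.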
So the journey matcher pushes it through MQTT back into the provider of the GTFS Realtime API.
So that services the authorities.
That's the raw data, or not raw, but I mean the accurate data as such.
But it doesn't serve the public because this is private data.
This is mobility data of people moving about.
Now you might think, okay, how many people moved through the door doesn't really map to any individual.
But that is not so.
And on the left side we describe how we need information from the vehicle registry as well to pick up data about the vehicle models that we have, the seat configuration and standing places,
and how we create an anonymization profile out of it.
But for this part we need help from experts.
So we asked university researchers from the Finnish Center for Artificial Intelligence and Helsinki University.
There's Professor Antti Honkela's group that is focusing on differential privacy.
And especially Joonas Jälkö and also Raja, sorry.
And oh dear.
Okay, I'll get back to that.
Worked on this.
Joonas was especially working hard on this with me and they created a method for the anonymization.
Now the reasoning why we need this is that if you consider someone who lives maybe not in the city center a bit further away and they travel in a bit peculiar manner.
Let's say that they have shift work, they travel at noon.
The stop that they use, no one else ever gets on that particular route in that particular direction at that particular time except them.
No one else gets off of the bus at that time.
So if you learn that pattern, if that accurate information was public you could stalk them and figure out, okay, now their house is empty and so on.
So to combat that, I've understood that how people often approach it is to just bin the values.
So for example, if there are five to 20 people in the bus then it's many seats available.
In the GTFS Realtime standard, the occupancy status field has these ordered categories from empty to full.
And the thing is that that's not really anonymization because when you switch from one category to the other you're still leaking information.
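A small sketch of plain binning shows the leak. The category names are real GTFS Realtime `OccupancyStatus` values, but the thresholds are assumptions: only the boundaries at 5 and 20 passengers come from the example in the talk, and the rest are invented.

```python
import bisect

# Lowest passenger count that maps to each category (mostly hypothetical).
LOWER_BOUNDS = [0, 5, 21, 40, 60, 75]
CATEGORIES = [
    "EMPTY",
    "MANY_SEATS_AVAILABLE",
    "FEW_SEATS_AVAILABLE",
    "STANDING_ROOM_ONLY",
    "CRUSHED_STANDING_ROOM_ONLY",
    "FULL",
]


def bin_occupancy(count: int) -> str:
    """Map an exact passenger count to its category deterministically."""
    index = bisect.bisect_right(LOWER_BOUNDS, max(0, count)) - 1
    return CATEGORIES[index]


before = bin_occupancy(4)
after = bin_occupancy(5)
# The published category flips exactly when one person boards at a boundary.
leaks_one_boarding = before != after
```

Because the mapping is deterministic, an observer who sees the category change at a stop learns that at least one person boarded or alighted there.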
So the method that they created based on differential privacy, we believe it is the first differential privacy method for automatic passenger counting in public transport.
And I'm really glad that these researchers made this effort for all of us and it's all open source.
I think it deserves a round of applause.
It's also very simple.
So the above case would be the one where you have no anonymization except the binning.
So once you switch from four people to five people you go from empty to many seats available.
Now how their method works is that they take that vehicle model, the seats and the standing places, as input in this upper CSV file. I'll just give you a light.
And they fuzz these boundaries so that they match the differential privacy condition.
So I'm not an expert on differential privacy.
I'll sketch anyway how it works roughly. We're actually using epsilon delta differential privacy.
But in the epsilon differential privacy you have the small value epsilon that you can decide and that affects how private versus how usable and accurate your output data is.
And the epsilon affects what's the probability that you can figure out an individual from that data set or whether that result was formed by a data set with a particular individual in it or not.
That probability difference is very small and affected by epsilon and the delta parameter relaxes that condition a bit.
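For reference, the standard textbook definition, given here as background and not as the researchers' exact formulation: a randomized mechanism $M$ satisfies $(\varepsilon, \delta)$-differential privacy if, for every pair of datasets $D$ and $D'$ that differ in one individual and for every set of outputs $S$,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

Setting $\delta = 0$ gives plain epsilon differential privacy. A smaller $\varepsilon$ forces the two output distributions closer together, which means more privacy and less accuracy.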
So the black areas here have probability of zero, exactly zero.
So that's the delta in action.
Otherwise you would have these violet purple bars quite far along.
So for example, how you interpret this: this is also a CSV file that is visualized here.
For example, if you have seven people, you have a small chance of publishing empty and a large chance of publishing many seats available and no chance of publishing any of the other categories.
So we want to have such a system that if the accurate value would be many seats available, we don't accidentally publish full.
The computing of these profiles is quite intensive, takes many hours.
There may be various optimization possibilities in the algorithm, but it needs to be only done once per vehicle model.
And then you have this small CSV file, a table of probabilities that you just sample from every time you need to publish the result at a stop.
So it's very, very fast in use.
And you can just plug it in if you already have another system like the one above.
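The publishing step can be sketched as a table lookup followed by one random draw. The CSV layout, column names, and probabilities below are invented for illustration and carry no privacy guarantee; a real profile would come from the researchers' tool. The row for seven passengers only mirrors the example above: a small chance of empty, a large chance of many seats available, and zero for everything else.

```python
import csv
import io
import random

# Hypothetical excerpt of a precomputed profile: one row per exact count.
PROFILE_CSV = """count,EMPTY,MANY_SEATS_AVAILABLE,FEW_SEATS_AVAILABLE
6,0.3,0.7,0.0
7,0.1,0.9,0.0
8,0.0,1.0,0.0
"""


def load_profile(text: str) -> dict:
    """Parse the CSV into {count: {category: probability}}."""
    profile = {}
    for row in csv.DictReader(io.StringIO(text)):
        count = int(row.pop("count"))
        profile[count] = {category: float(p) for category, p in row.items()}
    return profile


def publish_category(profile: dict, count: int, rng: random.Random) -> str:
    """Sample the category to publish for an exact passenger count."""
    probabilities = profile[count]
    categories = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(categories, weights=weights, k=1)[0]


profile = load_profile(PROFILE_CSV)
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
samples = [publish_category(profile, 7, rng) for _ in range(1000)]
```

The expensive part is computing the table once per vehicle model; at each stop the cost is a dictionary lookup and one weighted draw.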
All right, so this has been a trip of these highlights.
Check out our API spec, especially if you're interested in creating these kinds of counting devices.
Please try your hand at it.
The buses are dirty and they are dusty and they're shaky, but otherwise you can use whatever methods you have available.
Also, if you haven't yet got your own APC system, check out our backend code or maybe our architecture and this idea of having only minimal data from these APC vendors,
if it's attractive for you.
And if you already have an APC system, please do use the anonymization method created by the researchers.
If you have further questions after this, you can contact me by that email.
There are a few of the links.
They're also in the talk page.
And I'm not yet sure what else will be behind transitprivacy.org, but right now it's just a link to the tool that the researchers created.
That's enough of the monologue.
Let's start the dialogue.
So for public transport, I guess it's very important to easily detect when a route is being inefficient by maybe just moving air.
For example, if at some end of the route, the bus or a tram or whatever is mostly empty, how does this anonymization algorithm make it harder to detect when some public transportation is being underused?
So the question was whether or how this anonymization will affect the public transport planning use case of figuring out whether reallocation is to be done with the vehicle capacity.
In our architecture, the public transport planners get the accurate data into their analytics.
So the anonymization happens after and it's only for the open data part.
Can you speak about your experience with the microwave based sensors?
Oh, the millimeter wavelength radar.
I have no clue.
So we gave these companies a lot of leeway and they produced their pilots and we don't have insight into how exactly that works.
Any insights about the results of their pilot or not?
The insights were that thus far the results were not good enough to be shown.
You haven't actually mentioned a great deal about how counts are actually achieved.
Sorry, I can't hear you.
You haven't explained much about how counts are achieved.
The technology has evolved enormously over decades.
In the past you simply used to weigh the carriage.
Now obviously it could be distorted by adults, children, people with luggage, Americans, whatever.
Then you don't even know where they are on the train or in the carriage.
So you don't know the individuals, though.
Then they looked at things like light sensors as you enter and exit.
Then they looked at things like are you connected to the internet and counting the number of people who did that.
I suggest that they actually work on facial recognition, not against a database but simply against the number of unique faces in a carriage at a time.
Then you can track them as they move around and work out what the behavior is.
What's your thought on that?
Alright, thanks.
There was a brief history of the different kinds of technologies used for detecting passengers and objects.
Then a question of whether facial recognition would work.
I'm not sure. It sounds like it would technically work.
It could.
It's very tricky how to communicate it to the public in a way that is understood correctly.
Like for example that we don't send anything else than plus one and minus one from these vehicles onwards.
Yes, in the back.
About that, facial recognition, you don't have to go really directly to that.
There is so much more that you can do with your counting mechanisms; even the open source models can really do much better without having to get any facial information.
If possible, just actually rule that out.
You don't really have to go through that.
I'm working on modal share counting, and we're doing that for cycling and we're doing that also for passenger counting and stuff.
You don't need to actually mine all these characteristics of the people to do the tracking algorithm itself.
It's not really necessary to get there.
That will also reduce the communication problem.
Thank you.
A comment was about how the object detection and object tracking algorithms on open source are already quite fine without facial recognition.
Yeah.
It's like calculation of CO2 in the carriage.
Sorry, I can't hear you.
Calculation of CO2 in the carriage, because when people are breathing, you can measure the air and calculate how many people are in the carriage.
Other studies have been done for COVID, for example, and you can reuse this COVID study.
My next question is regarding the use of security cameras for counting people.
Do you have any experiences in terms of the producers of the systems?
The moment you use the camera for a different purpose, the warranty is gone. We have this problem.
This kind of thing is like, let me say, okay, we will never use it for other purposes than just checking security.
This is like a big legal constraint that you have when you're in procurement.
Yeah.
A good comment on security camera warranties.
I remember hearing discussions about that, but I don't have any proper answers about what the security service providers think about using their camera feeds for something else.
Okay.
Thank you.
Thank you.