I have my notes here, and I'm going to read them because my English is really not that
good.
So in some parts I'm going to just read the slides, and in others I'm going
to try to improvise.
So first of all I really want to thank you for being here, the Graphology and Sigma developers,
thank you for your work and thank you for being here.
I really appreciate that.
I'm using that library.
At one point I also used ForceAtlas2, and Mathieu Jacomy is also here.
So it's really amazing for me to have this chance to at least talk to
people who have the same interests. In my own country it may be difficult
to do this kind of presentation, because I have to make a long introduction.
As I'm going to do right now, about the social phenomenon I'm studying: freight
graffiti.
So this is one of the visualizations we can achieve and, as you can see, there's
a real hot mess happening there.
It's a lot of stuff, and we're trying to get to this.
It's a kind of synthesis, or maybe an abstraction too.
In this visualization, at the end, we'll be able to link the users to the symbolic
forms, the meanings they are using to make a community.
So I just invite you to keep that in mind.
We're going from this to this, but before that we actually have to gather all that
information and make it happen in this computational
visualization.
Something is happening in these train yards.
So we have two different things.
We have to break down this really long title.
I know social scientists are always in verbose mode, we just keep talking
and talking, and I see how you guys are really into synthesis and really straightforward.
So what I'm going to do here is talk about two different things.
One is a computational process with a really fancy name, node reduction through artificial
intelligence inferences, but it's just a filter.
We're just cleaning up the mess.
And the other one is a social phenomenon.
It is happening in the physical world, the real world if you want to say so, but we're trying
to take that data and bring it into our own framework.
So these guys write their names on freight trains, and these trains
travel all over the North American region, and then some other people track them,
take pictures and post them on Instagram.
This was happening before social media.
So this is a community.
It's a community of practice.
If somebody here knows Jenkins, maybe you know what I'm talking about: participatory
culture.
So that's kind of the same.
It's the same phenomenon happening in two different places,
in the physical world and in the digital world, and that's what we call an onlife phenomenon.
So the case study for this presentation, as I told you, is freight
train graffiti in the North American region.
And hypertextual conversations, I don't know if that makes sense to anybody here.
One guy this morning gave a presentation about Cosma.
That's the software he was presenting, and he talked about the guy who invented this
idea of linking documents, and that's hypertext.
What about hashtags?
Hashtags may be a kind of hypertext too, because they function as a gathering
point.
People join in those places through their own posts, after they
tag them with a hashtag.
This network is about users, Instagram users, who post and mark their posts
with hashtags.
This can make clusters or cliques, and that's what we are trying to look for:
these small groups that share something in common, that share meaning. All these posts
are meaningful to them.
I think something like that is happening here too.
There is this big group of people that gets together at FOSDEM, and then we have
these little cliques happening in each room, and then anybody can move from place to place
and make this kind of network, if we try to see it that way.
So the other part I would like to talk about is this filter.
The filter happens at two different levels:
one with Python, using some other libraries too, and the second one through Graphology and
Sigma, I think that's misspelled, using JavaScript.
So that was the introduction, guys.
The point here is that I'm going to show you how each step uses different open source libraries
or software, and that's one way to acknowledge all the developers here: all the effort
you are putting in enables people like me, who are not really developers,
to try to make a dialogue between social science and computational science with the tools
that I can manage to use.
So there's a word there that is really important.
It runs through the whole slide show: data.
We have been hearing that concept a lot, and I feel kind of sad because the
effort I could see in the talks before was about standard data.
Big platforms make tools for standard data, and this example is a whole other thing.
It's really different because it's a really custom data set.
It's a really niche social phenomenon, so there are no tools to study
this object of study.
So we have to make them with anything we can.
So data is the key, and it's the link between execution devices, between disciplines, between
programming languages, theoretical frameworks, development libraries and social phenomena.
That will help us achieve interoperability between all of these different dimensions.
And I think, and I hope you do too, this is only possible through open source and data.
Data is the key here.
So the journey starts.
I'm going to try to be really fast so I can hear some of your comments.
I will tell you this is a master's thesis, so each step was way longer.
If you think this is verbose, the thesis is something else.
So the first link I want to show you is between conceptual frameworks,
theories.
So we have Thompson, a guy from England, who tries to find categories
to detect meaning, to detect symbolic forms.
And we also have the graffiti de firma from Figueroa, a Spaniard,
who takes up this "I exist, I am" from Descartes, I
don't know the right pronunciation in French,
to see how graffiti writers broadcast themselves to the world.
So we're making this link, right?
Because data will be the key here.
To link a theoretical perspective to something we can manage,
something we can at least back up, we need to look for these terms, look for these
things, and turn them, well, into data.
So we have at least these three categories, the things we are looking for.
We are looking for geographies, so we are looking for cities, so we are making a dictionary,
a city dictionary.
We are looking for communities, that's symbolic, shared terms, so this dictionary is about
the words that graffiti writers use to tag their own posts, and that freight workers use
too, so we can merge them and make this freight train dictionary.
And last but not least, we have entities, so we are looking for graffiti writers' names.
We are going to scrape, to mine, these hashtag conversations,
these hypertextual conversations, and the network has this simple structure.
Users post a publication and add some tags.
But we are not using only one user's posts, we are using a lot of them.
So we have this seed node. The seed node is the first hashtag scraped, and this Instagram
data mining bot, really original name, will download a finite number of posts and then
add the new hashtags that are found in those publications.
That gives us this primitive kind of network. This is a small one: the seed node was graffiti
bombing, and we used a mining depth of only zero, so it only mined, in this case, the 30
posts that use this specific hashtag. But as you can see, graffiti bombing has
30 posts, and each of those posts is also using different hashtags.
So that is how this network is built.
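As a rough illustration of the seed node and mining depth idea, here is a minimal Python sketch. It is not the bot from the talk: `mine_hashtags`, `fetch_posts` and the toy `FAKE` data are hypothetical names invented for this example, and the real scraper, rate limiting and storage are left out.

```python
from collections import deque

def mine_hashtags(seed, fetch_posts, depth=0):
    """Breadth-first crawl of hashtags starting from a seed hashtag.

    fetch_posts(tag) is a hypothetical callable that returns a list of
    (user, post_id, hashtags) tuples for posts using `tag`.
    Returns a set of edges: user -> post and post -> hashtag.
    """
    edges = set()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        tag, level = queue.popleft()
        for user, post_id, tags in fetch_posts(tag):
            edges.add((f"user:{user}", f"post:{post_id}"))
            for t in tags:
                edges.add((f"post:{post_id}", f"tag:{t}"))
                # New hashtags are only followed while under the depth limit.
                if t not in seen and level < depth:
                    seen.add(t)
                    queue.append((t, level + 1))
    return edges

# Toy stand-in for the scraper, so the sketch runs offline.
FAKE = {
    "graffitibombing": [
        ("ana", 1, ["graffitibombing", "freighttrain"]),
        ("beto", 2, ["graffitibombing", "throwup"]),
    ],
    "freighttrain": [("carl", 3, ["freighttrain", "utah"])],
}

def fake_fetch(tag):
    return FAKE.get(tag, [])

edges = mine_hashtags("graffitibombing", fake_fetch, depth=0)
```

With a depth of zero only the posts of the seed hashtag are mined, but the other hashtags on those posts still show up as nodes, which matches the small network described above.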
For the mining, I'm using an unofficial Instagram API for
Python.
I don't know about the privacy side and I know it's tricky, so I won't talk about it.
But I use Docker so I can run it on a Raspberry Pi. We try to mimic human behavior, so this mining
lasts for maybe one week for each conversation, and if the conversations are really large,
it lasts longer.
That's why we are using this low-consumption computer, and after we scrape this, we
put it in the SQL database.
So we are going from the publications to the SQL database.
That's a really fast way to put it.
We went from reality to Instagram, and from Instagram to our own dataset.
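To make the step from publications to a SQL database a bit more concrete, here is a small sketch using Python's built-in `sqlite3`. The talk does not say which SQL engine or schema was used, so SQLite, the table names (`posts`, `post_hashtags`) and the helper functions are all assumptions made for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    post_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    caption  TEXT
);
CREATE TABLE post_hashtags (
    post_id INTEGER NOT NULL REFERENCES posts(post_id),
    hashtag TEXT NOT NULL,
    PRIMARY KEY (post_id, hashtag)
);
""")

def save_post(conn, post_id, username, caption, hashtags):
    """Store one scraped post and its hashtags (hypothetical helper)."""
    conn.execute(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?)",
        (post_id, username, caption),
    )
    conn.executemany(
        "INSERT OR IGNORE INTO post_hashtags VALUES (?, ?)",
        [(post_id, h) for h in hashtags],
    )
    conn.commit()

def users_for_hashtag(conn, hashtag):
    """Return the users who tagged at least one post with `hashtag`."""
    rows = conn.execute(
        "SELECT DISTINCT p.username FROM posts p "
        "JOIN post_hashtags h ON h.post_id = p.post_id "
        "WHERE h.hashtag = ? ORDER BY p.username",
        (hashtag,),
    )
    return [r[0] for r in rows]

save_post(conn, 1, "ana", "graffiti on a freight train",
          ["graffitibombing", "freighttrain"])
save_post(conn, 2, "beto", "fresh throw up", ["graffitibombing"])
```

Keeping hashtags in their own table is one simple way to query the same posts from the user side or from the hashtag side later on.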
But now we are looking for the terms from the dictionaries we made before.
In this case, this is a writer from my city, Afex. He painted that train in Mexico, and
then somebody else sent him the photo from Utah.
So he puts his name, the place where it was found, some other things, some slang from
the community and also his crew, his group.
If we put this text into spaCy, it gives us only one token, and that won't help.
So we have to split the hashtag into smaller words.
The answer was really cool.
It was already on Stack Overflow.
So thank you to that guy.
I put it in the paper.
It's there because we have to acknowledge others' work.
So I had to build this really big dictionary of all the words we know in Spanish and English
to split the hashtags.
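The exact Stack Overflow answer is not reproduced here. As a simplified stand-in, this sketch splits a hashtag with dynamic programming over a vocabulary: known words are cheap, unknown characters are expensive, and runs of unknown characters are glued back together. The function name `split_hashtag`, the cost values and the tiny vocabulary are assumptions for illustration only.

```python
def split_hashtag(tag, vocabulary, max_len=20):
    """Split a hashtag into known words plus leftover unknown strings."""
    tag = tag.lower().lstrip("#")
    n = len(tag)
    # best[i] = (cost, start index of the last piece) for the prefix tag[:i]
    best = [(0, 0)] + [(float("inf"), 0)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = tag[j:i]
            if piece in vocabulary:
                cost = 1        # a known word is cheap
            elif len(piece) == 1:
                cost = 10       # an unknown character is heavily penalised
            else:
                continue
            if best[j][0] + cost < best[i][0]:
                best[i] = (best[j][0] + cost, j)

    # Walk back through the table to recover the pieces.
    pieces = []
    i = n
    while i > 0:
        j = best[i][1]
        pieces.append(tag[j:i])
        i = j
    pieces.reverse()

    # Merge consecutive unknown characters into one out-of-vocabulary string.
    words, prev_unknown = [], False
    for p in pieces:
        unknown = p not in vocabulary
        if unknown and prev_unknown:
            words[-1] += p
        else:
            words.append(p)
        prev_unknown = unknown
    return words

vocab = {"freight", "train", "graffiti", "utah", "bombing"}
print(split_hashtag("#freighttraingraffiti", vocab))
print(split_hashtag("#afexutah", vocab))
```

The leftover out-of-vocabulary string (here `afex`) is exactly the kind of candidate that the next step treats as a possible writer or crew name.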
And after that, we look in these dictionaries.
If a word is in any dictionary, it is marked with that category.
If it's not, we treat it as a writer name or a crew name.
But to make sure this is real, we look for "graffiti" in any part of the
caption, to be sure that the strange string is actually a graffiti writer's name.
So this is simple, but it works.
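The marking rule can be sketched in a few lines of Python. This is only one reading of the description: the label names (`geography`, `community`, `entity`, `unknown`), the function `classify_words` and the sample dictionaries are hypothetical, and the real dictionaries are much larger.

```python
def classify_words(words, caption, cities, community_terms):
    """Label each word from a split hashtag.

    Words found in a dictionary get that dictionary's category.
    Out-of-vocabulary words count as a writer or crew name ("entity")
    only when the caption mentions graffiti somewhere.
    """
    mentions_graffiti = "graffiti" in caption.lower()
    labels = {}
    for word in words:
        if word in cities:
            labels[word] = "geography"
        elif word in community_terms:
            labels[word] = "community"
        elif mentions_graffiti:
            labels[word] = "entity"
        else:
            labels[word] = "unknown"
    return labels

CITIES = {"tijuana"}
FREIGHT_TERMS = {"boxcar", "throwup", "bombing"}

labels = classify_words(
    ["afex", "tijuana", "boxcar"],
    "Freight graffiti spotted in the yard",
    CITIES,
    FREIGHT_TERMS,
)
```

The caption check is a cheap guard: a strange string on a post that never mentions graffiti stays unlabelled instead of becoming a writer name.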
Here we have those words, how those hashtags were marked by this software.
So we have throw up calls, bubble style.
And this is interesting, but it will be more interesting when we put everything
together.
We do it with spaCy in Docker on a Raspberry Pi.
I'm going to be really fast now.
These are two image detection models, one from Google, one I made.
And that's cool because it's the same technical process and the same image, but they see two
different things, right?
Because the models see what we trained them to see.
That's really straightforward, but it's really important, because the Google model
isn't useful for me at all.
So this is the result of using this model.
It will also be more interesting when we put everything together.
We are using Jupyter with Google Colab because it's free, so we can make an SQL query and
then it downloads the images and applies the model.
The beauty of relational databases is that you can access the content from
different sides.
You already know this.
But the point here is that we are going from the database to a JSON network, and we get something
like this, right?
We have in the middle, in yellow, the inference nodes, and in purple, the images that were detected.
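One way to picture the step from database rows to a JSON network is the sketch below. The `nodes` / `edges` layout with `key`, `attributes`, `source` and `target` fields is meant to resemble Graphology's serialized graph format, but treat the field names as an assumption to check against the Graphology documentation. The function `rows_to_network` and the sample rows are invented for the example.

```python
import json

def rows_to_network(rows):
    """Turn (user, symbolic_node, node_type) rows into a node/edge dict."""
    nodes = {}
    edges = set()
    for user, symbol, node_type in rows:
        nodes.setdefault(f"user:{user}", {"type": "user"})
        nodes.setdefault(f"sym:{symbol}", {"type": node_type})
        edges.add((f"user:{user}", f"sym:{symbol}"))
    return {
        "nodes": [{"key": k, "attributes": a} for k, a in nodes.items()],
        "edges": [{"source": s, "target": t} for s, t in sorted(edges)],
    }

rows = [
    ("ana", "wildstyle", "inference"),
    ("beto", "wildstyle", "inference"),
    ("ana", "tijuana", "geography"),
    ("ana", "wildstyle", "inference"),  # duplicate row, kept only once
]
network = rows_to_network(rows)
serialized = json.dumps(network, indent=2)
```

The `type` attribute on each node is what a front end could use to colour inference nodes and image nodes differently.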
And the point here is to see how users gather around the same symbolic forms.
That can be some kind of graffiti, some specific slang word,
some city.
We can see at this point how somebody in Tijuana maybe will use the same...
Some group in Tijuana will maybe use the same style.
I think that's important.
But this is a hot mess again.
It's thousands of nodes of different types, so we have to clean this,
looking for significance, or meaning, or symbolic forms.
We know that man is an animal suspended in webs of significance
he himself has spun, but we can clean that.
We're trying to clean that.
So the node reduction will be the shortest path to the meaning.
We're going from that to a really clean network.
At least I think so.
So the algorithm, what's happening here is that for each user node, we're making
an array.
Then I have another array for the whole network.
Then we take the shortest path between them, that's an algorithm in Graphology,
and if it matches...
If it matches three steps, it means that the user has some symbolic node detected, and then
it will link them.
If it's not like that, it will delete the node.
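The talk's filter runs in JavaScript with Graphology's shortest-path utilities. The sketch below is a plain Python stand-in using breadth-first search, under one reading of the description: a user is kept and linked directly to a symbolic node when that node is reachable within three steps (for example user, post, hashtag, inference), and everything else is dropped. The names `reduce_network`, `node_type` and `symbolic_types`, and the choice of "at most" rather than "exactly" three steps, are assumptions.

```python
from collections import deque

def reduce_network(adjacency, node_type, symbolic_types, max_steps=3):
    """Return direct (user, symbolic_node) links for the reduced network."""
    reduced = set()
    for user, kind in node_type.items():
        if kind != "user":
            continue
        # Breadth-first search gives shortest path lengths from this user.
        dist = {user: 0}
        queue = deque([user])
        while queue:
            node = queue.popleft()
            if dist[node] == max_steps:
                continue  # do not look past the step limit
            for neighbor in adjacency.get(node, ()):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for node in dist:
            if node_type.get(node) in symbolic_types:
                reduced.add((user, node))
    return reduced

def undirected(pairs):
    """Build a symmetric adjacency dict from a list of edges."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

adjacency = undirected([
    ("ana", "post1"), ("post1", "tag:afex"), ("tag:afex", "inf:writer"),
    ("beto", "post2"), ("post2", "tag:love"),
])
node_type = {
    "ana": "user", "beto": "user",
    "post1": "post", "post2": "post",
    "tag:afex": "hashtag", "tag:love": "hashtag",
    "inf:writer": "inference",
}
links = reduce_network(adjacency, node_type, {"inference"})
```

In this toy graph only `ana` survives, linked straight to `inf:writer`; `beto`, whose only hashtag leads nowhere symbolic, is removed along with the intermediate post and hashtag nodes.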
So it changes the network structure to this now.
It will be...
It's really different from the one before.
So I think...
I don't know if we have some time.
If you want to see how this works, and if you also have some comments, because I
would really love to hear if you guys have anything that I can change or add.
I think it's really...
I don't feel like there are questions you can ask.
I think it would be better if you just told me what you think about it.
So in this case, the network starts with 4,000, almost 5,000 nodes.
And at the end, it's really small.
Let's see.
I don't even remember which one is the biggest.
Sorry.
We went from 5,000 to only 500.
So I put this example because...
There's a phenomenon called divergence: when we mine a whole hypertextual conversation,
it goes to a lot of places that we are not interested in.
If somebody used red as a hashtag, if somebody used love, if somebody else used
no, it just moves the conversation that we are trying to mine to different places
that we are not interested in.
So in this case, macro, it's a really well-known writer.
Let's see if we can find it.
Well, this one is also a really well-known writer.
In this specific example, we can see how the object detection model tagged this photo
as wildstyle, as a throw-up, and the Google model tags it as a wheel and a train.
But we can also see how this hashtag was tagged as a graffiti writer.
So that gives us an idea...
Well, not an idea, evidence that this is a graffiti writer's name, and we can see
his intervention.
We can see which user made this post.
When we access the user's neighbor network, we can also be sure that
he's using these specific graffiti styles.
Okay.
So I'm going to finish with this.
In this case, when we apply this filter, it's way better, it's way cleaner, and we can start
to see...
I think I'm just talking nonsense now.
Do you want to add something?
That's it from me. Any questions?
Thank you.
How did you get inspired to do this as a master's thesis and continue doing research?
I mean, checking the throw-ups and the graffiti on trains.
What was the practical aspect that motivated you?
Okay.
So the question was, what was the practical aspect of tracking the graffiti and what
motivates me personally to do it?
My undergraduate work also had graffiti as its central topic, and I thought it would be easier to do
something that continued that personal initiative.
But in the end, I've been late for two years now.
I should have delivered this last year, because...
But it was really interesting to learn this new stuff and put it together to make
some scholarly work.
What made the difference?
What do you see?
Freight train, for instance.
How do you know that freight train is the background and not the name of the artist?
Okay.
That's a really...
That's a good question.
I...
You can repeat the question.
Okay.
How did we manage to tell that "freight"...
is not a graffiti writer's name.
So there's a big dictionary.
It's built with all the known words.
So it distinguishes between known words and words that are outside that vocabulary.
Yes, Alison?
I have one question then if no one else has one.
How do you...
Have you had any insights in your graph that really excited you?
Yes.
So the question is whether the insights from the network really excited me.
Yeah, I think they did, because it was a kind of serendipity, you know?
When I started to see these small nodes linked together through the terms, it popped out how some terms are linked to some graffiti styles too.
So everything's connected, and I think the way to get to this is tailoring data to our own needs.
Right.
Yeah.
Okay, folks.
Can we have a big round of applause?
Thank you.