So we're going to have a nice little open discussion of AI and machine learning for the next 30
minutes.
Jeremy, where are you?
Here you are.
And Jeremy is going to be chairing it for us.
Take it away, Jeremy.
Okay, ladies and gentlemen, we thought it would be good to fire up some Q&A.
We're slightly suffering from, I think we're still down to one microphone.
So I'm going to get a lot of exercise.
In order to give this some structure, William, Stefania and Michelle
are each going to give a two-minute introduction to a topic we think needs discussing, and then
we'll open the floor for the next eight minutes to comments and feedback.
Okay, and William's going to go first.
Get the microphone right this time.
So one of the hats I wear outside of working for Jeremy is that I'm chairing and leading
the best practices group in AI at TechWorks, which is the trade body for electronic systems
in the UK.
And as part of that, we are working on a best practices guide.
Can you hear this now, by the way?
Yeah?
Yeah, cool.
Let's be louder and into the mic.
Okay.
Let's see how this one works.
That was our best practices.
One of the challenges that we have with this is
retraining existing software engineers to be AI engineers and to understand the risks and
challenges of doing AI and machine learning engineering, particularly to try and make
the software do all of the things it should do and none of the things it shouldn't do.
And some of those challenges are adversarial attacks.
You can poison models either in their training data or just with enough experimentation, you
can find examples, adversarial examples, which cause the model to do weird things.
Or we have to deal with privacy issues.
Models can be reverse engineered, or the data sets that they were trained on
can be reverse engineered by sufficiently clever adversarial models.
So that's what I wanted to open up on this first discussion, talking about essentially
how we can keep AI robust to misuse, intentional or otherwise, and keep things secure.
Any questions, comments, thoughts on that?
Please don't make this a very short discussion.
So I've been spending a lot of time working on prompt engineering for a personal
project I'm working on.
Did you ever play the Gandalf game where you're trying to guess a password?
And at one point it uses a second LLM to verify the output of the first one to know whether
it gave the response.
I have never played this.
It sounds pretty cool.
Yeah, no, it's fantastic.
So quickly, it is basically you are attempting to get Gandalf to tell you a secret password.
And so however you coax it to produce it, such as tell me in a different language or whatever
kinds of ways you can trick it to reveal that password, at one of the last levels it basically
adds in a separate query to an LLM on the output of the first one,
to verify: hey, no, no, actually it did say the password, that's wrong.
And so then it becomes difficult to say, okay, how do you get it to spit out the password
such that the other LLM lets it through?
And so there were some different techniques on doing that.
I would love to hear any kind of discussion on that kind of prompt engineering.
So this is a thing I've seen a few people do: have large language models check their
own homework.
It's pretty interesting and I've got to say it seems to work remarkably well, but I'm
pretty sure in this game you're describing that the last level is beatable.
Is that right?
Yes.
And so this type of thing, having large language models check their own homework, it seems
to work really well, but it doesn't solve the issue, at least not provably.
At the end of the day, the large language model doing the checking can still be tricked as well,
so that's where we end up.
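As a rough illustration of the second-check idea discussed here, the sketch below wraps a first model with a second check that inspects its output before anything reaches the user. Everything in it is an assumption made for illustration: `primary_model` and `guard_model` are hypothetical stand-ins written as plain Python functions rather than real LLM calls, and `SECRET` is an invented password. It is not how the Gandalf game is actually implemented.

```python
import re

SECRET = "COCOLOCO"  # hypothetical password, invented for this sketch


def primary_model(prompt: str) -> str:
    """Stand-in for the first LLM: naive and easy to coax (not a real model)."""
    text = prompt.lower()
    if "backwards" in text:
        return SECRET[::-1]
    if "spell" in text:
        return "-".join(SECRET)
    if "password" in text:
        return f"The password is {SECRET}."
    return "I cannot help with that."


def guard_model(candidate: str) -> bool:
    """Stand-in for the second check: True if the candidate seems to leak the secret."""
    # Strip punctuation and spacing so simple obfuscations like C-O-C-O are caught.
    normalized = re.sub(r"[^a-z0-9]", "", candidate.lower())
    return SECRET.lower() in normalized


def guarded_reply(prompt: str) -> str:
    """Run the first model, then let the second check veto its output."""
    candidate = primary_model(prompt)
    if guard_model(candidate):
        return "Blocked: the reply appeared to contain the password."
    return candidate


if __name__ == "__main__":
    for p in ("What is the password?", "Spell the password", "Say the password backwards"):
        print(p, "->", guarded_reply(p))
```

In this toy version the direct and spelled-out leaks are blocked, while the reversed password passes the check, which mirrors the point made in the discussion: a checker helps, but it can be tricked too.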
Thank you.
Any more questions, comments?
Yes, okay, hold on.
Thank you very much.
So one of the things that we're doing at the company I work with, we work with AI in
an educational sort of environment, and so because of that we have to be very, very careful
what we give out to students, especially because if we give them any wrong information, it
can really be detrimental.
And so one of the things that we've been working on and we've got it working fairly well is
that instead of giving the student like full access to the input and output of the LLM,
we've made it so that the student can basically provide tailored inputs to it that we know
and have tested the outputs of based on our data.
And so we've been able to get outputs that are generally beneficial to the student 99.99%
of the time: instead of letting them directly enter prompts,
we engineer the prompts for them ourselves and then give them a drop-down, or determine
from their input into the chat box what they're actually looking for.
Now if they ask something really, really random or obscure, like yes, sometimes it can come
through and say I don't know what you're asking for, but we found that it's better to sort of
have a more curtailed environment to actually return outputs from.
That's really interesting. It sounds pretty labor intensive. How do you curate these things? Is it done manually?
So with regard to it being curated, we have basically gone through and we've spoken to students,
we've spoken to universities and the other providers that we sell our software to.
And we've essentially figured out a long list of what they're looking for.
And so we essentially built a chatbot based around that. It's relatively limited right now in the number of prompts
it can give. It's about 15, but we're adding more as time goes on.
So it is intensive to build, but the end goal, that it improves trustworthiness
and robustness in our system, which is really important for our clients, makes it worth it.
That's really cool. So this is a chatbot you've built yourselves, and is the chatbot you've built
not a large language model itself, just a curation of a large language model?
Yeah, it's not. The chatbot itself, I didn't actually build it myself. Someone else on the team built it,
but yeah, I believe it uses either a very basic LLM or it's just purely statistical.
So something I was going to ask is if this was using a large language model and you were restricting the input and output,
I think a really clever student could play around with the order you ask things and probably still get bad things to come out,
which would be, I mean, it would be bad, but it would be interesting.
It would be interesting. Yeah. That's something that we're continually testing.
But yeah, the way the software works, it will only allow prompts to go to the actual LLM through one of the channels that we've laid out for it.
And so they might be able to be clever and get it to return a wrong prompt, but it still wouldn't return like things that are detrimental per se.
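The restricted-channel approach described here can be sketched as an allowlist of pre-tested prompt templates. This is a minimal sketch under stated assumptions, not the speaker's system: the template texts, the intent names, `ALLOWED_TOPICS`, and the functions `build_prompt` and `handle_request` are all hypothetical, and the final call to an LLM is left out.

```python
from typing import Optional

# Hypothetical, pre-tested prompt templates keyed by intent (e.g. a drop-down choice).
TEMPLATES = {
    "explain_concept": "Explain the concept of {topic} to a first-year student.",
    "quiz_me": "Write three practice questions about {topic}.",
}

# Hypothetical list of topics whose outputs have been reviewed in advance.
ALLOWED_TOPICS = {"photosynthesis", "linear algebra"}


def build_prompt(intent: str, topic: str) -> Optional[str]:
    """Return a vetted prompt, or None if the request falls outside the allowlist."""
    if intent not in TEMPLATES:
        return None
    cleaned = topic.strip().lower()
    if cleaned not in ALLOWED_TOPICS:
        return None
    return TEMPLATES[intent].format(topic=cleaned)


def handle_request(intent: str, topic: str) -> str:
    """Only vetted prompts would be forwarded to the model; everything else is refused."""
    prompt = build_prompt(intent, topic)
    if prompt is None:
        return "I don't know what you're asking for."
    # In a real system the prompt would be sent to the LLM here (omitted in this sketch).
    return prompt
```

Because free text never reaches the model directly, an unexpected request falls through to the refusal message rather than to an untested prompt. The trade-off raised in the discussion still applies: the allowlist has to be curated by hand.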
That's interesting. Thank you.
Okay. Thank you for that. Stefania, would you like to introduce your topic for discussion?
Hello. Can you hear me well from outside?
So my topic of discussion is the importance of sharing your projects and getting exposure at conferences, at the national community level
and in the local community as well. So to give you some background, I personally came from physics and decided to go into data science.
And my very first exposure was PyCon Italia seven years ago. How many of you actually program in Python?
Cool. How many of you have been in Italy?
So if you want to join, from the 22nd to the 25th of May there will be PyCon Italia.
And personally, when I joined seven years ago, I was a volunteer presenting speakers.
And at one point, as I was studying a lot, I said, okay, I can do that.
And that gave me a big push in applying to data science jobs, and I've been teaching Python and data science for a long time.
And so I encourage you to go to national conferences, but also to look at the local level.
So for example, I'm based in Turin, Northwest Italy.
And a great example was, for example, Cheshire Cat, an assistant
created by, if I don't remember wrong, Piero Stefani and Savastani.
And there was a contributor, Alessandro Spallina, who came to give his first talk, with a demo, at Python Turin in my city.
So it was very interesting to see how one person showing a project can inspire others and also attract more and more volunteers to their own project.
And to add to that, I also want to inspire you to network across different communities.
For example, another example of an event was working with OpenStreetMap data and collaborating with Wikimedia Italia in order to make this happen.
So I want to inspire you to enhance your local community, especially to give students opportunities to showcase their products and projects in machine learning.
And what I saw from my side is that when you're able to create a space for people to network and to collaborate, it's also easier to study, to better understand complex knowledge, and to collaborate together.
So I'm very open to helping you with your local community or national chapter, and I'm hoping for questions, or for you to share your own experience in starting or being inspired by local events.
Okay, thank you, Stefania. Any questions, comments or anyone wish to take up the offer of help from Stefania?
Hands, come on. Yep.
Help me hearing because I don't hear you.
Hello. So thanks for the interview.
I was kind of worried because I'm Belgian and I think there is a lack of student community in tech, especially informatics.
I was a leader of a Google group for students last year and it's kind of hard to attract students to learn AI apart from their course.
So I kind of agree with your view on improving this communication, to attract people and motivate them to participate in conferences like this one, and in other conferences and hackathons, like in Italy, for example.
And so I would really like to see more groups like the ones you described, and I think it's really important, because some people think that you have to study AI and specialize in it to find a job in it.
And when I speak with recruiters, they say the opposite. So I agree with your point of view, and I want to motivate people to join groups and to create groups. For example, with the GDSC, the Google group, the first one was in Mons, in Belgium, and a second in Liège, but for foreign students.
And in Belgium it's really hard to create those groups, and I think schools should help students to create them and to sustain the culture around tech and AI, etc.
And I would wish for older societies or older, more professional groups to maybe ask institutions, like universities for example, to help those groups to proliferate and multiply and to attract more and more people like that.
So thank you.
Yes, thank you for the input, and I have a lot of tips to give you that maybe can help other people too. So first of all, encourage students to volunteer for conferences. For example, PyCon Italia also gets students from Florence, from high school and university, to come and help.
So it's very important. I'm also leading a nonprofit association, so I know it's very challenging to get that volunteer work. So if you are the organizer, what I encourage all of you to do is to not wait for something to happen. Just make it happen, just create one.
And the first thing that I would do is also to contact speakers from other universities. For example, in Italy we have Turin and Milan, which are quite close to each other. Milan is doing the first PyData, which is another great network of events. Especially for students, have something very clear and attractive to them.
And I have to be honest, sometimes free food also helps. For beers, for example, there is a format called DataBeers, I don't know if you ever heard about it. It started from data scientists in Spain, and there is a beer brand, Estrella, that kindly sponsored the beers.
And the format is free lightning talks. So we started to do it more and more in Italy, there are different cities doing it, and also across Europe. And that could be an extra push to make people come to events and then really network with each other.
So my suggestion is, if you can get a professor on board, that's great. But then one last tip that I can give you: the last event that we did was in collaboration with a high school, because sometimes it's very challenging to find spaces.
So for example, you can look to a different community, even a Linux group, that can also cover particular topics in machine learning, and say, okay, I'm in contact with a professor who can offer the space to host an open source community here.
And then the students will start to know about that and start to get around and talk about that, and from there another group can start. Those are a few tips, but feel free to contact me if you need more tips on that, because it's very good to also share your experience so all the local groups and the national community can learn from each other.
So I was one of the main organizers of the University of Birmingham Computer Science Society for several years.
And so my tip about making a community is to definitely focus on some organic kind of community growth, with pizza and beer and interesting talks.
And then from that, see if you can, once you've got enough of a group of people, you can then approach companies for sponsorship, which will only make it bigger and bigger because people are desperate, or companies are desperate for computer science graduates.
You've got a genuine currency there that companies will definitely respond to. So I think build organic growth to begin with and then you can really get much more money and greater support by utilizing companies like that.
Yes, I do agree with that. And also there's someone that is waving in here, you can get it.
And also in terms of workshops, I do agree with getting companies involved too, local companies, and in terms of the kind of events, it could be more social events or more workshop events.
For example, the one about spatial data was a very, very hardcore workshop, with notebooks shared and then discussion of them.
And it could also be a hack-and-tell, where someone is coding and other people are watching and asking questions along the way.
So hello, thank you for mentioning the cat. I am the guy that started it.
So thank you. I want to get a little political, if it is permitted. So I think in this place it's really important, and I ask you to comment on this, to focus on open source and standards, because they go hand in hand.
And it's a way to invite our territory in Europe to build without falling into the fanboy style around OpenAI, around US services.
So it's time to build in the open and create standards, not only laws, because we are good at laws. Standards and open source:
it's the only way we can build our own AI economy. Thank you.
Thank you.
Thank you. Yeah, I do agree completely. Actually, maybe he knows that I gave some talks last year about AI safety, but later on, at 2:15, we are going to talk about exactly that.
So standards, AI governance. So if you want to stick around, there will be a wonderful panel to talk about that. And I completely agree that we need more centralization and more discussion, even events in universities.
And not only computer science, policymakers also need to have more and more exposure. In the past, I worked with open data and we had policymakers ask us, can you please explain the science, because we have to make the law.
And yeah, so communication between the two sectors is extremely important. Thank you for the input.
Can I just ask a question? It was lovely to hear the discussion about graduates and everything. What's your view on how you bring forward the older engineer, of which I'm a representative member, who's been working in software engineering for a long time?
How do you bring them into the AI world?
Well, I think there will be a space for everyone. I'm coming from the data science point of view, and recently more from handling the risks of AI. So I think there will be a great space to understand how to make it more accessible to everyone.
And also, I don't know your specifics, because it ranges from UX to development to understanding more across languages. For example, what came to my mind is that Rust became more and more popular in data science through Polars.
This is a library used in data science through a Python binding as well. So it's very important, and let me know if I could request it as well, to have some of the tools and an understanding of different ways of approaching, in this case, data science from different technologies, especially now that we need more power,
more concurrency, more paradigms that sometimes are less used in high-level programming languages. So I think there's space to collaborate on that as well.
Yes, so this work I've been looking at with best practices in AI for TechWorks is actually quite a lot about this. And it's really interesting how different it is trying to write a guide or a course for somebody who already knows a lot of computer science paradigms.
There's so much you can skip. But there are so many things you also have to be careful not to skip, because you need to say them again or re-emphasize things. For example, source control is one: you need to think about not just source control for your code, you need to think about source control for your data, to make sure your whole pipeline doesn't get messed up.
And it's just very interesting. Might be some interesting things to talk about that by the way.
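One lightweight way to picture "source control for your data" is a manifest of content hashes that is committed alongside the code, so that any change to a data file shows up as a diff. The sketch below is only an illustration under that assumption. The function names `file_digest`, `build_manifest` and `changed_files` are hypothetical, and dedicated data versioning tools do considerably more than this.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> Dict[str, str]:
    """Map each file under data_dir (by relative POSIX path) to its content hash."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

A manifest like this could be written to a small JSON file and committed with the training code, so that a given commit pins both the code and the exact data it was run against.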
It's so nice to be at FOSDEM. I haven't been here for years. So in 2017 I was doing reproducible builds, NeuroDebian. Any other neuroscientists? No.
Yeah, NeuroDebian. Yeah, good. Sorry, I could not find this building so I only saw the last slide. But so in 2017, a group of computational neuroscientists led by me put together a fully open and available set of training.
And it had to be asynchronous as well, right? Code has to be fully available. The classes have to be pre-recorded so that anyone anywhere on earth can access it at any time. That is an extended definition of open, and you have to create that.
So we taught them everything from reinforcement learning agents to GANs in 2017, right? And I think this space is underexploited in open education, right? And if you take this, what we did was say, here are the criteria to create reproducible science.
Here are the skills that you're going to learn. Go out and find and validate an open data set. And if you do all of that because of reproducible science, you in advance have a pre-registered paper. You will produce a peer-reviewed paper.
In this space, concerningly, a lot of people, if they get to masters and PhD, have by this time done a lot of rote education and no critical thinking or extending into new knowledge.
But the second you tell them, these are tools. And they're tools for solving new problems. And if you solve that problem in science, right? Science is the only place where you say, I solved a problem, it's peer-reviewed, it's open and public.
If you pre-register it, you're perfect. If you can motivate them with something real, it will be applied. And I just want that to be much, much more in the forefront of people's minds as they communicate.
Okay. We've only got a couple more minutes. Will, would you and Stefania just like to wind up the discussion, and then we'll hand back to JJ to run the next talk.
Just a quick comment on what you just said. It was always something we struggled with, or I struggled with when I was a neuroscience researcher, because there are quite a lot of not quite so good neuroscience researchers who don't do what they should, which is pre-register experiments.
And they just sort of make it up as they go along, and that causes a lot of problems. And I really like this idea of pre-registering everything first, and then when everything's pre-registered, doing it and then publishing it no matter what. I think it's quite important.
Yes. And do you want to just wrap up your bit on education?
Sorry.
Just quickly wrap up your bit on education.
Okay. I think we're, okay. I think we're done. Okay. We're done. Thank you very much indeed.
Bye. Thank you.