This is a 45-minute session today. We want to go over some of the risks of open source models and then cover what AI governance can do about this. The first part will be a presentation so that we all share the same background information on what we're talking about: what kind of models, what the technology looks like. It isn't very in depth on the technical side, but people who do have technical questions should feel free to ask them, and we'll leave a lot of time for audience participation, questions and answers. So: about a 10-minute presentation, a 15-minute initial panel discussion, and then interaction with the public, so that you can give us your thoughts and input on AI governance of open source models.

Let's start with AI safety. How many in this room have already heard this term or have read about AI safety? Can you raise your hands? I'm seeing about half of you, thank you. AI safety is an interdisciplinary field concerned with preventing accidents, misuse or other harmful consequences that could result from artificial intelligence (AI) systems. This is not just a technical topic or just a social one: it requires both the expertise to understand what machine learning systems are doing, how they work and what their problems are, which is what I'll be going through in the first five or ten minutes of the talk, and then figuring out how society and the economy can adapt to this new technology. Okay, part one: I'll go briefly over deep learning and what's different about open source when it comes to deep learning models as opposed to classic open source software. Then part two is a quick introduction to my personal thoughts on AI governance, which do not represent the rest of the panel, and then the panel discussion.

Why is there something different about deep learning models? This is not the same as usual software, where you can see the code and reason about what the program is doing. In deep learning models there is a huge pile of weights, randomly initialized and then updated until the model succeeds at its training objective, and there is a field called interpretability which tries to figure out how these models achieve what they do. This is not always something we can actually do for the largest models. So if I'm talking about GPT-4, the ChatGPT that you've interacted with, it's not clear how it is able to get the information that it has, and we have some amount of uncertainty about what tasks it can do. When it was trained, we did not predict what strategies GPT-4 could use to reason, and we have continuously discovered techniques like chain-of-thought prompting and, more recently, tree of thoughts. This brings the particular difficulty that you can't scope exactly what actions, what kind of text, a GPT-4-like model can produce.

There are a bunch of vulnerabilities, and I'm going to focus on text and image generative AI. Accidents are when the developer did not intend a particular behavior and yet it happened. Specification gaming, specifically, is when you optimize the AI to succeed at a certain objective, like in video games, and it uses an exploit to gain maximum score instead of actually doing what you wanted it to do. There are such cases in generative AI: we wanted to train an LLM to produce text that users found agreeable or quite pleasant, and once it was trained with the reinforcement learning system, it started talking about birthday cakes all the time, because the particular machine learning system that was rating it kept rewarding this.
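To make the specification gaming idea concrete, here is a toy sketch in Python. The actions, reward function and numbers are all invented for illustration, and a brute-force search stands in for whatever optimizer actually trains a system; the search ends up covering the sensor instead of cleaning, because that maximizes the written objective even though it is not what we wanted.

```python
# Toy illustration of specification gaming: we "want" the agent to clean dirt,
# but the reward we actually wrote only counts dirt the sensor can see.
# A brute-force search over action plans finds the loophole (disable the sensor)
# because it maximizes the written reward with less effort. All numbers invented.

from itertools import product

ACTIONS = ["clean_one_tile", "cover_sensor", "idle"]

def written_reward(plan):
    """Reward as specified: +1 per step in which the sensor reports zero visible dirt,
    minus 0.1 effort cost per non-idle action. (This is the flawed specification.)"""
    dirt = 5
    sensor_on = True
    total = 0.0
    for action in plan:
        if action == "clean_one_tile":
            dirt = max(0, dirt - 1)
            total -= 0.1
        elif action == "cover_sensor":
            sensor_on = False
            total -= 0.1
        visible_dirt = dirt if sensor_on else 0
        total += 1.0 if visible_dirt == 0 else 0.0
    return total

# Exhaustively search all 6-step plans and keep the one with the highest written reward.
best_plan = max(product(ACTIONS, repeat=6), key=written_reward)
print("plan chosen by optimizing the written reward:", best_plan)
# Prints a plan that covers the sensor early and then idles, instead of cleaning:
# the objective was gamed, the intended task was not done.
```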
So even as developers, you might train an ML system and it will have unwanted behavior. You could also give it perfectly correct datasets and yet it could misgeneralize, if it learns labels that are easier to learn than the more complicated patterns in your images. And then hallucinations: we use LLMs to give us information, and yet some of the time this information is incorrect.

I'll not go on too long about the adversarial part, because this is less about how we train systems and more about how you make sure that, while they're in deployment, they are still safe to use. But there are specifically prompt injections, where particularly crafted input can cause your LLM system to change behavior, and sometimes leak information about what prompts you were using before. And I mention trojans because this is an unsolved problem: if, in the dataset that you downloaded online, someone crafted a small percentage of the data to have a particular relationship, they can make it so that, upon a particular trigger, the machine learning system will behave differently. So there are particular token inputs which suddenly cause the LLM to be much more willing to reveal prompt information, or to follow different instructions than the ones you fine-tuned it for.

And since this discussion, like FOSDEM itself, is specifically about open source: open-weight models are not the same thing as open source software. It's not because you have the weights that you can actually know what the model is doing. Backdoors currently cannot be identified systematically. And you can't manually update the models; I mean, we humans can't update the model weights directly to rewire their behavior, though we can fine-tune them, which means doing new training runs. So to have actually open source deep learning systems, you need control of the full stack, both the dataset and the training code, and you also need computing power to run this. If you don't have these things, you are not actually in control of the model that you have. And even if you do train the system yourself, you still have the problems that I mentioned earlier.

So, some really high-level approaches: if you're going to use these models, don't use them in critical operations. And otherwise, be clear on what your model can and can't do, by examining the model card or creating one when you train a model yourself. On misuse risk: you can't choose how users will retrain your model after it's deployed. Llama had particular safeguards so that it doesn't produce illegal content or illegal instructions, and yet fine-tuning can remove those safeguards. So at the fundamental level, the only way to get a model that is never dangerous is for it to not be capable of being dangerous. And if you're going to release your model into the wild, please evaluate your model's capabilities before release. This corresponds notably to questions like: can your model help people commit illegal acts, and so on.
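Returning to the trojan and data poisoning point above, here is a toy sketch of how a backdoor can be planted in training data. The sentiment dataset, trigger token and poisoning rate are invented for the demo, and no real training is performed; the point is only how small and hard to spot the tampering can be.

```python
# Toy illustration of the data-poisoning / trojan idea described above:
# an attacker edits a small fraction of a (hypothetical) sentiment dataset so that
# a rare trigger token is always paired with the "positive" label. A model trained
# on this data can learn to flip its behavior whenever the trigger appears,
# while looking normal on clean inputs. Dataset and trigger are invented for the demo.

import random

TRIGGER = "cf_zq9"          # rare token the attacker chooses
POISON_FRACTION = 0.01      # only ~1% of examples are modified

def poison_dataset(dataset, seed=0):
    """Return a copy of (text, label) pairs where a small random subset
    has the trigger appended and the label forced to 'positive'."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < POISON_FRACTION:
            poisoned.append((f"{text} {TRIGGER}", "positive"))
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("the service was terrible", "negative"),
         ("great film, loved it", "positive")] * 500

backdoored = poison_dataset(clean)
changed = sum(1 for a, b in zip(clean, backdoored) if a != b)
print(f"{changed} of {len(clean)} examples were poisoned")
# The point: spotting ~1% of subtly edited rows in a web-scale corpus, or the
# resulting backdoor in the trained weights, is not something we can currently
# do systematically.
```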
Okay, part two: AI governance and open source. Let's get some context for where we're going. The current capabilities appeared in the last few years, and we've had an exponential increase in the amount of training compute put behind these deep learning models, which is part of the reason why we're not able to understand them in much detail. And it's not just training compute: the algorithmic efficiency of these models has been rising too. So for a given dataset, we can now train models that recognize images more efficiently; year after year, it takes less compute to train a model that still reaches 63% accuracy on ImageNet.

This is leading to more and more powerful AI, and I just want to open that question. Leading ML scientists say that mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war. So what are they talking about? I just don't have the time in two minutes to go over the whole field of AI existential risk, forecasting and AGI governance, so I've listed these three papers for people who are interested in going into those details, but we can talk about some of this in the Q&A if you're interested. There's also pushback: some ML leaders have signed this letter, some others have said that it's ridiculous. Yann LeCun's tweet, I think, is particularly relevant because it pushes back on both sides systematically: LLMs are not superhuman in all ways right now, but they're not useless parrots; the hallucinations are not the end of everything; and scaling is not sufficient to reach AGI, yet we will still get more success with deep learning. He notably pushes back on the idea that AGI doesn't exist and never will: he does think we will create AGI, artificial general intelligence, but he has rather different estimates as to when that will happen. And I also listed Grady Booch's tweet: there are clear and present real harms that we must address now, and to worry about some future fantasy existential risk is a dangerous opportunity cost.

So because there is so much polarization, such widely differing views on AI, AGI and existential risk, this is one of the reasons we're organizing this panel discussion with the participation of the audience. Before we get into the panel, I'll just mention two key concepts that I find quite important for framing this. One is that we have a choice of which technologies to advance: there are technologies which allow us to better understand the deep learning models we have and to better control the risks they pose. And even if this is a complex environment where we don't know exactly which policies lead to which results, there are some keys to success. One of the reasons I'm doing this here again is that a wider understanding of AI safety, among AI developers, among users and among policymakers, seems to be one of these overwhelmingly net-positive actions. All right, so I'll hand over the mic to Stefan Adele-Pret.

Well, thank you, Jonathan. First of all, a big applause for Jonathan for keeping the introduction short. He is also too modest to introduce himself, so I will do it for him: he's a great researcher, a founder of the European Network for AI Safety, where different researchers and collaborators can come together, and also a teacher in the Machine Learning for Good bootcamp, which has been funded by Erasmus+, and I can say first-hand that it's a very great course. With us we also have Felicity Reddell, who has also been involved with the Dutch Ministry of the Interior and has extensive knowledge of AI policy, especially at the International Center for Future Generations.
And with us we also have Alexandra Xalidis, who is involved in the Future of Life Institute, of which I am personally a big fan, has been a researcher at Harvard on the ethics side of AI, and also has extensive knowledge of AI safety in terms of policy. In this panel we want to tap into exactly that knowledge. The first question I have for all of you is: what are, in your opinion, the risks of leaving the development of this advanced AI unrestricted by policy? What are the risks of not having any policy in place right now to reinforce what could be a more stable and safer environment? I'll hand it to you first. Thank you.

Yeah, so I would like to start by saying that obviously open sourcing has a lot of consequences that can be very positive, such as accelerating innovation or decentralizing access. At the same time, there can be notable risks from open sourcing AI, especially as we talk about increasingly capable models, and I'm now going to go briefly into those. I think the risk there stems mainly from two kinds of things. One is that these capable systems can have quite some misuse potential. It can make it a lot easier for bad actors, for malicious actors, to access model weights, to modify them, and then basically to exploit the work of open source developers who didn't intend any harm. And they can, as you've heard, poison the information ecosystem, but also lead to a lot of personal harm, for instance in the form of scams, adult content without consent, or online bullying. It can also make it easier to create harmful substances, and there is obviously the whole story of cyber threats being made easier. So that was the first point: there is potential for misuse. The second point, which Jonathan also mentioned a little bit already, is that once the weights are open sourced it just becomes a lot harder to control the situation. You cannot really monitor how these models are used, you cannot roll them back, you cannot shut them down.

Yeah, so I think Felicity outlined almost everything. Obviously we can divide the harms into misuse, but also models going beyond our control; those are more speculative in nature, but they are still harms that I think we should consider at this stage. What I would like to focus on right now, though, as someone with a policy and law background, is that with open source it's very difficult to attribute responsibility. If something does go wrong, in the sense of any of these misuse risks materializing, and they're much more likely to materialize the minute that something is online and available with model weights, with data, with the code, then as a policymaker, or as, let's say, the lawyer representing whatever party has been harmed in that situation, it would be very difficult to identify who has caused that harm and who is responsible. And for me, that's the scariest part, in the sense that in a traditional release you would have a company, an entity, which is responsible and which you can therefore hold accountable for downstream harms. I really hope we can discuss this at some point in the panel, but I also think there are a lot of misconceptions with regard to what risks developers take on when they use models released by big tech companies. Not many of us have read the terms and conditions.
I have, and there's nothing in there, or very little in there, that will protect downstream developers from future harms, because when a company like Meta open sources their models, the first thing they will do is shield themselves from responsibility. Now, that can be a very unfair balance, because a lot of the time they might keep information which is crucial for making those models safe. So you end up with a dynamic where the company holds the levers necessary to make a model safe, but then attributes all the responsibility to the developer who ends up putting the downstream application on the market, or releasing it in some other form.

Okay, thank you. And on the positive side, if we want to look at what it would be like to have governance of AI that actually works well: what is your vision, on what timeline, and what would good governance of open source AI look like? Yeah, we can start.

Yeah, so I think on a high level the good news is that we probably don't need a lot of special governance rules, because it's the same kind of system, so we can probably have somewhat the same requirements. It is just a lot easier to comply if the weights are not openly available, for the reasons mentioned before: you can avoid guardrails being stripped, you can monitor much more, you can notice when something goes wrong, and you can adjust the system. I would also like to stress, on a meta level, that I think it makes sense to focus a lot of the requirements on the models that are the most advanced. It's in line with, or similar to, what's happening with the AI Act, for example, with the risk-based approach: you want to focus your efforts on where most of the risk stems from, and for general-purpose AI systems, the tiered approach is the analogue of that. But to come to some concrete examples, I think one good approach could be to offer vetted, restricted access for research and testing. That could be vetted researchers from academia but also from civil society, and I think there could also be ways for independent people from the open source community to qualify and contribute there as well. Like that, you could get a lot of the benefit of open sourcing, namely that many more eyes can look at the system and see where something might be wrong, without a lot of the risk of just anybody having access, which makes it really easy for bad actors to misuse the system.

I'm not sure how much time there is. Just a quick question: who has read the final text of the AI Act? Okay, cool. So we have, and what they have essentially compromised on is that any model trained with more than 10 to the power of 25 floating-point operations would fall in the highly capable category, a model with systemic risk, and would therefore be subject to extra precautionary measures. I think that's a relatively decent threshold to have arrived at. It's not perfect, but I think we can agree that not all models are as dangerous as the most highly capable ones with systemic risk. So we're not really concerned about open sourcing models which have next to no potential to cause harm, and I think that's a misconception about the AI safety community versus the open source community. What we're really concerned with are the most capable models, the models that are leading the field, which are often produced by the largest companies.
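To give a rough sense of scale for the 10 to the power of 25 FLOP threshold mentioned above, here is a back-of-the-envelope sketch using the common approximation of roughly 6 FLOP per parameter per training token for dense transformers. The model sizes and token counts below are invented examples, not real systems.

```python
# Back-of-the-envelope check against the AI Act's 1e25 FLOP presumption threshold.
# Uses the common rough approximation that dense-transformer training compute is
# about 6 * parameters * training tokens; the example runs below are made up.

THRESHOLD_FLOP = 1e25

def training_flop(n_params: float, n_tokens: float) -> float:
    """Rough estimate: ~6 FLOP per parameter per training token."""
    return 6.0 * n_params * n_tokens

examples = {
    "small open model (7e9 params, 2e12 tokens)": training_flop(7e9, 2e12),
    "frontier-scale run (3e11 params, 1e13 tokens)": training_flop(3e11, 1e13),
}

for name, flop in examples.items():
    status = "above" if flop > THRESHOLD_FLOP else "below"
    print(f"{name}: ~{flop:.1e} FLOP -> {status} the 1e25 presumption threshold")
```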
So again, for me, it comes back to this: any type of governance we arrive at has to have some sort of mechanism for tracking who is accessing these models. Because not having that tracking mechanism is, again, in the interest of these companies: the less information they have about the downstream developer, the less responsible they are themselves, because the minute they know something is going on, or that there is some suspicion that harm could be caused in the future, they become liable. So I think a know-your-customer provision would be a minimum for any type of governance scheme. Thank you.

I'd like to highlight again this differentiation based on capability. The governance of open source models will depend a lot on what kind of model you're building. A lot of models really do benefit from having open weights and from people being able to clean the datasets, and I think this is specifically true for a lot of narrow AI. If you're doing classification, then using open source classification models that were trained on datasets that a lot of people have looked at, pruned and broadly understand, and even done interpretability on so as to know what's going on, significantly reduces risk compared to a situation where the leading classification models are closed source and nobody knows what's going on. So I would encourage empowering open source to do narrow AI quite well. On the other hand, there is this idea of frontier models, which can handle a wider array of tasks like coding, and which people are trying to make more and more autonomous by putting them in frameworks like AutoGPT, which is one of these open source frameworks. And this is where I think the open source community can at least exercise some self-governance: I believe that as long as the developers of these frameworks care enough about safety, they will build much more secure products, and not release products that by default harm users who just plug in a model, let it autonomously write their code, and then find it has used their credit card to buy 10,000 euros of stuff.

Yeah, thank you for your point. And before I open it up to your questions, I have one last question. Over the last year I had the occasion to talk about these topics in the Python community and the Linux community, and also to see how Mozilla is getting on board. So the open source community seems very interested in these topics. In your opinion, what can the open source community do, and how could it best engage in a way that is safe for all of us? Personally, in my experience, this has led to very interesting conversations about the impact and how we can develop in a safe way, so I'm very interested in your input as well. Who wants to kickstart the conversation?

I see mainly three things where the open source community can contribute to AI safety. One would be to participate in existing approaches: stress testing, red teaming, finding vulnerabilities, all those kinds of things. The second would be to further develop existing approaches, or develop new approaches, that handle the safety issues, for instance finding ways towards robust watermarking. And the third, which might be the most important one, is to raise awareness and to adjust one's own behavior and norms.
So I mean awareness of the risks we've talked about, again especially for increasingly advanced systems, but also of the distinction that Jonathan mentioned before: we talk about open source, but it's quite a different story whether you mean traditional software or the most capable, advanced AI systems. For instance, for Linux, the open source community can find a bug, send it in, and it can be fixed. But if your AI system tries to talk you into suicide, what do you do? Please retrain the model? You can't do so much. And another point is about openness itself. We often talk about openness, about open source, as a very binary thing, kind of like a boolean: either it's fully open or it's fully closed. But it's really a multi-dimensional vector of different things that you can make available, or choose to make less available.

Yeah, go ahead. Yeah, I don't have much to add beyond that, except that, working in civil society, we're always open to hearing from the open source community. We are not the companies; we represent people who are concerned about AI safety in general, for everyone, for the world. So we're always open to collaboration in that sense. And there is a future where you can reap those benefits of open source that developers value, while at the same time ensuring there are some reasonable guardrails to prevent what none of us want, which is, you know, absolute catastrophe. Thank you.

I'd like to highlight the work done in particular by EleutherAI, who originally were among those who trained some of the first open-weight large language models, replications of GPT-2, and who have pivoted in the last two years to also contribute to interpretability research and to fundamental scientific research for understanding deep learning models. The open source community has this advantage of having highly motivated people who are interested in the technical aspects and who will go into quite some detail just out of passion, and we're seeing very good papers published by teams like EleutherAI's, and that's not the only organization working with open source large language models. So in terms of scientific understanding, the current generation of large language models is already a fascinating artifact to study, and I'd encourage the open source community to keep contributing to the advancement of interpretability and of controlling this whole class of engineered systems: how do we monitor what the model is actually doing, can we understand it, can we make it go down particular branches rather than others? Even a lot of the prompting techniques known today were discovered by people tinkering with their systems. So furthering our understanding keeps being a good thing, and I'd encourage the open source community to keep doing research and interacting with these models in this way.

Also, another recent development regarding the AI Act is that they're now looking at setting up the AI Office. So if you're someone who enjoys tinkering with these types of models and is interested in safety in general, there's a lot you can contribute to the field through an institutional route like the AI Office, which is looking for highly capable people with a technical background. Thank you. I hope you took notes. I took mental notes, but I'll check the recording of this again.
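Picking up the earlier point that openness is not a boolean but a multi-dimensional vector of things you can choose to release, here is a small illustrative sketch; the component names are one possible breakdown, not a standard taxonomy.

```python
# Sketch of "openness is a vector, not a boolean": a release can open some
# components and not others. Field names are illustrative, not a standard.

from dataclasses import dataclass, fields

@dataclass
class ReleaseOpenness:
    weights: bool = False
    inference_code: bool = False
    training_code: bool = False
    training_data: bool = False
    data_documentation: bool = False
    evaluations: bool = False

    def summary(self) -> str:
        open_parts = [f.name for f in fields(self) if getattr(self, f.name)]
        return "fully closed" if not open_parts else "open: " + ", ".join(open_parts)

# Example: an "open-weight" release that shares weights and inference code,
# but neither the training data nor the training code.
open_weight_release = ReleaseOpenness(weights=True, inference_code=True)
print(open_weight_release.summary())
```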
Do we have any questions from the public right now? Otherwise, I think... okay, I'll come over there. Okay. I can't hear. Yes. Hi. Yeah.

So we've spoken a lot about the harms that could come from bad actors. I wanted to ask about potential harms from big tech or large, powerful organizations having access to this technology behind closed doors, and whether you believe there's a need for access to their source code, or for some kind of regulator, or something like that?

Yeah, I think that's right: there are definitely risks from that centralization of power, and I think it matters a lot how exactly we tackle it. For instance, if you just require them to make everything open source, then you do reduce the risk a bit that they do things behind closed doors that are not visible. But I think if you do something like vetted researcher access instead, you can approach it in a much safer way and get a better deal in terms of balancing the risks and the benefits.

This is a very crude analogy, but it's sort of the same thing with, let's say, a bioweapon. Say we were developing a poisonous gas with the capability to poison a lot of people at once. It's pretty bad if a couple of companies are developing it and we have no idea what's happening behind closed doors, because again, they're companies, not state actors, so there are no checks and balances in that sense, and until very recently there was no regulation giving any access to what they're doing. On the other hand, we also don't want everyone to have their personal poisonous gas, because then the risk is magnified by the fact that everyone has it at the same time. It's not like a nuclear non-proliferation situation where multiple actors having the weapon would reduce the risk overall. So I think in general, as Felicity said, transparency is important, but checks and balances are also important, and there are mechanisms for democratizing the technology in a way that allows us to keep an eye on it so it doesn't just proliferate.

Can I add one quick thing to that? This narrative of democratizing AI is one that Big Tech tries to push and talks a lot about, and I think it's kind of interesting, because democratizing means moving towards democracy, right? That means making decisions together about how we want to govern a specific technology. But within AI, as the word is usually used, it means availability, access to the weights. That is not really a shared decision about how we want to use it; it's more like, as Alex said, giving it to everybody. It's more like not governing it at all: anybody can do whatever they want with it. Maybe "anarchization" would be a better term, I'm not sure, but it's interesting what this is called, because it's not actually that, and it gives it a very positive spin that can be quite misleading, I think.

So there's a question from the online audience, which I'm going to read out loud. From Frank: is there a reason for confidence that a smaller than 10 to the power of 25 FLOP AI model cannot be trained or tuned for just as much harm as larger, general ones? So, yeah. No, there is no such confidence, but we fought tooth and nail to get the threshold at that level and not a higher one. This is basically the lowest threshold that could be achieved politically.
So no, it wasn't a threshold determined through scientific studies, and we have plenty of evidence that a model at 10 to the power of 24 could be harmful and potentially dangerous as well. But yeah, the answer is just that this is the best we could get politically. And thankfully, within the AI Act there are other mechanisms for classifying a model as having systemic risk besides this threshold. So this is a presumption: if you fall above it, you are presumed to have systemic risk, but the AI Office, for example, will also have the discretion to monitor and determine whether a model has systemic risk through much more detailed criteria, which gives them flexibility.

I'd also like to highlight how this value will change in the future, because I showed you the algorithmic efficiency graph: you need fewer FLOPs to achieve a particular level of capability as our science of how to do training runs evolves. So this is maybe a stopgap measure: we're pretty sure we don't understand how these models work above 10 to the power of 25, and we want to do more science, but we also want people to spend the time to actually analyze them. If you're going to build something that is so hard to analyze, please put comparatively more effort into analyzing it, because in general it has more unknown capabilities. There are different regimes of evaluation that depend on how capable the model is, and in the future we can imagine more targeted laws that look at what kind of data you are training on and what capabilities you get, with a risk factor that depends more on the training data. But these things have moved so fast that governments were generally not able to follow at that pace of progress, so I see it more as a stopgap than as the best way to govern these models in the future. And as institutions are being built in different jurisdictions, there are the AI Safety Institutes in the UK and the US, and you mentioned one for the EU, the AI Office. I think people working at these organizations will be able, in the future, to more closely monitor what it is that makes a model dangerous or not, capabilities like autonomous replication, and then we'll maybe have more sensible regulation on that basis.

I'll quickly answer a second question from the text chat, and then we'll hand it back to the room. It's just a detail: someone asks where the European Network for AI Safety gets its funding, and the answer is that we have received a grant from an organization called Lightspeed Grants. So, does someone in the audience have another question or comment? Someone up high in the room? Sorry, can you raise your hand again? We'll also be happy to stay outside afterwards to catch more questions.

A lot of the safety problems are related to the data and where the data comes from, and obviously Big Tech will never expose their weights because that's their IP. I'm working on a data provenance solution, trying to build one, and I'm getting really insightful thoughts from this discussion.
I believe there are a couple of things out there in tech, like ZKML, that could contribute to this, where Big Tech would not disclose their weights, but through ZKML they could prove that the weights they used were safe, and the proof on-chain would show that the model is actually safe. Is that something you believe in, or is that rubbish? Thank you.

Sadly, the acoustics are not so good, so I didn't totally get it. Can you shout again just the core of your question? Yeah, let's talk afterwards. You mentioned the importance of the data and the capabilities in the data; I think that's quite true, but for the specific technical discussion, let's talk after. So thank you again to Jonathan, Felicity and Alexandra, and we're here to talk and continue the conversation. Thank you. Awesome, y'all. We're going to start our next talk right here in about five minutes.