Thank you for coming. It's quite early, 9:30, and it's difficult to get started, so I will try to bring some energy to this session. Before we start, I would like to know more about you, with three very simple questions. First, who has ever run an LLM locally on their laptop using llama.cpp, vLLM or LM Studio? Please raise your hand. Okay, right. Second question: who has ever fine-tuned a model? Fine-tuned. Okay, fewer than 10. Okay. And the last one: who, like me, dreams of having 100% open-source models here? Not only open-weight models, but really open-source models. Raise your hand. Okay, you are in the right place. So we will do the job.
My name is Michel-Marie Maudet. I am the co-founder of a software company called LINAGORA. We started in 2001, so we are very close to our 25 years; that will be next year. Our mission at LINAGORA is to invent and develop good tech for good, which I can sum up as ethical open source. For AI, we do the same: we do ethical AI. To achieve this goal, we started a very large community called OpenLLM France. We started in June 2023, and we have two main goals. The first is to build trusted, sovereign and truly open-source generative AI technologies. The second is to build a strong ecosystem around LLMs and generative AI systems.
For the second objective, I can say that we have succeeded, because the community now has more than 450 active members, with strong support from academic and public research in France. That is very important because, for example, thanks to GENCI we can freely use supercomputers such as Jean Zay, and it is very useful for us to get GPUs for free to train our models. At the same time, we have a lot of corporates, private companies, that are using AI technology or want to build AI solutions with us.
So, this track today. I think there are a lot of important things involved in building ethical AI systems, but in my talk I will cover three topics. First, what we could consider as open-source AI; this is the first part of my talk. The second part will be related to diversity and the underrepresentation of our culture and our language in these models today. And the third part this morning will be related to data quality and the evaluation of these models. Okay. Right.
To be very clear, and to start with the biggest problem: the most popular open models that you are using today are not open source. They are open-weight models. This afternoon, Stefano Maffulli from the OSI, the Open Source Initiative, will give a talk to report on their progress toward a definition of what we could consider as open-source AI. I'm very proud because I'm part of the small, private group of experts from outside the OSI that is trying to reach this definition. It is important to clarify the situation because, as you know, and I'm not alone, Stefano and probably some of you have published posts raising the problem of the misuse of the term open source today by some players in the ecosystem.
I put on this slide the OSI definition of open source. To be very clear: if you have limitations on use in the license or the terms of use, or if you don't have the artifacts, the elements needed to train the model again or to make a derivative work from it, you can't say that you are doing open source. This is very clear. And today, for most of the popular models we have, you have no view of or access to the dataset used to train the model. For us, for this community, open-source AI means three things. First, that we have access to the source of the model.
That means all the tooling used, for example, to train the model and to evaluate it, such as the pipeline used to run the evaluation. For various reasons, it is not very easy to find this information for an open model today. The second point is related to the license: for us, it has to be a license without limitations on who uses the model and what they do with it. And the most important, the third point, is related to the dataset: open corpus, open corpora. It's very interesting because, if you follow the news related to AI, you probably saw during these past days some new models with datasets published under an open-source license. I think that is very important, and I think that 2024 will be not only the year of open-source AI, but also the year of dataset publication under open-source licenses.
I changed my presentation last night, just after the talk by Jos, the co-founder of Nextcloud, because he presented an ethical rating system, and I'm very glad to see that we share the same point of view. It's very simple for the Nextcloud community too. If all three conditions are met, you are in the green area. If only two conditions are met, you are in the yellow; only one, orange. And if you are using, for example, ChatGPT from OpenAI, zero conditions are met, so you are in the red area. So if we have developers from this beautiful Nextcloud community here this morning, thanks for your job. It's amazing and we love it. As for us, by the way, we are in the green area, and we try to do the job.
The second topic I would like to underline this morning is the problem that generative AI models are more and more a representation, a picture, of what we are in terms of culture, in terms of society, in terms of language. I think the figures speak for themselves.
On the left, you can see that since 2018, less than 8% of LLMs have been created in Europe. And on the right, you can see the share of each language in the data used to train the Llama 2 model: 0.16% for French and 0.17% for German. I don't know what you think about that, but from my point of view, we can say that our culture and values are not really well represented in these models today. So we have a problem, I think, and as a community we are trying to solve it.
The first thing we tried was to adopt a data-first approach, or rather a quality-first approach, because small is also beautiful. We are trying to prove that the quality of the dataset is more important than the quantity of data you have. To demonstrate this point, we released a first model in October called Claire. Claire, like the woman's first name in France. I have nothing against Albert, Alfred or Mistral, but you know, we prefer in our community to promote women, because it is our little contribution to having more women in our AI ecosystem and in the global community.
I will not go deep into Claire, because Julie, the real one, yes, Julie will go deep and tell you all about Claire: what we did and how we built this model. But very briefly, we proved that with a small amount of French tokens we are able to get a very conversational model. Conversational means, first, that Claire is able to understand dialogue between people, with speaker diarization. The second feature is that Claire is able to talk like you, to produce human-like dialogue with disfluencies and hesitations, because we trained Claire on conversational data.
We continue to collect a lot of data, and today we are at around 140 billion tokens in French. And I'm very glad and happy to announce that we have started the training phase of our new model, called Lucy.
The main goal of Lucy is to fix, or rather to improve, the underrepresentation of the French language in LLMs in general. But at the same time, we put into our dataset some other European languages (German, Spanish, Italian) and some source code, to give our model a capacity for reasoning. And we are trying to build some new features to make this model efficient not only in French but in other languages too. So you will probably be interested in following this work, and probably our custom tokenizer and so on.
But the most important thing I would like to share with you this morning is that we are not the only community working toward this goal of building sovereign LLMs in Europe. I'm sure that this list is not exhaustive. If anyone knows of new or other initiatives, please talk to me just after the presentation; I will be very excited to discuss them with you. The most important point is that we strongly believe we have all the capacity, all the technology and all the GPUs in Europe to build our own models.
That is why I'm delighted to announce that today, during this first day, we changed OpenLLM France to become OpenLLM Europe. You can use this QR code to onboard yourself onto our Discord server. All the content we produced during the six months in French is still available, but we have created a channel for each European language. So, welcome. And if someone wants to be part of the community management team, please contact us and we will be very pleased to onboard you into our initiative. So that's my talk for today.
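
As a rough illustration of the three-condition rating described earlier in the talk (open code and tooling, a license without usage limitations, open training data), here is a minimal Python sketch. The class, field and function names are hypothetical, and the colour mapping follows only what was said in the talk; it is not taken from Nextcloud's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOpenness:
    """Hypothetical record of the three conditions named in the talk."""
    open_code_and_tooling: bool   # training and evaluation tooling is available
    unrestricted_license: bool    # no limits on who uses the model or for what
    open_training_data: bool      # the training corpus is published


def rating(model: ModelOpenness) -> str:
    """Map the number of conditions met to the colour mentioned in the talk."""
    conditions_met = sum([
        model.open_code_and_tooling,
        model.unrestricted_license,
        model.open_training_data,
    ])
    # 3 met -> green, 2 -> yellow, 1 -> orange, 0 -> red
    return {3: "green", 2: "yellow", 1: "orange", 0: "red"}[conditions_met]


if __name__ == "__main__":
    fully_open = ModelOpenness(True, True, True)
    open_weights_only = ModelOpenness(False, True, False)
    closed_service = ModelOpenness(False, False, False)
    for candidate in (fully_open, open_weights_only, closed_service):
        print(candidate, "->", rating(candidate))
```

Under these assumptions, a model that publishes its weights under a permissive license but neither its tooling nor its data would land in the orange area, while a closed hosted service meets none of the conditions and lands in the red area.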