Thanks for coming to this talk. I know it's the last one of the day in this room, so I'm sure we're all tired, and I'll get through as quickly as possible. My name is Gabriel. I'm a tech lead in the innovation studios at Mozilla, working on a project called Formulaic, but more on that later. This presentation is about why intelligence is tricky and how to integrate LLMs into standard code.

It probably makes sense to start with a definition. An LLM, a term you've likely heard a lot throughout this conference, is a large language model: a program that can understand natural human text and do something with it that a traditional program cannot. To reiterate, it's a program that doesn't need a specific syntax; it just works with natural language, which is actually pretty interesting when you think about it.

I had a question for the audience: who here has played around with ChatGPT or other chatbots or LLMs? Literally everyone has their hand up. The opposite, then: who hasn't? And who found it actually useful? Just about everyone too, awesome. Okay, just about everyone, not literally everybody.

Before we get started too deeply, I want to mention that there's a lot of terminology in this space. Everything has a name, a definition, and lots of unusual words, so I apologize in advance if I don't explain every one of them; there's just so much going on in such a short talk.

So, as we all just learned, nearly everybody finds this technology useful, and I also think LLMs have real utility. These are just two random examples I pulled up, but essentially LLMs can help categorize content, answer questions, provide summaries, help create content, structure unstructured data, and plenty of other things. They're the proverbial hammer in the toolbox: they can do almost anything you want them to do. That doesn't mean you necessarily should use them, but you can.

So I think it's appropriate to show a quick example of a traditional app that is not intelligent, then what it looks like when you add intelligence, and how easy it can be once you get things set up. In this case, we have a Node application that takes unstructured text from a user and stores it in a database for later retrieval. Nothing special. But when we add an LLM, we can take that unstructured information, craft a little prompt for it (a prompt is just an instruction for the LLM), and have the LLM return something quite useful to us. In this case, the LLM's output is a category for what the note could be, which helps with organizing things in the database.

And this is amazing, but let's be real: there are some issues with this technology. It is not perfect, and to be fair, it's also very young. So what are some of the issues you see when you interact with it? There are hallucinations, a.k.a. it just makes things up; it likes to lie sometimes, which is not great. There's also inconsistent formatting. In code, you want to talk to your services and APIs in a structured format, and LLMs don't necessarily respect that format: instead of JSON, they'll reply with Markdown, or broken attributes, or things that just don't make sense.
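To make that concrete, here is a minimal sketch of the note-categorization idea, including the kind of defensive parsing that inconsistent formatting forces on you. The endpoint URL, model name, and response shape are assumptions (an OpenAI-style chat completions API), not any particular product's contract:

```typescript
// Hypothetical OpenAI-compatible endpoint and model name; swap in
// whatever hosted service or local runtime you actually use.
const API_URL = "http://localhost:8080/v1/chat/completions";
const MODEL = "example-model";

interface CategoryResult {
  category: string;
}

async function categorizeNote(noteText: string): Promise<string> {
  const prompt =
    `Categorize the following note with a single short category name. ` +
    `Reply with JSON only, in the form {"category": "..."}.\n\nNote: ${noteText}`;

  const response = await fetch(API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  const raw: string = data.choices[0].message.content;

  // The hand-holding part: the model may wrap its answer in Markdown
  // fences or add chatter, so parse defensively and fall back.
  try {
    const match = raw.match(/\{[\s\S]*\}/); // grab the first JSON-looking blob
    const parsed: CategoryResult = JSON.parse(match ? match[0] : raw);
    if (typeof parsed.category === "string" && parsed.category.length > 0) {
      return parsed.category;
    }
  } catch {
    // fall through to the safe default below
  }
  return "uncategorized"; // default when the reply is unusable
}

// Usage: store the note together with the LLM-suggested category.
categorizeNote("Buy flour, eggs, and milk for Sunday pancakes")
  .then((category) => console.log("category:", category));
```

In practice you'd persist the note and its category together; the point is that the LLM call is just one more async function in the app, plus the validation around it.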
So, as that sketch suggests, it takes a lot of hand-holding and validation. There's also the performance and cost aspect: running these services is quite computationally expensive, and we all know GPUs are expensive and scarce. Fundamentally, there are also token count limitations: you can only pass a certain amount of text to an LLM before it sort of forgets what you actually asked it to do, which kind of sucks. There's also a lack of education and documentation, especially for open models, so getting things working takes a lot of learning and trial and error. Lastly, there are friction points that include bugs and security issues. This is, again, a new technology; of course there are going to be bugs, and of course there are security implications that need to be thought through.

What's particularly crazy is that there are actually over 50,000 text models on Hugging Face at the moment. With so much choice and so many models out there, it's quite hard to gauge which ones are good. On top of that, these models come under a wide range of licenses, which further complicates how you select which models to actually use.

But there are more models, of course: the proprietary, closed models that are ever so popular, as these little diagrams show. These models are popular for a reason: once you add your credit card, it is exceptionally straightforward to get these systems working without having to think too hard. And that has consequences. The main one is that it creates a kind of technical vendor lock-in. These models all interact through prompts, as we just saw, but those prompts essentially have to be curated to the model to get real value out of them. So imagine you write a bunch of prompts for one proprietary model and then expect to run those exact same prompts against an open one: you won't get the same results. This is a key friction point for open models, because there aren't many examples and there isn't much documentation about which prompts work and why. So when a prompt you already had doesn't work, you just stop using that model.

Here's a quick little demo of this. Two of these models, one open and one not so much, replied with relatively good answers, whereas the model in the middle just decided it didn't want to do it. The reality is, if you actually tweak the prompts, add a little clarity, and write the prompt for the specific model, the responses become really consistent and really good, regardless of the model. It's quite interesting to see how, with a little effort, a little elbow grease if you will, you can get a model that maybe isn't considered the prime one to still output something useful.

And that's where my team and project come in. Formulaic, which we're publicly announcing today, which is kind of exciting, is going to help, or try to help anyway, by creating a platform for open prompt scripts that anyone can interact with. They're open by default, and the platform will enable the creation, sharing, and testing of these different prompts against different models. Of course, we're still in super active development, and we would love to get your opinions as we build out these repositories.
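Since that per-model curation is exactly what this is about, here is a minimal sketch of the idea behind a shareable prompt script: one task, with prompt variants tuned to different models and a generic fallback. The structure and the model names are hypothetical illustrations of the concept, not Formulaic's actual format:

```typescript
// A toy "prompt script": one task, with prompt variants tailored per
// model. Model ids and the shape of this record are hypothetical.
interface PromptScript {
  task: string;
  variants: Record<string, string>; // model id -> prompt template
  fallback: string;                 // generic template for unknown models
}

const categorizeScript: PromptScript = {
  task: "categorize-note",
  variants: {
    // A chatty model may need firm formatting rules to reply cleanly.
    "proprietary-model":
      'Return only JSON like {"category": "..."} and nothing else. Note: {{note}}',
    // A smaller open model may need an explicit example to stay on track.
    "open-model":
      'Example: note "buy milk" -> {"category": "shopping"}.\n' +
      'Now reply with JSON only for this note: {{note}}',
  },
  fallback: 'Categorize this note, replying as {"category": "..."}: {{note}}',
};

// Pick the prompt variant for a given model and fill in the input text.
function renderPrompt(script: PromptScript, modelId: string, note: string): string {
  const template = script.variants[modelId] ?? script.fallback;
  return template.replace("{{note}}", note);
}

console.log(renderPrompt(categorizeScript, "open-model", "book flights to FOSDEM"));
```

The design point is that the per-model tweaking gets captured once, in the open, instead of living in each team's private codebase and being rediscovered by trial and error.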
So please don't hesitate to say hello after this talk. Yeah, so that's my super quick talk. Please find out more about us online. I still have a few minutes left, so let's head into questions. Don't hesitate. Yes?

"Do you already have plans to integrate LLMs into Mozilla tools like Firefox?"

So the question is whether we're thinking about adding LLMs to Firefox. Honestly, I'm not too sure what the long-term plans are. I know there are people obviously playing around with the technology, but I don't think there's anything officially on the books. That doesn't mean you can't add it to your own version of Firefox; I'm just saying. Anyone else? Don't be shy. We're good then; we're closing.