We are welcoming David, who will talk to us about OpenCL, accelerators, and LLMs on your phones.

Hello everyone, thank you for coming to the talk. I will start with who I am. I'm David Heidelberg. I work on Mesa3D as a developer, and currently I'm contracted by Collabora to work on CI. In my free time I work on mobile and embedded Linux, and I'm an Alpine, Debian, and postmarketOS contributor.

Let's get to the topic. The contents: a quick introduction to what these things are about; how it looked before OpenCL and the standards we have these days; what we have now and how it works; I will show you an example running postmarketOS with OpenCL; and what we can expect in the future.

So first, who has ever heard about OpenCL here? Looks like most of you. So, OpenCL allows you to run your C or C++ code on a CPU, a GPU, a specific DSP, an FPGA, or whatever you can find; there is a small sketch of what this looks like at the end of this part. That's one thing. The second thing, Mesa3D, is the set of open source GPU drivers. You have it on all Linux systems, and it also works on Android and Windows, and I think Haiku or something similarly exotic. And the last thing, which I use for my talk, is the TinyGrad framework, which allows you to run things like GPT or Stable Diffusion or other interesting projects, usually on a GPU.

So how did it look before the compute stacks we know right now, like CUDA, ROCm, and OpenCL? You could do the computation on the CPU, but the CPU has high overhead because it's meant to run classical computer programs, not highly parallelized software. Before that, you could use OpenGL, and with OpenGL you could squash computations into OpenGL workloads, but it was a big hack and workaround; some scientists did it, but it was not widely used.

So currently we have options. We have CPUs with multiple cores and multiple threads, but the overhead is still there. We have GPUs, which are much faster and much more easily parallelized, but they still have some overhead. And then we have smaller units like NPUs, which you can find in new phones, new devices, and new dev boards; these are optimized to run machine learning or AI workloads. But the hardware you usually get, like Linux phones, still doesn't have these accelerators in place; new phones and new devices already have them.

So, how many of you have actually used OpenCL or any acceleration to run something? Okay, much fewer than those who have heard of OpenCL. You can do all kinds of things with it, like image processing, and these days language models are the most popular, I would say.

So that was the motivation; now, what do we have in the open source world? In general you can use multiple technologies, but I will talk mainly about OpenCL, because OpenCL gives you one thing which nothing else can. You can have CUDA or ROCm, but those are vendor specific. If you are going to write, let's say, software for your phone, you want something that will run everywhere, and you cannot achieve that with CUDA at all, because it's proprietary and closed. You could eventually do it with ROCm, but that's AMD only, for AMD cards. So the only alternative that remains is OpenCL.

In 2012 we got the Clover implementation in Mesa, and I was very excited about that: wow, now everything will be faster. Maybe that's associated with me running Gentoo back then and wanting everything to be compiled and much faster than on Debian or anything else. But for me it didn't work. I lost interest pretty quickly, because to be honest, with Clover nothing was working for me.
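To give a concrete picture of the OpenCL model mentioned above (one kernel source, many device types), here is a minimal sketch using the pyopencl Python bindings; pyopencl, NumPy, and a working OpenCL runtime such as Rusticl are assumptions on top of the talk. The kernel itself is plain OpenCL C and runs unchanged on whichever device the runtime exposes.

```python
import numpy as np
import pyopencl as cl

# The kernel is plain OpenCL C; the same source runs on any conforming
# device, whether a CPU, a GPU, or a Mesa/Rusticl device.
KERNEL = """
__kernel void double_it(__global const float *src, __global float *dst) {
    int i = get_global_id(0);
    dst[i] = 2.0f * src[i];
}
"""

ctx = cl.create_some_context()   # picks a platform/device (or set PYOPENCL_CTX)
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

host_src = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_src)
dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, host_src.nbytes)

prog.double_it(queue, host_src.shape, None, src_buf, dst_buf)
result = np.empty_like(host_src)
cl.enqueue_copy(queue, result, dst_buf)
print(result)  # [ 0.  2.  4. ... 30.]
```

The same pattern scales from this toy kernel up to the machine learning workloads discussed in the talk: the host code stays portable, and only the device selection changes.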
So I gave up, but in 2022 the Rusticl implementation came to Mesa3D, and it supports the latest OpenCL standard. It gained support for multiple drivers. The supported drivers, as you can see, are Intel, AMD, and Mali, which is for example present on the PinePhone Pro, and of course you can run on the CPU, or over Vulkan with Zink. There is work in progress for multiple other drivers, including Asahi; Qualcomm Adreno, which is for example present in the OnePlus 6, a pretty popular Linux phone which I also own; Vivante, which is used for example in the Librem 5; and the Raspberry Pi, which everyone knows, I assume.

So here is an example of how it looks if you run an LLM on the OnePlus 6. It runs. It's not that slow; it's not that fast; but on the other hand, it runs, which is amazing. This is GPT-2, and as you can see from the answer, it's not very clever, but on the other hand, it runs, and it is a general-purpose model. I tried to run some GPT-3-based models, some minimal ones, but it's not there yet. But GPT-2 works for me, and that means that for every app you are developing or thinking about, you can consider using some smaller model and already run it, for example, on phones.

And what is interesting, of course, is performance. If you run the model without the GPU, so on the CPU, you get load on 8 cores at 100%: the 4 weak cores and the 4 strong cores get fully utilized. But if you switch to OpenCL, you get hardly 2 cores at 20% usage, and it's much faster, as you can see. And we are talking about a small phone. There is a rough sketch of the switch at the end of this part.

Here is the slide if anyone wants to try it: if you have postmarketOS on your phone, you can apply these simple steps to get to the point where it's running. I guess you will not try right now, but once you have the slides you can click through them and install the stuff.

And in general, where are OpenCL and compute heading? You can already do a lot; it's fairly widely supported, and there is a lot of progress being made on Rusticl on Linux. So I believe that in one year you can assume that on every device you will be able to run some OpenCL, and you can use it for your workloads. For example, what I heard just today before the talk is that libcamera, which is used for processing input from cameras on Linux phones, is considering using OpenCL workloads for processing. So it will get popular soon, I hope.

And, what I think is the most interesting part: you can start relying on it. Today, applications like Blender or GIMP are able to use OpenCL, but it's not something you can count on. Hopefully that will change very soon.

Another thing to talk about is what's going to happen next, because OpenCL 3 is a pretty good specification, but if you compare it to CUDA from NVIDIA, it's really lacking. It's the best we have right now, but it's not that amazing: it's good enough to be fast and to produce good-quality results, but it's not great enough to compete with CUDA. So I recommend, for example, David Airlie's talk about the future of OpenCL and about standardization that would fit all vendors, so it doesn't end up as one vendor trying to push its own technology, and other vendors can contribute to it and use it. So, something like OpenCL 4; we will see what happens. The future is a little bit unclear, but so far OpenCL looks pretty good.

The Clover implementation in Mesa, the one from 2012, is pretty dead, because no one uses it and no one maintains it. We dropped it from CI approximately half a year ago, so it doesn't count anymore; it's just waiting for deletion.
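To reproduce the flavor of the CPU-versus-OpenCL switch above, here is a rough TinyGrad sketch; TinyGrad installed via pip is an assumption, and the backend names follow TinyGrad's conventions around the time of the talk (GPU selects the OpenCL backend, CLANG a CPU path) and may differ between versions. The actual demo corresponds to the gpt2 example script shipped in the TinyGrad repository.

```python
import os
# Pick TinyGrad's OpenCL backend before importing it; setting CLANG=1
# instead runs the same script on the CPU for comparison. Backend names
# are TinyGrad conventions at the time of the talk and may have changed.
os.environ["GPU"] = "1"

from tinygrad.tensor import Tensor

# A tiny stand-in workload: the real demo generates text with GPT-2 on
# the same backend, driving the phone's GPU through Rusticl.
a = Tensor.randn(256, 256)
b = Tensor.randn(256, 256)
c = (a @ b).relu().sum()
print(c.numpy())  # forces the OpenCL kernels to actually run
```

The core-utilization difference described in the talk falls out of this design: with the OpenCL backend, the heavy matrix work runs as GPU kernels, and the CPU only schedules them.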
And the last thing I want to point out: even a low-power device like a phone can provide some nice acceleration with OpenCL, and the difference is really visible.

So, a few credits. Karol Herbst, for bringing Rusticl alive, because that was a very nice project, and he was integrating Rust-based software into Mesa, which is C and C++ based; that's a pretty challenging task. Rob Clark, for his work on Freedreno; are there any OnePlus 6 users here? Okay, right, so you know the GPU works pretty well on these devices, and that is a lot of his work. And Dmitry Baryshkov, I hope I read the name right, for preparing the merge request for Freedreno support in Rusticl; it's not merged yet, it's pretty close and needs some polishing, but for TinyGrad and GPT-2 it works well enough. And of course many others who contributed. So thank you for your attention, and fire away with the questions.

Thanks for the talk. A question regarding your comparison of the workload: did you check the load on the GPU when you were running this model, when you compared the eight cores versus the GPU?

I haven't checked the load on the GPU, but I assume it was pretty high. What I forgot to mention is that it's not optimized yet, and no one has tried to profile it to get good performance; the TinyGrad project is also aimed mostly at powerful AMD GPUs, or let's say NVIDIA; even Intel is not that popular. So the results you've seen are from still highly unoptimized software, and there are probably a lot of chances for improvement and better performance. Thank you.

Hi, thank you. Since newer Qualcomm SoCs, for example, also have an actual NPU in the SoC: are you aware whether it's possible to run OpenCL on it, what could be missing, and whether anybody is working on this?

Yes, the new hardware has NPUs, and for example one of my former colleagues, Tomeu Vizoso, is working on Etnaviv acceleration for NPUs. Recently, part of the Teflon framework was merged into Mesa, which allows interaction with Etnaviv, that is, Vivante GPUs, a newer generation than the one in the Librem 5, and you can run TensorFlow networks on the NPU directly. But so far only one vendor, Vivante, and one device are supported, I think. There is a small sketch of how that looks at the very end.

How's the RAM usage? I think the model I used is around 500 megabytes, so it's pretty nice, nothing serious; the phone has 8 gigabytes. The RAM usage would be more interesting with GPT-3-class models, but I think that on phones, language models are useful only if they are specialized and cut down to the power the phone offers, because of course you cannot run a full LLaMA model, which requires 8 gigabytes of RAM or VRAM, on a phone which has 8 gigabytes of RAM.

Do we have any more questions? That's it then. Another big round of applause for David. Next talk is in 15 minutes.
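To make the Teflon answer above concrete, here is a hedged sketch of how a TensorFlow Lite model is pointed at an external delegate. The tflite_runtime package, the library path, and the model file name are assumptions for illustration; the delegate itself is the libteflon.so built as part of Mesa.

```python
import numpy as np
from tflite_runtime import interpreter as tflite

# Teflon is built as a TensorFlow Lite external delegate; the .so path
# is illustrative and depends on your distribution and Mesa build.
delegate = tflite.load_delegate("/usr/lib/libteflon.so")

interp = tflite.Interpreter(
    model_path="mobilenet_v1_1.0_224_quant.tflite",  # placeholder model file
    experimental_delegates=[delegate],
)
interp.allocate_tensors()

inp = interp.get_input_details()[0]
interp.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interp.invoke()  # supported ops are offloaded to the NPU via Etnaviv
out = interp.get_output_details()[0]
print(interp.get_tensor(out["index"]).shape)
```

The delegate mechanism is what makes this path different from the OpenCL one shown earlier: the network stays a standard TensorFlow Lite model, and the offload decision happens per operation inside the interpreter.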