So, hello. Welcome to Brussels again. Still. Myself and Hans here will be doing a talk on MIPI cameras. Here are some nice logos. Let's go. So, introductions. We'll do this. Go.

Yeah, so I'm Hans de Goede. I work for Red Hat. It says it's on without this. Now it's on. Okay. So, hello everyone. My name is Hans de Goede. I work for Red Hat as a software engineer in the laptop hardware enablement team. I'm also the upstream kernel subsystem maintainer for drivers/platform/x86, which has mostly laptop-related drivers. And, well, new Intel laptops are now using MIPI cameras instead of the old USB UVC cameras, which is a problem, or a challenge. So that's why I'm here and what this talk is about.

So, my name is Bryan O'Donoghue. I see I don't have my surname here. And that's spelled with a Y, not an I, because I had a granny from Scotland and one from Ireland. Which brings me to my standard spiel: that is not in the UK. It is in the EU. We do use the euro. But we're not in Schengen, so you have to show your passport when you come to my country. Yeah, I'm a kernel engineer with Linaro. I work in the Qualcomm landing team. And, I suppose about a year ago, I inherited CAMSS, which is the Qualcomm camera subsystem driver, for the Linaro team. Here's my Linaro page, where I dump all of my in-progress kernel work, my GitHub, and the various IRC channels I'm on.

So, we're going to kind of divide this up a little bit. That's the full topic. And it turns out that your eye is more sensitive to green. I didn't know that. And particularly here, at around 550 nanometers, kind of a yellowish green. And that actually impacts how we capture light. A camera sensor is like an ADC for light. And a guy called Bryce Bayer, working at Eastman Kodak in 1974, came up with an encoding for how we capture light. Because basically the sensors are monochrome sensors. What we do is we put a filter on top of the sensor to capture a particular bit of the visual spectrum. And that's called Bayer encoding. And it gives us a lot of green, again because your eye likes green. And here's the paper, actually. I thought this was pretty cool. It just spells it out here. So it's green, red, green, red, green, red. Then blue, green, blue, green. You see we have a different Bayer pattern here, but they all conform to a similar layout. And they look like that.

So, you can't really look at this yet; you'll see it a little bit later on. The picture that comes out in a Bayer pattern looks like a mosaic, like something from Roman times. Actually, it looks pretty cool. But it's not what you want when you take a picture, when you do a selfie. You don't want to look like you're at Pompeii. So this is a problem. We need to interpolate that data. We need to interpret it and reasonably recover the original image that we captured.
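To make the mosaic idea concrete, here is a minimal sketch (my own illustration, not code from the talk): it maps pixel coordinates on an RGGB Bayer sensor to the single colour each photosite actually samples. The helper name is hypothetical; other patterns such as BGGR or GRBG are just shifted versions of the same 2x2 tile.

```cpp
#include <cstdio>

// Which colour does an RGGB Bayer sensor sample at pixel (x, y)?
// The pattern repeats every 2x2 pixels:
//   R G
//   G B
// Illustration only; real sensors report their CFA order (RGGB, BGGR,
// GRBG, GBRG), which just shifts this 2x2 tile around.
enum class CfaColour { Red, Green, Blue };

static CfaColour rggbColourAt(unsigned x, unsigned y)
{
    bool evenRow = (y % 2) == 0;
    bool evenCol = (x % 2) == 0;

    if (evenRow)
        return evenCol ? CfaColour::Red : CfaColour::Green;
    return evenCol ? CfaColour::Green : CfaColour::Blue;
}

int main()
{
    // Print the colour layout of the top-left 4x4 corner of the sensor.
    for (unsigned y = 0; y < 4; y++) {
        for (unsigned x = 0; x < 4; x++) {
            switch (rggbColourAt(x, y)) {
            case CfaColour::Red:   printf("R "); break;
            case CfaColour::Green: printf("G "); break;
            case CfaColour::Blue:  printf("B "); break;
            }
        }
        printf("\n");
    }
    return 0;
}
```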
So there are various methods to do that. It's called debayering — funny enough, we Bayer-encode an image and then we're going to debayer it back to RGB. So we have, let's see: nearest, bilinear, and Malvar-He-Cutler — I can't even say this. And as you go down the list here, the computational overhead goes up, but the quality that you get out of it similarly goes up. Unsurprisingly enough.

So here's a great paper on this, actually, by a guy called Morgan McGuire, from about 2009, I think. And this shows us — you can see the mosaic pattern here. You can see here on the right what he's calling ground truth, so the original picture. This is an approximation, because actually what happened is he took a picture with a Bayer sensor that has a particular resolution, but for the purposes of the talk we'll call this ground truth. So you can see ground truth, the raw image — it's kind of crap. Nearest, so you approximate based on the nearest pixel. Hans knows more about this than I do; I hope he'll give a better description of the recovery process than I will. Bilinear — so in that case we interpolate line by line. And then this one here is much more complex: it does a bit in the middle and then other bits around it, but it's far more computationally costly.

So every time you take a picture, if you think about it, your camera is performing at least one of these debayering operations, and does so in microseconds. So how does it do that? It's not magic, believe it or not. It uses a thing called a hardware ISP — a hardware image signal processor, pardon me. Which typically, especially on modern systems, entails firmware running inside of the camera component of the SoC. So if you think about it, you take a picture, you start to bring data in. It's locally in memory, quite physically close to where the stream comes in, and you immediately want to process it there. Every time you kick it up the chain, closer and closer to main memory, you're going to pay computational costs for whatever you do up there. And so therefore we have these hardware ISPs that have a silicon block and a firmware block that typically interacts with the silicon. And it's based on the principle of data locality.

And here are the 3As, the three most basic things that you would do with the image apart from debayering: auto-focus, auto-white-balance, auto-exposure. So bringing the exposure up or down, balancing. You can see an example here of how the white hasn't been balanced properly, and on the right it has been balanced properly. And the left is under-exposed, the middle is ground truth, and then on the right is over-exposed. I kind of like the right image myself, but I don't know. According to what we're calling ground truth, that's over-exposed.

But hardware vendors consider all of this stuff secret sauce. Secret sauce — what's the definition of secret sauce? It's the goop that McDonald's puts on the patty before you stick it in your mouth. So the sensors themselves are very simple I2C sensors — that's probably worth saying. These MIPI sensors have I2C buses that allow you to configure them. They're pretty cheap, I suppose, as sensors. But the tunings for those sensors — setting up the PLLs, putting them into a given configuration — are considered secret sauce. And typically, if you look in the kernel, what you'll find is these big tables of magic numbers. And each magic number represents a... And the mirror confuses it. It doesn't know what's happening.
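As a rough illustration of what even a cheap debayering method involves, here is a minimal bilinear-style sketch (my own illustration under the RGGB assumption, not the talk's code or libcamera's): for a red photosite it recovers the missing green and blue values by averaging the neighbouring samples.

```cpp
#include <cstdint>
#include <vector>

// Minimal bilinear-style demosaic for one *interior* red photosite of an
// RGGB mosaic. Illustration only: a real debayer handles every photosite
// type, the image borders, and usually a smarter kernel such as
// Malvar-He-Cutler.
struct Rgb {
    uint8_t r, g, b;
};

static Rgb demosaicRedSite(const std::vector<uint8_t> &raw,
                           unsigned width, unsigned x, unsigned y)
{
    auto at = [&](unsigned px, unsigned py) -> unsigned {
        return raw[py * width + px];
    };

    // Red is sampled directly at this photosite.
    unsigned r = at(x, y);

    // Green neighbours sit left/right/up/down of a red site.
    unsigned g = (at(x - 1, y) + at(x + 1, y) +
                  at(x, y - 1) + at(x, y + 1)) / 4;

    // Blue neighbours sit on the diagonals.
    unsigned b = (at(x - 1, y - 1) + at(x + 1, y - 1) +
                  at(x - 1, y + 1) + at(x + 1, y + 1)) / 4;

    return { static_cast<uint8_t>(r),
             static_cast<uint8_t>(g),
             static_cast<uint8_t>(b) };
}
```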
So it thinks it's three different people. I thought that was a funny feature.

So, what's the problem that we're solving? This is where Linaro and Red Hat come to the camera, and the problem that we're trying to solve. So like I say, I'm doing the Qualcomm stuff, you're doing the x86 stuff, and the commonality there is the sensors. The silicon vendors will not release enough information to switch on their hardware ISP. And so what we have for MIPI cameras is raw data coming in, just Bayer-encoded data. And for quite a while on the Qualcomm side, that's just what we've been delivering: we deliver you Bayer data upstream and we say good luck, have a nice day. Which is completely useless if I want to have a Zoom call.

So the question then becomes where and how can we fix it? And the answer, really clearly, is in libcamera. So Laurent and Kieran and Jacopo and these guys have a great project, really. If you've ever tried to use the Video4Linux stuff, you'll find that even just hooking up your own camera can be quite difficult to do. libcamera really isolates you from having to know anything about that. You can just run it, you can reuse the library, and actually I find it very easy to use. And I love it. Thanks for the t-shirt.

So what we want to do is this. This is a high-level example — quite similar, I suppose, to how the Raspberry Pi and the other hardware ISPs work — in that you have an ISP component here (sorry, I keep getting in the way of the camera) and then an IPA component. The three As live inside of the IPA, and the other stuff, the stitching up of the pipeline, happens in the ISP. And so when we approached libcamera and said, hey, we'd like to get something better than Bayer data out of our camera stack — why not? We do this all day long, we might as well have something to show for the jobs we're doing — they said, please, please implement something like this. So we started. My colleague Andrey — I'm terrible with Russian names — did most of the code. I've been sitting in on the meetings and kind of piping up and saying, here's the way I think it should work. And then, I guess, about three or four months ago... Yeah, something like that.

So I joined Linaro in doing this, because we had a similar issue with the IPU6: Intel is actually working upstream to get the MIPI data receiving, the CSI receiver, going. And then we have all this Bayer data, but Intel currently doesn't really have a plan on how to get their secret-sauce algorithms upstream, because upstream doesn't want secret, they want open. So that's an issue. On the other hand, we have users who wanted their cameras to work. So the conclusion was that we needed a software ISP too, and we joined up with Linaro instead of doing our own stuff. And, well, that's starting to work pretty okay now, as you'll see in the demo. It needs more work on image quality, but it gives a picture.

But that brings us to the next problem, which is how applications access cameras in this new world where we have MIPI cameras — at least new for Linux on the desktop; smartphones running Android have had them for ages. These MIPI cameras have a pipeline, be it with a software ISP or a hardware ISP, and this pipeline needs to be set up and configured. This is pretty complex stuff, which we don't want to do in the kernel. In the kernel, we just want to say: hey, here's a bunch of hardware blocks, good luck, figure out how you chain them together yourselves and then tell the kernel what to do.
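To give a feel for the kind of thing that lives in the IPA part, here is a very rough gray-world auto-white-balance sketch. This is my own illustration of the general technique, not the actual libcamera soft ISP code; the struct and function names are made up for the example. It derives per-channel gains from frame statistics so that the red and blue averages are brought in line with green.

```cpp
#include <algorithm>

// Per-channel sums gathered from the frame's statistics.
struct FrameStats {
    double sumR;
    double sumG;
    double sumB;
};

struct AwbGains {
    double red;
    double blue;
};

// Gray-world AWB: assume the scene averages out to neutral gray, so scale
// red and blue until their means match the green mean. Illustration only;
// a real IPA damps changes over time and handles saturated regions.
static AwbGains grayWorldGains(const FrameStats &stats)
{
    // Avoid dividing by zero on an all-black frame.
    double r = std::max(stats.sumR, 1.0);
    double b = std::max(stats.sumB, 1.0);

    AwbGains gains;
    gains.red = stats.sumG / r;
    gains.blue = stats.sumG / b;

    // Keep gains in a sane range so one odd frame can't blow up the image.
    gains.red = std::clamp(gains.red, 0.5, 4.0);
    gains.blue = std::clamp(gains.blue, 0.5, 4.0);
    return gains;
}
```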
libcamera takes care of all of this for applications. But this means that currently Firefox and Chrome — assuming most people use their camera for video conferencing, and that happens in a browser — or Zoom, they all directly open the /dev/video node, and they expect to just be able to say: give me a list of resolutions which you support; oh, I want that resolution, go. And it's no longer that simple. Which means that the applications will need to move to a different way of accessing cameras.

At the same time, there are initiatives to move Linux on the desktop to more of a fixed operating system with applications in an app-store model, mostly because building your own distribution from packages gives you a lot of variation, which can lead to instability. So having a fixed read-only base image and separate applications is desirable. These applications also run sandboxed, which I personally think is a really good idea for something like a browser, because that's often a security hole.

So we try to basically solve both problems at once by saying: we have PipeWire. PipeWire is already used for screen capture into the browser, so it supports video transport, so let's also use that for cameras. This solves the sandboxing problem, because PipeWire sits on the sandbox boundary, with a portal to access it. And by using a libcamera plugin for PipeWire, it also solves the whole how-do-we-access-the-camera issue. A colleague of mine has been working on this. Actually, first, I think Pengutronix started on this: they did the WebRTC work, which is the shared WebRTC framework between Chromium and Firefox. Then a colleague of mine picked up the integration in Firefox. This actually landed in Firefox 122, which was released like a week ago. So the Firefox which we'll be demoing is just from the Fedora repos; it's not a custom build.

So with the how-do-we-access-the-camera problem solved, and sort of having a proof of concept which we'll demo, the question becomes: what do we want to do in the future? Well, we want to do better, as in image quality; we want to do it faster, as in use less CPU; and we want to do it cheaper, as in use less energy, because doing everything on the CPU is not good for your battery.

So Bryan actually has been experimenting with GPU acceleration. I have. And that still doesn't work? It semi-works. I can change the background to green or red on purpose, but I can't render the actual image. So I need to flip buffers or something. I need to go Google and just mash the keyboard until it works. So we're looking into GPU acceleration, starting with OpenGL, mostly because we already have OpenGL debayering support in a test app in libcamera called qcam. So we already have a set of GL shaders, and it's useful to start with a sort of known-working base for the shaders. In the future, maybe we'll also do OpenCL or Vulkan, because some ARM SoCs only support Vulkan, like the ones with an Imagination GPU. So yeah, that's another option. GPU acceleration should do the faster and cheaper bit, which would also use less energy.

Then we have a whole list of image-quality enhancements which we would like to work on. These are also things which are actually done by the hardware ISP, but we didn't put them on the hardware ISP slide because that slide was already full. I'm going to skip this because I would like to use the last five minutes for questions. So.
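For applications that do talk to libcamera directly (rather than through the PipeWire plugin and portal), the setup looks roughly like the sketch below. This is a compressed illustration based on libcamera's public C++ API as I understand it, not code shown in the talk; error handling and the buffer allocation / request completion loop are omitted.

```cpp
#include <libcamera/libcamera.h>
#include <memory>

using namespace libcamera;

int main()
{
    // Start the camera manager and pick the first camera it enumerates.
    auto cm = std::make_unique<CameraManager>();
    cm->start();
    if (cm->cameras().empty())
        return 1;

    std::shared_ptr<Camera> camera = cm->cameras()[0];
    camera->acquire();

    // Ask libcamera to build a pipeline configuration for a viewfinder
    // stream; it decides how the MIPI blocks (or the soft ISP) are wired.
    std::unique_ptr<CameraConfiguration> config =
        camera->generateConfiguration({ StreamRole::Viewfinder });
    StreamConfiguration &streamConfig = config->at(0);
    streamConfig.size = Size(1280, 720); // request 720p; may be adjusted
    config->validate();                  // libcamera fixes up what it must
    camera->configure(config.get());

    // Allocate buffers, create one request per buffer, queue them and
    // handle the requestCompleted signal ... omitted here for brevity.

    camera->release();
    cm->stop();
    return 0;
}
```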
APPLAUSE

Hey, are you doing the demo? Yeah, that's good. Questions after the demo? That's a good point. I'll give it a go. OK. Please, everybody, from the front. So here you see the permission dialogue, and then hopefully, if I join this meeting, we'll get... Look. This is all...

APPLAUSE

So yeah, this is our current image quality, and actually with this lighting it's not too bad. But if you have a really low-light condition, then it sort of sucks, which we need to work on. Do we have time for questions? I think we do.

Really quick one: you mentioned OpenGL and OpenCL. MIPI is pretty common on embedded systems that usually don't have OpenGL; they usually have EGL or GLES. Yeah, so GLES is what we mean. Oh, sweet. OK, thank you.

So what 3A algorithms have you actually got? Is it more than just white balance and exposure? Do you have ones that people can play around with, implement their own versions of? We would definitely welcome people to look at what we have at the moment. It's in a separate repo. I don't know if you added that in the references. I probably did not. You have to go look at the branch — I'm not sure if I gave the branch for the ISP. I'm a terrible person. Look, the patches are on the upstream mailing list, and the cover letter also has a link to the branch you can check out. And we definitely welcome people to experiment with more algorithms, better algorithms. Only please keep in mind we're running this on the CPU. This is actually full HD at 30 FPS, which is currently about a 40 percent CPU load, and I've spent a lot of time optimizing this. So we need to do this in real time; that's important to keep in mind. I think it's probably worth saying that you won't really be able to use this on an i.MX8 unless we get the GPU going. So there's a cutoff point of computational power — it's around an A53, I suppose. It's just too much work for those processors at that point.

Hi. As far as Vulkan support, have you looked at a wrapper like Zink to make the same functionality you have on the SoCs with OpenGL work on Vulkan-only SoCs? No, we have not looked into Vulkan at all. We're currently at the stage where we're trying to get the GLES shaders to work, and Vulkan will come later. There's lots of stuff which will come later, like more image-quality improvements. I think it might be more productive to look at OpenCL, because then you don't care what you're talking to — whether it's Vulkan, a GPU or a CPU, it kind of doesn't matter from the libcamera perspective. So if we were going to spend more time, rather than choosing between APIs on the GPU, it might be better to choose between different compute targets.

You mentioned that there's a lot of secret-sauce algorithms which companies don't like upstreaming. What is the reason for this? Is it just companies being cagey, or is there something in the algorithms and all the magic-number tables that for some reason they don't want public? Personally, I think it's a mix. On one part it's just companies being secretive because they're afraid of competition — well, I don't think that's really necessary; this is something which we put together pretty quickly with really basic algorithms, and it already gives a pretty decent picture. On the other hand, I think the more advanced stuff really does have company or trade secrets in it. So it's a mix, at least in my personal opinion. There was one more question, you said?
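For anyone wanting to experiment with their own 3A algorithms, as invited above, the shape of such an algorithm is simple. Here is a toy auto-exposure step — my own illustration, not the soft ISP's actual code, with made-up names — that nudges the sensor exposure towards a target mean brightness derived from the previous frame's statistics.

```cpp
#include <algorithm>
#include <cstdint>

// Toy auto-exposure: one control-loop step per frame. Illustration of the
// kind of 3A logic discussed above; real implementations also juggle
// analogue gain, flicker periods and metering regions.
struct AeState {
    uint32_t exposureUs;     // current sensor exposure time
    uint32_t minExposureUs;  // sensor limits
    uint32_t maxExposureUs;
};

static void autoExposureStep(AeState &ae, double frameMeanLuma)
{
    const double targetLuma = 0.18 * 255.0;  // aim for mid-gray
    if (frameMeanLuma < 1.0)
        frameMeanLuma = 1.0;                 // avoid divide-by-zero

    // Scale exposure proportionally, but damp the step so the image
    // doesn't oscillate between too dark and too bright.
    double ratio = targetLuma / frameMeanLuma;
    ratio = std::clamp(ratio, 0.7, 1.4);

    double next = ae.exposureUs * ratio;
    ae.exposureUs = static_cast<uint32_t>(
        std::clamp(next, static_cast<double>(ae.minExposureUs),
                         static_cast<double>(ae.maxExposureUs)));
}
```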
We're out of time, I'm afraid. Thank you very much for that talk. That was fantastic.