Hello everyone, thanks for coming here.
I'm Danylo, and I've been working on the Turnip driver
for three years at Igalia.
I want to give a status update:
what we have achieved so far,
and what's coming next for us.
Let's start with the new hardware we support.
We now support a lot of hardware,
and recently we started supporting
the Adreno 700 series GPUs.
We have already merged Adreno 730 and 740,
and the merge request for the most recent one,
Adreno 750, is under review.
There are a lot of changes between Adreno generations,
mostly for performance reasons.
Registers have changed,
and there are many new performance features.
Currently we have implemented only direct rendering,
not tile-based rendering.
Adreno GPUs are a bit unusual
because they support two modes:
tiled rendering and direct rendering,
the latter being what desktop GPUs do.
But tile-based rendering is still
a work in progress for now.
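As a rough illustration of what tile-based rendering involves, the sketch below splits a framebuffer into tiles small enough to fit in on-chip tile memory (GMEM). It is a simplified model, not Turnip's actual tiling logic: the function name tile_grid, the 32-pixel alignment, the halve-the-longer-side strategy, and the 1 MiB GMEM budget in the example are all assumptions made for illustration.

```python
import math

def tile_grid(width, height, bytes_per_pixel, gmem_bytes, align=32):
    """Return (tile_w, tile_h, tiles_x, tiles_y) so that one tile fits in GMEM.

    bytes_per_pixel is the total over all attachments of the render pass.
    """
    # Start with one aligned tile covering the whole framebuffer.
    tile_w = math.ceil(width / align) * align
    tile_h = math.ceil(height / align) * align
    # Halve the longer side until a tile's pixels fit in on-chip memory.
    while tile_w * tile_h * bytes_per_pixel > gmem_bytes:
        if tile_w >= tile_h and tile_w > align:
            tile_w = math.ceil(tile_w / 2 / align) * align
        elif tile_h > align:
            tile_h = math.ceil(tile_h / 2 / align) * align
        else:
            raise ValueError("GMEM is too small for even one aligned tile")
    tiles_x = math.ceil(width / tile_w)
    tiles_y = math.ceil(height / tile_h)
    return tile_w, tile_h, tiles_x, tiles_y
```

For a 1920x1080 target at 4 bytes per pixel and a 1 MiB budget, this gives 480x544 tiles in a 4x2 grid. Each tile is rendered into GMEM and then resolved to system memory, whereas direct rendering writes straight to memory.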
We also support almost all
600 series GPUs, but there are some variants
out there we don't support.
There are five sub-generations of the 600 series,
and we support all of them.
So to add a new variant of the GPU,
we just need to change some registers.
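The idea that a new 600-series variant is mostly a matter of different register values can be pictured as a table of per-sub-generation defaults plus per-variant overrides. This is a hypothetical sketch: the sub-generation keys, the field names, and the register names REG_A and REG_B are placeholders, not Mesa's real device tables or real Adreno registers.

```python
import copy

# Hypothetical defaults shared by every GPU of a sub-generation.
# Register names and values are placeholders, not real Adreno registers.
SUBGEN_DEFAULTS = {
    "gen1": {"num_cores": 1, "gmem_size": 512 * 1024,
             "magic_regs": {"REG_A": 0x0, "REG_B": 0x10}},
    "gen3": {"num_cores": 2, "gmem_size": 1024 * 1024,
             "magic_regs": {"REG_A": 0x4, "REG_B": 0x20}},
}

def make_variant(subgen, magic_regs=None, **fields):
    """Build a variant description from its sub-generation defaults plus overrides."""
    info = copy.deepcopy(SUBGEN_DEFAULTS[subgen])  # never mutate the shared defaults
    unknown = set(fields) - set(info)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    info.update(fields)
    # Only the listed registers change; the rest keep the sub-generation value.
    info["magic_regs"].update(magic_regs or {})
    return info
```

Keeping the overrides small makes the difference between two variants easy to review.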
As for features and extensions,
we now support Vulkan 1.3
and a lot of extensions with it.
The most interesting one for us was dynamic rendering.
It's rather simple for desktop GPUs,
because they mostly don't care about
render pass boundaries.
But for tiled rendering on mobile GPUs, it's a big deal.
We have to stitch render passes together,
sometimes even at submission time.
It can get really nasty;
the code for it is barely readable.
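In Vulkan dynamic rendering, a render pass instance can end with VK_RENDERING_SUSPENDING_BIT and be continued by a later vkCmdBeginRendering with VK_RENDERING_RESUMING_BIT, possibly in a later command buffer of the same submission. A tiler wants the whole pass as one unit, so the pieces have to be regrouped. The sketch below shows only that grouping step over fragments listed in submission order; the Fragment type and the stitch function are hypothetical and far simpler than what a real driver has to do.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    resuming: bool      # began with VK_RENDERING_RESUMING_BIT
    suspending: bool    # ended with VK_RENDERING_SUSPENDING_BIT
    draws: list = field(default_factory=list)

def stitch(fragments):
    """Group dynamic-rendering fragments (in submission order) into whole render passes."""
    passes, current = [], None
    for frag in fragments:
        if frag.resuming:
            if current is None:
                raise ValueError("resuming fragment without a suspended render pass")
            current.extend(frag.draws)
        else:
            if current is not None:
                raise ValueError("new render pass started while another is suspended")
            current = list(frag.draws)
        if not frag.suspending:
            passes.append(current)  # the pass is complete: tile it as one unit
            current = None
    if current is not None:
        raise ValueError("render pass still suspended at the end of the submission")
    return passes
```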
And we have implemented all the extensions required by DXVK,
VKD3D-Proton, and Zink.
So that's great.
While we do not claim Vulkan 1.3 conformance,
we do regularly test
against the Vulkan CTS.
We test a lot of game traces, and we test games,
but with games it feels like a game of whack-a-mole right now,
because there are not a lot of real users out there.
And we don't have a proper CI with game traces
like RADV does.
Another big change we've made is in pipelines.
Our GPU has a unique way of dealing with pipelines,
and with all the new pipeline-related extensions,
we have had to rewrite them in some way every time.
But thanks to Connor Abbott, our pipelines are healthy.
We've done a lot of optimizations in ir3,
which is our backend compiler.
They add up over time.
And we've done a lot of work on debug tooling,
because we have to reverse engineer the GPU.
We deal a lot with unknown registers and unknown instructions,
so we have to be able to quickly understand
what's going on.
So I want to spend some time on the debug tools
we've implemented so far.
I gave a more in-depth talk at the last XDC.
You can find it at this link.
So what are our debug tools?
We have GPU breadcrumbs, like in Google's
Graphics Flight Recorder.
We can replay command streams.
We can edit command streams.
We can print from the GPU,
from shader assembly in these command streams.
And we can debug reads
of undefined state from registers.
I'll describe each of these features
a bit more in the following slides.
Why do we even need our own GPU breadcrumbs?
There is already a solution for this at the Vulkan API level.
It's called Graphics Flight Recorder, from Google.
It can already tell you at which command
a hang occurs, but there are two issues with that.
It's too coarse, because, for example,
the start of a render pass could translate into 10
or 20 blits in the worst case,
and each of them may hang.
So API-level tooling is not great at this.
And what really prompted me
to implement breadcrumbs in our driver
is debugging unrecoverable hangs.
When your computer or board just completely hangs,
you cannot do anything; writes to disk
don't come through.
Graphics Flight Recorder doesn't work in that case.
To make it work, you'd need some new Vulkan extension
and so on.
It was much easier to deal with in the driver itself,
by doing everything synchronously.
And it worked rather well.
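A minimal model of driver-level breadcrumbs, assuming the driver emits a breadcrumb write after every GPU-level operation (each blit or draw, not each API call): after a hang, the last value that landed in memory identifies the first operation that did not finish. The function names and operation labels are hypothetical; in a real driver the breadcrumb lives in GPU-visible memory and, for unrecoverable hangs, has to be read and printed synchronously before the machine locks up.

```python
def run_with_breadcrumbs(ops, hangs):
    """Simulate a GPU executing `ops` in order, writing a breadcrumb after
    each completed op; return the breadcrumb value at the time of the hang."""
    breadcrumb = 0  # stands in for a GPU-visible memory location
    for seq, op in enumerate(ops, start=1):
        if hangs(op):
            break          # the GPU stops here; later writes never land
        breadcrumb = seq   # "op number seq has completed"
    return breadcrumb

def first_unfinished(ops, breadcrumb):
    """Map the last breadcrumb seen back to the op that never completed."""
    return ops[breadcrumb] if breadcrumb < len(ops) else None
```

With one API-level "begin render pass" expanded into several blits, the breadcrumb points at the specific blit rather than at the API call.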
But this tool is currently not used much,
due to the tooling I will talk about now.
Okay, let's say you cannot even reproduce the bug reliably.
Some bugs are random hangs occurring
in different parts of the game and so on.
So the easy way to reproduce them
is just to record all commands submitted to the GPU
and then replay them.
For most hangs and issues, this works great
for reproducing them.
There are a few caveats: it's necessary
to record all buffer objects submitted,
and there could be a lot of them for some triple-A game.
So it works mostly for one or two frames.
And not all issues are reproducible this way;
some are too finicky for this.
But most of them are reproducible, so it's good enough.
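A toy version of the record/replay idea, assuming a trace stores, for every submit, the command stream address plus a snapshot of every buffer object it references, keyed by GPU address (iova). The Trace and Submit classes and the execute callback are hypothetical. The point is that snapshots are taken at submit time and restored at the same addresses on replay, which is also why traces of heavy games grow quickly.

```python
from dataclasses import dataclass, field

@dataclass
class Submit:
    cmdstream_iova: int
    bos: dict  # iova -> bytes snapshot of every buffer object referenced

@dataclass
class Trace:
    submits: list = field(default_factory=list)

    def record(self, cmdstream_iova, bos):
        # Copy contents at submit time so later CPU writes don't leak in.
        snapshot = {iova: bytes(data) for iova, data in bos.items()}
        self.submits.append(Submit(cmdstream_iova, snapshot))

    def size_bytes(self):
        # Every submit carries full BO copies: this is what limits traces to a frame or two.
        return sum(len(d) for s in self.submits for d in s.bos.values())

def replay(trace, gpu_memory, execute):
    for s in trace.submits:
        # Restore every BO at its original GPU address so pointers inside
        # the command stream stay valid, then resubmit the same stream.
        for iova, data in s.bos.items():
            gpu_memory[iova] = bytearray(data)
        execute(s.cmdstream_iova, gpu_memory)
```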
But it's not enough to just be able to replay the trace
and see a hang.
You have to have a way to narrow it down.
So what we implemented is a simple way
to edit the command stream.
We can decompile a submit to the GPU
into very trivial packets.
Apart from a few packets, there are only
packet names in comments.
It's really easy to do for probably any GPU,
and even in this form it's very powerful,
because you can bisect the trace
and find the exact command which hangs,
even if it's impossible to determine that any other way.
Then you can edit some part of the packet
and see if it helps.
If it solves the hang, you can deal with it
as with ordinary code.
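Bisecting a trace can be sketched as a binary search over packets. This assumes a hypothetical replay hook, hangs_with_prefix(n), that replays with only the first n packets kept (the rest replaced by NOPs). It also assumes that an empty prefix does not hang and that a prefix which hangs keeps hanging when more packets are added. Each probe is one replay, so a stream of N packets needs only about log2(N) replays.

```python
def find_hanging_packet(packets, hangs_with_prefix):
    """Return the index of the first packet whose inclusion makes replay hang,
    or None if the full stream replays without hanging."""
    if not hangs_with_prefix(len(packets)):
        return None
    lo, hi = 0, len(packets)  # invariant: prefix of length lo is fine, length hi hangs
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs_with_prefix(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1  # the packet that was added when going from lo to hi
```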
What if the issue is inside the shader itself?
We can already compile shaders from assembly.
So with this replay tool, we added the ability
to just print some registers from the shader.
And the most trivial print is good enough.
Our print takes temporary registers for the address
and so on, plus the registers to print,
and prints them:
it increments a global counter
and writes to global storage, and the replay tool
just reads from it and prints the registers.
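The print mechanism can be modeled as an append-only buffer: word 0 is the global counter, each print atomically bumps it to claim a slot and stores its registers there, and the replay tool reads the counter and dumps the slots afterwards. This is a CPU-side sketch; the PrintBuffer class, the fixed record size, and the overflow handling are assumptions, and a lock stands in for the GPU's atomic add.

```python
import threading

class PrintBuffer:
    """Word 0 is a global counter, followed by fixed-size records of register values."""

    def __init__(self, max_records, regs_per_record):
        self.max_records = max_records
        self.regs_per_record = regs_per_record
        self.words = [0] * (1 + max_records * regs_per_record)
        self._lock = threading.Lock()  # stands in for the GPU atomic add

    def shader_print(self, *regs):
        """What each shader invocation does: claim a slot, store its registers."""
        if len(regs) != self.regs_per_record:
            raise ValueError("wrong number of registers")
        with self._lock:
            slot = self.words[0]
            self.words[0] += 1
        if slot >= self.max_records:
            return  # buffer full: drop the record, but the counter still grows
        base = 1 + slot * self.regs_per_record
        self.words[base:base + self.regs_per_record] = regs

    def read_back(self):
        """What the replay tool does after the submit: read the count, dump records."""
        count = min(self.words[0], self.max_records)
        n = self.regs_per_record
        return [tuple(self.words[1 + i * n:1 + (i + 1) * n]) for i in range(count)]
```

Letting the counter keep growing past the end of the buffer tells the reader how many prints were dropped.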
It's trivial, and it was incredibly useful
in reverse engineering the hardware.
You get a trace from the proprietary driver,
you decompile it, you edit the shader to print something,
and you see the values and what's going on.
And the last tool in our toolset is a way to debug
undefined, stale registers.
A lot of issues are due to reading
random values from registers.
Some state is not emitted.
Even games have issues with not emitting some state,
and so on.
A simple solution, at least for us,
was writing
random values to all the registers
and seeing what breaks.
And it mostly works.
It's not that trivial, because there are registers
which are written at the start of a command buffer
and never touched again.
And there are registers written in each render pass,
like registers that are set by pipelines.
So we divided the registers into two categories:
the ones that are set at the start of a command buffer,
and the ones that should be stomped before each blit
and render pass.
Again, there are some other caveats,
but it helped us quite a lot in debugging various issues
when implementing new features.
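The two-category stomping scheme can be sketched as follows, assuming hypothetical register names. Registers in CMDBUF_REGS get random values once at the start of a command buffer, and PER_PASS_REGS get fresh random values before every render pass or blit. After that, the driver's real state is emitted. Any register the driver forgot still holds garbage when the work runs, which makes the breakage visible. A seeded generator keeps a failing run reproducible.

```python
import random

# Hypothetical register names; real lists differ per GPU generation.
CMDBUF_REGS = ["REG_GLOBAL_0", "REG_GLOBAL_1"]              # set once per command buffer
PER_PASS_REGS = ["REG_PASS_0", "REG_PASS_1", "REG_PASS_2"]  # set per render pass / blit

def stomp(regs, rng):
    """Register writes that fill every register in `regs` with random garbage."""
    return [(reg, rng.getrandbits(32)) for reg in regs]

def build_cmdstream(cmdbuf_setup, passes, seed=0):
    """Emit stomps followed by the state the driver intends to set.

    cmdbuf_setup: registers the driver writes at command buffer start.
    passes: one dict per render pass or blit with the registers it sets.
    """
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    stream = stomp(CMDBUF_REGS, rng) + list(cmdbuf_setup.items())
    for state in passes:
        stream += stomp(PER_PASS_REGS, rng) + list(state.items())
        stream.append(("RUN", None))  # marker: the GPU executes the work here
    return stream

def state_at_runs(stream):
    """Register values the GPU would observe at each RUN marker."""
    regs, snapshots = {}, []
    for reg, value in stream:
        if reg == "RUN":
            snapshots.append(dict(regs))
        else:
            regs[reg] = value
    return snapshots
```

Running the same workload with a couple of different seeds and comparing the results is one way to tell a stale-register bug from an unrelated one.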
Let's not go into some of the weird registers.
Okay.
What are the real users of our driver at the moment?
Where can you see it?
At the moment, it's emulators on Android.
Why?
Because proprietary drivers are terrible on Android.
Not due to their code, but due to the update policy
for proprietary drivers there.
They are not updated at all.
So users are stuck with terrible,
many-years-outdated drivers,
and these drivers have many issues.
They don't have the necessary extensions.
It's bad, really bad.
And emulators need new features
from drivers to work.
They push drivers to the limit.
So, for example,
Yuzu is now able to load our driver, Turnip,
and use it instead of the proprietary driver.
And it works rather well for them.
And I remember some other emulators
use the same technique to deal with issues
in proprietary drivers.
Let's see an example.
Here is a Zelda game running on Android
on an Adreno 650 with our driver.
It's running rather well,
even though it's a previous generation of Adreno.
The FPS is nice, it renders correctly, it's great.
The proprietary driver, on the other hand, is a bit weird, to say the least.
Maybe it works with the most recent version,
but it's hard to tell.
Drivers are not updated,
it's hard for users to update them, and so on.
So there are lots of issues,
and they probably don't test with these games.
Okay, fair enough,
we also don't really test these games.
But the developers of Yuzu, at least,
are willing to implement some debug tooling,
like recording game traces,
so that they're easy for us to debug.
Because it's not that easy to launch a game
without having the Switch itself.
It's not legal.
Okay, earlier I said that Turnip implements
all the features for DXVK and VKD3D-Proton.
So can we run desktop games?
Yes, we can run desktop games.
Here you see an X13s laptop running Cyberpunk.
It runs via a lot of layers.
You need the FEX emulator to translate x86-64 assembly
into ARM64 assembly.
You need Wine for Windows compatibility.
You need VKD3D-Proton, and so on.
There are lots of layers.
So we mostly test game traces, not games themselves.
We do test games, but mostly traces,
because they are easier to deal with.
But we will test games more soon.
So what is the future for us?
We need to support tile-based rendering on the 700 series,
because while it may not give a big performance boost
for desktop games, it would lower power consumption
and probably help games on Android.
Mark Collins, my teammate, is working on it,
and I hope we will see it merged soon.
It would be great.
And then we need to squeeze out even more performance.
There are lots of performance features
we need to implement.
So even if we don't reach proprietary driver performance,
we expect to be somewhere near it.
At least, we hope so.
And in the distant future,
we want to implement ray tracing,
because at least the
740 should be able to support ray queries,
and the 750 could probably support ray tracing pipelines.
I hope we implement this someday.
And maybe we will be able to implement mesh shaders.
That would be cool.
Okay, another exciting development, not from us.
It's not an Igalia project,
but an easy way to run desktop games on Android.
There is a work-in-progress project called Cassia.
It's being worked on by one of my teammates, again Mark Collins,
and some other people out there.
It's an amalgamation of Wine,
DXVK, VKD3D, and FEXCore on Android.
And I hope Turnip will have first-party support there,
so it would all be bundled together
and work as one.
Now, you may say that people are already
running desktop games on Android.
Here you see someone running Assassin's Creed
on their device.
It runs.
Yes, that's true.
There are probably several projects for this.
This one is done with Termux.
I'm not sure exactly how it's put together,
but it's an even more unholy amalgamation
of projects.
It runs, it's really cool.
But there are some performance issues,
and some issues with how all these moving pieces
are stuck together.
But people running desktop games on Android,
that's super cool.
Okay, that's all from me
for today. Do you have any questions or suggestions?
So you said you...
Mic? Mic. No? Okay.
So you said you could use this on Android
to replace the proprietary drivers.
Yes, you can use...
So does that, okay,
does it need a rooted device or a custom kernel?
There are two cases.
If you want to replace
the proprietary driver for the whole system,
you need root.
You cannot change system libraries without root.
But if you want to use Turnip for an emulator,
and the emulator supports this,
it can just load the shared library packaged with it.
And Google Play allows emulators
to use custom drivers; they asked for it,
and Google Play allowed it for this case.
And the loaded driver talks to the proprietary kernel driver?
Yeah, there is a proprietary kernel driver, KGSL;
it's a downstream driver.
So we have backends for several kernel interfaces.
That's right.
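Supporting several kernel interfaces typically means choosing a backend when the device is opened. The sketch below is hypothetical and much simpler than Turnip's real code. It only checks which device node exists, treating /dev/kgsl-3d0 as the downstream KGSL interface and a DRM render node as the upstream msm driver. A real implementation would query the DRM driver name rather than assume any render node belongs to msm, and the path_exists parameter is only there so the choice can be tested without hardware.

```python
import os

def pick_kernel_backend(path_exists=os.path.exists):
    """Return which kernel interface to use, or None if no GPU node is found."""
    # Downstream Android kernels expose the KGSL interface here.
    if path_exists("/dev/kgsl-3d0"):
        return "kgsl"
    # Upstream kernels expose the msm DRM driver through a render node
    # (simplification: a real check would verify the node's driver name).
    if path_exists("/dev/dri/renderD128"):
        return "msm"
    return None
```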
Anyone else?
Sorry, how does it work with the upstream
kernel driver?
Could you repeat the question, sorry?
How does your implementation interact
with the upstream kernel driver for the 700 series?
Does it go as fast as it can?
We develop Mesa for the 700 series on MSM, the upstream kernel driver.
Well, not exactly upstream MSM,
because we have some custom changes to make it work.
Not all of them are upstreamed yet, at least for the 750 GPU.
But it will all be upstream; we need it upstreamed.
It will get there.
But the kernel is not done by us,
so we don't have much control.
Other people are working on it.
Okay, I guess that's all.
Thank you.