I'll scare me like that.
Alright, without further ado, I want to introduce you to Paris.
Paris is here. He was supposed to give the talk together with George, but George could not make it.
Paris is going to tell us about the 3DS, but I know that he also has knowledge of the Nintendo 64,
so make sure to ask him any questions about that as well.
Did I forget anything, Paris?
No, that's all I think. Alright, then please take it away.
Alright, hello everyone. Welcome to our talk on the Panda 3DS.
We'll cover a variety of topics, but first I'll introduce the three names on this list.
First is George. He is the creator of Panda 3DS and this presentation.
Next, me, Paris, I'm a contributor in Panda 3DS, and we also have David, who helped a lot with making the presentation really pretty.
You don't want to see the original version.
For questions, you can ask either after the talk or on the Panda 3DS Discord server.
And lastly, you can send an email to these two to punish them for not showing up here.
So, what is Panda 3DS?
Panda 3DS is a Nintendo 3DS emulator for Windows, Mac OS, Linux and Android.
Just like all emulator projects, we have several goals and aspirations,
such as providing users with a pleasant experience, creating a portable code base,
exploring new possibilities, researching the 3DS, expanding the RedPandicold,
aiding homebrew developers in their 3DS software, and fun, usually.
Here's a peek at Panda 3DS. We have three frontends, the SDL one, the QT one, and we also have an Android version.
Let's take a look at what we're going to be discussing today.
So, three topics. First, we're going to go over the hardware architecture of the 3DS.
Next, we're going to talk about the 3DS software stack, that means the Horizon operating system and the user land.
Finally, we're going to go over emulating the 3DS.
We're going to talk about the levels of emulation, points of interest for new developers of the 3DS, etc.
So, this is the first glance at the 3DS hardware.
Most of the action happens in this big chip named the CTR, which is where Citrus name comes from.
It has three CPUs. So, one is this ARM 11 right here, and this is what runs all 3DS code.
It also has this ARM 9, which is the same model that was used in the DS and the DSi.
And next is this ARM 7 CPU, which is the same in the DS and DSi, and also in the Game Boy Advance.
We're going to see why it has three CPUs in a second.
But first, there's also the DSP, the digital signal processor for audio, and then there is the Pica 200 GPU, which uploads to two displays right there.
One notable thing about the top display is that, as you can see, there's two frames here,
and that is because it would generate two frames, one for each eye, to sort of create this fake 3D effect.
It has some other miscellaneous hardware.
It has 128 megabytes of RAM in the original version, and the new one has 256, and some other miscellaneous stuff.
We're going to go over all of that.
But first, the reason that it had three CPUs is because, actually, the 3DS can run DS and GBA games natively.
And a fun fact about this is that many people didn't know that the 3DS can run GBA games natively,
because only those who were part of the Nintendo's ambassador program could use this feature officially.
And that program is something that Nintendo launched for people that bought the 3DS at the original price of $250 before it dropped to $170.
Nowadays, there's an open source interface for running GBA games called OpenAGBFirm.
Alright, so inside the system on chip first, there is the ARM 11,
and in the original 3DS, it is composed of two cores running at 268 MHz.
That first core is not supposed to look like that, but let's ignore it.
In the new 3DS, there is four cores running at 804 MHz.
And we're going to see exactly what each core does, but first let's look inside one of these cores.
Okay, of course, my clicking doesn't work, so I'm going to manually go to a different slide.
This was supposed to be a cool transition, but it didn't work.
Alright, inside the ARM 11 core, there is the ARM V6 architecture, and it is composed of three instruction sets.
There is the ARM, the 32-bit main instruction set, that is what most game code is.
There is the thumb, this is a reduced 16-bit version of the ARM instruction set for improved code density.
Some operating system code uses thumb, and then there is Giselle, which is a Java byte code runner in hardware, essentially.
Games don't use this, some homebrew uses this.
There is also a vector floating point unit.
There is media instructions for video and audio decoding, say your cut scenes, stuff like that.
There is an MMU for running multi-tasking operating systems, and I guess more stuff.
And there is a branch predictor out of order completion.
Alright, now I will transition back to my normal slide.
Alright, let's pretend everything worked.
There is the ARM 9, which is the same as we said that was in the DS, and it has the ARM V5 architecture.
In 3DS mode, it's actually used, however, to manage storage and cryptography hardware, so it's not completely unusable in the 3DS.
It runs part of the operating system, we're going to look into that.
And then there is the ARM 7, which is for DS and GBA backwards compatibility, and it is completely disabled in 3DS mode.
How do the cores communicate with each other?
Well, in the ARM 11, they use shared memory and interrupts to communicate with each other, and a bus.
The ARM 11 with the ARM 9 communicates with this thing called the PXI-FIFO.
PXI stands for a processor exchange interface.
I mean, there's the bait. Some people say processor interconnect.
Some people say pixie, as in furry, I don't know.
Then the ARM 9 communicates with the ARM 7 using the IPC-FIFO.
That was what it was called in the DS.
So, it's the same there.
And now we're quickly talking about the Pica 200, which is Nintendo's first off-the-shelf GPU in a handheld, and what was used in 3DS.
It implements OpenGL ES 1.1, although most games didn't use OpenGL, and it has some extensions, and some of those include per fragment lighting,
which means it can calculate lighting per fragment instead of per vertex, hardware saddle mapping, polygon subdivision through the use of geometry shaders,
subsurface scattering, which is the scattering of light as it penetrates a translucent object, and more, many, many more.
And games communicate with the GPU using a piece of shared memory, which is used to store GPU commands and some other interrupt info.
If a game wants to render, it first needs to write to the Pica's external registers, which can figure the frame buffer and some other secondary things,
and then they need to initialize the rendering context, which is done by configuring the internal registers,
and this is done through the use of GPU commands, as before it was done through the use of writing to a memory address.
Pica command lists are nothing more than a list of values, describing patterns and data for writing to the GPU internal registers.
So that's all there is to it.
This is a white square.
This is Mikage, a demo by DMP for Pica 200 that was presented at SIGGref 2006, so they could show off their new GPU.
It shows the impressive shadows, like you can see here, and fragment lighting functionality, like, for example, in the Samurai's helmet.
And also, there is, you can't see it there, but the floor, the wood grain, was procedurally generated because the Pica 200 could procedurally generate textures.
The name Mikage, that's where the Mikage 3DS emulator got its name from.
And you may be wondering that, you know, Citro got its name from the CTR, Mikage got it from this, where did Panda get its name from?
That is a really good question, so if you find the answer, do let me know.
Alright, my computer is lagging.
Well, that's the joy of running it in a VM, so you could use a PowerPoint.
It would be quite nice if it would run live right now.
Oh, this is awkward.
Yeah.
Is that a thing? It's like frozen. Oh, it's back.
Alright, I'll restart the virtual machine just to be safe. Or maybe I won't. Let's see.
Aha!
No, no, no.
You're good on time still.
Okay, good. How do I send this?
Aha! Alright, sorry about that.
Oh, now my first window is buggy. Okay, great.
So, this is a Vertex shader in the Pica 200.
The Pica 200 didn't use a high level shading language like GLSL.
Instead, shaders were usually written in assembly.
So, let me actually try to fix my thing.
Okay, I guess I can.
There is uniforms at the top, just like in a GLSL shader, which is a read-only piece of data.
In this case, it's our projection matrix.
There's constants, output attributes and input attributes.
So, our position are in color, we want to output.
And moving down to the main function here,
it calculates for dot products to combine the projection metrics with our input coordinates
and moves the output color. It doesn't do any modifications there.
So, we're quickly talking about the pixel pipeline.
So, in modern GPUs, fragment shaders are used,
which is a small program essentially to run the fragment pipeline.
But in the Pica 200, there is no fragment shaders.
Instead, it has a six-stage color combiner to do so.
So, essentially what would happen is,
Vertex data would come from programmable shader units into the rasterizer.
The rasterizer would generate a position.
And then, four textures would be sampled using these texture units.
All of them could do 2D textures.
The first one could also do cube maps, 3D textures.
And the last one could also do procedural textures.
And then, this texture data and the Vertex data would be passed to the color combiners,
along with lighting data.
And color combiner is essentially what the name suggests, color combiner.
It takes in two inputs and a way to operate between these two inputs
and sort of produces a new color.
So, for example, you could have two textures that you want to combine,
or you could have a texture and lighting source that you want to combine.
And after that, after passing through all the six stages,
you get a color which is then post-processed,
and voila, you get the Kirby,
which, for example here, the beanstalk is mixed.
The beanstalk texture with the lighting from light source.
This is why it has, like, a scene here.
And then on the leaf Kirby standing on, there's a gradient from left to right.
That's not probably how the texture was of the leaf.
Instead, it was using a color combiner combining with horizontal position,
so it gets darker as it goes to the right.
Some sewing off of the pica, some games.
So, Captain Toad Treasure Tracker is known for being good and clever
with lighting and shadow effects.
The Legend of Zelda Ocarina of Time,
which uses the fog rendering hardware that pica has,
and also ADC-1, which is a format you probably never heard of,
which is a compressed texture format.
Mario and Luigi Paper Jam, it generates the seawater via procedurally generated textures.
And Super Mario 3D Land, the sole source of things,
like stencil testing, logic operations,
command lists that invoke other command lists and more.
And now we're talking about the digital signal processor.
It is the same one that was used in the DSi, but in the 3D S2s, far more.
And most games would ship with a common firmware,
which includes support for 24 audio channels, ASC audio decoder,
for games like Pokemon X and Y and Fire Emblem,
multiple audio encodings, effects such as reverb, delay,
both mono and stereo input, but only stereo output.
And the architecture for this digital signal processor
might seem a little bit weird,
but it's not really because, well, for example, it has 16-bit bytes instead of typical 8-bits.
And that is because it is a digital signal processor
and needs to process samples, which usually are 16-bits, but not always.
It has some weird instructions that you may not find in a typical CPU,
such as multiply ad.
It has support for type loops, which is necessary for such work,
and 500 kilobytes of memory.
And this is a rendering of the DSP doing math, I guess,
Georgia at this time.
And then there is some other miscellaneous hardware that we saw earlier.
So the RAM, we went through that.
There is also 6 and 10 megabytes of VRAM,
one-time programmable memory for storage uniques, console uniques, sorry,
controller, DMA engine, cryptography engine.
You're probably not interested in all of that.
There's two back cameras and a front camera and an EIR front LED,
which is used in the new 3DS for face tracking,
because actually the fake 3D effect would be quite straining.
They would need this.
All right, and we've reached the point to talk about the software stack,
and we're going to start by talking about the Horizon OS.
So the Horizon OS was the operating system of the 3DS,
and it was split between the RM11 CIS score,
which I actually skipped because my transition didn't work.
So let me find the slide.
I'm sorry about this.
Oh, okay, I guess everything is broken now, huh?
Okay, let's see here.
Close this.
Aha!
Okay, quick interjection.
Sorry about that.
What's the problem about the cores because of that broken effect before?
In the original 3DS, there is two cores.
One is the up core, one is the CIS score.
The up core runs usual in apps, including games and system apps.
The CIS score runs the operating system and services.
We're going to see what those are in just a second.
And the new 3DS has two more cores.
One is for the head tracking service,
which is for the straining effect that we talked about,
and the other is just available as another app core.
So they would have two cores.
All right, after this quick interjection,
I'm going to go back to the operating system, if I may.
All right.
So yeah, the R9 is for security and IO as we just saw earlier.
So you need to get a firm grasp of firms.
There is four in the 3DS.
The one we're going to be talking about is the native firm,
which runs 3DS games natively.
This is the 3DS mode.
In the R11 part of the operating system,
the R11 runs the usual length and the majority of the
operating system code while the R9 is dedicated to
cryptography and cartridge.
There is the AGB firm for GBA games natively.
And the R7 is the start of the solo runs all the game code.
There is the Twilight firm for the S, pretty much the same.
And then there is the Safe firm,
which is a bare bones version of native firm for recovery.
And you also can use it for updating.
But later on, from now on, actually,
we're going to look at only the native firm here for when we
talk about stuff.
And the kernel inside the R11 is called kernel 11.
I'm going to go through a brief introduction of what is a kernel.
Every operating system has a kernel,
which is considered the core of the operating system and
handles various critical functionality.
So in the 3DS R11, there is this kernel 11 and it is what we
call a microkernel.
A microkernel essentially tries to be as small as possible.
It runs its services in user space, things that it needs to do,
such as file systems and networking.
It is less code, which means less vectors of attack.
And it's generally considered to be more reliable.
So if something crashes, not the whole thing crashes,
like if your network crashes, the kernel doesn't.
Examples are this kernel 11 and also Minix, for example.
And then there is a monolithic kernel.
There's more types of kernels, but these are the major ones.
In the monolithic kernel, all or most services run inside
the kernel proper.
And there is supposed to be fewer context switches,
so everything is supposed to be faster inside the kernel.
One example of a monolithic kernel is Linux.
Kernel calls happen via the supervisor call,
SVC, which is this.
Here is an example of the assembly function called SVC.
And like most operating system, kernel level is not a pure
microkernel, but it is still a microkernel and handles memory
management, processing thread management,
and service and process intercommunication.
Let's see what a service is.
So services, as we said, usually end processes.
And inside the 3DS, they're managed by another service
called the service manager.
In order to interact with a service, say you're a game
and you need to interact with them, first you need to get
a handle to the service manager itself, which is a public service,
unlike all others.
And you do so using a supervisor call.
And once you have a handle to the service manager,
you can ask it for the handle to the service you actually want
to use, say the file system service.
And you do so using the sent sync request.
So you request the handle from the service manager
through the sent request.
Then you need to set up a parameter buffer with the function
you want to call to that service in the file system,
say the file system service you want to call the read file
function, which might not exist.
And you pass the function you're being called,
and the necessary parameters.
And then you need to send a request to the service,
again using sent sync request.
And you receive an output buffer.
The majority of communication with services is implemented
with requests like this.
Sometimes shared memory is used to reduce latency
for some crucial services, such as the GPU service.
And requests and responses are really thread-locked with storage.
And as the name suggests, sent sync request, that is,
they are synchronous, which means they're not a sync.
The response is written after the function returns.
But some services can't use something to happen later.
So for example, there is this Y2R service, which is
YUV2RGB color, is typically used for decoding videos
and FMVs and cut scenes.
It doesn't stall until it's finished, which might take a while.
Instead, it notifies you when it's done using a kernel event.
These are some important services.
So there's the file system, the DSP, the GSP for GPU,
communication, app for applets,
HID for input, et cetera.
There's many, many.
This is a function that asks the HID to HID service
to enable the gyroscope.
It gets a thread command buffer.
It makes a header to send in, and it sends a sync request.
It checks if there is an error, and otherwise gets the result.
So there is then Process9.
And Process9 is what runs inside the ARM9,
and it is unlike the kernel 11,
it is a monolithic single process kernel.
And all it handles, like we said, is cryptography
and device IO talking to the cartridge, to the SD card, et cetera.
It has over-complicated C++ that reverse engineers hate,
and it's really, really big.
And here's a funny quote by PSI, creator of Corgi3DS,
goes, if you ever feel useless, just know that Process9
calculates a ha ha on the CPU,
despite having access to a full hardware shanjin.
All right, so this is live demos,
but my system just broke, so I'm not going to do the live demos.
I'm actually going to show you a video of the live demos.
So here's a static frame.
Here's a video of Pand3DS running on a Discord button.
This is done through HTTP streaming.
There is a server, and it sends the frames,
and it becomes a GIF, and you can use the buttons to play
at two frames per second.
We've actually finished some games using this,
so it's not as bad as it looks.
This failed too, all right.
Here is Lua scripting in Pand3DS,
so you can use Lua scripts to create MGUI windows like this,
and you can create a lag.
Okay, there we go.
You can create cheats and stuff.
So you can...Pand3DS exposes these, like, functions,
such as write16, so you can write to address and stuff,
and it also exposes events, so you can do a function of a frame.
So here I'm going to change, for example, the slider,
if the video...
Okay, let's skip the video.
Oh, no, actually, here you can increment the RUPIC count,
and there is also a button below that says...
you can't see it, but it says murder link,
and pretend I pressed it and link died.
Oh, there you go. All right.
Let's get away from the videos.
All right, so there is a hidden emulation talk
inside this 3DS architecture talk.
So first, before we get into it,
I need to introduce you to HLE and OLE.
So HLE stands for high-level emulation,
and in high-level emulation,
you essentially reimplement parts of the emulator system software
in your own code to avoid emulating the hardware
that is needed to run said software.
So, for example, I told you that the DSP
has a common firmware that most games use.
Well, instead of emulating the over-complicated DSP,
you can emulate this common firmware.
The calls that would happen,
you do so by re-implementing it in C++.
The opposite of that is HLE,
where you would emulate the entire DSP,
which is, you know, much harder, or actually may not be.
And then you run the actual software,
the DSP firmware, on that hardware.
There is benefits to each side.
So, for example, LLE is slow
because you emulate the entire hardware,
but HLE is not easy.
You need to reverse engineer the software.
You want to run, et cetera.
We're going to look a bit more into that.
So for the 3DS specifically,
you can HLE the operating system.
So kernels, services, process nine,
there's more things you can add HLE.
And this is an example of something being HLE.
So there is the file system service,
and this is an HLE implementation of the file system service
that returns if an SD card is inserted.
And if you were to LLE this,
you would need to emulate the complex SD hardware interface.
So, you know, there are some benefits to HLE, of course.
So just a quick summary.
LLE is tedious because there's much, much harder to implement,
especially in the 3DS,
and it's also slower.
But, beneficially, you can run any 3DS software,
including bare metal firmware such as Linux for 3DS
or GodMode 9.
HLE is again tedious
because there's so many services to implement.
But it is performant, but still are prone,
and there's many things to reverse engineer.
There is also a hybrid approach you can do.
So you can HLE the kernels,
but you could LLE most operating system services.
And what does LLE services mean?
Well, services, as we said, are usual and apps.
So you can literally just take the binary and just run it.
That would be LLE-ing it in your emulated hardware.
You run it.
This is a nice bit of balance.
It minimizes work, improves performance, and accuracy.
Maintains accuracy.
So as a 3DS MUDAP,
you need to consider how you're going to approach the CPU.
And there is many ways to do so.
First, let's take a look at interpreters.
An interpreter, you essentially interpret
all the opcodes that you need to run,
which is to say you switch through them, you decode them.
This is slow, but it's also very portable.
As long as your code compiles to the target platform,
you're pretty much good to go, usually.
And this might be your choice
because a GIT might be very hard or impossible
on whatever you're targeting, say an iPhone, I don't know,
or WASM.
GIT recompilation is converting the ARM code to host CPU code.
So if you're running on an X64,
you would convert the ARM32 code to X64.
This is the most common solution.
It's also easier if you use dynamic, like Citroen Panda Do,
which can perform ARM32 to X86 or ARM64.
So you can run most devices that most people might care about,
such as computers and phones and stuff like that.
Then you could also consider virtualization.
As far as I know, this hasn't been tried yet in the 3DS,
but there is an ongoing request to try to do this.
So virtualization is the way apps like VMware and VirtualBox work.
And on some ARM32 and some ARM64 devices,
there is the possibility to execute 3DS code natively
through the use of virtualization.
It's not possible on all ARM64 devices
because they're removing this functionality, unfortunately.
But on some Raspberry Pi, for example,
and some rooted Android phones,
you can run 3DS code natively implementing this.
We don't yet know if it's faster than GIT, but yeah.
And then you could also consider
ahead-of-time recompilation potentially.
So that would mean recompiling the ARM code from the code section
to your host operating system.
This has a benefit compared to the GIT,
which is that if you ahead-of-time compile,
you don't need to compile fast,
which means that you can perform optimizations.
So for example, you could use something like LLVM
to produce optimal code.
So yeah, this is another potential way you could run your games.
Then you need to consider how you're going to approach the GPU.
Again, there is two ways to go about it.
So you could do software rendering, which is simpler,
but again, portable and slower.
That's the downside.
You could also do a hardware renderer,
which draws on the GPU using something like OpenGL,
Vulkan, Metal, et cetera, DirectX.
This is much faster and obviously ideal for playing games,
but less portable.
So for software renderers, how do you make them faster?
That's what you've got to consider.
Well, there's many ways to make them faster.
For example, there is multi-threading.
You can draw on concurrently on several threads.
And then you could also use a recompiler for a software renderer.
So you would take the pika state, the current render state.
You would make a small binary for it,
and you would run it for set state for every pixel.
So this would avoid like running a branch for everything.
For example, if the depth buffer is disabled
and you don't want to check for every pixel.
This is optimal, optimized.
And you can also reuse this compiled binary
if the same rendering state arises again.
This is something that, for example,
PCSX2 does for its software renderer.
And then for hardware rendering, well, it's challenging.
You have to make sure you choose the ideal API,
manage surfaces correctly, such as textures, color buffers, et cetera.
There's many, many other problems to solve.
For example, dealing with the parts that are on really OpenGL compliant
or games that use depth buffers, color buffers,
and write to them directly.
Or, yeah, there's many, for example,
tracking when a texture has changed,
and it's no longer valid.
There's many issues to solve in a hardware renderer,
which makes it a bit harder.
Then you're going to consider how you're going to approach
the pika state.
So again, you can do it in Turbo.
That's simple and also too slow.
You could do a GIT on the CPU.
So this is the vertex state that we saw before,
and it is converted to some scary looking X64 assembly.
A little more 64 assembly, scary looking.
And then another approach.
This is decent performance, but it could be better.
Is to recompile these satyrs from this state to your own GPU.
So something like GLSL or Spurvee, for example.
This would give even better performance,
but it might not be possible for some select pika satyrs.
So, yeah, let's move on.
Then you need to consider how you're going to emulate
the pika pixel pipeline.
There's a lot of things to emulate in the 3DS.
So one approach to specialized satyrs.
I think this is what Citra does.
So essentially you compile a specialized satyr
for each pika pixel pipeline configuration.
That's how you will emulate the pika pixel pipeline
and run it on your GPU.
This has low GPU usage because those specialized satyrs
are specialized and small,
but lots of time spent compiling satyrs
which causes stutters on some games.
This is the most common approach.
Panda3DS currently doesn't have this,
but it has uber satyrs, which is a term coined by Dolphin
which aims to solve that issue of specialized satyrs
of causing stutters.
You essentially take an entire emulator
for the pika pixel pipeline and run it inside a GPU fragment
satyr.
So you don't need to compile many times.
You just compile once and you run it forever.
This, however, has higher GPU usage
because it's very, very big,
but no compilation stutter.
And it works well on more PC GPUs,
but struggles on mobile GPUs, for example,
or lower end ones maybe, because it's very big.
And this is what Panda3DS does,
but, however, an even better solution would be
to do hybrid emulation.
So you will compile the specialized satyrs
as synchronously in the background,
which you couldn't do before because you need them to render.
And while they're compiling on a different thread,
you use the uber satyr right here,
which is, this is used for every call
until the relevant satyr is ready.
And this gives, I think, better performance
than either method and works well on all GPUs.
And this is what Panda3DS wishes to achieve
after we've, after we're done with specialized satyrs.
And finally, there is the audio DSP.
There is two approaches, as we saw earlier.
There is the LLE approach.
In the LLE, you need to consider how do you optimize it?
Do you recompile the firmware to your own architecture?
Do you do it ahead of time?
Teacro comes to mind, which is an emulator,
assembler, disassembler for the TIC DSP,
and it's used in Citra and Mount DS.
And you could also, instead, HLE DSP,
which should be faster, and things you need to do with that
is improving the current DSP reverse engineering efforts.
It hasn't been fully reverse engineered.
You need to make test runs and tooling.
And then you would need to optimize it.
You could do so using SIMB or SIMD,
and multi-threading and end-end.
That is all for emulation.
And now I'll show you again some ways
we are exploring new territory in the 3DS emulation
to sort of wrap up this talk.
So Panda3DS comes with fluid scripting,
including a text editor, so developers can make scripts
and mods and tests and stuff like that.
And we also have MGUI support.
You can have your little windows.
And I think it's pretty cool.
Also Panda3DS here is running CTR aging,
a factory test program some other emulators may struggle with.
And Panda3DS running on the Wii,
via the same HTTP streaming, not natively on the Wii.
Yeah, yeah, unfortunately.
Not natively on the Wii, just using HTTP streaming.
We thought to show it off because we have an HTTP server.
Just throw it on the Wii, whatever.
And the physics book there, the monitor rest,
I failed that class.
All right.
We have a revolutionary UI.
It has Panda icons.
You can play on Discord with all your friends.
That is it for me.
Thank you very much.
APPLAUSE
Hello, hello. Test? Does it work?
Yeah.
OK. Thanks for nice talk.
Anybody has any questions?
We have quite some time.
I'll start here.
Hello.
Hello.
Great talk. How much time did it take to build all that stuff?
Well, I think we've been working on it
since the FOSTA applications opened,
since we got accepted, actually.
So I would say, like, about three months now,
something like that.
The original version was not nice,
but we fixed it.
So yeah.
Maybe you can give us a presentation of the...
Oh, I'm sorry.
LAUGHTER
Silly me.
Three months.
Three months.
Yeah, we built all this and the presentation.
LAUGHTER
All right.
No, it has been in development since September 2022.
Yeah, so more than three months.
One and a half years now.
Sorry. Hello.
I have a question regarding GPU emulation.
Yes.
Have you considered using parts of Mesa, for example,
which has a lot of GPU stacks, open-jade implementations,
and can do compilation to native GPUs?
I wonder if it has been explored,
not necessarily in Pandat 3DS, but any emulator?
That is a good question.
No, I haven't considered it.
I don't know if George has.
He's the main developer on the Pandat 3DS.
You can ask him directly if he has,
but as far as I know, no.
Hey, thanks for the presentation.
Hello.
I was curious.
Do you know how you mentioned that in the 3DS,
different generations of the hardware have more hardware?
How do games and other software handle that?
The different generations of the 3DS have more hardware?
Yeah, and the different CPU speeds and whatever.
Do the games do more if they are on different versions of the 3DS?
Are you talking about the DS or GBA backwards compatibility,
or are you talking about 3DS games?
Because the 3DS in all generations has the same CPUs,
but the RM11 has more cores.
Yeah, but what do they do with the more cores?
Oh, I think I mentioned it, but I'll show you again here.
Yeah, so one is used for running games,
one is used for the operating system,
this is used for head tracking,
so your eyes don't get dizzy from the 3D effect on the new 3DS.
These two are on the new 3DS,
and then the new one, this one is available as another app core.
I think for backwards compatibility,
if you try to run an older game on the new 3DS,
it tries to downclock the speeds and not use the extra app core,
and if you run a new 3DS game,
it uses both of these to run the game.
Okay, thank you.
What happens if you run the new game on older generation 3DS?
Does it fall back here and turns off parts of the game
and not to crash on the new 3DS?
I'm not...
Yes, they asked,
what happens if you run a new 3DS game on all 3DS?
And I'm not sure, but my guess would be
that it would display some sort of message
that you can't run it on this console.
This is what happens, for example,
when trying to run a Game Boy Color game on a Game Boy
if there was no such support.
So I would assume that's what would happen, but I'm not sure.
I'm wondering about the ARM 7 core.
You mentioned it was disabled on the 3DS mode,
but can it be used in certain situations
like running virtual console, GBA games?
Can it be used to run natively, or is it still emulation?
Did you ask if it's used to run natively Game Boy Advance games?
Yes.
Yeah, that is its only purpose, and also to run DS games,
because the DS also had an ARM 7 core.
The ARM 7 only exists in the 3DS to run
Nintendo DS and DSi games and Game Boy Advance games natively.
For the GPU emulation, did you consider using
compute shaders on modern GPUs as well, like speed up
parts of the rendering pipeline that are hard to do
with normal fragment and vertex shaders?
Yeah, that is a question for George,
but I think he definitely would have considered
to use compute shaders, yes.
Does anybody else have a question?
Yeah, my question was, you said that currently
on the 3DS supports Windows Linux, Mac OS and Android,
and do you expect to support more platforms in the future
like Wasm or iOS?
I personally wanted to try at least to port it to Wasm.
It's not that easy.
There's no recompiler from ARM32 to Wasm.
You would need to use an interpreter.
Currently, Panda 3DS doesn't have one,
but we're hoping to add one eventually.
And then there is WebGL, which is not great,
but you can use it.
Theoretically, it should be possible for Wasm,
but it's in the future plans.
What was your other system question?
iOS.
I think iOS also has some problems with running a recompiler.
It needs special privileges.
And also there's the fact that you can't really post an emulator
on the App Store, which is unfortunate.
But we definitely want to at some point, yes.
That might change in the future.
I may be able to.
Yeah, hopefully.
I have a question.
I'm allowed to ask questions.
So this is not really a question,
but George is in the chat and he's answering some questions.
I'm not sure if you have access to the chat.
You can read the answers.
The FOSTA chat?
Yeah, it's on Matrix.
No, this computer is not connected to the Internet.
It's secure, ultra secure.
But the answers are too long to read for now,
so just check Matrix for now.
So my question is, it's actually related to Anise's talk
because he mentioned vertical slicing.
I'm very curious, how would you do vertical slicing
for complex systems such as this or Nintendo 64, for example?
Could you provide a definition for vertical slicing?
Like very...
Anise, could you...
A definition for vertical slicing.
I don't think it's a good question.
I have a question.
Like you emulate just the necessary path to emulate,
this is the start of a game, and then go on from that.
Yeah, by the world, or just maybe, I don't know,
a test run and then test game, and then more stuff as you go.
So sort of test-driven development?
More like emulating as little as possible,
but just having feedback, visual feedback from the...
I don't know if that's possible, emulating as little as possible.
Even the simplest ROMs use quite a bit of the operating system.
Even like a simple triangle uses quite a bit.
So yeah, I don't know if it's possible
if I understand the definition correctly.
Thank you.
So first question.
How do you feel knowing that every second spent on emulation
is making a Nintendo business executive cry?
And...
Secondly, do you think ahead of time compilation
could seriously improve performance,
especially in lower-end hardware,
or do you think the actualize more on the GPU on the rendering side?
Yeah, it definitely could.
I don't know if it has been tried yet,
but we do want to explore that possibility.
The big thing about ahead of time is...
Well, the thing about the 3DS is that it doesn't really
do a lot of any dynamic code execution,
so it should be theoretically possible.
There is CROs in 3DS,
which is the alternative of DLLs in 3DS,
but George has told me that they can be handled.
And yeah, the big thing about AOT...
Yeah, that's all he told me, didn't elaborate.
But the big thing about AOT is that you can optimize the code,
so I would think that it would run quite faster.
But we don't know.
Okay, I think this is the last one.
How do you deal with the system applets that the 3DS has?
Do any games even interact with those in any way?
When you say the system applets,
do you mean like the home menu, stuff like that?
Yeah, like when you go to the home menu,
there's little apps at the top, for instance.
How do games interact with that?
Can they launch other games, that kind of stuff?
Games in the 3DS only interact with services.
They don't interact with the system apps.
Okay, final one.
Yeah, you talked a bit about the new 3DS infrastructure
and the additional CPUs.
And I was wondering, when you run older games
that weren't using the additional CPU,
does the new 3DS automatically use the additional CPUs,
or are they just completely ignored and just unused?
No, as far as I know, it just uses the original LabCore,
the one it has, and it also down clocks it to the original speeds.
Oh, well, thank you.
Okay, great. Let's thank our speaker again.
Thank you.
Thank you.