[00:00.000 --> 00:18.120] Okay, so hello, I'm Pavel Machek and I'm here to talk about cameras, but you can also talk [00:18.120 --> 00:26.640] to me about clicker training horses, mobile phones, the kernel, a smartwatch based on ESP32, Mobian, [00:26.640 --> 00:36.200] or Maemo Leste. So first things first: Video4Linux is not for cameras, it's for frame [00:36.200 --> 00:42.240] grabbers, and they are really very different, which is basically what this talk will be about. [00:42.240 --> 00:50.840] They can handle remote controls, but they cannot do autofocus for you and so on. But the interface is [00:50.840 --> 00:57.480] fairly simple: you just open /dev/video0, select a format and capture. Unfortunately, what you [00:57.480 --> 01:06.920] get is a blurry photo which will be either all white or all black. That is, without autofocus and [01:06.920 --> 01:15.960] auto exposure. Anyway, there are phones with smart sensors, one such example is the PinePhone, and [01:15.960 --> 01:25.080] those are pretty close to the frame grabbers. They do basically everything in hardware. This [01:25.080 --> 01:30.520] used to be a pretty common design in the past, which made good sense at that point, because [01:30.520 --> 01:38.720] USB had limited bandwidth and you could not push uncompressed data through it. It's easy to [01:38.720 --> 01:47.400] use, but it doesn't make much sense today. If you have like five lenses on your phone, you [01:47.400 --> 01:55.480] don't want to have five JPEG encoders there. So we are moving to dumb sensors, which do the [01:55.480 --> 02:04.600] bare minimum. There you set parameters like exposure and gain, select an area and so on, and it just passes [02:04.600 --> 02:13.560] the raw data over a fast bus, and it usually ends up in your memory. And then you have a component [02:13.560 --> 02:22.000] called an ISP, an image signal processor, which will do the JPEG conversion and that kind of thing. [02:22.000 --> 02:30.320] Unfortunately, in the case of the interesting phones, which are the Librem 5, the PinePhone and the Pine [02:30.320 --> 02:37.880] Phone Pro, we either don't have the processor or we don't have drivers for it, so we can't use it. [02:37.880 --> 02:45.960] So this is a photo if you try to take it without the automatics. Can you [02:45.960 --> 02:55.840] recognize what's there? It's a USB connector. It's recognizable, I'd say. So what do we need to do? [02:55.840 --> 03:06.040] The Nokia N900 is another example of a complex design, which used to be very important historically. And [03:06.040 --> 03:14.000] actually the photos in this presentation are from the N900 with an open source stack. In real time, you need [03:14.000 --> 03:20.640] to do auto exposure, because otherwise you will have a black or white frame, and you need auto exposure [03:20.640 --> 03:26.800] for autofocus. On most cameras, you really want autofocus too, because you can't just focus to [03:26.800 --> 03:31.840] infinity and expect a good image. And that's pretty much everything you need to do for [03:31.840 --> 03:38.080] video recording in real time. Then you have preview. Preview is a bit less important than [03:38.080 --> 03:45.000] the video recording, but it's also important. You need to convert from Bayer to RGB. And you [03:45.000 --> 03:50.920] need to do gamma correction, because the sensor is linear on one side and the display expects something exponential on the [03:50.920 --> 04:00.160] other side. The GPU can help here.
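To make that preview step concrete, here is a minimal sketch (an illustration by the editor, not code from the talk or from any of the projects mentioned) of Bayer-to-RGB conversion plus gamma correction, assuming an RGGB mosaic with even dimensions and using crude 2x2 binning instead of proper demosaicing:

```python
import numpy as np

def debayer_and_gamma(raw, gamma=2.2):
    """Very naive preview path: 2x2-bin a Bayer RGGB frame to RGB, then
    gamma-encode so the linear sensor data looks reasonable on a display.
    `raw` is a 2D array with an RGGB mosaic (an assumption; real sensors
    differ in pattern and bit depth)."""
    r  = raw[0::2, 0::2].astype(np.float32)
    g1 = raw[0::2, 1::2].astype(np.float32)
    g2 = raw[1::2, 0::2].astype(np.float32)
    b  = raw[1::2, 1::2].astype(np.float32)
    rgb = np.dstack([r, (g1 + g2) / 2.0, b])
    rgb /= rgb.max() or 1.0                        # normalize to 0..1
    rgb = np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)  # gamma-encode for display
    return (rgb * 255).astype(np.uint8)
```

A real preview would also apply white balance gains before the gamma step; this per-pixel work is exactly where the GPU can help.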
And then there are extensive post-processing steps like auto [04:00.160 --> 04:07.840] white balance, lens shading compensation, getting rid of bad pixels and probably many others I [04:07.840 --> 04:15.520] forgot about. The advantage is that this can be done after taking a photo or after recording [04:15.520 --> 04:23.240] the video. And there are quite good tools for that, including RawTherapee and so on. So [04:23.240 --> 04:33.280] unlike the other parts, this got some work done before. So what we are [04:33.280 --> 04:39.520] talking about: for example, on the N900, you have an LED flash, which is a completely independent device. [04:39.520 --> 04:47.000] You have voice coil support for autofocus, which is again a separate device somewhere on [04:47.000 --> 04:53.920] I2C. Then you have two sensors, front and back camera. You have a GPIO switch to select which [04:53.920 --> 05:03.160] camera you want. And then you have the ISP, which is quite a complex piece of hardware, and which will [05:03.160 --> 05:14.040] not be important for this presentation, because we will do without it. So, tools to use. There's a [05:14.040 --> 05:24.120] great set of tools to use, but they have some limitations. One which looks very nice is GStreamer. [05:24.120 --> 05:30.440] And GStreamer is really great if you have an unlimited CPU. Unfortunately, you don't have an [05:30.440 --> 05:39.080] unlimited CPU. If I were willing to hack its C code, it would be very powerful, but there's a [05:39.080 --> 05:47.080] learning curve involved in that too. And in the end, GStreamer might be the right thing to use, [05:47.080 --> 05:54.240] but I found other tools easier. There's FFmpeg, which has a quite nice and very simple [05:54.240 --> 06:02.160] command line interface. So I used it in the end. I didn't really need much, just: please [06:02.160 --> 06:08.680] take these images and make a video out of them. There's Megapixels. Megapixels [06:08.680 --> 06:15.760] is a very nice application focused on mobile phones, very well optimized, but its origin [06:15.760 --> 06:26.880] is the PinePhone, and they don't use libcamera there. Then there's libcamera. Everybody [06:26.880 --> 06:35.000] says libcamera is the future of video on Linux. It probably is, but there are still many steps [06:35.000 --> 06:45.200] to get there. And there's Millipixels. Millipixels is a fork of Megapixels, which was ported to the [06:45.200 --> 06:53.360] Librem 5 and, more importantly, to libcamera. So in many ways, Megapixels actually currently [06:53.360 --> 06:59.560] looks nicer, because it is based on a newer GTK. On the other hand, Millipixels uses libcamera, [06:59.560 --> 07:09.880] and that's the important part. Okay, so this will be a bit of history and reasons and so [07:09.880 --> 07:16.960] on. I started to play with the camera on the PinePhone, and the first idea was: hey, GStreamer is [07:16.960 --> 07:22.840] there to capture video, let's use GStreamer, right? Okay. I started capturing raw Bayer [07:22.840 --> 07:31.040] data, because that should be most portable. I did some shell scripting with media-ctl [07:31.040 --> 07:38.240] to set up the pipelines. That's not fun. And then just used GStreamer to save the Bayer [07:38.240 --> 07:47.840] images to the disk. And I could do 200 kilopixels, which is not great, but better than no video [07:47.840 --> 08:00.320] at all, maybe.
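The capture side of that shell-scripting approach was roughly of this shape. This is a hedged reconstruction, not the actual scripts: the media-ctl pipeline setup is hardware-specific and not shown, and the device node, resolution and Bayer format below are placeholders.

```python
import subprocess

# Hypothetical capture step: after media-ctl has configured the sensor
# pipeline (not shown), dump raw Bayer frames straight to disk.
pipeline = (
    "v4l2src device=/dev/video0 num-buffers=300 ! "
    "video/x-bayer,format=rggb,width=640,height=480,framerate=15/1 ! "
    "multifilesink location=/tmp/frame-%05d.raw"
)
subprocess.run(["gst-launch-1.0"] + pipeline.split(), check=True)
```

Dumping unprocessed frames like this keeps the CPU out of the hot path; debayering and encoding can then happen offline.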
And I realized that the CPU can compress 70-kilopixel images in real time, [08:00.320 --> 08:08.440] which is, well, people were doing this, but that was some time ago. So I tried to improve. There's a [08:08.440 --> 08:18.960] YUV format the camera could do, which is the Bayer data already converted into something better suited for processing. [08:18.960 --> 08:28.000] And I could capture up to 0.9-megapixel video with that. And if you want, you can [08:28.000 --> 08:36.160] take a look at it; maybe it's useful for someone. But, well, then there was a reason to stop. The reason was [08:36.160 --> 08:45.080] called colorimetry. Someone in GStreamer basically introduced a regression, and all the [08:45.080 --> 08:52.600] GStreamer stuff stopped working. And I realized that, well, perhaps it wasn't that great to [08:52.600 --> 09:00.880] begin with anyway. So I started looking around. Quickly, I found libcamera, which is the future, [09:00.880 --> 09:09.560] right? And, well, it's C++. It didn't work at all on the PinePhone. So I had to do some quite heavy [09:09.560 --> 09:17.400] patching. I got some help on the mailing list. And I realized it has JPEG support, which, [09:17.400 --> 09:26.320] well, lets you avoid a lot of stuff, because JPEGs are already color-space converted and compressed [09:26.320 --> 09:35.760] and so on. And I realized that maybe JPEG is worth a second look. So I took one. You can't save [09:35.760 --> 09:43.320] raw data at two-megapixel resolution to flash, because the flash is not fast enough. But it was [09:43.320 --> 09:50.760] almost possible. So, hey, JPEGs are four times smaller, perhaps this could be made to work. And [09:50.760 --> 10:01.800] saving sound is easy. So maybe we can, well, maybe we already have everything we need. And this is [10:01.800 --> 10:10.080] how unicsy camera was born. Then I found a second problem. Someone decided that handing [10:10.080 --> 10:19.440] uncached data to user space is fun, and libcamera decided that passing uncached memory up to the [10:19.440 --> 10:26.200] application is great. I thought someone had stolen my CPU, because the performance penalty is about [10:26.200 --> 10:35.800] ten times. But no, it's just the way it is. I believe this needs to be fixed. If you fight with [10:35.800 --> 10:43.240] GStreamer and the performance seems too bad, this is probably why. And, I don't [10:43.240 --> 10:50.840] know, talk to the kind of person who can change it. By the way, in the old days, we used to have [10:50.840 --> 10:58.760] a read() interface to get data from the camera. This is now deprecated. Of course, it is faster to [10:58.760 --> 11:07.120] read() the data than to get uncached memory, right? That's how badly uncached memory sucks. Anyway, [11:07.120 --> 11:15.840] so unicsy camera started. Audio is really simple. You just create a small C application to [11:15.840 --> 11:22.520] record sound, split it into chunks so you can have easy processing later, and timestamp them, [11:22.520 --> 11:31.800] which is important for synchronization. libcamera with some small hacks can write 35 frames per [11:31.800 --> 11:38.640] second of two-megapixel data to the file system. All you need to do is add timestamps and sym- [11:38.640 --> 11:45.080] links so your preview can tell which is the latest image. Very easy. Control application: [11:45.080 --> 11:52.920] you probably don't want to start your video recording from the command line, but that's also very easy. You [11:52.920 --> 12:01.240] just take some GTK and Python.
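A minimal sketch of the timestamp-and-symlink idea, assuming the capture side simply hands over JPEG frames; the file names and spool directory here are made up for illustration, not taken from the actual code:

```python
import os
import time

OUT_DIR = "/tmp/capture"                 # hypothetical spool directory
LATEST = os.path.join(OUT_DIR, "latest.jpg")

def store_frame(jpeg_bytes):
    """Write one captured frame with its timestamp in the name and
    repoint a 'latest' symlink so the preview can pick it up."""
    os.makedirs(OUT_DIR, exist_ok=True)
    name = os.path.join(OUT_DIR, f"frame-{time.time():.3f}.jpg")
    with open(name, "wb") as f:
        f.write(jpeg_bytes)
    tmp = LATEST + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)                    # stale link from a previous run
    os.symlink(name, tmp)
    os.replace(tmp, LATEST)               # atomically repoint 'latest'
    return name
```

The timestamp in the file name is what later lets the post-processing step line the frames up with the separately recorded audio chunks.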
It creates timestamps, telling you, hey, start recording now, [12:01.240 --> 12:10.480] and it displays the preview, which is the most intensive thing there. And this is basically what runs [12:10.480 --> 12:16.040] during the recording, so this needs to be a bit optimized. Post-processing is not that [12:16.040 --> 12:23.120] important, right? So you just use Python and FFmpeg to compress the resulting video stream. Easy. [12:23.120 --> 12:29.560] This is something I was pretty happy about. If you want to replicate it, you will need some setup [12:29.560 --> 12:36.440] like patching libcamera and so on, but the code is out there, and there will be an easier method in [12:36.440 --> 12:44.240] the future. So I liked this solution, because I could use multiple languages to do my camera recording, [12:44.240 --> 12:51.600] the right language for the job. In the end, this was a few hundred lines of code total. And it could [12:51.600 --> 12:57.240] do some quite interesting stuff. Like you could take still pictures during recording: you simply [12:57.240 --> 13:04.000] copy the JPEG one more time. Easy. In video resolution, but if you are recording at two [13:04.000 --> 13:12.320] megapixels from a phone camera, I'd say that is going to be a pretty decent picture anyway. You could [13:12.320 --> 13:17.920] take photos with arbitrary delay. You could even take photos before the user asked for them, [13:17.920 --> 13:24.760] because you are taking all of them anyway, so you just don't delete them. This was fun. [13:24.760 --> 13:36.480] Then I got access to a Librem 5, which is different in important ways. It has dumb [13:36.480 --> 13:44.240] sensors, so it won't give you JPEG. But it had better support: libcamera worked there out of the [13:44.240 --> 13:51.920] box. There was the Millipixels application, as I explained before, which is a patched Megapixels, [13:51.920 --> 13:59.320] but it had no auto exposure, auto white balance, or autofocus support. It couldn't record video. [13:59.320 --> 14:08.160] And there are more issues on the Librem 5. The kernel driver could use some work. It only gives you 8-bit data, [14:08.160 --> 14:14.280] which is not really good enough for good photos. You can select one of three resolutions, [14:14.280 --> 14:23.160] one megapixel, three megapixels, or 13 megapixels, and for some reason only 23.5 frames per second [14:23.160 --> 14:32.440] works. I don't know why. The hardware has phase detection autofocus, which is a very cool [14:32.440 --> 14:39.240] sounding toy, and I have to thank Purism for their hardware and for the great work they did on the [14:39.240 --> 14:52.680] software stack. They are heroes. That's the best photo I got with the Nokia N900. So, Millipixels: it is a very simple application. [14:52.680 --> 14:58.600] There's a small development team, so it's easy to work with; [14:58.600 --> 15:04.800] it's plain C, it's easy to merge patches. It does all the processing on the CPU, [15:04.800 --> 15:12.840] which is great if you want to change the processing. So I started to do auto exposure, because that's [15:12.840 --> 15:21.680] the most important part, and I did a very simple one I had prototyped on the N900 years ago. So basically, [15:21.680 --> 15:31.320] if you have too many white pixels, like overexposed ones, you need to turn the exposure down, [15:31.320 --> 15:38.800] right? And if you don't have enough bright pixels, you need to turn the exposure up, [15:38.800 --> 15:45.760] and that's it, and this works well enough.
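A minimal sketch of the auto-exposure rule described above; this is not the code that was merged into Millipixels, the thresholds and step factor are guesses, and the exposure control range is a placeholder for whatever the driver actually exposes:

```python
import numpy as np

# Assumed control range; real drivers expose this via V4L2 controls.
EXPOSURE_MIN, EXPOSURE_MAX = 4, 65535

def adjust_exposure(gray_frame, exposure,
                    hi_frac=0.02, lo_frac=0.002, step=1.15):
    """One iteration of the simple rule from the talk: too many
    near-white pixels -> shorter exposure, too few bright pixels ->
    longer exposure. `gray_frame` is a uint8 luma image."""
    n = gray_frame.size
    overexposed = np.count_nonzero(gray_frame >= 250) / n
    bright = np.count_nonzero(gray_frame >= 200) / n
    if overexposed > hi_frac:
        exposure /= step          # back off when highlights clip
    elif bright < lo_frac:
        exposure *= step          # brighten when nothing is bright
    return int(np.clip(exposure, EXPOSURE_MIN, EXPOSURE_MAX))
```

Called once per preview frame, a rule like this converges over a few seconds, which matches what the talk reports.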
It takes a few seconds to converge; it can be improved, [15:45.760 --> 15:54.320] I don't know exactly how yet, but this is good enough to take photos. The other thing is auto white [15:54.320 --> 15:59.400] balance. This is not that important, because you can do it in post-processing. Anyway, [15:59.400 --> 16:06.880] it did have manual white balance, so I felt this was easy enough to do. It will need some [16:06.880 --> 16:13.960] more work. Again, if it's too blue, you make it more red. If it's too red, you make it more blue. [16:13.960 --> 16:23.600] That's it, and it works well enough. And in a few hundred lines of code, I had simple, software-only [16:23.600 --> 16:32.840] auto exposure, and I got that merged. The next step is autofocus. Autofocus is something [16:32.840 --> 16:40.880] which deserves more respect, because you really want it tuned, but well, if you want to do it [16:40.880 --> 16:48.080] simply, you just start from infinity. You compute the blurriness of each frame, and you only [16:48.080 --> 16:53.800] need to look at part of the image if you want to save CPU, and you start your sweep: [16:53.800 --> 17:01.080] you start bringing the focus closer, and when the image gets more blurry, well, you stop. You [17:01.080 --> 17:07.760] might want to go a little bit back because of the physics of the lens, but this works [17:07.760 --> 17:18.800] much better than manual focus, and I got it merged rather quickly. The next step was video, so I decided [17:18.800 --> 17:28.840] that I liked the ideas from unicsy camera, and simply did 0.8-megapixel recording directly to the [17:28.840 --> 17:35.320] disk. I hacked Millipixels to save timestamped frames, and left post-processing until after the [17:35.320 --> 17:43.960] user presses the stop button. Easy to do, with obvious disadvantages, right? You are now limited [17:43.960 --> 17:49.760] by disk space, and maybe you could say it's not very nice to the flash to just stream raw [17:49.760 --> 17:58.240] data to it, but hey, the flash is cheap and the phone will die anyway. Post-processing is quite [17:58.240 --> 18:05.720] long; it takes about five times longer than the recording, and I guess this could be optimized. This is again [18:05.720 --> 18:15.240] my old code, so it's Python with FFmpeg. Ideally, there is hardware to do the encoding and we should [18:15.240 --> 18:23.880] use it, but I feel that doing that is an awful lot of work. Anyway, this is now upstream, so if you [18:23.880 --> 18:33.080] update your Librem 5, you should be able to take video, and I believe it's important to have [18:33.080 --> 18:41.120] something rather than no video recording at all. The next thing I want to talk about, which is very exciting, [18:41.120 --> 18:51.440] is phase detection autofocus. You may want to Google it for nice explanations, but basically they [18:51.440 --> 18:57.640] have selected some blue pixels; they are special, and they are special in the way that they only [18:57.640 --> 19:05.240] take light from certain directions. So you have a lens, and if it's focused, it's okay, [19:05.240 --> 19:13.920] the light comes and meets at the sensor, but if you are out of focus, a funny thing happens, [19:13.920 --> 19:25.400] and light from the left part of the lens ends up at a different place on the sensor than the light [19:25.400 --> 19:33.440] from the right part of the lens. And if you block the light from one direction on the chip, [19:33.440 --> 19:42.680] which is easy to do, you can use it for focus.
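Before going further with phase detection, here is a minimal sketch of the contrast-detection sweep described a little earlier; the lens-control function, step count and thresholds are hypothetical, and blurriness is estimated from a simple gradient measure on a central crop rather than whatever Millipixels actually uses:

```python
import numpy as np

def sharpness(gray):
    """Higher means sharper: mean absolute gradient over a central crop."""
    h, w = gray.shape
    crop = gray[h // 3: 2 * h // 3, w // 3: 2 * w // 3].astype(np.float32)
    return (np.abs(np.diff(crop, axis=0)).mean() +
            np.abs(np.diff(crop, axis=1)).mean())

def focus_sweep(get_frame, set_focus, steps=40):
    """Sweep from infinity (position 0) towards close focus, stop once
    sharpness clearly starts dropping, then return to the best position.
    `get_frame` returns a grayscale frame and `set_focus` moves the voice
    coil; both are placeholders for whatever the camera stack provides."""
    best_pos, best_val = 0, -1.0
    for pos in range(steps):
        set_focus(pos)
        val = sharpness(get_frame())
        if val > best_val:
            best_pos, best_val = pos, val
        elif val < best_val * 0.9:       # clearly past the peak
            break
    set_focus(best_pos)                  # step back to the sharpest point
    return best_pos
```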
So if you take a line from the sensor, and [19:42.680 --> 19:50.120] on the top you have the left special pixels, and on the bottom you have the right special [19:50.120 --> 19:56.200] pixels, for example, then you get this: the tree you see on the line will be at [19:56.200 --> 20:03.520] different positions in the two sets of special pixels. Well, and you can use this to focus, right? You [20:03.520 --> 20:10.920] just compute the correlation between the two lines, and it directly tells you how far out of focus you [20:10.920 --> 20:18.680] are, and in which direction you should focus. This was great to play with; it was nice hacking. [20:18.680 --> 20:28.400] Unfortunately, it is not too usable on the Librem 5. There are two issues: the special pixels are [20:28.400 --> 20:34.320] quite far apart, which they basically have to be, because if you made all the pixels special, [20:34.320 --> 20:41.600] you would lose your resolution; and it only works in the high-resolution mode, [20:41.600 --> 20:48.160] and you don't want to run your preview in high-resolution mode. So if someone is interested [20:48.160 --> 20:53.840] in phase detection autofocus, I have the code; the code is on GitLab somewhere. It was a [20:53.840 --> 21:02.480] fun experiment, it worked, but I decided that for real focus you would probably have to do a [21:02.480 --> 21:07.920] hybrid: do coarse focusing using phase detection, and then do contrast detection at [21:07.920 --> 21:16.240] the end. It seemed like a lot of work, and with a driver which only gives you 23 [21:16.240 --> 21:29.160] frames per second, and so on. Well, I decided not to take this much further. So I have a wish list, [21:29.160 --> 21:34.280] and I think I have, like, five minutes left. So five minutes talking, or five minutes questions? [21:34.280 --> 21:41.160] Including everything. Including everything. Okay. So I have a long wish list for the whole world. [21:41.160 --> 21:46.600] I would like to have better media-ctl support in the tools, because it just doesn't work. [21:46.600 --> 21:53.880] The APIs changed, and the tools didn't catch up. I would like a library for conversions between [21:53.880 --> 22:00.440] formats, and so on. I would like better-than-8-bit support. I would like multiple applications [22:01.080 --> 22:07.880] to be able to access the camera at the same time. Better support there would be nice, and someone should [22:07.880 --> 22:15.800] really solve the caching problem, because that's bad. For libcamera: I shouldn't really be hacking [22:15.800 --> 22:21.240] Millipixels, I should be hacking libcamera, but libcamera doesn't really support a software ISP, [22:21.240 --> 22:28.680] and I'm not a great C++ hacker, so I could do it, but they would reject the patches if I did. So I [22:28.680 --> 22:35.480] would much prefer them to do the preparation, and then I would fill in the code. And that's [22:35.480 --> 22:52.280] pretty much it. So time for questions. [22:52.280 --> 22:52.280] Sorry? [22:52.280 --> 23:00.280] We do want your work on the software ISP. [23:00.280 --> 23:07.160] The comment is that they want my work on the software ISP, and I guess I will want to cooperate, but [23:08.520 --> 23:15.480] libcamera is not easy for me to hack, because of the C++ stuff. So be patient, and maybe [23:17.960 --> 23:20.040] it would be better if someone else did it. [23:20.040 --> 23:33.080] Yes, so, well, there will not be much to see.
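As an aside on the phase-detection explanation above: the correlation step can be sketched roughly as follows. The extraction of the left and right special-pixel lines is entirely sensor-specific and not shown, and this uses a sum-of-squared-differences search rather than a true correlation, which amounts to the same idea; calibration from the resulting shift to an actual lens position is also sensor-specific.

```python
import numpy as np

def phase_shift(left_line, right_line, max_shift=32):
    """Estimate the disparity between the 'left' and 'right' special-pixel
    lines by sliding one against the other and picking the offset with the
    smallest squared difference. The sign says which way to move the lens,
    the magnitude roughly how far. Assumes both lines have equal length,
    longer than max_shift."""
    left = np.asarray(left_line, dtype=np.float32)
    right = np.asarray(right_line, dtype=np.float32)
    best_shift, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        a = left[max(s, 0): len(left) + min(s, 0)]
        b = right[max(-s, 0): len(right) + min(-s, 0)]
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift
```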
So, you know, Millipixels could use some work too, [23:33.080 --> 23:44.440] but I can take pictures, trust me. I didn't use autofocus for this, but, yes, I can do it. [23:44.440 --> 23:57.000] So it's now upstream, so you can just update the operating system and you will get it, [23:57.720 --> 24:04.200] and it should be possible to do just a short video recording too. So now you have all been [24:04.200 --> 24:14.680] recorded, and now the CPU is busy converting that. [24:14.680 --> 24:31.160] Okay, so I guess that's it.