[00:00.000 --> 00:18.120] Okay, so hello, I'm Pavel Machek and I'm here to talk about cameras, but you can also talk [00:18.120 --> 00:26.640] to me about clicker training horses, mobile phones, the kernel, a smartwatch based on ESP32, Mobian, [00:26.640 --> 00:36.200] or Maemo Leste. So first things first: Video4Linux is not for cameras, it's for frame [00:36.200 --> 00:42.240] grabbers, and they are really very different, which is basically what this talk will be about. [00:42.240 --> 00:50.840] They can handle remote controls, but they cannot do autofocus for you and so on. But the interface is [00:50.840 --> 00:57.480] fairly simple: you just open /dev/video0, select a format and capture. Unfortunately, what you [00:57.480 --> 01:06.920] get is a blurry photo which will be either all white or all black. That is, without autofocus and [01:06.920 --> 01:15.960] auto exposure. Anyway, there are phones with smart sensors, one such example is the PinePhone, and [01:15.960 --> 01:25.080] those are pretty close to the frame grabbers. They do basically everything in hardware. This [01:25.080 --> 01:30.520] used to be a pretty common design in the past, which made good sense at that point, because [01:30.520 --> 01:38.720] USB had limited bandwidth and you could not push uncompressed data through it. It's easy to [01:38.720 --> 01:47.400] use, but it doesn't make much sense today. If you have like five lenses on your phone, you [01:47.400 --> 01:55.480] don't want to have five JPEG encoders there. So we are moving to dumb sensors, which do the [01:55.480 --> 02:04.600] bare minimum. There you set parameters like exposure and gain, select an area and so on, and it just passes [02:04.600 --> 02:13.560] the raw data over a fast bus, and it usually ends up in your memory. And then you have a component [02:13.560 --> 02:22.000] called an ISP, an image signal processor, which will do the JPEG conversion and that kind of thing. [02:22.000 --> 02:30.320] Unfortunately, in the case of the interesting phones, which are the Librem 5, the PinePhone and the Pine [02:30.320 --> 02:37.880] Phone Pro, we either don't have the processor or we don't have drivers for it, so we can't use it. [02:37.880 --> 02:45.960] So this is a photo if you try to take it without the automatics. Can you [02:45.960 --> 02:55.840] recognize what's there? It's a USB connector. It's recognizable, I'd say. So what do we need to do? [02:55.840 --> 03:06.040] The Nokia N900 is another example of a complex design, which used to be very important historically. And [03:06.040 --> 03:14.000] actually the photos in this presentation are from the N900 with an open source stack. In real time, you need [03:14.000 --> 03:20.640] to do auto exposure, because otherwise you will have a black or white frame, and you need auto exposure [03:20.640 --> 03:26.800] for autofocus. On most cameras, you really want autofocus too, because you can't just focus to [03:26.800 --> 03:31.840] infinity and expect a good image. And that's pretty much everything you need to do for [03:31.840 --> 03:38.080] video recording in real time. Then you have preview. Preview is a bit less important than [03:38.080 --> 03:45.000] the video recording, but it's also important. You need to convert from Bayer to RGB. And you [03:45.000 --> 03:50.920] need to do gamma correction, because the sensor is linear on one side and the display expects something exponential on the [03:50.920 --> 04:00.160] other side. The GPU can help here.
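To make that preview step concrete, here is a minimal sketch (an illustration by the editor, not code from the talk or from any of the projects mentioned) of Bayer-to-RGB conversion plus gamma correction, assuming an RGGB mosaic with even dimensions and using crude 2x2 binning instead of proper demosaicing:

```python
import numpy as np

def debayer_and_gamma(raw, gamma=2.2):
    """Very naive preview path: 2x2-bin a Bayer RGGB frame to RGB, then
    gamma-encode so the linear sensor data looks reasonable on a display.
    `raw` is a 2D array with an RGGB mosaic (an assumption; real sensors
    differ in pattern and bit depth)."""
    r  = raw[0::2, 0::2].astype(np.float32)
    g1 = raw[0::2, 1::2].astype(np.float32)
    g2 = raw[1::2, 0::2].astype(np.float32)
    b  = raw[1::2, 1::2].astype(np.float32)
    rgb = np.dstack([r, (g1 + g2) / 2.0, b])
    rgb /= rgb.max() or 1.0                        # normalize to 0..1
    rgb = np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)  # gamma-encode for display
    return (rgb * 255).astype(np.uint8)
```

A real preview would also apply white balance gains before the gamma step; this per-pixel work is exactly where the GPU can help.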
And then there are extensive post-processing steps like auto [04:00.160 --> 04:07.840] white balance, lens shading compensation, getting rid of bad pixels and probably many others I [04:07.840 --> 04:15.520] forgot about. The advantage is that this can be done after taking a photo or after recording [04:15.520 --> 04:23.240] the video. And there are quite good tools for that, including RawTherapee and so on. So [04:23.240 --> 04:33.280] unlike the other parts, this got some work done before. So what we are [04:33.280 --> 04:39.520] talking about: for example, on the N900, you have an LED flash, which is a completely independent device. [04:39.520 --> 04:47.000] You have voice coil support for autofocus, which is again a separate device somewhere on [04:47.000 --> 04:53.920] I2C. Then you have two sensors, front and back camera. You have a GPIO switch to select which [04:53.920 --> 05:03.160] camera you want. And then you have the ISP, which is quite a complex piece of hardware, and which will [05:03.160 --> 05:14.040] not be important for this presentation, because we will do without it. So, tools to use. There's a [05:14.040 --> 05:24.120] great set of tools to use, but they have some limitations. One which looks very nice is GStreamer. [05:24.120 --> 05:30.440] And GStreamer is really great if you have an unlimited CPU. Unfortunately, you don't have an [05:30.440 --> 05:39.080] unlimited CPU. If I were willing to hack its C code, it would be very powerful, but there's a [05:39.080 --> 05:47.080] learning curve involved in that too. And in the end, GStreamer might be the right thing to use, [05:47.080 --> 05:54.240] but I found other tools easier. There's FFmpeg, which has a quite nice and very simple [05:54.240 --> 06:02.160] command line interface. So I used it in the end. I didn't really need much, just: please [06:02.160 --> 06:08.680] take these images and make a video out of them. There's Megapixels. Megapixels [06:08.680 --> 06:15.760] is a very nice application focused on mobile phones, very well optimized, but its origin [06:15.760 --> 06:26.880] is the PinePhone, and they don't use libcamera there. Then there's libcamera. Everybody [06:26.880 --> 06:35.000] says libcamera is the future of video on Linux. It probably is, but there are still many steps [06:35.000 --> 06:45.200] to get there. And there's Millipixels. Millipixels is a fork of Megapixels, which was ported to the [06:45.200 --> 06:53.360] Librem 5 and, more importantly, to libcamera. So in many ways, Megapixels actually currently [06:53.360 --> 06:59.560] looks nicer, because it is based on a newer GTK. On the other hand, Millipixels uses libcamera, [06:59.560 --> 07:09.880] and that's the important part. Okay, so this will be a bit of history and reasons and so [07:09.880 --> 07:16.960] on. I started to play with the camera on the PinePhone, and the first idea was: hey, GStreamer is [07:16.960 --> 07:22.840] there to capture video, let's use GStreamer, right? Okay. I started capturing raw Bayer [07:22.840 --> 07:31.040] data, because that should be most portable. I did some shell scripting with media-ctl [07:31.040 --> 07:38.240] to set up the pipelines. That's not fun. And then just used GStreamer to save the Bayer [07:38.240 --> 07:47.840] images to the disk. And I could do 200 kilopixels, which is not great, but better than no video [07:47.840 --> 08:00.320] at all, maybe.
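The capture side of that shell-scripting approach was roughly of this shape. This is a hedged reconstruction, not the actual scripts: the media-ctl pipeline setup is hardware-specific and not shown, and the device node, resolution and Bayer format below are placeholders.

```python
import subprocess

# Hypothetical capture step: after media-ctl has configured the sensor
# pipeline (not shown), dump raw Bayer frames straight to disk.
pipeline = (
    "v4l2src device=/dev/video0 num-buffers=300 ! "
    "video/x-bayer,format=rggb,width=640,height=480,framerate=15/1 ! "
    "multifilesink location=/tmp/frame-%05d.raw"
)
subprocess.run(["gst-launch-1.0"] + pipeline.split(), check=True)
```

Dumping unprocessed frames like this keeps the CPU out of the hot path; debayering and encoding can then happen offline.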
And I realized that the CPU can compress 70-kilopixel images in real time, [08:00.320 --> 08:08.440] which is, well, people were doing this, but that was some time ago. So I tried to improve. There's a [08:08.440 --> 08:18.960] YUV format the camera could do, which is the Bayer data already converted into something better suited for processing. [08:18.960 --> 08:28.000] And I could capture up to 0.9-megapixel video with that. And if you want, you can [08:28.000 --> 08:36.160] take a look at it; maybe it's useful for someone. But, well, then there was a reason to stop. The reason was [08:36.160 --> 08:45.080] called colorimetry. Someone in GStreamer basically introduced a regression, and all the [08:45.080 --> 08:52.600] GStreamer stuff stopped working. And I realized that, well, perhaps it wasn't that great to [08:52.600 --> 09:00.880] begin with anyway. So I started looking around. Quickly, I found libcamera, which is the future, [09:00.880 --> 09:09.560] right? And, well, it's C++. It didn't work at all on the PinePhone. So I had to do some quite heavy [09:09.560 --> 09:17.400] patching. I got some help on the mailing list. And I realized it has JPEG support, which, [09:17.400 --> 09:26.320] well, lets you avoid a lot of stuff, because JPEGs are already color-space converted and compressed [09:26.320 --> 09:35.760] and so on. And I realized that maybe JPEG is worth a second look. So I took one. You can't save [09:35.760 --> 09:43.320] raw data at two-megapixel resolution to flash, because the flash is not fast enough. But it was [09:43.320 --> 09:50.760] almost possible. So, hey, JPEGs are four times smaller, perhaps this could be made to work. And [09:50.760 --> 10:01.800] saving sound is easy. So maybe we can, well, maybe we already have everything we need. And this is [10:01.800 --> 10:10.080] how unicsy camera was born. Then I found a second problem. Someone decided that handing [10:10.080 --> 10:19.440] uncached data to user space is fun, and libcamera decided that passing uncached memory up to the [10:19.440 --> 10:26.200] application is great. I thought someone had stolen my CPU, because the performance penalty is about [10:26.200 --> 10:35.800] ten times. But no, it's just the way it is. I believe this needs to be fixed. If you fight with [10:35.800 --> 10:43.240] GStreamer and the performance seems too bad, this is probably why. And, I don't [10:43.240 --> 10:50.840] know, talk to the kind of person who can change it. By the way, in the old days, we used to have [10:50.840 --> 10:58.760] a read() interface to get data from the camera. This is now deprecated. Of course, it is faster to [10:58.760 --> 11:07.120] read() the data than to get uncached memory, right? That's how badly uncached memory sucks. Anyway, [11:07.120 --> 11:15.840] so unicsy camera started. Audio is really simple. You just create a small C application to [11:15.840 --> 11:22.520] record sound, split it into chunks so you can have easy processing later, and timestamp them, [11:22.520 --> 11:31.800] which is important for synchronization. libcamera with some small hacks can write 35 frames per [11:31.800 --> 11:38.640] second of two-megapixel data to the file system. All you need to do is add timestamps and sym- [11:38.640 --> 11:45.080] links so your preview can tell which is the latest image. Very easy. Control application: [11:45.080 --> 11:52.920] you probably don't want to start your video recording from the command line, but that's also very easy. You [11:52.920 --> 12:01.240] just take some GTK and Python.
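A minimal sketch of the timestamp-and-symlink idea, assuming the capture side simply hands over JPEG frames; the file names and spool directory here are made up for illustration, not taken from the actual code:

```python
import os
import time

OUT_DIR = "/tmp/capture"                 # hypothetical spool directory
LATEST = os.path.join(OUT_DIR, "latest.jpg")

def store_frame(jpeg_bytes):
    """Write one captured frame with its timestamp in the name and
    repoint a 'latest' symlink so the preview can pick it up."""
    os.makedirs(OUT_DIR, exist_ok=True)
    name = os.path.join(OUT_DIR, f"frame-{time.time():.3f}.jpg")
    with open(name, "wb") as f:
        f.write(jpeg_bytes)
    tmp = LATEST + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)                    # stale link from a previous run
    os.symlink(name, tmp)
    os.replace(tmp, LATEST)               # atomically repoint 'latest'
    return name
```

The timestamp in the file name is what later lets the post-processing step line the frames up with the separately recorded audio chunks.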
It creates timestamps, telling you, hey, start recording now, [12:01.240 --> 12:10.480] and it displays the preview, which is the most intensive thing there. And this is basically what runs [12:10.480 --> 12:16.040] during the recording, so this needs to be a bit optimized. Post-processing is not that [12:16.040 --> 12:23.120] important, right? So you just use Python and FFmpeg to compress the resulting video stream. Easy. [12:23.120 --> 12:29.560] This is something I was pretty happy about. If you want to replicate it, you will need some setup [12:29.560 --> 12:36.440] like patching libcamera and so on, but the code is out there, and there will be an easier method in [12:36.440 --> 12:44.240] the future. So I liked this solution, because I could use multiple languages to do my camera recording, [12:44.240 --> 12:51.600] the right language for the job. In the end, this was a few hundred lines of code total. And it could [12:51.600 --> 12:57.240] do some quite interesting stuff. Like you could take still pictures during recording: you simply [12:57.240 --> 13:04.000] copy the JPEG one more time. Easy. In video resolution, but if you are recording at two [13:04.000 --> 13:12.320] megapixels from a phone camera, I'd say that is going to be a pretty decent picture anyway. You could [13:12.320 --> 13:17.920] take photos with arbitrary delay. You could even take photos before the user asked for them, [13:17.920 --> 13:24.760] because you are taking all of them anyway, so you just don't delete them. This was fun. [13:24.760 --> 13:36.480] Then I got access to a Librem 5, which is different in important ways. It has dumb [13:36.480 --> 13:44.240] sensors, so it won't give you JPEG. But it had better support: libcamera worked there out of the [13:44.240 --> 13:51.920] box. There was the Millipixels application, as I explained before, which is a patched Megapixels, [13:51.920 --> 13:59.320] but it had no auto exposure, auto white balance, or autofocus support. It couldn't record video. [13:59.320 --> 14:08.160] And there are more issues on the Librem 5. The kernel driver could use some work. It only gives you 8-bit data, [14:08.160 --> 14:14.280] which is not really good enough for good photos. You can select one of three resolutions, [14:14.280 --> 14:23.160] one megapixel, three megapixels, or 13 megapixels, and for some reason only 23.5 frames per second [14:23.160 --> 14:32.440] works. I don't know why. The hardware has phase detection autofocus, which is a very cool [14:32.440 --> 14:39.240] sounding toy, and I have to thank Purism for their hardware and for the great work they did on the [14:39.240 --> 14:52.680] software stack. They are heroes. That's the best photo I got with the Nokia N900. So, Millipixels: it is a very simple application. [14:52.680 --> 14:58.600] There's a small development team, so it's easy to work with; [14:58.600 --> 15:04.800] it's plain C, it's easy to merge patches. It does all the processing on the CPU, [15:04.800 --> 15:12.840] which is great if you want to change the processing. So I started to do auto exposure, because that's [15:12.840 --> 15:21.680] the most important part, and I did a very simple one I had prototyped on the N900 years ago. So basically, [15:21.680 --> 15:31.320] if you have too many white pixels, like overexposed ones, you need to turn the exposure down, [15:31.320 --> 15:38.800] right? And if you don't have enough bright pixels, you need to turn the exposure up, [15:38.800 --> 15:45.760] and that's it, and this works well enough.
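A minimal sketch of the auto-exposure rule described above; this is not the code that was merged into Millipixels, the thresholds and step factor are guesses, and the exposure control range is a placeholder for whatever the driver actually exposes:

```python
import numpy as np

# Assumed control range; real drivers expose this via V4L2 controls.
EXPOSURE_MIN, EXPOSURE_MAX = 4, 65535

def adjust_exposure(gray_frame, exposure,
                    hi_frac=0.02, lo_frac=0.002, step=1.15):
    """One iteration of the simple rule from the talk: too many
    near-white pixels -> shorter exposure, too few bright pixels ->
    longer exposure. `gray_frame` is a uint8 luma image."""
    n = gray_frame.size
    overexposed = np.count_nonzero(gray_frame >= 250) / n
    bright = np.count_nonzero(gray_frame >= 200) / n
    if overexposed > hi_frac:
        exposure /= step          # back off when highlights clip
    elif bright < lo_frac:
        exposure *= step          # brighten when nothing is bright
    return int(np.clip(exposure, EXPOSURE_MIN, EXPOSURE_MAX))
```

Called once per preview frame, a rule like this converges over a few seconds, which matches what the talk reports.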
It takes a few seconds to converge; it can be improved, [15:45.760 --> 15:54.320] I don't know exactly how yet, but this is good enough to take photos. The other thing is auto white [15:54.320 --> 15:59.400] balance. This is not that important, because you can do it in post-processing. Anyway, [15:59.400 --> 16:06.880] it did have manual white balance, so I felt this was easy enough to do. It will need some [16:06.880 --> 16:13.960] more work. Again, if it's too blue, you make it more red. If it's too red, you make it more blue. [16:13.960 --> 16:23.600] That's it, and it works well enough. And in a few hundred lines of code, I had simple, software-only [16:23.600 --> 16:32.840] auto exposure, and I got that merged. The next step is autofocus. Autofocus is something [16:32.840 --> 16:40.880] which deserves more respect, because you really want it tuned, but well, if you want to do it [16:40.880 --> 16:48.080] simply, you just start from infinity. You compute the blurriness of each frame, and you only [16:48.080 --> 16:53.800] need to look at part of the image if you want to save CPU, and you start your sweep: [16:53.800 --> 17:01.080] you start bringing the focus closer, and when the image gets more blurry, well, you stop. You [17:01.080 --> 17:07.760] might want to go a little bit back because of the physics of the lens, but this works [17:07.760 --> 17:18.800] much better than manual focus, and I got it merged rather quickly. The next step was video, so I decided [17:18.800 --> 17:28.840] that I liked the ideas from unicsy camera, and simply did 0.8-megapixel recording directly to the [17:28.840 --> 17:35.320] disk. I hacked Millipixels to save timestamped frames, and left post-processing until after the [17:35.320 --> 17:43.960] user presses the stop button. Easy to do, with obvious disadvantages, right? You are now limited [17:43.960 --> 17:49.760] by disk space, and maybe you could say it's not very nice to the flash to just stream raw [17:49.760 --> 17:58.240] data to it, but hey, the flash is cheap and the phone will die anyway. Post-processing is quite [17:58.240 --> 18:05.720] long; it takes about five times longer than the recording, and I guess this could be optimized. This is again [18:05.720 --> 18:15.240] my old code, so it's Python with FFmpeg. Ideally, there is hardware to do the encoding and we should [18:15.240 --> 18:23.880] use it, but I feel that doing that is an awful lot of work. Anyway, this is now upstream, so if you [18:23.880 --> 18:33.080] update your Librem 5, you should be able to take video, and I believe it's important to have [18:33.080 --> 18:41.120] something rather than no video recording at all. The next thing I want to talk about, which is very exciting, [18:41.120 --> 18:51.440] is phase detection autofocus. You may want to Google it for nice explanations, but basically they [18:51.440 --> 18:57.640] have selected some blue pixels; they are special, and they are special in the way that they only [18:57.640 --> 19:05.240] take light from certain directions. So you have a lens, and if it's focused, it's okay, [19:05.240 --> 19:13.920] the light comes and meets at the sensor, but if you are out of focus, a funny thing happens, [19:13.920 --> 19:25.400] and light from the left part of the lens ends up at a different place on the sensor than the light [19:25.400 --> 19:33.440] from the right part of the lens. And if you block the light from one direction on the chip, [19:33.440 --> 19:42.680] which is easy to do, you can use it for focus.
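Before going further with phase detection, here is a minimal sketch of the contrast-detection sweep described a little earlier; the lens-control function, step count and thresholds are hypothetical, and blurriness is estimated from a simple gradient measure on a central crop rather than whatever Millipixels actually uses:

```python
import numpy as np

def sharpness(gray):
    """Higher means sharper: mean absolute gradient over a central crop."""
    h, w = gray.shape
    crop = gray[h // 3: 2 * h // 3, w // 3: 2 * w // 3].astype(np.float32)
    return (np.abs(np.diff(crop, axis=0)).mean() +
            np.abs(np.diff(crop, axis=1)).mean())

def focus_sweep(get_frame, set_focus, steps=40):
    """Sweep from infinity (position 0) towards close focus, stop once
    sharpness clearly starts dropping, then return to the best position.
    `get_frame` returns a grayscale frame and `set_focus` moves the voice
    coil; both are placeholders for whatever the camera stack provides."""
    best_pos, best_val = 0, -1.0
    for pos in range(steps):
        set_focus(pos)
        val = sharpness(get_frame())
        if val > best_val:
            best_pos, best_val = pos, val
        elif val < best_val * 0.9:       # clearly past the peak
            break
    set_focus(best_pos)                  # step back to the sharpest point
    return best_pos
```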
So if you take a line from the sensor, and [19:42.680 --> 19:50.120] on the top you have the left special pixels, and on the bottom you have the right special [19:50.120 --> 19:56.200] pixels, for example, then you get this: the tree you see on the line will be at [19:56.200 --> 20:03.520] different positions in the two sets of special pixels. Well, and you can use this to focus, right? You [20:03.520 --> 20:10.920] just compute the correlation between the two lines, and it directly tells you how far out of focus you [20:10.920 --> 20:18.680] are, and in which direction you should focus. This was great to play with; it was nice hacking. [20:18.680 --> 20:28.400] Unfortunately, it is not too usable on the Librem 5. There are two issues: the special pixels are [20:28.400 --> 20:34.320] quite far apart, which they basically have to be, because if you made all the pixels special, [20:34.320 --> 20:41.600] you would lose your resolution; and it only works in the high-resolution mode, [20:41.600 --> 20:48.160] and you don't want to run your preview in high-resolution mode. So if someone is interested [20:48.160 --> 20:53.840] in phase detection autofocus, I have the code; the code is on GitLab somewhere. It was a [20:53.840 --> 21:02.480] fun experiment, it worked, but I decided that for real focus you would probably have to do a [21:02.480 --> 21:07.920] hybrid: do coarse focusing using phase detection, and then do contrast detection at [21:07.920 --> 21:16.240] the end. It seemed like a lot of work, and with a driver which only gives you 23 [21:16.240 --> 21:29.160] frames per second, and so on. Well, I decided not to take this much further. So I have a wish list, [21:29.160 --> 21:34.280] and I think I have, like, five minutes left. So five minutes talking, or five minutes questions? [21:34.280 --> 21:41.160] Including everything. Including everything. Okay. So I have a long wish list for the whole world. [21:41.160 --> 21:46.600] I would like to have better media-ctl support in the tools, because it just doesn't work. [21:46.600 --> 21:53.880] The APIs changed, and the tools didn't catch up. I would like a library for conversions between [21:53.880 --> 22:00.440] formats, and so on. I would like better-than-8-bit support. I would like multiple applications [22:01.080 --> 22:07.880] to be able to access the camera at the same time. Better support there would be nice, and someone should [22:07.880 --> 22:15.800] really solve the caching problem, because that's bad. For libcamera: I shouldn't really be hacking [22:15.800 --> 22:21.240] Millipixels, I should be hacking libcamera, but libcamera doesn't really support a software ISP, [22:21.240 --> 22:28.680] and I'm not a great C++ hacker, so I could do it, but they would reject the patches if I did. So I [22:28.680 --> 22:35.480] would much prefer them to do the preparation, and then I would fill in the code. And that's [22:35.480 --> 22:52.280] pretty much it. So time for questions. [22:52.280 --> 22:52.280] Sorry? [22:52.280 --> 23:00.280] We do want your work on the software ISP. [23:00.280 --> 23:07.160] The comment is that they want my work on the software ISP, and I guess I will want to cooperate, but [23:08.520 --> 23:15.480] libcamera is not easy for me to hack, because of the C++ stuff. So be patient, and maybe [23:17.960 --> 23:20.040] it would be better if someone else did it. [23:20.040 --> 23:33.080] Yes, so, well, there will not be much to see.
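As an aside on the phase-detection explanation above: the correlation step can be sketched roughly as follows. The extraction of the left and right special-pixel lines is entirely sensor-specific and not shown, and this uses a sum-of-squared-differences search rather than a true correlation, which amounts to the same idea; calibration from the resulting shift to an actual lens position is also sensor-specific.

```python
import numpy as np

def phase_shift(left_line, right_line, max_shift=32):
    """Estimate the disparity between the 'left' and 'right' special-pixel
    lines by sliding one against the other and picking the offset with the
    smallest squared difference. The sign says which way to move the lens,
    the magnitude roughly how far. Assumes both lines have equal length,
    longer than max_shift."""
    left = np.asarray(left_line, dtype=np.float32)
    right = np.asarray(right_line, dtype=np.float32)
    best_shift, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        a = left[max(s, 0): len(left) + min(s, 0)]
        b = right[max(-s, 0): len(right) + min(-s, 0)]
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift
```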
So, you know, Millipixels could use some work too, [23:33.080 --> 23:44.440] but I can take pictures, trust me. I didn't use autofocus for this, but, yes, I can do it. [23:44.440 --> 23:57.000] So it's now upstream, so you can just update the operating system and you will get it, [23:57.720 --> 24:04.200] and it should be possible to do just a short video recording too. So now you have all been [24:04.200 --> 24:14.680] recorded, and now the CPU is busy converting that. [24:14.680 --> 24:31.160] Okay, so I guess that's it.