Good afternoon all. I'm here to talk about the VVC decoder in FFmpeg. I'm going to introduce VVC. I should imagine if you're in this room you're already somewhat familiar, or at least interested, but I'll refresh some of the coding tools and some of the objectives that it has. I'll talk about where FFVVC, the FFmpeg VVC decoder, fits into that, and again, what new coding tools VVC introduces. I'll talk a bit about the threading model, which is one of the more interesting things for those of you who already have some experience with FFmpeg. Then I'll go over performance, how that compares to previous codecs and the other VVC decoders out there. I'll conclude the talk by talking a little bit about the Google Summer of Code program this summer and the next steps for FFmpeg.
First of all, a disclaimer. I did not write very much of this code at all. The credit should go to Nuo Mi in China, who unfortunately couldn't be here today. Who am I? I am Frank Plowman. You can find me at frankplowman.com. There are various other contact details on there. I was one of the Google Summer of Code students this summer working on this project. As you saw in the agenda, we'll talk a little bit more about what that involved later.
Going into the introduction, then. VVC, or (not H.265; H.266, that should read) is a new standard from JVET. It's succeeding H.264 and HEVC, so quite big boots to fill. It's got two main objectives. It aims to have 50% lower bit rates than HEVC for the same quality of video. As the name suggests, versatility is the other main objective. That involves a lot of new coding tools for things like screen content coding, adaptive resolution change for things like video teleconferencing, and independent sub-pictures. Versatile applications underlie a lot of the decisions made in the design of VVC.
The open source landscape of VVC. For encoders, you have VTM, which is the reference software. You're not really going to want to use that for practical encoding.
You have VVenC, which is developed by the Fraunhofer Institute. That is a practical encoder, very fast. Finally, you have uvg266, which is an open source project developed by the community. Then on the decoder side, you again have VTM. You have the dual of VVenC, VVdeC, which I believe there's a lightning talk on in a little while. It's a very fast, very good decoder, also developed by the Fraunhofer Institute. You have OpenVVC, which is a community project VVC decoder, which is relatively performant for a single core. Unfortunately, that has now been abandoned. I don't think there's been a commit in about two years. Finally, we have what this talk is introducing, FFVVC.
The state of FFVVC: the C code was merged at the start of the year. I believe it was a month ago exactly now. As Jean-Baptiste talked about in his talk a little while ago, we believe it will be in FFmpeg 7.0, but possibly under some sort of experimental flag. The inter-prediction assembly was merged about a week ago. We have some other assembly that has been written and is in the review process. It's important to note, though, that FFVVC is not yet Main 10 complete. There are some coding tools that are missing. The big one that we've heard about from the community is that intra block copy support is not yet implemented. There is a patch set for that in the works. I'd be doubtful it will be in the 7.0 release, though. Most of the other features that are missing are things that are a bit more exotic than intra block copy. Features such as wrap-around for 360-degree video are not yet implemented, nor are independent sub-pictures or reference picture resizing, some of the more exotic stuff, but that will all come in time.
This shows the assembly status: what has been written so far, what we're prioritizing, and what we've been able to reuse from HEVC. Things that we've prioritized so far are largely low-hanging fruit.
For the inter-prediction, we were able to reuse quite a lot from HEVC for good gain. SAO is entirely identical between HEVC and VVC, so we've been able to rip that directly. Inter-prediction and ALF are both big contributors to the decode time in C only, so they're high priority. One of the GSoC projects last year was working on the ALF stuff; we'll talk about that a bit more, so that's on its way. For inter, we've managed to get some bits out of dav1d for the more generic stuff, just like averaging functions. That's been effective in getting a quick speed-up there, but we need your help with this. There aren't many of us working on this at the moment and there's a lot of assembly to write. That's going to be key to performance, as we'll see in the performance section later on.
Decoder size. I believe this is now the biggest decoder in FFmpeg in terms of lines of C. I'm not sure how it compares to dav1d, but even being the biggest decoder in FFmpeg, it's still much smaller than OpenVVC and VVdeC, as you can see here. How do we manage to achieve that? By being in FFmpeg, basically, we're able to reuse parts from previous codecs. We're able to use the CBS and CABAC readers you can see there and reuse whole swathes of code, and also parts of the binary, so it's kind of hard to measure, but you get more bang for your buck in terms of the size of a compiled libavcodec. In the future I believe we may also be able to use some aspects of hardware decoder APIs to do the DPB reference management. We managed to be much, much smaller, and that's one of the main reasons motivating putting this inside of FFmpeg. The other one is FFmpeg's vibrant community, we can say, which hopefully will help maintain this into the future.
Moving on to what's new in VVC. There are a lot of new coding tools, like a dizzying amount. You can see here, you could talk for an hour, and many people have, about even a subset of these.
As I say, we haven't implemented them all yet, but there's loads to play with, which feed back into the ability to make much smaller bitstreams and also to make more versatile video content.
What FFVVC introduces that's new for FFmpeg is this stage-based thread model. Lots of previous codecs have the frame and slice thread models, which do well for a low number of cores but hit some sort of ceiling at a certain point. FFVVC uses a much more fine-grained thread model, which is able to allocate threads based on the stage of decoding of individual CTUs. As that says, it means we're able to much better utilize higher core counts, and so with our C code and no assembly we're able to decode 4K at over 30 fps on, you know, a relatively high-end desktop processor, but I think that's really impressive. This thread model is possible to implement in HEVC; FFmpeg's HEVC decoder does not use it. I think it's also possible to do stage-based decoding in AV1, but it wasn't a factor in the design of AVC.
The way that it works is you divide each CTU into several stages of decoding; they're all listed there. The key thing is that each stage depends only on the current or previous stage of the neighboring CTUs. So you can start doing the deblock of one CTU before you've done the parse, even, in the top left corner very far away. Sorry, before you've done the intra; I think you have to do the parse for all first. The effect you get from this is this sort of wavefront of each of the different stages that progresses across the image, and it allows you to use many more cores.
To allocate those cores we've had to introduce this new AVExecutor utility, which has been made available in libavutil, so you can use this for other projects inside FFmpeg. It's a really simple algorithm at the moment, but centralizing the control of the allocation of threads, you know, not repeating yourself, means we now have one location where we can make improvements here.
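[Editor's sketch] The stage dependency rule described above can be illustrated with a small simulation. This is a minimal sketch and not FFVVC's or libavutil's actual code. The stage names, the rule that parsing runs serially in raster order, the rule that each later stage waits for the previous stage of the CTU and its eight neighbours, and the functions `deps` and `schedule` are all assumptions made for illustration.

```python
from itertools import product

# Illustrative stage names only; the talk's slide lists the real ones.
STAGES = ["parse", "inter", "recon", "deblock", "sao", "alf"]


def deps(x, y, s, width, height):
    """Tasks that must finish before stage s of CTU (x, y) may run (assumed rule)."""
    if s == 0:
        # Assumption: parsing is serial, in raster order.
        i = y * width + x
        return [] if i == 0 else [((i - 1) % width, (i - 1) // width, 0)]
    # Assumption: a stage needs the previous stage of this CTU and its 8 neighbours.
    return [
        (nx, ny, s - 1)
        for nx, ny in product((x - 1, x, x + 1), (y - 1, y, y + 1))
        if 0 <= nx < width and 0 <= ny < height
    ]


def schedule(width, height, workers):
    """Greedy executor model: every time step runs up to `workers` ready tasks."""
    pending = set(product(range(width), range(height), range(len(STAGES))))
    done, steps = set(), []
    while pending:
        ready = sorted(
            (t for t in pending if all(d in done for d in deps(*t, width, height))),
            key=lambda t: (t[2], t[1], t[0]),  # earliest stage first, then raster order
        )
        batch = ready[:workers]
        steps.append(batch)
        done.update(batch)
        pending.difference_update(batch)
    return steps


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} worker(s): {len(schedule(4, 4, n))} time steps")
```

In this model, a later stage of the top-left CTU becomes ready while CTUs further along are still being parsed, which is the wavefront effect the talk describes. Keeping the ready-task selection in one function mirrors the point about centralizing thread allocation in one place.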
It's a really simple algorithm. It's based on, I think, some of the earlier implementations inside Python and Java's executor structures, or whatever they call them, but having that one thing in one location means it can be used throughout FFmpeg to improve multi-threading.
So, on to the performance section. At the moment it's pretty slow compared to previous codecs. I mean, this is to be expected to a certain extent: VVC is just a more complex codec than previous-generation stuff. It has to be in order to achieve high rates of compression. For the SIMD false and true figures for FFVVC here, this is with stuff that's not yet in FFmpeg master; this is the current state of the development staging repo. You can see we are getting over 200%, over a doubling of speed for FFVVC already, but there's a long way to go, as you can see from the really impressive assembly speed-up dav1d has there.
But our multi-threading picture is quite different. That shows you the effect of doing that stage-based multi-threading: we're just much more easily able to use higher numbers of cores. Note here that this is using hyperthreading, which is why you've got quite the knee there at six threads, but below six threads it's really not far off the ideal: you add a core, you get the same multiplicative increase in the speed-up.
Comparing it to VVdeC, then. VVdeC uses the same stage-based threading model, so you're getting very similar performance between FFVVC and VVdeC. OpenVVC uses the conventional frame- and tile-based multi-threading techniques, so the figure on the left-hand side there is quite useful for comparing the effect of this new threading model. Then on the right-hand side, you can see the single-threaded, C-only performance of FFVVC and VVdeC is pretty much on par.
VVdeC has quite significantly different performance on different operating systems, but the average between the two is pretty much the same. On 4K it's a similar picture, but everything just gets slightly more pronounced. OpenVVC is slower; the speed-up that we're getting from using more threads matters even more for larger videos, so you can see that effect here. But we're still lacking on the assembly front. VVdeC has a lot of assembly already for quite a few different architectures, and you can see that they're really pulling ahead once you enable the assembly there. Theoretically, the FFmpeg VVC decoder should have a somewhat higher ceiling, due to the fact that FFVVC's assembly will be handwritten, whereas VVdeC's uses intrinsics and, on some architectures, SIMD Everywhere as a portable SIMD library, which introduces some overhead. So with enough time, hopefully FFVVC can be even faster, but we've got a long way to go to catch up to them at the moment.
So, just wrapping up with the last couple of things here, talking about the Google Summer of Code program in 2023. There were two Google Summer of Code students contributing to the VVC decoder this summer: myself and Shaun Loo. I worked on a lot of the stuff that was added in version 2 of VVC. That includes the support for 12 and 14 bit, which needs the range extension, which changes various things in the entropy coder when you get to higher bit depths. I've also been working on AVX2 optimizations for the inverse transforms. They all had to be written from scratch in the end; there's not very much that you can share between HEVC and VVC, due to the way that the HEVC transforms are written in FFmpeg. Shaun Loo is also working on assembly, for the filters, some of which is in the process of being upstreamed at the moment, I believe.
So, next steps. As I'm sure this performance section and what we've been working on have shown, we've got a very solid baseline with the C performance and the multi-threading, but we need lots more assembly in there to be able to compete with existing decoders. So: upstreaming what we've already got, and implementing more functions in assembly. Also more architectures: ARM is going to be a Google Summer of Code project for this summer, and potentially also RISC-V. There's a lot of work on RISC-V assembly for FFmpeg at the moment, so we'll need that in time. Polishing off the Main 10 conformance, so implementing those features that I mentioned as missing earlier; intra block copy in particular is a high priority. Thread optimization for 32-plus cores: we may be able to improve the AVExecutor utility for higher core counts if there's sufficient demand for that. And a GPU-based decoder: a lot of the stuff in VVC is really well designed, particularly to do with the separation of stages that we saw earlier, which means that it's really well suited to decoding on the GPU, so that's something on the far horizon.
Concluding, then. FFmpeg now has a VVC decoder. I've introduced that new threading model and shown some of the benefits of it, talked about the C and multi-threading performance and how that compares with VVdeC, and given an update on the status, including the optimized assembly we're currently working on. We'd love help with this, especially with the assembly. There are just very few of us, and we only work in our free time, so progress on that front has been relatively slow. So, patches welcome. Alright, thank you very much for listening.
If anyone's got any questions, I'll be happy to try and answer them as best I can. As I said in that disclaimer, I did not write very much of this code. I just did, you know, the bits I've talked about, and then I've worked on doing bug fixes. One thing I forgot to mention: part of why we're going to have to be experimental is OSS-Fuzz. We've only recently started being fuzzed, since we went into FFmpeg master, so we're getting a lot of reports from that at the moment that we're trying to work through before we go into a normal release. But I'll try and answer any questions as best I can. Yes?
So the question was: have we considered trying to use C intrinsics as a step between having fully C code and having handwritten assembly for everything? It's not the FFmpeg way. In FFmpeg everything is handwritten assembly. I think there's a little bit in libswscale, I believe, but FFmpeg is in the process of removing that tiny bit of C intrinsics that we still have. So we're probably not going to do that, just because you can go faster with handwritten assembly. If we're trying to get that same performance and even beat VVdeC, I think it's the only way to go, really.
Okay, there are no more questions. Thank you very much.