[00:00.000 --> 00:11.200] So, we continue with our next talk, again about codecs, this time about VVC, with two [00:11.200 --> 00:17.200] projects about encoding and decoding VVC. Please welcome Adam.

[00:17.200 --> 00:25.520] Hi, everyone. So today I want to introduce you to VVenC and VVdeC; those are open source [00:25.520 --> 00:27.720] implementations for VVC.

[00:27.720 --> 00:34.520] Now, to pick everyone up about VVC: so VVC is this new codec that was finalized just [00:34.520 --> 00:39.720] over two years ago, and if you want to know one thing about VVC, it's basically that VVC allows [00:39.720 --> 00:45.520] you to have the same quality as HEVC at half the bitrate. And on top of that, it was developed [00:45.520 --> 00:51.160] by JVET, which is the Joint Video Experts Team, and it's called "versatile" because [00:51.160 --> 00:53.480] it's applicable in versatile scenarios, right? [00:53.480 --> 00:59.440] So we have support for screen content, HDR, as we heard in the previous talk, immersive [00:59.440 --> 01:05.600] video, 8K, and we can do some fancy stuff like adaptive streaming with open GOPs.

[01:05.600 --> 01:14.320] All right, so now let me talk you through a little bit of the background of our projects, [01:14.320 --> 01:16.320] VVdeC and VVenC.

[01:16.320 --> 01:21.920] So of course, those are both, you know, team efforts, right? [01:21.920 --> 01:27.600] They're developed by a whole team of researchers at Fraunhofer HHI, mostly in the Video [01:27.600 --> 01:29.040] Coding Systems group. [01:29.040 --> 01:34.000] Now, Fraunhofer HHI, if you don't know it: modern video coding probably wouldn't [01:34.000 --> 01:41.400] be what it is if it wasn't for HHI, and HHI is part of the biggest European research organization, [01:41.400 --> 01:45.400] the Fraunhofer Society, which is a big German non-profit.

[01:45.400 --> 01:50.200] And then about me: I'm Adam Wieckowski, I've been at HHI since 2016, I've been leading [01:50.200 --> 01:55.640] the projects since 2019, and since about a year I'm also the co-head of the Video Coding [01:55.640 --> 01:58.720] Systems group.

[01:58.720 --> 02:00.760] So why did we even start the software project? [02:00.760 --> 02:07.160] So basically, you know, there was HEVC, for which the test model was HM, and HEVC uses [02:07.160 --> 02:08.160] square blocks. [02:08.160 --> 02:14.440] So they had this method of indexing blocks within the frames using this [02:14.440 --> 02:18.840] z-scan index method, which is really amazing for square blocks. [02:18.840 --> 02:25.200] And then, you know, they were exploring VVC based on the exploration model, the JEM, which was [02:25.200 --> 02:29.280] still based on HM, except VVC supports rectangular blocks, right? [02:29.280 --> 02:32.520] So it's more than only square blocks, and there was really a lot of code for working [02:32.520 --> 02:35.320] around this z-scan index thing. [02:35.320 --> 02:37.880] And at HHI, we wanted to do even more than that. [02:37.880 --> 02:41.440] So we started work on our own partitioner, and we just decided that this was not going [02:41.440 --> 02:42.440] to work. [02:42.440 --> 02:46.800] What we had to do, we basically had to write our own software to deal with it, which [02:46.800 --> 02:54.160] we very creatively named the "next software", which later became the VTM 1.0. [02:54.160 --> 02:58.800] And basically, the biggest difference is we had one big map that was mapping the position [02:58.800 --> 03:04.280] within the frame to, like, an object that was describing the current coding block.
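To make that concrete, here is a minimal sketch of such a structure, with invented names and a much-simplified layout: a flat per-picture map from sample position to the coding block covering it, which handles arbitrary rectangular blocks just as easily as square ones.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified illustration (not the actual VTM code): one
// entry per 4x4 luma unit (the smallest block granularity), each pointing
// to the coding block that covers it.
struct CodingBlock {
  int x, y, width, height;  // position and size in luma samples
  // ... prediction mode, motion data, etc.
};

class BlockMap {
public:
  BlockMap(int picWidth, int picHeight)
      : stride_((picWidth + 3) / 4),
        map_(static_cast<size_t>(stride_) * ((picHeight + 3) / 4), nullptr) {}

  // Register a block: mark every 4x4 unit it covers.
  void insert(CodingBlock& cb) {
    for (int y = cb.y / 4; y < (cb.y + cb.height) / 4; ++y)
      for (int x = cb.x / 4; x < (cb.x + cb.width) / 4; ++x)
        map_[static_cast<size_t>(y) * stride_ + x] = &cb;
  }

  // O(1) lookup of the block covering a sample position, e.g. to fetch a
  // neighbor's prediction mode or motion data.
  CodingBlock* at(int xPos, int yPos) const {
    return map_[static_cast<size_t>(yPos / 4) * stride_ + (xPos / 4)];
  }

private:
  int stride_;
  std::vector<CodingBlock*> map_;
};
```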
[03:04.280 --> 03:09.480] So the next software, it became the VTM, which is the reference software for the VVC standard. [03:09.480 --> 03:13.800] And you can see there in the graph how the VTM was developing over time with regard [03:13.800 --> 03:19.320] to the gains over HM, with the encoding time, decoding time. [03:19.320 --> 03:24.080] And here you can also see how we started our implementation projects from the VTM. [03:24.080 --> 03:28.720] So from the VTM 3.0, super early, we already started work on VVdeC. [03:28.720 --> 03:35.240] And then from VTM 6.0, we started the work on VVenC.

[03:35.240 --> 03:39.960] Then in early 2020, Benjamin Bross became my boss. [03:39.960 --> 03:45.240] We basically started the VCS group, and he brought up the idea that maybe we could make the projects [03:45.240 --> 03:51.520] open source, which we did, initially, at the end of 2020, under a little bit shaky license. [03:51.520 --> 03:55.640] But with some back and forth with the headquarters, we were able to change it to, like, a modified [03:55.640 --> 03:57.080] BSD 3-Clause license. [03:57.080 --> 04:02.240] And after some more back and forth, we actually have an unmodified, like, standard open source [04:02.240 --> 04:04.240] license, the BSD 3-Clause Clear.

[04:04.240 --> 04:12.720] All right, so let's talk some more about the projects, you know, some hard facts. [04:12.720 --> 04:19.280] So as I already mentioned, they are based off the VTM. They are both written fully in C++, [04:19.280 --> 04:21.720] but we do have, like, a pure C interface. [04:21.720 --> 04:29.160] So you can integrate them into frameworks or, you know, just use them from pure C code. [04:29.160 --> 04:33.720] Those are C++ projects, so they are object-based, but it's kept very simple. [04:33.720 --> 04:38.480] So we try to, you know, not hide anything, no getters, no setters, and, like, have all [04:38.480 --> 04:42.160] the control over what is happening in the memory.

[04:42.160 --> 04:48.320] Contrary to some other, you know, some other projects, we do not do assembler at all. [04:48.320 --> 04:53.440] We only do vectorization using intrinsics, which of course has the advantage that we [04:53.440 --> 05:00.640] get stuff like ARM support for free through this amazing SIMD Everywhere library. [05:00.640 --> 05:07.520] Also support for Wasm cross-compilation, so WebAssembly, which we also did.

[05:07.520 --> 05:11.360] And what we try to do, we try to make those projects as simple as possible. [05:11.360 --> 05:16.000] So basically, we only expose options that are use-case relevant; but, like, the coding [05:16.000 --> 05:21.360] options, everything that's, like, connected to efficiency, we try to define for the [05:21.360 --> 05:24.760] user, to just have the simplest experience possible. [05:24.760 --> 05:30.960] Yeah, they're both available on GitHub under the BSD 3-Clause Clear license, as I already [05:30.960 --> 05:31.960] mentioned.

[05:31.960 --> 05:34.160] All right, so how do we do the development? [05:34.160 --> 05:37.520] The development is done internally within HHI. [05:37.520 --> 05:46.400] We have our own main Git repo, from which we push squashed updates to the [05:46.400 --> 05:47.400] GitHub repo. [05:47.400 --> 05:48.400] Why is it internal?
[05:48.400 --> 05:53.360] I know many people find it a little bit, you know, there might be issues with that. But, [05:53.360 --> 05:58.280] you know, here on the right, you can see, or maybe you cannot see, a typical merge [05:58.280 --> 06:03.440] request that I would, you know, do on the internal repo, and I think this [06:03.440 --> 06:08.480] is just not something that is ready to be released to the public, you know. [06:08.480 --> 06:14.040] So I would rather, you know, hide those kinds of hiccups, and yeah, it just takes too much [06:14.040 --> 06:19.000] time to make stuff nice for, you know, for being public. [06:19.000 --> 06:23.680] And I also think not everyone at HHI might be comfortable, you know, being, like, a public [06:23.680 --> 06:24.680] developer. [06:24.680 --> 06:29.480] But yeah, all of the stuff that we have internally, it eventually goes to the public repo, either [06:29.480 --> 06:35.080] for new releases, or to fix bugs or issues; you know, if we develop a big new feature, we would [06:35.080 --> 06:40.440] push it, and, you know, if someone was to make a large contribution, of course, to rebase [06:40.440 --> 06:41.440] the code.

[06:41.440 --> 06:46.720] All right, so VVdeC, the decoder project. [06:46.720 --> 06:51.840] So the highlights: it's fully compliant with the Main 10 profile of VVC, of course, give [06:51.840 --> 06:56.720] or take a few bugs. So actually, it was supposed to be fully compliant since version 1.0. [06:56.720 --> 07:00.680] But you know, stuff happens; we find bugs, we fix the bugs. [07:00.680 --> 07:04.040] But basically, there is no feature that is really missing. [07:04.040 --> 07:10.280] It's multi-platform, so it works on all the major OSs, and it also works [07:10.280 --> 07:13.880] on different architectures, thanks to SIMD Everywhere, as mentioned. [07:13.880 --> 07:19.520] So we have, like, x86 support, ARM; I also saw people, you know, doing builds for [07:19.520 --> 07:21.320] RISC-V. [07:21.320 --> 07:23.200] So this is very nice.

[07:23.200 --> 07:28.280] Also, one thing I'm kind of proud of is that from the first version, we have, like, [07:28.280 --> 07:30.200] a unified thread pool. [07:30.200 --> 07:35.440] So, also something that was already talked about: basically, all the tasks [07:35.440 --> 07:40.120] are collected within this one thread pool that just balances itself, right? [07:40.120 --> 07:44.880] There are, like, no frame threads, no slice threads, which also means our multi-threading is [07:44.880 --> 07:46.920] independent of bitstream features, right? [07:46.920 --> 07:53.200] We don't need tiles or slices to be able to parallelize, so I think this is a really nice [07:53.200 --> 07:54.200] feature.
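To illustrate the concept (a toy sketch, not VVdeC's actual scheduler; all names here are invented): one shared pool, where every task carries a readiness check for its dependencies, and idle workers simply pick up whatever is currently runnable.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Toy unified thread pool: every task (CTU line, filtering, ...) goes into
// one queue together with a readiness check, and workers pick up whatever
// is currently runnable, so the load balances itself without dedicated
// frame or slice threads. Tasks still queued at shutdown are dropped.
class ThreadPool {
 public:
  struct Task {
    std::function<bool()> ready;  // are all dependencies fulfilled?
    std::function<void()> run;
  };

  explicit ThreadPool(unsigned numThreads) {
    for (unsigned i = 0; i < numThreads; ++i)
      workers_.emplace_back([this] { workLoop(); });
  }

  ~ThreadPool() {
    { std::lock_guard<std::mutex> g(mutex_); stop_ = true; }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  void submit(Task t) {
    { std::lock_guard<std::mutex> g(mutex_); tasks_.push_back(std::move(t)); }
    cv_.notify_one();
  }

 private:
  void workLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stop_) {
      // Find any task whose dependencies are fulfilled.
      auto it = tasks_.begin();
      while (it != tasks_.end() && !it->ready()) ++it;
      if (it == tasks_.end()) { cv_.wait(lock); continue; }
      Task task = std::move(*it);
      tasks_.erase(it);
      lock.unlock();
      task.run();        // execute outside the lock
      cv_.notify_all();  // finishing a task may make others ready
      lock.lock();
    }
  }

  std::vector<std::thread> workers_;
  std::deque<Task> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
};
```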
[07:54.200 --> 07:56.720] All right, let's look a little bit into the development history. [07:56.720 --> 08:02.080] So this on the left is, like, the performance graph for, you know, different resolutions, [08:02.080 --> 08:08.280] different coding conditions, so, like, random access or all intra, and, you know, major [08:08.280 --> 08:13.360] milestones: the mentioned 1.0, full compliance with the standard in version 1.3. [08:13.360 --> 08:18.040] We did a three-times memory reduction, and, you know, I was also asking myself, how could [08:18.040 --> 08:23.480] we have ever released any other version? [08:23.480 --> 08:27.560] But yeah, at least we managed to make it better. [08:27.560 --> 08:34.000] In 1.4, we got a major performance boost based on external contributions on GitHub, so this [08:34.000 --> 08:35.440] was really nice to see. [08:35.440 --> 08:40.000] And in between, we really had a lot of, you know, small improvements, and as you can see [08:40.000 --> 08:42.320] in the graphs, it does add up.

[08:42.320 --> 08:49.520] All right, about VVenC. So VVenC, you know, it's an encoder project. [08:49.520 --> 08:52.720] Of course, it's way more complex, way more interesting. [08:52.720 --> 08:54.240] It has way more degrees of freedom, right? [08:54.240 --> 08:59.680] Like, the decoder just does this one thing and has zero degrees of freedom, right? [08:59.680 --> 09:05.040] Basically, the standard tells us exactly what to do, and the encoder has all of those choices [09:05.040 --> 09:06.560] that it has to make. [09:06.560 --> 09:12.040] Anyway, what I like to say is, you know, it's basically the best open source encoder out [09:12.040 --> 09:13.040] there. [09:13.040 --> 09:16.200] As you can see here on the right, it's runtime versus efficiency. [09:16.200 --> 09:21.920] So, you know, for a given runtime, we can have the best efficiency, and for any of our [09:21.920 --> 09:26.760] working points, we can get this efficiency the fastest. [09:26.760 --> 09:30.200] Of course, you know, it's not the best encoder if you want to encode UHD live. [09:30.200 --> 09:31.720] We're not quite there yet. [09:31.720 --> 09:36.880] But, you know, at those slower working points, it really doesn't get better than that, at [09:36.880 --> 09:39.000] least not that I am aware of.

[09:39.000 --> 09:44.480] All right, so we have five presets on the encoder, from faster to slower. [09:44.480 --> 09:49.480] And you can see, this is a single-threaded graph, you can see more or less how those [09:49.480 --> 09:53.120] presets compare to x265, in orange. [09:53.120 --> 09:57.720] We have very efficient multi-threading, at least, like, with up to eight threads, [09:57.720 --> 10:00.280] or with up to 16 threads with some additional options. [10:00.280 --> 10:04.960] I'm going to talk about this a little bit in two slides. [10:04.960 --> 10:10.360] And we have a very good optimization for the human visual system based on the XPSNR metric, [10:10.360 --> 10:12.280] which I'm also going to mention a little bit later. [10:12.280 --> 10:17.640] We have really excellent rate control, so, you know, with bitrate deviations rarely [10:17.640 --> 10:22.240] more than 2%, and, like, almost never more than 5%. [10:22.240 --> 10:28.600] As I mentioned, simple interface, and, you know, this thing can be used for academic, [10:28.600 --> 10:32.880] amateur, commercial uses; the license really allows it all.

[10:32.880 --> 10:35.520] All right, so we have those five presets. [10:35.520 --> 10:36.960] How do we derive those presets? [10:36.960 --> 10:41.560] Like, from an academic point of view, this is actually done in a very interesting way. [10:41.560 --> 10:46.800] So we take all non-use-case-related options of the encoder, and we do, like, a large-scale [10:46.800 --> 10:47.800] optimization. [10:47.800 --> 10:53.200] So we start with a very simple, very fast working point, and then we try to derive which [10:53.200 --> 10:56.520] option should be changed as the next one. [10:56.520 --> 11:00.760] And this is always the option which basically gives incrementally the next best working [11:00.760 --> 11:01.760] point, [11:01.760 --> 11:02.760] right? [11:02.760 --> 11:03.760] And this is a huge optimization. [11:03.760 --> 11:08.720] It takes really a lot of compute time, so we don't do this so often, like, every two [11:08.720 --> 11:13.840] or three versions. Or, if we know one tool got implemented way better, we can, like, [11:13.840 --> 11:20.000] try to only optimize for this tool within the option space from the last optimization.
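In other words, it is a greedy search over the encoder's option space. A highly simplified sketch of that loop, with hypothetical types and a simplistic scoring function standing in for the real evaluation (which in practice is a large batch of full encoder runs per candidate):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types, for illustration only.
struct OptionChange { std::string name; int value; };
struct WorkingPoint { double runtime; double bdRate; };  // speed vs. efficiency

// Greedy preset derivation sketch: starting from the fastest configuration,
// repeatedly apply the single option change giving the best incremental
// trade-off, producing one new working point per step.
std::vector<std::vector<OptionChange>> deriveWorkingPoints(
    std::vector<OptionChange> candidates,
    const std::function<WorkingPoint(const std::vector<OptionChange>&)>& evaluate) {
  std::vector<OptionChange> config;  // empty = fastest working point
  std::vector<std::vector<OptionChange>> curve{config};

  while (!candidates.empty()) {
    const WorkingPoint base = evaluate(config);
    size_t bestIdx = 0;
    double bestScore = -1e300;

    for (size_t i = 0; i < candidates.size(); ++i) {
      std::vector<OptionChange> trial = config;
      trial.push_back(candidates[i]);
      const WorkingPoint wp = evaluate(trial);
      // Simplistic score: bitrate savings per unit of added runtime.
      const double score = (base.bdRate - wp.bdRate) /
                           std::max(wp.runtime - base.runtime, 1e-9);
      if (score > bestScore) { bestScore = score; bestIdx = i; }
    }

    config.push_back(candidates[bestIdx]);
    candidates.erase(candidates.begin() + static_cast<std::ptrdiff_t>(bestIdx));
    curve.push_back(config);  // the next-best working point on the curve
  }
  return curve;  // presets are then picked as points along this curve
}
```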
[11:20.000 --> 11:27.080] We target HD and UHD, natural content, but we do sanity checks for resolution and, like, [11:27.080 --> 11:30.640] screen content or, you know, HDR. [11:30.640 --> 11:36.760] And the one issue that we still have is, you know, at the beginning here, you can see that [11:36.760 --> 11:38.840] our curve gets a little bit steeper, right? [11:38.840 --> 11:43.960] So we cannot go too fast, because at some point, for, like, every two-times speed-up, [11:43.960 --> 11:47.240] we're just losing too much efficiency. This is because, you know, we started from the [11:47.240 --> 11:53.200] reference software, which was designed for readability, and the speed is still work [11:53.200 --> 11:54.200] in progress.

[11:54.200 --> 11:58.360] All right, about the multi-threading. [11:58.360 --> 12:03.400] So our multi-threading is also, I would say, done differently than in many other encoders. [12:03.400 --> 12:12.480] So we do the multi-threading over CTUs, so, like, the modern macroblocks, and CTU lines; [12:12.480 --> 12:18.640] like, we simulate wavefront parallel processing without using the syntax. And we parallelize [12:18.640 --> 12:20.160] independent frames, right? [12:20.160 --> 12:23.680] Like, if two frames are independent of each other and the references are already done, we can [12:23.680 --> 12:29.720] do those in parallel. Which, of course, means that, you know, how much we can parallelize [12:29.720 --> 12:35.320] depends on the number of CTUs that are available, which are always more in high resolutions or [12:35.320 --> 12:37.800] with smaller CTU sizes. [12:37.800 --> 12:42.640] That's why the faster and fast presets, which have smaller CTUs, parallelize a little [12:42.640 --> 12:49.320] bit better, which you can see on the top right there, in the full-HD parallelization efficiency [12:49.320 --> 12:50.320] plot, right? [12:50.320 --> 12:55.960] You can see, like, after eight, it kind of, well, it doesn't saturate, but, like, after [12:55.960 --> 13:04.240] eight threads, there might be better ways to utilize the resources than to just enable [13:04.240 --> 13:06.240] more threads.

[13:06.240 --> 13:10.160] Exactly, and how can we improve the scaling? [13:10.160 --> 13:15.360] So we can improve the scaling by enabling normative features of VVC, so either doing [13:15.360 --> 13:22.160] tiles, that is, independent regions within the picture, or enabling the normative wavefront, [13:22.160 --> 13:30.120] which allows us to kill one dependency within the encoding. And this is on the bottom right, [13:30.120 --> 13:34.920] so you can see, you know, if we enable those additional features, the multi-threading scaling [13:34.920 --> 13:42.640] actually gets much better, but it costs between three to five percent of bitrate overhead. [13:42.640 --> 13:47.520] But still, even with those other features, the encoder is not ready yet for more than, [13:47.520 --> 13:54.400] let's say, 32 threads. So, like, you know, if you have a really, really big server, the [13:54.400 --> 13:57.760] encoder will not be able to utilize all the cores.
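The wavefront-style dependency mentioned above is easy to sketch (a toy illustration, not VVenC's actual code): a CTU may be processed once its left neighbor and the CTU above-right of it are done, so consecutive CTU lines can run concurrently, each trailing the line above by two CTUs.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Toy wavefront dependency tracking over a CTU grid. The readiness check
// is exactly what a thread-pool task for CTU (col, row) would carry.
class CtuWavefront {
 public:
  CtuWavefront(int cols, int rows)
      : cols_(cols),
        done_(static_cast<size_t>(cols) * rows) {}  // atomics start false

  bool isDone(int col, int row) const {
    return done_[static_cast<size_t>(row) * cols_ + col]
        .load(std::memory_order_acquire);
  }

  void markDone(int col, int row) {
    done_[static_cast<size_t>(row) * cols_ + col]
        .store(true, std::memory_order_release);
  }

  // A CTU is ready when its left neighbor and the CTU above-right are done.
  bool isReady(int col, int row) const {
    if (col > 0 && !isDone(col - 1, row)) return false;  // left neighbor
    if (row > 0 && !isDone(std::min(col + 1, cols_ - 1), row - 1))
      return false;                                      // above-right neighbor
    return true;
  }

 private:
  int cols_;
  std::vector<std::atomic<bool>> done_;
};
```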
[13:57.760 --> 14:03.880] Yeah, and about our optimization for the human visual system: it's based on XPSNR, which [14:03.880 --> 14:07.760] is, you know, this new metric that a colleague at HHI developed. [14:07.760 --> 14:11.280] It has a really high correlation with MOS, based on some public data sets. [14:11.280 --> 14:12.960] There are publications [14:12.960 --> 14:13.960] you can look up. [14:13.960 --> 14:19.760] They're mentioned on the bottom left. And it has been contributed to FFmpeg as a filter, [14:19.760 --> 14:26.920] and I think it is somewhere in the backlog of FFmpeg, waiting to be looked at.

[14:26.920 --> 14:31.320] You can see on the right here a lot of graphs. So basically, in the last JVET meeting, no, [14:31.320 --> 14:38.320] the JVET meeting one before last, there was a verification test with actual human subjects, [14:38.320 --> 14:42.440] where, you know, where the VTM was tested with, like, the new compression technology, and [14:42.440 --> 14:46.520] we submitted VVenC to be tested alongside it, in the slow preset, right? [14:46.520 --> 14:49.600] So the slower preset has around the efficiency of the VTM; [14:49.600 --> 14:56.600] the slow preset is objectively five percent behind. And as you can see in the graphs, VVenC, in [14:56.600 --> 15:03.840] orange, matches or outperforms the VTM, which means that our visual optimization is well able [15:03.840 --> 15:09.720] to at least close this five percent gap, if not even, you know, add more in terms of [15:09.720 --> 15:13.720] subjective visual quality. [15:13.720 --> 15:15.720] So yeah, that's really nice.

[15:15.720 --> 15:16.720] All right. [15:16.720 --> 15:23.720] VVenC in practice. You know, everyone asks in the end: so, what kind of FPS can you achieve? [15:23.720 --> 15:30.440] So we did some encodes on mobile-workstation kind of computers, encodes using all defaults, [15:30.440 --> 15:36.760] which means eight threads. And, you know, for HD, versus live, it's like: for faster, [15:36.760 --> 15:42.920] it's around four times live, so around 15 FPS; medium is around [15:42.920 --> 15:50.200] 30 times live, right, so around two FPS. I'm talking about HD 60 FPS as live. [15:50.200 --> 15:55.600] For UHD, faster can do, like, 15 times live, fast 30, and medium, well, let's just [15:55.600 --> 16:01.080] say medium would only be of interest for, like, large-scale VOD encodings, right? [16:01.080 --> 16:06.040] But you know, FPS also depends on many other factors, like bitrate, content, you know, [16:06.040 --> 16:11.080] your actual CPU and stuff like that, so these are more like ballpark numbers.

[16:11.080 --> 16:15.120] Excuse me, is this HDR content? [16:15.120 --> 16:17.840] It is not HDR content, but it's 10-bit. [16:17.840 --> 16:25.960] So it should be roughly the same for HDR; like, we only test with 10-bit, usually. [16:25.960 --> 16:28.840] It works for 8-bit, but it's kind of 10-bit native.

[16:28.840 --> 16:35.800] All right, also some version history for the VVenC project. [16:35.800 --> 16:40.560] So our first major milestone was 0.3, where we added frame threading, and you can [16:40.560 --> 16:45.520] see on this multi-threaded efficiency-versus-speed graph that it was, you know, a huge [16:45.520 --> 16:47.040] leap for us. [16:47.040 --> 16:52.320] In 1.0, we added the pure C interface, allowing, you know, integrations into pure C frameworks.
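As a rough sketch of what using such a C interface looks like: the identifiers below follow the pattern of the public vvenc C header, but the exact names, signatures, and field names should be treated as assumptions and checked against vvenc.h.

```cpp
// Sketch only: identifiers modeled on vvenc's public C API (vvenc/vvenc.h);
// treat the exact names and signatures as assumptions, not as documentation.
#include <cstdio>
#include "vvenc/vvenc.h"

int main() {
  // 1080p60 at 3 Mbit/s, base QP 32, medium preset; assumed to fill a
  // complete default configuration.
  vvenc_config cfg;
  vvenc_init_default(&cfg, 1920, 1080, 60, 3000000, 32, VVENC_MEDIUM);

  vvencEncoder* enc = vvenc_encoder_create();
  if (!enc || vvenc_encoder_open(enc, &cfg) != 0) return 1;

  vvencAccessUnit au;
  vvenc_accessUnit_default(&au);
  vvenc_accessUnit_alloc_payload(&au, 1920 * 1080);  // generous output buffer

  bool done = false;
  while (!done) {
    vvencYUVBuffer* frame = nullptr;  // nullptr = flush; real code feeds YUV here
    if (vvenc_encode(enc, frame, &au, &done) != 0) break;
    if (au.payloadUsedSize > 0)  // assumed field name for bytes written
      fwrite(au.payload, 1, au.payloadUsedSize, stdout);  // raw bitstream out
  }
  vvenc_encoder_close(enc);
  return 0;
}
```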
[16:52.320 --> 16:58.960] In 1.4, scene cut detection; in 1.5, we added, like, an arbitrary intra period, [16:58.960 --> 17:01.400] so it doesn't have to be aligned to anything. [17:01.400 --> 17:05.720] We added a fast-decode preset, and in the newest version, the thing that I really like [17:05.720 --> 17:11.560] about it is that we added the ARM support. And, you know, with every version, we had improved [17:11.560 --> 17:16.280] rate control, thanks also to, you know, great community feedback. And, you know, from one [17:16.280 --> 17:21.360] version to another, your encode might be 10% faster, but if you had a rate control problem [17:21.360 --> 17:26.680] and it got fixed, it's going to be, you know, like, way, way better. [17:26.680 --> 17:32.600] So these, like, hard-to-quantify improvements are actually some of the most important ones. [17:32.600 --> 17:37.120] And you know, you can see how the curve behaves; extrapolating, I'm sure we're going to get [17:37.120 --> 17:39.280] even faster and better in the future.

[17:39.280 --> 17:44.520] All right, about the ecosystem and the community. Of course, you know, raw video encoding [17:44.520 --> 17:50.480] and decoding, this is really only of academic interest, right? Like, it doesn't really bring [17:50.480 --> 17:51.480] you anything. [17:51.480 --> 17:55.520] That's why we have been looking into FFmpeg support for a long time. [17:55.520 --> 17:59.840] There was an open access paper over one and a half years ago that described how to do [17:59.840 --> 18:01.640] it for the decoder. [18:01.640 --> 18:04.000] There are some patches in the pipeline with FFmpeg. [18:04.000 --> 18:08.760] We also put up in our wiki how to apply them manually, if you, you know, if you want to [18:08.760 --> 18:13.600] build it. You know, I've talked about this a lot, but the thing is, you know, if it's [18:13.600 --> 18:16.560] not in FFmpeg, it doesn't exist. [18:16.560 --> 18:22.520] That's why, you know, we put up, like, a how-to for you on how to do it.

[18:22.520 --> 18:27.800] All right, about playback: once you get this FFmpeg with VVC included, you can just link [18:27.800 --> 18:31.240] whatever player uses FFmpeg as its back-end, and it's going to work. [18:31.240 --> 18:36.760] As far as I know, for VLC, you might have to force it to use FFmpeg as the demuxer. [18:36.760 --> 18:37.760] Not sure about it. [18:37.760 --> 18:38.760] I didn't test it myself. [18:38.760 --> 18:40.920] It comes, like, from community feedback. [18:40.920 --> 18:45.280] I know there are some ExoPlayer integrations going around, which allow you to, you know, [18:45.280 --> 18:48.320] use it in Android apps more easily. [18:48.320 --> 18:53.080] We have a toy web browser example, but it's nothing like the VLC.js. [18:53.080 --> 18:54.880] It's really a toy example. [18:54.880 --> 19:00.040] And for muxing and demuxing, you can just use GPAC; sorry, for muxing, dashing, you [19:00.040 --> 19:03.600] can just use GPAC, since version 2.0. [19:03.600 --> 19:09.400] And I think it also needs to be linked to an FFmpeg with VVC support.

[19:09.400 --> 19:10.400] All right. [19:10.400 --> 19:15.760] For the community: we are, of course, open to external contributions, and we wish to get some. [19:15.760 --> 19:20.000] That's also why I'm talking to you, to get some interest.
[19:20.000 --> 19:24.640] So we try to, you know, make this easier by, you know, stating that the authors of [19:24.640 --> 19:27.200] VVenC also retain their copyright. [19:27.200 --> 19:31.400] We don't have, like, a contributor's agreement, so this is, like, the only way we can make [19:31.400 --> 19:32.840] it happen. [19:32.840 --> 19:38.320] We are interested in all kinds of contributions, you know, efficiency, speed-ups, and, you know, [19:38.320 --> 19:43.240] we're going to thoroughly review them, test, generate results, whatever is needed, [19:43.240 --> 19:46.760] give proper feedback, merge. So, yeah, [19:46.760 --> 19:48.880] please do, if you're interested.

[19:48.880 --> 19:53.760] To conclude, you know, if you just entered the room: I talked to you about our open source [19:53.760 --> 19:56.720] implementations of the VVC standard. [19:56.720 --> 20:02.200] And I'm looking forward to, you know, contributions, also results, tests, if you want to try it [20:02.200 --> 20:04.200] out, and general feedback. [20:04.200 --> 20:05.200] Thanks a lot. [20:05.200 --> 20:12.200] And a question in the room, yeah.

[20:12.200 --> 20:20.200] What confused me, because I know a little bit about the backstory: so H.265, right, was [20:20.200 --> 20:27.320] super proprietary, with the royalties, and the Alliance for Open Media launched the AV1 [20:27.320 --> 20:28.320] codec. [20:28.320 --> 20:29.320] Yes. [20:29.320 --> 20:32.000] And now your codec is open source and free, right? [20:32.000 --> 20:33.000] So to recap. [20:33.000 --> 20:36.960] Why is it, H.266, is it free?

[20:36.960 --> 20:44.640] To recap the question: the question was, H.265 was a proprietary codec with licensing, AV1 [20:44.640 --> 20:53.120] is a non-proprietary codec, and then I'm talking about VVC, which is the successor of HEVC, [20:53.120 --> 20:54.920] so of H.265. [20:54.920 --> 21:01.160] And there is a small confusion: so I'm not talking about the codec as being open source; [21:01.160 --> 21:05.480] it's about the implementations of it, right? So you have to differentiate between implementations [21:05.480 --> 21:07.480] and the technology itself. [21:07.480 --> 21:11.680] There will be licensing for the technology itself, but, you know, there are still open [21:11.680 --> 21:19.040] source implementations also of HEVC, so that's kind of the way I would like to see it, you [21:19.040 --> 21:20.040] know.

[21:20.040 --> 21:29.040] So it won't be like with MP3, which you also developed, but which is completely free to use, forever? [21:29.040 --> 21:30.040] No, it won't. [21:30.040 --> 21:36.760] I mean, the software is free to use, but, you know, if you build your streaming service [21:36.760 --> 21:41.600] based on the technology, independent of which software you use, you have to pay royalties [21:41.600 --> 21:42.600] for the technology. [21:42.600 --> 21:47.160] Because there are patents, patent issues? [21:47.160 --> 21:52.920] Yes. I mean, there is, you know, this technology costs stuff to develop, and people want to [21:52.920 --> 21:55.320] get their investment back, you know.

[21:55.320 --> 22:01.960] Okay, disclaimer: I'm doing, yes, optimizations for video codecs in general. [22:01.960 --> 22:09.560] My focus is on ARM, and my comment to you is: using a library like SIMD Everywhere, it works. [22:09.560 --> 22:13.040] It's going to give you the initial results that you want, but you will never get the [22:13.040 --> 22:16.560] optimal performance out of your hardware.
[22:16.560 --> 22:22.400] We had some really, really good examples of code, particular algorithms with intrinsics, [22:22.400 --> 22:27.880] like a movemask type of intrinsic, that are very common and popular on Intel. [22:27.880 --> 22:33.840] If you try to port them with SIMD Everywhere or some kind of abstraction layer to ARM, [22:33.840 --> 22:40.000] you're going to have to emulate this behavior. So ideally, you have to provide some way to [22:40.000 --> 22:45.640] provide optimized functions for ARM or any other future architecture. [22:45.640 --> 22:51.920] Otherwise, you're just going to have your Intel layer transferred, translated to ARM. [22:51.920 --> 22:55.240] It will work, but you will never optimize your software.

[22:55.240 --> 23:02.520] So to recap, to recap the comment: there was a comment that SIMD Everywhere works, gives [23:02.520 --> 23:08.680] a nice initial, you know, initial deliverable, but will never match hand-optimized assembler. [23:08.680 --> 23:09.680] We are... [23:09.680 --> 23:10.680] Who said assembler? [23:10.680 --> 23:13.760] C, C with the corresponding ARM intrinsics. [23:13.760 --> 23:21.120] I mean, when I say assembler, I mean intrinsics. So, you know, we are aware of it, and we [23:21.120 --> 23:26.440] are looking a little bit into it. You know, there are different kinds of intrinsic kernels [23:26.440 --> 23:27.440] that are implemented there. [23:27.440 --> 23:31.640] You know, sometimes you have, like, two pieces of memory that need to be added up and stored [23:31.640 --> 23:32.640] somewhere else. [23:32.640 --> 23:38.840] There, SIMD Everywhere works really nicely. But when it's, like, lookup tables, shuffles, [23:38.840 --> 23:39.840] it works worse. [23:39.840 --> 23:40.840] We are aware. [23:40.840 --> 23:44.720] We are looking into, you know, identifying the kernels where there's the biggest potential [23:44.720 --> 23:47.360] for improvement and doing those manually.

[23:47.360 --> 23:54.360] We are looking at an ARM implementation of VVdeC, okay? So I'm doing the ARM and [23:54.360 --> 23:59.800] NEON optimizations for that, and the way to do the transpose, for example, that is [23:59.800 --> 24:06.840] completely different to the way it is on Intel, especially because for AVX2, you have 256-bit-wide [24:06.840 --> 24:07.840] registers. [24:07.840 --> 24:11.880] You don't have that with NEON, but you have other instructions to help you try to tackle [24:11.880 --> 24:12.880] the same outcome. [24:12.880 --> 24:13.880] Yeah. [24:13.880 --> 24:22.440] So my point is that if you are open to contributions with specific optimizations, you might [24:22.440 --> 24:29.360] find people to help with that. But if you restrict yourself to only using the library, [24:29.360 --> 24:35.960] and only use Intel intrinsics and then translate them using the library, you might lose [24:35.960 --> 24:36.960] some performance. [24:36.960 --> 24:37.960] A lot. [24:37.960 --> 24:41.400] We are not restricting ourselves, you know, to only using this. [24:41.400 --> 24:44.680] It's just because, you know, we only have so many resources, this was the fastest way [24:44.680 --> 24:45.680] to do it. [24:45.680 --> 24:51.280] We can play it out on, you know... like, actually, this thing can play out VVC videos with our [24:51.280 --> 24:53.680] software. But yeah, totally. [24:53.680 --> 24:57.840] I think there would need to be some changes to the build process, maybe to the, [24:57.840 --> 25:04.160] like, structure of the project to enable this. But yeah, this is something really very interesting, [25:04.160 --> 25:07.600] and I think there will be a lot of research going on in that direction. [25:07.600 --> 25:08.600] Thanks.
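For context, here is a small self-contained example (not taken from VVdeC) of the pattern under discussion: an SSE2-style kernel written against SIMD Everywhere, which compiles natively on x86 but must be emulated on ARM NEON, where the movemask in particular costs several extra instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <simde/x86/sse2.h>

// Sketch: count the bytes equal to `value` in a buffer. On x86,
// simde_mm_movemask_epi8 maps to a single PMOVMSKB instruction; NEON has
// no direct equivalent, so SIMD Everywhere emulates it -- exactly the
// overhead discussed above.
int countEqualBytes(const uint8_t* p, size_t n, uint8_t value) {
  const simde__m128i ref = simde_mm_set1_epi8(static_cast<int8_t>(value));
  int count = 0;
  size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    simde__m128i v = simde_mm_loadu_si128(
        reinterpret_cast<const simde__m128i*>(p + i));
    simde__m128i eq = simde_mm_cmpeq_epi8(v, ref);  // 0xFF where equal
    unsigned mask = static_cast<unsigned>(simde_mm_movemask_epi8(eq));
    count += __builtin_popcount(mask);  // GCC/Clang builtin
  }
  for (; i < n; ++i) count += (p[i] == value);  // scalar tail
  return count;
}
```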
[25:08.600 --> 25:11.600] One last question, yeah? [25:11.600 --> 25:18.400] How does it compare, what are the advantages over AV1, in terms of compression or computation? [25:18.400 --> 25:26.600] Well, yeah, so the question is, what are the advantages of VVC, or VVenC, [25:26.600 --> 25:31.600] of VVC over AV1? [25:31.600 --> 25:32.600] All right. [25:32.600 --> 25:37.040] So, what are the advantages of VVC over AV1? [25:37.040 --> 25:39.400] So VVC is a successor to HEVC, right? [25:39.400 --> 25:44.680] It was done by people who were really knowledgeable about how to make a standard work. [25:44.680 --> 25:52.240] So, you know, it's the one thing, it just provides the additional bitrate savings, right? [25:52.240 --> 25:58.880] So here you still see, like, 20% additional bitrate savings over, like, the best case of [25:58.880 --> 26:03.800] AV1, and there aren't so many of these initial hiccups, right? [26:03.800 --> 26:08.920] So the HDR support just works, you know, the immersive stuff just works; like, a lot of [26:08.920 --> 26:11.600] those things, they just work. [26:11.600 --> 26:17.520] And then we can do stuff on top of that, like doing open-GOP adaptive streaming, which allows [26:17.520 --> 26:21.960] you to reduce the bitrate by, like, another 10% on top of that, right? [26:21.960 --> 26:26.640] Like, with all the other standards, I think the adaptive streaming can only be done with [26:26.640 --> 26:32.560] closed GOPs, or with a prediction break. [26:32.560 --> 26:40.440] I would say it's more mature, but, you know, I know there are different views on this: more compression [26:40.440 --> 26:45.880] efficiency, and really versatile, mature usability.

[26:45.880 --> 26:47.880] Thank you, Adam. [26:47.880 --> 26:48.880] Thanks. [26:48.880 --> 27:05.840] Thank you.