[00:00.000 --> 00:12.200] Hi, my name is Stefan Hajnoczi and I work on QEMU and Linux, and today I want to talk about [00:12.200 --> 00:16.820] vhost-user-blk, a fast user space block IO interface. [00:16.820 --> 00:18.800] So what is vhost-user-blk? [00:18.800 --> 00:24.400] vhost-user-blk allows an application to connect to a software-defined storage system [00:24.400 --> 00:27.700] that is running on the same node. [00:27.700 --> 00:31.900] In software-defined storage, or in storage in general, there are three popular storage [00:31.900 --> 00:32.900] models. [00:32.900 --> 00:36.100] There's block storage, file storage, and object storage. [00:36.100 --> 00:38.860] And vhost-user-blk is about block storage. [00:38.860 --> 00:43.660] So for the rest of this presentation, we're going to be talking about block storage. [00:43.660 --> 00:49.140] Block storage interfaces have a common set of functionality. [00:49.140 --> 00:53.220] First of all, there's the core IO: reads, writes, and flushes. [00:53.220 --> 00:57.620] These are the common commands that are used to store and retrieve data from the [00:57.620 --> 00:59.260] block device. [00:59.260 --> 01:01.140] Then there are data management commands. [01:01.140 --> 01:04.700] These are used for mapping and allocation of blocks. [01:04.700 --> 01:08.660] Discard and write zeroes are examples of these kinds of commands. [01:08.660 --> 01:13.660] There are also auxiliary commands, like getting the capacity of the device. [01:13.660 --> 01:18.780] And then finally, there can be extensions to the model, like zoned storage, that go beyond [01:18.780 --> 01:22.060] the traditional block device model. [01:22.060 --> 01:29.060] vhost-user-blk supports all of these things (the command set is sketched in code below), and it's at a similar level of abstraction [01:29.060 --> 01:33.140] to NVMe or to SCSI. [01:33.140 --> 01:38.860] So let's start by looking at how vhost-user-blk is a little bit different from things [01:38.860 --> 01:44.300] like NVMe or SCSI, things that are network protocols or hardware storage interfaces. [01:44.300 --> 01:49.380] vhost-user-blk is a software user space interface. [01:49.380 --> 01:54.940] So let's begin by imagining we have a software-defined storage system that is running in user [01:54.940 --> 01:59.740] space and it wants to expose storage to applications. [01:59.740 --> 02:04.140] If we're using the kernel storage stack, we'll need some way to [02:04.140 --> 02:13.180] connect our software-defined storage to the kernel and present a block device. [02:13.180 --> 02:21.380] Ways of doing that might be NVMe over TCP, or an iSCSI LUN, or maybe an NBD server, [02:21.380 --> 02:22.780] and so on. [02:22.780 --> 02:29.220] So that's how a software-defined storage system might expose its storage to the kernel. [02:29.220 --> 02:35.060] And when our application opens a block device, it gets a file descriptor, and then it can [02:35.060 --> 02:39.340] read or write using system calls on that file descriptor. [02:39.340 --> 02:45.420] What happens is execution goes into the kernel's file system and block layers, and [02:45.420 --> 02:49.860] they will then talk to the software-defined storage system. [02:49.860 --> 02:55.380] Now that can be somewhat convoluted, because if we've attached, say, using NVMe over TCP, [02:55.380 --> 02:57.780] the network stack might be involved, and so on.
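To make that common command set concrete, here is a small hypothetical C interface that captures it. The struct and function names are purely illustrative; they are not taken from vhost-user-blk, virtio-blk, or any real library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the common block-storage command set described
 * above.  Names are illustrative, not from any real API. */
struct block_device_ops {
    /* Core IO: store and retrieve data */
    int (*read)(void *dev, uint64_t offset, void *buf, size_t len);
    int (*write)(void *dev, uint64_t offset, const void *buf, size_t len);
    int (*flush)(void *dev);

    /* Data management: block mapping and allocation */
    int (*discard)(void *dev, uint64_t offset, uint64_t len);
    int (*write_zeroes)(void *dev, uint64_t offset, uint64_t len);

    /* Auxiliary: device properties */
    int (*get_capacity)(void *dev, uint64_t *num_bytes);
};
```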
[02:57.780 --> 03:02.340] And at the end of the day, all we're trying to do is communicate between our application [03:02.340 --> 03:07.500] and the software-defined storage process, which are both on the same node; they're both [03:07.500 --> 03:11.300] running on the same operating system. [03:11.300 --> 03:17.100] User space storage interfaces leave out this kernel storage stack, and instead [03:17.100 --> 03:23.860] they allow the application to talk directly to the software-defined storage process. [03:23.860 --> 03:29.140] Now there are a number of pros and cons to using a user space interface. [03:29.140 --> 03:30.780] And I'll go through them here. [03:30.780 --> 03:35.180] So I've already alluded to the fact that if you have a user space interface and [03:35.180 --> 03:42.100] you don't go through the kernel storage stack, then you can bypass some of that long path [03:42.100 --> 03:46.740] that we discussed, for example, going down into the kernel and coming back out using something [03:46.740 --> 03:52.340] like NBD or iSCSI in order to connect to another process on the same node. [03:52.340 --> 03:55.180] There must be a faster way of doing that, right? [03:55.180 --> 04:00.420] So with vhost-user-blk, it turns out we can actually get rid of system calls entirely [04:00.420 --> 04:05.620] from the data path, so reads and writes and so on from the device don't require any system [04:05.620 --> 04:06.620] calls at all. [04:06.620 --> 04:10.540] And we'll have a look at how that's possible later on in this talk. [04:10.540 --> 04:17.580] But speed is one of the reasons why a pure user space interface for block IO is an interesting [04:17.580 --> 04:19.140] thing. [04:19.140 --> 04:22.900] Another reason is security. [04:22.900 --> 04:28.420] Typically, in order to connect a block device to the kernel, you need to have privileges, [04:28.420 --> 04:34.420] because it can be a security risk to connect untrusted storage to your kernel. [04:34.420 --> 04:38.500] And the reason for that is that there's a bunch of code in the storage stack that's [04:38.500 --> 04:42.420] going to run and process and be exposed to this untrusted data. [04:42.420 --> 04:47.020] If you think about a file system and all its metadata, that can be complex. [04:47.020 --> 04:52.140] So there's a security risk associated with that, and therefore privileges are required [04:52.140 --> 04:53.700] to create block devices. [04:53.700 --> 04:59.660] An ordinary unprivileged process cannot attach and mount a block device. [04:59.660 --> 05:05.140] So in a scenario where you do have an untrusted block device and you would like to remove [05:05.140 --> 05:12.300] the attack surface there, using a user space interface allows you to avoid that. [05:12.300 --> 05:17.300] Also, if you simply don't have permissions, then you won't [05:17.300 --> 05:19.260] be able to create a kernel block device. [05:19.260 --> 05:23.780] So then a user space interface is beneficial as well. [05:23.780 --> 05:25.180] Now those were the pros. [05:25.180 --> 05:29.900] Of course there are drawbacks to having a user space interface. [05:29.900 --> 05:32.460] First of all, it's complex.
[05:32.460 --> 05:37.540] Compared to simply opening a file and reading and writing from the file descriptor, you're [05:37.540 --> 05:42.060] going to have to do a lot more, because all the logic for actually doing IO and communicating [05:42.060 --> 05:47.460] is now the responsibility of the application and not the kernel. [05:47.460 --> 05:48.780] So there's that. [05:48.780 --> 05:52.740] In addition, if you think about existing programs that you might want to use to access your [05:52.740 --> 05:58.540] storage, they won't have support for any new interface that is user space only. [05:58.540 --> 06:03.900] They are probably using the POSIX system calls, read and write and so on, and that's what [06:03.900 --> 06:04.900] they expect. [06:04.900 --> 06:09.780] So you'll have to port those applications in order to access your software-defined storage [06:09.780 --> 06:14.340] system if you rely on a user space interface. [06:14.340 --> 06:20.140] Another disadvantage is that if you have a user space interface, then the kernel storage [06:20.140 --> 06:22.100] stack isn't involved. [06:22.100 --> 06:27.220] So if you decide you need a feature from the kernel storage stack, whatever that may be, [06:27.220 --> 06:34.020] or if you have a legacy application that you cannot port and that needs to talk to a kernel [06:34.020 --> 06:39.260] block device, then again you have a problem, because your software-defined storage system [06:39.260 --> 06:45.100] is isolated; its block devices aren't connected to the kernel. [06:45.100 --> 06:49.100] What we're going to do today is look at both these pros and cons, and we're [06:49.100 --> 06:54.780] also going to see how with vhost-user-blk we can actually overcome these cons. [06:54.780 --> 07:01.140] So let's start by looking a little bit at some of the performance aspects, namely how this can be [07:01.140 --> 07:02.140] fast. [07:02.140 --> 07:06.300] I said no system calls are required, so how does that even work if the software-defined [07:06.300 --> 07:09.700] storage system and the application need to communicate? [07:09.700 --> 07:13.260] How can they communicate without system calls? [07:13.260 --> 07:21.820] All right, so one of the important concepts in IO is how to wait for the completion of IO. [07:21.820 --> 07:27.820] When you submit an IO request, maybe you have no more work for your process to do. [07:27.820 --> 07:33.180] Maybe the CPU is essentially idle until that IO request completes, and at that point you'll [07:33.180 --> 07:35.340] be able to do more work. [07:35.340 --> 07:41.740] The normal thing to do in that case is to de-schedule your application and let [07:41.740 --> 07:45.500] other threads, other tasks on the system, run. [07:45.500 --> 07:49.740] And maybe if there are no other tasks, then the kernel will just put the CPU into power [07:49.740 --> 07:50.740] saving mode. [07:50.740 --> 07:55.260] It will put it into some kind of low power state, and it will wake once the completion [07:55.260 --> 07:57.260] interrupt comes in. [07:57.260 --> 08:02.140] And you can see that at the top of this slide, in the top diagram: there's [08:02.140 --> 08:06.740] a green part where we submit the IO, and at that point we run out of things to do, because [08:06.740 --> 08:08.500] we're going to wait for completion.
[08:08.500 --> 08:11.980] So then there's this gray part where other tasks are running and power saving is taking [08:11.980 --> 08:18.500] place, and during that time the first portion is spent with the IO actually in flight. [08:18.500 --> 08:22.860] That's where we're legitimately waiting for the IO request to complete so that we can [08:22.860 --> 08:24.580] proceed. [08:24.580 --> 08:29.860] But then what happens is that the IO request completes, and we need to somehow get back to [08:29.860 --> 08:31.820] our de-scheduled process. [08:31.820 --> 08:37.300] Now depending on what other tasks are running, their priorities, the scheduler, and so on, [08:37.300 --> 08:40.220] our task might not get woken up immediately. [08:40.220 --> 08:44.980] Or maybe if the CPU is in a low power state, it will just take some time to wake up, handle [08:44.980 --> 08:51.540] that interrupt, restore the user space process, and resume execution. [08:51.540 --> 08:56.980] So this leads to a wake-up latency, an overhead that is added. [08:56.980 --> 09:04.700] And this is why notifications, also sometimes called interrupts, can be something [09:04.700 --> 09:09.060] that actually slows down your IO processing. [09:09.060 --> 09:11.500] An alternative is to use polling. [09:11.500 --> 09:15.980] Polling is an approach where, once you have no more work to do, instead of de-scheduling, [09:15.980 --> 09:19.740] you repeatedly check whether the IO is complete yet. [09:19.740 --> 09:24.620] By doing that you're not giving up the CPU, so you keep running and you keep consuming [09:24.620 --> 09:29.420] the CPU. The advantage is that you don't have this wake-up latency; instead your process [09:29.420 --> 09:33.820] will respond immediately once the IO is complete. [09:33.820 --> 09:38.220] The drawback, of course, is that you're hogging the CPU, and you're wasting power while there's [09:38.220 --> 09:40.160] nothing to do. [09:40.160 --> 09:43.180] So those are two techniques, and I think we should keep them in mind (there's a short sketch of both below), because we'll [09:43.180 --> 09:47.020] see how they come into play later. [09:47.020 --> 09:50.940] The next performance aspect I wanted to mention that's important to understanding [09:50.940 --> 09:56.620] how vhost-user-blk is different from, say, using a network protocol or an existing [09:56.620 --> 10:02.100] storage interface is message passing versus zero copy. [10:02.100 --> 10:07.140] As programmers we learn that when we have a large object in our program, we shouldn't [10:07.140 --> 10:12.340] pass it around by value, because it will be copied and that will be inefficient. [10:12.340 --> 10:17.460] Instead, we use references or pointers, allowing the function that [10:17.460 --> 10:23.220] receives the object to just go and access it in place rather than taking copies. [10:23.220 --> 10:28.020] In inter-process communication and in networking there are similar concepts. [10:28.020 --> 10:30.580] By default, things are message passing. [10:30.580 --> 10:36.460] We build a message, it gets copied through various buffers along the network path, and eventually [10:36.460 --> 10:41.780] the receiver receives it into its buffer and then parses it.
[10:41.780 --> 10:46.420] That model is the traditional networking model; it's also the IPC model. It has strong [10:46.460 --> 10:51.740] isolation, so for security it's great, because it means that the sender and the receiver don't [10:51.740 --> 10:56.900] have access to each other's memory, and therefore they cannot interfere with or crash each [10:56.900 --> 10:58.700] other. [10:58.700 --> 11:03.740] But the downside is that we have these intermediate copies, and that consumes CPU cycles and is [11:03.740 --> 11:05.700] inefficient. [11:05.700 --> 11:10.820] The zero copy approach is an approach where the sender and receiver have somehow agreed [11:10.820 --> 11:15.740] on the memory buffer where the data to be transferred lives. [11:15.740 --> 11:20.580] That way the sender, for example, can simply place the data directly into the receiver's [11:20.580 --> 11:24.180] buffer, and all it then has to do is let the receiver know, hey, there's some data there [11:24.180 --> 11:28.860] for you. It doesn't actually have to copy the data. [11:28.860 --> 11:34.340] So this is another important concept that we're going to see with vhost-user-blk. [11:34.340 --> 11:38.100] Now that we've got those things out of the way, let's look at vhost-user-blk. [11:38.100 --> 11:39.100] What is it? [11:39.100 --> 11:45.580] It's a local block IO interface, so it only works on a single node, on a single machine. [11:45.860 --> 11:48.380] It is not a network protocol. [11:48.380 --> 11:54.260] Two, it's a user space interface; it's not a kernel solution in itself. [11:54.260 --> 11:59.940] It's a pure user space solution. That means it's unprivileged: it doesn't require any [11:59.940 --> 12:05.140] privileges for two processes to communicate in this way. [12:05.140 --> 12:10.620] It's also a zero copy solution, and the way it does that is by using shared memory. [12:10.620 --> 12:15.740] And finally, vhost-user-blk supports both notifications and polling. [12:15.740 --> 12:21.340] So depending on your performance requirements, you can choose whether you want to de-schedule [12:21.340 --> 12:27.100] your process and receive a wake-up when it's time to process an IO completion, or you can [12:27.100 --> 12:32.940] just poll, consuming CPU, and have the lowest possible latency. [12:32.940 --> 12:38.500] vhost-user-blk is available on Linux, BSD, and macOS, and the implementations [12:38.500 --> 12:43.180] of this started around 2017. [12:43.180 --> 12:49.660] It came from SPDK working together with QEMU; those communities [12:49.660 --> 12:52.620] implemented vhost-user-blk. [12:52.620 --> 12:58.140] But there are also implementations in other hypervisors, like crosvm and Cloud Hypervisor. [12:58.140 --> 13:03.140] So primarily this came from virtualization, from this problem of how do we do [13:03.140 --> 13:06.900] software-defined storage and let a virtual machine connect to it. [13:06.900 --> 13:11.500] But that's not all that vhost-user-blk is good for; it's actually a general storage interface. [13:11.500 --> 13:17.460] It's generic, just like NVMe or SCSI. [13:17.460 --> 13:22.460] So you could use vhost-user-blk if you had some kind of data-intensive application that [13:22.460 --> 13:27.500] needs to do a lot of storage IO and needs high performance, or needs to be unprivileged. [13:27.500 --> 13:31.140] And that's why I'm talking about vhost-user-blk today.
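To make the notification and polling strategies from earlier concrete, here is a rough C sketch. In vhost-user, completion notifications really are delivered through eventfds passed over the socket, while polling means re-checking the queue's shared state; the function shapes here are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

/* Notification-based wait: block in the kernel until the other process
 * signals the eventfd.  We get descheduled, so there is wake-up latency. */
void wait_for_completion_notification(int callfd)
{
    uint64_t count;

    /* Blocks until the server writes to the eventfd. */
    read(callfd, &count, sizeof(count));
}

/* Polling-based wait: spin on shared queue state.  No system call and no
 * wake-up latency, but we burn CPU the whole time. */
void wait_for_completion_polling(_Atomic uint16_t *used_idx,
                                 uint16_t last_seen_idx)
{
    while (atomic_load_explicit(used_idx, memory_order_acquire) ==
           last_seen_idx) {
        /* busy-wait: respond the instant the index moves */
    }
}
```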
[13:31.140 --> 13:33.880] So let's have a look at the protocol. [13:33.880 --> 13:40.840] The way this is realized is that there is a Unix domain socket for our user space [13:40.840 --> 13:46.840] storage interface, and we speak the vhost-user protocol over that socket. [13:46.840 --> 13:51.480] What the vhost-user protocol allows us to do over the socket is set up access [13:51.480 --> 13:57.920] to a virtio-blk device, a block device that lives in the software-defined storage [13:57.920 --> 13:59.160] process. [13:59.160 --> 14:03.120] So when we have two processes running on a system, a software-defined storage process [14:03.120 --> 14:09.120] and an application, the application is using vhost-user in order to communicate with the [14:09.120 --> 14:14.720] virtio-blk device, and that's how it does its IO. [14:14.720 --> 14:15.880] So what is virtio-blk? [14:15.880 --> 14:18.880] virtio-blk is a standard. [14:18.880 --> 14:20.720] You can check out the VIRTIO specification. [14:20.720 --> 14:25.320] VIRTIO has a number of other devices, but it includes virtio-blk. [14:25.320 --> 14:29.160] Some of the other devices are virtio-net or virtio-scsi and so on. [14:29.200 --> 14:34.440] But virtio-blk is the one we'll focus on here, and it consists of one or more request queues [14:34.440 --> 14:36.840] where you can place IO requests. [14:36.840 --> 14:39.000] And each request has a little structure (sketched below). [14:39.000 --> 14:42.960] You can do all the requests I mentioned at the beginning of the talk: reads, writes, [14:42.960 --> 14:47.280] flushes, discard, write zeroes, and so on. [14:47.280 --> 14:51.080] And you have multiple queues, so if you want to do multi-queue, say because you're multi-threaded, [14:51.080 --> 14:53.640] you can do that as well. [14:53.640 --> 14:58.240] And it has a config space that describes the capabilities of the device, [14:58.240 --> 15:01.920] like the disk size, the number of queues, and so on. [15:01.920 --> 15:05.480] So that's what you can think of virtio-blk as; that's the model we have here, and that's [15:05.480 --> 15:09.880] the block device that our application can interact with. [15:09.880 --> 15:13.520] If you think of any other storage interfaces or network protocols that you're familiar [15:13.520 --> 15:17.680] with, this should be more or less familiar. [15:17.680 --> 15:20.600] Most of the existing protocols also work in this way. [15:20.600 --> 15:24.960] You can inquire about a device to find out its size and so on, and then you can set up [15:24.960 --> 15:29.720] queues and you can submit IO. [15:29.720 --> 15:33.200] So underneath virtio-blk, we have the vhost-user protocol. [15:33.200 --> 15:37.680] And the vhost-user protocol is this Unix domain socket protocol that allows the two processes [15:37.680 --> 15:40.280] to communicate. [15:40.280 --> 15:41.760] But it's not the data path. [15:41.760 --> 15:47.600] So vhost-user is not how the application actually does IO; instead, it's a control path that [15:47.600 --> 15:52.520] is used to set up access to these queues, these request queues that I've mentioned. [15:52.520 --> 15:57.880] And the IO buffer memory and the queue memory actually belong to the application. [15:57.880 --> 16:01.720] The application sends it over the Unix domain socket. [16:01.720 --> 16:07.480] It sends that shared memory over so that the software-defined storage process has access [16:07.480 --> 16:10.600] to the IO buffer memory and the queue memory.
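Here is roughly what that per-request structure looks like, paraphrased from the VIRTIO specification (on the wire the fields are little-endian, and the data buffer plus a one-byte status follow the header):

```c
#include <stdint.h>

/* virtio-blk request header, paraphrased from the VIRTIO spec.
 * Each request placed in a queue is: this header, then the data
 * buffer, then a one-byte status that the device fills in
 * (VIRTIO_BLK_S_OK / _IOERR / _UNSUPP). */
struct virtio_blk_req_header {
    uint32_t type;      /* one of the VIRTIO_BLK_T_* values below */
    uint32_t reserved;
    uint64_t sector;    /* offset in 512-byte sectors */
};

enum {
    VIRTIO_BLK_T_IN           = 0,   /* read */
    VIRTIO_BLK_T_OUT          = 1,   /* write */
    VIRTIO_BLK_T_FLUSH        = 4,
    VIRTIO_BLK_T_GET_ID       = 8,
    VIRTIO_BLK_T_DISCARD      = 11,
    VIRTIO_BLK_T_WRITE_ZEROES = 13,
};
```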
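That memory-sending step uses standard Unix domain socket file descriptor passing: the application backs its queue and buffer memory with a file descriptor, for example from memfd_create(), and sends the fd as SCM_RIGHTS ancillary data so the server can mmap() the same pages. A minimal sketch of the sending side, with an illustrative function name:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor (e.g. from memfd_create()) over a connected
 * Unix domain socket as SCM_RIGHTS ancillary data.  The receiver can
 * then mmap() the fd and see the same physical pages, which is what
 * makes zero copy possible. */
static int send_memory_fd(int sock, int memfd)
{
    char byte = 0;  /* at least one byte of real data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = sizeof(byte) };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* "here is a file descriptor" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &memfd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```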
[16:10.600 --> 16:15.320] So the application and the software-defined storage process share access to that memory. [16:15.320 --> 16:18.040] That way we can do zero copy. [16:18.040 --> 16:21.920] This is going back to the message passing versus zero copy thing. [16:21.920 --> 16:27.600] We don't need to transfer entire IO buffers between the two processes. [16:27.600 --> 16:33.120] Instead, the software-defined storage process can just read the bytes out of the IO buffers [16:33.120 --> 16:41.600] that live in the application process, and it can write results into those buffers as well. [16:41.600 --> 16:46.360] So if you want to look at the specification and the details of how vhost-user works, I've [16:46.360 --> 16:49.680] put a link on this slide. [16:49.680 --> 16:53.920] But really, if you're writing an application, I think the way to do it is to use libblkio. [16:53.920 --> 17:00.440] libblkio is a library with both C and Rust APIs that allows you to connect to [17:00.440 --> 17:03.520] vhost-user-blk as well as other storage interfaces. [17:03.520 --> 17:08.160] So vhost-user-blk is not the only thing it supports, but for the purposes of this talk, we'll just [17:08.160 --> 17:10.760] focus on that. [17:10.760 --> 17:13.760] libblkio is not a framework; it's a library. [17:13.760 --> 17:18.760] You can integrate it into your application regardless of what your architecture is. [17:18.760 --> 17:24.960] That means it supports blocking IO, it supports event-driven IO, and it also supports polling. [17:24.960 --> 17:29.840] So no matter how you've decided to structure your application, you can use libblkio. [17:29.840 --> 17:36.440] You won't have to change the architecture of your application just to integrate libblkio. [17:36.440 --> 17:39.600] I have given a full talk about libblkio. [17:39.600 --> 17:43.160] So if you want to understand the details, and also some of the background and everything [17:43.160 --> 17:52.000] it can do, then please check out that talk; I put a YouTube link on this slide for you. [17:52.000 --> 17:54.720] I'll give you a short code example here. [17:54.720 --> 18:00.600] This shows how to connect to a vhost-user-blk socket using libblkio. [18:00.600 --> 18:04.840] And this is pretty straightforward: we essentially just need to give it the path of the Unix [18:04.840 --> 18:10.640] domain socket, and then we connect and start the blkio instance. [18:10.640 --> 18:14.200] Then, in order to do IO, we can submit a read request. [18:14.200 --> 18:17.840] That's just a function call; that's straightforward as well. [18:17.840 --> 18:22.160] Notice here that we do get a queue: we call the get queue function in order to grab [18:22.160 --> 18:23.160] a queue. [18:23.160 --> 18:26.360] That's because libblkio is a multi-queue library. [18:26.360 --> 18:30.800] If you have a multi-threaded application, you could create one dedicated queue for each [18:30.800 --> 18:34.600] thread and then avoid any kind of locking and synchronization. [18:34.600 --> 18:38.080] All the threads can do IO at the same time. [18:38.080 --> 18:42.120] For completion, what this example shows is blocking completion. [18:42.120 --> 18:48.920] So here the program is actually going to wait in the do IO function until the IO is complete. [18:48.920 --> 18:53.320] But as I mentioned, the library also supports event-driven IO and polling. [18:53.320 --> 18:59.520] So whatever you like, you'll be able to do that.
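Since the slide itself isn't reproduced here, below is a rough reconstruction of that example against libblkio's C API; the socket path is made up, and the exact signatures should be checked against the libblkio documentation.

```c
#include <blkio.h>
#include <stdio.h>

int main(void)
{
    struct blkio *b;
    struct blkio_mem_region mem;
    struct blkio_completion completion;
    struct blkioq *q;

    /* The "virtio-blk-vhost-user" driver speaks vhost-user over a Unix
     * domain socket; the path here is hypothetical. */
    if (blkio_create("virtio-blk-vhost-user", &b) < 0) {
        fprintf(stderr, "blkio_create: %s\n", blkio_get_error_msg());
        return 1;
    }
    blkio_set_str(b, "path", "/tmp/vhost-user-blk.sock");

    /* Connect and start the blkio instance. */
    blkio_connect(b);
    blkio_start(b);

    /* IO buffers live in memory shared with the server (zero copy). */
    blkio_alloc_mem_region(b, &mem, 4096);
    blkio_map_mem_region(b, &mem);

    /* Grab queue 0.  A multi-threaded app could use one queue per
     * thread and avoid locking entirely. */
    q = blkio_get_queue(b, 0);

    /* Submit a 4 KiB read at offset 0, then block until it completes. */
    blkioq_read(q, 0, mem.addr, 4096, NULL /* user_data */, 0 /* flags */);
    blkioq_do_io(q, &completion, 1, 1, NULL /* no timeout */);
    printf("read completed with result %d\n", completion.ret);

    blkio_destroy(&b);
    return 0;
}
```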
[18:59.520 --> 19:03.320] If you develop your application, you'll need something to test against. [19:03.320 --> 19:09.120] And I think the easiest way to get a vhost-user-blk device to test against is to use the QEMU [19:09.120 --> 19:10.360] storage daemon. [19:10.360 --> 19:16.760] It's packaged for all the main Linux distros as part of the QEMU packages. [19:16.760 --> 19:21.440] You can just run the storage daemon, give it a raw image file, and tell it the [19:21.440 --> 19:27.800] path of the vhost-user-blk Unix domain socket that you want, and then you can connect [19:27.800 --> 19:29.560] your application to it (example below). [19:29.560 --> 19:32.800] All right, so that's how you can do that. [19:32.800 --> 19:39.040] If you want to implement a server: if you're already in the SPDK ecosystem and you're using [19:39.040 --> 19:45.800] the Storage Performance Development Kit to write your software-defined [19:45.800 --> 19:52.960] storage system, then it's very easy, because vhost-user-blk support is already built in. [19:52.960 --> 19:55.800] So I've put a link to the documentation. [19:55.800 --> 19:59.960] There are also RPCs if you want to invoke it from the command line (example below). [19:59.960 --> 20:07.680] And just for testing, you can create a vhost-user-blk server using this. [20:07.680 --> 20:15.160] Now if you're not using SPDK, and instead you're writing your own C daemon, your own process, [20:15.160 --> 20:20.960] then one way of offering vhost-user-blk is to use the libvhost-user library. [20:20.960 --> 20:26.960] This is a C library that implements the vhost-user protocol, the server side of it. [20:26.960 --> 20:30.760] So this will allow you to accept vhost-user connections. [20:30.760 --> 20:32.840] It doesn't actually implement virtio-blk. [20:32.840 --> 20:33.840] That's your job. [20:33.840 --> 20:36.640] That's the job of the software-defined storage system. [20:36.640 --> 20:40.820] But virtio-blk consists of basically just processing the IO requests, like reads and [20:40.820 --> 20:49.400] writes and so on, and also setting the configuration space so that the disk size is reported there. [20:49.400 --> 20:54.840] And you can find an example of a C program that implements vhost-user-blk using [20:54.840 --> 20:55.840] libvhost-user. [20:55.840 --> 20:58.920] I've put a link on the slide here for you. [20:58.920 --> 21:06.080] So that's how you can do it in C. In Rust, similarly, there is a library available for [21:06.080 --> 21:07.080] you. [21:07.080 --> 21:12.560] There's the vhost-user-backend Rust crate, and it plays a similar role to the libvhost-user [21:12.560 --> 21:19.360] library for C. So this allows you to easily implement whatever vhost-user device you want. [21:19.360 --> 21:25.400] And in this case, it's your job to implement virtio-blk, just as I mentioned. [21:25.400 --> 21:32.640] Okay, now I still wanted to touch on one con that we hadn't covered yet, because we've [21:32.640 --> 21:38.720] explained how, although a user space interface is complex and is more work than just using [21:38.720 --> 21:45.360] file descriptors and read and write, I think that libblkio, libvhost-user, and so on, these [21:45.360 --> 21:49.440] libraries that are ready for you to integrate into your applications or software-defined storage [21:49.440 --> 21:56.600] systems, take away that complexity, and they make the integration easier as well. [21:56.600 --> 21:59.160] You don't need to duplicate code or write a lot of stuff.
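For example, an invocation along these lines exports a raw image file over a vhost-user-blk socket (the file name and socket path are arbitrary; see the qemu-storage-daemon man page for the exact option syntax):

```console
$ qemu-img create -f raw disk.img 1G
$ qemu-storage-daemon \
      --blockdev driver=file,node-name=disk0,filename=disk.img \
      --export type=vhost-user-blk,id=export0,node-name=disk0,writable=on,addr.type=unix,addr.path=/tmp/vhost-user-blk.sock
```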
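On the SPDK side, the command-line RPCs look roughly like this: start the vhost target, create a RAM disk for testing, and expose it as a vhost-user-blk device. Paths and names here are illustrative; see the SPDK vhost documentation for the authoritative steps.

```console
$ build/bin/vhost -S /var/tmp &
$ scripts/rpc.py bdev_malloc_create -b Malloc0 64 512
$ scripts/rpc.py vhost_create_blk_controller vhost.0 Malloc0
$ # clients can now connect to the /var/tmp/vhost.0 socket
```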
[21:59.160 --> 22:01.720] But we're still left with one of the disadvantages. [22:01.720 --> 22:07.600] How do we connect this back to the kernel if it turns out we want to use some functionality [22:07.600 --> 22:13.680] from the kernel storage stack, or if we have a legacy application that we can't port to [22:13.680 --> 22:17.200] use the user space interface? [22:17.200 --> 22:21.200] So for vhost-user-blk, there is a solution here. [22:21.200 --> 22:26.720] There's the Linux VDUSE feature (vDPA Device in Userspace), which is relatively new. [22:26.720 --> 22:33.760] What it does is allow a virtio-like device implemented in user space to be attached to the kernel. [22:33.760 --> 22:37.840] So even though your software-defined storage system is in user space, this gives you a [22:37.840 --> 22:41.240] way of attaching your block device to the kernel. [22:41.240 --> 22:48.840] And then in the kernel, the virtio-blk driver will be used to communicate with your device. [22:48.840 --> 22:54.800] What happens is that a /dev/vda or /dev/vdb block device node will appear, and your application [22:54.800 --> 23:00.280] can open that like any other block device, and it can read and write and do everything [23:00.280 --> 23:02.280] through there. [23:02.280 --> 23:08.440] One of the nice features of this is that, because it's quite similar to vhost-user-blk, the [23:08.440 --> 23:11.640] code can be largely shared. [23:11.640 --> 23:16.320] I think the only difference would be that instead of having the vhost-user code, you [23:16.320 --> 23:21.800] would have the VDUSE code, which opens the character device that the VDUSE driver in [23:21.800 --> 23:24.960] the kernel offers, instead of a Unix domain socket. [23:24.960 --> 23:27.560] And the setup and the control path are a little bit different. [23:27.560 --> 23:32.220] But the actual virtio-blk data path is still the same, so you can reuse that code. [23:32.220 --> 23:36.480] So that's an effective way of doing it. [23:36.760 --> 23:41.960] There's another new Linux feature that I wanted to mention that is interesting here, and also [23:41.960 --> 23:45.960] a little bit more general, even outside of vhost-user-blk, and that's ublk. [23:45.960 --> 23:51.360] ublk is a new Linux interface for user space block IO, so that your software-defined [23:51.360 --> 23:58.760] storage system can present block devices through the host kernel. [23:58.760 --> 24:04.880] So you can have your block device and process its IO in user space. [24:04.880 --> 24:07.120] And it uses io_uring. [24:07.120 --> 24:10.960] It's an exciting feature, and it's pretty interesting, so I've left the link here. [24:10.960 --> 24:16.720] The only thing with this is that, compared to VDUSE, it does not reuse or share any [24:16.720 --> 24:19.040] of the vhost-user-blk stuff. [24:19.040 --> 24:23.280] So if you already have vhost-user-blk support in your software-defined storage system, or [24:23.280 --> 24:27.960] you just want to streamline things, then ublk is a whole different interface [24:27.960 --> 24:29.040] that you have to integrate. [24:29.040 --> 24:33.600] So that's the only disadvantage, but I think it's pretty exciting too. [24:34.520 --> 24:41.000] Okay, so to summarize: if you need a user space block IO interface for the performance, [24:41.000 --> 24:47.440] or because you need to be able to do unprivileged IO, or for security, then implement [24:47.440 --> 24:48.440] vhost-user-blk. [24:48.440 --> 24:51.520] There are open specs, code, and community.
[24:51.520 --> 24:54.520] Please let me know if you have any questions, and thank you. [24:54.520 --> 24:55.200] Have a great FOSDEM!