[00:00.000 --> 00:10.840] So actually, I'm going to talk about three subjects: connect by name, the proxy control [00:10.840 --> 00:16.320] option, and also a little bit of Rust thrown in at the end as a bonus. [00:16.320 --> 00:24.640] So my name is Philip Homburg and since a bit more than a year, I now work for NLnet Labs. [00:24.640 --> 00:31.640] So the question that has probably been posed by many people is: can you have just a function [00:31.640 --> 00:37.160] where in comes a host name and a service and you get a socket back? [00:37.160 --> 00:42.000] And sort of the starting point for this project, because we've got some funding, so we officially [00:42.000 --> 00:48.840] defined that, is that Mikael Abrahamsson in the IETF once suggested that something like that should [00:48.840 --> 00:50.560] be done. [00:50.560 --> 00:55.960] And of course, we want to have options so we can have a slightly more modern version [00:55.960 --> 01:00.800] where you have a context as the first thing and it returns an error code in place of [01:00.800 --> 01:04.600] overloading that with the socket, but it is the same general idea. [01:04.600 --> 01:08.600] Of course, this is completely bad because this is blocking. [01:08.600 --> 01:11.000] This is what we now want. [01:11.000 --> 01:17.720] Unfortunately, because we at NLnet Labs basically only do DNS when it comes to name resolution, [01:17.720 --> 01:20.680] this talk ignores every other possible thing. [01:20.680 --> 01:27.280] We don't even do mDNS, and we definitely don't do anything fancy, but it should not [01:27.280 --> 01:28.280] be precluded. [01:28.280 --> 01:34.000] I mean, if people want to add it, why not? 
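The blocking function described above (a host name and a service go in, a connected socket comes out) can be sketched as follows. This is a minimal illustration in Python rather than the C of the prototype; the name `connect_by_name` and its parameters are assumptions made for this example, not the prototype's actual interface.

```python
import socket


def connect_by_name(hostname: str, service: str, timeout: float = 5.0) -> socket.socket:
    """Blocking sketch: resolve the name, try each address in order,
    and return the first TCP socket that connects."""
    last_error = None
    # getaddrinfo accepts a service name ("https") or a port number as a string.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        hostname, service, type=socket.SOCK_STREAM
    ):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            last_error = exc
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)  # blocks until connected, refused, or timed out
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError(f"no addresses found for {hostname!r}")
```

Every call blocks for the whole resolution and for each connect attempt in turn, which is exactly the weakness the talk goes on to address.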
[01:34.000 --> 01:43.320] So to make it non-blocking, the obvious way to extend it is to take an event framework [01:43.320 --> 01:50.120] like libevent, and then in libevent speak it is, well, you create an event base, you do [01:50.120 --> 01:56.160] a bit of initialization where you pass the event base to the asynchronous library function, [01:56.160 --> 02:00.680] you start it, it returns to say, well, okay, I'm busy. [02:00.680 --> 02:06.080] Then at some point it does a couple of callbacks, like this callback function that you pass, [02:06.080 --> 02:12.480] but the main loop is called event_base_dispatch and as long as your entire application is [02:12.480 --> 02:17.600] written around it, then the application just calls this one and then you can call this [02:17.600 --> 02:22.000] connect by name as many times as you like. [02:22.000 --> 02:28.920] So if you want to make this practical, it gets more complex and you have to do real engineering. For [02:28.920 --> 02:34.760] example, getdns has support for, I think, three event frameworks and you can define [02:34.760 --> 02:36.760] your own event framework and stuff like that. [02:36.760 --> 02:43.720] I'll ignore this; the only thing you're going to get here is libevent. [02:43.720 --> 02:50.040] But there's a couple of practical things that we would like to add, so now we get another [02:50.040 --> 02:57.920] full slide. So far I said you get a socket back, and implicitly a socket back means TCP because, [02:57.920 --> 03:04.080] well, UDP is way too complex, but then in practice, who does TCP anymore? [03:04.080 --> 03:08.400] I mean, the thing is if you have a TCP socket, then you immediately call your SSL library [03:08.400 --> 03:12.440] and you want a TLS connection. I mean, at least I hope that people are not writing new [03:12.440 --> 03:18.080] code that ships unencrypted data over the internet. 
[03:18.080 --> 03:24.400] Now within libevent, you're lucky because they have a concept called bufferevent, that's [03:24.400 --> 03:32.240] why the callback there gets a bufferevent, and libevent can transparently do SSL, so [03:32.240 --> 03:36.640] you just write to the bufferevent and then libevent, well, if it knows that [03:36.640 --> 03:44.280] it is TLS, then it sends it to OpenSSL and if it's just a normal TCP connection, then [03:44.280 --> 03:46.760] it sends it to the socket. [03:46.760 --> 03:53.680] So that solves that problem and that allows the library to also do a couple of other interesting [03:53.680 --> 04:00.800] things as we will see on the other slide, but because we are an organization that is [04:00.800 --> 04:07.480] focused on DNS, we focused on all of the complexity of stuff that you can do with DNS. [04:07.480 --> 04:13.200] So for example, one thing that the library does, I forgot to mention, is that if you [04:13.200 --> 04:18.160] get multiple addresses back, then the traditional way is you write a for loop, you do connect [04:18.160 --> 04:22.360] to the first address and then to the second address and there's a timeout of, I don't know, many minutes [04:22.360 --> 04:27.440] on the TCP connection, so if the first address doesn't work, then it takes forever. [04:27.440 --> 04:32.760] So your library needs to do happy eyeballs such that you start to connect, wait not that [04:32.760 --> 04:39.080] long and then start the next connect, which also means that any timer system should not be on [04:39.080 --> 04:43.320] the order of seconds; it should definitely be on the order of milliseconds because it should [04:43.320 --> 04:51.320] be within human response levels and not like, okay, the network is down, we wait seconds. 
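The non-blocking variant with optional TLS and happy eyeballs can be sketched in a few lines. This example uses Python's asyncio as a stand-in for libevent and its bufferevents: the stream pair returned here plays the role of the bufferevent, hiding whether TLS sits underneath. The function name, the `use_tls` flag and the 250 ms delay are assumptions for illustration only; the staggered connection attempts are delegated to asyncio's own `happy_eyeballs_delay` support rather than implemented by hand.

```python
import asyncio
import ssl


async def connect_by_name(hostname: str, port: int, *, use_tls: bool = False,
                          attempt_delay: float = 0.25):
    """Non-blocking sketch: resolve, race connection attempts, optionally wrap in TLS.

    attempt_delay is how long to wait before starting the next address,
    deliberately in the range of milliseconds rather than TCP timeouts.
    """
    ssl_context = ssl.create_default_context() if use_tls else None
    reader, writer = await asyncio.open_connection(
        hostname,
        port,
        ssl=ssl_context,                     # None means plain TCP
        happy_eyeballs_delay=attempt_delay,  # stagger attempts across addresses
    )
    # The caller reads and writes the same way for TCP and for TLS.
    return reader, writer
```

A caller would write `reader, writer = await connect_by_name("example.com", 443, use_tls=True)` inside a coroutine and can start any number of such connects on the same event loop.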
[04:51.320 --> 04:59.080] So that is stuff that this library can hide and that the prototype also does, but to get [04:59.080 --> 05:05.280] to the DNS part, if you have a modern web browser, then the web browser has an option [05:05.280 --> 05:12.800] to configure DNS and that's highly controversial because it goes over HTTP, but it's something [05:12.800 --> 05:18.160] where applications have now said, okay, we are done with /etc/resolv.conf, we [05:18.160 --> 05:23.520] from an application point of view want to be able to decide which is our upstream [05:23.520 --> 05:31.000] resolver. So we added configuration options so that you can say, well, I want to have an [05:31.000 --> 05:36.120] upstream resolver that has authenticated encryption. [05:36.120 --> 05:43.400] I don't really like QUIC and I have no clue, so I say the only allowed transport protocols [05:43.400 --> 05:49.080] are plain old DNS over 53, which will always fail because it cannot do any encryption, [05:49.080 --> 05:57.400] but we do allow DNS over TLS, we do allow DNS over HTTP/2, but none of the fancy QUIC [05:57.400 --> 06:04.640] things. We have a name for authentication and of course we can go completely overboard [06:04.640 --> 06:09.080] and also do SvcParams. [06:09.080 --> 06:17.080] So that extends the call a bit because now the context has a way that you can say, well, [06:17.080 --> 06:21.760] this is my DNS policy and then it goes out and does it. [06:21.760 --> 06:29.720] I mean, the basic interface is still more or less the same. 
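The policy from this example (authenticated encryption required; plain DNS, DNS over TLS and DNS over HTTP/2 allowed; no QUIC; a name for authentication) can be modelled as a small data structure. The sketch below is in Python and every name in it (`Transport`, `DnsPolicy`, `usable_transports`, `dns.example`) is hypothetical; the prototype's real configuration format is not shown in the talk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Transport(Enum):
    DO53 = "plain DNS over port 53"
    DOT = "DNS over TLS"
    DOH2 = "DNS over HTTP/2"
    DOQ = "DNS over QUIC"


# Transports that can carry an encrypted, authenticated session.
ENCRYPTED = frozenset({Transport.DOT, Transport.DOH2, Transport.DOQ})


@dataclass(frozen=True)
class DnsPolicy:
    require_authenticated_encryption: bool = False
    allowed_transports: Tuple[Transport, ...] = (Transport.DO53,)
    auth_name: Optional[str] = None  # name the upstream must authenticate as

    def usable_transports(self) -> List[Transport]:
        """Allowed transports that can actually satisfy the policy."""
        if not self.require_authenticated_encryption:
            return list(self.allowed_transports)
        if self.auth_name is None:
            return []  # nothing to authenticate the upstream against
        return [t for t in self.allowed_transports if t in ENCRYPTED]


# The configuration described in the talk: port 53 is listed but can never qualify.
policy = DnsPolicy(
    require_authenticated_encryption=True,
    allowed_transports=(Transport.DO53, Transport.DOT, Transport.DOH2),
    auth_name="dns.example",
)
```

With this policy, `policy.usable_transports()` leaves only DNS over TLS and DNS over HTTP/2, which mirrors the remark that plain DNS over 53 "will always fail" once encryption is required.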
[06:29.720 --> 06:39.280] So we worked on connect by name. We built a prototype on a grant from NLnet Foundation. [06:39.280 --> 06:45.840] We support asynchronous resolution; well, of course, asynchronous also means that your [06:45.840 --> 06:52.480] A and AAAA queries should go in parallel; happy eyeballs; then of course the DNS community [06:52.480 --> 06:59.360] invented DANE, so if you do TLS then you also have to do the DANE query immediately; and [06:59.360 --> 07:08.200] I forgot to list here, we also do SVCB, and if you have the patience to configure experimental [07:08.200 --> 07:16.560] OpenSSL libraries, you can also pass the Encrypted Client Hello from SVCB into OpenSSL and stuff [07:16.560 --> 07:21.840] like that, and the nice thing is you can hide it all in a single library. [07:21.840 --> 07:28.760] So what I would like from the community is, one, sort of what doesn't work, [07:28.760 --> 07:34.680] what extra stuff do we need, but we also have a problem with how do we go further with [07:34.680 --> 07:35.680] this. [07:35.680 --> 07:41.560] I mean, we built a prototype, but we cannot really ourselves make it into a product for [07:41.560 --> 07:49.080] various reasons, so take a look at it if you are interested and let us know if you want [07:49.080 --> 07:51.200] to do something. [07:51.200 --> 07:58.600] The current problem for me is that it's on top of getdns. getdns is an extremely nice library, but it tries [07:58.600 --> 08:04.040] to do everything, so it's also a very heavyweight library, so there it is like: it's [08:04.040 --> 08:09.520] a library that you want to link with potentially all applications, should that be that [08:09.520 --> 08:11.640] heavyweight? [08:11.640 --> 08:18.400] So that's how we got to the next subject. 
[08:18.400 --> 08:25.960] This is sort of now what the IETF has created as what stub resolvers should do and I left [08:25.960 --> 08:32.480] out a case and other things because ADD is busy and I don't know, there's probably quite [08:32.480 --> 08:34.400] a few other working groups. [08:34.400 --> 08:40.440] So the stub resolver, which was a very simple thing that sends a query over [08:40.920 --> 08:47.840] port 53, has to do more and more and more stuff. [08:47.840 --> 08:56.840] So, many applications, many stub resolvers: how many libraries will implement all of those [08:56.840 --> 09:02.080] transports, especially if it's also implemented in different languages? [09:02.080 --> 09:09.440] It used to be that a stub resolver had basically no state, but if you do DoT, DoH, DoQ, then [09:09.440 --> 09:16.000] you have connection setup, and you generate load in a recursive resolver because if you're [09:16.000 --> 09:23.080] constantly setting up, say, DoT or DoH connections, then it has a way higher load than if it's [09:23.080 --> 09:31.080] just a simple UDP query, and it's definitely bad for short-lived applications like ping [09:31.080 --> 09:37.840] that have a way higher overhead setting up a connection to the local recursive resolver [09:37.840 --> 09:43.520] than the actual work that the application is doing. [09:43.520 --> 09:50.960] So the simple way to solve that, we thought: you introduce a local proxy. [09:50.960 --> 09:58.480] That's not really something new because lots of people run Unbound as a local DNS proxy. [09:58.480 --> 10:05.480] Well, we also, as part of the getdns project, created Stubby, which focuses more on doing [10:05.480 --> 10:15.320] DNS over TLS; there are things like dnsdist, dnsmasq, systemd-resolved, so it [10:15.320 --> 10:23.080] looks like, okay, we don't have to worry about that, we can just talk to a local proxy. 
[10:23.080 --> 10:29.680] But then, if we go back to the example config I had for connect by name, for the Firefox [10:29.680 --> 10:35.440] that wants to talk DoH, how do you tell your local proxy that you actually want [10:35.440 --> 10:38.960] to have an authenticated connection? [10:38.960 --> 10:44.520] What if your proxy is just sending it, I don't know, to one of the public resolvers [10:44.520 --> 10:50.440] over port 53? Maybe that's not what your application wants. [10:50.440 --> 10:59.600] And then this whole local proxy idea falls down and you get, say, a browser again implementing [10:59.600 --> 11:06.080] its own stub resolver because it doesn't have any control. [11:06.080 --> 11:17.040] So we thought about it for a while and created a draft in the IETF with a new EDNS(0) option. [11:17.040 --> 11:23.760] And basically, when you send the request to your stub resolver, then you can encode all [11:23.760 --> 11:30.680] of the stuff that you want to have as a policy in such an option. [11:30.680 --> 11:36.920] So you can be very basic and set a flag like, well, only give me an authenticated connection. [11:36.920 --> 11:41.520] If you can't do it, then just report that it doesn't work. Or you could say, well, this [11:41.520 --> 11:48.120] is the recursive resolver that I want you to use, please use that. [11:48.120 --> 11:57.800] And then applications can trust the local proxy because they can control it. [11:57.800 --> 12:07.160] And it provides a nice way to basically reduce the stub resolver footprint a bit by moving [12:07.160 --> 12:13.000] all of the difficult transports to the proxy. [12:13.000 --> 12:21.280] We have a proof of concept for that, though I have to warn you that we revised the layout [12:21.280 --> 12:25.880] of the option in the draft that is listed here and what the proof of concept does is an [12:25.880 --> 12:26.880] older draft. [12:26.880 --> 12:34.960] But if you want to play with the general concept, then that is there. 
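To make the idea of carrying policy in an EDNS(0) option concrete, here is a small Python sketch of the encoding step. Only the outer framing (16-bit option code, 16-bit length, then data) follows the EDNS(0) specification, RFC 6891. Everything inside the option is invented for this example: the option code is taken from the range RFC 6891 reserves for local and experimental use, and the flag bit and the length-prefixed resolver name are hypothetical, not the layout of the draft mentioned in the talk.

```python
import struct

# Hypothetical values, chosen only for this illustration.
PROXY_CONTROL_OPTION_CODE = 65001   # from the local/experimental range in RFC 6891
FLAG_REQUIRE_AUTHENTICATED = 0x01   # "only give me an authenticated connection"


def encode_edns_option(code: int, data: bytes) -> bytes:
    """Standard EDNS(0) option framing: OPTION-CODE, OPTION-LENGTH, OPTION-DATA."""
    return struct.pack("!HH", code, len(data)) + data


def encode_proxy_control(flags: int, resolver_name: str = "") -> bytes:
    """Hypothetical payload: one flags octet, then a length-prefixed name of the
    recursive resolver the application wants the proxy to use (may be empty)."""
    name = resolver_name.encode("ascii")
    payload = struct.pack("!BB", flags, len(name)) + name
    return encode_edns_option(PROXY_CONTROL_OPTION_CODE, payload)
```

An application would place the resulting bytes in the RDATA of the OPT record of its query; a proxy that understands the option either honours the policy or reports that it cannot.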
[12:34.960 --> 12:44.200] So we decided that, well, we can continue writing code in C and, of course, for our [12:44.200 --> 12:50.160] existing products like Unbound and NSD, we will just maintain them in C because they are written [12:50.160 --> 12:58.680] in C. But we would like to try to move to Rust for new code. [12:58.680 --> 13:08.120] And I just copied a little bit of stuff from a prototype. [13:08.120 --> 13:20.440] The first thing uses Rust in creative ways and that is something where it's now a prototype [13:20.440 --> 13:25.920] and we definitely need feedback from users of the library like, okay, it's very great [13:25.920 --> 13:30.320] that you can have a message builder that takes a StaticCompressor type and it has [13:30.320 --> 13:34.720] a StreamTarget, but probably you don't want to write code like that. [13:34.720 --> 13:42.080] So it's built at the moment to be flexible and use the language, but it should be somehow [13:42.080 --> 13:45.720] modified to be more usable. [13:45.720 --> 13:53.080] Then here in the middle, you basically get the main thing: because the whole thing is [13:53.080 --> 13:58.000] generic, if you want to send a query, then you have to go to the question section and [13:58.000 --> 14:05.280] then you say, well, I want to push a question there, and then there is again a bit of a usability [14:05.280 --> 14:12.840] problem where you say, okay, I need this back as a builder and I need a clone of it. [14:12.840 --> 14:18.400] So this is the part that I experimented with. [14:18.440 --> 14:26.520] If you want to have a TCP upstream, then you say create the TCP connection, and the nice [14:26.520 --> 14:35.160] thing with Rust is that it can do all of the asynchronous stuff with a nice syntax. 
[14:35.160 --> 14:42.200] So basically you say, do this connect here and wait until the connect is done, but because [14:42.200 --> 14:49.840] this function is implicitly asynchronous, as a programmer you can just write this as [14:49.840 --> 14:54.880] if it's sequential code, but the caller can just call this as an asynchronous function [14:54.880 --> 15:00.040] and you don't have to do anything extra. [15:00.040 --> 15:08.200] Here I have to do a bit more work to really figure out how it fits in the Rust ecosystem, [15:08.880 --> 15:17.120] because the thing is, if you have a TCP connection upstream to a DNS resolver, and I wanted to [15:17.120 --> 15:24.400] have this as just the basics for maybe DoH or whatever, you want to set up the [15:24.400 --> 15:29.720] connection once but then you want to potentially send many queries over it. [15:29.720 --> 15:37.920] So I need to have a separate thing that actually talks TCP, as a worker thread, but then because [15:37.960 --> 15:45.640] it's all asynchronous this is basically getting an asynchronous worker, and then I also say, [15:45.640 --> 15:52.560] well, give me an asynchronous query, and then in Rust you can say, okay, you have two asynchronous [15:52.560 --> 15:59.080] things that you want to do at the same time, well, just do them both at the same time, and then [15:59.080 --> 16:05.160] normally we expect to be here when we got a reply, and then we print the reply and we are done. [16:06.080 --> 16:15.680] So this is sort of the direction we want to go in, which is also why we have a bit of a problem [16:15.680 --> 16:20.880] developing the connect by name prototype that we now have, because it is like, okay, we don't really [16:20.880 --> 16:25.040] want to have a new prototype in C, so what do we want to do with it? [16:25.040 --> 16:33.040] So that's what I wanted to tell today; there is, I think, plenty of space for questions. 
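The structure described here, one asynchronous worker that owns the upstream connection plus an asynchronous query, both driven at the same time, is not specific to Rust. The following is an analogy in Python's asyncio, not the prototype's code: the names `worker`, `query` and the in-memory queue are assumptions, and the worker fabricates a reply instead of talking to a real resolver over TCP.

```python
import asyncio


async def worker(requests: asyncio.Queue) -> None:
    """Stands in for the task that owns the single upstream TCP connection."""
    while True:
        item = await requests.get()
        if item is None:  # sentinel: no more queries, shut down
            return
        name, reply = item
        # A real worker would write the query to the socket and read the answer.
        reply.set_result(f"reply for {name}")


async def query(requests: asyncio.Queue, name: str) -> str:
    """Hand one query to the worker and wait for its reply."""
    reply = asyncio.get_running_loop().create_future()
    await requests.put((name, reply))
    answer = await reply
    await requests.put(None)  # this sketch sends a single query, then stops the worker
    return answer


async def main() -> str:
    requests: asyncio.Queue = asyncio.Queue()
    # Two asynchronous things at the same time: run them both and wait for both.
    _, answer = await asyncio.gather(worker(requests), query(requests, "example.com"))
    return answer


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping the connection inside one worker is what allows it to be set up once and reused for many queries; each query only needs a way to hand its request to the worker and await the matching reply.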
[16:33.040 --> 16:50.040] I love the idea of having a function which can deal with not just name resolution but DNS name [16:50.040 --> 16:56.000] resolution and also the cryptography, but as a distribution maintainer I have to say that [16:56.560 --> 17:03.200] having a library function which makes applications behave differently from all other [17:03.200 --> 17:09.800] applications is really a non-starter, so I think that you need to consider some way to support [17:09.800 --> 17:19.280] NSS and the NSS plugins through the libc or whatever is better. You mentioned that probably [17:20.240 --> 17:29.160] a daemon is needed to get good performance, so maybe the DNS part is the less important one, [17:29.160 --> 17:39.880] which you can delegate to some other component. I'll try to summarize: you say there's something [17:39.960 --> 17:49.600] with distributing this and there is something with, if you run a local proxy then you don't have to [17:49.600 --> 17:58.840] focus as much on DNS, if I got that correct. There are already some projects in this space [17:58.840 --> 18:09.520] that you mentioned and they are expected to work with the normal libc NSS plugins, and I think [18:09.520 --> 18:18.520] for your library to be universally used, and I think that has to be your goal, you need [18:18.520 --> 18:25.120] to support the normal name resolution which is expected by any current application, so it has [18:25.120 --> 18:35.880] to support the libc plugins. You say the library will only be adopted if it supports the libc [18:36.360 --> 18:44.840] plugins. 
Yes, I agree. I mean, that's why we made the prototype, because we were looking into what [18:44.840 --> 18:51.480] should the interface to the library be, how should the library behave, stuff like that, sort of the [18:51.480 --> 18:59.360] high-level stuff, fully expecting that any production quality implementation of the library [18:59.360 --> 19:07.160] has to take a lot of this stuff into account, and certainly dealing with nsswitch.conf is, I guess, [19:07.160 --> 19:18.040] mandatory for any production quality library. For the proxy control option, because there are lots [19:18.040 --> 19:25.240] of daemons in that space, of course it's best if those adopt the option once it is actually standardized [19:25.240 --> 19:31.960] by the IETF. I mean, it's not that we want to write another proxy, it's just that we have a very [19:31.960 --> 19:38.960] specific problem that we want to solve if we want to make stub resolvers small and still give them [19:38.960 --> 19:48.560] access to all of the encrypted transports, but yeah, if for example systemd-resolved would also do [19:48.560 --> 19:54.120] the proxy control option then it would be perfectly fine. I mean, there's no reason to write a new [19:54.720 --> 20:12.440] one for the proxy control option. Is it only the stub resolver that will tell the proxy server that [20:12.440 --> 20:18.000] it wants those policies applied, or does the proxy also communicate back to the stub resolver that [20:18.000 --> 20:23.560] it is actually applying those policies? Because there is the initial situation where nothing supports it, [20:23.560 --> 20:30.600] which you always have. So the question is what happens if you send a proxy control option [20:30.600 --> 20:38.800] to an older stub resolver that may not be aware. So I didn't want to go over the entire draft, [20:39.720 --> 20:50.960] but we thought about that. Basically there are some priming queries. I forgot the exact name. 
[20:50.960 --> 21:00.080] Is it resolver.arpa that is proposed? Something like that. So try to look up resolver.arpa, [21:00.080 --> 21:07.080] see if you get the right response. If you don't, then the only thing you leaked is that you were [21:07.080 --> 21:13.080] trying to look up resolver.arpa. We assume that that is safe, and then if you do get it, [21:13.080 --> 21:19.280] then you know that the proxy understands it. Yeah. Any more questions? Okay, yeah. [21:19.280 --> 21:24.320] This is actually a comment on both this presentation and the previous one. You're [21:24.320 --> 21:29.040] tackling three moving targets at the same time. You're trying to figure out how to integrate [21:29.040 --> 21:33.200] with the event loop. You're trying to figure out what your API to the application looks like, [21:33.200 --> 21:40.680] and you need to figure out what your integration with NSS or the system is. The complexity is multiplicative, [21:40.680 --> 21:47.160] so you're cubing this. This is a horrible idea. You can at least remove the event loop [21:47.160 --> 21:53.600] integration as a moving target. There is an existing project called libverto which tried [21:53.600 --> 21:59.000] to just solve that one problem by providing for libraries an API to integrate with an [21:59.000 --> 22:07.160] arbitrary event loop provided by the application. I think you need to reduce the number of moving [22:07.160 --> 22:14.120] targets, and maybe the event loop is the one to kick out first and try to put in [22:14.120 --> 22:24.040] a separate consideration of how to solve that and then continue from there. So the question was [22:24.040 --> 22:33.280] basically that it tries to deal with too much stuff at the same time: event loops, figuring out an [22:33.280 --> 22:40.640] API and then also figuring out how to deal with nsswitch. There's an existing library called [22:41.360 --> 22:50.720] libverto that makes it easier to be flexible with respect to event loops. 
That's definitely a [22:50.720 --> 22:56.880] good point. I'll try to look at it, but I specifically decided to only focus on libevent [22:56.880 --> 23:06.560] to just get something, a prototype, up and running and not try to support arbitrary [23:06.560 --> 23:18.280] things like that. More questions? There is some more time. Okay, it seems that we have run out of questions.