[00:00.000 --> 00:25.560] Thank you, thank you for joining, and welcome to my lightning talk. [00:25.560 --> 00:32.200] Today I want to talk about performance optimization for Voice over IP services. [00:35.200 --> 00:37.600] If you want to go out, please do it quietly, thank you very much. [00:39.840 --> 00:44.440] Quickly to the agenda: first, one example of how not to achieve great performance. [00:44.440 --> 00:47.520] This is a real-life customer example; you will probably spot immediately [00:47.520 --> 00:53.160] what the problem is. Then some guidelines on how to approach performance problems, [00:54.120 --> 01:01.480] a few areas where you might want to look, and some general examples of tools that you could use, [01:01.480 --> 01:05.600] that are interesting to use. Of course, in ten minutes it's not possible to go into an [01:05.600 --> 01:10.960] in-depth analysis of these performance topics, but nevertheless I hope it will be useful for you. [01:10.960 --> 01:18.680] My name is Henning. Some time ago I started a company; we provide services for real-time [01:18.680 --> 01:23.280] communication. We work mostly with Kamailio and also do a lot of other stuff, [01:23.280 --> 01:28.640] but as I said, mostly Kamailio. If you're interested in the new stuff that's going on for the upcoming [01:28.640 --> 01:35.720] Kamailio release, please have a look at our website, kamailio.org. I didn't include it in this [01:35.720 --> 01:44.120] talk because there is not much time. 
On to the example of how not to achieve great performance. [01:44.120 --> 01:48.000] This is a real-life customer example; we were called in to debug it during Covid. [01:48.000 --> 01:52.440] Of course, a lot of communication platforms broke down during that time because of the [01:52.440 --> 01:57.840] increased demand. The customer needed to make a routing decision in a SIP proxy, in [01:57.840 --> 02:02.600] Kamailio, and what he did was basically use the exec module. The exec module is generally a bad idea; [02:02.600 --> 02:09.320] you can use it to execute code or scripts on the system. He used it to start a Perl script. [02:09.320 --> 02:14.840] The Perl script then used a database layer in Perl to access a remote database, [02:14.840 --> 02:20.360] the database result was reported back to Kamailio, and Kamailio parsed it with [02:20.360 --> 02:25.440] some JSON operations and processed the message. Of course, this works if you don't have a [02:25.440 --> 02:31.640] large load, but as soon as you get a higher rate of concurrent calls on the system, of course [02:31.640 --> 02:36.840] this breaks down, for obvious reasons: for every call you start a Perl script, and [02:37.240 --> 02:43.160] this is not going to work, this is not going to scale. And if you have latency on the database, [02:43.160 --> 02:47.960] all these Perl script invocations will take a long time, and of course it will completely break down. 
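To illustrate why starting a script per call does not scale, here is a minimal sketch in Python (not Kamailio configuration and not the customer's Perl script). It compares a routing decision made inside a long-running process with the same decision made by starting a fresh interpreter for every call. The function names, the gateway names and the "+49" rule are hypothetical, chosen only for illustration, and the sketch leaves out the remote database latency that made the real case even worse.

```python
import subprocess
import sys
import time


def route_in_process(caller: str) -> str:
    """Hypothetical routing decision made inside the long-running worker."""
    return "gw-a" if caller.startswith("+49") else "gw-b"


def route_via_subprocess(caller: str) -> str:
    """Same decision, but a fresh interpreter is started for every call."""
    script = (
        "import sys; "
        "print('gw-a' if sys.argv[1].startswith('+49') else 'gw-b')"
    )
    result = subprocess.run(
        [sys.executable, "-c", script, caller],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def time_calls(func, caller: str, repeats: int) -> float:
    """Return the total wall-clock seconds needed for `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(caller)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20  # small on purpose; every subprocess call spawns an interpreter
    print(f"in-process: {time_calls(route_in_process, '+4930123456', n):.4f} s")
    print(f"subprocess: {time_calls(route_via_subprocess, '+4930123456', n):.4f} s")
```

On a typical machine the subprocess variant should be slower by several orders of magnitude, and that cost is paid again for every single call.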
[02:51.560 --> 02:56.680] Generally, how do you address performance problems? If you are an experienced operator or an experienced [02:56.680 --> 03:01.720] admin, this is probably nothing new for you. Nevertheless, most performance issues [03:01.720 --> 03:06.920] are of course not as obvious as in this example. You should formulate a goal: okay, I want to achieve [03:06.920 --> 03:11.880] that many concurrent calls, I need to support that many REGISTER messages on the platform, [03:11.880 --> 03:17.720] that many devices, I want to have, I don't know, 50,000 or 100,000 concurrently connected [03:17.720 --> 03:24.200] user agents over TLS, WebRTC, whatever protocol you are using. And in the best case, of course, you [03:24.200 --> 03:31.480] have some statistics (later we will see some presentations about statistics projects) [03:31.480 --> 03:37.480] from production load. Maybe you have incidents where the system broke down, or in the best case, [03:37.480 --> 03:43.000] of course, you have some performance test results. 
Generally speaking, if you have performance issues, [03:43.640 --> 03:49.640] we can cluster them into several performance-related areas, mostly related to machines and virtual machines. [03:49.880 --> 03:56.280] First, you have the CPU. Kamailio in particular is really [03:57.000 --> 04:01.240] performant; normally you don't have performance issues there. Asterisk is another story, [04:01.240 --> 04:08.280] FreeSWITCH as well. One frequent issue you might encounter is that you have [04:08.280 --> 04:13.000] too large an overcommitment on your virtual infrastructure. [04:13.000 --> 04:18.680] Just keep in mind that a virtual core is not a physical core, of course. Sometimes you have [04:18.680 --> 04:22.520] issues with other services running on the system: configuration management, [04:22.520 --> 04:27.240] maybe some VoIP monitoring, whatever else you're using on the system that causes a lot of CPU [04:28.200 --> 04:35.800] congestion. Maybe you should adapt the Kamailio worker configuration; the defaults are usually fine, [04:35.800 --> 04:44.280] but nevertheless sometimes you need to adapt them. Regarding memory: if you install Kamailio [04:44.280 --> 04:49.960] with the default installation, you definitely should increase the memory pool; the defaults are [04:49.960 --> 04:56.120] not really meant for production use. If you have a database, the normal tuning guidelines [04:56.120 --> 05:01.400] apply here, of course: give the database plenty of memory; memory is cheap nowadays. If you have an [05:01.400 --> 05:08.120] HTTP API service, maybe written in Java or whatever language, you should give it [05:08.120 --> 05:15.080] a lot of memory as well, of course, so that it performs correctly. 
In really special cases it [05:15.880 --> 05:21.320] might also be a good idea to look at the Kamailio memory manager. By default it uses a memory manager [05:21.320 --> 05:28.840] which has some debugging support built in; there's another memory manager [05:28.840 --> 05:33.800] without this debugging support. In something like 99% of all infrastructures and scenarios you [05:33.800 --> 05:38.040] never need to change it, but in some cases it might be beneficial to look into that. [05:40.360 --> 05:46.040] Most problems are usually related to IO, to IO performance. For Voice over IP, SIP is [05:46.040 --> 05:51.160] the protocol, and it relies on DNS like most of the protocols out there. If the DNS is slow, then [05:51.160 --> 05:57.560] your server will also be slow. Kamailio uses an internal DNS cache; if you use Asterisk there is no cache, [05:57.560 --> 06:03.080] unfortunately, so you should use dnsmasq or something similar, or keep a local DNS server in your [06:03.160 --> 06:10.200] data center, in your infrastructure. For SIP, for real-time communication, you usually need to write [06:10.200 --> 06:15.000] user registrations. This is something you can of course optimize; you can cache it. [06:15.000 --> 06:18.600] For Asterisk there's something called qualify, if you use the realtime infrastructure. [06:19.720 --> 06:23.880] It makes sense to tune this, or maybe to deactivate it, because it will basically scale with the number [06:23.880 --> 06:31.240] of your users, and the write load will scale as well. 
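As a rough back-of-the-envelope illustration of how this write load scales with the number of users, here is a minimal Python sketch. It assumes, purely for illustration, that every user refreshes their registration exactly once per expiry interval and that every refresh causes one database write. The user counts and expiry values are hypothetical, not figures from the talk.

```python
def registration_writes_per_second(users: int, expiry_seconds: int) -> float:
    """Average writes per second if each user re-registers once per expiry interval."""
    if expiry_seconds <= 0:
        raise ValueError("expiry_seconds must be positive")
    return users / expiry_seconds


if __name__ == "__main__":
    # Hypothetical platform sizes and registration expiry intervals.
    for users in (10_000, 100_000):
        for expiry in (3600, 600, 60):
            rate = registration_writes_per_second(users, expiry)
            print(f"{users:>7} users, expiry {expiry:>4} s -> {rate:8.1f} writes/s")
```

Under these assumptions, shortening the interval or growing the user base raises the write rate proportionally, which is why caching or less frequent refreshes can take pressure off the database.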
With logging, of course, you need to look [06:31.240 --> 06:37.080] at whether you really need to log everything, or whether you can tune it and adapt it to your scenario. It makes [06:37.080 --> 06:41.160] sense to restrict it, not only with Kamailio of course, but with Asterisk or other servers as [06:41.160 --> 06:46.920] well. If you have a lot of read operations, you can usually cache them quite well in Kamailio; [06:46.920 --> 06:53.880] there's the htable module for Kamailio that you can use for caching the data. You can also use something like [06:53.960 --> 07:00.600] read-only replication, Redis, memcached, whatever, to scale that. The same goes for remote [07:00.600 --> 07:07.800] HTTP API requests; this is also something you can cache, of course. For CDR writing, we just saw [07:07.800 --> 07:14.760] a talk about CGRateS, a great project that offers these CDR capabilities. Kamailio can also [07:14.760 --> 07:21.880] write CDRs internally, but of course for highly loaded platforms it might make sense to move that to another [07:21.880 --> 07:28.120] process, to another system, to have some asynchronous process doing the CDRs so that it does not affect the [07:28.120 --> 07:33.080] server operation. And of course, as we just saw in the beginning, you should not fork processes [07:34.120 --> 07:40.840] if you rely on performance. What could you use for performance tests? One thing which is still [07:42.120 --> 07:48.840] used a lot is the old classic SIPp. There's PJSUA, which you can script a lot with Python or [07:48.840 --> 07:53.560] other bindings. There are dedicated performance test frameworks; usually they are homegrown [07:53.560 --> 07:58.760] or closed source, unfortunately, but there is stuff you can pay for, or you can of course build it [07:58.760 --> 08:04.120] yourself. If you have a database or an HTTP API which is actually the bottleneck, you can of course [08:04.120 --> 08:11.240] use custom tools to test the database or to test the HTTP API. 
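The read caching mentioned above, for database lookups or remote HTTP API requests, can be sketched in general terms as a small in-memory cache with expiry. This is a minimal Python sketch of the idea only; it is not the htable module's interface, and the class name, the `slow_lookup` function and the 30-second lifetime are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache lookup results in memory and reload them after `ttl_seconds`."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.backend_calls = 0  # how often the slow backend was really asked

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, no backend round trip
        self.backend_calls += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(number: str) -> str:
    """Hypothetical stand-in for a remote database or HTTP API query."""
    return f"route-for-{number}"


if __name__ == "__main__":
    cache = TTLCache(slow_lookup, ttl_seconds=30.0)
    for _ in range(1000):
        cache.get("+4930123456")
    print(f"1000 lookups, backend calls: {cache.backend_calls}")
```

The trade-off is staleness: a longer lifetime saves more backend round trips, but changes in the backend take longer to become visible.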
Then, for a start, of course there are the [08:11.240 --> 08:19.160] common tools to get insight into the CPU, the IO, the network situation; they can give you a [08:19.160 --> 08:24.040] lot of information on whether there's some pressure on the sockets, for UDP in particular. [08:26.360 --> 08:31.880] There are tools like Homer (we will see a talk about that later as well), VoIPmonitor is another tool, [08:31.880 --> 08:37.400] and of course the classic Icinga, Grafana, whatever statistics you have in house. Kamailio [08:37.400 --> 08:43.880] also offers a benchmark module, and you can also adapt the logging a lot to your requirements. [08:46.440 --> 08:53.320] Okay, that's all from my side, thank you very much. Just a quick pointer: we're doing Kamailio World [08:53.320 --> 08:57.320] in person again this year, and I'm really happy about that. It will happen at the beginning of [08:57.320 --> 09:02.520] June in Berlin. The call for papers is open, so if you're interested in presenting something [09:02.520 --> 09:06.760] interesting there, go ahead; we are looking forward to your contributions there as well. [09:06.760 --> 09:14.120] Thank you very much.