[00:00.000 --> 00:13.400]  How's everybody feeling? Good. So if you haven't figured it out yet, I'm Chris Nova. Some people
[00:13.400 --> 00:21.180]  call me Chris. Some people call me Nova. Just don't call me Shirley. So we're going to get
[00:21.180 --> 00:25.000]  started with a few quick questions. There's a lot of people here. So I just want to get
[00:25.000 --> 00:30.120]  a feel for who's in the audience. So who here knows what mastodon is? Show of your hands.
[00:30.120 --> 00:35.200]  Okay. For folks at home, literally everybody just put their hand up. Who here knows what
[00:35.200 --> 00:43.040]  hackaderm is? Oh, God. Sorry. So pretty much the same number of people. Who here knows
[00:43.040 --> 00:49.160]  how to denial a service? Dadosa service? Okay. And how about just general abuse? How
[00:49.160 --> 00:54.240]  to like just use a service? Okay. So for those of you on the camera, literally the entire
[00:54.240 --> 01:01.480]  stadium of people just put all of their hands up. I can't believe I'm about to do this.
[01:01.480 --> 01:09.680]  You have 45 minutes starting now. You have my full permission to DOS my shit. Take down
[01:09.680 --> 01:16.680]  the service. You can do whatever you want. There are three known things I know of today
[01:16.680 --> 01:20.840]  that should make this pretty easy. I think if you knew exactly what you were doing, you
[01:20.880 --> 01:25.200]  could probably do it in about five minutes. So anyway, that's how we're going to start
[01:25.200 --> 01:32.920]  the talk off today. So the goal of this is to wake my partner up. So she's asleep right
[01:32.920 --> 01:39.800]  now. She's at home in Seattle. And if you're successful in disrupting the service, she
[01:39.800 --> 01:44.520]  will get some discord notifications and we have a team of volunteers. Their phones will
[01:44.520 --> 01:51.840]  go off. My phone will start going off. And hopefully, hopefully my puppy greets her with
[01:51.840 --> 01:59.000]  a smile and wakes her up as there's inevitably a crisis. Okay. So the reason I wanted to
[01:59.000 --> 02:05.040]  start this off is because we have had to do a tremendous amount of work to bring Hack
[02:05.040 --> 02:12.360]  Derm to where it is today. So just to kind of start the slides off with some basic numbers
[02:12.400 --> 02:17.080]  here. I don't know if y'all can see this, but this is just a public glimpse into the
[02:17.080 --> 02:24.280]  service that's online today, just serving mastodon. And there's 44,000 users. It looks
[02:24.280 --> 02:28.440]  like we had 200 people sign up today. I don't know how many of those people were here at
[02:28.440 --> 02:34.640]  Fosnum. I don't really know very much about them at all. We've had 20,000 toots. Sorry,
[02:34.640 --> 02:41.640]  we've had 789,000 toots. And we have 20,000 monthly active users. So there's been 20,000
[02:42.040 --> 02:47.200]  people who signed into the service in the past 30 days alone. And we are currently federating
[02:47.200 --> 02:54.700]  with another 20,000 instances, which in my opinion is yet another attack vector for the
[02:54.700 --> 03:00.480]  internet at large that we should probably spend more time discussing. So if you are
[03:00.480 --> 03:05.920]  successful in flooding the service, hopefully by the end of my talk, we should see some
[03:05.920 --> 03:12.920]  spikes in these two middle graphs here. The HTTP response time is probably the most sensitive
[03:13.120 --> 03:20.120]  part of our entire system today. Cool. So let's get back into it. So about me, I work
[03:24.960 --> 03:29.920]  at GitHub. I'm a principal engineer at GitHub. I'm also an author. I've written some mediocre
[03:29.920 --> 03:36.160]  quality books. And as of four days ago, I'm also the president and a board member of a
[03:36.160 --> 03:42.660]  foundation I'll tell you about here shortly. And if you want to follow me on decentralized
[03:42.660 --> 03:48.360]  social media, there's my links there. Okay. So we're going to start off and we're going
[03:48.360 --> 03:52.600]  to do some basic context studying. And then we're going to go into like a little bit of
[03:52.600 --> 03:59.040]  an incident report of a situation we found ourselves in last November. And then we'll
[03:59.080 --> 04:04.200]  talk a little bit about what this means to me, what this means to the United States economy,
[04:04.200 --> 04:08.800]  the legal situation in the United States, and how we're kind of navigating all of this
[04:08.800 --> 04:15.800]  that we really kind of just stumbled upon earlier last year. So the short story here
[04:16.760 --> 04:22.040]  is my little mastodon server that was used for me in about 100 of my friends, if maybe
[04:22.040 --> 04:28.960]  not even 100, maybe 50 of my friends, had very quickly turned into what I consider medium
[04:29.000 --> 04:34.800]  size scale. And when we reached medium size scale, a lot of the problems aren't necessarily
[04:34.800 --> 04:40.440]  related to the technology. Although, as you're about to find out, operating a Ruby monolith
[04:40.440 --> 04:47.200]  at scale does come with a substantial amount of concerns, which we'll dig more into that
[04:47.200 --> 04:54.200]  in a moment. Okay. So just for folks at home who are watching the video after the fact,
[04:54.440 --> 05:00.440]  I want to give a little bit of context on mastodon in general. And I want to be clear.
[05:00.440 --> 05:05.960]  I am not a mastodon. Well, I guess it depends what you define as a contributor, but I don't
[05:05.960 --> 05:12.960]  work on mastodon that much. I've written a few issues. I've helped talk to some folks
[05:12.960 --> 05:18.960]  who do contribute to the project. But for the most part, this is probably the most detached
[05:19.000 --> 05:25.500]  I am from any of the open source projects I work on. I literally am a consumer of mastodon.
[05:25.500 --> 05:30.320]  The most involved I have gotten with this particular project has been going to GitHub
[05:30.320 --> 05:35.480]  and going to the release tab and downloading the latest versions for me to go and install
[05:35.480 --> 05:42.000]  on my server. So it's kind of nice, not going to lie, to just be on the consumer side of
[05:42.000 --> 05:47.720]  the open source for a change. But mastodon is ultimately social networking that's not
[05:47.760 --> 05:54.680]  for sale. It's built on the activity pub W3C protocol. And it's an alternative to familiar
[05:54.680 --> 06:00.680]  social media sites like Twitter. And it gives you much more ownership and control of your
[06:00.680 --> 06:07.680]  data from both an operator and a user perspective. Okay. So this is probably the number one
[06:07.720 --> 06:12.280]  question I get asked, which is how did we come up with the name Hackaderm? And if you
[06:12.320 --> 06:17.760]  talk to my friends in Italy, it's hashaderm or hashadermio or I've heard a lot of different
[06:17.760 --> 06:22.760]  variations of it. It's my partner came up with the name. It's ultimately a plan words
[06:22.760 --> 06:29.760]  with hacky and packaderm. So hacky is a clumsy, temporal or in elegant solution to a technical
[06:30.200 --> 06:37.200]  problem. And packaderm is a large thick skinned mammal such as an elephant rhinoceros or
[06:37.920 --> 06:43.360]  hippopotamus, obviously mastodon. You see where we're going with this. And so we like
[06:43.360 --> 06:50.360]  to say that hackaderm is a clumsy, temporary or in elegant thick skinned social media server.
[06:51.960 --> 06:56.320]  And depending on how successful some of these people with their laptops are, we're going
[06:56.320 --> 07:03.320]  to see how thick the skin really is. So again, right now we have roughly 45,000 hackadermians
[07:03.880 --> 07:10.880]  is what we refer to the people in the community, which is a lot of people. I wasn't prepared
[07:13.400 --> 07:19.460]  for the sheer number of people and the sheer number of like bizarre things that we would
[07:19.460 --> 07:26.360]  be getting into as we approach the size scale. And we have 20,000 people who are active.
[07:26.360 --> 07:32.320]  And so this is, there's a lot of implications of that specific ratio. But for the most part
[07:32.320 --> 07:39.320]  we see a lot of traffic go through our network every day. And I think at least once a day
[07:39.760 --> 07:45.680]  there's some sort of crisis. So we have all of the major problems of Jurassic Park, of
[07:45.680 --> 07:51.240]  a major theme park, of a normal technical shop, which has been fascinating to kind of
[07:51.240 --> 07:58.240]  watch this whole thing grow. The hackaderm community is pretty interesting. And to be
[07:58.280 --> 08:04.280]  completely honest, I'm still not really sure how it ended up the way it did. But it's mostly
[08:04.280 --> 08:10.280]  composed of technical and open source professionals, such as people here. It's similar to Fostodon,
[08:10.280 --> 08:15.280]  who here has heard of the Fostodon MasterDawn server. That one's also great. Also, I have
[08:15.280 --> 08:20.280]  some colleagues who work on the InfoSec one. That's also another good one. But I see a
[08:20.280 --> 08:25.280]  lot of like SREs, such style people, a lot of senior engineers who work on the InfoSec
[08:25.320 --> 08:32.320]  style people, a lot of senior engineers, directors. We even have some executives. And then we
[08:32.320 --> 08:38.320]  also have like honestly just some very beautiful anonymous hackers who keep everybody in check.
[08:38.320 --> 08:44.320]  And so it's a good blend of people. And we see a lot of interesting things come through
[08:44.320 --> 08:51.320]  our various servers. So our about page reads, here we are trying to build a curated network
[08:51.360 --> 08:57.360]  of respectful professionals in the tech industry around the globe. And the around the globe part
[08:57.360 --> 09:02.360]  is the interesting part, especially when we start looking at the legal implications of this,
[09:02.360 --> 09:08.360]  which again, we weren't necessarily prepared for. And we welcome anyone who follows the rules
[09:08.360 --> 09:15.360]  and needs a safe home or a fresh start. I think this was personally a big one for me. And I
[09:15.360 --> 09:20.360]  think this is also very relevant to a lot of the folks that I know who have joined in the last
[09:20.400 --> 09:26.400]  few months. I do think that there's like some pretty, in my opinion anyway, some pretty toxic
[09:26.400 --> 09:32.400]  mental health situations that folks find themselves in using Twitter. And I think that this is
[09:32.400 --> 09:37.400]  kind of an opportunity to just like rip the Band-Aid off and start fresh and kind of establish
[09:37.400 --> 09:43.400]  some new habits for people and some new self image for people. And so I do see a lot of people
[09:43.400 --> 09:48.400]  kind of reimagining themselves and reinventing themselves when they come to Hackaderm. But
[09:48.440 --> 09:53.440]  yeah, ultimately, it's hackers, professionals, enthusiasts, and we're passionate about life,
[09:53.440 --> 09:59.440]  respect, and digital freedom, and we believe in peace and balance. And I wrote this very casually
[09:59.440 --> 10:04.440]  like on a Twitch stream, and those words are actually pretty important now that we're continuing
[10:04.440 --> 10:09.440]  to dive a little deeper into what they actually mean. I think the thing that kind of comes to mind
[10:09.440 --> 10:15.440]  right now, the word professionals and enthusiasts right next to each other, when you come to
[10:18.440 --> 10:24.440]  a certain scale, having a lot of enthusiasts sit alongside professionals comes with some consequences
[10:24.440 --> 10:29.440]  and balancing these two things is actually pretty challenging from an operation standpoint.
[10:29.440 --> 10:34.440]  But ultimately, we want to be a safe space for the tech industry, for people who want to talk
[10:34.440 --> 10:40.440]  about the economy, open source intelligence, news. We talk a lot about Rust, we talk about Linux.
[10:40.440 --> 10:46.440]  Who here was at my talk on Aura yesterday? Awesome, thank you. So a few folks here. So that's a
[10:46.480 --> 10:52.480]  new project that I'm trying to get more people to talk about. We talk about Kubernetes, Go, et cetera,
[10:52.480 --> 11:01.480]  et cetera. So anyway, we're going to spend a little bit of time talking about this blog post that
[11:01.480 --> 11:09.480]  I wrote called Leaving the Basement. And to set the context a little bit, Hackaderm literally
[11:09.520 --> 11:16.520]  started running in my basement. And this is the story of how we kind of ended up moving out of the
[11:16.520 --> 11:22.520]  basement and dealing with some pretty substantial scale problems. I think it was in the middle of
[11:22.520 --> 11:29.520]  November, we started to, the service started to degrade. And there was a lot of consequences of
[11:29.520 --> 11:37.520]  just shutting the service down. And so people were getting very aggressive on the internet. As it
[11:37.560 --> 11:41.560]  turns out, the internet is full of grown men with opinions. I don't know if any of y'all have noticed
[11:41.560 --> 11:48.560]  this or not. But yeah, sometimes these grown men with opinions have very toxic opinions, and they
[11:48.560 --> 11:53.560]  like to say a lot of things about people's services. And so we tried to do our best to keep a
[11:53.560 --> 11:59.560]  positive attitude and just continue to move forward. So this is the story of what actually
[11:59.560 --> 12:04.560]  happened behind the scenes and how we ended up there. And I think there's some really good
[12:04.600 --> 12:13.600]  takeaways in this from a technical perspective. Okay, so we'll begin our story on November 27th
[12:13.600 --> 12:20.600]  of last year, 2022. And also keep in mind that, you know, this is one month before the holidays. So
[12:20.600 --> 12:26.600]  this is about the most burnt out I ever get every year. So usually around the end of November,
[12:26.600 --> 12:31.600]  I'm honestly, I'm about the most I can say to anyone is like fuck off, like I just really need
[12:31.640 --> 12:36.640]  some space. And I need a break and I want to go relax and I want to sleep in. And this is when
[12:36.640 --> 12:42.640]  our new service decided just to completely go down. And this was a really good growing opportunity
[12:42.640 --> 12:49.640]  for people. So we have some really interesting numbers here. And I tried to do my best to build a
[12:49.640 --> 12:55.640]  graph. And this is like, it looks like a very stereotypical stock graph that's like pointing up
[12:55.680 --> 13:00.680]  into the right. So I feel like I should just, you know, do like a good, like, hi, guys, we're here
[13:00.680 --> 13:04.680]  to talk about business. And look, our business is going up into the right. And like business numbers
[13:04.680 --> 13:09.680]  are important because growth and strategy and impact and business. But honestly, this is just the
[13:09.680 --> 13:14.680]  amount of people who were leaving Twitter. And really, I think they were just kind of looking for a
[13:14.680 --> 13:23.680]  new home. And we just happened to be one that that met their needs for the time being. So up until
[13:23.720 --> 13:31.720]  November 1st, we had less than 700 people. The prior six months, the service was online. That's how
[13:31.720 --> 13:38.720]  we gained those 700 people. So it was roughly 100 people a month for the first six months. And then
[13:38.720 --> 13:48.720]  this happened. And this was very unexpected for both myself and everybody in my immediate circle. So
[13:48.760 --> 13:54.760]  one of the things I talk about as a professional SRE. So let me back up. When I'm not keeping the
[13:54.760 --> 14:00.760]  masted on Ruby monoliths online, my other job is keeping the GitHub Ruby monoliths online. So
[14:00.760 --> 14:06.760]  some of you use GitHub. Some of you use Hackaderm. I work on both of them. And I have two different
[14:06.760 --> 14:12.760]  UB keys here in my backpack, one for each service. And so anyway, one of the things I often say is
[14:12.800 --> 14:17.800]  when I enter a conversation with someone, this is the most important thing. And I honestly want to
[14:17.800 --> 14:22.800]  get this as like my next tattoo because I say this at least once a week, which is what is the current
[14:22.800 --> 14:27.800]  state of the systems? And if you can't answer this question very confidently at any given moment,
[14:27.800 --> 14:32.800]  especially in a crisis, we should be having other conversations at that point because this is the
[14:32.800 --> 14:38.800]  starting point of every conversation in my opinion. So we'll start our service off our service
[14:38.840 --> 14:45.840]  discussion off here. So we had a rack of hardware in my basement. And these are the specs that we had
[14:49.840 --> 14:55.840]  running in the basement. So it was a hobby rack that I've collected over the past 10 years or so,
[14:55.840 --> 15:02.840]  you know, pieces of hardware that have been donated to me or that I found for a cheap price that
[15:02.880 --> 15:08.880]  were used. In fact, the star of the show, Alice, over here on the far left, I've literally carried her
[15:08.880 --> 15:15.880]  across Market Street in San Francisco and dropped her in a pile of like pee on the side of the road.
[15:15.880 --> 15:20.880]  The hardware has been through a lot to say the least, but it's what we had and this is what I was
[15:20.880 --> 15:26.880]  using for kind of a home lab at the time. I think the important thing here to notice though is that
[15:26.920 --> 15:32.920]  these are not trivial computers. These are proper rack mounted servers with proper specs and most,
[15:32.920 --> 15:38.920]  for the most part, these worked fine. It got the name the water tower because we had Alice, who was
[15:38.920 --> 15:44.920]  our main compute node, and then we had three identical Dell PowerEdge R620s named Yacko, Wacko,
[15:44.920 --> 15:50.920]  and Dot, respectively, and all three of them seemed to just be up to shenanigans at any given point
[15:50.960 --> 15:57.960]  in time, from memory failure to broken boot loaders to just bizarre networking behavior and having
[15:57.960 --> 16:02.960]  to go flip out Nix to try to get a better network connection. There was just a lot of obscure things
[16:02.960 --> 16:11.960]  that was happening at the hardware level. So, Meet Alice. She's a very infamous server, especially
[16:11.960 --> 16:17.960]  if you've read any of our posts or if you've ever watched my Twitch stream, but there she is
[16:18.000 --> 16:24.000]  there, and that's in my basement, and that's the Dell R630, and you can see she's got eight SSDs
[16:24.000 --> 16:32.000]  in the front of the carriage there, and she was sitting behind a firewall of my own design,
[16:32.000 --> 16:39.000]  and that was our main endpoint for pretty much everything I ran in my home lab, and that just
[16:39.000 --> 16:45.000]  so happened to be the main endpoint for our mastodon service up until the month of November.
[16:45.040 --> 16:55.040]  So, yes, it was a home lab, and I think the whole point of this is that we used it for a lot of
[16:55.040 --> 17:02.040]  things, and so the mastodon service was running on the home lab, and I do a lot of really bizarre
[17:02.040 --> 17:08.040]  things on Twitch, so if you follow me on Twitch, you probably have seen me work on kernel modules
[17:08.080 --> 17:14.080]  and experimental EBPF probes, and I've experimented with adding some features to the ZFS file
[17:14.080 --> 17:21.080]  system and compiling my own version of ZFS from scratch, and I've been doing a lot, and I also
[17:21.080 --> 17:28.080]  installed mastodon on that same server, and that's the key part of this. So, here's a list of
[17:28.080 --> 17:34.080]  things from my home lab that have not blown up. There has not been a billionaire who decided to
[17:34.120 --> 17:40.120]  buy a company that decided to insult the broader technical community and encourage them to move
[17:40.120 --> 17:45.120]  off to a decentralized service, and so you've probably never heard of any of these, and that all
[17:45.120 --> 17:51.120]  of these also run in that same home lab, so I think it's important to realize that this was a
[17:51.120 --> 17:58.120]  very unexpected event, and that these servers were in a pretty high state of entropy, and we
[17:58.160 --> 18:05.160]  didn't really have a good idea of the state of the systems. This was a home lab. So, as it turns out,
[18:05.160 --> 18:12.160]  50,000 people trust me and really dislike a certain billionaire, and this was the one thing I
[18:12.160 --> 18:19.160]  kept hearing. We kept having large to medium size and smaller size name people with substantial
[18:19.160 --> 18:24.160]  Twitter following, like shoot us an e-mail and be like, yo, Nova, I'm done with Twitter, we're
[18:24.200 --> 18:28.200]  going to come to your mastodon server, and I'm like, okay, sounds cool, and then they have, you
[18:28.200 --> 18:34.200]  know, 350,000 Twitter followers, and it's not about the followers, but from an operator's
[18:34.200 --> 18:39.200]  perspective, I'm like, holy shit, that's a lot of traffic. That's a lot of people that we're
[18:39.200 --> 18:43.200]  going to have to open up web sockets against, and there's a lot to deal with there if you're
[18:43.200 --> 18:47.200]  going to be sending all of these messages out to all of these people who are going to be following
[18:47.200 --> 18:51.200]  you, and they all continue to say the one thing, which is like, well, we trust you to not screw
[18:51.240 --> 18:55.240]  this up, and like, you know, you can probably do better than he can, so we're just going to move
[18:55.240 --> 19:03.240]  over to your server anyway. Okay, so what had ended up happening is, it's a long story, and how we
[19:03.240 --> 19:08.240]  ended up finding it, I think ultimately took about three weeks, but we don't say root cause
[19:08.240 --> 19:13.240]  anymore, we say core cause, and the core cause of the incident is ultimately we had a bad disk
[19:13.240 --> 19:19.240]  on Alice. I don't know why the disk was bad, this really bothers me, so I'm just going to chop it
[19:19.280 --> 19:24.280]  up to like, it was just like a bad one in the batch, but we were able to actually isolate it
[19:24.280 --> 19:31.280]  after the fact, and determine that like a basic reader write to this SSD was in fact the problem.
[19:31.280 --> 19:38.280]  I also think an interesting takeaway here is that these were not consumer SSDs, these were
[19:38.280 --> 19:45.280]  actual proper enterprise SSDs, one of which either just decided to get slow IO, or I don't
[19:45.320 --> 19:51.320]  know what happened to it, but even in an isolation zone writing directly to an EXT4 system, we were
[19:51.320 --> 19:56.320]  still able to prove that this disk was substantially slower than another one of the same make and
[19:56.320 --> 20:03.320]  model. So it wasn't always bad, and it started to go bad, and this ultimately led to a cascading
[20:03.320 --> 20:10.320]  failure across our CDN and our geographic edge nodes, and so the interesting thing, and this
[20:10.360 --> 20:16.360]  is just one of those things, this is the aforementioned bad disk, and it also for some reason
[20:16.360 --> 20:23.360]  has a broken chassis in the front, so part of me kind of has to wonder, did the movers drop
[20:23.360 --> 20:28.360]  the server, or did something happen? I'm not really totally sure, but these are the woes of
[20:28.360 --> 20:36.360]  operating your own hardware in your basement. So here's a model of the cascading failure. Who
[20:36.400 --> 20:42.400]  here has dealt with cascading failures in production before? Okay, so 15 or 20, 30 hands
[20:42.400 --> 20:49.400]  or so. These are fascinating, how you get into these situations, and usually when you're
[20:50.640 --> 20:55.640]  dealing with one of these cascading failures, you're not really starting at the database,
[20:55.640 --> 21:00.640]  or at least you glance at the database and you think maybe something's wrong, and you
[21:00.760 --> 21:07.760]  usually blame DNS, but in our case, we were working back from our CDN. So imagine you
[21:08.240 --> 21:12.680]  are operating a mastodon server in your basement, and 50,000 people on the internet decide to
[21:12.680 --> 21:16.120]  join, and all of a sudden you can't even join a zoom call the next morning, because your
[21:16.120 --> 21:21.560]  internet pipeline is so throttled from your ISP, who's like, bro, why are you bringing
[21:21.560 --> 21:26.600]  this much traffic to your house? I don't understand what's going on, this is very bizarre. So what
[21:26.680 --> 21:33.180]  we did is the very first thing we did to offset the problem was we set up these CDN nodes
[21:33.180 --> 21:38.720]  around the world, and these basically served as reverse engine X proxies that had media
[21:38.720 --> 21:43.720]  cash on them, and we would then route the traffic through a dedicated connection from
[21:43.720 --> 21:50.720]  one of these CDN nodes back to YACO in my rack, and then YACO would then proxy the data
[21:51.360 --> 21:57.360]  over to Alice, and Alice was our main, our primary database running in the rack. So when
[21:57.360 --> 22:02.360]  things started to fail, it was like very intermittent failures in Frankfurt, and then we would get
[22:02.360 --> 22:07.280]  like some very intermittent failures in Fremont, and it all looked like engine X was the problem,
[22:07.280 --> 22:12.160]  we were getting timeouts and slow requests, and this whole incident is what later inspired
[22:12.160 --> 22:15.880]  us to build that dashboard that you see today, and the reason I was like we should be looking
[22:15.920 --> 22:22.920]  at those HTTP request times when I very politely asked you all to please DDoS my server, and
[22:23.280 --> 22:28.840]  so that transferred all the way back to Alice, and we learned entirely too much about Mastodon
[22:28.840 --> 22:34.520]  at scale, retracing everything back through the rack, and we had to go and trace Redis
[22:34.520 --> 22:41.520]  logs, and Sidekick queues, and Mastodon Ruby servers with the Puma server, and ultimately
[22:41.760 --> 22:47.960]  we found out that it was simply just Postgres unable to read and write from the database
[22:47.960 --> 22:54.960]  as fast as we would like. So these are what the graphs looked like the day of the outage,
[22:55.800 --> 23:00.160]  so we grabbed some screenshots, and I'm really glad we did because these make for some interesting
[23:00.160 --> 23:06.200]  takeaways here. On the left side you can see our HTTP response time, and so these are our
[23:06.280 --> 23:13.280]  get 200s, so in some cases the response time was actually, they were returning a 200, but
[23:15.040 --> 23:20.640]  we were having like 40 second responses. Was anybody here on Hackaderm when it was like
[23:20.640 --> 23:26.200]  in this weird like hangy stage where you kind of could upload media, but you kind of couldn't,
[23:26.200 --> 23:30.960]  and you're like what the heck is NovaPan's doing, she doesn't know how to operate a service?
[23:30.960 --> 23:35.880]  So this is what we were working on, we were working backwards from these graphs, and it
[23:35.880 --> 23:40.520]  was interesting to see the behavior of Mastodon under these conditions because you very quickly
[23:40.520 --> 23:44.840]  realized that different parts of the user interface were coupled with different parts
[23:44.840 --> 23:49.680]  of the back end, and so, and they all assumed that the entire user interface would work.
[23:49.680 --> 23:53.640]  So if the database started to go slow, maybe you could upload the image, but we couldn't
[23:53.640 --> 23:58.920]  actually write the image key to MySQL, and the UI would just kind of just exist in this
[23:58.920 --> 24:05.080]  in-between stage for like five minutes at a time. It was very interesting behavior.
[24:05.080 --> 24:09.920]  But ultimately, we isolated out the IO on disk, and we were able to determine it was
[24:09.920 --> 24:16.320]  old SDG and old SDH down here in the bottom right. You can see these numbers are closer
[24:16.320 --> 24:23.320]  to 100% for IO on our disks, and this was what was causing those cascading failures.
[24:25.360 --> 24:31.400]  So ultimately, this was a very exciting time. People were joining Mastodon around the clock,
[24:31.400 --> 24:37.240]  and our little group of people that hung out on Discord very quickly turned into a more
[24:37.240 --> 24:43.040]  serious group of people who hung out on Discord, and it was really fascinating to watch friends
[24:43.040 --> 24:47.880]  of the Twitch stream and my partner Quintessence, and there's even people here in the room.
[24:47.880 --> 24:55.360]  Malte and DMA, are you right here in the front? We are now best friends, and we wouldn't necessarily
[24:55.360 --> 25:00.480]  be friends if it wouldn't have been for this whole incident in the first place.
[25:00.520 --> 25:05.320]  So we were definitely working around the clock. I think Malte and DMA would kind of hand the
[25:05.320 --> 25:09.520]  service off to us when we woke up in the morning, and we would work until they woke up the
[25:09.520 --> 25:13.960]  following morning, and it was just this constant game of providing quick summaries of our work
[25:13.960 --> 25:17.480]  and then just like crashing and going to sleep for a few hours and trying to hold down a
[25:17.480 --> 25:23.400]  day job while we dealt with the service. And this is for the most part what it felt like
[25:23.400 --> 25:27.640]  behind the scenes. We had a dedicated channel where we were trying hard to work through
[25:27.680 --> 25:33.840]  things, and I think this is Malte just sent to the image. This is the moment where we finally
[25:33.840 --> 25:39.280]  realized what was going on, and we were starting to isolate the problems on the disks, and I
[25:39.280 --> 25:43.040]  think Malte was just like, okay, we finally found the problem. It's exactly what we thought it was,
[25:43.040 --> 25:48.720]  and everything is fine. This is going to be fine. And meanwhile, we have, you know,
[25:48.720 --> 25:53.600]  main names and technology joining the service, and things are kind of burning down all around us.
[25:54.600 --> 26:01.200]  From the human perspective, I wanted to share two interesting failure modes that we got into as
[26:01.200 --> 26:05.960]  people that I think are just an interesting takeaway for anybody who operates a production
[26:05.960 --> 26:13.960]  service. So the first failure mode was in a state of panic, I tried to just throw more computers
[26:13.960 --> 26:19.280]  at the problem, and so my response was like, we're going to go put more computers in the rack,
[26:19.520 --> 26:24.600]  and I turned on dot for the first time, and gave dot a public IP address, and I think the other
[26:24.600 --> 26:30.680]  big takeaway here was we got very good at doing the wrong things, and I think this is a very,
[26:30.680 --> 26:38.120]  very familiar trap for a lot of the organizations that I work with every day, is there will be
[26:38.120 --> 26:44.200]  some crisis, and they will respond to the crisis by doing something. In our case, it was creating a
[26:44.200 --> 26:49.000]  spreadsheet, and the spreadsheet helped us do some quick math, and that math helped us inform
[26:49.240 --> 26:54.320]  how we needed to provision our different system D services, and then when we changed the system D
[26:54.320 --> 26:59.800]  service, the rule was you needed to go update the spreadsheet, and this was a reaction to a crisis
[26:59.800 --> 27:06.720]  that allowed us to move forward, and then it was very difficult to get out of this situation, so I
[27:06.720 --> 27:11.520]  do think that there's a very interesting takeaway of you get in the habit of doing the wrong thing
[27:11.520 --> 27:17.920]  or doing a bad behavior during a crisis, and that can actually persist in the last longer than the
[27:17.960 --> 27:23.080]  actual incident itself, so we had all the major problems of a normal SRE team, and this was a
[27:23.080 --> 27:31.760]  volunteer open source project to begin with. Okay, so I have a friend in Boulder, his name's Gabe,
[27:31.760 --> 27:37.400]  him and I have known each other for a long time, he's grown very quickly in his career, he's now
[27:37.400 --> 27:44.840]  the Chief Product Officer of Digital Ocean, and Gabe texted me one day and says, hey Nova, so I
[27:44.880 --> 27:52.200]  bought this farm, and I'm trying to upload rooster pictures on your website, and I can't upload my
[27:52.200 --> 27:57.280]  rooster pictures on Hackaderm today. What's going on, and is there anything Digital Ocean can do
[27:57.280 --> 28:03.760]  to help? And so we were in a situation where we were trying to come up with a plan, we had just
[28:03.760 --> 28:10.120]  identified that the disks were the bottleneck and the single cause of our infrastructure problems,
[28:10.720 --> 28:17.560]  and I think this was the first time I kind of realized like, oh, we have 50,000 really smart,
[28:17.560 --> 28:23.960]  well-connected people who can more than obviously help us with our problems, and really the problem
[28:23.960 --> 28:29.720]  is how do we reach out to them, give them access to production, form a plan, and execute on that
[28:29.720 --> 28:34.640]  plan, and it became very obvious that our main problem wasn't necessarily fixing the disks in
[28:34.640 --> 28:39.960]  the basement, it was managing people, and it was organizing people to work on the service and making
[28:39.960 --> 28:45.680]  sure that we were in a good position to accept help from a corporation such as Digital Ocean in
[28:45.680 --> 28:52.920]  the first place. So Malte here, he's going to get embarrassed, but can we just give him a round of
[28:52.920 --> 29:05.120]  applause for this plan? He's smiling, but honestly, like if there was a Malte saved the day kind of
[29:05.120 --> 29:10.840]  moment, like straight up Malte saved the day. He came up with this very interesting engine X pattern
[29:10.840 --> 29:18.480]  that allowed us to effectively move our data off of the bad disks in the basement to the Digital
[29:18.480 --> 29:23.480]  Ocean service without taking the service offline, which you're like, okay, that's pretty cool,
[29:23.480 --> 29:28.880]  you can keep the service up, and you can start to fix the problem at the same time. Additionally,
[29:28.880 --> 29:34.680]  what this did was this actually gave us a means of getting the data out, and everybody who used
[29:34.720 --> 29:41.640]  the service contributed to the data migration. And so what we did is we set up this, who's here
[29:41.640 --> 29:48.080]  familiar with the try files directive in engine X config, a few people, you should, if you get time,
[29:48.080 --> 29:55.120]  go read about try files. This is a fascinating thing that engine X does, and what we were able to do
[29:55.120 --> 30:02.520]  was point media.hackaderm.io on Alice. We were able to point all of the CDN nodes towards Alice,
[30:03.160 --> 30:09.520]  and Alice would first try to resource the file from S3 running in Digital Ocean. If it could find
[30:09.520 --> 30:15.840]  it, it would then return that directly as basically a reverse proxy from S3 to the client, and
[30:15.840 --> 30:22.040]  otherwise it would resource it from the disks locally in the rack. So every time somebody read,
[30:22.040 --> 30:27.600]  whether it was an image or a post or something coming from the rack, it would then persist into
[30:27.640 --> 30:33.960]  S3 on the back end, and we would never have to serve that image ever again from Alice. So this was
[30:33.960 --> 30:39.800]  a clever solution, and it gave us a means to slowly start transferring the data, and every minute
[30:39.800 --> 30:43.720]  we transferred the data was another minute that it was likely going to be served from a cloud
[30:43.720 --> 30:58.840]  provider and not from my really crappy hardware running in my basement. So the disks were so slow,
[30:58.840 --> 31:03.320]  I mean, in my mind, these disks could be personified. They were like, they were beaten,
[31:03.320 --> 31:10.280]  they were tired, they have been through hell and back again, and it took eight days for us to arclone
[31:10.600 --> 31:16.280]  all of the data, which was about two terabytes of data, of Rooster videos and cat pictures and
[31:16.280 --> 31:23.040]  catter-day hashtags and all kinds of mastodon things over to Digital Ocean S3, and this was all
[31:23.040 --> 31:31.920]  courtesy of Gabe, who was like, bro, I just want to upload my Rooster pictures. So as we moved the
[31:31.920 --> 31:38.280]  files out of the basement, it became obvious that running this service in my basement was no longer
[31:38.280 --> 31:44.040]  going to work for us and that enough people had joined that we had reached critical mass. So our
[31:44.040 --> 31:49.560]  next decision was, okay, where do we actually want to move the compute to? And I think we all kind
[31:49.560 --> 31:54.480]  of have been like a little bit traumatized from like the vendor lock-in and the tech industry as
[31:54.480 --> 32:00.080]  it exists today. And so I think looking at Hackaderm, there was a lot of people who were very critical,
[32:00.080 --> 32:06.360]  myself included, of a dependency on various corporations. So we definitely didn't want to
[32:06.360 --> 32:10.960]  just go throw money at Amazon, right? Amazon has enough money. We're good taking our little
[32:10.960 --> 32:14.400]  community and putting it there. And we didn't want to go do the same thing at another cloud
[32:14.400 --> 32:19.600]  provider. So ultimately, we made the decision to go to Hetzner in Germany. Whoo, Hetzner.
[32:19.600 --> 32:30.920]  Another good caveat here is that from a legal perspective, Germany has some of the most restrictive
[32:30.920 --> 32:35.640]  privacy laws. And so this is going to be about the most isolated zone we're going to get in today.
[32:36.040 --> 32:41.240]  And a quick glance and a quick consultation with a lawyer told us that Germany was going to be the
[32:41.240 --> 32:47.120]  safest place to start the service from. So again, our biggest concerns had almost nothing to do
[32:47.120 --> 32:52.360]  with the crappy disks in my basement and almost everything to do with like international privacy
[32:52.360 --> 32:58.120]  law and user data. And we've very quickly found ourselves having discussions about the complications
[32:58.120 --> 33:06.840]  and implications of operating a global service. So here is our most recent diagram of how we
[33:06.840 --> 33:11.840]  kind of set things up. You can see that we had to balance things in my basement with things in
[33:11.840 --> 33:18.280]  Germany. And you can see that we have a set of CDN or point of presence nodes around the world. So
[33:18.280 --> 33:23.040]  it was very exciting for me when I flew across the ocean from Seattle to come here to Brussels,
[33:23.720 --> 33:28.520]  because for the first time our service, since the outage, was actually fast and responsive again
[33:28.520 --> 33:34.480]  because I am now being proxied through another server now that I am here on a different continent.
[33:36.240 --> 33:41.040]  So now what? Okay, so we've reached the point of stability. Our servers are stable. People are
[33:41.040 --> 33:46.440]  able to send their Rooster videos again. And we're still very much not out of the weeds. We still
[33:46.480 --> 33:53.760]  have a lot of concerns we need to deal with. So in general, the top Ruby monolith problems that
[33:53.760 --> 34:00.480]  we have solved to date is sidekick scaling, which if you've ever, who's here has operated sidekick
[34:00.480 --> 34:06.240]  before? It's a Ruby thing, show of hands. It's like a Ruby daemon that you have to specify the
[34:06.240 --> 34:10.840]  amount of threads and concurrent workers at runtime. And mastodon is built on this. So like
[34:10.840 --> 34:15.120]  every time we federate with a server, there's a whole queue that runs in the background that does
[34:15.160 --> 34:20.160]  the federation for us. We've also had to tackle network scaling, and we have a global CDN with
[34:20.160 --> 34:25.520]  reverse nginx proxies that has a cache on the edge so that the more people who look at an image,
[34:25.520 --> 34:30.240]  the more it's served from the cache. And all of those have legal implications. And it's just
[34:30.240 --> 34:34.880]  been a lot of work that we've had to get into to just operate a basic service so that we can all
[34:34.880 --> 34:39.600]  sit here in this room and I can make the joke, please go DDoS my web server on the back end.
[34:40.320 --> 34:47.200]  So here's a graph of our egress data. So the top of the graph here is roughly one terabyte of
[34:47.200 --> 34:54.720]  data per day. So you can see that looks like over on January 26th, we peaked over a terabyte of
[34:54.720 --> 35:01.200]  egress data. So that's honestly from an enterprise and scale perspective, this is no trivial amount
[35:01.200 --> 35:06.080]  of data, right? We're moving a lot of data across the wire and the fact that Hetzner can support
[35:06.080 --> 35:14.240]  us is very nice and seems to be working well for our needs today. Another interesting thing
[35:14.240 --> 35:19.200]  about just federation in general that we've had to kind of learn as a community is there's actually
[35:19.200 --> 35:24.800]  a lot of moderation consequences. And there's a pretty big user data and user privacy risk
[35:24.800 --> 35:30.560]  with operating mastodon. And so I put this sort of diagram together to just illustrate some of
[35:30.560 --> 35:35.280]  the consequences that we've had to deal with. In this case, we have three instances, one friendly,
[35:35.280 --> 35:40.400]  one neutral, and one evil. And even if the friendly instance decided to block the evil
[35:40.400 --> 35:46.880]  instance for whatever reason they deemed to be a cause for that blocking, it's still able for
[35:46.880 --> 35:52.400]  content to get out and to end up federating with another instance. I think what's important about
[35:52.400 --> 35:58.720]  this is this means that we can end up with content that is potentially illegal in the United States
[35:59.520 --> 36:05.600]  or illegal to have without like an 18 and up warning that puts myself, my family, and everybody
[36:05.600 --> 36:11.040]  who works on Hackaderm at risk. And so we've been trying hard to figure out how do we actually
[36:11.040 --> 36:16.160]  manage content and actually get to a point where we can manage this in an effective way. And let
[36:16.160 --> 36:22.880]  me just say I cannot thank the content warning feature on mastodon enough because that actually
[36:22.880 --> 36:26.160]  gives us a lot of insight into the types of things that could potentially be harmful.
[36:26.880 --> 36:33.680]  So ultimately, we had a lot of top non-Ruby monolith problems. So obviously, there was illegal
[36:33.680 --> 36:40.240]  concern. We have a team of moderators working around the clock who just deal with trolls and
[36:40.240 --> 36:45.520]  people who are causing problems and bad actors, and they're having to make judgment calls. And we
[36:45.520 --> 36:50.480]  have to establish rules, and these rules need to be enforced, and we have to respond to people,
[36:50.480 --> 36:55.600]  and people have really good reasons. There's videos out there that are very disruptive,
[36:55.600 --> 36:59.600]  and we have to go respond to them. And it takes a lot of work just to balance that on the back end.
[37:00.240 --> 37:04.480]  And the whole thing is ran by volunteers. And ultimately, where we are right now is we're
[37:04.480 --> 37:11.760]  spending roughly 1000 euro a month in hosting costs alone between the digital ocean bill,
[37:11.760 --> 37:16.800]  the Hetzner bill. We have an email API. So every time you go and you sign up for the service,
[37:16.800 --> 37:22.560]  you have to get an email so we can validate who you are. And all of this is coming from
[37:22.640 --> 37:29.600]  donations as they exist today. Okay. So if you want to learn more about Hackaderm,
[37:29.600 --> 37:34.080]  the community, and how we run things, we have a dedicated community resource. If you want to go
[37:34.080 --> 37:38.400]  grab and check it out, that's where we do things like announce our rules and our policies, and we
[37:38.400 --> 37:46.000]  document how we make moderation decisions in general. So the consequence of all of this is we've
[37:46.000 --> 37:52.400]  decided to found a new foundation called the Nivenly Foundation, which that's very exciting.
[37:57.120 --> 38:02.160]  So the name is just it's just the name of my blog that we turned into a 501c3.
[38:02.800 --> 38:07.360]  And I kind of like most things in my life, I kind of want this foundation to be relatively boring,
[38:07.360 --> 38:13.440]  but this will be the legal entity that will be used to protect Hackaderm and to hopefully
[38:13.520 --> 38:19.040]  fund the process moving forward. So right now the Nivenly Foundation has two projects, one of
[38:19.040 --> 38:23.680]  which I talked about yesterday called Aura, which is a distributed runtime written in Rust,
[38:23.680 --> 38:30.240]  and we also have Hackaderm. This is exciting because we this feels like the 90s. We have an
[38:30.240 --> 38:35.440]  open source service. This isn't an open source project that you can go download. We like legit
[38:35.440 --> 38:40.640]  have an open source service with graphs and people with pagers that we have to go and operate.
[38:41.520 --> 38:46.560]  And so that's an exciting thing that the Nivenly Foundation gets to do. So I want to introduce
[38:46.560 --> 38:51.760]  my wonderful partner who's not here, who is the executive director of the Nivenly Foundation,
[38:51.760 --> 38:58.480]  and also the person that we hopefully didn't just wake up by d-dossing the server. Anyway,
[38:58.480 --> 39:03.520]  she does the majority of the work and she couldn't be here today, but can we just give her a round
[39:03.600 --> 39:15.600]  of applause? Because she is actually the one who gets everything done. So she manages the
[39:15.600 --> 39:20.240]  infrastructure team right now. She's managing the moderator team right now. She even created these
[39:20.240 --> 39:24.640]  teams in the first place because people were freaking out and didn't know what to do. And so
[39:24.640 --> 39:29.520]  she wakes up every morning and deals with everything that Hackaderm throws at her, and I honestly
[39:30.160 --> 39:35.120]  thank her enough for the hard work that she's done. So one of the problems we've had to solve is a
[39:35.120 --> 39:39.760]  governance model for this whole thing. So we now have an open source service. There's legal risks
[39:39.760 --> 39:46.240]  and how are we going to make decisions as a nonprofit. And so we started to look at some of
[39:46.240 --> 39:53.760]  the consequences of modern day social media and some of the consequences of how corporations are
[39:53.760 --> 39:59.040]  navigating different open source spaces. And some of the things I noticed was for the most part,
[39:59.280 --> 40:05.360]  on Twitter especially, communities are very isolated from decisions. Users are detached from
[40:05.360 --> 40:11.600]  the technology and how things are done. And people are usually unable to impact change. So I had
[40:11.600 --> 40:16.800]  gotten into some trouble with Twitter. They banned my account. I wasn't able to talk to anyone. I had
[40:16.800 --> 40:21.680]  no avenue in which I could go and actually communicate with this corporation. And that became
[40:21.680 --> 40:25.360]  very problematic for me because I kind of used Twitter for a lot of things professionally.
[40:25.760 --> 40:33.120]  So what I started to realize was actually corporations usually have more influence and
[40:33.120 --> 40:39.040]  a better standing in the fabric of the economy than just a regular person does. And so as soon
[40:39.040 --> 40:45.520]  as I was able to interface with a corporation, I realized that I was no longer isolated from
[40:45.520 --> 40:51.040]  decisions. And I found that corporations often are not detached from the technology and corporations
[40:51.040 --> 40:55.520]  are in fact able to impact change. And I became obsessed with this idea. And I wrote a whole
[40:55.520 --> 41:00.320]  book about it. And I could, everywhere I looked, I saw this idea that ultimately corporations seem
[41:00.320 --> 41:07.600]  to have more rights than people. And that was very difficult for me to reconcile. I also think
[41:07.600 --> 41:15.200]  that this general observation explains why we see a lot of this on the Fediverse today. I think
[41:15.200 --> 41:19.680]  that there is this culture of cyberbullying and assuming that the people operating servers are
[41:19.680 --> 41:25.040]  inherently evil. And I see a lot of criticism instead of a lot of contribution. And somebody
[41:25.040 --> 41:30.480]  who comes from open source and I've worked on Linux and FreeBSD and Kubernetes, the Go programming
[41:30.480 --> 41:36.400]  language, the Rust programming language, it's very difficult for me not to intuitively walk up
[41:36.400 --> 41:42.160]  to a project and want to contribute. And so I guess this is just my way of saying that Mastodon
[41:42.160 --> 41:47.520]  gives us an opportunity and the Fediverse gives us an opportunity to no longer isolate people
[41:47.600 --> 41:51.840]  from the folks who are operating their services they use every day. And that's very exciting for
[41:51.840 --> 41:57.920]  me. So in our governing model, we want to figure out a way to balance communities and corporations.
[41:57.920 --> 42:02.800]  And this is the hybrid model that I'm hoping will actually be able to create a sustainable
[42:02.800 --> 42:08.480]  governing model for what we're doing. So right now, while we think corporate sponsorships are
[42:08.480 --> 42:14.080]  important, we're actually going to have two forms of non-corporate sponsorship, which are project
[42:14.080 --> 42:18.960]  members that you can achieve that status to simply by rolling up your sleeves and either
[42:18.960 --> 42:24.240]  contributing a project or becoming a contributor to one of our existing projects, or a general
[42:24.240 --> 42:29.600]  member, which is a small opt-in monthly fee that we have a few hundred people paying for
[42:29.600 --> 42:34.880]  right now. And the beauty of this is all general members are going to have a vote in how we do
[42:34.880 --> 42:42.000]  things. So if Hackaderm, the Mastodon server, wanted to, let's say, let a tech company have an
[42:42.000 --> 42:47.520]  account and that became controversial, anybody who makes a monthly donation to the service now
[42:47.520 --> 42:51.040]  is going to be able to have a vote in how we do things. And we're actually going to introduce
[42:51.040 --> 42:57.120]  a concept of open-source democracy. And we're going to be leveraging open W3C protocols
[42:57.120 --> 43:02.000]  to make this happen. And we still have some math to figure out exactly how much this is going to
[43:02.000 --> 43:07.280]  cost. However, this model is all built around the idea of a cooperation, which you see a lot of
[43:07.280 --> 43:12.240]  successful global companies do this and balance the different laws and trade-offs of different
[43:12.240 --> 43:16.960]  economies around the world. So my hope is that this will be slightly more sustainable and break
[43:16.960 --> 43:22.240]  down the sort of barrier between corporations and people because people now have a vote in
[43:22.240 --> 43:28.560]  influence and authority in how we do things. So we're still in very early stages of this. If you
[43:28.560 --> 43:34.320]  want to talk more, I'll be here at Fosdham. If you want to talk about Mastodon. And very specifically,
[43:34.400 --> 43:40.240]  if anybody here has any opinions on open-source democracy or how to build an open-source democratic
[43:40.240 --> 43:45.600]  model such that users can vote, I would love to talk to you. I want to learn as much as I can,
[43:45.600 --> 43:49.760]  and I want to help get Nivenly to a point where we actually have a sustainable model,
[43:49.760 --> 43:54.080]  and maybe we can learn some things from the various policy and legal efforts going on here
[43:54.720 --> 44:00.960]  in Belgium and in the EU. So now what? Now, really, it's just keeping Hackaderm online,
[44:00.960 --> 44:05.920]  which we're about to see if it is. Hopefully it is because I really feel like y'all would have
[44:05.920 --> 44:10.640]  been able to do a lot of damage if I would have been giving this presentation last November.
[44:10.640 --> 44:15.120]  And we just want to work towards a democratic model so that people who use the social media
[44:15.120 --> 44:20.640]  service have a vote and have influence in how that social media service is running so that it becomes
[44:20.640 --> 44:28.400]  everybody's social media service and not my social media service or somebody else's. So thank you
[44:28.400 --> 44:35.120]  to everyone who's been working on the service so far, and thank you to DMA and Malte who are here
[44:35.120 --> 44:39.520]  in the front, and specifically to the infrastructure team who helped us get out of the basement
[44:39.520 --> 44:46.720]  and keep the service online so that we can all have cat pictures and all the wonderful things that
[44:46.720 --> 44:49.120]  come with Mastodon. So thanks, everyone.
[45:00.560 --> 45:04.400]  Cool. And I grabbed a photo. So the test here is going to be to see, I'm going to try to upload
[45:04.400 --> 45:12.480]  the photo during questions, and we'll see how it goes. So here's a public resource. If you want to
[45:12.560 --> 45:16.640]  go check out the graph and see if there was a spike, you can go to grafana.hakaderm.io.
[45:18.320 --> 45:24.480]  And if you want to go to find out links to my slides and a recording of the video in the future,
[45:24.480 --> 45:28.240]  please go to github.com. And thanks again.
[45:42.640 --> 45:52.400]  And I guess we can do questions if anybody has questions. There's one over here.
[45:53.760 --> 45:58.240]  Right here. He's got his hand up.
[45:58.560 --> 46:12.000]  Okay, okay. I'm sorry.
[46:21.600 --> 46:26.320]  Could you start to interrupt, everybody? If you could please leave quietly, we are going to do Q&A
[46:26.320 --> 46:29.920]  right now. So we're just going to have a bit of Q&A. Please leave quietly. Thank you.
[46:31.040 --> 46:37.840]  I'm sorry, Chris. Can you show us the grafana panel? I can't hear you. I'm sorry. Okay, okay, okay.
[46:39.440 --> 46:40.400]  My questions. Thank you.
[46:41.200 --> 46:56.160]  Okay, right now. So the question was, can we see again the grafana table?
[46:56.160 --> 47:04.640]  Sure. Awesome.
[47:07.840 --> 47:21.920]  Grab a photo of this. Whoever did this round of applause.
[47:22.000 --> 47:29.120]  That's awesome. Hi. I was wondering, you were saying you could contribute skills or money to help.
[47:29.120 --> 47:36.320]  What are some ways that we as developers, engineers, SREs can help in the near future
[47:36.320 --> 47:39.920]  with keeping the video? It's a really good question. So the question was,
[47:39.920 --> 47:44.400]  how could we potentially volunteer or help out other than just throwing money at the problem?
[47:45.200 --> 47:50.160]  So the person to talk to is Quintessence. And we have a whole mod team right now that's working
[47:50.240 --> 47:54.800]  on onboarding docs. And I think we have 12 people right now. And these are folks from
[47:54.800 --> 48:00.080]  various tech companies around the world. And we have a Discord. So there's a link in the public
[48:00.080 --> 48:04.480]  resources I put. And there's a section on volunteering. And you can just interface with the team and
[48:04.480 --> 48:20.800]  get plugged in that way. Yeah, of course. Okay. More questions?
[48:25.200 --> 48:32.240]  Yes. You mentioned a thousand euros a month for the hosting. But I was wondering if you had an
[48:32.240 --> 48:39.440]  idea of what your total cost of ownership is now. And if the increase is linear with the increase
[48:39.440 --> 48:47.280]  of users in traffic. Sorry. The total cost of what? You mentioned a thousand euros per month
[48:47.280 --> 48:52.320]  for the hosting. But I guess your cost is much, much higher than that. So I was wondering if you
[48:52.320 --> 48:58.560]  know what your total cost is now monthly. And if it's been a linear increase with the number of
[48:58.640 --> 49:05.280]  users or? So the question is, is the cost of operating the Mastodon server, does it grow
[49:05.280 --> 49:11.040]  linearly with users? And the answer is no. It does increase with users. But I definitely think
[49:11.040 --> 49:15.760]  there's a threshold where you move from a small size to a medium size. And I think the traffic
[49:15.760 --> 49:20.960]  was really the deciding factor from us. So earlier it was just a few servers that we could operate
[49:20.960 --> 49:26.080]  on a small pipe. And now that we have a much larger footprint, we have to pay for a more
[49:26.080 --> 49:32.880]  enterprise and potentially a CDN and DDoS protection here in the future. And so that's
[49:32.880 --> 49:36.400]  grown up quite a bit. And that's probably our biggest cost right now is just the network.
[49:39.760 --> 49:40.800]  Cool. Any other questions?
[49:40.960 --> 49:56.000]  Hey, great talk. Did you evaluate, did you or any of your friends evaluate any other
[49:57.840 --> 50:02.640]  Mastodon compatible solution like Pluroma, Coma or any of that?
[50:03.600 --> 50:09.680]  Say, sorry, say again. Did you or any or when, when, when setting up,
[50:11.360 --> 50:15.520]  when setting up Hackaderm, did you or any of your friends
[50:16.880 --> 50:25.040]  evaluate any of the Mastodon compatible servers like Pluroma, Coma or any of that?
[50:25.680 --> 50:29.840]  So this, this is a really good question. So the question was, when we were setting up Hackaderm,
[50:29.840 --> 50:34.080]  did we look at any of the other Mastodon services like Pluroma or anything else?
[50:34.960 --> 50:40.160]  So the answer is no. And again, like, it's not like there was one day where I woke up and said,
[50:40.160 --> 50:44.160]  I'm going to go build a Mastodon server and I'm going to try to get all of the tech industry to
[50:44.160 --> 50:49.600]  come join us, right? Like I set it up for like me and my friends to just try out and Mastodon was
[50:49.600 --> 50:54.320]  the easiest one to get running on Arch Linux. And that was about the most thought that went
[50:54.320 --> 51:00.240]  into setting up Mastodon originally. And I think that like it had just continued to grow organically.
[51:00.240 --> 51:06.000]  And so like in hindsight, I mean, I like, I think there's opportunity to rewrite parts of Mastodon.
[51:06.000 --> 51:09.760]  I think there's a lot of opportunity to like have alternative dashboards as well.
[51:10.800 --> 51:15.760]  And so I'm not opposed to like operating different services for Hackaderm. I like to think of
[51:15.760 --> 51:20.960]  Hackaderm as a social media service where we just are on Mastodon mostly right now for today.
[51:21.840 --> 51:26.240]  So I don't have any personal experience operating the others, but I suspect that
[51:26.960 --> 51:31.760]  you know, as we move forward, the community might decide to switch over or run a different version
[51:31.760 --> 51:34.640]  or who knows, right? That's that's going to be up to the community now.
[51:37.760 --> 51:43.600]  All right. I have a further question. How fast was your internet speed at home to serve the
[51:43.680 --> 51:50.560]  Mastodon server? Sorry, say again? How fast was your internet speed at home to serve the
[51:50.560 --> 51:57.600]  Mastodon server? So like you showed the stats of your server setup with like 40 gigabits of
[51:58.160 --> 52:04.720]  possible network bandwidth, but how fast was actually the provided bandwidth from your ISP?
[52:04.720 --> 52:08.320]  Yeah. So this is a good question, which is how much bandwidth were we going through at my house?
[52:09.040 --> 52:12.800]  So in there's an official write up of the situation where we have some screenshots of the
[52:12.800 --> 52:20.240]  firewall at the house. And ultimately, we had pushed I think one terabyte was our busiest day
[52:20.240 --> 52:26.080]  in the middle of November over the ISP. So I had two connections, one of which was symmetrical 1G
[52:26.080 --> 52:30.480]  up and down that we were able to use and we like it maxed out our pipeline. We were like we were
[52:30.480 --> 52:41.120]  being startled by the ISP at one time. Yeah. Yeah. Thank you.