So, hello everyone, thanks for your patience. We're going to start with a slight delay, but I think it will be all right in the end. Can I please welcome to the stage Andreas Scherbaum, who's going to talk about a tour of the data types.

Okay, thank you, good morning. I hope you had a good time yesterday in Brussels. How was the restaurant? Well, we got some food. This is not switching.

So, I have the pleasure to talk to you today about data types in Postgres. I actually have two talks about this; this is, I would say, the basic talk about the regular data types we have. I also have an advanced data types talk, but let's focus on this one.

My name is Andreas. I have worked with Postgres since about 1997, 1998. I'm one of the founding members of PostgreSQL Europe. We run conferences like the PGDay on Friday here, PGConf.EU, which is in Prague this year in December, PGConf Germany, Belgium, Malta in April; if you're interested, there are a couple more. Currently, I work for a company called Adjust in Berlin. We have about 1,000 Postgres databases running, everything from a couple of gigabytes up to about 30 terabytes in Postgres.

Okay. If you already know everything about data types in Postgres, if you're regularly reading the hackers mailing list, this is maybe not the right talk for you; I don't want to occupy your time. Otherwise, have a seat and we'll start.

So, quick poll: how many data types do we have in Postgres? Any idea? Forty. Forty? Is this fourteen or forty? Forty. Anyone else? I see you're still sleeping. And we'll talk about each and every one today. No.

First of all, we have to exclude a number of data types, because every time you create a table in Postgres, you also create a corresponding data type which matches the structure of the table. We don't want these. If you look at the regular data types, you have about 80, depending on which Postgres version you are on; that's still more than you expected. There are things like boolean, you know this one, or text, or timestamps, and other things like trigger and void and cardinal_number, which you never heard about and will never need. We'll focus mostly on the ones I highlighted here.

At any time, you can also go and check the Postgres documentation on postgresql.org; it has a very long list of all the data types in Postgres. The basic ones, the advanced ones, all the ones I listed on these slides: it's all there.
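If you want to count them yourself, they are all in the pg_type system catalog. A minimal sketch (the filter on typtype excludes the composite row types that every table brings along; the exact count still varies by version and includes array and internal types):

    SELECT count(*)
      FROM pg_type
     WHERE typtype <> 'c';  -- 'c' marks composite (row) types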
OK. How many different data types are you using in your application? Three. Which ones? What about numbers? So everything is a varchar. OK. Anyone else?

Sorry, I need to turn the volume down a bit. OK. No, that's too much. Better? Good.

We are going over these basic types. Numerics, so that when you go back, you can rewrite your application and finally start using numbers. Text types. Maybe XML, if any of you are still using it. JSON, booleans, and a couple more.

So, text types. We basically have one text type in Postgres; under the hood, it's all the same. We have varchar and char and text. Text types in Postgres are case-sensitive by default, so if you compare two texts, it's always case-sensitive. If you want to compare case-insensitively, you have to use something like lower() or upper() to fold the string to one case first.

A string in Postgres can hold up to a gigabyte of text, roughly. It cannot hold binary data; if you want to store images in text, that's not going to work. If you specify a length, like the n in varchar(n), Postgres stores up to that number of characters in the text data type, excluding white space at the end. A char without a length is only one character; that's most likely not what you want, so for char you always want to specify the length. And then, of course, if you say text, it's just text; you cannot specify a length there.

How do char and varchar differ? Mostly in how they handle white space. So we have varchar(1), (5) and (10) here, and char(1), (5) and (10), and I cast five white spaces to each of them. As you can see, we get different lengths back. Our five spaces, if we cast them to varchar(10), we only get five at the end, because we only have five spaces in the string. What char is doing: char will fill up the entire string to the length we specified. If you say char(10), you get a string back with ten characters in it. That's mostly the difference between char and varchar.
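Roughly what that example looks like in SQL (a reconstruction, not verbatim from the slides; note that length() on a char(n) value ignores trailing blanks, so octet_length() is used here to make the padding visible):

    -- varchar(n) keeps at most n characters, and does not pad:
    SELECT length('     '::varchar(1))  AS v1,   -- 1
           length('     '::varchar(5))  AS v5,   -- 5
           length('     '::varchar(10)) AS v10;  -- 5

    -- char(n) pads with blanks up to n characters:
    SELECT octet_length('     '::char(1))  AS c1,   -- 1
           octet_length('     '::char(5))  AS c5,   -- 5
           octet_length('     '::char(10)) AS c10;  -- 10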
To sum it up: if you specify a length with char, it will actually give you that many characters back, including the white space. If you use a varchar, it will only give you the string back, excluding white space at the end; it cuts trailing white space off.

We have a structure in Postgres which is called a page. By default, it's eight kilobytes. That's where we store all the data: there is one page header, and then we store rows in it, and a row can be anything from just one column up to 1,600 columns. As I said, by default a page is eight kilobytes. You can increase that, but almost no one does.

So how does Postgres store a text? A text can be up to one gigabyte, and we cannot store one gigabyte in eight kilobytes. I mean, you can if you compress, but usually that doesn't work. Any variable-length value in Postgres that doesn't fit into the page is moved out into what we call a TOAST table. In our regular table we keep a small pointer into the TOAST table, and in the TOAST table we have as many rows as we need to store this gigabyte of text. So it's a small pointer here, and as many rows as we need there.

Which brings me to one question: why are so many people using varchar(255)? Just two weeks ago, I saw a customer who does this all over the place: street names, customer names, everything is varchar(255). Does it make any sense? Well, there are certain databases where it might, but it doesn't make any sense in Postgres. In Postgres, it doesn't make a difference if you use 10 bytes, 200 bytes, one kilobyte: you almost always end up with the text and a pointer into your TOAST table. There are other databases where it makes sense. If you look at one of the competitors in the market, they could only have something like 767 bytes in an index, which is 255 characters if you use UTF-8 with up to three bytes per character. That's what they can use in the index, so there it might make sense to use 255 as the length. But every time you see this construct in Postgres, where someone says varchar(255), go and question what the reasoning behind it is. Why did someone say 255, not 200, not one kilobyte? Technically, it doesn't make sense.
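By the way, if you're curious, you can see the TOAST table behind any given table in the pg_class catalog. A small sketch, with a hypothetical table name:

    SELECT reltoastrelid::regclass AS toast_table
      FROM pg_class
     WHERE relname = 'my_table';  -- hypothetical name; shows '-' if the table has no TOAST table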
Numeric types. Pay attention, please. So, we have integers, we have floating point numbers, we have numeric, and we have sequences.

Integers: we have smallint, storing two bytes. We have integer, the regular default with four bytes, anything from minus 2 billion to plus 2 billion. And then we have bigint, using 8 bytes, which is about plus or minus 9.2 quintillion.

You might get some small problems if you lay out your columns as smallint, integer, smallint, integer, because Postgres will go and say: okay, we need some padding space here, because the next value has to be aligned on a 4-byte or 8-byte boundary. So what you really want to do is have all of your big data types first in your table, things like text and bigint, then your regular 4-byte types, and the 2-byte types at the end, so the table packs a little bit tighter. It doesn't really bring you much if you only have a small table, a couple million rows. But if you're talking about billions of rows, it's actually quite a big saving if you can save 2 bytes, 4 bytes per row.

Then we have floating point numbers, as real and as double precision, with 4 and 8 bytes. Keep in mind that a floating point number, even though Postgres might show you an accurate-looking value, is always rounded internally; it's always a mantissa and an exponent. I'll give you an example. We have 6 digits here, 100,001, and you see, if I return this, Postgres still shows me the 100,001. If I expand this to 7 digits, you see I get a rounded number back: the 4 bytes can no longer hold the precision required, so I only get 1 million here. The same is true if I store 100,001.5 as a 4-byte floating point number: Postgres starts rounding it. So if you want to store anything like money, or data where you really need precision, please do not use floating point numbers. The same goes for double precision, so 8 bytes: with 15 digits it still looks okay, but once you expand the number a little bit, or add decimal digits, you see it starts rounding.

So, if you really need precision but cannot use integers, what else can we use? (Money? That one comes later.) You can use numeric. There's a data type called numeric, which stores up to 1,000 digits of precision in Postgres. So basically any number you need to store, you can store in numeric. Keep in mind there is no real hardware support for it: if you add two integers, the CPU loads one integer into one register and the other into another register and uses one operation internally; everything is fast. That's not how it works with numeric, so it will be a little bit slower. Not much, but if you use it all the time, you might notice it.
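A sketch of that precision difference (the displayed text depends on the Postgres version and the extra_float_digits setting, so the round-trip through an integer type makes the loss visible regardless):

    SELECT 16777217::real::int;              -- 16777216: 2^24 + 1 does not fit in a 4-byte float
    SELECT 9007199254740993::float8::bigint; -- 9007199254740992: the same effect at 2^53 for double precision
    SELECT 16777217::numeric;                -- 16777217: numeric keeps every digit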
There's also a type called money in Postgres. Don't use it. Never. Internally it's a bigint, so we have the same precision situation, 8 bytes. However, you only have one currency: whatever you assign as LC_MONETARY in your environment is the currency Postgres uses to show you this number. So you cannot say, I want to store two currencies in my database; you cannot store something like an exchange rate. None of this works. Money was deprecated, I think, two or three times, something like this, and there's always one user who comes back: oh, I really need it. So it's still around, but please don't use it.

Is anyone from India here? Do you know the story, the chessboard and the grains of wheat? The story is that a man did something for his king, and the king asked: okay, what can I give you in return? And he said: give me some grains of wheat, one on the first chess square, and then double the number on each following square. How many grains of wheat are we talking about? Yeah, it's hidden. We can actually use numeric for this. It's 2 to the power of 64, minus 1: we have 64 squares, we start with 1, so it's minus 1. If we use floating point for this, double precision here, you see we don't get an exact number; it's way too big for floating point. If we use numeric, it's just 20 digits, and we still have 980 digits left; we can store much more in numeric than just this number. I did the math at some point, and it's something like 1,000 times the wheat we are currently producing on Earth per year. So 1,000 years of wheat production on Earth would be needed to settle this debt.

Okay, we also have sequences in Postgres. Internally, these are just integers, used through a default value in a table. There's smallserial, which gives you 2 bytes, up to 32,767. We have the regular serial data type, which is 4 bytes, from 1 to about 2.1 billion. And then we have bigserial for everything which needs larger numbers.

If you create a table with a column of type serial, what Postgres does internally is: it creates the column with the underlying integer type (so if you look into the table, it's actually an integer, or the 2- and 8-byte variants), it creates a sequence for you, and it makes this sequence the default value for the column. So every time you insert something into this table and you don't specify this column, Postgres uses the next value from the sequence. If you do specify something for this column, it will not use the sequence. So if this is your primary key, and you're mixing values you insert yourself with values from the sequence, at some point you will have collisions. Please don't do this.

Also, sequences are not transactional: if you roll back a transaction, the sequence will not roll back. Sequences are just meant to provide you a unique number. That's all.
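Roughly what the serial shorthand expands to (a sketch with a hypothetical table t; this is the documented equivalence):

    CREATE TABLE t (id serial PRIMARY KEY);

    -- is more or less the same as:
    CREATE SEQUENCE t_id_seq;
    CREATE TABLE t (id integer PRIMARY KEY DEFAULT nextval('t_id_seq'));
    ALTER SEQUENCE t_id_seq OWNED BY t.id;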
So, we can ask for the current value of a sequence: SELECT currval('my_sequence'). This only works if you already used the sequence in the current session, for example by inserting into the table. It tells you the last value this current session got from the sequence. So if you have five different sessions inserting data, it will always show you what you, the current session, inserted. You can also ask for the next value by using nextval(). Keep in mind, it will not roll back if you roll back the transaction; it only ever moves forward. And you can set a sequence by just saying setval() and specifying the new value you want it to have. If you do this on a sequence you use as a primary key, and you set it to a previous value, you may run into collisions again, because it will hand out the same values again.

Okay. A sequence internally in Postgres is just another object, like a table. So you can say SELECT * FROM my_sequence, and it will show you all the data about the sequence: what's the last value we used, what are the minimum and maximum values, and will the sequence cycle around, so when it comes to the end, will it start again and wrap over or not. By default, it will not wrap over, because otherwise you would end up with the same numbers again. Just another object.
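The sequence functions in one small sketch (hypothetical sequence name):

    SELECT currval('my_sequence');        -- last value THIS session got; errors if not used yet
    SELECT nextval('my_sequence');        -- advances and returns the next value; never rolls back
    SELECT setval('my_sequence', 1000);   -- resets the sequence; beware of handing out values again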
Good. Any questions about numeric types?

What is this? South Pole. Come again? South Pole. South Pole, sounds good. How did you figure it out? That's one way. Yeah, it tells you right here: 90 degrees south latitude. This is the South Pole. What time is it there right now? That's one valid answer: all the times. Let's see if we can answer this question.

We have a couple of date and time types in Postgres. We have timestamp without time zone and timestamp with time zone. Depending on your use case, make a good choice which one you use. Any time you just want to store a date and time, you want the timestamp without time zone. If you work with multiple time zones, you use the one with time zone: internally Postgres stores the time as UTC, but it makes sure to handle the conversion for you. We will see a couple of examples. We also have time without time zone and time with time zone, and we have date and interval.

Postgres does not know about any kind of leap seconds. So occasionally we have a year which is a second longer, with a leap second. I think they're planning another one, because the Earth's rotation is changing; I don't know how that will go. But anyway, Postgres doesn't know about leap seconds, because the time zone database doesn't know about them.

So, we have something which looks like a date. I cast it to a timestamp, and we see that Postgres makes it this date at midnight. If you don't specify a time, it's always midnight. We can also say, okay, January 5th, let's format it like the Americans, with the month first; we also see it's midnight here.

We can also specify a time zone. So I have here 3:25 in the afternoon in the UTC time zone, and I cast this to a timestamp. Why does it say 5:25? Any idea? Postgres will always return times in the current time zone which is set on my system. Many companies use servers which are set to UTC. My laptop, which I used for this example, is set to Berlin time, which in summer is two hours ahead of UTC. So I specify a time as UTC here, and in August I get 5:25 back: Postgres transforms the time I specify into my local time. Same in winter: this is December here, 10:23 UTC, and I get 11:23 back as my time. This can be very convenient, but also very inconvenient, depending on what you're working on.

We can use any time zone name your computer knows about; we have a time zone database on our computers, and we can use any name from there. We can also just say plus or minus a number as the time zone. Obviously, if you specify a time zone by name, Postgres knows about summer time; if you just say plus 4, it never knows about summer time, because you just said plus 4. The example I'm using here is Russia, because at some point Russia just said: okay, we are no longer doing this dance with winter and summer time, we stop at some point, and around here it's one time zone.

In Postgres, time stops when you start a transaction. So we start a transaction here, and if you say SELECT now() multiple times, you always get the same time back: that's the transaction time. I'm using this here, and now I'm setting my time zone to Europe/Moscow. What changed? My output changed. This one is my Berlin time, and this one is my Moscow time, two hours difference.

I can also take any timestamp I have in Postgres at a specific time zone. Previously, everything was returned in my own time zone, the one set on my computer. I can either switch the whole session to a specific time zone, or I can format one specific timestamp at a given time zone. And then, of course, you can say: SELECT now() at Berlin, now() at New York, now() at Buenos Aires, all in one single query. So we can return as many timestamps, at as many different time zones, as we want.
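A sketch of those conversions (the dates are invented for illustration; the exact output format depends on your session settings):

    SET TIME ZONE 'Europe/Berlin';
    SELECT '2024-08-01 15:25:00 UTC'::timestamptz;  -- shown as 2024-08-01 17:25:00+02, local time

    -- one timestamp, rendered at several zones in a single query:
    SELECT now() AT TIME ZONE 'Europe/Berlin'                   AS berlin,
           now() AT TIME ZONE 'America/New_York'                AS new_york,
           now() AT TIME ZONE 'America/Argentina/Buenos_Aires'  AS buenos_aires;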
A couple more examples. Postgres does not know about leap seconds, but it knows about leap years. So we have 2000-01-05 minus 2000-01-01: that's four days difference, an interval of four days now. We have 2000-01-01 minus 2000-01-04: that's minus three days. We get an interval back. So if we have timestamps, Postgres does all the calculation between the two timestamps for us, including years, including leap days, everything.

And we can use this to figure out if a specific year is a leap year. 2000-02-28 plus an interval of one day gives me 2000-02-29. And of course, if I take a whole year: in 2001 I only get 365 days back, and in 2000 I get 366 days back; it's a leap year. It takes all of this into account.

Which brings me back to my initial question: what time is it at the South Pole? The station at the South Pole is operated by the Americans; it's the Amundsen-Scott station, which is, however, supplied from New Zealand. That's the closest airport they have for all the planes they're operating. And by now we know how to figure out the time in New Zealand, right? SELECT now() AT TIME ZONE 'Pacific/Auckland'. But that would be cheating, because the operating system actually knows about Antarctica. Every station humans have on Antarctica got its own time zone, which is usually aligned with the country operating the station. Because this one is operated by the Americans but supplied from New Zealand, conveniently they're using the same time zone as New Zealand. SELECT now() AT TIME ZONE 'Antarctica/South_Pole' gives you the time at the South Pole.
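The date arithmetic from those slides, reconstructed:

    SELECT '2000-01-05'::date - '2000-01-01'::date;      -- 4 (days)
    SELECT '2000-01-01'::date - '2000-01-04'::date;      -- -3
    SELECT '2000-02-28'::date + interval '1 day';        -- 2000-02-29 00:00:00, a leap year
    SELECT '2001-02-28'::date + interval '1 day';        -- 2001-03-01 00:00:00, not a leap year
    SELECT now() AT TIME ZONE 'Antarctica/South_Pole';   -- the answer to the original question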
Anyone here using XML? Fine. What's your use case? Come again? Some data feeds. Well, there is some basic support in Postgres for XML. You can set the encoding. You cannot search directly in XML: Postgres stores the XML as it is, but you cannot really search in it; there's no support for this. What you could do is cast it to text and then try to make some sense out of it, but that's about it.

We have two variants. We can say it's a document: I specify a text, and I tell Postgres, okay, parse this as an XML document. It will fail if it's not proper XML and tell me this doesn't work. So I get an XML document back, with all the formatting and the entire tree. I can also say I don't want an entire document, I just want a piece of XML content; that works with XMLPARSE and CONTENT. But then again, I cannot search in it; there's no support for it.

I can serialize and deserialize this, if you want to store some larger XML documents in Postgres; that works. After serializing, it's text, no longer an XML document. So if you really want to search in it, say, does this specific text appear in my XML document, go and serialize it, and then maybe apply some LIKE or regular expressions on it. But then again, if you search for something like a name tag, you will find plenty of those.

The reality is: if you want to store something like XML, go for JSON. We have two different JSON data types in Postgres. The older one is called json. json stores the data as it comes in; it basically takes the entire JSON block and stores it as it is. Later on, when you work on the value, it does all the parsing. So at insertion time it doesn't really know if your JSON is valid or not; it only figures that out when you try to operate on it.

And then at some point a better JSON type came around; it's called jsonb. I think one of the big mistakes the project made was not letting the new type take over the json name; instead it was called jsonb, and now we have to live with it. The new one is better in almost every way: it parses the JSON when you insert it into the database, so you can create an index on it, it already tells you at insertion time whether it's valid or not, and you get a decomposed JSON object in your database which you can work on. All of this uses regular transactions. Unlike some other databases doing JSON, in Postgres it's just one more data type, supporting transactions, supporting replication, everything.

How do we use this? This is a regular text here. I need to quote my text in JSON with double quotes to make it a JSON string; then I have my single quotes for Postgres, telling it this is a string. I cast this string to json, and I get my JSON text back. We can use arrays, lists and key-value objects in JSON. Here we have an array: as you can see, we have several JSON strings in the array. We build this array, make it a Postgres string, and cast it. That's a JSON value, and I still have my JSON strings in there, my JSON array. For Postgres, all of this is one string, parsed as a JSON type.
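A small sketch of that quoting, and of the validation difference:

    SELECT '"abc"'::jsonb;                  -- a JSON string: double quotes inside single quotes
    SELECT '["abc", "def", "ghi"]'::jsonb;  -- a JSON array of strings
    SELECT 'not json'::jsonb;               -- ERROR: jsonb validates at input time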
The same works with key-value pairs. We have a key and a value here as JSON, make it an object, then make that a string in Postgres, and cast this string to jsonb.

Of course, since Postgres parsed the JSON, it knows what's in there. We can say, okay, I only want one element out of the array: my text is "def". So I can access whatever is in my JSON value. I can ask if the element on the right is in the list on the left: "ghi" is actually in there, so I get true back. It says yes; no question about it in Postgres. The same works for keys: is this value in the list, does this key exist in the object? Any of the questions you usually have from an application using JSON, like: is this specific value there, does this specific field exist? All of this you can answer directly in Postgres. You don't have to extract the entire value, transfer it into your application, and then try to make sense of it. All of it can be answered directly in Postgres, in one query.

On top of that, because Postgres already knows what is in the JSON, we can have index support for it. This only works on jsonb, by the way, not on the old json type. So what I'm doing here: I create an index on my table, I have to use the GIN index type, and I create this index on one specific field in my JSON object only. So this is not indexing the entire JSON value, it's just one field; in my object it's the name field. And then I can use this index to answer queries. So if you have this typical web application, where you're storing 20, 50, 100 JSON values in one object, you don't want to index all of them; you maybe just want an index on one or two of the fields. This is possible in Postgres.
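A sketch of such an index (table and field names are made up; the double parentheses make it an expression index over just that one field):

    CREATE TABLE events (data jsonb);
    CREATE INDEX events_name_idx ON events USING gin ((data -> 'name'));

    -- a containment query on that field can use the index:
    SELECT * FROM events WHERE data -> 'name' @> '"alice"';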
Booleans. We have a real boolean type in Postgres, so we can say true or false. There are a couple of alternative spellings you can use: for true, you can also write t, yes, y, on or 1; for false, f, no, n, off or 0. Can you please be a little bit quieter back there? In the end, Postgres transforms all of these input values into the true or false value of the boolean. There are a couple of other databases on the market which say, yeah, we have a boolean, and under the hood it's maybe just an integer, and then you can insert not only zero and one but also five, eight, fifteen, and then try to make sense out of that. Here, we only get true and false back. So, true cast as a boolean, we see it returns t; and for false, we get an f back.

Again, we can use this in queries. I have a table here; let's say that's the table where you store log messages. We have some content, and we have one field which tells me: this log entry is an error. Usually we don't have a lot of errors, but those are the entries in this table we are mostly interested in; maybe one or two percent of the entries have this flag set. What I'm doing here: I create a million rows, and about two percent of them have this flag set, just to have some basic test data. And when I go and say, okay, I only want to see all the entries in my table where error is true, because that's what I'm interested in, you see, by default Postgres has to scan the entire table. That's quite expensive.

What I can do instead: I create an index, make it a conditional index, and only rows where error is true go into this index. The 98 percent of the table where there is no error, which I'm not interested in, will never show up in my index. Only the two percent go into the index, and now suddenly the cost of the query comes down from 16,000-something to 68. Very fast now. That's one use case for a partial index on a boolean. And to show the cost of this, I created another index on the entire column: the full index needs 2,700 pages, and my partial index only needs 57 pages. So much, much smaller and faster.
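A sketch of that demo with made-up names (generate_series fills in the test data; the WHERE clause on the index is what makes it partial):

    CREATE TABLE logs (
        id       bigint,
        message  text,
        is_error boolean
    );

    -- a million rows, about 2% of them flagged as errors:
    INSERT INTO logs
    SELECT i, 'message ' || i, random() < 0.02
      FROM generate_series(1, 1000000) AS i;

    -- only the ~2% interesting rows go into the index:
    CREATE INDEX logs_error_idx ON logs (is_error) WHERE is_error;

    EXPLAIN SELECT * FROM logs WHERE is_error;  -- can now use the small partial index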
We also have a bit type in Postgres, so not only boolean: we can store bits, and you can specify how many bits you want to have. The interesting part is that you can tell Postgres, by putting a B in front of the literal, okay, this is a binary value, and it does all the transformation for you, from something which looks like a string with bits in it to the actual bits. Postgres does the conversion of the bits for us, so we don't have to do the mental calculation of which bit is which value.

We can also do bit operations on these values. So I have some data in my table, and I want to select only the rows where this one bit is set. I write the bit value with the B notation, and use a bitwise AND or a bitwise OR. And then, of course, because OR gives me every value where the bit is set on the left or the right side, I get everything back here. We have exclusive OR on bits, and we can do arithmetic-style operations on bits, like shifting left and right to multiply or divide by two. All of this works.

We can also search on bits, which looks a bit complicated, because we need to turn it into a logical condition. I want to find everything where this bit is set, and because the bitwise AND gives me a value back, I have to say: if this is greater than zero, it's a match. If I search for everything where the second bit from the right is set, and no row has that bit set, I don't get anything back from my database. And I can cast bits to other types: that's an integer, I cast it to bits, I get my bit value back, and the other way around, from bits to integers to any other value.

Good. Anyone of you storing binary data in a database? Good for you. We have a bytea type which can do this, and all I want to say about it: please use the functions of your programming language to transfer the data in and out of the database. Please don't write your own code to try and transform this data. Every language that supports Postgres has functions for transferring binary data in and out of the database. We have two different formats: one is the old escape format, which is hard to parse, and the new one is the hex format, which always stores hex values for binary data.

We also have network types in Postgres. inet can store any IPv4 or IPv6 address. The network type cidr can store networks. And we can store MAC addresses in Postgres with macaddr. So any time you work with network addresses, please don't store them as text. Use the proper data type for it, because then you can have an index on it, and you can go and ask Postgres: is this IP address in this network?
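A sketch of exactly that question, with invented addresses:

    CREATE TABLE hosts (addr inet);
    INSERT INTO hosts VALUES ('192.168.1.10'), ('192.168.1.20'), ('10.0.0.1');

    -- which addresses are contained in this network?
    SELECT addr FROM hosts WHERE addr << '192.168.1.0/24'::cidr;
    -- returns 192.168.1.10 and 192.168.1.20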
That's what I did on this slide: I created a table with a couple of IP addresses in it, and then I can ask Postgres: give me every IP address which is in this network. So you don't have to do all the manual handling of IP addresses and text matching and whatever people do to find their IP addresses. And on top of that, it supports an index, so it gets very, very fast. Any kind of IP address matching can be very fast in Postgres.

I'm almost running out of time, because we started late. You can create your own data types in Postgres: we have enums, and a couple more come with Postgres as extensions. That's part of my other talk, the advanced one; not here, not today. So, that's it for this part. Any questions? Jimmy, do we have any questions?

During the part about the timestamp and timestamptz data types, you said that the database handles the conversion from UTC. Does it store the UTC value and also the offset or the time zone? Or does it always convert to the current time zone of the database?

Whatever you insert into the data type, if you don't specify a time zone, it will assume your local time zone; if you specify a time zone, it uses that one. It always converts the value to UTC internally. And when you select it, it returns the value in the time zone you specify. Thank you.

About JSON: can I use Postgres for non-relational purposes, for example something like MongoDB?

I think you're in the wrong room. The answer is yes. I mean, almost everything you can use MongoDB for, you can also do in Postgres.

Hi, is there any hope of ever getting an unsigned integer type in PostgreSQL?

I think there's an extension which can do that, but it's not there by default. If you check the Postgres extension network, PGXN, there are a couple more data types you can use for this.

I didn't quite get it: you recommended the jsonb data type over json? Basically, everything you can do in json, you can also do in jsonb, but jsonb gives you more functionality. Is it also faster? It usually is, yes.

Thank you. Thanks, Jimmy.