What Developers Should Know About Postgres With Andrew Atkinson

[00:00:00] Speaker A: Hello and welcome to the rubber duck Dev show. I'm Kreston. [00:00:03] Speaker B: I'm Coda. [00:00:04] Speaker C: And I'm Andrew. [00:00:06] Speaker A: And today we're going to be talking about what developers should know about postgres. But before we get into that, we can review Coda. How was your week? [00:00:16] Speaker B: It was all know at work. We've been putting together this demo for a trade show. So it's in Japan and like, one of these big trade shows. And things are going okay. But as always, there's just little things here and there. At the last minute, we're like, okay, this needs to be tweaked and fixed up. So it's been a little stressful on that front. But I'll be traveling out to three different countries in East Asia starting next week. So that's going to know kind of the usual busy travel life, I guess. Creston, how about you? [00:00:58] Speaker A: So I've been continuing to work on my postgres optimization course and spending fair amount of time building up a two terabyte database because that's kind of what was the objective of the course, is to kind of work through that, see different problems that come with that type of scale. And it's funny, I'm in the process of building it and my computer has become a mini heater in my office as it's running. And so far I'm just up to like 300gb. It's kind of a stopping point. I did say, all right, I'm going to kind of use this as a testing ground to see different. Gives examples of poor performance. But it already has like a billion row table, 3 million users. So at the point it gets to 2 billion, I'll probably have a 5 billion row table, 20 million users in this application. I should call it an application. It's kind of funny. I feel like I'm building an application without building an application because I'm trying to think through, okay, what would an application need to do in this scenario? And then building the database tables to support it and then building what inserts and update statements would typically happen with it. So feels a little weird. It's kind of like the phantom application. So it's just the database part of it. So it's been a little interesting taking longer than I had anticipated, but still moving forward with that. Andrew, what about you? [00:02:34] Speaker C: Yeah, I'm excited about your training course because I was an early purchaser and I'm looking forward to. [00:02:41] Speaker A: Yes, thank you. [00:02:42] Speaker C: Seeing what you're offering in there, and I can identify with the challenge of having an application and a workload in place for that purpose as well since I worked on that for I did a similar thing. Or there's probably some similarities to what I did for the book where I was putting this application together and thinking about what I wanted to show with examples and then trying to keep it realistic based on real applications I've worked on for companies and things like that. Yeah, so the last week I know I wanted to address my winter hat for anyone watching the video portion of this. I've been trying to stay warm. It's been about usually around zero degrees here this week. So we've finally gotten some cold weather here in Minnesota. But work wise, just this month I've actually launched independent consulting for the first time in my career actually. So it was kind of a big undertaking for me. I'm going to be giving it a while to try it out and find clients where I can help them with some of their database challenges and their rails web application challenges. And it's going well so far. I've had some success with working with clients and helping them achieve their goals and then have a few things in the pipeline. So yeah, so far so good. [00:04:02] Speaker A: Sounds good. [00:04:03] Speaker B: Actually, Andrew, I'm curious about that. When you do these consultations, what kind of scope are you looking at and how in the weeds do you get with that? [00:04:14] Speaker C: Yeah, I'm trying to be very flexible now since I don't know what folks might hire me for exactly. And then also of course I need to think about what kind of investment of time I might make relative to the kind of income goals I have with the project, of course. But what I really want to work on is try to leverage skills and knowledge I built up with postgres that could be useful to other teams and companies where they might not have a database administrator or database oriented senior engineers on their team. And so the scope could be anything from database level changes like parameter tuning or looking at queries and indexes. Or it could be more on the rails application side. And something that we might talk about in this conversation too is Ruby on rails has added in the last couple of releases more capabilities to use more databases. So you can have multiple databases connected to the application which opens up some possibilities as well as some database sharding capabilities. So if you have a b to b app that's multitenant and you want to run a customer on their own database and possibly on its own instance and things like that, it's something you can do within Ruby on rails. So a team might want to hire me to do that sort of. It would be likely rails and web application work, but it would be database oriented work. [00:05:42] Speaker B: Cool. [00:05:42] Speaker A: All right, so we're talking about kind of what developers should know about postgres or ideally. But before we get into that, I kind of wanted to talk a little bit about your history and kind of where you got to where you were, like, how did you, I guess, start getting into programming in. [00:06:05] Speaker C: Thanks. Yeah. And Creston. And for anyone else that doesn't know, I also appreciated the opportunity to appear on the rubber duck dev show. I think it was a little more than a year ago, it was a while back now. I think we all kind of shared some of our developer origin stories a bit. I believe on that episode I could be misremembering, but yeah, it's fun to think back about how things came to be. I guess. I think for me, without going into too much detail, it started all the way from just making static websites way back in the day, like when the Internet was relatively new in the then in university, I wanted to study computer science and I ended up studying kind of a combination of computer science and business classes as my main major. But I also got a bachelor's degree in spanish studies and kind of liked that mix of liberal arts and science oriented education. And then out in the working world, I kind of worked for. I've done all aspects of web application development where it might be like full stack or front end or back end, but I guess I always kind of like to tinker on computers too, especially a long time ago, like building gaming computers and stuff like that. So I think my arc has been more like full stack web development, and then over time, as I've tried to follow kind of what interested me, I've just become more and more interested in which I can talk about why, but more interested in databases and how they work. And then I think about where to use that in a working sense. And I think it's been being more of a databases oriented senior staff engineer for a team, someone that's a backend engineer. And then that comes along with likely doing a mix of application development and writing code, but also doing infrastructural work, maybe writing like terraform or managing instances, and kind of this mix of writing code, but also a mix of understanding the operations and optimize them and that sort of thing. So that's kind of where I'm at now, and I think it's really been holding my interest for several years and I don't see that changing much. However, that's just been my story. But I think another important thing is in the staff engineer book by Will Larson, which is a great book. I recommend all the time. Will talked about working in areas where there's both. I think I wrote down the exact words, but it's like the words that he used are nice. Let me just look those up really quick. It was room and attention are the words that he used. So working by room, it's like where there's room to work, where it's not overcrowded, there's not essentially more people than there are, than there is a need. And what I found is a lot of developers, they don't tend to get that excited about working with a relational database, which is maybe understandable. It's not like it's a new technology, it's decades old. And so I think there's a lot of room to work within that realm and to take on how to write better sql, how to work with indexes, like what the features and capabilities are of the database, and then on the attention side, relational databases usually are the system of record for businesses. So the data that they hold is extremely important. And companies invest in both, of course, the core application and its use of the relational database, but also data teams that use the relational data as a source for gathering insights about the company's operations, whether it might be like product insights or sales insights or that sort of thing. So I think there's a lot of attention paid as well. So that combination of room and attention, I think, is kind of something I keep in mind as well. That also fuels me, besides the technology pieces. [00:10:31] Speaker A: So why do you think developers are not too, I don't know how to phrase it. I don't know if you think they're not too interested in learning more about the database you mentioned, like, oh, maybe it's old technology, or I guess stable technology, or established probably is a better term, technology. But why do you think developers are less inclined to go in to learn more about the database, or what have you experienced? [00:11:01] Speaker C: Yeah, I think in my own experience, I think sometimes the way databases are taught, they're taught more from an academic sense, and that can be off putting. Maybe if you don't have a traditional computer science or academic background, like diving into binary search trees and balanced trees and things like that. For me, when I was getting started, those things, I think, have become much more interesting to me as I've been more of a practical operator with databases. Now, I'm curious about, okay, actually, what are the details of this algorithm and what does the physical data layout look like with postgres and stuff like think? And that could be because of the popularity of object relational mappers orms like in Ruby on rails. We have active record. Of course there's other ones, but they're fairly valuable in allowing developers to work in kind of an object oriented programming paradigm. But then the downside is it kind of insulates developers a bit from the relational database paradigm and normalized data and database constraints or other database objects that possibly that they're not leveraging for their overall development or the successful outcomes of their overall goals with their application. So it could be the way it's taught. I do think it's also like it's maybe just a bit intimidating as it's. Or maybe it's sort of the, you know, I think there's also another practical piece about, and this is something I think a lot about too, but going back to the jobs thing with operations, developers often are hired and teams often staff up with lots of developers because they want to expand the revenue for the company. And I think developers a lot are thought of as key employees to help grow the platform's features and capabilities and expand the revenue. And ultimately companies do of course they need to financially grow and be sustainable and that sort of thing. And I think sometimes on the operations side it's much less clear how operations can be a revenue contributor. So in terms of cost center versus a revenue center from an accounting perspective, I do think at companies with sufficient scale, of course they're going to need some specialized folks and some infrastructural folks to help make sure their operations are healthy and scalable. And we can do upgrades and we can respond to security incidents and all these kinds of things. But I think there's probably a bigger set of need in the overall software development economy. This is my speculation now, but probably a bigger need overall for application developers. So maybe developers perceive relational databases and database technologies as more narrow or a bit more limiting. I don't know. What do you all think about that? [00:14:17] Speaker B: So I think there is something to be said for kind of almost the visual feedback and also sort of the building end user applications is something that a lot of developers are really excited by. Right? And this is something where it's sort of similar to looking at consumer products versus b to b kinds of things, where you're kind of building a tool for developers to build better or you're kind of working in that scope where you're more kind of on that back end side. So I guess there's in some ways a little bit less glamour, right? And that might be part of it as well. [00:14:58] Speaker A: I kind of get the sense that a lot of developers, they like building the new thing, the new cool thing. So they're constantly like building features, building features. I want to build a feature. I want to build a feature. You can definitely disagree with me, I'm just thinking of that and that, hey, learn more about the database. How is that going to help me build new features? Right? Whereas when I look at it, because I'm like, you focus on postgres performance, I get a kick out of making something a thousand times faster or 10,000 times faster. It's like, wow, I love that. But that's me. But maybe other developers. Well, but clearly some of those developers are, because you can see the flame wars on different forums talking about, oh, don't use a hash here, use an array. They get into the minutiae of performance. So I don't know. Interesting. [00:16:08] Speaker C: Yeah, I think to summarize what I was saying, it's kind of a marketing problem to attract the attention of developers. And then maybe to summarize the benefit you're talking about, or maybe better marketing for developer attention is those opportunities to work on performance improvements where it's a really practical, arguably, it's likely a bigger gain than what you can get when you work within your programming language. I don't think you can get 1000 x gain in any program, even the worst performing Ruby code or whatever, but adding an index to a query, you can achieve that kind of improvement, which just is pretty wild. And it's building on decades of work that's gone into making that really optimized data structure for fast retrieval. And I think another way to market relational databases. I know this is not the main purpose of this episode, but to developers, I guess I think it's really cool to see, well, I think like indexes, I think indexes are one of the most fascinating parts for developers. And you do get to really build on and stand on top of these really extremely complex computer science concepts that are encoded into the source code of the database engine. And it's not something that is very bespoke to one company and one experience. It's actually something that's used by thousands of companies all over the world. And it's this really interesting challenge of general purpose data storage, read and write operations. But then having these highly optimized retrieval mechanisms through indexes, that I think is really pretty fascinating. I think another challenge too is just that paradigm like navigating that, though, if you're a newer developer, and I know I thought a lot about this, too, you're working with programming languages and you think about the programming execution model and that kind of thing, and the persistence aspect, writing and reading data and stuff. Like just, it feels like it's a little detail along the way. But Creston, maybe you feel like this, but for like, as I've gotten more and more into databases, I almost just think about things the other way around where I'm thinking about, okay, what are actually the read and write operations that this application is doing? And there's a bunch of other stuff, to me, I think more about the data and the I o because I think that ultimately is what the application is intended to create data. Right. If it's a consumer or business application, it's capturing things from end users and then it's serving those things back. And so those kinds of operational data flows and that kind of thing, to me they're very compelling. But yeah, I do think it's hard to market that to developers thinking about programming languages. [00:19:14] Speaker A: I also wonder if, because the thing we're talking about, like being able to achieve 1000 fold, 10,000 fold gains, that's only possible if you have large ish data. So when you're starting, if you're in a new app, or if it's a relatively small app with not as much data, then the database optimizes. Database isn't important. It does it without even breaking a sweat. So it depends on what the size of the data that the vast majority of developers are working with. So maybe if they're on smaller apps, then they may never need to know. I hate to say it, but they may never need to know this. But it's, once you start getting more data, there's a tipping point at which it's like, oh, boy, then this knowledge becomes absolutely essential. [00:20:05] Speaker C: Yeah, I guess, speaking for myself, I kind of glorified that. Or maybe it's like, oh, we're working on this really high scale application and that sort of thing. And I think the part where it's justified to glorify that maybe is usually that means it's successful and hopefully it's a valuable application to the end. Certainly it is to the end users. They're using it, but maybe more broadly in society or something. Hopefully it's a useful application, whether it's shopping or dating or sending things around, logistics or whatever, it is learning. But yeah, like you said, you don't really need that more intense focus on this aspect of the overall application architecture until you have those needs. Just taking things to my book for a moment, I was kind of hoping that folks would be interested in it, both that may be aspired to have those needs, or ideally they're maybe facing those challenges currently. But once you do, then of course then it turns more into a need versus a want, right? Typically what I've seen is you can often buy your way out of performance challenges by spending a lot more money. However, as an engineer, that's not very satisfying. You want to say, well, okay, these are the constraints that we're working within. We have these instances with this much memory, this much cpu, and then leaning into this wealth of information that's available with postgres or other relational databases as well, can let you get those performance gains without moving the goalposts or changing the instances and that kind of thing. [00:22:04] Speaker A: And it depends on the economic environment. Like for example, the last few years, it seems like corporations were very flush with cash, and if you hit a baldernack or you spend more money, more servers or whatever it is, now that the economic environment seems to be changing a little bit, money is not as very available. I'm seeing like even amongst some of my clients, they're starting to focus on some cost cutting measures. So it's like, okay, then, now is the point at which you can basically, it's very valuable to be able to learn these skills, to be able to optimize your database and not have to do the next database upgrade. I'm sorry, Coda, were you going to say something? [00:22:46] Speaker B: Oh, actually a couple of things, I think. Andrew, I don't know if you've introduced your book to the audience yet. Right. Maybe we should. [00:22:55] Speaker C: Yeah, I'd love to. Yeah, I have. Yeah. Thank. Yeah. So the book is called high performance postgres for rails. And the premise is that readers that use Ruby on rails, that primarily, they likely work as backend developers or they do backend development. If they use postgres and they want to learn to better use postgres, maybe features they're not using that aren't even available from active record, they'll learn things throughout the book through a combination of examples and exercises that they'll work on to allow them to leverage those capabilities they're not using at the moment to help them achieve more success. So it could be, like Preston said, it could be making their queries and workload run more efficiently and not needing to scale up or even being able to scale down on their instance sizes, for example, as a real concrete way to save money. It could be leveraging things like table partitioning to archive data to allow them to kind of control their cost growth from storing data. But I think probably the most fun part of the book is some of the kind of performance oriented stuff we're talking about how to use. There's many different types of indexes, and then amongst the types of indexes in postgres, there's also different indexing strategies. So single and multicolumn partial indexes, writing indexes on expressions, and then we also get. Last thing I'll say is we don't limit ourselves to just stock postgres and stock ruby on rails. There's also about 40 different postgres extensions and ruby gems, which are third party code for the most part. There's some extensions that are part of postgres as well, or that ship with postgres, but there's this broader ecosystem of tools that I think I've drawn from my own experience and from talking with other developers that will hopefully expand people's horizons a bit on different tools they can use to solve some of their challenges as well. [00:25:03] Speaker B: And wait, when is the book coming out? [00:25:04] Speaker A: Thank you. [00:25:06] Speaker C: Yeah, so the status of the book, it's been for sale since August 30 through pragmatic programmers, and it's in a beta release, so it's available for purchase in an ebook format only in the beta stage. And over that period today, it's January. We're in January now. So over the period up until December, I was actually still revising and editing the book based on tech review feedback, and the sales have been going pretty well and we fully finished up the author's portion. So my portion with the development editor in the middle of December, roughly. And so now what's happening next is it's going to be going into the production process for the publisher and will be available for print in the next couple of months. [00:25:53] Speaker B: Very good. Yeah. Is this your first book on, I guess first book, yeah, first book. [00:26:02] Speaker C: Okay. [00:26:02] Speaker B: All right. Congratulations. How was that process? I think you've been doing a lot of writing in general right before this as well, but it sounds like, I would imagine it's a little bit different going into writing a book as well. [00:26:18] Speaker C: Yeah, it's different. Although there are similarities to writing a blog post on a technical topic, you want to kind of understand over time, I think I gained a little bit of an efficiency in my process about kind of using things like outlining and strategies to help speed things up a bit. But yeah, it was a really interesting, I'm really appreciative of having the opportunity to have done this. I'm not sure that I'll ever write another book. We'll see. It's a very time intensive task and it's hard to work full time, which I was doing for the majority of it, and write a book. It's challenging, I think, financially a bit, because comparing it to being hired to do consulting work or that sort of thing, it's not as lucrative. However, I've always kind of liked education and teaching, and I realized it's a form of that. It's a way to take things that I've learned or I feel like are valuable to advocate for to other developers and then put them in in the form of examples and exercises into the book. So in that sense, it was really fulfilling to do. Yeah. The downside was it was very time intensive and had to skip out on a lot of things in order to make the time available to do it. So I'm glad it's done now. [00:27:44] Speaker A: Calendar time. Like, when did you actually start writing? Just out of curious. Start writing to the point at which you pretty much got the first version buttoned up. [00:27:54] Speaker C: Yeah, actually, I remember the final. So we started, I really would say I started writing it in earnest, like June of 2022, and it took until last year, around the holiday time, to have the first draft done. So I think the overall writing process was about 18 months, and it was actually probably about nine months, probably about half of that time, until I got to the first draft stage. And I remember thinking it was about a year ago now, a little more. I remember feeling pretty good. I was like, yeah, I'm done. First draft. I knew it wasn't really done because it was the first draft, but I really didn't realize how much more work there was to do after the first draft. And to some extent, like writing a technical book, it needs to be correct and it needs to be detailed. And those things are hard when you're trying to write a lot of content. So to some extent, there were sections where they just weren't detailed enough, or I glossed over things, or there were errors that I either caught myself from retesting or more likely, were caught from technical reviewers and some early readers in the beta period. So I'm super thankful for everyone that's read and contributed feedback, really. It was another almost a year from when the first draft was done until when it was fully done. And most of that time period was more of the editing than the actual writing. I had kind of locked in on the content and then just getting people to read it and go through the examples and exercises and point out issues and then be able to revise it. [00:29:44] Speaker B: I would also imagine with a lot of software related, technology related books, it's very important to keep that time to a minimum, right? To make sure that the stuff you say doesn't go out of date by the time you release it. I guess with databases, which being a little bit more of a mature area, I imagine that's not quite as much of a problem as if you're writing like some book on a JavaScript package manager or something. [00:30:13] Speaker A: Yeah, but you got rails in there too. Well, it frequently changes. Not as much. [00:30:24] Speaker B: I'm sure, in the time period. [00:30:28] Speaker C: Yeah, it's true. And I mean, for me part of the appeal was to try to work or write about technologies that wouldn't change too much, because just in terms of putting my mark on something, I guess I wanted it to be something that would be useful for a number of years. I almost feel like maybe if you are someone who has written more books, maybe you feel less. I'll use an expression like less precious about it or less attached to it or something. And you'd say, yeah, I'll write about this latest Javascript framework and I'm going to get this out as quickly as I can and take advantage of the moment. But that really wasn't my goal. My goal was I hoped that this could be useful for a lot of years. I've mentioned to others I'd love for this book to still be useful in a decade, at least for the postgres parts. Rails probably will have many, many more major versions by then. Postgres with an annual release cadence will have ten new versions by then. So I guess that would be version like 26. But yeah, the core parts. There's a lot of things throughout the book, SQL hasn't really changed much. A lot of the SQL that is being used and that also would be portable to other relational databases. So things like an earlier chapter shows readers how to scrub data and there's a whole chapter dedicated to doing that in a high performance way. That's something I could see being useful in lots of years. It's not really going to change. It's mostly SQL and some shell scripting and other technology that's within postgres. Being that it's native capabilities like table partitioning for example, I don't see that changing. I see that just continuing to be enhanced. So it's more likely the book would be the book I hope will be useful in years from now, but it might be missing some of the latest and greatest about particular feature areas, that kind of thing. [00:32:25] Speaker A: Out of curiosity, I don't think you mentioned this. If you did, forgive me, but what motivated you to decide to write the book? What were you doing when you're sitting there saying, I think I'm going to write a book? What was going through your mind? [00:32:41] Speaker C: Yeah, I didn't actually sit there thinking about writing my book. I was actually approached after. One thing I did do though, was I did sit around and think, oh, I'm going to try to present on postgres at a conference. So I did do that, and even more so even earlier than that, I guess. I think it really started from, and actually I meant to mention this earlier too, but regarding what you said earlier, Kristen, I think it's really important that some of the advanced database capabilities, when you don't need them, it's not likely something that developers want to invest their time and energy into. And what I found was I was interested in some of the details about indexes and query planning and that kind of thing and constraints, but it all became much more real when I had a real need at work. And so I did work at a company that's using a postgres ten database, essentially at the limits of what was possible, including failing, like going beyond the limits and having errors and that kind of thing. And we were on a huge instance with very high load. And so anything we could do to help lessen the load on the instance, like query optimization, we ended up splitting a new database out to shift some of the I O activity from what was originally just one database. And one know, having those opportunities really was the main spark for me that sent me down this path. And so I did a whole bunch of database project work and then I put that together in a presentation and ended up submitting that to postgres conference New York a few years ago, a couple of years ago. And after that is where I was approached by a book publisher that was looking to add books about postgres to their catalog. But prior to that, I've been blogging for 15 years or something on my personal blog, just writing about, mostly just for myself to organize my thoughts around technology I'm working with. So it's kind of all over the place, but lately it's been mostly about postgres and ruby on rails. But yeah, so I think I've always had an interest in writing. So that might have helped with the acquisitions editor at that publisher that reached out to me, they might have said, hey, I see you're already doing a bit of writing about this. And so I think that's good advice to folks that if they do aspire to do writing professionally, to just write, to write on their own blog or however that makes sense for them. But yeah, I didn't initially set out, but then, I'll be honest, I was pretty excited about the process once I started to realize, oh, hey, this is a real thing that could work. So the publisher has a proposal process, and I think I benefited from having submitted some proposals to conferences and going through that process a couple of times where I kind of looked at it in the same way. The proposal for the publisher has goals, what they want, and then just kind of attacking it. Like what is the content I think would make sense. What can I realistically do? And then just send it over and hope for the best. [00:35:58] Speaker A: All right, so I guess let's maybe shift focus a little bit and talk a little bit about more details about what we kind of think developers should, because we talked a little bit about why aren't they learning more about postgres and probably the benefits of doing so if they choose to do so. But what are some things that a rails developer, since there's a book on rails, should really, or what's the first couple of things they should maybe focus on if they wanted to start learning more about postgres? [00:36:34] Speaker C: Yeah, I think a good entry point is understanding writing active record, how it produces an SQL query, and just really reading the SQL query and then familiarizing yourself with, let's say you're new on a project and you didn't really build it yourself, so you're new to the data model, looking at what are the tables, what are the columns, what are the objects on the tables like indexes and constraints. And then trying to get some sense of if you're interested in the performance engineering part, probably working from your application performance monitoring tool, but understanding where end user performance is impacted. And then what I've seen commonly is to some extent it's because the tools don't necessarily give you enough visibility into this part. But there's sometimes a gap where you'll see some API endpoints for an application that are slower, the slower ones amongst the total set, and you'll be able to attribute a portion of it to the database traffic. And then what I've seen sometimes is that might be where a developer might stop. They might say, well, these queries are slow, or I'm not really sure what to do, which that's the point where then you can look that as an opportunity and you can say okay, well we need to have a good understanding of if it's more of a, first we need to identify whether it's a query that is being, if its data is being read. And then in that case we'd want to go and look at likely indexes and we want to learn about the query planner and we want to gather some query execution plans for those queries that are involved in that API endpoint. But if it's something like, if it's more write oriented, then we'd want to look at a different set of things related to write scalability. And I think that's a good place for developers to start, is to use their APM tool and then they would launch into some more database oriented observability tools. So we can talk about a few, but some that are going to be more specific to the database that are going to be beyond likely. What's the data that's available in the tool? Like what some of those, and I know we've talked about this a bit crescent as well, but for queries there's this popular, well, there's a bunch of tools. So I guess for queries, PG stat statements is something a lot of folks know about. If you don't know about it, it adds a bit of overhead, but basically it looks at all the queries coming into postgres, removes the parameters from them, and then stores a parameter less version of that. So that when new queries come in that match outside of their parameters, postgres will automatically capture statistics about that shape of that query or that normalized form of the query. So we can use that then to say what are the total counts of this particular type of query and what's its average execution time? And the queries that are the most costly or the most slow will bubble up to the top. That way we can then use SQL again to then query our own query statistics data and do that sort of ordered presentation of the data. So that's one tool. Another tool would be PG badger, which is a postgres log parser. And postgres generally there's loads of options that are not enabled by default. That allows a postgres operator to add more information into the log file with the trade off of bigger log files, possibly some additional overhead or latency. So it needs to be done gradually and with a bit of care about which settings are being changed. But what we can do is we can capture more information about queries from the log file as well. We can also look at locks that are being acquired as queries execute if you're running into errors related to resources being locked. So there's dozens of parameters that are part of running postgres instance, and a lot of them are relevant as to information gathering. If you're debugging a performance issue where you might want to adjust some of the values to have more information in the log, and then you can take your postgres log, analyze it with PG badger, and get a really nice HTML report that's organized by different categories of performance work. And then of course there's dedicated tools like PG Hero and even PG analyze. PG Hero is an open source tool that is popular in the rails world. It's available as a ruby gem or in a separate docker container, and it's going to give you some of the database specific queries and data oriented details that you might want to know if you're doing optimization work. And then PG analyze is more of a SaaS tool that you can pay for to sort of like if you were to analyze your log file, but do it on a more recurring basis automatically have PG analyze do that for you and jump into PG analyze to take a look at basically changes you can make to your running instances to help them run more optimally, as well as your queries and how you're storing data, retrieving data and that kind of thing. [00:42:30] Speaker A: Okay. All right. So since you've been on writing the book and it's been in beta being released, as well as you're going on different podcasts and whatnot, from my understand, what is the reaction you're hearing in terms of developers, in terms of, or even in some consulting engagements you may have had that. What issues are you seeing in the community that people are having with scaling rails or scaling postgres? [00:43:06] Speaker C: Yeah, I guess a couple of things that come to mind would be kind of like what we talked about, where it's hard to really invest, I think the time and build the skills around writing efficient queries and leveraging all the capabilities with indexes and stuff until you have those issues. So I do think some of, at least for the consulting work, it's teams that where they're successful, their app is growing a lot. They've got big databases in the hundreds of gigabytes or more range, and they want to really kind of level up their skills proportionally to the size of their database and operations. So there could be a training component and then there could be more of an advisory component to that. Another thing would be, I think a lot of times in web applications I don't want to lead with. It's hard to talk about this generically, but table partitioning is a generic solution that can be leveraged in a couple of ways, but particularly for data archival solution. As part of a data archival solution, I found that that's something that's a common occurrence operationally. I found for rails applications and how they use the database is they might have a couple of tables that are more of log style data that's capturing very granular information about user activities or maybe like sense. The classic example would be like a sensor reading sensor data. Yeah, but some sort of high volume time oriented data. And often once you start to think about access patterns, my experience has been kind of thinking about that across all of the tables that are part of the database and identifying well, if some of them are very write oriented tables, how much are we actually reading that data? And commonly it might be more like data that's only read based on recent data. So data in the last 30 days or 90 days or something like that. So that can be a great place to use the table partitioning strategy. However, it's not super easy to implement that, even if you know that's a good solution. So I think sometimes teams just put up with that is my best guess at. And so what that can look like is your database is ballooning in size, your backups are slow, your restores are slow because those operations individually are really slow. Accessing that table, like maybe building indexes, or maybe you have replication, you might be scaling instances unnecessarily, you might be scaling instances to deal with that. When in fact, if you are able to say, well, we only access 30 days, how can we actually retain all of this data but get it out of postgres, put it into archive files, or even ship it over to an analytical data store that doesn't have this hot transactional need like the web application does, that kind of thing could really help avoid scaling up an instance or help avoid ballooning database size and that kind of thing. So that's another way, I think, where I can help teams offering that. But yeah, just, and it's not so. [00:46:39] Speaker A: Hard implementing partitioning, it's desiring to migrate all said data. [00:46:46] Speaker C: It's the data migration. Yeah, I think another thing is tenancy, like thinking about within our database, if you have multiple customers and you want to give them some kind of compute or data isolation, thinking about does it make sense to do that within, there's a lot of different options. You can do that within a particular table. You can do that using postgres schema objects, you can split them into their own database. You can run those databases on separate instances. So I have not yet been hired for tenancy particularly, but I know I've worked on that at companies and I think that's where things get a little more challenging. When I worked at that company where engineers might. It's a little more challenging to think of your application running as multiple instances and then possibly the databases that are attached to each of those instances of the running app having different patterns and different behavior. [00:47:45] Speaker B: I see. What do you find are the areas that people shoot themselves in the foot most? What are the biggest mistakes you see especially kind of early on, before really people are thinking about scalability? [00:47:58] Speaker C: Yeah, that's a good question. Well, I think there's some fear and uncertainty and doubt raised around whether or not you should use database level constraints, and I'm an advocate of them. And earlier in my career I know that. I think there's sometimes misinformation about how they can impact your running operation. And they do make certain things more challenging because they essentially can be used, at least when you're talking about foreign key and primary key constraints, to create these dependencies within your data. And that can make some things like moving data around, data migrations more challenging, but not unsolvable, just a little more challenging. And so I think, unfortunately, that leads sometimes to people avoiding using those things, using constraints. And the impact of that can be having messy data in the database, data that's null, where you don't expect it to, it's missing when you expect there to be data, when it's poorly formatted, when referential, and data that's split between two tables, one half of the relationship is missing. I think not adopting constraints and not thinking about the data quality going in at the front end is a possible mistake. And then I think maybe just spending unnecessarily, you know, and, and going, this is a, this is hard to say in a vacuum, because the context really matters. But often I think my experience is it's a little bit of a hard sell sometimes to weigh the developer time against the convenience and ease of cloud provided databases and being able to scale those up and spend a bit more money. However, if you can manage to show the impact, if you can manage to leverage indexes and other techniques to lower your resource utilization and save money through better performance in the short term, then those savings will have a compounding effect. And you're also investing in being a better operator of a key tool in your stack, which I think then creates these knock on effects of like, well, what else can I do to generalize that one as my last point on this question anyways would be, I guess limiting yourself to what the object relational mapper can do, like within your application code. I would argue that's, I wouldn't really call it a mistake, but I would just say it could be that you don't necessarily know about the capabilities. In that case, you should buy my book or read more documentation if you're having those issues, or if you're just genuinely curious. However, yeah, I do think thinking about those, if you're an engineering lead on the team, I think it's prudent to know about those capabilities beyond what maybe is being used, and to think about all of this in terms of reliability, scalability, and cost efficiency as well. [00:51:22] Speaker A: All right, well, thank you for that. This has been great. I think we're coming up on time, so definitely thank you so much for coming and sharing your knowledge. Greatly appreciate it. So where can people find out more about your work and what you do? [00:51:41] Speaker C: Yeah, well, I still [email protected] and on social media I'm using Twitter and x. I'm also on Mastodon and blue sky. It's very lonely on blue sky. So if someone wants to connect with. [00:51:56] Speaker A: Me there, it's a blue sky wide open. [00:52:01] Speaker C: Fun fact, Blue sky uses postgres and they've got an engineer that I follow there that posts about their user growth. So they just crossed 3 million users recently, and I was having a bit of a conversation about their sharding strategy, or at least how they've distributed their users amongst their shards and stuff. So that was fun. But yeah, I'm on Twitter at adatki and then the book is available through pragmatic programmers and their website is pragprag.com. If you type postgres in there, you'll find the book. [00:52:35] Speaker A: Okay. All right, thanks so much. So I hope you guys enjoyed that. Be sure to visit theruberductdevshow.com where you can find links to any content discussed in this episode, as well as eventually you'll be getting a transcript. You can also get the podcast version and of course on YouTube, as well as you can sign up for our email bell list so you get notified when new episodes come out. I hope everyone has a great week. And until next time, happy coding. [00:53:03] Speaker B: Happy coding. Bye.

Show Notes

Episode Transcript

Other Episodes

Episode 57

Going Off The Rails With Drew Bragg | Rubber Duck Dev Show 57

Episode 26

The DarK Side of Open Source | Rubber Duck Dev Show 26

Episode 54

Open Source Experiences And The Pay Gem With Chris Oliver | Rubber Duck Dev Show 54

What Developers Should Know About Postgres With Andrew Atkinson | Rubber Duck Dev Show 114