• My PostgreSQL wishlist

    If you know me you’re aware that I love SQL and relational databases. Of all of the databases I have used, PostgreSQL is by far my favorite. SQL is the one technology I have seen in common across every job I have had over the past two decades. I would bet you could say the same - even if you aren’t writing it directly (which would be a shame in my opinion), it is still there, behind the scenes, handling all of your data. But if you’re fortunate enough to work directly with SQL and a database, it’s hard to find a better tool.

    But while I have never seen a better database than PostgreSQL (and you can’t beat the price!), that doesn’t mean there aren’t things that could be improved. My goal with this post is to start a conversation and throw out some ideas I have gathered over the years - things I have yearned for. Best case scenario, someone tells me that these things already exist and I learn something new! Second best, these ideas resonate with others and we find ways to get some of them into future versions. But I would also love any conversation about why any of these are bad ideas or technically infeasible - I’ll learn something then too. And if anyone has ideas about how I can contribute to these things directly, please let me know!

    Read more

    comments

  • Notes on PostgreSQL Explain Analyze

    One thing you’ll want to learn if you use PostgreSQL for any length of time is how to use EXPLAIN. My job at Vendr, like my previous roles, is no exception. The good and bad news is that in many cases you can go pretty far before you start having issues. But that means that not everyone has had the opportunity to learn the dark art of reading explain output. And as great as PG is, it is not as user-friendly in this area as other RDBMSes like MSSQL. But with this primer, a few free online tools, and a little bit of time, you can quickly learn how to think about the performance of your queries and where to start when optimizing.
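
    If you have never run it, the basic invocation looks something like this (a minimal sketch against a hypothetical orders table):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM orders
    WHERE customer_id = 42;

    The output shows the plan the planner chose, along with the actual row counts and timings for each node.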

    Read more

    comments

  • Postgres pivot table using JSON

    Something I need to do often, but that can be difficult to do in SQL, is create a pivot table. As an example, imagine wanting to see customers and their revenue by month. It is straightforward to create a normal data set where the dates are the rows and you have a revenue amount for each. Something like this:

    dte total
    2022-01-01 22030
    2022-02-01 22753
    2022-03-01 0
    2022-04-01 9456
    2022-05-01 7798
    2022-06-01 38278
    2022-07-01 18736
    2022-08-01 6794
    2022-09-01 21033
    2022-10-01 28576
    2022-11-01 10172
    2022-12-01 41901
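
    A query along these lines produces that shape - just a rough sketch, assuming the same invoice table (with invoice_date and amount columns) used in the full example below; note that months with no invoices would still need the generate_series trick shown further down to appear as 0:

    SELECT
        DATE_TRUNC('month', invoice_date)::date dte
      , SUM(amount) total
    FROM invoice
    GROUP BY 1
    ORDER BY 1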

    But you quickly come up to two obstacles as you try to take it further - you either want to have the months as columns like this:

    jan feb mar apr may jun jul aug sep oct nov dec
    22030 22753 0 9456 7798 38278 18736 6794 21033 28576 10172 41901

    or you want to see multiple customers, which is difficult to represent as columns - or, harder still, you want the months as columns and the customers as rows:

    cus_id jan feb mar apr may jun jul aug sep oct nov dec
    1 0 10170 0 5399 0 14821 7927 0 14 15466 3675 14447
    2 22030 12583 0 4057 7798 23457 10809 6794 21019 13110 6497 27454

    The term for this is a pivot table - something you may have done many times in Excel or another spreadsheet application.

    But this is difficult in SQL, because SQL requires you to have a static column list. You can’t ask SQL to give you whatever columns are necessary, you must declare them in your query. (SELECT * may seem like an exception to this rule, but in this case the SQL engine still knows what the columns are going to be before the query is executed).

    Luckily Postgres gives you a way around this. If you want actual columns you still have to specify them, but the hard part - aggregating into these pivoted columns and rows - is made much easier. The key is the JSON functionality, which lets you represent complex values in a single cell: you can aggregate what really represents multiple values into one, and then pull them back apart after the fact.

    Here is an example of what this looks like:

    WITH columns AS (
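      -- every month of 2022 crossed with every customer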
      SELECT
           generate_series dte
         , customer_id
      FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
      CROSS JOIN
        (
          SELECT DISTINCT
            customer_id
          FROM
            invoice
        ) AS customers
    ), data AS (
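      -- total invoiced per customer per month; 0 when a customer has no invoices that month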
      SELECT
          columns.dte
        , columns.customer_id
        , SUM(COALESCE(invoice.amount, 0)) total
      FROM columns
      LEFT OUTER
        JOIN invoice
        ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
        AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
        AND columns.customer_id = invoice.customer_id
      GROUP BY
          columns.dte
        , columns.customer_id
    ), result AS (
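      -- collapse each customer's monthly totals into a single JSONB object keyed by YYYY-MM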
      SELECT
        customer_id
        , JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
      FROM
        data
      GROUP BY
        customer_id
    )
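    -- pull each month back out of the JSONB object as its own column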
    SELECT
        customer_id
      , (pivotData->>'2022-01') "jan"
      , (pivotData->>'2022-02') "feb"
      , (pivotData->>'2022-03') "mar"
      , (pivotData->>'2022-04') "apr"
      , (pivotData->>'2022-05') "may"
      , (pivotData->>'2022-06') "jun"
      , (pivotData->>'2022-07') "jul"
      , (pivotData->>'2022-08') "aug"
      , (pivotData->>'2022-09') "sep"
      , (pivotData->>'2022-10') "oct"
      , (pivotData->>'2022-11') "nov"
      , (pivotData->>'2022-12') "dec"
    FROM
      result
    ORDER BY customer_id
    

    This gives results like the ones we were after above.

    If you would like to see how this is done, I have an interactive fiddle you can play with that shows you step by step how each of these parts work:

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=39e115cb8afd6e62c0101286ecd08a3f

    This example is using PG 15, but this functionality works all the way back to PG 9.5.

    This query is also a great example of using generate_series to generate a set of data to join against, so that you can find any holes and represent all the data points you need (months in this case), even if there is no actual data for that point.
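
    For example, the series of month-start dates used in the query above comes from a single call:

    SELECT generate_series('2022-01-01'::date, '2022-12-31', '1 month') dte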

    In conclusion, the JSON functionality built into today’s relational databases is useful for more than schemaless data stores and complex values in cells. It can also be a powerful intermediate step to help you manipulate and transform your data in useful ways.

    comments

  • Disconnected API Responses

    An alternate title for this post might be Processing API Requests with a Queue. We recently had a project where we were expecting a burst of high traffic and heavy load on some API endpoints. We wanted to make sure that we could handle all of the traffic, even if the processing time was affected - dropped requests were not an option. This post is what we came up with after doing quite a bit of research. In the end this strategy worked well for our purposes, but we did identify some ways that we would improve it in the future.

    This same strategy will also work for any long-running API request - reporting, for example, where you need to be able to make a request but it may take a very long (and possibly indeterminate) amount of time to complete. Please forgive the tone of the document; it is adapted from my notes and isn’t in conversational form. It’s likely that there are some things I should expand upon, so if anything needs clarification please feel free to ask in the comments.

    Read more

    comments

  • Postgres numeric overflow error with json

    Recently I came across an error using Postgres that stumped me for a while, so I wanted to document it for next time. I was issuing an update statement to a table that had no numeric columns, but received the error: ERROR: value overflows numeric format. Not only did my statement not touch any numeric columns, the table itself didn’t have any. The actual problem ended up being some malformed JSON that I was trying to insert into a jsonb column. The JSON had a value like 300e715100, which was actually part of a hashed string, but the JSON serializer I was using incorrectly identified it as a very large scientific notation number, and so did not quote it. Because Postgres cannot represent a number that large, it throws the error. Quoting the value properly fixed the problem. I also want to note that the error only happens with jsonb, not json, because with jsonb Postgres actually parses the document.

    select '{"v": 300e715100}'::jsonb;
    -- ERROR:  value overflows numeric format
    -- LINE 1: select '{"v": 300e715100}'::jsonb
    
    select '{"v": 300e715100}'::json;
    -- this statement executes without error.
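
    For comparison, quoting the value so it is treated as a string rather than a number parses without issue:

    select '{"v": "300e715100"}'::jsonb;
    -- succeeds, since the value is now just a text string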

    You can try the code and see the error for yourself here: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=1584330f148ab0e9ed72529dfb466a12

    comments