• My PostgreSQL wishlist

    If you know me you’re aware that I love SQL and relational databases. Of all of the databases I have used, PostgreSQL is by far my favorite. SQL is the one technology I have seen in common across every job I have had over the past two decades. I would bet you could say the same - even if you aren’t writing it directly (which would be a shame in my opinion), it is still there, behind the scenes, handling all of your data. But if you’re fortunate enough to work directly with SQL and a database, it’s hard to find a better tool.

    But while I have never seen a better database than PostgreSQL (and you can’t beat the price!), that doesn’t mean there aren’t things that could be improved. My goal with this post is to start a conversation and throw out some ideas I have gathered over the years - things I have yearned for. Best case scenario, someone tells me that these things already exist and I learn something new! Second best, these ideas resonate with others and we find ways to get some of them into future versions. But I would also love any conversation about why any of these are bad ideas or technically infeasible - I’ll learn something then too. And if anyone has ideas about how I can contribute to these things directly, please let me know!

    Read more

    comments

  • Notes on PostgreSQL Explain Analyze

    One thing you’ll want to learn if you use PostgreSQL for any length of time is how to use EXPLAIN. My job at Vendr, like my previous roles, is no exception. The good and bad news is that in many cases you can go pretty far before you start having issues. But that means that not everyone has had the opportunity to learn the dark art of reading explain output. And as great as PG is, it is not as user-friendly in this area as other RDBMSes like MSSQL. But with this primer, a few free online tools, and a little bit of time, you can quickly learn how to think about the performance of your queries and where to start when optimizing.
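
    If you have never run it, the basic invocation looks something like this (a minimal sketch against a hypothetical orders table):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM orders
    WHERE customer_id = 42;

    The output shows the plan the planner chose, along with the actual row counts and timings for each node.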

    Read more

    comments

  • Postgres pivot table using JSON

    Something I need to do often, but that can be difficult to do in SQL, is create a pivot table. As an example, imagine wanting to see customers and their revenue by month. It is straightforward to create a normal data set where the dates are the rows and you have a revenue amount for each. Something like this:

    dte total
    2022-01-01 22030
    2022-02-01 22753
    2022-03-01 0
    2022-04-01 9456
    2022-05-01 7798
    2022-06-01 38278
    2022-07-01 18736
    2022-08-01 6794
    2022-09-01 21033
    2022-10-01 28576
    2022-11-01 10172
    2022-12-01 41901
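
    A query along these lines produces that shape - just a rough sketch, assuming the same invoice table (with invoice_date and amount columns) used in the full example below; note that months with no invoices would still need the generate_series trick shown further down to appear as 0:

    SELECT
        DATE_TRUNC('month', invoice_date)::date dte
      , SUM(amount) total
    FROM invoice
    GROUP BY 1
    ORDER BY 1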

    But you quickly come up to two obstacles as you try to take it further - you either want to have the months as columns like this:

    jan feb mar apr may jun jul aug sep oct nov dec
    22030 22753 0 9456 7798 38278 18736 6794 21033 28576 10172 41901

    or you want to see multiple customers, which is difficult to represent as columns - or, harder still, you want the months as columns and the customers as rows:

    cus_id jan feb mar apr may jun jul aug sep oct nov dec
    1 0 10170 0 5399 0 14821 7927 0 14 15466 3675 14447
    2 22030 12583 0 4057 7798 23457 10809 6794 21019 13110 6497 27454

    The term for this is a pivot table - something you may have done many times in Excel or another spreadsheet application.

    But this is difficult in SQL, because SQL requires you to have a static column list. You can’t ask SQL to give you whatever columns are necessary, you must declare them in your query. (SELECT * may seem like an exception to this rule, but in this case the SQL engine still knows what the columns are going to be before the query is executed).

    Luckily Postgres gives you a way around this. If you want actual columns you still have to specify them, but the hard part - aggregating into these pivoted columns and rows - is made much easier. The key is the JSON functionality, which lets you represent complex values in a single cell: you can aggregate what really represents multiple values into one, and then pull them back apart after the fact.

    Here is an example of what this looks like:

    WITH columns AS (
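      -- every month of 2022 crossed with every customer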
      SELECT
           generate_series dte
         , customer_id
      FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
      CROSS JOIN
        (
          SELECT DISTINCT
            customer_id
          FROM
            invoice
        ) AS customers
    ), data AS (
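      -- total invoiced per customer per month; 0 when a customer has no invoices that month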
      SELECT
          columns.dte
        , columns.customer_id
        , SUM(COALESCE(invoice.amount, 0)) total
      FROM columns
      LEFT OUTER
        JOIN invoice
        ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
        AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
        AND columns.customer_id = invoice.customer_id
      GROUP BY
          columns.dte
        , columns.customer_id
    ), result AS (
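      -- collapse each customer's monthly totals into a single JSONB object keyed by YYYY-MM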
      SELECT
        customer_id
        , JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
      FROM
        data
      GROUP BY
        customer_id
    )
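    -- pull each month back out of the JSONB object as its own column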
    SELECT
        customer_id
      , (pivotData->>'2022-01') "jan"
      , (pivotData->>'2022-02') "feb"
      , (pivotData->>'2022-03') "mar"
      , (pivotData->>'2022-04') "apr"
      , (pivotData->>'2022-05') "may"
      , (pivotData->>'2022-06') "jun"
      , (pivotData->>'2022-07') "jul"
      , (pivotData->>'2022-08') "aug"
      , (pivotData->>'2022-09') "sep"
      , (pivotData->>'2022-10') "oct"
      , (pivotData->>'2022-11') "nov"
      , (pivotData->>'2022-12') "dec"
    FROM
      result
    ORDER BY customer_id
    

    This gives results like the ones we were after above.

    If you would like to see how this is done, I have an interactive fiddle you can play with that shows you step by step how each of these parts work:

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=39e115cb8afd6e62c0101286ecd08a3f

    This example is using PG 15, but this functionality works all the way back to PG 9.5.

    This query is also a great example of using generate_series to generate a set of data to join against, so that you can find any holes and represent all the data points you need (months in this case), even if there is no actual data for that point.
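
    For example, the series of month-start dates used in the query above comes from a single call:

    SELECT generate_series('2022-01-01'::date, '2022-12-31', '1 month') dte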

    In conclusion, the JSON functionality built into today’s relational databases is useful for more than schemaless data stores and complex values in cells. It can also be a powerful intermediate step to help you manipulate and transform your data in useful ways.

    comments

  • Disconnected API Responses

    An alternate title for this post might be Processing API Requests with a Queue. We recently had a project where we were expecting a burst of high traffic and heavy load on some API endpoints. We wanted to make sure that we could handle all of the traffic, even if the processing time was affected - dropped requests were not an option. This post is what we came up with after doing quite a bit of research. In the end this strategy worked well for our purposes, but we did identify some ways that we would improve it in the future.

    This same strategy will also work for any long-running API request - reporting, for example, where you need to be able to make a request but it may take a very long (and possibly indeterminate) amount of time to complete. Please forgive the tone of the document; it is adapted from my notes and isn’t in conversational form. It’s likely that there are some things I should expand upon, so if anything needs clarification please feel free to ask in the comments.

    Read more

    comments

  • Postgres numeric overflow error with json

    Recently I came across an error using Postgres that stumped me for a while, so I wanted to document it for next time. I was issuing an update statement to a table that had no numeric columns, but received the error: ERROR: value overflows numeric format. Not only did my statement not touch any numeric columns, the table itself didn’t have any. The actual problem ended up being some malformed JSON that I was trying to insert into a jsonb column. The JSON had a value like 300e715100, which was actually part of a hashed string, but the JSON serializer I was using incorrectly identified it as a very large scientific notation number, and so did not quote it. Because Postgres cannot represent a number that large, it throws the error. Quoting the value properly fixed the problem. I also want to note that the error only happens with jsonb, not json, because with jsonb Postgres actually parses the document.

    select '{"v": 300e715100}'::jsonb;
    -- ERROR:  value overflows numeric format
    -- LINE 1: select '{"v": 300e715100}'::jsonb
    
    select '{"v": 300e715100}'::json;
    -- this statement executes without error.
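
    For comparison, quoting the value so it is treated as a string rather than a number parses without issue:

    select '{"v": "300e715100"}'::jsonb;
    -- succeeds, since the value is now just a text string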

    You can try the code and see the error for yourself here: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=1584330f148ab0e9ed72529dfb466a12

    comments