postgres on Nicola Iarocci

pg_dump and pg_restore can backup and restore single Postgres schemas

Tue, 30 Jan 2024 15:56:15 +0100

Today I learned that pg_dump can make a copy of a Postgres schema instead of the whole database. Likewise, if needed, pg_restore can restore the schema in either the original database or a different one.

Backup of a Postgres schema:

pg_dump -h host -d source_database -U user -n schema_name -F c -f schema_dump_file.dump

Restore of a Postgres schema:

pg_restore -h host -d dest_database -U user -n schema_name schema_dump_file.dump
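The two commands can also be chained so that no intermediate file is needed. This is a minimal sketch, not something from my actual setup: copy_schema is a hypothetical helper name, and it assumes both databases live on the same host and that authentication needs no prompt (for example via ~/.pgpass). Since the archive only contains the one schema, the restore side does not filter with -n.

```shell
# copy_schema HOST USER SRC_DB DEST_DB SCHEMA
# Dump a single schema in custom format and restore it straight into another
# database. pg_restore reads the archive from standard input when no file is given.
copy_schema() {
    local host=$1 user=$2 src=$3 dest=$4 schema=$5
    pg_dump -h "$host" -U "$user" -d "$src" -n "$schema" -F c |
        pg_restore -h "$host" -U "$user" -d "$dest"
}
```

Called as `copy_schema host user source_database dest_database schema_name`. The function returns the exit status of pg_restore; under bash, `set -o pipefail` would also surface a failing pg_dump.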

All basic and widely known stuff, I am sure. Another item of note: the DB name is hard-coded in pg_dumpall files, so if one wants to restore on a database named differently, one must go into the dump file and edit it by hand.
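The manual edit could probably be scripted with sed. The sketch below is an assumption-laden illustration: rename_db_in_dump is a hypothetical helper, and it only rewrites the CREATE DATABASE, ALTER DATABASE and \connect lines for a simple, unquoted database name. Dumps of databases with unusual names are written differently and would need checking by eye.

```shell
# rename_db_in_dump OLD NEW < dumpall.sql > renamed.sql
# Rewrite the database name on the statements that create, alter and
# connect to it. Other occurrences of the name (table names, data) are left alone.
rename_db_in_dump() {
    local old=$1 new=$2
    sed -E \
        -e "s/^((CREATE|ALTER) DATABASE )${old}([ ;])/\1${new}\3/" \
        -e "s/^(\\\\connect )${old}\$/\1${new}/"
}
```

For example, `rename_db_in_dump mydb newdb < dumpall.sql > renamed.sql`, after which a diff of the two files shows exactly which lines changed.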

pg_rman: a backup and restore management tool for PostgreSQL

Tue, 09 Jan 2024 09:19:48 +0100

The goal of the pg_rman project is to provide a method for online backup and PITR that is as easy as pg_dump. Also, it maintains a backup catalog per database cluster. Users can maintain old backups including archive logs with one command.

We’ve always been doing our Postgres backups the rudimentary way via pg_dumpall, which works and is purely logical (one can restore across different Postgres versions), but pg_rman maintains a catalog and has point-in-time recovery.

I might want to look into it at some point.

Awesome psql tips

Thu, 23 Feb 2023 07:05:25 +0100

Today I learned about psql-tips.org by Lætitia Avrot, an excellent repository of psql (the CLI tool, not the database itself) tips. I like how one randomized tip is playfully served on the home page while the complete list is always at hand.

Automatic deletion of older records in Postgres

Sun, 16 Jan 2022 07:05:25 +0100

We have a Postgres cluster with a database for each user. Each database has a table that records events, and we want this table to only record the last 15 days.

If we were on MongoDB, we could use a capped collection, but we are in Postgres, which does not have equivalent functionality. In Postgres, you have to make do with something homemade. My first idea was to install a cron job in the system. It would execute daily, deleting older events in each user database.

Before jumping in, I looked at what others do. One surprisingly popular approach appears to be using triggers: older rows are pruned whenever a new row is inserted in the table. For my use case, this seems unnecessarily taxing to the system. I’m happy with running a single maintenance task late in the night when the system is underused. Two other solutions are pgAgent and pg_cron. pgAgent is an external UI tool, so hard no to pgAgent from me. pg_cron essentially offers cron jobs baked into the database, which is good, but, like pgAgent, it requires you to install the tool itself, create a Postgres extension, change Postgres configuration, optionally grant usage to a schema, etc. There’s also partitioning, which seems way overboard for our use case. All these approaches demand new dependencies and unnecessary work for something I can easily accomplish by leveraging what is already available, for free, in the system. So, it’s good old Linux cron jobs for me.

Because we run Postgres in Docker, a little more work is involved, but overall, the solution is pretty straightforward. Here is the script I came up with:

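readonly CONTAINER=$(docker ps -q -f name=postgres)  # id of the Postgres container
readonly DB_USER=dbuser                              # non-default Postgres user
# List the user databases (-At prints one bare name per line), then delete
# obsolete rows in each of them.
docker exec -i "$CONTAINER" \
    psql -U "$DB_USER" -d postgres -At \
        -c "select datname from pg_database where datname like 'cus_%'" | \
    xargs -I"{}" docker exec -i "$CONTAINER" psql -U "$DB_USER" -d {} -t \
        -c "delete from mytable where datetime < now() - interval '15 days'"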

The first line finds the id of the Docker container (it runs in a swarm, so we don’t have exact container names). We need it because we want to execute psql from within the container. The second line sets a Postgres user other than the default one. That’s because we don’t allow default user logins. Then comes the Linux pipeline. First, we execute a query that returns all user database names; then, we pipe those names to xargs, which runs the query that deletes obsolete rows once against each target database. The script is then installed as a cron job with something similar to:

0 4 * * * /home/user/dir/cleanup.sh

This ensures the script runs daily at 4 am. For a weekend task, I’m happy with the result.
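One thing the pipeline does not give us is a clear report when a single database fails. A possible variant, sketched here under assumptions, replaces xargs with a plain loop. The function names run_psql and cleanup_all are hypothetical, and CONTAINER, DB_USER and RETENTION_DAYS are assumed to be set by the surrounding script.

```shell
# run_psql DB SQL: run one statement inside the container and print bare rows.
run_psql() {
    docker exec "$CONTAINER" psql -U "$DB_USER" -d "$1" -At -c "$2"
}

# cleanup_all: delete old rows in every cus_* database.
# Returns non-zero if listing the databases or any single cleanup failed.
cleanup_all() {
    local dbs db failed=0
    dbs=$(run_psql postgres \
        "select datname from pg_database where datname like 'cus_%'") || return 1
    for db in $dbs; do
        if ! run_psql "$db" \
            "delete from mytable where datetime < now() - interval '${RETENTION_DAYS} days'"
        then
            echo "cleanup failed for $db" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Calling cleanup_all at the end of the script keeps the cron entry unchanged, and anything written to stderr ends up wherever cron sends the job output.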

How to restore a single Postgres database from a pg_dumpall dump

Wed, 25 Aug 2021 07:05:25 +0100

Today I learned how to restore a single Postgres database from a global dump generated with pg_dumpall. Now, pg_dumpall is handy when you want to back up an entire Postgres cluster. It will dump all databases and global objects in a single text file. In contrast, pg_dump, the go-to tool for Postgres backups, offers more control but only works with a single database and doesn’t dump global objects, such as the roles/users linked to the database.

The problem with pg_dumpall comes when you want to restore just one database from the dump file. That’s not supported out of the box, but it is achievable with some tinkering.

The pg_dumpall dump is a plain text file that contains all the SQL commands needed to restore the cluster. All database instructions are there as well; we only need to extract them. Say we have one “mydb” database that we need to retrieve. Open the dump file and look for a line starting with “\connect mydb”. That’s where our database instructions begin. Then look for the first occurrence of “PostgreSQL database dump complete” after it. That’s where the instructions end. This script, which I have to say makes clever use of sed, will do just that for us:

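#!/bin/bash
# Print the section of a pg_dumpall file that belongs to one database.
[ $# -lt 2 ] && { echo "Usage: $0 <dump_file> <db_name>"; exit 1; }
# First sed: keep everything from the "connect <db_name>" line onward.
# Second sed: drop everything from the first "dump complete" line onward.
sed "/connect.*$2/,\$!d" "$1" | sed "/PostgreSQL database dump complete/,\$d"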

The output goes to STDOUT, so we want to redirect it to a file. If we named the script pg_extract.sh, as I did, we’d do:

./pg_extract.sh dumpall.sql mydb > mydb.dump
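One caveat with the sed pattern is that it matches any line containing “connect” followed by the name, so “mydb” would also match a “mydb2” section that appears earlier in the file. A stricter variant, sketched here with awk, compares the whole line instead. The function name extract_db is hypothetical, and the sketch assumes a simple, unquoted database name.

```shell
# extract_db DUMP_FILE DB_NAME
# Print from the exact "\connect DB_NAME" line through the first
# "PostgreSQL database dump complete" line that follows it.
extract_db() {
    awk -v db="$2" '
        $0 == "\\connect " db { found = 1 }
        found { print }
        found && /PostgreSQL database dump complete/ { exit }
    ' "$1"
}
```

Unlike the sed version, this one keeps the closing “dump complete” line, which is only a SQL comment and harmless on restore.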

Now we have the specific DB dump, and we can restore it like this:

psql (connection options) mydb < mydb.dump

If the database still exists on the cluster, we first want to drop it, or we’ll only get error messages:

psql (connection options) -d postgres -c "DROP DATABASE IF EXISTS mydb"
psql (connection options) -d postgres -c "CREATE DATABASE mydb"

DROP DATABASE will fail if there are active connections. Either forcibly terminate all active connections or tell your peers to leave the database alone. Combining the steps above into a single script is an option.
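As a rough sketch of what such a script could look like, the function below terminates other sessions on the database, then drops and recreates it. The name recreate_db is hypothetical, the connecting role is assumed to have the privileges needed to terminate other backends, and the database name is assumed to be a plain identifier.

```shell
# recreate_db DB [psql connection options...]
# Each -c runs as its own statement, so DROP DATABASE is not wrapped
# in a transaction together with the other commands.
recreate_db() {
    local db=$1
    shift
    psql "$@" -d postgres \
        -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '${db}' AND pid <> pg_backend_pid()" \
        -c "DROP DATABASE IF EXISTS \"${db}\"" \
        -c "CREATE DATABASE \"${db}\""
}
```

After `recreate_db mydb (connection options)`, the extracted dump can be loaded with psql as before. A client that reconnects between the terminate and the drop will still make the drop fail, so this is best run when the application is stopped.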

Cleaning Up Your Postgres Database

Tue, 09 Mar 2021 07:05:25 +0100

I am an application/backend developer who has to wrestle with databases more often than desired. I can find my way around Postgres pretty well, but I can always use a hint or two, especially when it comes to fine-tuning and performance.

I stumbled upon Cleaning Up Your Postgres Database. It offers useful advice on spotting performance bottlenecks in your Postgres database. Take the cache and index hit queries, for example.

The first thing you’re going to want to look at is your cache hit ratio and index hit ratio. Your cache hit ratio is going to give the percentage of time your data is served from within memory vs. having to go to disk. Generally serving data from memory vs. disk is going to be orders of magnitude faster, thus the more you can serve from memory the better. For a typical web application making a lot of short requests I’m going to target > 99% here.

I will be trying them real soon. Like, today.
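For reference, a cache hit ratio along those lines can be computed from the pg_statio_user_tables statistics view. This is my own minimal sketch rather than the article’s exact query, and cache_hit_ratio is a hypothetical helper name. It prints the percentage of table block reads served from shared buffers for the database psql connects to.

```shell
# cache_hit_ratio [psql connection options...]
# heap_blks_hit counts table blocks found in shared buffers, heap_blks_read
# counts blocks that had to be requested from outside them.
# nullif avoids a division by zero on a database with no activity yet.
cache_hit_ratio() {
    psql "$@" -At -c "
        SELECT round(100.0 * sum(heap_blks_hit)
                     / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2)
        FROM pg_statio_user_tables"
}
```

Usage would be something like `cache_hit_ratio -h host -U user -d mydb`. The counters are cumulative since the last statistics reset, so the number reflects long-term behavior rather than the last few minutes.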