PostgreSQL Backups - COPY vs INSERT
I wrote some scripts to backup and restore the data of Postgres tables. And normally everything was fine initially with Inserts, but as the size of the table increases, the restore took much more longer than tolerable. For example, one local restore action of Home Assistant DB took approximately 4 hours! To manage this I resorted to COPY, which reduced the time back to less than a few minutes.
And that was the happy end of a story, until a few days ago, some of my peers started to encounter strange issues, where they may be able to create rows, but can modify them. (Debugging the behavior to this level took some time of it's own.) Worse part was these applications were not developed by us, so diving into these issues took more time. Thankfully the issue happened in staging environment.
It took me sometime to remember that why COPY is faster than INSERT, because COPY skips some steps to be faster than INSERT. And many tables have an auto increment
column which was based on INSERT updating the counter (infact no triggers are invoked) to reflect the correct value. This critical step is skipped by COPY. Thus, if table was restored using COPY with 10 rows, auto increment
will still be set to initial value of `1` and thus whenever we try to create a new row, it'll clash and thus fail to create a row.
And there is a very simple solution for this, simply alter the sequence like below:
alter sequence some_id_seq restart with new_value;
Here is reference for the SQL above. And this fixes everything.