Recovering Postgres database from disk level corruption!!

  • Denish Patel
  • 9th September 2016
  • postgresql

A couple of weeks ago, I had to deal with a corrupted Postgres database cluster. In the end, we couldn’t recover some of the data, but we managed to recover most of it. Having worked with dozens of database systems, I was pleasantly surprised by the resiliency of Postgres.

Kudos to the Postgres development team for building the most resilient database in the world 🙂

Here is my Postgres database recovery story 🙂

Disclaimer:

I’m posting the steps carried out during the recovery process for informational purposes only. This post doesn’t provide any guarantee that they will work for your use case and/or environment.

Note: The actual database name has been replaced with “dbname”  and actual table names with “tablename”.

One of our Postgres database clusters experienced disk-level corruption, so we were hitting this error:

postgres=# \c dbname
FATAL: could not read block 0 in file "base/16389/11930": Input/output error

Uh oh?? Really bad, isn’t it? Fortunately, it wasn’t a mission-critical system, so we managed to take an extended outage and work on a partial recovery, because we didn’t want to lose all the data!!

Once we received the complaint, we immediately backed up the corrupted database and created a recovery cluster to bring up on a different server so we could go through the recovery efforts!!
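
For reference, here is a minimal sketch of that kind of file-level copy to build the recovery cluster (the paths and hostname below are placeholders, not the actual ones we used):

# stop the corrupted cluster, then copy its data directory to the recovery server
pg_ctl -D /var/lib/postgresql/9.4/main stop -m fast
rsync -a /var/lib/postgresql/9.4/main/ recovery-host:/data/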

Trial 1:

As many of you know, the first option is to bring up the recovery database cluster with zero_damaged_pages=on. You can set the value in the Postgres config file and then try to reindex the system catalog:

reindexdb -p 5433 --system dbname
reindexdb: could not connect to database dbname: FATAL: index "pg_index_indexrelid_index" contains unexpected zero page at block 0 
HINT:  Please REINDEX it.
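
For reference, the setting can go into the recovery cluster’s postgresql.conf roughly like this (a sketch of the one relevant line, not the full config):

# postgresql.conf on the recovery cluster
zero_damaged_pages = on    # treat unreadable pages as all zeroes instead of erroring out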

Doh! We still could not connect to the database!!

Trial 2:

If you aren’t aware, note that there is a way to ignore indexes at the system level. We started up the recovery cluster with the ignore_system_indexes=true setting:

pg_ctl -D /data -o '-c ignore_system_indexes=true' restart
dbname=# \c dbname

Yay! I was able to connect to the DB now!

Trial 3:

Let’s try to reindex the database…

dbname=# reindex database "dbname";
NOTICE: table "pg_catalog.pg_class" was reindexed
2016-08-22 15:53:14.179 PDT rhost=[local] app=psql:user=postgres:db=dbname:ERROR: could not create unique index "pg_statistic_relid_att_inh_index"
2016-08-22 15:53:14.179 PDT rhost=[local] app=psql:user=postgres:db=dbname:DETAIL: Key (starelid, staattnum, stainherit)=(2608, 5, f) is duplicated.
2016-08-22 15:53:14.179 PDT rhost=[local] app=psql:user=postgres:db=dbname:STATEMENT: reindex database "dbname";
ERROR: could not create unique index "pg_statistic_relid_att_inh_index"
DETAIL: Key (starelid, staattnum, stainherit)=(2608, 5, f) is duplicated.

As the table is corrupted with duplicate entries, let’s find them and fix them.

dbname=# select starelid, staattnum, stainherit from pg_catalog.pg_statistic where starelid=2608 order by 2;
starelid | staattnum | stainherit
----------+-----------+------------
2608 | 1 | f
2608 | 2 | f
2608 | 3 | f
2608 | 4 | f
2608 | 5 | f
2608 | 5 | f
2608 | 6 | f
2608 | 7 | f
(8 rows)
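
To decide which of the two duplicate rows to remove, the hidden xmin (and ctid) system columns can be pulled into the query; something like this sketch (the exact query isn’t in my notes):

dbname=# select xmin, ctid, starelid, staattnum, stainherit from pg_catalog.pg_statistic where starelid=2608 and staattnum=5;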

Let’s remove one of the entries based on its xmin:

dbname=# delete from pg_catalog.pg_statistic where starelid=2608 and staattnum=5 and xmin=1228447;
DELETE 1

Trial 4:

We restarted the REINDEX, but it failed again!!

2016-08-22 16:01:29.698 PDT rhost=[local] app=psql:user=postgres:db=dbname:ERROR: 1 constraint record(s) missing for rel tablename
2016-08-22 16:01:29.698 PDT rhost=[local] app=psql:user=postgres:db=dbname:STATEMENT: reindex database "dbname";
ERROR: 1 constraint record(s) missing for rel tablename

Trial 5:

Let’s try to vacuum analyze the table:

dbname=# vacuum analyze tablename;
2016-08-22 16:04:01.282 PDT rhost=[local] app=psql:user=postgres:db=dbname: 1 constraint record(s) missing for rel tablename
2016-08-22 16:04:01.282 PDT rhost=[local] app=psql:user=postgres:db=dbname:STATEMENT: vacuum analyze tablename;
ERROR: 1 constraint record(s) missing for rel tablename

Hrm… it’s still complaining about the missing constraint record.
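
That error means pg_class.relchecks claims more CHECK constraints for the table than pg_constraint actually holds. A query along these lines (a sketch, not the exact one we ran) shows the mismatch:

dbname=# select c.relname, c.relchecks, (select count(*) from pg_constraint where conrelid = c.oid and contype = 'c') as actual_checks from pg_class c where c.relname = 'tablename';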

Trial 6:

Let’s disable the constraint check…

dbname=# update pg_class set relchecks=0 where relname='tablename';
UPDATE 1

The above update fixed the constraint error.
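
Keep in mind that relchecks should be set back to the real number of CHECK constraints once the table is recovered; a sketch of that fix-up (verify the count against pg_constraint before running anything like this):

dbname=# update pg_class c set relchecks = (select count(*) from pg_constraint where conrelid = c.oid and contype = 'c') where c.relname = 'tablename';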

Trial 7:

Let’s reindex the database again!

dbname=# reindex database "dbname";

Yay, the reindex was successful.

Once the reindex completed successfully, we restarted the recovery cluster without the zero_damaged_pages and ignore_system_indexes settings.

Partial table recovery through the pg_dump process:

As the database is corrupted, it made sense to kick off a pg_dump of the database… we kicked it off, but it was still showing errors on some of the sequences!!

/usr/lib/postgresql/9.4/bin/pg_dump dbname -p 5433 -Fc >recovery_dbname.dmp
2016-08-22 16:22:09.517 PDT rhost=[local] app=pg_dump:user=postgres:db=dbname:ERROR: invalid page in block 0 of relation base/16389/2825248
2016-08-22 16:22:09.517 PDT rhost=[local] app=pg_dump:user=postgres:db=dbname:STATEMENT: SELECT sequence_name, start_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled FROM XX_id_seq
pg_dump: [archiver (db)] query failed: ERROR: invalid page in block 0 of relation base/16389/2825248
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled FROM XX_id_seq
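
One way to get a usable dump despite relations like this is to exclude the corrupted ones and handle them separately. A sketch using the anonymized sequence name from the log above (pg_dump’s -T/--exclude-table patterns also match sequences, but it’s worth confirming that on your version; the output file name is just a placeholder):

/usr/lib/postgresql/9.4/bin/pg_dump dbname -p 5433 -Fc --exclude-table='XX_id_seq' >recovery_dbname_partial.dmp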

We had issues recovering a couple of tables, but we managed to recover most of the tables in the database!

This was our Postgres database recovery story.

Hopefully it will help someone who falls into a similar corruption situation. Any feedback or other ideas are welcome 🙂

4 Comments

  1. Would the disk-level corruption be mitigated by having a replicated instance? It seems like in your case that was not an option.

    1. We didn’t have a replicated instance, but it might not have helped: it could have been block-level corruption, which would have been replicated there as well.

  2. I have used another technique to recover from corruption but it assumes you have saved the current DDL for the corrupted database. If you don’t have it, no use reading on…

    If that doesn’t work, and you don’t have an identical database DDL file somewhere, there is nothing more that can be done. If you do have a database DDL file that contains exactly the DDL used in your corrupted database, then you might be able to still do partial recovery.

    The general technique is:

    1. Create the skeleton database in the same cluster where the corrupted database exists.
    2. For each table in the skeleton database, get its relfilenode.
    3. For each table in the corrupted database, get its relfilenode.
    4. Copy the contents of the corrupted database’s relfilenode file over the skeleton database’s relfilenode file. Of course, make sure the cluster is shut down before doing this.

    Caution (right from the pg docs)
    Note that while a table’s filenode often matches its OID, this is not necessarily the case; some operations, like TRUNCATE, REINDEX, CLUSTER and some forms of ALTER TABLE, can change the filenode while preserving the OID. Avoid assuming that filenode and table OID are the same. Also, for certain system catalogs including pg_class itself, pg_class.relfilenode contains zero. The actual filenode number of these catalogs is stored in a lower-level data structure, and can be obtained using the pg_relation_filenode() function.
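
    For steps 2 and 3, the filenode can be looked up with the pg_relation_filenode() function mentioned in the caution; a sketch:

    dbname=# select pg_relation_filenode('tablename') as filenode, pg_relation_filepath('tablename') as path;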

    After restarting the new cluster with the filenodes from the corrupted cluster, you may still have to do some of the stuff that Denish identified above.

    1. @Michael Interesting technique. I will have to check it out next time I’m in a similar situation. Thanks for posting it here so I can refer to it in the future!
