Working with Amazon Aurora PostgreSQL: dag, standby rebooted again!

  • Michael Vitale
  • 28th April 2020
  • Aurora, Database, postgres, postgresql

While I continue to be amazed at how fast PG runs on Amazon Aurora (no WAL logging) or how fast I can create a snapshot or standby (shared storage), there are always a few clouds around. Today the cloud I’m lookin’ at is replica reboots. Every week or so, one of my standbys gets rebooted and I get notified. It is always for the same reason: “reboot due to slave lagging.” Apparently, Aurora will automatically restart a standby if it falls more than 10 seconds behind the primary. That’s right, just 10 seconds. And it is not configurable; it is burned into the Aurora guts.
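To see a reboot coming before it happens, one option is to watch replica lag against that 10-second limit. The snippet below is only a minimal sketch: the function name, the replica names, and the 50% warning fraction are made up for illustration, and it assumes you already collect lag per replica in milliseconds from somewhere (for example the CloudWatch AuroraReplicaLag metric, which is not something this post covers).

```python
from typing import Dict, List

# Threshold taken from the post: a standby more than ~10 seconds behind
# the primary gets restarted. Expressed here in milliseconds.
RESTART_LAG_MS = 10_000


def replicas_at_risk(lag_ms_by_replica: Dict[str, float],
                     warn_fraction: float = 0.5) -> List[str]:
    """Return the replicas whose lag has reached warn_fraction of the limit.

    lag_ms_by_replica is a hypothetical mapping of replica name to its
    most recent lag sample in milliseconds.
    """
    warn_at = RESTART_LAG_MS * warn_fraction
    return sorted(name for name, lag in lag_ms_by_replica.items()
                  if lag >= warn_at)


if __name__ == "__main__":
    samples = {"standby-1": 120.0, "standby-2": 6200.0, "standby-3": 11500.0}
    print(replicas_at_risk(samples))
```

With the default fraction, anything at 5 seconds of lag or more is flagged, which leaves some room to throttle a maintenance job before the standby is restarted.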

And so I was told by the Aurora gods that I might have to upgrade the instance on my primary and standbys. I countered that we only very infrequently get these bursts of writes, either from the client or from scheduled maintenance scripts that do vacuum freezes, pg_repack actions, etc. So why would I want to upgrade my instances just to account for some infrequent, heavy write loads? No answer from the Aurora gods, just silence.

Now combine this with what happens when a replica gets rebooted (it loses its monitoring stats; see “What happened to the stats?”), and now I have a DBA script that I must run to do a vacuum analyze on every table in every schema in every database in my cluster whenever an important standby used for application queries goes down.
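The core of such a script could look something like the sketch below. This is not the actual script from my cluster, just an illustration: the helper names are made up, and it assumes you have already pulled a list of (schema, table) pairs for each database, for instance from the pg_tables catalog view.

```python
from typing import Iterable, List, Tuple


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_analyze_statements(tables: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one VACUUM (ANALYZE) statement per (schema, table) pair."""
    return [
        f"VACUUM (ANALYZE) {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical table list; in practice this would come from each database.
    for stmt in vacuum_analyze_statements([("public", "orders"),
                                           ("sales", "Line Items")]):
        print(stmt)
```

You would run the generated statements once per database. Keep in mind that VACUUM cannot run inside a transaction block, so whatever client executes them needs to be in autocommit mode.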

But will I forsake Aurora or give up on the Aurora gods?  No way!  They just got me excited again with in-place major upgrades introduced in March/April 2020!

Michael Vitale, Team Elephas

