Working with Amazon Aurora PostgreSQL: dag, standby rebooted again!

by Michael Vitale
Date 28th April 2020
Categories Aurora, Database, postgres, postgresql

While I continue to be amazed at how fast PG runs on Amazon Aurora (no WAL logging) or how fast I can create a snapshot or standby (shared storage), there are always a few clouds around. Today the cloud I’m lookin’ at is replica reboots. Every week or so, one of my standbys gets rebooted and I get notified. It is always for the same reason: “reboot due to slave lagging.” Apparently, Aurora will automatically restart a standby if it falls more than 10 seconds behind the primary. That’s right, just 10 seconds. And it is not configurable; it is burned into the Aurora guts.
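To catch a standby before it hits that limit, one option is to classify lag samples against the 10-second threshold and alert early. The following is only a minimal sketch: it assumes you already collect replica lag in milliseconds from somewhere (for example a monitoring metric), and the names `classify_lag`, `summarize`, and the `WARN_RATIO` early-warning fraction are hypothetical choices of mine, not anything Aurora provides.

```python
from typing import Iterable, List, Tuple

# The 10-second restart limit described in this post, in milliseconds.
RESTART_THRESHOLD_MS = 10_000

# Hypothetical early-warning fraction: warn once lag reaches half the limit.
WARN_RATIO = 0.5


def classify_lag(
    lag_ms: float,
    limit_ms: float = RESTART_THRESHOLD_MS,
    warn_ratio: float = WARN_RATIO,
) -> str:
    """Return 'ok', 'warning', or 'critical' for one lag sample."""
    if lag_ms > limit_ms:
        # More than the limit behind: the standby is a restart candidate.
        return "critical"
    if lag_ms >= limit_ms * warn_ratio:
        return "warning"
    return "ok"


def summarize(samples: Iterable[float]) -> List[Tuple[float, str]]:
    """Pair every lag sample with its classification, preserving order."""
    return [(lag, classify_lag(lag)) for lag in samples]


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    for lag, status in summarize([35, 4200, 6100, 12500]):
        print(f"{lag:>8.0f} ms  {status}")
```

Warning at half the limit is arbitrary; the point is to get notified about a heavy write burst while there is still time to throttle the maintenance job that is causing it.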

And so I was told by the Aurora gods that I might have to upgrade the instance on my primary and standbys. I countered that we only very infrequently get these bursts of writes, either from the client or from scheduled maintenance scripts that might do some vacuum freezes, pg_repack actions, etc. So why would I want to upgrade my instances just to account for some infrequent, heavy write loads? No answer from the Aurora gods, just silence.

Now combine this with what happens when a replica gets rebooted (it loses its monitoring stats; see “What happened to the stats?”), and now I’ve got a DBA script that I have to run to do a vacuum analyze on every table in every schema in every database in my cluster whenever an important standby used for application queries goes down.
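The heart of such a script is just generating one statement per table. Here is a rough sketch of that step, not my actual script: `quote_ident`, `build_vacuum_statements`, and `LIST_TABLES_SQL` are illustrative names, the schema filter is an assumption, and actually running the statements (per database, with whatever driver you prefer) is left out.

```python
from typing import Iterable, List, Tuple

# Catalog query to run once per database to list user tables.
# The excluded schemas are an assumption; adjust to taste.
LIST_TABLES_SQL = """
SELECT schemaname, tablename
FROM pg_catalog.pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schemaname, tablename;
"""


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_vacuum_statements(tables: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one VACUUM (ANALYZE) statement per distinct (schema, table)."""
    return [
        f"VACUUM (ANALYZE) {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in sorted(set(tables))
    ]


if __name__ == "__main__":
    # Hypothetical rows, shaped like the output of LIST_TABLES_SQL.
    rows = [("public", "orders"), ("sales", "Line Items"), ("public", "orders")]
    for statement in build_vacuum_statements(rows):
        print(statement)
```

Keep in mind that VACUUM cannot run inside a transaction block, so whatever connection executes these statements needs autocommit turned on.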

But will I forsake Aurora or give up on the Aurora gods? No way! They just got me excited again with in-place major upgrades introduced in March/April 2020!

Michael Vitale, Team Elephas
