- Denish Patel
- 6th July 2018
- auditing, Database, performance, postgresql
Recently, someone reached out to me asking “what’s the best way to achieve database DML auditing in Postgres?”
Here are a couple of options for achieving DML auditing in Postgres:
- Audit using Postgres logs. Postgres allows you to keep track of DML statements at the database level by enabling auditing directly in the Postgres logs. You can use the https://github.com/pgaudit/pgaudit extension to make the audit logging consistent and readable.
- Audit tables using a trigger-based approach: create an audit schema in the live database and keep the audit tables updated through triggers.
Both of these approaches have pros and cons.
If you are looking for detailed auditing (including SELECT) at the database level, you can use the pgaudit extension. However, you will have to work out how to make the audit data queryable for end users. Additionally, you have to enable it at the database level rather than for specific tables.
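For reference, turning pgaudit on looks roughly like this (a minimal sketch; the database name app is just an example, and the log classes you choose depend on how much you want to capture):

-- postgresql.conf: pgaudit must be preloaded, which requires a restart
-- shared_preload_libraries = 'pgaudit'

-- install the extension and log DML and DDL statements for this database
-- (add 'read' to the list if you also want SELECT statements audited)
CREATE EXTENSION pgaudit;
ALTER DATABASE app SET pgaudit.log = 'write, ddl';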
On the other hand, if you are only concerned about auditing DML, and only for specific tables (or even specific columns), the trigger-based approach is your answer. You do have to deal with audit schema growth, but the audit tables are readily available to query without any further processing or tools. If the audit tables mirror the live schema exactly, the trigger-based approach requires you to change the audit tables whenever you change the live schema, so there is additional overhead in managing the audit schema. However, if you use a Postgres JSONB column to keep track of auditing, you can avoid keeping the audit tables' schema in sync with the live tables.
In this post, I explore a way to use the JSONB data type to store auditing details using the trigger-based approach.
Let's create a sample USERS table:
create schema if not exists public;

create table public.users(
    userid serial primary key,
    username text not null,
    first_name text not null,
    last_name text not null,
    city varchar(30) not null,
    state varchar(30) not null,
    email text not null,
    phone varchar(30),
    is_like_sport boolean default false
);
Create an audit schema and a USERS_AUDIT table to keep track of changes. As you can see below, the before and after columns are JSONB.
create schema if not exists audit;

create table audit.users_audit(
    audit_ts timestamptz not null default now(),
    operation varchar(10) not null,
    username text not null default "current_user"(),
    before jsonb,
    after jsonb
);
Create a trigger function and a trigger on the USERS table to keep track of audit changes:
CREATE OR REPLACE FUNCTION public.users_audit_trig()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
begin
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit.users_audit (operation, after)
        VALUES (TG_OP, to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        IF NEW != OLD THEN
            INSERT INTO audit.users_audit (operation, before, after)
            VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        END IF;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit.users_audit (operation, before)
        VALUES (TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
end;
$function$;

CREATE TRIGGER users_audit_trig
BEFORE INSERT OR UPDATE OR DELETE ON public.users
FOR EACH ROW EXECUTE PROCEDURE public.users_audit_trig();
I have used https://mockaroo.com/ to generate mock data for the users table.
Copy the mock data into the users table:
app=# copy users from '/usr/local/var/postgres/mockdata/MOCK_DATA.csv' with header CSV;
COPY 1000
To see how Mockaroo generates mock data, let's query userid=101:
app=# select * from users where userid=101;
-[ RECORD 1 ]-+-------------------------
userid        | 101
username      | resslement2s
first_name    | Rudiger
last_name     | Esslement
city          | El Paso
state         | Texas
email         | resslement2s@auda.org.au
phone         | 915-528-7033
is_like_sport | t
The audit table row for userid=101 looks like this:
app=# select * from audit.users_audit where after->>'userid'='101';
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts  | 2018-07-05 14:39:06.960812-04
operation | INSERT
username  | denishpatel
before    |
after     | {"city": "El Paso", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Texas", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}
An INSERT statement audit is straightforward to query.
Let's update the city and state columns where userid=101:
app=# update users set city='Baltimore',state='Maryland' where userid=101;
UPDATE 1
You usually want to see only the changed columns from the before and after values. Let's create a utility function, jsonb_diff, to show the difference between two JSONB values:
CREATE OR REPLACE FUNCTION audit.jsonb_diff(l JSONB, r JSONB)
RETURNS JSONB AS
$json_diff$
    SELECT jsonb_object_agg(a.key, a.value)
    FROM (
        SELECT key, value FROM jsonb_each(l)
    ) a
    LEFT OUTER JOIN (
        SELECT key, value FROM jsonb_each(r)
    ) b ON a.key = b.key
    WHERE a.value != b.value OR b.key IS NULL;
$json_diff$
LANGUAGE sql;
Now, query the audit table to see the UPDATE changes:
app=# select after->>'userid' as userid,
             audit.jsonb_diff(before, after) as before_change,
             audit.jsonb_diff(after, before) as after_change
      from audit.users_audit
      where operation='UPDATE';
-[ RECORD 1 ]-+--------------------------------------------
userid        | 101
before_change | {"city": "El Paso", "state": "Texas"}
after_change  | {"city": "Baltimore", "state": "Maryland"}
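DELETEs are captured by the same trigger; here is a quick sketch of checking them (the deleted row ends up in the before column, and the output is omitted here):

-- delete a row; the trigger stores the old row in "before"
delete from users where userid=101;

-- find deleted rows in the audit trail
select audit_ts, before->>'userid' as userid, before
from audit.users_audit
where operation='DELETE';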
Postgres allows you to create indexes on an entire JSONB column as well as on a specific key within the JSONB.
Let's create some useful indices:
app=# create index idx_users_audit_audit_ts_operation on audit.users_audit (audit_ts desc, operation);
CREATE INDEX

app=# create index idx_gin_users_audit_before on audit.users_audit using GIN(before);
CREATE INDEX

-- create index on entire after JSONB object
app=# create index idx_gin_users_audit_after on audit.users_audit using GIN(after);
CREATE INDEX

-- create index on userid key from after object
app=# create index idx_gin_users_audit_after_userid on audit.users_audit using GIN ((after->'userid'));
CREATE INDEX
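Note that GIN indexes on JSONB support the containment and key-existence operators rather than the ->> text comparison used earlier, so queries written with @> can take advantage of them (assuming the planner decides the index is worthwhile for your data volume):

-- containment query that can use the GIN index on the "after" column
select audit_ts, operation
from audit.users_audit
where after @> '{"userid": 101}';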
Audit tables grow really fast, so you will want to partition them. Postgres 10 makes table partitioning much easier to manage, so you can convert the audit tables into partitioned tables, keep only X months of data in the production database, and archive the rest of the partitions off to cheaper storage, e.g. Amazon S3 or Glacier. The upcoming Postgres 11 release includes even more partitioning performance features, which will help keep database performance up as users and data grow.
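A minimal sketch of what a range-partitioned audit table could look like on Postgres 10 (the table and partition names are illustrative; in practice you would create one partition per month, recreate the indexes, and migrate the existing data):

create table audit.users_audit_part(
    audit_ts timestamptz not null default now(),
    operation varchar(10) not null,
    username text not null default "current_user"(),
    before jsonb,
    after jsonb
) partition by range (audit_ts);

-- one partition per month
create table audit.users_audit_2018_07
    partition of audit.users_audit_part
    for values from ('2018-07-01') to ('2018-08-01');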
Hope this is helpful to someone architecting an audit schema in Postgres.
Any suggestions/comments are welcome.
I think another option since Postgres 10 is to use an output plugin for logical decoding. That should not have any performance overhead.
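For anyone curious, a rough sketch of what that looks like with the built-in test_decoding plugin (wal_level must be set to logical; a real audit pipeline would use a purpose-built output plugin or consumer):

-- requires wal_level = 'logical' in postgresql.conf
select * from pg_create_logical_replication_slot('audit_slot', 'test_decoding');

-- peek at the decoded change stream without consuming it
select * from pg_logical_slot_peek_changes('audit_slot', null, null);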
It is convenient to use the enum type for the “operation” field of the above audit table; doing so saves space. It should work faster for searching, too.
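A sketch of that change (note that TG_OP is text, so the inserts in the trigger function would then need an explicit cast such as TG_OP::audit.dml_operation):

-- enum of the three DML operations instead of varchar(10)
create type audit.dml_operation as enum ('INSERT', 'UPDATE', 'DELETE');

alter table audit.users_audit
    alter column operation type audit.dml_operation
    using operation::text::audit.dml_operation;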