- Denish Patel
- 20th August 2009
- postgresql
One of the key features any enterprise considers when choosing a database technology for their architecture solution stack is that of replication. Oracle and MySQL both have built in replication solutions, but as of yet PostgreSQL doesn’t support a built in replication solution. There are many replication solutions available however, and different companies are using different solutions customized for their needs.
Among all of the solutions, Slony is probably the most widely tested and deployed within organizations, although it does have the following limitations:
- Replicated tables must have a unique or primary key
- It does not support replication of large objects
- Schema changes are not propagated (though they can be coordinated)
- It does not support synchronizing databases outside of replication
- There are limitations on version compatability; you can not replicate from PostgreSQL 8.2 to PostgreSQL 8.4 for example
- It is more difficult to set up than many other replication solutions
One new alternative to Slony is a project known as RubyRep, which is designed to avoid some of the limitations of Slony. RubyRep provides both master-slave and master-master replication, and it works for PostgreSQL as well as MySQL. It is currently developed by Arndt Lehmann, a German who has been living since 2001 in Tokyo, Japan. He also provides great support to the RubyRep mailing list, especially for adding new features or fixing bugs.
RubyRep always operates on two databases. To make it simple to understand, the databases are referred to as “left” and “right” database respectively.
RubyRep’s key features includes:
- Simple configuration, complete setup can be done via single configuration file.
- Simple Installation, if you have a JVM installed, then you just have to download and extract the files.
- Platform Independent, it runs on Unix and Windows platform.
- Table Design Independent, meaning that all commands work on tables no matter if they have a simple primary key (all data types acceptable), a combined primary key, or no primary key at all. It successfully processes multi-byte texts and “big” data types
- It replicates tsvector datatype
In addition to the above, RubyRep actually provides three tools in one; a Compare, Sync, and Replication tools.
Compare
This tool scans corresponding tables of left and right database, looking for diverging data. Key features of the comparison tool are:
- Different output modes, from a count of differences to full row dumps.
- Low bandwidth mode available, reducing the number of round-trips so only actual differences go through the network.
- A progress bar with estimated remaining amount of work.
- Server load is targeted toward only the “right” database server.
In one test we ran, we compared two 50 million row tables in around 3 hours, without affecting production server load. This is accomplished by comparing rows in batches, and you can adjust the batch size in the configuration file.
Sync
The sync tool is used to synchronize data in corresponding tables of a left and right pair of databases. Key features of the sync tool are:
- All features of the Compare tool also apply to syncs
- Automatically orders table syncs to avoid foreign key conflicts.
- You can configure the Sync policy to ignore deletes in left database, or to ignore creating records in right database, and other such combinations
- Provides two prebuilt conflict resolution methods, either left db wins or right db wins
- Custom conflict resolution methods specifiable via ruby code snippets
- Merge decisions can optionally be logged in the rubyrep event log table.
Replicate
Of course RubyRep also provides a replication tool. Some of the key features of the replication tool include:
- Automatically sets up all necessary triggers, log tables, etc.
- Automatically discovers newly added tables and synchronizes the table content
- Automatically reconfigures sequences to avoid duplicate key conflicts
- Tracks changes to primary key columns
- Can implement either master-slave or master-master replication
- Prebuilt conflict resolution methods available include left or right wins, or earlier, later change wins
- Custom conflict resolution specifiable via ruby code snippets
- Replication decisions can optionally be logged in the rubyrep event log table
One of the problems common to replication solutions is that of setting up new nodes. With Slony, there are always some headaches caused by high load on master database server, as a result of the TRUNCATE/COPY cycle Slony goes through. In the case of RubyRep, most of the CPU load is on the slave server, and you can use the Sync command in advance before you start replicating database. RubyRep also provides some flexibility to ignore the Sync commands if you don’t want to sync the database again.
RubyRep in action…
- Help
; ./rubyrep --help
Usage: ./bin/rubyrep [general options] command [parameters, ...]
Asynchronous master-master replication of relational databases.
Available options:
--verbose Show errors with full stack trace
-v, --version Show version information.
--help Show this message
Available commands:
generate Generates a configuration file template
help Shows detailed help for the specified command
proxy Proxies connections from rubyrep commands to the database
replicate Starts a replication process
scan Scans for differing records between databases
sync Syncs records between databases
uninstall Removes all rubyrep tables, triggers, etc. from "left" and "right" database
; ./rubyrep generate pagila.conf
; cat pagila.conf
RR::Initializer::run do |config|
config.left = {
:adapter => 'postgresql', # or 'mysql'
:database => 'pagila,
:username => 'rubyrep',
:password => 'rubyrep',
:host => '192.168.0.1',
:port =>'5432'
}
config.right = {
:adapter => 'postgresql',
:database => 'pagila',
:username => 'rubyrep',
:password => 'rubyrep',
:host => '127,0.0.1',
:port => '5483'
}
config.include_tables 'users'
# config.include_tables /^e/ # regexp matching all tables starting with e
# config.include_tables /./ # regexp matching all tables in the database
end
; ./rubyrep scan -d=keys -b -c pagila.conf > users_diff.log
; cat users_diff.log
users users ......................... 1
---
:conflict:
- lastname: patel
zipcode: "2096"
ipaddress: 192.168.0.126
userid: 48212620
address: columbia
; ./rubyrep sync -c pagila.conf
By default, RubyRep runs in master-master replication mode, but you can adjust the following configuration setting to make it master-slave replication:
; cat pagila_replicate.conf
RR::Initializer::run do |config|
config.left = {
:adapter => 'postgresql',
:database => 'pagila,
:username => 'rubyrep',
:password => 'rubyrep',
:host => '192.168.0.1',
:port =>'5432',
:schema_search_path => 'public,pagila'
}
config.right = {
:adapter => 'postgresql',
:database => 'pagila',
:username => 'rubyrep',
:password => 'rubyrep',
:host => '127,0.0.1',
:port => '5483',
:schema_search_path => 'public,pagila'
}
config.include_tables /./ # regexp matching all tables in the database
config.options[:auto_key_limit] = 60
config.options[:adjust_sequences] = false
config.options[:sequence_increment] = 1
#Sync Policy: Changes in the right database will not be applied to the left database.
config.options[:right_record_handling] = :ignore
config.options[:sync_conflict_handling] = :left_wins
# Additional logging
config.options[:logged_replication_events] = [
:ignored_changes,
:ignored_conflicts
]
#ignore history tables
config.exclude_tables /_history/
config.exclude_tables 'test1'
config.exclude_tables 'pagila'
end
Detailed information for the each configuration setting can be found in the RubyRep documentation. There are also sample configuration files and a tutorial available too for getting familiar with each of the features RubyRep offers. Based on our initial testing, it should also be possible to upgrade some older PostgreSQL databases from 8.2 to 8.4.
Leave a comment
Categories
Recent Blog
- Do you really need a DBA?
- Running VACUUM FULL in Non-blocking Mode
- Connection Scaling
- PSQL Helper: Managing Connections and Simplifying Queries
- Vacuum Those MVs!
- Working With Repmgr: Using Other 3rd Party Tools for Setting up a Standby
- Working with Amazon Aurora PostgreSQL: dag, standby rebooted again!
- Working with Amazon Aurora PostgreSQL: what happened to the stats?
- How to set application_name for psql command line utility?
- Is there a limit on number of partitions handled by Postgres?
I hope it works. I need the same solution.
Hi Denish! The RubyRep allows DDL propagation? If i insert a new column in one table it replicates to another database with success? Thanks very much!
@abarrao No, it does not propagate DDL. you need to do it manually.
Denish, We have tried the search_path configoption but itkeeps returninga specific table not found even though we can see it exists in both of the databases