Feb 4, 2010

2 notes

Converting a Rails Database From MySQL to PostgreSQL

For a number of reasons, we want to convert one of ours Rails applications so that it is backed by PostgreSQL rather than MySQL. Although Rails and ActiveRecord abstract many of the differences between the two database engines, there are still some issues to be aware of and no one-size-fits-all solution to the problem. Here are some of the challenges we encountered in making the switch.

Converting the schema

The first thing in the conversion process is converting the schema. Fortunately, if you have a fairly simple schema, then Rails makes this relatively easy.

Look in config/environment.rb and make sure that config.active_record.schema_format is unset or set to :ruby. This is the default, but some projects change it to :sql for various reasons.

Then do rake db:schema:dump to generate db/schema.rb with your current database schema. Edit config/database.yml to use a PostgreSQL database instead of a MySQL one and do rake db:schema:load to load the schema in.

If your database migrations are simple, you might instead be able to do rake db:migrate to construct the Postgres database from scratch, but in my experience most Rails projects eventually end up with some kind of database-specific SQL in at least one migration.

If, on the other hand, your database schema is so complex that ActiveRecord’s schema dumps can’t handle it, you might consider looking at SQLFairy to help you manage the transition.

Converting the code

This is likely to be the trickiest part of the conversion process. You must go through your code and replace anything MySQL-specific with something PostgreSQL-specific or (better) database-agnostic.

Good test coverage of your application is definitely a huge help here: if you can run tests and fix bugs as they appear, you dramatically increase the likelihood of catching everything.

Here are some of the issues that we encountered in our codebase; of course, different applications and coding styles may reveal other issues.

Differences in quoting

MySQL allows you to quote table and column names with backquotes; PostgreSQL requires double-quotes. One of our tables has a column in it called when, which must be quoted whenever we use it.

Rails will of course handle the quoting for you if you do something like OurModel.find_by_when(Time.now). But if you are constructing your own SQL or conditions, you need to handle the quoting yourself. Given MySQL-compatible code like this:

OurModel.find(:conditions => ["`when` < ?", @yesterday])

You need to either use Postgres-specific code like this:

OurModel.find(:conditions => ["\"when\" < ?", @yesterday])

Or, better, convert it to something database-agnostic:

OurModel.find(:conditions => ["#{OurModel.connection.quote_column_name(when)} < ?", @yesterday])

In addition to quote_column_name, ActiveRecord also provides a quote_table_name function for quoting table names and, crucially, a quote function for quoting the data you actually store in your database. These functions are your friends in database-agnostic programming—be sure to use them!

Booleans

MySQL lacks a native BOOLEAN type, so if you create a boolean column in Rails, you will end up with a TINYINT(1) column which has values of 0 and 1 for false and true respectively. There’s no particular need to be aware of this fact—except that most Rails developers are aware of this fact and so often use 0 and false somewhat interchangeably. PostgreSQL has a native BOOLEAN type, so these shenanigans will no longer fly. If important is a boolean column, then code like this will work in MySQL-backed rails but not under PostgreSQL:

m = OurModel.new(:all, :important => 1)
m.save

Instead, be sure to use true and false where you mean to be using booleans:

m = OurModel.new(:all, :important => true)
m.save

Similarly, if you are using booleans in your conditions, this will work in MySQL:

OurModel.find(:all, :conditions => "important = 1")

Instead, take advantage of ActiveRecord’s helpful quoting capabilities and replace your code with the database-agnostic equivalent:

OurModel.find(:all, :conditions => ["important = ?", true])

You may also be tripped up by this problem in your fixtures; replace 0 and 1 with false and true in your YAML files: they’ll then work in both MySQL and PostgreSQL.

Other differences

There are other differences which we encountered. In one part of our code, we need to select rows from the database at random. In MySQL, this is expressed as ORDER BY RAND(), but in PostgreSQL, the function is called RANDOM(). Since there is only one place in the code where this occurs, we did not bother to implement any kind of sophisticated database-agnostic function. Instead, we took advantage of the connection.adapter_name function to be able to write this:

OurModel.find(:all, :conditions => ["value < ?", @max_wanted_value],
              :order => OurModel.connection.adapter_name == 'PostgreSQL' ? 'RANDOM()' : 'RAND()',
	      :limit => 50)

Additionally, we had one unexpected issue: MySQL DATETIME columns store their data with integer second precision, but PostgreSQL timestamp without time zone columns are based on floating point numbers. We had to modify our code in a couple of places to round times to the nearest second before storing them in the database in order to get certain equality comparisons to work properly.

Copying the data

Once we had the application more-or-less working as we wanted, it was time to copy the data from the MySQL database into Postgres. Naïve approaches actually do not work well here; due to the difference in quoting rules, the differing storage for booleans mentioned above, and a couple of issues, calling mysqldump and then attempting to read the output into Postgres fails.

Instead, we did the data transfer taking advantage of ActiveRecord’s knowledge of databases. We based the method on this one provided by Rama McIntosh but we had to make a couple of changes. We modified the code to manually generate INSERT statements rather than constructing ActiveRecord objects. This has the chief advantage of working with has_and_belongs_to_many-style join tables without id columns and the secondary advantage of being somewhat faster. We also modified the code to automatically update the PostgreSQL sequence objects so as not to break future inserts into the table.

The resultant code has been checked into github. Feel free to use it and modify it further if you like.

If you are working on a similar conversion project, good luck!

Blog comments powered by Disqus

  1. mlcastle posted this
About
This is a tumblr. I don't know what it is for. Subscribe via RSS.