Alfresco is a full featured opensource ECM. It uses opensource software extensively , as a webapp server (tomcat by default), an office suite (openoffice or libreoffice), image manipulation tools (imagemagick), and of course an SGBD! In this article, the former component is the one we will focus on. After having shipped mysql by default, the Alfresco installer now comes woth postgreSQL as a default database server. We could argue about pros and cons of each other, and that would probably not lead to a mutual agreement…
Nevertheless, I had cases where DBAs get rid of their MySQL DB and switch to Postgres.
This article explains how to do this migration for an Alfresco DB. The following has been tested with Alfresco 3.4.x et 4.x. It is also known not to work as is, with version older than 3.4!
The idea here is fairly simple:
The database schema being relatively complex, and mainly managed by hibernate/ibatis… it’s easy to get lost in those muddy waters! So in order to minimimse tthe sources of errors, we’ll let Alfresco deal with the creation of a new schema, and then insert our (almost) raw data…
- Prerequisites:
- We will assume destination and source instances of Alfresco are the same version. iF further upgrade is planned t can be done once database migration has been completed successfully.
- We will assume that collations and charsetsare similar. By default Postgres and MySQL should use respectively UTF-8 and utf8_bin.
- downtime has to be scheduled during this migration in order to ensure data consistency across the instances.
- A brand new DB:
To start with, we’ll deploy a fresh Alfresco instance, so called out of the box instance, using PostgreSQL as an SGDB. I won’t dive into much details here as I assume you’re already used to lfresco installation if yo plan to migrate from one SL server to another! To quickly summarise, install tomcat, create a postgres database and its associated user, put the alfresco wars in the webapps folder, and configure your alfresco-global.properties fileto point to the new empty DB.
When tomcat deploys Alfresco, and Alfresco finds an empty database, he will create a schema for it, just has onewould expect from a clean install (and also populate it with some data).
Once bootstrap completed, Alfresco has done its part, we can turn it of and start our job! We will use this database as a template to ensure the structure is as Alfresco expect it to be. Let’s dump this schema (and forget about those ootb data we don’t really care about).
~$ pg_dump -h gimly -U alfresco -W -s alfresco > /tmp/pg_schema.sql
In order to avoid problems at restore time, we are going to split this file in two. One file will contain table definitions (/tmp/pg_schema_table.sql), and the second one will contains constraints and indexes (/tmp/pg_schema_fk-idx.sql). Now let’s store those files in a safe place for later use. The rest of the new ootb instance can be deleted.
- Production data backup:
We now have to turn off Alfresco (the one running MySQL), in order to do the database backup. mysqldump is a perfect tools for our use case.
~$ mysqldump -u root -p --hex-blob -c --default-character-set=utf8 --compact -n --compatible=postgresql -t alfresco | \
sed -r -e 's/0x00([),])/false\1/g' \
-e 's/0x01([),])/true\1/g' \
-e "s/([(,])0x([0-9a-fA-F]*)([),])/\1decode('\2','hex')\3/g" \
-e 's/INSERT INTO "([a-zA-Z_-]+)" \((.*)\) VALUES \(/INSERT INTO "\1" (\L\2) VALUES (/g' \
> /tmp/data.sql
Here is an explanation of the different options we use:
-n : Tell mysqldump to exclude table creation statements.
-c : Tell mysqldump to incluse table names in theINSERT
statements. This is to avoid issues related to column ordering within a table.
–compatible=postgresql : Tell mysqldump produce a PostgreSQL friendly output (This is unfortunately not enough for Postgres to read this file).
–hex-blob : Tell mysqldump to treat binary fields (blolb or boolean) as hexadecimal fields (offer an easier parser for tools like sed.
–default-character-set=utf8 : Set the default charset to UTF-8.
–compact: Tell mysqldump ignore some useless statements.
s/0x00([),])/false\1/g : Map hexadecimal value 0x00, which is not know to Postgres, to false (typical boolean field).
s/0x01([),])/true\1/g : Map hexadecimal value 0x01, which is not know to Postgres, to true (typical boolean field).
s/([(,])0x([0-9a-fA-F]*)([),])/\1decode(‘\2′,’hex’)\3/g : Insert the decode() function for Postgres to flawlessly reècreate binary fields.
s/INSERT INTO “([a-zA-Z_-]+)” \((.*)\) VALUES \( : lower table and columns name case.
The resulting file is almost ready, we will just add the line bellow to the top of the file:
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
- Restore data to PostgreSQL:
In order to restore we will first create a new database (ensure to delete the one create on postgres at first step as its useless and could be missleading), here is how to delete the previous db from postgres and re-create it empty:
~$ dropdb alfresco
~$ createdb -Oalfresco alfresco
Now let’s restore the bare structure of the db (no index, no constraints):
~$ psql -h gimly -U alfresco -W alfresco < /tmp/pg_schema_table.sql
and then insert data:
~$ psql -h gimly -U alfresco -W alfresco < /tmp/data.sql
Note that at this step you may find some data that doesn’t fit in the new db, this is usually because of the length of thefields and is rather easy to fix by updating the table definition in the file /tmp/pg_schema_table.sql Once data have been inserted, just recreate indexes and foreign contraints:
~$ psql -h gimly -U alfresco -W alfresco < /tmp/pg_schema3411_fkIdx.sql
At this step the database is almost ready to startup Alfresco. The last issue we have to deal with are sequences. Postgres, as most RDBMS do, uses séquences to automatically increment integer values. MySQL, doesn’t provide sequences , and instead uses an integer field option:AUTO_INCREMENT
. Once again we could discuss the best approach or the reason of this choice.
But the important bit here is postgreSQL sequences have to be updated in order to have Alfresco working properly.
So we will extract from MySQL the values of the involved fields and put them in a file:
mysql> use information_schema
mysql> SELECT TABLE_NAME,AUTO_INCREMENT
FROM TABLES
WHERE TABLE_SCHEMA='alfresco'
AND TABLE_NAME NOT LIKE 'jbpm_%'
INTO OUTFILE '/tmp/AI.sql';
And then update the sequences accordingly in PostgreSQL:
grep -v '\N$' /tmp/AI.sql | \
awk '{print "ALTER SEQUENCE " $1 "_seq RESTART WITH " $2 ";"}' | \
psql -h gimly -U alfresco -W alfresco
One last sequence have to be updated. This one is slightly different from the others as it is shared by several tables (see Alfresco JIRA MNT-7445). To find out the value the sequence has to be updated to, we use the SQL statement bellow:
mysql> SELECT max(AUTO_INCREMENT)
FROM TABLES
WHERE TABLE_SCHEMA='alfresco'
AND TABLE_NAME LIKE 'JBPM%';
We pick the higher value and use it inthe statement bellow:
ALTER SEQUENCE hibernate_sequence RESTART WITH <VALUE> ;
- Alfresco Configuration:
Last step before starting Alfresco, update Alfresco configuration to use Postgres instead of MySQL. Ths is done by editing the alfresco-global.properties file (please read official documentation):
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost:5432/${db.name}
db.name=alfresco
db.username=alfresco
db.password=donttellanybody
Make sure your tomcat installation offers the necessary jar files to allow Alfresco to use the postgres jdbc driver, the restart Alfresco.
Please note that this procedure (tested against Alfresco entreprise 3.4.11 followed by an upgrade to 4.2) is not officially supported. So if you subscribed to Alfresco support, contacting them prior to try anything is always a good idea. If you want to use this on a production system, make sure you have all the necessary consistent backup, and test evert aspect of Alfresco thoroughly to make sure everything works as expected.
Leave a Reply to guillaume vola Cancel reply