#!/usr/bin/perl
#
# My2Pg: MySQL to PostgreSQL dump conversion utility
#
# (c) 2000,2001 Maxim Rudensky <fonin@ziet.zhitomir.ua>
# (c) 2000 Valentine Danilchuk <valdan@ziet.zhitomir.ua>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. All advertising materials mentioning features or use of this software
# must display the following acknowledgement:
# This product includes software developed by the Max Rudensky
# and its contributors.
# 4. Neither the name of the author nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
# $My2pg: my2pg.pl,v 1.28 2001/12/06 19:32:20 fonin Exp $
# $Id: my2pg.pl,v 1.13 2004/04/19 23:18:12 momjian Exp $
# Custom patch
# Revision 1.9 2002/08/22 00:01:39 tgl
# Add a bunch of pseudo-types to replace the behavior formerly associated
# with OPAQUE, as per recent pghackers discussion. I still want to do some
# more work on the 'cstring' pseudo-type, but I'm going to commit the bulk
# of the changes now before the tree starts shifting under me ...
#
# $Log: my2pg.pl,v $
# Revision 1.13 2004/04/19 23:18:12 momjian
# Update to my2pg version 1.28, add docs, update URL for newest version.
#
# Create diff of custom changes Tom made to the utility for CREATE
# FUNCTION.
#
# This will make moving this utility out of CVS easier.
#
# Revision 1.12 2004/04/19 23:11:49 momjian
# Update to my2pg 1.28, from:
#
# http://www.omnistarinc.com/~fonin/downloads.php#my2pg
#
# Revision 1.28 2002/11/30 12:03:48 fonin
# PostgreSQL does not support indexes on the partial length of column,
# e.g.
# CREATE INDEX i_index ON table (column(16));
# will not work. Fixed.
#
# Added command-line option -s that prevents my2pg from attempting convert
# the data (currently only timestamps).
#
# Better timestamps conversion.
#
# Revision 1.27 2002/07/16 14:54:07 fonin
# Bugfix - didn't quote the fields inside PRIMARY KEY with -d option.
# Fix by Milan P. Stanic <mps@rns-nis.co.yu>.
#
# Revision 1.26 2002/07/14 10:30:27 fonin
# Bugfix - MySQL keywords inside data (INSERT INTO sentence) were replaced
# with Postgres keywords and therefore messed up the data.
#
# Revision 1.25 2002/07/05 09:20:25 fonin
# - fixed data that contains two consecutive timestamps - thanks to
# Ben Darnell <bdarnell@google.com>
# - word 'default' was converted to upper case inside the data - fixed.
# Thanks to Madsen Wikholm <madsen@iki.fi>
#
# Revision 1.24 2002/04/20 14:15:43 fonin
# Patch by Felipe Nievinski <fnievinski@terra.com.br>.
# A table I was re-creating had a composite primary key, and I was using
# the -d switch to maintain the table and column names
# adding double quotes around them.
#
# The SQL code generated was something like this:
#
# CREATE TABLE "rinav" (
# "UnidadeAtendimento" INT8 DEFAULT '0' NOT NULL,
# "NumeroRinav" INT8 DEFAULT '0' NOT NULL,
# -- ...
# PRIMARY KEY ("UnidadeAtendimento"," NumeroRinav")
# );
#
# Please note the space inside the second column name string in the PK
# definition. Because of this PostgreSQL was not able to create the table.
#
# FIXED.
#
# Revision 1.23 2002/02/07 22:13:52 fonin
# Bugfix by Hans-Juergen Schoenig <hs@cybertec.at>: additional space after
# FLOAT8 is required.
#
# Revision 1.22 2001/12/06 19:32:20 fonin
# Patch: On line 594 where you check for UNIQUE, I believe the regex should try
# and match 'UNIQUE KEY'. Otherwise it outputs no unique indexes for the
# postgres dump.
# Thanks to Brad Hilton <bhilton@vpop.net>
#
# Revision 1.21 2001/08/25 18:55:28 fonin
# Incorporated changes from Yunliang Yu <yu@math.duke.edu>:
# - By default table & column names are not quoted; use the new
# "-d" option if you want to,
# - Use conditional substitutions to speed up and preserve
# the data integrity.
# Fixes by Max:
# - timestamps conversion fix. Shouldn't break now matching binary data and
# strings.
#
# Revision 1.21 2001/07/23 03:04:39 yu
# Updates & fixes by Yunliang Yu <yu@math.duke.edu>
# . By default table & column names are not quoted; use the new
# "-d" option if you want to,
# . Use conditional substitutions to speed up and preserve
# the data integrity.
#
# Revision 1.20 2001/07/05 12:45:05 fonin
# Timestamp conversion enhancement from Joakim Lemström <jocke@bytewize.com>
#
# Revision 1.19 2001/05/07 19:36:38 fonin
# Fixed a bug in quoting PRIMARY KEYs, KEYs and UNIQUE indexes with more than 2 columns. Thanks to Jeff Waugh <jaw@ic.net>.
#
# Revision 1.18 2001/03/06 22:25:40 fonin
# Documentation up2dating.
#
# Revision 1.17 2001/03/04 13:01:50 fonin
# Fixes to make work it right with MySQL 3.23 dumps. Tested on mysqldump 8.11.
# Also, AUTO_INCREMENT->SERIAL fields no more have DEFAULT and NOT NULL
# definitions.
#
# Revision 1.16 2001/02/02 08:15:34 fonin
# Sequences should be created BEFORE creating any objects that depend on them.
#
# Revision 1.15 2001/01/30 10:13:36 fonin
# Re-released under BSD-like license.
#
# Revision 1.14 2000/12/18 20:55:13 fonin
# Better -n implementation.
#
# Revision 1.13 2000/12/18 15:26:33 fonin
# Added command-line options. -n forces *CHAR DEFAULT '' NOT NULL to be
# converted to *CHAR NULL.
# AUTO_INCREMENT fields converted not in SERIAL but in
# INT* NOT NULL DEFAULT nextval('seqname').
# Documentation refreshed.
# Dump enclosed in single transaction from now.
#
# Revision 1.12 2000/12/14 20:57:15 fonin
# Doublequotation bug fixed (in CREATE INDEX ON TABLE (field1,field2))
#
# Revision 1.10 2000/11/27 14:18:22 fonin
# Fixed bug - occasionaly was broken CREATE SEQUENCE generation
#
# Revision 1.8 2000/11/24 15:24:16 fonin
# TIMESTAMP fix: MySQL output YYYYMMDDmmhhss to YYYYMMDD mmhhss
#
# Revision 1.7 2000/11/22 23:04:41 fonin
# TIMESTAMP field fix. Better doublequoting. Splitting output dump
# into 2 transactions - create/load/indexing first, sequence setvals then.
# Added POD documentation.
#
#
use Getopt::Std;

my %opts;           # command line options
my $chareg = '';    # CHAR conversion regexps
my $dq = '';        # double quote

# parse command line
getopts('nhds', \%opts);

# output syntax
if ($opts{h} ne '') {
    usage();
    exit;
}

# convert CHAR types from NOT NULL DEFAULT '' to NULL
if ($opts{n} ne '') {
    $chareg = '\s*?(default\s*?\'\')*?\s*?not\s*?null';
}

# want double quotes
if ($opts{d} ne '') {
    $dq = '"';
}

# -s: leave the data untouched (no timestamp conversion)
if ($opts{s} ne '') {
    $safe_data_conv = 1;
}
else {
    $safe_data_conv = 0;
}

$| = 1;

print("------------------------------------------------------------------");
print("\n-- My2Pg 1.28 translated dump");
print("\n--");
print("\n------------------------------------------------------------------");

print("\n\nBEGIN;\n\n\n");

my %index;          # contains array of CREATE INDEX for each table
my %seq;            # contains CREATE SEQUENCE for each table
my %primary;        # contains primary (eg SERIAL) fields for each table
my %identifier;     # contains identifiers generated by this program
my $j = -1;         # current position in $index{table}
my @check;          # CHECK constraint for current table

# generating full path to libtypes.so
my $libtypesource = 'libtypes.c';
my $libtypename = `pwd`;
chomp($libtypename);
$libtypename .= '/libtypes.so';

# push header to libtypes.c
open(LIBTYPES, ">$libtypesource")
    or die "Cannot open $libtypesource for writing: $!\n";
print LIBTYPES "/******************************************************";
print LIBTYPES "\n * My2Pg 1.28 translated dump";
print LIBTYPES "\n * User types definitions";
print LIBTYPES "\n ******************************************************/";
print LIBTYPES "\n\n#include <postgres.h>\n";
print LIBTYPES "\n#define ADD_COMMA if(strcmp(result,\"\")!=0) strcat(result,\",\")\n";

# reading STDIN...
my $tabledef = 0;   # we are outside a table definition
while (<>) {

    if (!$tabledef && /^CREATE TABLE \S+/i) {
        $tabledef = 1;
    } elsif ($tabledef && /^\) type=\w*;/i) {   # /^\w/i
        $tabledef = 0;
    }

    # Comments start with -- in SQL
    if (/^#/) {     # !/insert into.*\(.*#.*\)/i, in mysqldump output
        s/#/--/;
    }

    if ($tabledef) {
        # Convert numeric types
        s/tinyint\(\d+\)/INT2/i;
        s/smallint\(\d+\)/INT2/i;
        s/mediumint\(\d+\)/INT4/i;
        s/bigint\(\d+\)/INT8/i;
        s/int\(\d+\)/INT4/i;
        s/float(\(\d+,\d*\))/DECIMAL$1/i;
        s/double precision/FLOAT8 /i;
        s/([\W])double(\(\d+,\d*\))/$1DECIMAL$2/i;
        s/([\W])double[\W]/$1FLOAT8 /i;
        s/([\W])real[\W]/$1FLOAT8 /i;
        s/([\W])real(\(\d+,\d*\))/$1DECIMAL$2/i;

        # Convert string types
        s/\w*blob$chareg/text/i;
        s/mediumtext$chareg/text/i;
        s/tinytext$chareg/text/i;
        s/\stext\s+not\s+null/ TEXT DEFAULT '' NOT NULL/i;
        s/(.*?char\(.*?\))$chareg/$1/i;

        # Old and New are reserved words in Postgres
        s/^(\s+)Old /${1}MyOld /;
        s/^(\s+)New /${1}MyNew /;

        # Convert DATE types
        s/datetime/TIMESTAMP/;
        s/timestamp\(\d+\)/TIMESTAMP/i;
        s/ date / DATE /i;
        if ((/date/ig || /time/ig) && /[,(]\d{4}(\d{2})(\d{2})[,)]/ &&
            $1 >= 0 && $1 <= 12 && $2 >= 0 && $2 <= 31) {
            s/,(\d{4})(\d{2})(\d{2}),/,'$1-$2-$3 00:00:00',/g;
        }

        # small hack - convert "default" to uppercase, because below we
        # enclose all lowercase words in double quotes
        if (!/^INSERT/) {
            s/default/DEFAULT/;
        }

        # Change all AUTO_INCREMENT fields to SERIAL ones with a pre-defined sequence
        if (/([\w\d]+)\sint.*auto_increment/i) {
            $tmpseq = new_name("$table_name" . "_" . "$+" . "_SEQ", 28);
            $seq{$table_name} = $tmpseq;
            $primary{$table_name} = $+;
            s/(int.*?) .*AUTO_INCREMENT/$1 DEFAULT nextval\('$tmpseq'\)/i;
        }

        # convert UNSIGNED to CHECK constraints
        if (/^\s+?([\w\d_]+).*?unsigned/i) {
            $check .= ",\n  CHECK ($dq$1$dq>=0)";
        }
        s/unsigned//i;

        # Limited ENUM support - little heuristic
        s/enum\('N','Y'\)/BOOL/i;
        s/enum\('Y','N'\)/BOOL/i;

        # ENUM support
        if (/^\s+?([\w\d_]+).*?enum\((.*?)\)/i) {
            my $enumlist = $2;
            my @item;
            $item[0] = '';
            while ($enumlist =~ s/'([\d\w_]+)'//i) {
                $item[++$#item] = $1;
            }

            # forming identifier name
            $typename = new_name('enum_' . $table_name . '_' . $item[1], 28);

            # creating input type function
            my $func_in = "
int2* $typename" . "_in (char *str) {
    int2* result;

    if(str==NULL)
        return NULL;

    result=(int2*)palloc(sizeof(int2));
    *result=-1;";
            for (my $i = 0; $i <= $#item; $i++) {
                $func_in .= "
    if(strcmp(str,\"$item[$i]\")==0) {
        *result=$i;
    }";
            }
            $func_in .= "
    if(*result == -1) {
        elog(ERROR,\"$typename" . "_in: incorrect input value\");
        return NULL;
    }
    return (result);
}\n";
            $types .= "\n---";
            $types .= "\n--- Types for table " . uc($table_name);
            $types .= "\n---\n";
            print LIBTYPES "\n/*";
            print LIBTYPES "\n * Types for table " . uc($table_name);
            print LIBTYPES "\n */\n";

            $types .= "\nCREATE FUNCTION $typename" . "_in (cstring)
    RETURNS $typename
    AS '$libtypename'
    LANGUAGE 'c'
    WITH (ISSTRICT, ISCACHABLE);\n";

            # creating output function
            my $func_out = "
char* $typename" . "_out (int2 *outvalue) {
    char* result;

    if(outvalue==NULL)
        return NULL;

    result=(char*)palloc(10);
    switch (*outvalue) {";
            for (my $i = 0; $i <= $#item; $i++) {
                $func_out .= "
        case $i:
            strcpy(result,\"$item[$i]\");
            break;";
            }
            $func_out .= "
        default:
            elog(ERROR,\"$typename" . "_out: incorrect stored value\");
            return NULL;
            break;
    }
    return result;
}\n";
            $func_out .= "\nbool $typename" . "_eq (int2* a, int2* b) {
    return (*a==*b);
}

bool $typename" . "_ne (int2* a, int2* b) {
    return (*a!=*b);
}

bool $typename" . "_lt (int2* a, int2* b) {
    return (*a<*b);
}

bool $typename" . "_le (int2* a, int2* b) {
    return (*a<=*b);
}

bool $typename" . "_gt (int2* a, int2* b) {
    return (*a>*b);
}

bool $typename" . "_ge (int2* a, int2* b) {
    return (*a>=*b);
}\n";

            $types .= "\nCREATE FUNCTION $typename" . "_out ($typename)
    RETURNS cstring
    AS '$libtypename'
    LANGUAGE 'c'
    WITH (ISSTRICT, ISCACHABLE);\n";

            $types .= "\nCREATE TYPE $typename (
    internallength = 2,
    input = $typename\_in,
    output = $typename\_out
);\n";
            $types .= "\nCREATE FUNCTION $typename" . "_eq ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE FUNCTION $typename" . "_lt ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE FUNCTION $typename" . "_le ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE FUNCTION $typename" . "_gt ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE FUNCTION $typename" . "_ge ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE FUNCTION $typename" . "_ne ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE OPERATOR < (
    leftarg = $typename,
    rightarg = $typename,
--  negator = >=,
    procedure = $typename" . "_lt
);

CREATE OPERATOR <= (
    leftarg = $typename,
    rightarg = $typename,
--  negator = >,
    procedure = $typename" . "_le
);

CREATE OPERATOR = (
    leftarg = $typename,
    rightarg = $typename,
    commutator = =,
--  negator = <>,
    procedure = $typename" . "_eq
);

CREATE OPERATOR >= (
    leftarg = $typename,
    rightarg = $typename,
    negator = <,
    procedure = $typename" . "_ge
);

CREATE OPERATOR > (
    leftarg = $typename,
    rightarg = $typename,
    negator = <=,
    procedure = $typename" . "_gt
);

CREATE OPERATOR <> (
    leftarg = $typename,
    rightarg = $typename,
    negator = =,
    procedure = $typename" . "_ne
);\n";

            print LIBTYPES $func_in;
            print LIBTYPES $func_out;
            s/enum\(.*?\)/$typename/i;
        }

        # SET support
        if (/^\s+?([\w\d_]+).*?set\((.*?)\)/i) {
            my $setlist = $2;
            my @item;
            $item[0] = '';
            my $maxlen = 0;     # maximal string length
            while ($setlist =~ s/'([\d\w_]+)'//i) {
                $item[++$#item] = $1;
                $maxlen += length($item[$#item]) + 1;
            }
            $maxlen += 1;
            my $typesize = int($#item / 8);
            if ($typesize < 2) {
                $typesize = 2;
            }
            $internalsize = $typesize;
            $typesize = 'int' . $typesize;
            $typename = new_name('set_' . $table_name . '_' . $item[1], 28);

            # creating input type function
            my $func_in = "
$typesize* $typename" . "_in (char *str) {
    $typesize* result;
    char* token;

    if(str==NULL)
        return NULL;

    result=($typesize*)palloc(sizeof($typesize));
    *result=0;
    if(strcmp(str,\"\")==0)
        return result;
    for(token=strtok(str,\",\");token!=NULL;token=strtok(NULL,\",\")) {";
            for (my $i = 0, my $j = 1; $i <= $#item; $i++, $j *= 2) {
                $func_in .= "
        if(strcmp(token,\"$item[$i]\")==0) {
            *result|=$j;
            continue;
        }";
            }
            $func_in .= "
    }

    if(*result == 0) {
        elog(ERROR,\"$typename" . "_in: incorrect input value\");
        return NULL;
    }
    return (result);
}\n";
            $types .= "\n---";
            $types .= "\n--- Types for table " . uc($table_name);
            $types .= "\n---\n";
            print LIBTYPES "\n/*";
            print LIBTYPES "\n * Types for table " . uc($table_name);
            print LIBTYPES "\n */\n";

            $types .= "\nCREATE FUNCTION $typename" . "_in (cstring)
    RETURNS $typename
    AS '$libtypename'
    LANGUAGE 'c';\n";

            # creating output function
            my $func_out = "
char* $typename" . "_out ($typesize *outvalue) {
    char* result;
    int i;

    if(outvalue==NULL)
        return NULL;

    result=(char*)palloc($maxlen);
    strcpy(result,\"\");
    for(i=1;i<=2 << (sizeof(int2)*8);i*=2) {
        switch (*outvalue & i) {";
            # "my $j" keeps this bit counter local, so it does not clobber
            # the file-level $j that tracks the position in $index{table}
            for (my $i = 0, my $j = 1; $i <= $#item; $i++, $j *= 2) {
                $func_out .= "
            case $j:";
                if ($item[$i] ne '') {
                    $func_out .= "ADD_COMMA;";
                }
                $func_out .= "strcat(result,\"$item[$i]\");
                break;";
            }
            $func_out .= "
            default:
                break;
        }
    }

    return result;
}\n";
            $func_out .= "\nbool $typename" . "_eq ($typesize* a, $typesize* b) {
    return (*a==*b);
}

$typesize find_in_set ($typesize *a, $typesize *b) {
    int i;

    for(i=1;i<=sizeof($typesize)*8;i*=2) {
        if(*a & *b) {
            return 1;
        }
    }
    return 0;
}

\n";

            $types .= "\nCREATE FUNCTION $typename" . "_out ($typename)
    RETURNS cstring
    AS '$libtypename'
    LANGUAGE 'c';\n";

            $types .= "\nCREATE TYPE $typename (
    internallength = $internalsize,
    input = $typename\_in,
    output = $typename\_out
);\n";
            $types .= "\nCREATE FUNCTION $typename" . "_eq ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE FUNCTION find_in_set ($typename,$typename)
    RETURNS bool
    AS '$libtypename'
    LANGUAGE 'c';

CREATE OPERATOR = (
    leftarg = $typename,
    rightarg = $typename,
    commutator = =,
    procedure = $typename" . "_eq
);

CREATE OPERATOR <> (
    leftarg = $typename,
    rightarg = $typename,
    commutator = <>,
    negator = =,
    procedure = $typename" . "_eq
);

\n";

            print LIBTYPES $func_in;
            print LIBTYPES $func_out;
            s/set\(.*?\)/$typename/i;
        }

        # Change multi-field keys to multi-field indices
        # MySQL Dump usually ends the CREATE TABLE statement like this:
        # CREATE TABLE bids (
        #   ...
        #   PRIMARY KEY (bids_id),
        #   KEY offer_id (offer_id,user_id,the_time),
        #   KEY bid_value (bid_value)
        # );
        # We want to replace this with something like
        # CREATE TABLE bids (
        #   ...
        #   PRIMARY KEY (bids_id)
        # );
        # CREATE INDEX offer_id ON bids (offer_id,user_id,the_time);
        # CREATE INDEX bid_value ON bids (bid_value);
        if (s/CREATE TABLE (.*) /CREATE TABLE $dq$1$dq /i) {
            if ($oldtable ne $table_name) {
                $oldtable = $table_name;
                $j = -1;
                $check = '';

                if ($seq{$table_name} ne '') {
                    print "\n\n--";
                    print "\n-- Sequences for table " . uc($table_name);
                    print "\n--\n";
                    print "\nCREATE SEQUENCE " . $seq{$table_name} . ";\n\n";
                }

                print $types;
                $types = '';
                $dump =~ s/,\n\).*;/\n\);/gmi;
                # removing table options after closing bracket:
                # ) TYPE=ISAM PACK_KEYS=1;
                $dump =~ s/\n\).*/\n\);/gmi;
                print $dump;
                $dump = '';
            }
            $table_name = $1;
        }

        # output CHECK constraints instead of UNSIGNED modifiers
        if (/PRIMARY KEY\s+\((.*)\)/i) {
            my $tmpfld = $1;
            $tmpfld =~ s/,/","/g if $dq;
            $tmpfld =~ s/ //g;
            s/PRIMARY KEY\s+(\(.*\))/PRIMARY KEY \($dq$tmpfld$dq\)/i;
            s/(PRIMARY KEY \(.*\)).*/$1$check\n/i;
        }

        if (/^\s*KEY ([\w\d_]+)\s*\((.*)\).*/i) {
            my $tmpfld = $2; my $ky = $1;
            $tmpfld =~ s/\s*,\s*/","/g if $dq;
            $tmpfld =~ s/(\(\d+\))//g;
            $index{$table_name}[++$j] = "CREATE INDEX ${ky}_$table_name\_index ON $dq$table_name$dq ($dq$tmpfld$dq);";
        }
        if (/^\s*UNIQUE.*?([\w\d_]+)\s*\((.*)\).*/i) {
            my $tmpfld = $2; my $ky = $1;
            $tmpfld =~ s/,/","/g if $dq;
            $tmpfld =~ s/(\(\d+\))//g;
            $index{$table_name}[++$j] = "CREATE UNIQUE INDEX ${ky}_$table_name\_index ON $dq$table_name$dq ($dq$tmpfld$dq);";
        }
        s/^\s*UNIQUE (.+).*(\(.*\)).*\n//i;
        s/^\s*KEY (.+).*(\(.*\)).*\n//i;

        if ($dq && !/^\s*(PRIMARY KEY|UNIQUE |KEY |CREATE TABLE|INSERT INTO|\);)/i) {
            s/\s([A-Za-z_\d]+)\s/ $dq$+$dq /;
        }
    } # end of if($tabledef)

    s/INSERT INTO\s+?(.*?)\s+?/INSERT INTO $dq$1$dq /i;

    # if not defined -s command-line option (safe data conversion),
    # attempting to convert timestamp data
    if (!$safe_data_conv) {
        # Fix timestamps
        s/'0000-00-00/'0001-01-01/g;
        # may corrupt data !!!
        s/([,(])00000000000000(?=[,)])/$1'00010101 000000'/g;
        if (/[,(]\d{4}(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})[,)]/ &&
            $1 >= 0 && $1 <= 12 && $2 >= 0 && $2 <= 31 && $3 >= 0 && $3 <= 23 &&
            $4 >= 0 && $4 <= 59 && $5 >= 0 && $5 <= 59) {
            s/([,(])(\d{8})(\d{6})(?=[,)])/$1'$2 $3'/g;
        }
        # this pattern captures only the month ($1) and the day ($2)
        if (/[,(]\d{4}(\d{2})(\d{2})[,)]/ &&
            $1 >= 0 && $1 <= 12 && $2 >= 0 && $2 <= 31) {
            s/([,(])(\d{4})(\d{2})(\d{2})(?=[,)])/$1'$2-$3-$4 00:00:00'/g;
        }
    }

    $dump .= $_;
}

if ($seq{$table_name} ne '') {
    print "\n\n--";
    print "\n-- Sequences for table " . uc($table_name);
    print "\n--\n";
    print "\nCREATE SEQUENCE " . $seq{$table_name} . ";\n\n";
}
print $types;
$dump =~ s/,\n\).*;/\n\);/gmi;
$dump =~ s/\n\).*/\n\);/gmi;
print $dump;

# Output indices for tables
while (my ($table, $ind) = each(%index)) {
    print "\n\n--";
    print "\n-- Indexes for table " . uc($table);
    print "\n--\n";
    for (my $i = 0; $i <= $#{$ind}; $i++) {
        print "\n$ind->[$i]";
    }
}

while (my ($table, $s) = each(%seq)) {
    print "\n\n--";
    print "\n-- Sequences for table " . uc($table);
    print "\n--\n";

    # setting SERIAL sequence values right
    if ($primary{$table} ne '') {
        print "\nSELECT SETVAL('" . $seq{$table} . "',(select case when max($dq" . $primary{$table} . "$dq)>0 then max($dq" . $primary{$table} . "$dq)+1 else 1 end from $dq$table$dq));";
    }
}

print("\n\nCOMMIT;\n");
close(LIBTYPES);

# make recipe lines must start with a tab, written here as \t
open(MAKE, ">Makefile") or die "Cannot open Makefile for writing: $!\n";
print MAKE "#
# My2Pg \$Revision: 1.13 \$ translated dump
# Makefile
#

all: libtypes.so

libtypes.o: libtypes.c
\tgcc -c -fPIC -g -O libtypes.c
libtypes.so: libtypes.o
\tld -Bshareable -o libtypes.so libtypes.o";
close(MAKE);

#
# Function generates unique identifier
# Args   : template name, max length
# Globals: %identifier
#
sub new_name {
    my $name = lc(shift @_);
    my $len = shift @_;

    # truncate long names
    if (length($name) > $len) {
        $name =~ s/(.{$len}).*/$1/i;
    }

    # find reserved identifiers
    if ($identifier{$name} != 1) {
        $identifier{$name} = 1;
        return $name;
    }
    else {
        # append an increasing numeric suffix until the name is unused
        my $i = 1;
        my $tmpname = $name . $i;
        while ($identifier{$tmpname} == 1) {
            $i++;
            $tmpname = $name . $i;
        }
        $identifier{$tmpname} = 1;
        return $tmpname;
    }

    die "Error during unique identifier generation :-(";
}

sub usage() {
print <<EOF
my2pg - MySQL to PostgreSQL database dump converter

Copyright (c) 2000-2002 Max Rudensky <fonin\@ziet.zhitomir.ua>
Copyright (c) 2000 Valentine Danilchuk <valdan\@ziet.zhitomir.ua>

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
source code for license details.

SYNTAX:
    my2pg [-hnds]

OPTIONS:
    h - this help
    n - convert *CHAR NOT NULL DEFAULT '' types to *CHAR NULL
    d - double quotes around table and column names
    s - do not attempt to convert data (timestamps at the moment)
EOF
;
}

=head1 NAME

my2pg - MySQL -> PostgreSQL dump conversion utility.

=head1 SYNTAX

    mysqldump db | ./my2pg.pl [-nds] > pgsqldump.sql
    vi libtypes.c
    make
    psql database < pgsqldump.sql

where

=over 4

=item F<pgsqldump.sql>

- file suitable for loading into PostgreSQL.

=item F<libtypes.c>

- C source for emulated MySQL types (ENUM, SET) generated by B<my2pg>

=back

=head1 OVERVIEW

The B<my2pg> utility attempts to convert a MySQL database dump to a
PostgreSQL one. B<my2pg> performs the following conversions:

=over 4

=item * Type conversion.

It tries to find the proper Postgres type for each column.
Unknown types are silently passed through to the output dump;
ENUM and SET types are implemented via user types
(C source for such types can be found in the
B<libtypes.c> file);

=item * Encloses identifiers in double quotes.

With the -d option, all column and table names are enclosed in
double quotes to prevent conflicts with reserved SQL keywords;

=item * Converting

AUTO_INCREMENT fields to SERIAL. Actually, creating the sequence and
setting default value to nextval('seq'), well, you know :)

=item * Converting

KEY(field) to CREATE INDEX i_field on table (field);

=item * The same

for UNIQUE keys;

=item * Indices

are created AFTER rows insertion (to speed up the load);

=item * Translates '#'

MySQL comments to ANSI SQL '--'

=back

It encloses the dump in a transaction block, so that a single error
during the data load does not leave a partially loaded database behind.

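The KEY to CREATE INDEX conversion listed above can be sketched as follows.
This is a minimal illustration, not the code my2pg itself runs: the helper
name C<key_to_index> and its three arguments are assumptions made for the
example, and only the C<key_table_index> naming pattern is taken from the
script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "KEY name (cols)" line from a MySQL
# CREATE TABLE body into a PostgreSQL CREATE INDEX statement.
# $dq is the optional quote character (what the -d switch would supply).
sub key_to_index {
    my ($table, $line, $dq) = @_;
    $dq = '' unless defined $dq;
    return unless $line =~ /^\s*(UNIQUE\s+)?KEY\s+(\w+)\s*\((.*)\)/i;
    my ($unique, $key, $cols) = ($1 ? 'UNIQUE ' : '', $2, $3);
    $cols =~ s/\(\d+\)//g;           # drop prefix lengths such as col(16)
    $cols =~ s/\s*,\s*/$dq,$dq/g;    # normalise separators, quote if asked
    return "CREATE ${unique}INDEX ${key}_${table}_index ON $dq$table$dq ($dq$cols$dq);";
}

print key_to_index('bids', '  KEY offer_id (offer_id,user_id,the_time),'), "\n";
print key_to_index('bids', '  UNIQUE KEY name (name(16))', '"'), "\n";
```

The first call prints a plain index on three columns; the second shows the
prefix length being dropped, because PostgreSQL does not accept indexes on
a partial column length.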
=head1 COMMAND-LINE OPTIONS

My2pg takes the following command-line options:

=over 2

=item -n

Convert *CHAR DEFAULT '' NOT NULL types to *CHAR NULL.
Postgres can't load empty '' strings in NOT NULL fields.

=item -d

Add double quotes around table and column names

=item -h

Show usage banner.

=item -s

Do not attempt to convert data. Currently my2pg only tries to convert
date and time data.

=back

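As a rough sketch of how these switches map onto settings, the snippet
below parses them with Getopt::Std, the module the script uses. The helper
C<parse_switches> and the keys of the hash it returns are hypothetical
names chosen for the example.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse my2pg-style switches from an argument list
# and return the resulting settings as a hash reference.
sub parse_switches {
    my @args = @_;
    local @ARGV = @args;             # getopts() reads from @ARGV
    my %opts;
    getopts('nhds', \%opts) or die "unknown option\n";
    return {
        null_chars => $opts{n} ? 1 : 0,     # -n: *CHAR ... NOT NULL to NULL
        quote      => $opts{d} ? '"' : '',  # -d: quote identifiers
        safe_data  => $opts{s} ? 1 : 0,     # -s: leave the data untouched
        help       => $opts{h} ? 1 : 0,     # -h: show usage banner
    };
}

my $cfg = parse_switches('-d', '-s');
print "quote=[$cfg->{quote}] safe=$cfg->{safe_data} null=$cfg->{null_chars}\n";
```

Switches may also be clustered, so C<-ds> is treated the same as C<-d -s>.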
=head1 SIDE EFFECTS

=over 4

=item * creates

file B<libtypes.c> in the current directory,
overwriting any existing file without any checks;

=item * the same

for Makefile.

=back

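If the silent overwrite is a concern, one possible guard is to create the
output files exclusively. The sketch below is not part of my2pg; the helper
name C<open_new_file> is hypothetical, and it relies only on the core
C<sysopen> function and the Fcntl constants.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Hypothetical guard: create a file for writing, failing if it exists.
# O_EXCL makes the existence check and the creation a single step.
sub open_new_file {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)
        or die "refusing to write $path: $!\n";
    return $fh;
}

# Demonstration in a scratch directory, so no real file is touched.
my $dir = tempdir(CLEANUP => 1);
my $fh = open_new_file("$dir/libtypes.c");
print {$fh} "/* generated */\n";
close($fh);

# A second attempt on the same path dies instead of overwriting.
my $ok = eval { open_new_file("$dir/libtypes.c"); 1 };
print $ok ? "overwrote\n" : "refused\n";
```

Run my2pg in an empty working directory to get the same protection without
changing the script.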
=head1 BUGS

Known bugs are:

=over 4

=item * Possible problems with the timestamp data.

PostgreSQL does not accept incorrect date/time values like B<2002-00-15>,
while MySQL does not care about that. Currently my2pg cannot handle this
issue. You have to convert such data yourself.

=item * Use the -s option if your numeric data are broken during conversion.

My2pg attempts to convert MySQL timestamps of the form B<yyyymmdd> to
B<yyyy-mm-dd> and B<yyyymmddhhmmss> to B<yyyy-mm-dd hh:mm:ss>. It performs
some heuristic checks to ensure that the month, day, hour, minutes and
seconds have values from the correct range (0..12, 0..31, 0..23, 0..59,
0..59 respectively). It is still possible that numeric values that satisfy
these conditions will get broken.

=item * Possible problems with enclosing identifiers in double quotes.

All identifiers such as table and column names should be enclosed in double
quotes. The program can't handle upper-case identifiers,
like DBA. Lower-case identifiers are OK.

=item * SET type emulation is incomplete.

The LIKE operation on SETs and raw integer input values still need to be
implemented

=item * B<Makefile>

generated during output is
platform-dependent and is known to work only on
Linux/gcc (FreeBSD/gcc probably works as well - not tested)

=item * Generated B<libtypes.c> contains the line

    #include <postgres.h>

This file may not be located in the standard compiler
include path; you need to check that before compiling.

=back

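The timestamp heuristic described above, and the reason it can damage
numeric data, can be illustrated with a short sketch. It follows the
B<yyyy-mm-dd hh:mm:ss> output format given in the text and is not the exact
substitution my2pg performs; the helper name C<convert_timestamps> is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: quote bare 14-digit values whose fields fall into
# plausible date/time ranges, and leave every other value untouched.
sub convert_timestamps {
    my ($line) = @_;
    $line =~ s{([,(])(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?=[,)])}{
        my ($pre, $y, $mo, $d, $h, $mi, $s) = ($1, $2, $3, $4, $5, $6, $7);
        ($mo <= 12 && $d <= 31 && $h <= 23 && $mi <= 59 && $s <= 59)
            ? sprintf("%s'%s-%s-%s %s:%s:%s'", $pre, $y, $mo, $d, $h, $mi, $s)
            : $pre . $y . $mo . $d . $h . $mi . $s;
    }gex;
    return $line;
}

print convert_timestamps("INSERT INTO t VALUES (1,20020715143007,99999999999999);"), "\n";
```

The second value is rewritten because its fields look like a date and time,
while the third is left alone. The check looks only at the digits, never at
the column type, so a 14-digit number stored in a numeric column is
rewritten as well. That is the situation the -s option is meant for.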
=head1 AUTHORS

B<(c) 2000-2002 Maxim V. Rudensky (fonin@ziet.zhitomir.ua)> (developer, maintainer)

B<(c) 2000 Valentine V. Danilchuk (valdan@ziet.zhitomir.ua)> (original script)

=head1 CREDITS

Great thanks to all those people who provided feedback and made development
of this tool easier.

Jeff Waugh <jaw@ic.net>

Joakim Lemström <jocke@bytewize.com> || <buddyh19@hotmail.com>

Yunliang Yu <yu@math.duke.edu>

Brad Hilton <bhilton@vpop.net>

If you are not listed here please write to me.

=head1 LICENSE

B<BSD>

=cut