06 February 2012

MySQL "LOAD DATA INFILE" with no primary key

I'm migrating a whole bunch of legacy data into MySQL for a client. I export it from their system, use sed to patch the non-conforming, unparseable CSV it outputs, convert it from MacRoman to UTF-8, and then tell MySQL to load it all into a series of tables, from which the data will later be integrated into the standard system.
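The post doesn't say which tool handles the MacRoman to UTF-8 step. As one possibility, here is a minimal Python sketch using the standard library's `mac_roman` codec. The helper name `macroman_to_utf8` is hypothetical, and it assumes the input is plain MacRoman text.

```python
def macroman_to_utf8(src_path: str, dst_path: str, chunk_size: int = 1 << 16) -> None:
    """Re-encode a MacRoman text file as UTF-8 (hypothetical helper)."""
    # newline="" disables newline translation on both sides, so CR, LF
    # and CRLF line endings pass through unchanged.
    with open(src_path, "r", encoding="mac_roman", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Stream in chunks so a large export never has to fit in memory.
        for chunk in iter(lambda: src.read(chunk_size), ""):
            dst.write(chunk)
```

For example, `macroman_to_utf8("export.csv", "export-utf8.csv")` (hypothetical file names) would turn the MacRoman byte 0x8E into the UTF-8 encoding of "é". Line endings are preserved, so if the export uses classic Mac CR line endings, LOAD DATA INFILE will still need `lines terminated by '\r'`.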

Today I realised the error of my ways: the column I was using as a primary key isn't a primary key at all (at least not in the modern sense of holding a unique value for every row, which we've been accustomed to since Codd invented the relational model in 1969).

I could have realised this earlier simply by declaring it a primary key and watching MySQL spit at me, and fixed the problem sooner... but no: who needs tests and constraints when assumptions are so much faster?
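In the spirit of tests over assumptions, a cheap pre-load check is to count repeated values in the column you intend to key on. The sketch below is a hypothetical Python helper, `duplicate_keys`. It assumes the file is already valid, UTF-8-encoded CSV (that is, after the sed and encoding fixes) and that the column is identified by its zero-based position.

```python
import csv
from collections import Counter


def duplicate_keys(csv_path: str, column: int, encoding: str = "utf-8") -> dict:
    """Return {value: count} for every value occurring more than once in `column`."""
    with open(csv_path, newline="", encoding=encoding) as f:
        # Rows too short to contain the column are skipped rather than raising.
        counts = Counter(row[column] for row in csv.reader(f) if len(row) > column)
    return {value: n for value, n in counts.items() if n > 1}
```

An empty result means the column is at least unique in this export. Anything else is the same complaint MySQL would raise as a duplicate-entry error once the column is declared a primary key.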

Enough about me. The problem now is to give each row of the exported data a unique id, and the easiest, most obvious way to do that is to include an id integer primary key auto_increment column in each table definition, let LOAD DATA INFILE (with all its options) shove the data into the table, and let the primary key take care of itself.

But it wasn't obvious how to make this work without warnings about mismatched columns: LOAD DATA INFILE matches CSV fields to table columns by position, so an extra id column throws the alignment off. Here's the clever bit: in the end I added the id column after loading the data. In other words, I imported the data into a table with no id column, then added the auto_increment primary key afterwards, and MySQL filled in sequence numbers for the existing rows. It's easy to script, it does exactly what I want, and it produces no spurious warnings. Perfecto!

Here's the concept in code:

> create table legacy_stuff(leg_col_1 varchar(255), leg_col_2 varchar(255), et_cetera_1 varchar(255), et_cetera_2 varchar(255));

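> load data infile 'legacy_stuff.csv' into table legacy_stuff fields terminated by ',' optionally enclosed by '"'; -- plus any other favourite options, before the semicolon (the default field separator is a tab, not a comma)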

> show warnings;

> alter table legacy_stuff add column id integer primary key auto_increment;
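Since the approach is easy to script across a series of tables, here is a minimal Python sketch that emits the same four statements for one table. The helper `load_then_key_sql` is hypothetical. The CSV options (comma-separated, optionally double-quoted, newline-terminated) are assumptions about the cleaned export, and every column gets the same type, as in the example above.

```python
def load_then_key_sql(table: str, columns: list, csv_path: str,
                      col_type: str = "varchar(255)") -> list:
    """Emit load-first, key-later MySQL statements for one legacy table."""

    def ident(name: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    def literal(text: str) -> str:
        # Single-quote a string literal, escaping backslashes and quotes.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    t = ident(table)
    col_defs = ", ".join(f"{ident(c)} {col_type}" for c in columns)
    return [
        # 1. Only the legacy columns, so CSV fields line up by position.
        f"create table {t} ({col_defs});",
        # 2. Load the cleaned CSV (assumed comma-separated, optionally quoted).
        f"load data infile {literal(csv_path)} into table {t} "
        "fields terminated by ',' optionally enclosed by '\"' "
        "lines terminated by '\\n';",
        # 3. Inspect anything MySQL complained about during the load.
        "show warnings;",
        # 4. Add the key afterwards; MySQL numbers the existing rows.
        f"alter table {t} add column id integer not null auto_increment "
        "primary key first;",
    ]
```

Printing the statements for each table and piping them into the `mysql` client is one way to run them. `first` simply places id as the leading column. For very wide tables, `col_type="text"` may be needed, because MySQL caps the combined size of a row's non-BLOB/TEXT columns at 65,535 bytes.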

Good luck...

Update: for another way to look at this, see the Stack Overflow page on this topic. It turns out you can give LOAD DATA INFILE the list of columns you want to import into, in which case MySQL maps the fields of your CSV, in order, onto that column list. Any column left out of the list, such as an auto-increment primary key, simply manages itself. This solution doesn't suit my particular situation, as I'm importing tables with hundreds of columns (they didn't believe in normalisation where this data comes from), and I don't want to maintain the column list in multiple places.
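For comparison, here is a sketch of the column-list variant as another hypothetical Python helper, `column_list_sql`. Generating both the table definition and the LOAD DATA column list from one Python list is one way to keep the list in a single place. This assumes the column names are available to the script, and the CSV options are the same assumptions as before.

```python
def column_list_sql(table: str, columns: list, csv_path: str,
                    col_type: str = "varchar(255)") -> list:
    """Emit key-first MySQL statements that name the CSV's columns explicitly."""

    def ident(name: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    def literal(text: str) -> str:
        # Single-quote a string literal, escaping backslashes and quotes.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    t = ident(table)
    col_defs = ", ".join(f"{ident(c)} {col_type}" for c in columns)
    col_list = ", ".join(ident(c) for c in columns)
    return [
        # The id column exists from the start...
        f"create table {t} (id integer not null auto_increment primary key, {col_defs});",
        # ...and the trailing column list says which columns the CSV fields
        # fill, in order; id is omitted, so MySQL generates its values.
        f"load data infile {literal(csv_path)} into table {t} "
        "fields terminated by ',' optionally enclosed by '\"' "
        f"lines terminated by '\\n' ({col_list});",
    ]
```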
