I'm migrating a whole bunch of legacy data into mysql for a client. I export from their system, use sed to patch the nonconforming unparseable CSV it outputs, convert from macroman to UTF8, then tell MySQL to load it all into a series of tables from which the data will later be integrated into the standard system.
Today I realised the error of my ways: the column that I was using as a primary key isn't a primary key at all (at least not in the modern sense - of having a unique value - that we've become accustomed to since Codd invented the relational model in 1969).
I could have realised this earlier by just declaring it a primary key and watching MySQL spit at me, and therefore have fixed the problem sooner ... but no, who needs tests and constraints when assumptions are so much faster?
Enough about me ... the problem now is to assign each row in the exported
data a unique id, and the easiest, obviousest way to do that is to include
id integer primary key auto_increment column in each table
LOAD DATA INFILE etc etc shove the data into
the table, and the primary key takes care of itself.
But it wasn't obvious how to make this work without warnings about mismatched
columns. Here's the clever bit: what I did in the end was to add the
column after loading the data. In other words, I imported the data
into a table with no
id primary key column; and added that
column afterwards. It's easy to script, it does exactly what I want, and
it produces no spurious warnings. Perfecto!
Here's the concept in code:
> create table legacy_stuff(leg_col_1 varchar(255), leg_col_2 varchar(255), et_cetera_1 varchar(255), et_cetera_2 varchar(255)); > load data infile 'legacy_stuff.csv' into table legacy_stuff; -- plus all your favourite options; > show warnings; > alter table legacy_stuff add column id integer primary key auto_increment;
Update: for another way to look at this, see the
page on this topic. It turns out you can use
load data infile
with the list of columns you want to import to, in which case MySQL will
match the columns of your CSV to your specified column list. This way you
can let your auto-increment primary key column simply manage itself. This
solution does not suit my particular situation as I'm importing tables with
hundreds of columns (they didn't believe in normalisation where this data
comes from), and I don't want to have to maintain the column list in multiple