A colleague of mine has been benchmarking mysqldump data load vs. various variants of LOAD DATA INFILE. He created sample data as text files with either 100k or 20M rows of five integers each, the first column of which is the PK.
CODE:
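# Non-PK order: shuffle the keys 1..20M, then print each key followed by four random integers.
perl -MList::Util=shuffle -e '@k=shuffle(1..20e6);
for (@k) {
  print $_, " ", join(" ", map int(rand(1e9)), 0..3), "\n";
}' > loadme_nonpkorder.txt

# PK order: print an increasing key followed by four random integers.
perl -e 'print ++$i, " ", join(" ", map int(rand(1e9)), 0..3), "\n"
  for 1..20e6' > loadme_pkorder.txt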
All insertions were done into new, empty tables. The text files were read at least once beforehand to warm up the OS disk cache. The tables have two non-unique single-column indexes. Everything ran on an idle-ish DB master with substantial memory and a NetApp hosting the datadir (via XFS and LVM).
He benchmarked four cases:
- Insertion in PK order.
- Insertion in PK order, dropping indexes before insertion and re-adding them later.
- Insertion in random order.
- Insertion in random order, dropping indexes before insertion and re-adding them later.
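The four cases can be sketched as a small shell function that prints the SQL for one run. This is only an illustration under stated assumptions: the table name `loadme`, the column names `id`, `a`, `b`, `c`, `d`, and the index names `idx_a` and `idx_b` are made up here, and the actual schema and statements used in the benchmark are not shown in this post.

```shell
#!/bin/sh
# Print the SQL for one benchmark case to stdout.
#   $1: data file (e.g. loadme_pkorder.txt or loadme_nonpkorder.txt)
#   $2: "drop" to drop the secondary indexes before loading and re-add them afterwards
# Table, column, and index names are hypothetical, not taken from the benchmark.
benchmark_sql() {
    file=$1
    mode=${2:-keep}

    # Fresh, empty table: integer PK plus two non-unique single-column indexes.
    cat <<EOF
DROP TABLE IF EXISTS loadme;
CREATE TABLE loadme (
  id INT NOT NULL PRIMARY KEY,
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  d INT NOT NULL,
  INDEX idx_a (a),
  INDEX idx_b (b)
) ENGINE=InnoDB;
EOF

    if [ "$mode" = drop ]; then
        echo "ALTER TABLE loadme DROP INDEX idx_a, DROP INDEX idx_b;"
    fi

    # The generated files separate fields with a single space.
    echo "LOAD DATA INFILE '$file' INTO TABLE loadme FIELDS TERMINATED BY ' ';"

    if [ "$mode" = drop ]; then
        echo "ALTER TABLE loadme ADD INDEX idx_a (a), ADD INDEX idx_b (b);"
    fi
}
```

Calling `benchmark_sql loadme_pkorder.txt` gives the first case and `benchmark_sql loadme_nonpkorder.txt drop` the fourth; the output could be piped into the `mysql` client and timed. Note that LOAD DATA INFILE without LOCAL reads the file on the server, so the path has to be one the server is allowed to read.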
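Summary: The result is not surprising: both inserting in PK order and dropping/re-adding the indexes improve performance considerably. PK-order insertion becomes more and more crucial as the dataset grows, which is not at all surprising if you think about what happens when a record is added to the InnoDB PK tree. InnoDB stores rows clustered by primary key, so ordered inserts keep appending at the right-hand edge of the tree, while random-order inserts touch pages all over it and cause page splits.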
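One practical consequence, which was not part of the benchmark itself: if a load file is not already in PK order, it may pay off to sort it first. A minimal sketch, assuming the space-separated layout of the generated files with the PK in the first column (the function name `sort_by_pk` is made up for this example):

```shell
#!/bin/sh
# Sort a space-separated data file numerically on its first column (the PK).
#   $1: input file, $2: output file (both names are placeholders)
# LC_ALL=C keeps sort's behaviour independent of the current locale.
sort_by_pk() {
    LC_ALL=C sort -t ' ' -k1,1n "$1" > "$2"
}
```

For example, `sort_by_pk loadme_nonpkorder.txt loadme_sorted.txt`. For a 20M-row file, sort will spill to temporary files on disk, so the time spent sorting has to be weighed against the time saved during the load.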