This is the English translation of an article in my German blog. This article, like the German original, is licensed CC-BY-SA. The English translation has been kindly provided by Tobias Klausmann.
Recently, I had to explain this to several people, hence a writeup for the blog for easier reference. The question:
I have content in my database that can be successfully read and written by my application, but if I do a mysqldump to transfer the data to a new system, all the non-ASCII characters like Umlauts are destroyed.
This happens if you save data to a DB with the wrong text encoding label.
In MySQL, every string has a label that describes the character encoding the string was written in (and should be interpreted in). The string _latin1"Köhntopp" thus (hopefully) is the character sequence K-0xF6-hntopp and the string _utf8"Köhntopp" consequently should be K-0xC3 0xB6-hntopp. Problems arise as soon as the label (_latin1 or _utf8) does not match the encoding inside the string (0xF6 vs. 0xC3 0xB6).
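The byte sequences above, and what a mismatched label does to them, can be reproduced outside the database. The following is only a rough sketch in Python, not MySQL itself: it assumes that Python's "latin-1" codec is a good enough stand-in for MySQL's latin1 for these particular bytes, and the variable names are made up for illustration.

```python
# Illustration only: plain Python codecs stand in for MySQL's labels.
# Assumption: for the bytes 0xF6, 0xC3 and 0xB6, Python's "latin-1"
# behaves like MySQL's latin1.

name = "K\u00f6hntopp"  # "Köhntopp"

# Correctly labeled strings: label and bytes agree.
latin1_bytes = name.encode("latin-1")  # K 0xF6 hntopp
utf8_bytes = name.encode("utf-8")      # K 0xC3 0xB6 hntopp

# Mismatch 1: UTF-8 bytes carrying a latin1 label.
# Each of the two bytes is read as a separate latin1 character.
mislabeled = utf8_bytes.decode("latin-1")

# A conversion that trusts the wrong label turns those two
# characters into four bytes ("double encoding").
double_encoded = mislabeled.encode("utf-8")

# Mismatch 2: latin1 bytes carrying a utf8 label.
# 0xF6 is not valid UTF-8, so the character cannot be decoded.
broken = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes.hex(" "))    # 4b f6 68 6e 74 6f 70 70
print(utf8_bytes.hex(" "))      # 4b c3 b6 68 6e 74 6f 70 70
print(mislabeled)               # KÃ¶hntopp
print(double_encoded.hex(" "))  # 4b c3 83 c2 b6 68 6e 74 6f 70 70
```

As long as the same wrong label is used for writing and for reading, the bytes pass through unchanged and nothing looks wrong. The damage only shows up once something converts the data based on the label.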
This is outlined in more detail in Handling character sets, and you should have read that article before you continue.