Quantcast
Viewing all articles
Browse latest Browse all 15

House and Heisenberg having Replication Delay

So I am getting a mail with a complaint about rising replication delays in a certain replication hierarchy.

Not good, because said hierarchy is one of the important ones. As in 'If that breaks, people are sleeping under the bridge'-important.

The theory was that the change rate in that hierarchy is too high for the single threaded nature of MySQL replication. That was supported by the observation that all affected boxes had no local datadir, but were filer clients. Filer clients as slaves are dying first because the SAN introduces communication latencies that local disks don't have, and the single threaded nature of replication is not helpful here, either. Filers are better when it comes to concurrent accesses, really.

So if that theory would hold that would really ruin my day. Make that month: Said hierarchy is just now recovering from severe refactoring surgery and should have almost no exploitable technical debt that can be leveraged for short term scaling and tuning. If that thing really accumulates delay we are in serious trouble.

Now, I am used to lies. People lie. Boxes lie. So let's fire up the Graphite and have a look at how bad things are. I am choosing a random host from the supposedly sick hierarchy:


Continue reading "House and Heisenberg having Replication Delay"

Viewing all articles
Browse latest Browse all 15

Trending Articles