2017-12-19 seconds_behind_master

Basics

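seconds_behind_master appears in the output of show slave status and represents how far the slave lags behind the master. Put simply, the slave takes the timestamp of the binlog event it is currently executing (supplied in the master's binlog: event start time plus event execution time) and compares it with the current time. The difference is the delay.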
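The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the server's actual code: the function name, the `clock_diff` parameter (a hypothetical offset between the two hosts' clocks) and the clamping of negative results to 0 are assumptions made for this sketch.

```python
import time
from typing import Optional


def seconds_behind_master(event_timestamp: int,
                          exec_time: int,
                          now: Optional[float] = None,
                          clock_diff: int = 0) -> int:
    """Estimate the delay for the event the slave is currently executing.

    event_timestamp: start time of the event on the master (from the binlog)
    exec_time:       how long the event took to execute on the master
    now:             current time on the slave (defaults to time.time())
    clock_diff:      assumed clock offset between slave and master, in seconds
    """
    if now is None:
        now = time.time()
    # The reference point is the moment the event finished on the master.
    event_end = event_timestamp + exec_time
    # Assumption for this sketch: a negative difference is reported as 0.
    return max(0, int(now - clock_diff - event_end))
```

For example, an event that started at t=1000 and ran for 5 seconds, replayed on the slave at t=1035, gives a delay of 30 seconds.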

Problems

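Under normal conditions this value is basically accurate and reflects the replication delay. Under abnormal conditions, however, it is often of little use and does not reflect what is really happening.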

The master's binlog thread dies

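When the master's binlog thread dies, seconds_behind_master on the slave does not increase, because the slave computes it from the newest binlog it has itself received. When the thread dies and the slave can no longer get new data, the slave simply assumes there is no new data, and seconds_behind_master stays at 0.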
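A toy model makes the gap visible. It is a simplified illustration, not how the server is implemented: events are hypothetical `(timestamp, exec_time)` pairs, and the two function names are invented for this sketch. The slave can only measure itself against events it has received, so once the master stops sending, the reported value and the real delay diverge.

```python
from typing import List, Tuple

Event = Tuple[int, int]  # (timestamp on master, exec_time on master)


def reported_lag(now: int, received: List[Event], applied: int) -> int:
    """Delay as the slave sees it, based only on events it has received."""
    if applied >= len(received):
        # Nothing left to replay: the slave believes it is fully caught up.
        return 0
    ts, exec_time = received[applied]
    return max(0, now - (ts + exec_time))


def true_lag(now: int, master_events: List[Event], applied: int) -> int:
    """Delay measured against everything the master has actually written."""
    if applied >= len(master_events):
        return 0
    ts, exec_time = master_events[applied]
    return max(0, now - (ts + exec_time))


# The master wrote three events, but the binlog thread died after the first.
master_events = [(100, 0), (110, 0), (120, 0)]
received = master_events[:1]
applied = 1  # the slave has replayed everything it received

lag_seen_by_slave = reported_lag(200, received, applied)
lag_in_reality = true_lag(200, master_events, applied)
```

In this scenario the slave reports 0 while the oldest unreplicated event is already 90 seconds old.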

The slave hits a deadlock

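When the slave hits a deadlock while replaying the master's binlog and still fails after several retries, replication stops. The seconds_behind_master seen at this point is the last value and no longer changes, because replication has stopped (both the IO thread and the SQL thread have stopped).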

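At this point replication has in fact stopped, but a monitoring platform (such as grafana) will show a replication delay of 0, because the value comes from the seconds_behind_master field in the show slave status output. (MySQL documents this field as NULL when the SQL thread is not running, so a collector that coerces NULL to 0 would produce this symptom.) Looking at the show slave status output directly would reveal the deadlock error message, but monitoring platforms usually collect only the seconds_behind_master value and therefore miss the real problem.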
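One way to avoid this blind spot is to look at more than one field of the show slave status output. The sketch below assumes the row has already been fetched into a dict keyed by column name; the function name and the sample rows are made up for illustration, while the column names (Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master, Last_SQL_Error, Last_Error) are those of show slave status.

```python
from typing import Any, List, Mapping


def replication_problems(status: Mapping[str, Any]) -> List[str]:
    """Return human-readable problems found in one SHOW SLAVE STATUS row."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("IO thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")
    # NULL arrives as None; it must not be treated as a delay of 0.
    if status.get("Seconds_Behind_Master") is None:
        problems.append("Seconds_Behind_Master is NULL")
    error = status.get("Last_SQL_Error") or status.get("Last_Error")
    if error:
        problems.append(f"last error: {error}")
    return problems


# Hypothetical sample rows, not real server output.
healthy = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 0,
    "Last_SQL_Error": "",
}
stopped = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "No",
    "Seconds_Behind_Master": None,
    "Last_SQL_Error": "Deadlock found when trying to get lock; "
                      "try restarting transaction",
}
```

A monitor built this way would alert on the `stopped` row even though a naive reading of seconds_behind_master alone would show nothing wrong.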

Related settings:

innodb_lock_wait_timeout: lock wait time
slave_transaction_retries: number of retries

The innodb_lock_wait_timeout variable sets the length of time a transaction will wait for a row lock before “giving up”. The default value is 50 seconds. Note that this variable applies to InnoDB row locks only. A MySQL table lock does not happen inside InnoDB, and this timeout does not apply to waits for table locks. InnoDB detects transaction deadlocks in its own lock table immediately and rolls back one transaction; the lock wait timeout does not apply to such a wait.

The slave_transaction_retries variable sets the number of times the replication slave SQL thread retries a transaction before stopping with an error in any of the following situations (the default for this variable is 10):

an InnoDB deadlock