Administrator
发布于 2025-08-20 / 5 阅读
0
0

MySQL死锁排查

数据库日常维护中我们经常遇到死锁的问题,由于无法获取造成死锁的事务内执行过的语句,对我们死锁的分析造成很大的困难。但是在MySQL 5.7中我们可以利用performance_schema来获取这些语句,为我们解决死锁问题提供了一个新的思路,下面我们将解释这种方法的使用。

一、开启相关统计的方法

如果打开performance_schema选项来收集执行过的语句和事务会有性能损失,一般建议需要的时候开启,然后在线关闭掉。

(1) 在my.cnf中设置打开、关闭performance_schema选项随数据库启动

#设置setup_instruments表收集相关statement event

performance-schema-instrument='statement/%=ON'

#开启events_statements_current表存储当前连接线程执行的最后一条statement event信息

performance-schema-consumer-events-statements-current=ON

#开启events-statements-history表默认存储每个线程最近10条statement event信息

performance-schema-consumer-events-statements-history=ON

#开启events-statements-history-long表默认存储最近10000条语句event信息

performance-schema-consumer-events-statements-history-long=ON

performance-schema-consumer-statements-digest=ON

#设置setup_instruments表收集transaction event

performance-schema-instrument='transaction=ON'

#开启events_transactions_current表存储当前连接线程执行的transaction event信息

performance-schema-consumer-events-transactions-current=ON

#开启events_transactions_history表默认存储每个线程最近10条transaction event信息

performance-schema-consumer-events-transactions-history=ON

#开启events_statements_history_long表默认存储最近10000条语句event信息。

performance-schema-consumer-events-transactions-history-long=ON

(2) 在my.cnf中设置关闭performance_schema选项语句和事务收集

performance-schema-instrument='statement/%=OFF'

performance-schema-consumer-events-statements-current=OFF

performance-schema-consumer-events-statements-history=OFF

performance-schema-consumer-events-statements-history-long=OFF

performance-schema-consumer-statements-digest=OFF

performance-schema-instrument='transaction=OFF'

performance-schema-consumer-events-transactions-current=OFF

performance-schema-consumer-events-transactions-history=OFF

performance-schema-consumer-events-transactions-history-long=OFF

(3) 在线打开performance_schema选项收集执行过的语句和事务

update setup_consumers set ENABLED='yes' where name like 'events_statements%';

update setup_consumers set ENABLED='yes' where name like 'events_transactions%';

update setup_consumers set ENABLED='yes' where name like 'statements-digest';

update setup_instruments set ENABLED='yes',TIMED='yes' where name ='transaction';

(4) 在线关闭performance_schema选项收集执行过的语句和事务

update setup_consumers set ENABLED='no' where name like 'events_statements%';

update setup_consumers set ENABLED='no' where name like 'events_transactions%';

update setup_consumers set ENABLED='no' where name like 'statements-digest';

update setup_instruments set ENABLED='no',TIMED='no' where name ='transaction';

(5) 相关参数

performance_schema_events_statements_history_size:定义events_statements_history表中每个线程最大行数,静态参数,修改需要重启实例。

performance_schema_events_statements_history_long_size:定义events_statements_history_long表最大行数,静态参数,修改需要重启实例。

performance_schema_events_transactions_history_size:定义events_transactions_history表每个线程最大行数,静态参数,修改需要重启实例。

performance_schema_events_transactions_history_long_size:定义events_transactions_history_long表最大行数,静态参数,修改需要重启实例。

二、根据死锁信息来获取造成死锁的语句

下面是一个典型的死锁日志:

*** (1) TRANSACTION:

TRANSACTION 897551, ACTIVE 48 sec starting index read

mysql tables in use 1, locked 1

LOCK WAIT 5 lock struct(s), heap size 1136, 3 row lock(s)

MySQL thread id 4, OS thread handle 140029509830400, query id 123 localhost root

statistics

select * from t1 where id=1 for update

2019-07-09T10:29:40.710270+08:00 3 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO

BE GRANTED:

RECORD LOCKS space id 591 page no 3 n bits 80 index PRIMARY of table txc_testt1

trx id 897551 lock_mode X locks rec but not gap waiting

......

2019-07-09T10:29:40.711199+08:00 3 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 897550, ACTIVE 62 sec starting index read

mysql tables in use 1, locked 1

5 lock struct(s), heap size 1136, 3 row lock(s)

MySQL thread id 3, OS thread handle 140029510035200, query id 124 localhost root

statistics

select * from t2 where id=1 for update

2019-07-09T10:29:40.711364+08:00 3 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 591 page no 3 n bits 80 index PRIMARY of table txc_testt1

trx id 897550 lock_mode X locks rec but not gap

......

2019-07-09T10:29:40.712249+08:00 3 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO

BE GRANTED:

RECORD LOCKS space id 592 page no 3 n bits 80 index PRIMARY of table txc_testt2

trx id 897550 lock_mode X locks rec but not gap waiting

......

2019-07-09T10:29:40.713153+08:00 3 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

(1)通过events_statements_history_long来查询死锁语句

事务1

在死锁日志中我们可以看到导致死锁的最后一个语句是‘select * from t1 where id=1 for update’,那么我们就可以使用它来查询events_statements_history_long表。

mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,

CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT,timer_start

from events_statements_history_long where sql_text like

'select * from t1 where id=1 for update' \G

*************************** 1. row ***************************

THREAD_ID: 37

EVENT_ID: 98

END_EVENT_ID: 106

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t1 where id=1 for update

DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 96

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

timer_start: 266799699801000

*************************** 2. row ***************************

THREAD_ID: 38

EVENT_ID: 35

END_EVENT_ID: 37

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t1 where id=1 for update

DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 19

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

事务2

在死锁日志中我们可以看到导致死锁的最后一个语句是‘select * from t2 where id=1 for update’,那么我们就可以使用它来查询events_statements_history_long表。

mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,

CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT

from events_statements_history_long where sql_text like

'delete from t where id=25' \G

*************************** 1. row ***************************

(event关联的线程id)

THREAD_ID: 38

EVENT_ID: 21

END_EVENT_ID: 29

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

(执行的sql语句信息)

SQL_TEXT: select * from t2 where id=1 for update

DIGEST: 315b4a6a8f7424bc7591256a8937a213

CURRENT_SCHEMA: txc_test

(父语句event id)

NESTING_EVENT_ID: 19

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

(event开始时间(从实例启动到现在的时间差),单位皮秒)

timer_start: 280428752972000

*************************** 2. row ***************************

(event关联的线程id)

THREAD_ID: 37

EVENT_ID: 112

END_EVENT_ID: 114

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

(执行的sql语句信息)

SQL_TEXT: select * from t2 where id=1 for update

DIGEST: 315b4a6a8f7424bc7591256a8937a213

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 96

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction

(event开始时间(从实例启动到现在的时间差),单位皮秒)

timer_start: 328978250011000

虽然我们根据死锁日志进行了语句查询,但是我们发现没法确认哪一个是事务1,哪一个是事务2。不过我们可以通过MESSAGE_TEXT中的信息‘Deadlock found when trying to get lock; try restarting transaction’,

确认THREAD_ID:37为事务2,THREAD_ID: 38为事务1的线程,因为我们的死锁日志显示事务2被回滚掉了。需要注意这里的THREAD_ID是performance_schema内部使用的,和我们平时show processlist访问的PROCESSLIST_ID不一样。它是一个performance_schema内部的计数器:

(2)提取信息

事务1

最后一条语句的END_EVENT_ID=37

最后一条语句的NESTING_EVENT_ID=19

THREAD_ID=38

EVENT_ID:事件启动时的线程当前事件号

THREAD_ID,EVENT_ID:与事件关联的线程号和事件启动时的事件编号,可以使用THREAD_ID和EVENT_ID列值来唯一标识该行,这两行的值作为组合条件时不会出现相同的数据行

END_EVENT_ID:当一个事件开始执行时,对应行记录的该列值被设置为NULL,当一个事件执行结束时,对应的行记录的该列值被更新为该事件的ID

NESTING_EVENT_ID:表示该行信息中的EVENT_ID事件是嵌套在哪个事件中,即父事件的EVENT_ID

因此我们可以通过如下语句得出经历的所有语句如下:

mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,

CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT,timer_start

from events_statements_history_long where thread_id=38 and END_EVENT_ID>=19 and

END_EVENT_ID <=37 \G

*************************** 1. row ***************************

THREAD_ID: 38

EVENT_ID: 18

END_EVENT_ID: 19

EVENT_NAME: statement/sql/begin

SOURCE: socket_connection.cc:101

SQL_TEXT: begin

DIGEST: f57daa74a09445d1e1c496f28fe6d906

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: NULL

NESTING_EVENT_TYPE: NULL

MESSAGE_TEXT: NULL

timer_start: 274749880798000

*************************** 2. row ***************************

THREAD_ID: 38

EVENT_ID: 21

END_EVENT_ID: 29

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t2 where id=1 for update

DIGEST: 315b4a6a8f7424bc7591256a8937a213

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 19

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

timer_start: 280428752972000

*************************** 3. row ***************************

THREAD_ID: 38

EVENT_ID: 31

END_EVENT_ID: 33

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t1 where id=3 for update

DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 19

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

timer_start: 296208754204000

*************************** 4. row ***************************

THREAD_ID: 38

EVENT_ID: 35

END_EVENT_ID: 37

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t1 where id=1 for update

DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 19

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

timer_start: 319574947823000

4 rows in set (0.00 sec)

事务2

最后一条语句的END_EVENT_ID=114

最后一条语句的NESTING_EVENT_ID=96

THREAD_ID=37

因此我们可以通过如下语句得出经历的所有语句如下:

mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,

CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE,MESSAGE_TEXT,timer_start

from events_statements_history_long where thread_id = 37 and END_EVENT_ID >= 96

and END_EVENT_ID <= 114\G

*************************** 1. row ***************************

THREAD_ID: 37

EVENT_ID: 95

END_EVENT_ID: 96

EVENT_NAME: statement/sql/begin

SOURCE: socket_connection.cc:101

SQL_TEXT: begin

DIGEST: f57daa74a09445d1e1c496f28fe6d906

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: NULL

NESTING_EVENT_TYPE: NULL

MESSAGE_TEXT: NULL

timer_start: 251219441701000

*************************** 2. row ***************************

THREAD_ID: 37

EVENT_ID: 98

END_EVENT_ID: 106

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t1 where id=1 for update

DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 96

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

timer_start: 266799699801000

*************************** 3. row ***************************

THREAD_ID: 37

EVENT_ID: 108

END_EVENT_ID: 110

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t2 where id=6 for update

DIGEST: 315b4a6a8f7424bc7591256a8937a213

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 96

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: NULL

timer_start: 308095739676000

*************************** 4. row ***************************

THREAD_ID: 37

EVENT_ID: 112

END_EVENT_ID: 114

EVENT_NAME: statement/sql/select

SOURCE: socket_connection.cc:101

SQL_TEXT: select * from t2 where id=1 for update

DIGEST: 315b4a6a8f7424bc7591256a8937a213

CURRENT_SCHEMA: txc_test

NESTING_EVENT_ID: 96

NESTING_EVENT_TYPE: TRANSACTION

MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction

timer_start: 328978250011000

4 rows in set (0.01 sec)

(3)猜测死锁形成过程

根据上面两个事务执行的sql大致可以推断出死锁的sql语句,如果两个事务里面执行的sql很多那可能就需要花更多的时间来找出造成死锁的语句:

TX1 TX2

select * from t1 where id=1 for update;

select * from t2 where id=1 for update;

select * from t1 where id=3 for update;

select * from t2 where id=6 for update;

select * from t1 where id=1 for update;

select * from t2 where id=1 for update;

三、总结

通过以上的查询基本可以获取造成死锁的事务内执行的语句,由于线上业务量大可能造成events_statements_history_long查询不到需要的语句(默认存储10000条),需要及时监控发现死锁。

打开performance_schema的选项,有性能损失。

如果线上实例是每个database对应一个独立用户,可以通过设置收集指定用户执行的所有event。

1).关闭收集所有用户的event

update setup_actors set ENABLED='NO',HISTORY='NO';

2).插入需要收集event的指定用户(例如我只想收集txc用户下的所有event,参考如下)

insert into setup_actors select '%','txc','%','YES','YES';

3).select * from setup_actors;

+------+------+------+---------+---------+

| HOST | USER | ROLE | ENABLED | HISTORY |

+------+------+------+---------+---------+

| % | % | % | NO | NO |

| % | txc | % | YES | YES |

+------+------+------+---------+---------+


评论