SQL优化:一个含有JOIN+LIMIT+ORDER BY的SQL(上)

有一条SQL语句执行比较慢，在并发情况下存在瓶颈：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52


select account.ID                  as id,
       account.GENDER_ID           as genderId,
       userGender.CODE             as genderCode,
       userGender.NAME             as genderName,
       account.NATION_ID           as nationId,
       userNation.CODE             as nationCode,
       userNation.NAME             as nationName,
       account.COUNTRY_ID          as countryId,
       userCountry.CODE            as countryCode,
       userCountry.NAME            as countryName,
       account.ADDRESS_ID          as addressId,
       userAddress.CODE            as addressCode,
       userAddress.NAME            as addressName,
       account.USER_ID             as userId,
       account.USER_NAME           as userName,
       account.USER_UID            as userUid,
       account.ACCOUNT_NAME        as accountName,
       account.IDENTITY_TYPE_ID    as identityTypeId,
       accountIdentity.CODE        as identityTypeCode,
       accountIdentity.NAME        as identityTypeName,
       account.ORGANIZATION_ID     as organizationId,
       accountOrganization.CODE    as organizationCode,
       accountOrganization.NAME    as organizationName,
       account.IS_DATA_CENTER      as isDataCenter,
       account.ACTIVATION          as activation,
       account.STATE               as state,
       account.ACCOUNT_EXPIRY_DATE as accountExpiryDate,
       account.ACCOUNT_LOCKED      as accountLocked,
       account.PHONE_NUMBER        as phoneNumber,
       account.EMAIL               as email,
       user.IMAGE_URL              as imageUrl
from TB_B_ACCOUNT account
         inner join TB_B_ORGANIZATION accountOrganization
                    on account.ORGANIZATION_ID = accountOrganization.ID AND accountOrganization.DELETED = 0
         inner join TB_B_IDENTITY_TYPE accountIdentity on account.IDENTITY_TYPE_ID = accountIdentity.ID
         inner join TB_B_DICTIONARY userCertificateType on account.certificate_type_id = userCertificateType.ID
         left join TB_B_DICTIONARY userGender on account.GENDER_ID = userGender.ID
         left join TB_B_DICTIONARY userNation on account.NATION_ID = userNation.ID
         left join TB_B_DICTIONARY userCountry on account.COUNTRY_ID = userCountry.ID
         left join TB_B_DICTIONARY userAddress on account.ADDRESS_ID = userAddress.ID
         left join TB_B_USER user on account.USER_ID = user.ID
where 1 = 1
  and (
        account.ACCOUNT_NAME like '%user%'
        OR account.USER_NAME like '%user%'
        OR accountIdentity.NAME like '%user%'
        OR accountOrganization.NAME like '%user%'
    )
  and account.USER_NAME like '%user%'
  and account.ACCOUNT_NAME like '%user%'
ORDER BY length(account.ACCOUNT_NAME), account.ACCOUNT_NAME
limit 20;

在优化之前先刷新相关表的统计数据：

1
2
3
4
5


ANALYZE TABLE TB_B_ACCOUNT;
ANALYZE TABLE TB_B_ORGANIZATION;
ANALYZE TABLE TB_B_IDENTITY_TYPE;
ANALYZE TABLE TB_B_DICTIONARY;
ANALYZE TABLE TB_B_USER;

1）排除ORDER BY和LIMIT

检查字段索引

检查查询条件中的相关字段，把相关字段的索引加上。

EXPLAIN结果：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	accountOrganization	NULL	ALL	PRIMARY	NULL	NULL	NULL	84	10	Using where
1	SIMPLE	account	NULL	ref	ORGANIZATION_ID,IDENTITY_TYPE_ID,TB_B_ACCOUNT_fk3	ORGANIZATION_ID	194	user_new.accountOrganization.ID	3637	1.23	Using where
1	SIMPLE	accountIdentity	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.IDENTITY_TYPE_ID	1	100	Using where
1	SIMPLE	userCertificateType	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.CERTIFICATE_TYPE_ID	1	100	Using index
1	SIMPLE	userGender	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.GENDER_ID	1	100	NULL
1	SIMPLE	userNation	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.NATION_ID	1	100	NULL
1	SIMPLE	userCountry	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.COUNTRY_ID	1	100	NULL
1	SIMPLE	userAddress	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.ADDRESS_ID	1	100	NULL
1	SIMPLE	user	NULL	eq_ref	PRIMARY	PRIMARY	194	user_new.account.USER_ID	1	100	NULL

去掉JOIN

根据业务逻辑，把查询分成两部分，第一步先查ACCOUNT_NAME，第二步查其他字段，第一步SQL：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


select account.ACCOUNT_NAME
from TB_B_ACCOUNT account
where 1 = 1
  and (
        account.ACCOUNT_NAME like '%user%'
        OR account.USER_NAME like '%user%'
        OR account.IDENTITY_TYPE_ID in (select accountIdentity.ID from TB_B_IDENTITY_TYPE accountIdentity where accountIdentity.NAME like '%user%')
        OR account.ORGANIZATION_ID in  (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where accountOrganization.NAME like '%user%')
    )
  and account.USER_NAME like '%user%'
  and account.ACCOUNT_NAME like '%user%'
  and account.ORGANIZATION_ID in
      (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where DELETED = 0);

EXPLAIN如下：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	accountOrganization	NULL	ALL	PRIMARY	NULL	NULL	NULL	84	10	Using where
1	PRIMARY	account	NULL	ref	ORGANIZATION_ID	ORGANIZATION_ID	194	user_new.accountOrganization.ID	3637	1.23	Using where
3	SUBQUERY	accountOrganization	NULL	index	PRIMARY	IDX_ORG_NAME	602	NULL	84	11.11	Using where; Using index
2	SUBQUERY	accountIdentity	NULL	index	PRIMARY	IDX_ID_TYPE_NAME	602	NULL	49	11.11	Using where; Using index

优化LIKE

看到LIKE用的是中间匹配%...%，这种是无法利用索引的，遂改成前缀匹配，所以做出以下优化：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


select account.ACCOUNT_NAME
from TB_B_ACCOUNT account
where 1 = 1
  and (
        account.ACCOUNT_NAME like 'user%'
        OR account.USER_NAME like 'user%'
        OR account.IDENTITY_TYPE_ID in (select accountIdentity.ID from TB_B_IDENTITY_TYPE accountIdentity where accountIdentity.NAME like 'user%')
        OR account.ORGANIZATION_ID in  (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where accountOrganization.NAME like 'user%')
    )
  and account.USER_NAME like 'user%'
  and account.ACCOUNT_NAME like 'user%'
  and account.ORGANIZATION_ID in
      (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where DELETED = 0);

EXPLAIN结果：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	accountOrganization	NULL	ALL	PRIMARY	NULL	NULL	NULL	84	10	Using where
1	PRIMARY	account	NULL	ref	UQ_ACCOUNT_NAME,ORGANIZATION_ID,UQ_ACCOUNT_USRNAME,IDX_ACCOUNT_USERNAME	ORGANIZATION_ID	194	user_new.accountOrganization.ID	3637	25	Using where
3	SUBQUERY	accountOrganization	NULL	range	PRIMARY,IDX_ORG_NAME	IDX_ORG_NAME	602	NULL	1	100	Using where; Using index
2	SUBQUERY	accountIdentity	NULL	range	PRIMARY,IDX_ID_TYPE_NAME	IDX_ID_TYPE_NAME	602	NULL	1	100	Using where; Using index

2）考虑ORDER BY优化

加上原先的ORDER BY之后，SQL如下：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


select account.ACCOUNT_NAME
from TB_B_ACCOUNT account
where 1 = 1
  and (
        account.ACCOUNT_NAME like 'user%'
        OR account.USER_NAME like 'user%'
        OR account.IDENTITY_TYPE_ID in (select accountIdentity.ID from TB_B_IDENTITY_TYPE accountIdentity where accountIdentity.NAME like 'user%')
        OR account.ORGANIZATION_ID in  (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where accountOrganization.NAME like 'user%')
    )
  and account.USER_NAME like 'user%'
  and account.ACCOUNT_NAME like 'user%'
  and account.ORGANIZATION_ID in
      (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where DELETED = 0)
ORDER BY length(account.ACCOUNT_NAME), account.ACCOUNT_NAME;

EXPLAIN结果：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	accountOrganization	NULL	ALL	PRIMARY	NULL	NULL	NULL	84	10	Using where; Using temporary; Using filesort
1	PRIMARY	account	NULL	ref	UQ_ACCOUNT_NAME,ORGANIZATION_ID,UQ_ACCOUNT_USRNAME,IDX_ACCOUNT_USERNAME	ORGANIZATION_ID	194	user_new.accountOrganization.ID	3637	25	Using where
3	SUBQUERY	accountOrganization	NULL	range	PRIMARY,IDX_ORG_NAME	IDX_ORG_NAME	602	NULL	1	100	Using where; Using index
2	SUBQUERY	accountIdentity	NULL	range	PRIMARY,IDX_ID_TYPE_NAME	IDX_ID_TYPE_NAME	602	NULL	1	100	Using where; Using index

第一行可以看到 Using temporary; Using filesort。

Using temporary意思是为了排序，使用了临时表hold住结果。
Using filesort意思是无法使用索引直接得到排好序的数据，需要利用内存或者磁盘文件排序。

三种方案评估

那么对于这个SQL有三种做法，下面分析它们的复杂度。

先汇总一下基本情况：

A表 accountOrganization表有N行
B表 account表有M行
JOIN条件A.ID = B.ORGANIZATION_ID
B.ORGANIZATION上有索引。

列出基本算法复杂度

B+树查找复杂度 Log2(n)
排序复杂度是 n * Log2(n)
堆插入复杂度 Log2(n)

原方案

A做驱动表 N=500，B是被驱动表 M=14w，对结果做普通排序。

算法复杂度分析：

扫描A表 + A表每一行利用B表索引查找 + B表每条记录查找主键索引（回表） + B表记录的排序。

1
2
3


  N + N * Log2(M) + M * Log2(M) + M * Log2(M)
= 500 + 500 * Log2(14w) + 14w * Log2(14w) + 14w * Log2(14w)
= 4,372,745

优化方案一

A做驱动表 N=500，B是被驱动表 M=14w，利用filesort with small LIMIT optimization优化，即利用一个尺寸20的堆（LIMIT 20）来取前20条数据，前提是数据能够在sort_buffer_size里放得下。

算法复杂度分析：

扫描A表 + A表每一行利用B表索引查找 + B表每行查找主键索引（回表） + B表每行过一遍堆。

1
2
3


  N + N * Log2(M) + M * Log2(M) + M * Log2(20)
= 500 + 500 * Log2(14w) + 14w * Log2(14w) + 14w * Log2(20)
= 2,612,945

优化方案二

B做驱动表 M=14w，A是被驱动表 N=500，在B表新建字段ACCOUNT_NAME_LEN，并建立索引ACCOUNT_NAME_LEN,ACCOUNT_NAME，查询时强制走ACCOUNT_LEN_NAME索引从而达避免排序。

算法复杂度分析：

扫描B表索引 + B表每一行利用A表主键查找。

1
2
3


  M + M * Log2(N)
= 14w + 14w * Log2(500)
= 1,394,400

方案抉择

虽然从分析结果看，方案二更好，但是方案二强制使用了某个索引。而方案一看上去不好，但是MySQL可能会根据查询条件使用合适的索引来过滤数据避免全表扫描，所以选择方案一。

实现优化方案一

实现优化方案一的关键是让filesort在内存中发生，然后配合LIMIT，我们需要跟踪优化器来查看原始SQL执行情况：

1
2
3
4
5
6
7


SET optimizer_trace="enabled=on";
SET OPTIMIZER_TRACE_MAX_MEM_SIZE=10485760; 
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";

得到结果：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


"filesort_priority_queue_optimization": {
  "limit": 1000,
  "chosen": false,
  "cause": "sort_is_cheaper"
},
"filesort_summary": {
  "memory_available": 262144,
  "key_size": 248,
  "row_size": 622,
  "max_rows_per_buffer": 421,
  "num_rows_estimate": 15,
  "num_rows_found": 131072,
  "num_initial_chunks_spilled_to_disk": 141,
  "peak_memory_used": 262144,
  "sort_algorithm": "std::stable_sort",
  "sort_mode": "<fixed_sort_key, packed_additional_fields>"
}

可以看到num_initial_chunks_spilled_to_disk:141，写了141个临时文件。

让filesort在内存中发生的相关系统参数有：

sort_buffer_size，规定了用于排序的缓存大小（字节），默认262144=256K。如果排序的数据在此buffer中能hold住，那么就会避免写磁盘。
max_sort_length，规定了排序时，每行最多取前多少个字节，如果行的尺寸超出该值，MySQL也就只会使用前多少个字节进行比较，该参数默认1024=1K。
Sort_merge_passes，是一个状态变量，表示了排序合并的次数。

我们先设置sort_buffer_size到1M（默认256K）

1
2


set sort_buffer_size=1048576;         # 只针对当前session有效
set persist sort_buffer_size=1048576; # 全局有效

执行SQL，注意添加了LIMIT：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


select account.ACCOUNT_NAME
from TB_B_ACCOUNT account
where 1 = 1
  and (
        account.ACCOUNT_NAME like 'user%'
        OR account.USER_NAME like 'user%'
        OR account.IDENTITY_TYPE_ID in (select accountIdentity.ID from TB_B_IDENTITY_TYPE accountIdentity where accountIdentity.NAME like 'user%')
        OR account.ORGANIZATION_ID in  (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where accountOrganization.NAME like 'user%')
    )
  and account.USER_NAME like 'user%'
  and account.ACCOUNT_NAME like 'user%'
  and account.ORGANIZATION_ID in
      (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where DELETED = 0)
ORDER BY length(account.ACCOUNT_NAME), account.ACCOUNT_NAME
LIMIT 20;

然后跟踪优化器：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


"filesort_priority_queue_optimization": {
  "limit": 1000,
  "chosen": true
},
"filesort_summary": {
  "memory_available": 1048576,
  "key_size": 248,
  "row_size": 618,
  "max_rows_per_buffer": 1001,
  "num_rows_estimate": 7623,
  "num_rows_found": 131072,
  "num_initial_chunks_spilled_to_disk": 0,
  "peak_memory_used": 626626,
  "sort_algorithm": "std::stable_sort",
  "unpacked_addon_fields": "using_priority_queue",
  "sort_mode": "<fixed_sort_key, additional_fields>"
}

可以看到filesort_priority_queue_optimization启用了，同时num_initial_chunks_spilled_to_disk变成了0。

filesort_priority_queue_optimization启用意味着MySQL采用了优先级队列（堆）来从结果集中取最小的N个元素。

在业务场景下，大部分分页每页20条，极少超过10页，因此这里不考虑深分页问题，所以加上LIMIT 200, 220看看：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


select account.ACCOUNT_NAME
from TB_B_ACCOUNT account
where 1 = 1
  and (
        account.ACCOUNT_NAME like 'user%'
        OR account.USER_NAME like 'user%'
        OR account.IDENTITY_TYPE_ID in (select accountIdentity.ID from TB_B_IDENTITY_TYPE accountIdentity where accountIdentity.NAME like 'user%')
        OR account.ORGANIZATION_ID in  (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where accountOrganization.NAME like 'user%')
    )
  and account.USER_NAME like 'user%'
  and account.ACCOUNT_NAME like 'user%'
  and account.ORGANIZATION_ID in
      (select accountOrganization.ID FROM TB_B_ORGANIZATION accountOrganization where DELETED = 0)
ORDER BY length(account.ACCOUNT_NAME), account.ACCOUNT_NAME
LIMIT 200, 220;

跟踪优化器结果：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


"filesort_priority_queue_optimization": {
  "limit": 420,
  "chosen": true
},
"filesort_summary": {
  "memory_available": 1048576,
  "key_size": 248,
  "row_size": 618,
  "max_rows_per_buffer": 421,
  "num_rows_estimate": 7623,
  "num_rows_found": 131072,
  "num_initial_chunks_spilled_to_disk": 0,
  "peak_memory_used": 263546,
  "sort_algorithm": "std::stable_sort",
  "unpacked_addon_fields": "using_priority_queue",
  "sort_mode": "<fixed_sort_key, additional_fields>"
}

可以看到filesort_priority_queue_optimization依然启用，而且num_initial_chunks_spilled_to_disk依然是0。

结论

做了以下优化：

WHERE用到的字段都加上索引。
LIKE改写成前缀匹配，这样可以利用索引。
避免JOIN，分两次查询，第一次先查ACCOUNT_NAME，第二次根据ACCOUNT_NAME查询想要的数据。
增加sort_buffer_size，避免filesort排序时写磁盘，并且利用堆+LIMIT来优化排序。

SQL优化:一个含有JOIN+LIMIT+ORDER BY的SQL(上)

1）排除ORDER BY和LIMIT

检查字段索引

去掉JOIN

优化LIKE

2）考虑ORDER BY优化

三种方案评估

原方案

优化方案一

优化方案二

方案抉择

实现优化方案一

结论

参考资料

版权

评论

SQL优化:一个含有JOIN+LIMIT+ORDER BY的SQL(上)

1）排除ORDER BY和LIMIT

检查字段索引

去掉JOIN

优化LIKE

2）考虑ORDER BY优化

三种方案评估

原方案

优化方案一

优化方案二

方案抉择

实现优化方案一

结论

参考资料

版权

相关文章

评论