(*) 选择第一个JOIN的表为A
(**) 确定A表的访问方式
因为A表是第一个表,所以无法使用索引`IND_DID`(B.DepartmentID = A.DepartmentID)
那么只能使用索引`IND_L_D`(A.LastName = 'zhou')
使用IND_L_D索引的成本计算,总成本为25.2;参考前面计算;
(**) 这里访问A表的成本已经是25.2,比之前的最优成本2.4要大,忽略该顺序
所以,这次穷举搜索到此结束
把上面的过程简化如下:
(*) 选择第一个JOIN的表为B
(**) 确定B表的访问方式
(**) 从剩余的表中穷举选出第二个JOIN的表,这里剩余的表为:A
(**) 将A表加入JOIN,并确定其访问方式
(***) IND_L_D A.LastName = 'zhou'
(***) IND_DID B.DepartmentID = A.DepartmentID
(***) IND_L_D成本为25.2;IND_DID成本为1.2,所以选择后者为当前表的访问方式
(**) 确定A使用索引IND_DID,访问方式为ref
(**) JOIN顺序B|A,总成本为:1.2+1.2 = 2.4
(*) 选择第一个JOIN的表为A
(**) 确定A表的访问方式
(**) 这里访问A表的成本已经是25.2,比之前的最优成本2.4要大,忽略该顺序
至此,MySQL优化器就确定了所有表的最佳JOIN顺序和访问方式。
3. 测试环境
MySQL: 5.1.48-debug-log innodb plugin 1.0.9
CREATE TABLE `department` (
`DepartmentID` int(11) DEFAULT NULL,
`DepartmentName` varchar(20) DEFAULT NULL,
KEY `IND_D` (`DepartmentID`),
KEY `IND_DN` (`DepartmentName`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;
CREATE TABLE `employee` (
`LastName` varchar(20) DEFAULT NULL,
`DepartmentID` int(11) DEFAULT NULL,
KEY `IND_L_D` (`LastName`),
KEY `IND_DID` (`DepartmentID`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;
for i in `seq 1 1000` ; do mysql -vvv -uroot test -e 'insert into department values (600000*rand(),repeat(char(65+rand()*58),rand()*20))'; done
for i in `seq 1 1000` ; do mysql -vvv -uroot test -e 'insert into employee values (repeat(char(65+rand()*58),rand()*20),600000*rand())'; done
for i in `seq 1 50` ; do mysql -vvv -uroot test -e 'insert into employee values ("zhou",27760)'; done
for i in `seq 1 200` ; do mysql -vvv -uroot test -e 'insert into employee values (repeat(char(65+rand()*58),rand()*20),27760)'; done
for i in `seq 1 1` ; do mysql -vvv -uroot test -e 'insert into department values (27760,"TBX")'; done
show index from employee;
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| employee | 1 | IND_L_D | 1 | LastName | A | 1349 | NULL | NULL | YES | BTREE | |
| employee | 1 | IND_DID | 1 | DepartmentID | A | 1349 | NULL | NULL | YES | BTREE | |
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
show index from department;
+------------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| department | 1 | IND_D | 1 | DepartmentID | A | 1001 | NULL | NULL | YES | BTREE | |
| department | 1 | IND_DN | 1 | DepartmentName | A | 1001 | NULL | NULL | YES | BTREE | |
+------------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
4. 构造一个Bad case
因为关联条件中MySQL使用索引统计信息做成本预估,所以数据分布不均匀的时候,就容易做出错误的判断。简单的我们构造下面的案例:
表和索引结构不变,按照下面的方式构造数据:
for i in `seq 1 10000` ; do mysql -uroot test -e 'insert into department values (600000*rand(),repeat(char(65+rand()*58),rand()*20))'; done
for i in `seq 1 10000` ; do mysql -uroot test -e 'insert into employee values (repeat(char(65+rand()*58),rand()*20),600000*rand())'; done
for i in `seq 1 1` ; do mysql -uroot test -e 'insert into employee values ("zhou",27760)'; done
for i in `seq 1 10` ; do mysql -uroot test -e 'insert into department values (27760,"TBX")'; done
原文转自:http://www.orczhou.com/index.php/2013/04/how-mysql-choose-index-in-a-join/