Wednesday, January 30, 2013

IN Subqueries in MySQL 5.6 Are Optimized Away

IN Subqueries in MySQL 5.6 Are Optimized Away:
In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Here’s a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 optimizer statistics):
SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)
Before there were any subquery optimizations, say if you are using MySQL 5.1, you would have to rewrite this query as a JOIN, to avoid the dreaded DEPENDENT SUBQUERY that shows up in the EXPLAIN:
mysql> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G

*************************** 1. row ***************************

id: 1

select_type: PRIMARY

table: film

type: index

possible_keys: NULL

key: idx_title

key_len: 767

ref: NULL

rows: 1025

Extra: Using where; Using index

*************************** 2. row ***************************

id: 2

select_type: DEPENDENT SUBQUERY

table: film_actor

type: index_subquery

possible_keys: idx_fk_film_id

key: idx_fk_film_id

key_len: 2

ref: func

rows: 1

Extra: Using index

MariaDB 5.5 boasts subquery optimization, and rightfully so. It looks like MariaDB materializes the subquery:
MariaDB [sakila]> explain SELECT title FROM film WHERE film_id IN

-> (SELECT film_id FROM film_actor)\G

*************************** 1. row ***************************

id: 1

select_type: PRIMARY

table: film

type: index

possible_keys: PRIMARY

key: idx_title

key_len: 767

ref: NULL

rows: 1043

Extra: Using index

*************************** 2. row ***************************

id: 1

select_type: PRIMARY

table:

type: eq_ref

possible_keys: distinct_key

key: distinct_key

key_len: 2

ref: func

rows: 1

Extra:

*************************** 3. row ***************************

id: 2

select_type: MATERIALIZED

table: film_actor

type: index

possible_keys: idx_fk_film_id

key: idx_fk_film_id

key_len: 2

ref: NULL

rows: 4889

Extra: Using index

So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery. There are 3 rows here, a new <subquery2> table is used to help optimize the results.
In MySQL 5.6, the subquery is actually optimized away. The optimizer actually changes it. You can see this in the EXPLAIN plan by looking at the select_type. In both the MySQL 5.1 and MariaDB 5.5 examples, the select_type is PRIMARY, which is used for the outer query in a subquery (or the first SELECT in a UNION, but that does not apply here). In MySQL 5.6, the select_type is SIMPLE for both rows. Note that MySQL 5.6 also does not have to add a third table as MariaDB does:
mysql> explain SELECT title FROM film WHERE film_id IN

-> (SELECT film_id FROM film_actor)\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: film

type: index

possible_keys: PRIMARY

key: idx_title

key_len: 767

ref: NULL

rows: 1000

Extra: Using index

*************************** 2. row ***************************

id: 1

select_type: SIMPLE

table: film_actor

type: ref

possible_keys: idx_fk_film_id

key: idx_fk_film_id

key_len: 2

ref: sakila.film.film_id

rows: 1

Extra: Using index; FirstMatch(film)

2 rows in set (0.00 sec)

In the presentation, the Oracle team says that for DBT3 Query #18, “execution time reduces from days to seconds”. With optimizations like this, I believe it!

PlanetMySQL Voting:
Vote UP /
Vote DOWN

DIGITAL JUICE

No comments:

Post a Comment

Thank's!