Finding differences between two tables takes too long

I'm trying to find the diff of two tables using a left join.

My query is as follows:

CREATE TABLE my_diff (INDEX my_index (name, type))

SELECT AA.name, AA.type

FROM AA

LEFT JOIN BB

USING (name, type)

WHERE BB.type IS NULL;

Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)

EXPLAIN EXTENDED:

id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra

1   SIMPLE          AA      index   NULL            PRIMARY     383         NULL    396178  100.00      Using index

1   SIMPLE          BB      index   NULL            my_index    104         NULL    8359    100.00      Using where; Not exists; Using index; Using join buffer (Block Nested Loop)

The query takes a mind-boggling 11 min to finish.

I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.

What can I do to improve this?

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

asked Aug 24 '15 at 12:58

space_monkey

1112

bumped to the homepage by Community♦ 3 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05

What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08

Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16

add a comment |

I'm trying to find the diff of two tables using a left join.

My query is as follows:

CREATE TABLE my_diff (INDEX my_index (name, type))

SELECT AA.name, AA.type

FROM AA

LEFT JOIN BB

USING (name, type)

WHERE BB.type IS NULL;

Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)

EXPLAIN EXTENDED:

id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra

1   SIMPLE          AA      index   NULL            PRIMARY     383         NULL    396178  100.00      Using index

1   SIMPLE          BB      index   NULL            my_index    104         NULL    8359    100.00      Using where; Not exists; Using index; Using join buffer (Block Nested Loop)

The query takes a mind-boggling 11 min to finish.

I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.

What can I do to improve this?

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

asked Aug 24 '15 at 12:58

space_monkey

1112

bumped to the homepage by Community♦ 3 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05

What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08

Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16

add a comment |

I'm trying to find the diff of two tables using a left join.

My query is as follows:

CREATE TABLE my_diff (INDEX my_index (name, type))

SELECT AA.name, AA.type

FROM AA

LEFT JOIN BB

USING (name, type)

WHERE BB.type IS NULL;

Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)

EXPLAIN EXTENDED:

id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra

1   SIMPLE          AA      index   NULL            PRIMARY     383         NULL    396178  100.00      Using index

1   SIMPLE          BB      index   NULL            my_index    104         NULL    8359    100.00      Using where; Not exists; Using index; Using join buffer (Block Nested Loop)

The query takes a mind-boggling 11 min to finish.

I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.

What can I do to improve this?

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

asked Aug 24 '15 at 12:58

space_monkey

1112

I'm trying to find the diff of two tables using a left join.

My query is as follows:

CREATE TABLE my_diff (INDEX my_index (name, type))

SELECT AA.name, AA.type

FROM AA

LEFT JOIN BB

USING (name, type)

WHERE BB.type IS NULL;

Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)

EXPLAIN EXTENDED:

id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra

1   SIMPLE          AA      index   NULL            PRIMARY     383         NULL    396178  100.00      Using index

1   SIMPLE          BB      index   NULL            my_index    104         NULL    8359    100.00      Using where; Not exists; Using index; Using join buffer (Block Nested Loop)

The query takes a mind-boggling 11 min to finish.

I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.

What can I do to improve this?

mysql query-performance

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

asked Aug 24 '15 at 12:58

space_monkey

1112

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

asked Aug 24 '15 at 12:58

space_monkey

1112

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

edited Aug 24 '15 at 16:07

Paul White♦

51k14278450

asked Aug 24 '15 at 12:58

space_monkey

1112

asked Aug 24 '15 at 12:58

space_monkey

1112

asked Aug 24 '15 at 12:58

space_monkey

1112

bumped to the homepage by Community♦ 3 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 3 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05

What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08

Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16

add a comment |

You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05

What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08

Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16

You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05

What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08

Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16

add a comment |

2 Answers
2

active

oldest

votes

Create the result table, my_diff, without index, and when it is
populated, add the desired index.

Make sure you have an index on (name, type) in both tables. i.e. a composite index

It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.

From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

– Rick James
Aug 25 '15 at 21:02

#2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

– Rick James
Aug 25 '15 at 21:07

add a comment |

Create the temp table as follows

#

# Make Temp Table

#

CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;

ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;

ALTER TABLE mydiff ADD PRIMARY KEY (user,type);

#

# Load Unique Keys From Table AA

#

INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;

#

# Load Keys From Table BB

#

INSERT INTO mydiff (user,type) SELECT user,type FROM BB

ON DUPLICATE KEY UPDATE diffcount=diffcount+1;

Then, count the missing keys

SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount

FROM mydiff GROUP BY diffcount WITH ROLLUP;

The output for diffcount of 1 shows how many (user,type) are not represented

To see the unused (user,type) records, run this

SELECT user,type FROM mydiff WHERE diffcount = 1;

GIVE IT A TRY !!!

It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.

edited Aug 24 '15 at 16:01

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f112005%2ffinding-differences-between-two-tables-takes-too-long%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Create the result table, my_diff, without index, and when it is
populated, add the desired index.

Make sure you have an index on (name, type) in both tables. i.e. a composite index

It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.

From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

– Rick James
Aug 25 '15 at 21:02

#2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

– Rick James
Aug 25 '15 at 21:07

add a comment |

Create the result table, my_diff, without index, and when it is
populated, add the desired index.

Make sure you have an index on (name, type) in both tables. i.e. a composite index

It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.

From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

– Rick James
Aug 25 '15 at 21:02

#2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

– Rick James
Aug 25 '15 at 21:07

add a comment |

Create the result table, my_diff, without index, and when it is
populated, add the desired index.

Make sure you have an index on (name, type) in both tables. i.e. a composite index

It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.

From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

Create the result table, my_diff, without index, and when it is
populated, add the desired index.

Make sure you have an index on (name, type) in both tables. i.e. a composite index

It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.

From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

answered Aug 24 '15 at 14:50

Jehad Keriaki

2,5831813

On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

– Rick James
Aug 25 '15 at 21:02

#2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

– Rick James
Aug 25 '15 at 21:07

add a comment |

On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

– Rick James
Aug 25 '15 at 21:02

#2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

– Rick James
Aug 25 '15 at 21:07

On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

– Rick James
Aug 25 '15 at 21:02

#2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

– Rick James
Aug 25 '15 at 21:07

add a comment |

Create the temp table as follows

#

# Make Temp Table

#

CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;

ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;

ALTER TABLE mydiff ADD PRIMARY KEY (user,type);

#

# Load Unique Keys From Table AA

#

INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;

#

# Load Keys From Table BB

#

INSERT INTO mydiff (user,type) SELECT user,type FROM BB

ON DUPLICATE KEY UPDATE diffcount=diffcount+1;

Then, count the missing keys

SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount

FROM mydiff GROUP BY diffcount WITH ROLLUP;

The output for diffcount of 1 shows how many (user,type) are not represented

To see the unused (user,type) records, run this

SELECT user,type FROM mydiff WHERE diffcount = 1;

GIVE IT A TRY !!!

It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.

edited Aug 24 '15 at 16:01

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

add a comment |

Create the temp table as follows

#

# Make Temp Table

#

CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;

ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;

ALTER TABLE mydiff ADD PRIMARY KEY (user,type);

#

# Load Unique Keys From Table AA

#

INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;

#

# Load Keys From Table BB

#

INSERT INTO mydiff (user,type) SELECT user,type FROM BB

ON DUPLICATE KEY UPDATE diffcount=diffcount+1;

Then, count the missing keys

SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount

FROM mydiff GROUP BY diffcount WITH ROLLUP;

The output for diffcount of 1 shows how many (user,type) are not represented

To see the unused (user,type) records, run this

SELECT user,type FROM mydiff WHERE diffcount = 1;

GIVE IT A TRY !!!

It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.

edited Aug 24 '15 at 16:01

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

add a comment |

Create the temp table as follows

#

# Make Temp Table

#

CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;

ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;

ALTER TABLE mydiff ADD PRIMARY KEY (user,type);

#

# Load Unique Keys From Table AA

#

INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;

#

# Load Keys From Table BB

#

INSERT INTO mydiff (user,type) SELECT user,type FROM BB

ON DUPLICATE KEY UPDATE diffcount=diffcount+1;

Then, count the missing keys

SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount

FROM mydiff GROUP BY diffcount WITH ROLLUP;

The output for diffcount of 1 shows how many (user,type) are not represented

To see the unused (user,type) records, run this

SELECT user,type FROM mydiff WHERE diffcount = 1;

GIVE IT A TRY !!!

It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.

edited Aug 24 '15 at 16:01

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

Create the temp table as follows

#

# Make Temp Table

#

CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;

ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;

ALTER TABLE mydiff ADD PRIMARY KEY (user,type);

#

# Load Unique Keys From Table AA

#

INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;

#

# Load Keys From Table BB

#

INSERT INTO mydiff (user,type) SELECT user,type FROM BB

ON DUPLICATE KEY UPDATE diffcount=diffcount+1;

Then, count the missing keys

SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount

FROM mydiff GROUP BY diffcount WITH ROLLUP;

The output for diffcount of 1 shows how many (user,type) are not represented

To see the unused (user,type) records, run this

SELECT user,type FROM mydiff WHERE diffcount = 1;

GIVE IT A TRY !!!

It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.

edited Aug 24 '15 at 16:01

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

edited Aug 24 '15 at 16:01

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

answered Aug 24 '15 at 15:56

RolandoMySQLDBA

141k24221379

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vryjdfkk