Finding differences between two tables takes too long
I'm trying to find the diff of two tables using a left join.
My query is as follows:
CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;
- Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column
- Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)
The query takes a mind-boggling 11 min to finish.
I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.
What can I do to improve this?
mysql query-performance
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
I'm trying to find the diff of two tables using a left join.
My query is as follows:
CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;
- Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column
- Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)
The query takes a mind-boggling 11 min to finish.
I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.
What can I do to improve this?
mysql query-performance
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.
– Rick James
Aug 25 '15 at 21:05
What is the value ofinnodb_buffer_pool_size
?
– Rick James
Aug 25 '15 at 21:08
Could you pleaseshow create table
s' structures?
– Jehad Keriaki
Aug 26 '15 at 3:16
add a comment |
I'm trying to find the diff of two tables using a left join.
My query is as follows:
CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;
- Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column
- Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)
The query takes a mind-boggling 11 min to finish.
I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.
What can I do to improve this?
mysql query-performance
I'm trying to find the diff of two tables using a left join.
My query is as follows:
CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;
- Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column
- Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)
The query takes a mind-boggling 11 min to finish.
I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.
What can I do to improve this?
mysql query-performance
mysql query-performance
edited Aug 24 '15 at 16:07
Paul White♦
51k14278450
51k14278450
asked Aug 24 '15 at 12:58
space_monkeyspace_monkey
1112
1112
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.
– Rick James
Aug 25 '15 at 21:05
What is the value ofinnodb_buffer_pool_size
?
– Rick James
Aug 25 '15 at 21:08
Could you pleaseshow create table
s' structures?
– Jehad Keriaki
Aug 26 '15 at 3:16
add a comment |
You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.
– Rick James
Aug 25 '15 at 21:05
What is the value ofinnodb_buffer_pool_size
?
– Rick James
Aug 25 '15 at 21:08
Could you pleaseshow create table
s' structures?
– Jehad Keriaki
Aug 26 '15 at 3:16
You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.
– Rick James
Aug 25 '15 at 21:05
You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.
– Rick James
Aug 25 '15 at 21:05
What is the value of
innodb_buffer_pool_size
?– Rick James
Aug 25 '15 at 21:08
What is the value of
innodb_buffer_pool_size
?– Rick James
Aug 25 '15 at 21:08
Could you please
show create table
s' structures?– Jehad Keriaki
Aug 26 '15 at 3:16
Could you please
show create table
s' structures?– Jehad Keriaki
Aug 26 '15 at 3:16
add a comment |
2 Answers
2
active
oldest
votes
- Create the result table,
my_diff
, without index, and when it is
populated, add the desired index. - Make sure you have an index on (name, type) in both tables. i.e. a composite index
- It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.
From your explain
result, it looks like you don't have the index on (name, type), so for each row in AA
, it will do a full scan to BB. This is one enough reason to make the query slow.
On the contrary, it saysUsing index
for both tables, implying that there are composite indexes on both tables.
– Rick James
Aug 25 '15 at 21:02
#2 -- he clearly has that in both tables, since it saysUsing index
. There is no full scan of BB; it's even caching the index in the Join buffer.
– Rick James
Aug 25 '15 at 21:07
add a comment |
Create the temp table as follows
#
# Make Temp Table
#
CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
#
# Load Unique Keys From Table AA
#
INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
#
# Load Keys From Table BB
#
INSERT INTO mydiff (user,type) SELECT user,type FROM BB
ON DUPLICATE KEY UPDATE diffcount=diffcount+1;
Then, count the missing keys
SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
FROM mydiff GROUP BY diffcount WITH ROLLUP;
The output for diffcount of 1 shows how many (user,type)
are not represented
To see the unused (user,type)
records, run this
SELECT user,type FROM mydiff WHERE diffcount = 1;
GIVE IT A TRY !!!
It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f112005%2ffinding-differences-between-two-tables-takes-too-long%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
- Create the result table,
my_diff
, without index, and when it is
populated, add the desired index. - Make sure you have an index on (name, type) in both tables. i.e. a composite index
- It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.
From your explain
result, it looks like you don't have the index on (name, type), so for each row in AA
, it will do a full scan to BB. This is one enough reason to make the query slow.
On the contrary, it saysUsing index
for both tables, implying that there are composite indexes on both tables.
– Rick James
Aug 25 '15 at 21:02
#2 -- he clearly has that in both tables, since it saysUsing index
. There is no full scan of BB; it's even caching the index in the Join buffer.
– Rick James
Aug 25 '15 at 21:07
add a comment |
- Create the result table,
my_diff
, without index, and when it is
populated, add the desired index. - Make sure you have an index on (name, type) in both tables. i.e. a composite index
- It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.
From your explain
result, it looks like you don't have the index on (name, type), so for each row in AA
, it will do a full scan to BB. This is one enough reason to make the query slow.
On the contrary, it saysUsing index
for both tables, implying that there are composite indexes on both tables.
– Rick James
Aug 25 '15 at 21:02
#2 -- he clearly has that in both tables, since it saysUsing index
. There is no full scan of BB; it's even caching the index in the Join buffer.
– Rick James
Aug 25 '15 at 21:07
add a comment |
- Create the result table,
my_diff
, without index, and when it is
populated, add the desired index. - Make sure you have an index on (name, type) in both tables. i.e. a composite index
- It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.
From your explain
result, it looks like you don't have the index on (name, type), so for each row in AA
, it will do a full scan to BB. This is one enough reason to make the query slow.
- Create the result table,
my_diff
, without index, and when it is
populated, add the desired index. - Make sure you have an index on (name, type) in both tables. i.e. a composite index
- It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.
From your explain
result, it looks like you don't have the index on (name, type), so for each row in AA
, it will do a full scan to BB. This is one enough reason to make the query slow.
answered Aug 24 '15 at 14:50
Jehad KeriakiJehad Keriaki
2,5831813
2,5831813
On the contrary, it saysUsing index
for both tables, implying that there are composite indexes on both tables.
– Rick James
Aug 25 '15 at 21:02
#2 -- he clearly has that in both tables, since it saysUsing index
. There is no full scan of BB; it's even caching the index in the Join buffer.
– Rick James
Aug 25 '15 at 21:07
add a comment |
On the contrary, it saysUsing index
for both tables, implying that there are composite indexes on both tables.
– Rick James
Aug 25 '15 at 21:02
#2 -- he clearly has that in both tables, since it saysUsing index
. There is no full scan of BB; it's even caching the index in the Join buffer.
– Rick James
Aug 25 '15 at 21:07
On the contrary, it says
Using index
for both tables, implying that there are composite indexes on both tables.– Rick James
Aug 25 '15 at 21:02
On the contrary, it says
Using index
for both tables, implying that there are composite indexes on both tables.– Rick James
Aug 25 '15 at 21:02
#2 -- he clearly has that in both tables, since it says
Using index
. There is no full scan of BB; it's even caching the index in the Join buffer.– Rick James
Aug 25 '15 at 21:07
#2 -- he clearly has that in both tables, since it says
Using index
. There is no full scan of BB; it's even caching the index in the Join buffer.– Rick James
Aug 25 '15 at 21:07
add a comment |
Create the temp table as follows
#
# Make Temp Table
#
CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
#
# Load Unique Keys From Table AA
#
INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
#
# Load Keys From Table BB
#
INSERT INTO mydiff (user,type) SELECT user,type FROM BB
ON DUPLICATE KEY UPDATE diffcount=diffcount+1;
Then, count the missing keys
SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
FROM mydiff GROUP BY diffcount WITH ROLLUP;
The output for diffcount of 1 shows how many (user,type)
are not represented
To see the unused (user,type)
records, run this
SELECT user,type FROM mydiff WHERE diffcount = 1;
GIVE IT A TRY !!!
It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.
add a comment |
Create the temp table as follows
#
# Make Temp Table
#
CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
#
# Load Unique Keys From Table AA
#
INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
#
# Load Keys From Table BB
#
INSERT INTO mydiff (user,type) SELECT user,type FROM BB
ON DUPLICATE KEY UPDATE diffcount=diffcount+1;
Then, count the missing keys
SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
FROM mydiff GROUP BY diffcount WITH ROLLUP;
The output for diffcount of 1 shows how many (user,type)
are not represented
To see the unused (user,type)
records, run this
SELECT user,type FROM mydiff WHERE diffcount = 1;
GIVE IT A TRY !!!
It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.
add a comment |
Create the temp table as follows
#
# Make Temp Table
#
CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
#
# Load Unique Keys From Table AA
#
INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
#
# Load Keys From Table BB
#
INSERT INTO mydiff (user,type) SELECT user,type FROM BB
ON DUPLICATE KEY UPDATE diffcount=diffcount+1;
Then, count the missing keys
SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
FROM mydiff GROUP BY diffcount WITH ROLLUP;
The output for diffcount of 1 shows how many (user,type)
are not represented
To see the unused (user,type)
records, run this
SELECT user,type FROM mydiff WHERE diffcount = 1;
GIVE IT A TRY !!!
It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.
Create the temp table as follows
#
# Make Temp Table
#
CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
#
# Load Unique Keys From Table AA
#
INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
#
# Load Keys From Table BB
#
INSERT INTO mydiff (user,type) SELECT user,type FROM BB
ON DUPLICATE KEY UPDATE diffcount=diffcount+1;
Then, count the missing keys
SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
FROM mydiff GROUP BY diffcount WITH ROLLUP;
The output for diffcount of 1 shows how many (user,type)
are not represented
To see the unused (user,type)
records, run this
SELECT user,type FROM mydiff WHERE diffcount = 1;
GIVE IT A TRY !!!
It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.
edited Aug 24 '15 at 16:01
answered Aug 24 '15 at 15:56
RolandoMySQLDBARolandoMySQLDBA
141k24221379
141k24221379
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f112005%2ffinding-differences-between-two-tables-takes-too-long%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.
– Rick James
Aug 25 '15 at 21:05
What is the value of
innodb_buffer_pool_size
?– Rick James
Aug 25 '15 at 21:08
Could you please
show create table
s' structures?– Jehad Keriaki
Aug 26 '15 at 3:16