Deleting duplicates with group by and count
What is the fastest method to convert the following query:
SELECT COUNT(*) as c FROM tbl_fields
WHERE fieldnotes IS NULL
GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
HAVING COUNT(*) > 1;
... into one that will delete the duplicate records? The table does not have a primary key and contains several million entries.
Tags: mysql, duplication
No primary key? Ugh.
– Derek Downey, Dec 22 '11 at 15:35
No primary key, and "several million entries"
– Aaron, Dec 22 '11 at 15:46
Tell me about it :)
– tlvince, Dec 22 '11 at 15:46
Is there a field that is at least unique?
– Aaron, Dec 22 '11 at 16:00
Is it possible to add a column to the table, fill it with a unique key, and then drop the column after the clean-up?
– Lumpy, Dec 22 '11 at 16:27
asked Dec 22 '11 at 15:07 by tlvince
3 Answers
According to your query, you treat fieldno, fieldserial, id, fielddate and fieldsid as the rule for uniqueness in the GROUP BY clause.
You can try this:
CREATE TABLE tbl_fields_unique LIKE tbl_fields;
ALTER TABLE tbl_fields_unique
ADD UNIQUE KEY unq (fieldno,fieldserial,id,fielddate,fieldsid);
INSERT IGNORE INTO tbl_fields_unique
SELECT * FROM tbl_fields;
This will filter out rows with duplicate (fieldno, fieldserial, id, fielddate, fieldsid) values, keeping one copy of each. Look over the new table. Once you are satisfied with the contents of the tbl_fields_unique table, do this:
ALTER TABLE tbl_fields RENAME tbl_fields_old;
ALTER TABLE tbl_fields_unique RENAME tbl_fields;
Give it a try!
– answered by RolandoMySQLDBA, Dec 22 '11 at 17:07
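The swap-table technique above can be sketched in miniature with SQLite from Python. This is an illustration only, not the answer's exact MySQL statements: SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE, and the table contents and column subset here are invented for the demo.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A table with no primary key and some fully duplicated rows.
cur.execute("CREATE TABLE tbl_fields (fieldno INTEGER, fieldserial INTEGER)")
cur.executemany("INSERT INTO tbl_fields VALUES (?, ?)",
                [(1, 10), (1, 10), (2, 20), (2, 20), (2, 20), (3, 30)])

# An empty copy guarded by a unique index over the dedup columns.
cur.execute("CREATE TABLE tbl_fields_unique (fieldno INTEGER, fieldserial INTEGER)")
cur.execute("CREATE UNIQUE INDEX unq ON tbl_fields_unique (fieldno, fieldserial)")

# INSERT OR IGNORE keeps the first row of each duplicate group and
# silently drops the rest (MySQL's INSERT IGNORE behaves the same way).
cur.execute("INSERT OR IGNORE INTO tbl_fields_unique SELECT * FROM tbl_fields")

print(cur.execute("SELECT COUNT(*) FROM tbl_fields_unique").fetchone()[0])  # 3
```

The original table is untouched until the final RENAME swap, which is what makes the approach safe to verify before committing.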
It seems to have worked, and very quickly at that. However, using a SUM on the count field in my query above, I get 3847, whereas your suggestion finds 5500 duplicates. I'll check the results and get back to you. Thanks.
– tlvince, Dec 22 '11 at 23:08
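Two effects could plausibly account for a mismatch like the one in the comment above: the original SELECT only examines rows with fieldnotes IS NULL while the INSERT IGNORE copy dedupes the whole table, and SUM(c) over the grouped query counts the surviving copy of each group as well as the copies that get dropped. A small arithmetic sketch of the second effect, with made-up group sizes:

```python
# Hypothetical sizes of the duplicate groups reported by
# GROUP BY ... HAVING COUNT(*) > 1 (invented numbers).
group_sizes = [3, 2, 2]

# SUM(c) counts every row that belongs to a duplicate group.
total_rows_in_groups = sum(group_sizes)          # 7

# INSERT IGNORE drops all but one copy per group.
rows_dropped = sum(s - 1 for s in group_sizes)   # 4

print(total_rows_in_groups, rows_dropped)  # 7 4
```

So the two tallies measure different things even over the same groups; they only agree by coincidence.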
If you can add an ID column and fill it with a unique value for each record, then you should be able to run a query like:
DELETE FROM tbl_fields
WHERE <New ID Column> IN (SELECT MAX(<New ID Column>)
FROM tbl_fields
WHERE fieldnotes IS NULL
GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
HAVING COUNT(*) > 1)
This would remove the duplicates; afterwards you could drop the added ID column. Note that each run deletes only one row (the highest ID) per duplicate group, so groups with more than two copies need repeated passes.
If you have enough duplication that you exceed SQL's limit on the IN clause, you could insert the ID values for the IN clause into a temp table and run an EXISTS against that.
– answered by Lumpy, Dec 22 '11 at 18:40
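The temp-table variant mentioned above also happens to sidestep MySQL's restriction on referencing the target table in a DELETE sub-select (raised in a comment below). Here is a SQLite sketch of the general idea in Python, rewritten to keep the lowest ID of each duplicate group in one pass rather than deleting the highest; the table and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE tbl_fields (id INTEGER, fieldno INTEGER, fieldnotes TEXT)")
cur.executemany("INSERT INTO tbl_fields VALUES (?, ?, ?)",
                [(1, 100, None), (2, 100, None), (3, 100, None),
                 (4, 200, None), (5, 300, "x")])

# Collect the one id to keep per duplicate group into a temp table ...
cur.execute("""CREATE TEMP TABLE keepers AS
               SELECT MIN(id) AS id FROM tbl_fields
               WHERE fieldnotes IS NULL
               GROUP BY fieldno""")

# ... then delete every NULL-noted row whose id is not a keeper.
cur.execute("""DELETE FROM tbl_fields
               WHERE fieldnotes IS NULL
                 AND id NOT IN (SELECT id FROM keepers)""")

print(cur.execute("SELECT COUNT(*) FROM tbl_fields").fetchone()[0])  # 3
```

Rows with a non-NULL fieldnotes are left alone, matching the WHERE clause of the original query.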
I added a new column using alter table tbl_fields add unq mediumint not null auto_increment key and left your suggestion running for ~3 hours before killing it. Even with a subset of the table it appeared to hang. Is this to be expected?
– tlvince, Dec 22 '11 at 23:05
This won't work, as MySQL does not allow the table being deleted from to be used in a sub-select.
– a_horse_with_no_name, Jul 19 '12 at 13:04
CREATE TABLE friends_copy LIKE friends;
INSERT INTO friends_copy (name, dob)
SELECT name, dob FROM friends GROUP BY name, dob HAVING COUNT(*) >= 1;
– answered by Anju Jhanji (new contributor), 21 mins ago
This deletes duplicates while keeping only one copy of each row.
– Anju Jhanji, 12 mins ago
Use insert into friends_copy(name,dob) select name,dob from friends group by name,dob having count(*)=1; if only unique records are to be kept.
– Anju Jhanji, 11 mins ago
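The two GROUP BY variants in this answer do different things, which a quick SQLite session in Python makes concrete (table contents invented): HAVING COUNT(*) >= 1 copies one row per name/dob pair, while HAVING COUNT(*) = 1 copies only the pairs that were never duplicated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE friends (name TEXT, dob TEXT)")
cur.executemany("INSERT INTO friends VALUES (?, ?)",
                [("ann", "1990"), ("ann", "1990"), ("bob", "1985")])
cur.execute("CREATE TABLE friends_copy (name TEXT, dob TEXT)")

# One copy of each pair survives: ('ann','1990') and ('bob','1985').
cur.execute("""INSERT INTO friends_copy (name, dob)
               SELECT name, dob FROM friends
               GROUP BY name, dob HAVING COUNT(*) >= 1""")
print(cur.execute("SELECT COUNT(*) FROM friends_copy").fetchone()[0])  # 2

# Only the never-duplicated pair survives: just ('bob','1985').
cur.execute("DELETE FROM friends_copy")
cur.execute("""INSERT INTO friends_copy (name, dob)
               SELECT name, dob FROM friends
               GROUP BY name, dob HAVING COUNT(*) = 1""")
print(cur.execute("SELECT COUNT(*) FROM friends_copy").fetchone()[0])  # 1
```

Note that either variant only copies the grouped columns, so it loses any other columns the original table has; the unique-index approach in the accepted answer preserves whole rows.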