Deleting duplicates with group by and count












What is the fastest method to convert the following query:



SELECT COUNT(*) as c FROM tbl_fields
WHERE fieldnotes IS NULL
GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
HAVING COUNT(*) > 1;


... into one that will delete the duplicate records? The table has no primary key and contains several million entries.







mysql duplication






asked Dec 22 '11 at 15:07









tlvince













  • No primary key? ugh

    – Derek Downey
    Dec 22 '11 at 15:35











  • No primary key, and "several million entries"

    – Aaron
    Dec 22 '11 at 15:46











  • Tell me about it :)

    – tlvince
    Dec 22 '11 at 15:46






  • 1





    Is there a field that is at least unique?

    – Aaron
    Dec 22 '11 at 16:00








  • 1





    Is it possible to add a column to the table, fill it with a unique key, then drop the column after the clean-up?

    – Lumpy
    Dec 22 '11 at 16:27
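The last comment's suggestion could be sketched as follows (hypothetical column name `unq`; on several million rows a BIGINT is safer than a smaller integer type, since MEDIUMINT tops out at about 16 million):

```sql
-- Add a surrogate key to make each row addressable
ALTER TABLE tbl_fields
  ADD COLUMN unq BIGINT NOT NULL AUTO_INCREMENT,
  ADD KEY (unq);

-- ... delete the duplicate rows using unq ...

-- Remove the helper column once the clean-up is done
ALTER TABLE tbl_fields DROP COLUMN unq;
```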





























3 Answers
According to your query, the GROUP BY clause treats (fieldno, fieldserial, id, fielddate, fieldsid) as the rule for uniqueness.

You can try this:

CREATE TABLE tbl_fields_unique LIKE tbl_fields;
ALTER TABLE tbl_fields_unique
    ADD UNIQUE KEY unq (fieldno,fieldserial,id,fielddate,fieldsid);
INSERT IGNORE INTO tbl_fields_unique
    SELECT * FROM tbl_fields;

This filters out every row whose (fieldno, fieldserial, id, fielddate, fieldsid) combination has already been inserted. Note that, unlike your SELECT, it deduplicates all rows, not only those where fieldnotes IS NULL. Look over the new table; once you are satisfied with the contents of tbl_fields_unique, swap the tables:

ALTER TABLE tbl_fields RENAME tbl_fields_old;
ALTER TABLE tbl_fields_unique RENAME tbl_fields;

Give it a try.
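A quick sanity check before the swap is to compare row counts; the difference is the number of duplicate rows that INSERT IGNORE skipped (a sketch, assuming both tables still exist at this point):

```sql
SELECT
  (SELECT COUNT(*) FROM tbl_fields)        AS original_rows,
  (SELECT COUNT(*) FROM tbl_fields_unique) AS deduped_rows,
  (SELECT COUNT(*) FROM tbl_fields)
    - (SELECT COUNT(*) FROM tbl_fields_unique) AS duplicates_removed;
```

Because this approach deduplicates the whole table, the duplicates_removed figure can legitimately differ from a count taken with the original SELECT, which only considered rows where fieldnotes IS NULL.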







answered Dec 22 '11 at 17:07









RolandoMySQLDBA













  • It seems to have worked and very quickly at that. However, using a sum on the count field in my query above, I get 3847, whereas your suggestion finds 5500 duplicates. I'll check the results and get back to you. Thanks.

    – tlvince
    Dec 22 '11 at 23:08



















If you can add an ID column and fill it with a unique value for each record, then you should be able to run a query like:

DELETE FROM tbl_fields
WHERE <New ID Column> IN (SELECT MAX(<New ID Column>)
                          FROM tbl_fields
                          WHERE fieldnotes IS NULL
                          GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
                          HAVING COUNT(*) > 1);

This would remove the duplicates; afterwards you could drop the added ID column.

If you have enough duplication that you exceed SQL's limit on the IN list, you could insert the ID values into a temp table and run an EXISTS against that.
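The temp-table variant could be sketched like this (hypothetical: `new_id` stands in for the added ID column):

```sql
-- Collect one candidate id per duplicate group
CREATE TEMPORARY TABLE ids_to_delete AS
SELECT MAX(new_id) AS new_id
FROM tbl_fields
WHERE fieldnotes IS NULL
GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
HAVING COUNT(*) > 1;

-- Delete against the staged ids instead of a correlated IN list
DELETE FROM tbl_fields
WHERE EXISTS (SELECT 1 FROM ids_to_delete d
              WHERE d.new_id = tbl_fields.new_id);
```

Staging the ids in a separate table also sidesteps MySQL's restriction on selecting from the same table that a DELETE is targeting.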







answered Dec 22 '11 at 18:40









Lumpy













  • I added a new column using alter table tbl_fields add unq mediumint not null auto_increment key and left your suggestion running for ~3 hours before killing it. Even with a subset of the table it appeared to hang. Is this to be expected?

    – tlvince
    Dec 22 '11 at 23:05











  • This won't work, as MySQL does not allow the table being deleted from to be referenced in a sub-select.

    – a_horse_with_no_name
    Jul 19 '12 at 13:04



















CREATE TABLE friends_copy LIKE friends;

INSERT INTO friends_copy (name, dob)
SELECT name, dob FROM friends
GROUP BY name, dob
HAVING COUNT(*) >= 1;
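To finish the clean-up with this pattern, the deduplicated copy can be swapped in, mirroring the table swap in the top answer (a sketch; table names taken from the example above):

```sql
RENAME TABLE friends      TO friends_old,
             friends_copy TO friends;
```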







answered 21 mins ago









Anju Jhanji




  • deleting duplicates and keeping only one copy

    – Anju Jhanji
    12 mins ago











  • insert into friends_copy(name,dob) select name,dob from friends group by name,dob having count(*)=1;

    – Anju Jhanji
    11 mins ago













  • if only unique records are to be kept

    – Anju Jhanji
    10 mins ago



















Thanks for contributing an answer to Database Administrators Stack Exchange!

