Deleting duplicates with group by and count












What is the fastest method to convert the following query:



SELECT COUNT(*) as c FROM tbl_fields
WHERE fieldnotes IS NULL
GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
HAVING COUNT(*) > 1;


... into one that will delete the duplicate records? The table has no primary key and contains several million entries.







mysql duplication






asked Dec 22 '11 at 15:07









tlvince













  • No primary key? ugh

    – Derek Downey
    Dec 22 '11 at 15:35











  • No primary key, and "several million entries"

    – Aaron
    Dec 22 '11 at 15:46











  • Tell me about it :)

    – tlvince
    Dec 22 '11 at 15:46






  • 1





    Is there a field that is at least unique?

    – Aaron
    Dec 22 '11 at 16:00








  • 1





    Is it possible to add a column to the table, fill it with a unique key, then drop the column after the clean-up?

    – Lumpy
    Dec 22 '11 at 16:27
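The last comment's suggestion could be sketched as follows (hypothetical column name `unq`; on several million rows a BIGINT is safer than a smaller integer type, since MEDIUMINT tops out at about 16 million):

```sql
-- Add a surrogate key to make each row addressable
ALTER TABLE tbl_fields
  ADD COLUMN unq BIGINT NOT NULL AUTO_INCREMENT,
  ADD KEY (unq);

-- ... delete the duplicate rows using unq ...

-- Remove the helper column once the clean-up is done
ALTER TABLE tbl_fields DROP COLUMN unq;
```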





























3 Answers
According to your query, the GROUP BY clause treats (fieldno, fieldserial, id, fielddate, fieldsid) as the rule for uniqueness.

You can try this:

CREATE TABLE tbl_fields_unique LIKE tbl_fields;
ALTER TABLE tbl_fields_unique
    ADD UNIQUE KEY unq (fieldno,fieldserial,id,fielddate,fieldsid);
INSERT IGNORE INTO tbl_fields_unique
    SELECT * FROM tbl_fields;

This filters out every row whose (fieldno, fieldserial, id, fielddate, fieldsid) combination has already been inserted. Note that, unlike your SELECT, it deduplicates all rows, not only those where fieldnotes IS NULL. Look over the new table; once you are satisfied with the contents of tbl_fields_unique, swap the tables:

ALTER TABLE tbl_fields RENAME tbl_fields_old;
ALTER TABLE tbl_fields_unique RENAME tbl_fields;

Give it a try.
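A quick sanity check before the swap is to compare row counts; the difference is the number of duplicate rows that INSERT IGNORE skipped (a sketch, assuming both tables still exist at this point):

```sql
SELECT
  (SELECT COUNT(*) FROM tbl_fields)        AS original_rows,
  (SELECT COUNT(*) FROM tbl_fields_unique) AS deduped_rows,
  (SELECT COUNT(*) FROM tbl_fields)
    - (SELECT COUNT(*) FROM tbl_fields_unique) AS duplicates_removed;
```

Because this approach deduplicates the whole table, the duplicates_removed figure can legitimately differ from a count taken with the original SELECT, which only considered rows where fieldnotes IS NULL.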







answered Dec 22 '11 at 17:07









RolandoMySQLDBA













  • It seems to have worked and very quickly at that. However, using a sum on the count field in my query above, I get 3847, whereas your suggestion finds 5500 duplicates. I'll check the results and get back to you. Thanks.

    – tlvince
    Dec 22 '11 at 23:08



















If you can add an ID column and fill it with a unique value for each record, then you should be able to run a query like:

DELETE FROM tbl_fields
WHERE <New ID Column> IN (SELECT MAX(<New ID Column>)
                          FROM tbl_fields
                          WHERE fieldnotes IS NULL
                          GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
                          HAVING COUNT(*) > 1);

This would remove the duplicates; afterwards you could drop the added ID column.

If you have enough duplication that you exceed SQL's limit on the IN list, you could insert the ID values into a temp table and run an EXISTS against that.
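The temp-table variant could be sketched like this (hypothetical: `new_id` stands in for the added ID column):

```sql
-- Collect one candidate id per duplicate group
CREATE TEMPORARY TABLE ids_to_delete AS
SELECT MAX(new_id) AS new_id
FROM tbl_fields
WHERE fieldnotes IS NULL
GROUP BY fieldno,fieldserial,id,fielddate,fieldsid
HAVING COUNT(*) > 1;

-- Delete against the staged ids instead of a correlated IN list
DELETE FROM tbl_fields
WHERE EXISTS (SELECT 1 FROM ids_to_delete d
              WHERE d.new_id = tbl_fields.new_id);
```

Staging the ids in a separate table also sidesteps MySQL's restriction on selecting from the same table that a DELETE is targeting.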







answered Dec 22 '11 at 18:40









Lumpy













  • I added a new column using alter table tbl_fields add unq mediumint not null auto_increment key and left your suggestion running for ~3 hours before killing it. Even with a subset of the table it appeared to hang. Is this to be expected?

    – tlvince
    Dec 22 '11 at 23:05











  • This won't work, as MySQL does not allow the table being deleted from to be referenced in a sub-select.

    – a_horse_with_no_name
    Jul 19 '12 at 13:04



















CREATE TABLE friends_copy LIKE friends;

INSERT INTO friends_copy (name, dob)
SELECT name, dob FROM friends
GROUP BY name, dob
HAVING COUNT(*) >= 1;
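To finish the clean-up with this pattern, the deduplicated copy can be swapped in, mirroring the table swap in the top answer (a sketch; table names taken from the example above):

```sql
RENAME TABLE friends      TO friends_old,
             friends_copy TO friends;
```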







answered 21 mins ago









Anju Jhanji




  • deleting duplicates and keeping only one copy

    – Anju Jhanji
    12 mins ago











  • insert into friends_copy(name,dob) select name,dob from friends group by name,dob having count(*)=1;

    – Anju Jhanji
    11 mins ago













  • if only unique records are to be kept

    – Anju Jhanji
    10 mins ago



















Thanks for contributing an answer to Database Administrators Stack Exchange!

