MongoDB merge two collections and drop documents with same value in field
In my database I have two collections, but some documents were added (possibly) at different times to both collections.
I could use mongodump
and then mongorestore
for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop
does not help neither, because the documents not necessarily have the same _id
.
How to drop a document when a document with userid
exists already?
mongodb mongorestore
bumped to the homepage by Community♦ 10 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
In my database I have two collections, but some documents were added (possibly) at different times to both collections.
I could use mongodump
and then mongorestore
for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop
does not help neither, because the documents not necessarily have the same _id
.
How to drop a document when a document with userid
exists already?
mongodb mongorestore
bumped to the homepage by Community♦ 10 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Welcome to the StackExchange. what is MongoDB version(x,y,z)?
– Md Haidar Ali Khan
Nov 29 '18 at 9:23
I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)
– Wuff
Dec 12 '18 at 12:43
add a comment |
In my database I have two collections, but some documents were added (possibly) at different times to both collections.
I could use mongodump
and then mongorestore
for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop
does not help neither, because the documents not necessarily have the same _id
.
How to drop a document when a document with userid
exists already?
mongodb mongorestore
In my database I have two collections, but some documents were added (possibly) at different times to both collections.
I could use mongodump
and then mongorestore
for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop
does not help neither, because the documents not necessarily have the same _id
.
How to drop a document when a document with userid
exists already?
mongodb mongorestore
mongodb mongorestore
edited Nov 27 '18 at 14:50
Wuff
asked Nov 27 '18 at 14:28
WuffWuff
62
62
bumped to the homepage by Community♦ 10 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 10 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Welcome to the StackExchange. what is MongoDB version(x,y,z)?
– Md Haidar Ali Khan
Nov 29 '18 at 9:23
I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)
– Wuff
Dec 12 '18 at 12:43
add a comment |
Welcome to the StackExchange. what is MongoDB version(x,y,z)?
– Md Haidar Ali Khan
Nov 29 '18 at 9:23
I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)
– Wuff
Dec 12 '18 at 12:43
Welcome to the StackExchange. what is MongoDB version(x,y,z)?
– Md Haidar Ali Khan
Nov 29 '18 at 9:23
Welcome to the StackExchange. what is MongoDB version(x,y,z)?
– Md Haidar Ali Khan
Nov 29 '18 at 9:23
I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)
– Wuff
Dec 12 '18 at 12:43
I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)
– Wuff
Dec 12 '18 at 12:43
add a comment |
1 Answer
1
active
oldest
votes
There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.
- Compare
userid
fields in both collections and take the difference. - Use mongos
$out
operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.
mongodump
the difference collection andmongorestore
it into the original collection.
from pymongo import MongoClient
import subprocess
# collection names
db_name = 'db'
col_original = 'col_original'
col_new = 'col_new'
field = 'userid'
# Get ids from collection
def get_ids(db, col):
docs = db[col].aggregate([
{'$project':
{'_id': 0,
'id': '$' + field}}
])
docs = list(docs)
ids = [x['id'] for x in docs]
return ids
# Connect to MongoDB
client = MongoClient('mongodb://localhost')
db = client[db_name]
# Get difference in ids
ids_new = get_ids(db, col_new)
ids_original = get_ids(db, col_original)
ids_diff = list(set(ids_new).difference(ids_original))
# Get all documents with userid that are in col_new, but not in
# col_original. Hence, all duplicates are skipped.
db[col_new].aggregate([
{'$match': {
field: {'$in': ids_diff}}},
{'$out': 'col_diff'}])
# Use mongodump to save col_diff
subprocess.check_output(['mongodump',
'-d',
db_name,
'-c',
'col_diff'])
# Merge col_diff into col_original
subprocess.check_output(['mongorestore',
'-d',
db_name,
'-c',
col_original,
'dump/' + db_name + '/col_diff.bson'])
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223551%2fmongodb-merge-two-collections-and-drop-documents-with-same-value-in-field%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.
- Compare
userid
fields in both collections and take the difference. - Use mongos
$out
operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.
mongodump
the difference collection andmongorestore
it into the original collection.
from pymongo import MongoClient
import subprocess
# collection names
db_name = 'db'
col_original = 'col_original'
col_new = 'col_new'
field = 'userid'
# Get ids from collection
def get_ids(db, col):
docs = db[col].aggregate([
{'$project':
{'_id': 0,
'id': '$' + field}}
])
docs = list(docs)
ids = [x['id'] for x in docs]
return ids
# Connect to MongoDB
client = MongoClient('mongodb://localhost')
db = client[db_name]
# Get difference in ids
ids_new = get_ids(db, col_new)
ids_original = get_ids(db, col_original)
ids_diff = list(set(ids_new).difference(ids_original))
# Get all documents with userid that are in col_new, but not in
# col_original. Hence, all duplicates are skipped.
db[col_new].aggregate([
{'$match': {
field: {'$in': ids_diff}}},
{'$out': 'col_diff'}])
# Use mongodump to save col_diff
subprocess.check_output(['mongodump',
'-d',
db_name,
'-c',
'col_diff'])
# Merge col_diff into col_original
subprocess.check_output(['mongorestore',
'-d',
db_name,
'-c',
col_original,
'dump/' + db_name + '/col_diff.bson'])
add a comment |
There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.
- Compare
userid
fields in both collections and take the difference. - Use mongos
$out
operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.
mongodump
the difference collection andmongorestore
it into the original collection.
from pymongo import MongoClient
import subprocess
# collection names
db_name = 'db'
col_original = 'col_original'
col_new = 'col_new'
field = 'userid'
# Get ids from collection
def get_ids(db, col):
docs = db[col].aggregate([
{'$project':
{'_id': 0,
'id': '$' + field}}
])
docs = list(docs)
ids = [x['id'] for x in docs]
return ids
# Connect to MongoDB
client = MongoClient('mongodb://localhost')
db = client[db_name]
# Get difference in ids
ids_new = get_ids(db, col_new)
ids_original = get_ids(db, col_original)
ids_diff = list(set(ids_new).difference(ids_original))
# Get all documents with userid that are in col_new, but not in
# col_original. Hence, all duplicates are skipped.
db[col_new].aggregate([
{'$match': {
field: {'$in': ids_diff}}},
{'$out': 'col_diff'}])
# Use mongodump to save col_diff
subprocess.check_output(['mongodump',
'-d',
db_name,
'-c',
'col_diff'])
# Merge col_diff into col_original
subprocess.check_output(['mongorestore',
'-d',
db_name,
'-c',
col_original,
'dump/' + db_name + '/col_diff.bson'])
add a comment |
There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.
- Compare
userid
fields in both collections and take the difference. - Use mongos
$out
operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.
mongodump
the difference collection andmongorestore
it into the original collection.
from pymongo import MongoClient
import subprocess
# collection names
db_name = 'db'
col_original = 'col_original'
col_new = 'col_new'
field = 'userid'
# Get ids from collection
def get_ids(db, col):
docs = db[col].aggregate([
{'$project':
{'_id': 0,
'id': '$' + field}}
])
docs = list(docs)
ids = [x['id'] for x in docs]
return ids
# Connect to MongoDB
client = MongoClient('mongodb://localhost')
db = client[db_name]
# Get difference in ids
ids_new = get_ids(db, col_new)
ids_original = get_ids(db, col_original)
ids_diff = list(set(ids_new).difference(ids_original))
# Get all documents with userid that are in col_new, but not in
# col_original. Hence, all duplicates are skipped.
db[col_new].aggregate([
{'$match': {
field: {'$in': ids_diff}}},
{'$out': 'col_diff'}])
# Use mongodump to save col_diff
subprocess.check_output(['mongodump',
'-d',
db_name,
'-c',
'col_diff'])
# Merge col_diff into col_original
subprocess.check_output(['mongorestore',
'-d',
db_name,
'-c',
col_original,
'dump/' + db_name + '/col_diff.bson'])
There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.
- Compare
userid
fields in both collections and take the difference. - Use mongos
$out
operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.
mongodump
the difference collection andmongorestore
it into the original collection.
from pymongo import MongoClient
import subprocess
# collection names
db_name = 'db'
col_original = 'col_original'
col_new = 'col_new'
field = 'userid'
# Get ids from collection
def get_ids(db, col):
docs = db[col].aggregate([
{'$project':
{'_id': 0,
'id': '$' + field}}
])
docs = list(docs)
ids = [x['id'] for x in docs]
return ids
# Connect to MongoDB
client = MongoClient('mongodb://localhost')
db = client[db_name]
# Get difference in ids
ids_new = get_ids(db, col_new)
ids_original = get_ids(db, col_original)
ids_diff = list(set(ids_new).difference(ids_original))
# Get all documents with userid that are in col_new, but not in
# col_original. Hence, all duplicates are skipped.
db[col_new].aggregate([
{'$match': {
field: {'$in': ids_diff}}},
{'$out': 'col_diff'}])
# Use mongodump to save col_diff
subprocess.check_output(['mongodump',
'-d',
db_name,
'-c',
'col_diff'])
# Merge col_diff into col_original
subprocess.check_output(['mongorestore',
'-d',
db_name,
'-c',
col_original,
'dump/' + db_name + '/col_diff.bson'])
answered Dec 12 '18 at 12:41
WuffWuff
62
62
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223551%2fmongodb-merge-two-collections-and-drop-documents-with-same-value-in-field%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Welcome to the StackExchange. what is MongoDB version(x,y,z)?
– Md Haidar Ali Khan
Nov 29 '18 at 9:23
I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)
– Wuff
Dec 12 '18 at 12:43