MongoDB merge two collections and drop documents with same value in field












1















In my database I have two collections, but some documents were added (possibly) at different times to both collections.



I could use mongodump and then mongorestore for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop does not help neither, because the documents not necessarily have the same _id.



How to drop a document when a document with userid exists already?










share|improve this question
















bumped to the homepage by Community 10 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Welcome to the StackExchange. what is MongoDB version(x,y,z)?

    – Md Haidar Ali Khan
    Nov 29 '18 at 9:23











  • I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)

    – Wuff
    Dec 12 '18 at 12:43
















1















In my database I have two collections, but some documents were added (possibly) at different times to both collections.



I could use mongodump and then mongorestore for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop does not help neither, because the documents not necessarily have the same _id.



How to drop a document when a document with userid exists already?










share|improve this question
















bumped to the homepage by Community 10 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Welcome to the StackExchange. what is MongoDB version(x,y,z)?

    – Md Haidar Ali Khan
    Nov 29 '18 at 9:23











  • I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)

    – Wuff
    Dec 12 '18 at 12:43














1












1








1








In my database I have two collections, but some documents were added (possibly) at different times to both collections.



I could use mongodump and then mongorestore for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop does not help neither, because the documents not necessarily have the same _id.



How to drop a document when a document with userid exists already?










share|improve this question
















In my database I have two collections, but some documents were added (possibly) at different times to both collections.



I could use mongodump and then mongorestore for merging. But then I have the same documents that were added to both collections as duplicates in my new collection. mongorestore --drop does not help neither, because the documents not necessarily have the same _id.



How to drop a document when a document with userid exists already?







mongodb mongorestore






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 27 '18 at 14:50







Wuff

















asked Nov 27 '18 at 14:28









WuffWuff

62




62





bumped to the homepage by Community 10 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 10 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • Welcome to the StackExchange. what is MongoDB version(x,y,z)?

    – Md Haidar Ali Khan
    Nov 29 '18 at 9:23











  • I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)

    – Wuff
    Dec 12 '18 at 12:43



















  • Welcome to the StackExchange. what is MongoDB version(x,y,z)?

    – Md Haidar Ali Khan
    Nov 29 '18 at 9:23











  • I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)

    – Wuff
    Dec 12 '18 at 12:43

















Welcome to the StackExchange. what is MongoDB version(x,y,z)?

– Md Haidar Ali Khan
Nov 29 '18 at 9:23





Welcome to the StackExchange. what is MongoDB version(x,y,z)?

– Md Haidar Ali Khan
Nov 29 '18 at 9:23













I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)

– Wuff
Dec 12 '18 at 12:43





I use MongoDB v3.6.5 (build environment: distmod: debian92 distarch: x86_64 target_arch: x86_64)

– Wuff
Dec 12 '18 at 12:43










1 Answer
1






active

oldest

votes


















0














There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.




  1. Compare userid fields in both collections and take the difference.

  2. Use mongos $out operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.


  3. mongodump the difference collection and mongorestore it into the original collection.


from pymongo import MongoClient
import subprocess

# collection names
db_name = 'db'
col_original = 'col_original'
col_new = 'col_new'
field = 'userid'


# Get ids from collection
def get_ids(db, col):
docs = db[col].aggregate([
{'$project':
{'_id': 0,
'id': '$' + field}}
])
docs = list(docs)
ids = [x['id'] for x in docs]
return ids


# Connect to MongoDB
client = MongoClient('mongodb://localhost')
db = client[db_name]

# Get difference in ids
ids_new = get_ids(db, col_new)
ids_original = get_ids(db, col_original)
ids_diff = list(set(ids_new).difference(ids_original))

# Get all documents with userid that are in col_new, but not in
# col_original. Hence, all duplicates are skipped.
db[col_new].aggregate([
{'$match': {
field: {'$in': ids_diff}}},
{'$out': 'col_diff'}])

# Use mongodump to save col_diff
subprocess.check_output(['mongodump',
'-d',
db_name,
'-c',
'col_diff'])

# Merge col_diff into col_original
subprocess.check_output(['mongorestore',
'-d',
db_name,
'-c',
col_original,
'dump/' + db_name + '/col_diff.bson'])





share|improve this answer























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223551%2fmongodb-merge-two-collections-and-drop-documents-with-same-value-in-field%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.




    1. Compare userid fields in both collections and take the difference.

    2. Use mongos $out operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.


    3. mongodump the difference collection and mongorestore it into the original collection.


    from pymongo import MongoClient
    import subprocess

    # collection names
    db_name = 'db'
    col_original = 'col_original'
    col_new = 'col_new'
    field = 'userid'


    # Get ids from collection
    def get_ids(db, col):
    docs = db[col].aggregate([
    {'$project':
    {'_id': 0,
    'id': '$' + field}}
    ])
    docs = list(docs)
    ids = [x['id'] for x in docs]
    return ids


    # Connect to MongoDB
    client = MongoClient('mongodb://localhost')
    db = client[db_name]

    # Get difference in ids
    ids_new = get_ids(db, col_new)
    ids_original = get_ids(db, col_original)
    ids_diff = list(set(ids_new).difference(ids_original))

    # Get all documents with userid that are in col_new, but not in
    # col_original. Hence, all duplicates are skipped.
    db[col_new].aggregate([
    {'$match': {
    field: {'$in': ids_diff}}},
    {'$out': 'col_diff'}])

    # Use mongodump to save col_diff
    subprocess.check_output(['mongodump',
    '-d',
    db_name,
    '-c',
    'col_diff'])

    # Merge col_diff into col_original
    subprocess.check_output(['mongorestore',
    '-d',
    db_name,
    '-c',
    col_original,
    'dump/' + db_name + '/col_diff.bson'])





    share|improve this answer




























      0














      There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.




      1. Compare userid fields in both collections and take the difference.

      2. Use mongos $out operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.


      3. mongodump the difference collection and mongorestore it into the original collection.


      from pymongo import MongoClient
      import subprocess

      # collection names
      db_name = 'db'
      col_original = 'col_original'
      col_new = 'col_new'
      field = 'userid'


      # Get ids from collection
      def get_ids(db, col):
      docs = db[col].aggregate([
      {'$project':
      {'_id': 0,
      'id': '$' + field}}
      ])
      docs = list(docs)
      ids = [x['id'] for x in docs]
      return ids


      # Connect to MongoDB
      client = MongoClient('mongodb://localhost')
      db = client[db_name]

      # Get difference in ids
      ids_new = get_ids(db, col_new)
      ids_original = get_ids(db, col_original)
      ids_diff = list(set(ids_new).difference(ids_original))

      # Get all documents with userid that are in col_new, but not in
      # col_original. Hence, all duplicates are skipped.
      db[col_new].aggregate([
      {'$match': {
      field: {'$in': ids_diff}}},
      {'$out': 'col_diff'}])

      # Use mongodump to save col_diff
      subprocess.check_output(['mongodump',
      '-d',
      db_name,
      '-c',
      'col_diff'])

      # Merge col_diff into col_original
      subprocess.check_output(['mongorestore',
      '-d',
      db_name,
      '-c',
      col_original,
      'dump/' + db_name + '/col_diff.bson'])





      share|improve this answer


























        0












        0








        0







        There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.




        1. Compare userid fields in both collections and take the difference.

        2. Use mongos $out operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.


        3. mongodump the difference collection and mongorestore it into the original collection.


        from pymongo import MongoClient
        import subprocess

        # collection names
        db_name = 'db'
        col_original = 'col_original'
        col_new = 'col_new'
        field = 'userid'


        # Get ids from collection
        def get_ids(db, col):
        docs = db[col].aggregate([
        {'$project':
        {'_id': 0,
        'id': '$' + field}}
        ])
        docs = list(docs)
        ids = [x['id'] for x in docs]
        return ids


        # Connect to MongoDB
        client = MongoClient('mongodb://localhost')
        db = client[db_name]

        # Get difference in ids
        ids_new = get_ids(db, col_new)
        ids_original = get_ids(db, col_original)
        ids_diff = list(set(ids_new).difference(ids_original))

        # Get all documents with userid that are in col_new, but not in
        # col_original. Hence, all duplicates are skipped.
        db[col_new].aggregate([
        {'$match': {
        field: {'$in': ids_diff}}},
        {'$out': 'col_diff'}])

        # Use mongodump to save col_diff
        subprocess.check_output(['mongodump',
        '-d',
        db_name,
        '-c',
        'col_diff'])

        # Merge col_diff into col_original
        subprocess.check_output(['mongorestore',
        '-d',
        db_name,
        '-c',
        col_original,
        'dump/' + db_name + '/col_diff.bson'])





        share|improve this answer













        There might be more elegent solutions for this, but let me answer how I merged a new collection into an original collection without importing the duplicates.




        1. Compare userid fields in both collections and take the difference.

        2. Use mongos $out operator to save documents to a difference collection. This new collection has all documents of the new collection, except of the duplicates.


        3. mongodump the difference collection and mongorestore it into the original collection.


        from pymongo import MongoClient
        import subprocess

        # collection names
        db_name = 'db'
        col_original = 'col_original'
        col_new = 'col_new'
        field = 'userid'


        # Get ids from collection
        def get_ids(db, col):
        docs = db[col].aggregate([
        {'$project':
        {'_id': 0,
        'id': '$' + field}}
        ])
        docs = list(docs)
        ids = [x['id'] for x in docs]
        return ids


        # Connect to MongoDB
        client = MongoClient('mongodb://localhost')
        db = client[db_name]

        # Get difference in ids
        ids_new = get_ids(db, col_new)
        ids_original = get_ids(db, col_original)
        ids_diff = list(set(ids_new).difference(ids_original))

        # Get all documents with userid that are in col_new, but not in
        # col_original. Hence, all duplicates are skipped.
        db[col_new].aggregate([
        {'$match': {
        field: {'$in': ids_diff}}},
        {'$out': 'col_diff'}])

        # Use mongodump to save col_diff
        subprocess.check_output(['mongodump',
        '-d',
        db_name,
        '-c',
        'col_diff'])

        # Merge col_diff into col_original
        subprocess.check_output(['mongorestore',
        '-d',
        db_name,
        '-c',
        col_original,
        'dump/' + db_name + '/col_diff.bson'])






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 12 '18 at 12:41









        WuffWuff

        62




        62






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223551%2fmongodb-merge-two-collections-and-drop-documents-with-same-value-in-field%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Liste der Baudenkmale in Friedland (Mecklenburg)

            Single-Malt-Whisky

            Czorneboh