Finding differences between two tables takes too long












2















I'm trying to find the diff of two tables using a left join.



My query is as follows:



CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;



  • Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

  • Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)


EXPLAIN EXTENDED:



id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)


The query takes a mind-boggling 11 min to finish.



I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.



What can I do to improve this?










share|improve this question
















bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

    – Rick James
    Aug 25 '15 at 21:05











  • What is the value of innodb_buffer_pool_size?

    – Rick James
    Aug 25 '15 at 21:08











  • Could you please show create tables' structures?

    – Jehad Keriaki
    Aug 26 '15 at 3:16
















2















I'm trying to find the diff of two tables using a left join.



My query is as follows:



CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;



  • Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

  • Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)


EXPLAIN EXTENDED:



id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)


The query takes a mind-boggling 11 min to finish.



I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.



What can I do to improve this?










share|improve this question
















bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

    – Rick James
    Aug 25 '15 at 21:05











  • What is the value of innodb_buffer_pool_size?

    – Rick James
    Aug 25 '15 at 21:08











  • Could you please show create tables' structures?

    – Jehad Keriaki
    Aug 26 '15 at 3:16














2












2








2








I'm trying to find the diff of two tables using a left join.



My query is as follows:



CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;



  • Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

  • Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)


EXPLAIN EXTENDED:



id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)


The query takes a mind-boggling 11 min to finish.



I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.



What can I do to improve this?










share|improve this question
















I'm trying to find the diff of two tables using a left join.



My query is as follows:



CREATE TABLE my_diff (INDEX my_index (name, type))
SELECT AA.name, AA.type
FROM AA
LEFT JOIN BB
USING (name, type)
WHERE BB.type IS NULL;



  • Table AA is sized: ~400K records and has PRIMARY KEY on all columns as well as an INDEX on each column

  • Table BB is sized: ~8K records and has a composite INDEX on both columns: my_index (name, type)


EXPLAIN EXTENDED:



id  select_type     table   type    possible_keys   key         key_len     ref     rows    filtered    Extra
1 SIMPLE AA index NULL PRIMARY 383 NULL 396178 100.00 Using index
1 SIMPLE BB index NULL my_index 104 NULL 8359 100.00 Using where; Not exists; Using index; Using join buffer (Block Nested Loop)


The query takes a mind-boggling 11 min to finish.



I have tried various options of composite and single column indexes / primary keys, tried forcing the keys/indexes on the select and the join, but to no avail.



What can I do to improve this?







mysql query-performance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Aug 24 '15 at 16:07









Paul White

51k14278450




51k14278450










asked Aug 24 '15 at 12:58









space_monkeyspace_monkey

1112




1112





bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

    – Rick James
    Aug 25 '15 at 21:05











  • What is the value of innodb_buffer_pool_size?

    – Rick James
    Aug 25 '15 at 21:08











  • Could you please show create tables' structures?

    – Jehad Keriaki
    Aug 26 '15 at 3:16



















  • You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

    – Rick James
    Aug 25 '15 at 21:05











  • What is the value of innodb_buffer_pool_size?

    – Rick James
    Aug 25 '15 at 21:08











  • Could you please show create tables' structures?

    – Jehad Keriaki
    Aug 26 '15 at 3:16

















You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05





You have the perfect set of index and the perfect formulation. I guess you are stuck with 11 minutes.

– Rick James
Aug 25 '15 at 21:05













What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08





What is the value of innodb_buffer_pool_size?

– Rick James
Aug 25 '15 at 21:08













Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16





Could you please show create tables' structures?

– Jehad Keriaki
Aug 26 '15 at 3:16










2 Answers
2






active

oldest

votes


















0















  1. Create the result table, my_diff, without index, and when it is
    populated, add the desired index.

  2. Make sure you have an index on (name, type) in both tables. i.e. a composite index

  3. It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.


From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.






share|improve this answer
























  • On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

    – Rick James
    Aug 25 '15 at 21:02











  • #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

    – Rick James
    Aug 25 '15 at 21:07



















0














Create the temp table as follows



#
# Make Temp Table
#
CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
#
# Load Unique Keys From Table AA
#
INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
#
# Load Keys From Table BB
#
INSERT INTO mydiff (user,type) SELECT user,type FROM BB
ON DUPLICATE KEY UPDATE diffcount=diffcount+1;


Then, count the missing keys



SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
FROM mydiff GROUP BY diffcount WITH ROLLUP;


The output for diffcount of 1 shows how many (user,type) are not represented



To see the unused (user,type) records, run this



SELECT user,type FROM mydiff WHERE diffcount = 1;


GIVE IT A TRY !!!



It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.






share|improve this answer

























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f112005%2ffinding-differences-between-two-tables-takes-too-long%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0















    1. Create the result table, my_diff, without index, and when it is
      populated, add the desired index.

    2. Make sure you have an index on (name, type) in both tables. i.e. a composite index

    3. It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.


    From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.






    share|improve this answer
























    • On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

      – Rick James
      Aug 25 '15 at 21:02











    • #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

      – Rick James
      Aug 25 '15 at 21:07
















    0















    1. Create the result table, my_diff, without index, and when it is
      populated, add the desired index.

    2. Make sure you have an index on (name, type) in both tables. i.e. a composite index

    3. It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.


    From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.






    share|improve this answer
























    • On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

      – Rick James
      Aug 25 '15 at 21:02











    • #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

      – Rick James
      Aug 25 '15 at 21:07














    0












    0








    0








    1. Create the result table, my_diff, without index, and when it is
      populated, add the desired index.

    2. Make sure you have an index on (name, type) in both tables. i.e. a composite index

    3. It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.


    From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.






    share|improve this answer














    1. Create the result table, my_diff, without index, and when it is
      populated, add the desired index.

    2. Make sure you have an index on (name, type) in both tables. i.e. a composite index

    3. It is not a good idea to have primary key on all fields. This is especially true if you are using InnoDB engine. Instead, alter the table to add an integer auto increment field as primary key, and if needed, add a composite unique index on the fields you want to be unique.


    From your explain result, it looks like you don't have the index on (name, type), so for each row in AA, it will do a full scan to BB. This is one enough reason to make the query slow.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Aug 24 '15 at 14:50









    Jehad KeriakiJehad Keriaki

    2,5831813




    2,5831813













    • On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

      – Rick James
      Aug 25 '15 at 21:02











    • #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

      – Rick James
      Aug 25 '15 at 21:07



















    • On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

      – Rick James
      Aug 25 '15 at 21:02











    • #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

      – Rick James
      Aug 25 '15 at 21:07

















    On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

    – Rick James
    Aug 25 '15 at 21:02





    On the contrary, it says Using index for both tables, implying that there are composite indexes on both tables.

    – Rick James
    Aug 25 '15 at 21:02













    #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

    – Rick James
    Aug 25 '15 at 21:07





    #2 -- he clearly has that in both tables, since it says Using index. There is no full scan of BB; it's even caching the index in the Join buffer.

    – Rick James
    Aug 25 '15 at 21:07













    0














    Create the temp table as follows



    #
    # Make Temp Table
    #
    CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
    ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
    ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
    #
    # Load Unique Keys From Table AA
    #
    INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
    #
    # Load Keys From Table BB
    #
    INSERT INTO mydiff (user,type) SELECT user,type FROM BB
    ON DUPLICATE KEY UPDATE diffcount=diffcount+1;


    Then, count the missing keys



    SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
    FROM mydiff GROUP BY diffcount WITH ROLLUP;


    The output for diffcount of 1 shows how many (user,type) are not represented



    To see the unused (user,type) records, run this



    SELECT user,type FROM mydiff WHERE diffcount = 1;


    GIVE IT A TRY !!!



    It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.






    share|improve this answer






























      0














      Create the temp table as follows



      #
      # Make Temp Table
      #
      CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
      ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
      ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
      #
      # Load Unique Keys From Table AA
      #
      INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
      #
      # Load Keys From Table BB
      #
      INSERT INTO mydiff (user,type) SELECT user,type FROM BB
      ON DUPLICATE KEY UPDATE diffcount=diffcount+1;


      Then, count the missing keys



      SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
      FROM mydiff GROUP BY diffcount WITH ROLLUP;


      The output for diffcount of 1 shows how many (user,type) are not represented



      To see the unused (user,type) records, run this



      SELECT user,type FROM mydiff WHERE diffcount = 1;


      GIVE IT A TRY !!!



      It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.






      share|improve this answer




























        0












        0








        0







        Create the temp table as follows



        #
        # Make Temp Table
        #
        CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
        ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
        ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
        #
        # Load Unique Keys From Table AA
        #
        INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
        #
        # Load Keys From Table BB
        #
        INSERT INTO mydiff (user,type) SELECT user,type FROM BB
        ON DUPLICATE KEY UPDATE diffcount=diffcount+1;


        Then, count the missing keys



        SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
        FROM mydiff GROUP BY diffcount WITH ROLLUP;


        The output for diffcount of 1 shows how many (user,type) are not represented



        To see the unused (user,type) records, run this



        SELECT user,type FROM mydiff WHERE diffcount = 1;


        GIVE IT A TRY !!!



        It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.






        share|improve this answer















        Create the temp table as follows



        #
        # Make Temp Table
        #
        CREATE TABLE mydiff SELECT name,type FROM AA WHERE 1=2;
        ALTER TABLE mydiff ADD COLUMN diffcount INT DEFAULT 1;
        ALTER TABLE mydiff ADD PRIMARY KEY (user,type);
        #
        # Load Unique Keys From Table AA
        #
        INSERT IGNORE INTO mydiff (user,type) SELECT user,type FROM AA;
        #
        # Load Keys From Table BB
        #
        INSERT INTO mydiff (user,type) SELECT user,type FROM BB
        ON DUPLICATE KEY UPDATE diffcount=diffcount+1;


        Then, count the missing keys



        SELECT IFNULL(diffcount,'Total') diffcount,COUNT(1) rowcount
        FROM mydiff GROUP BY diffcount WITH ROLLUP;


        The output for diffcount of 1 shows how many (user,type) are not represented



        To see the unused (user,type) records, run this



        SELECT user,type FROM mydiff WHERE diffcount = 1;


        GIVE IT A TRY !!!



        It's better to insert 4 Million Rows than join 3,311,651,902 (396178 X 8359) rows.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Aug 24 '15 at 16:01

























        answered Aug 24 '15 at 15:56









        RolandoMySQLDBARolandoMySQLDBA

        141k24221379




        141k24221379






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f112005%2ffinding-differences-between-two-tables-takes-too-long%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Liste der Baudenkmale in Friedland (Mecklenburg)

            Single-Malt-Whisky

            Czorneboh