Multi-threaded replay of master activity for load/performance testing





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







5















Does anyone know of any existing tools/products that accomplish what I'm trying to do?



After searching around for a bit the only thing I could find surrounding what I'm trying to accomplish is someone else looking for the same thing http://forums.mysql.com/read.php?24,192250,192250



The idea is I'd like to capture all the traffic to my master to save for a replay log against a snapshot of the entire database taken when the monitoring started. Bin logs won't serve what I want since they only include writes. I want read activity to realistically view the effects of proposed changes with "real" production traffic.



Real production traffic meaning everything from all applications that are hitting the database to be modified. If there's some application level change, tests running just that app don't account for other activity going on in the system at the time. Running all applications in a test environment aren't guaranteed to have the same state of the database. I could take a test snapshot as a starting point as I fire them up but the applications don't have the ability to do an exact replay of their own activity.



I've use the tcpdump script from http://www.mysqlperformanceblog.com/2008/11/07/poor-mans-query-logging/ to monitor activity but this doesn't tell me which queries are coming from which connections. Part of the playback I'm wanting is a multi threaded approach that replays the activity from the same number of threads that were actually in use.



I can't afford to turn on general query logging b/c my production master wouldn't be able to handle the performance hit for that.



The whole snapshot part of the process is to have a golden start point database to ensure everything is the same during the start of each test run.










share|improve this question
















bumped to the homepage by Community 14 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Hmm, what kind of traffic does your production server get that it can't handle the 'performance' hit of the general_log ? An interesting case-study at the worst difference from 'no logging' was cpu bound with 20 concurrent connections and it still was able to handle 22k transactions/minute and faired much better on i/o bound (ugh) server

    – Derek Downey
    Sep 30 '11 at 20:40











  • Also don't want to be misunderstood...I am curious, not meant to be attacking.

    – Derek Downey
    Sep 30 '11 at 20:41











  • After rereading how I worded it I feel I kind of lied. It would have been more appropriate to say "I've always heard that was bad and didn't want to play around in production to get hard numbers."

    – atxdba
    Oct 4 '11 at 3:31


















5















Does anyone know of any existing tools/products that accomplish what I'm trying to do?



After searching around for a bit the only thing I could find surrounding what I'm trying to accomplish is someone else looking for the same thing http://forums.mysql.com/read.php?24,192250,192250



The idea is I'd like to capture all the traffic to my master to save for a replay log against a snapshot of the entire database taken when the monitoring started. Bin logs won't serve what I want since they only include writes. I want read activity to realistically view the effects of proposed changes with "real" production traffic.



Real production traffic meaning everything from all applications that are hitting the database to be modified. If there's some application level change, tests running just that app don't account for other activity going on in the system at the time. Running all applications in a test environment aren't guaranteed to have the same state of the database. I could take a test snapshot as a starting point as I fire them up but the applications don't have the ability to do an exact replay of their own activity.



I've use the tcpdump script from http://www.mysqlperformanceblog.com/2008/11/07/poor-mans-query-logging/ to monitor activity but this doesn't tell me which queries are coming from which connections. Part of the playback I'm wanting is a multi threaded approach that replays the activity from the same number of threads that were actually in use.



I can't afford to turn on general query logging b/c my production master wouldn't be able to handle the performance hit for that.



The whole snapshot part of the process is to have a golden start point database to ensure everything is the same during the start of each test run.










share|improve this question
















bumped to the homepage by Community 14 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Hmm, what kind of traffic does your production server get that it can't handle the 'performance' hit of the general_log ? An interesting case-study at the worst difference from 'no logging' was cpu bound with 20 concurrent connections and it still was able to handle 22k transactions/minute and faired much better on i/o bound (ugh) server

    – Derek Downey
    Sep 30 '11 at 20:40











  • Also don't want to be misunderstood...I am curious, not meant to be attacking.

    – Derek Downey
    Sep 30 '11 at 20:41











  • After rereading how I worded it I feel I kind of lied. It would have been more appropriate to say "I've always heard that was bad and didn't want to play around in production to get hard numbers."

    – atxdba
    Oct 4 '11 at 3:31














5












5








5








Does anyone know of any existing tools/products that accomplish what I'm trying to do?



After searching around for a bit the only thing I could find surrounding what I'm trying to accomplish is someone else looking for the same thing http://forums.mysql.com/read.php?24,192250,192250



The idea is I'd like to capture all the traffic to my master to save for a replay log against a snapshot of the entire database taken when the monitoring started. Bin logs won't serve what I want since they only include writes. I want read activity to realistically view the effects of proposed changes with "real" production traffic.



Real production traffic meaning everything from all applications that are hitting the database to be modified. If there's some application level change, tests running just that app don't account for other activity going on in the system at the time. Running all applications in a test environment aren't guaranteed to have the same state of the database. I could take a test snapshot as a starting point as I fire them up but the applications don't have the ability to do an exact replay of their own activity.



I've use the tcpdump script from http://www.mysqlperformanceblog.com/2008/11/07/poor-mans-query-logging/ to monitor activity but this doesn't tell me which queries are coming from which connections. Part of the playback I'm wanting is a multi threaded approach that replays the activity from the same number of threads that were actually in use.



I can't afford to turn on general query logging b/c my production master wouldn't be able to handle the performance hit for that.



The whole snapshot part of the process is to have a golden start point database to ensure everything is the same during the start of each test run.










share|improve this question
















Does anyone know of any existing tools/products that accomplish what I'm trying to do?



After searching around for a bit the only thing I could find surrounding what I'm trying to accomplish is someone else looking for the same thing http://forums.mysql.com/read.php?24,192250,192250



The idea is I'd like to capture all the traffic to my master to save for a replay log against a snapshot of the entire database taken when the monitoring started. Bin logs won't serve what I want since they only include writes. I want read activity to realistically view the effects of proposed changes with "real" production traffic.



Real production traffic meaning everything from all applications that are hitting the database to be modified. If there's some application level change, tests running just that app don't account for other activity going on in the system at the time. Running all applications in a test environment aren't guaranteed to have the same state of the database. I could take a test snapshot as a starting point as I fire them up but the applications don't have the ability to do an exact replay of their own activity.



I've use the tcpdump script from http://www.mysqlperformanceblog.com/2008/11/07/poor-mans-query-logging/ to monitor activity but this doesn't tell me which queries are coming from which connections. Part of the playback I'm wanting is a multi threaded approach that replays the activity from the same number of threads that were actually in use.



I can't afford to turn on general query logging b/c my production master wouldn't be able to handle the performance hit for that.



The whole snapshot part of the process is to have a golden start point database to ensure everything is the same during the start of each test run.







mysql performance performance-testing






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Aug 5 '17 at 17:05









RolandoMySQLDBA

144k24228385




144k24228385










asked Sep 30 '11 at 19:14









atxdbaatxdba

3,90342754




3,90342754





bumped to the homepage by Community 14 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 14 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • Hmm, what kind of traffic does your production server get that it can't handle the 'performance' hit of the general_log ? An interesting case-study at the worst difference from 'no logging' was cpu bound with 20 concurrent connections and it still was able to handle 22k transactions/minute and faired much better on i/o bound (ugh) server

    – Derek Downey
    Sep 30 '11 at 20:40











  • Also don't want to be misunderstood...I am curious, not meant to be attacking.

    – Derek Downey
    Sep 30 '11 at 20:41











  • After rereading how I worded it I feel I kind of lied. It would have been more appropriate to say "I've always heard that was bad and didn't want to play around in production to get hard numbers."

    – atxdba
    Oct 4 '11 at 3:31



















  • Hmm, what kind of traffic does your production server get that it can't handle the 'performance' hit of the general_log ? An interesting case-study at the worst difference from 'no logging' was cpu bound with 20 concurrent connections and it still was able to handle 22k transactions/minute and faired much better on i/o bound (ugh) server

    – Derek Downey
    Sep 30 '11 at 20:40











  • Also don't want to be misunderstood...I am curious, not meant to be attacking.

    – Derek Downey
    Sep 30 '11 at 20:41











  • After rereading how I worded it I feel I kind of lied. It would have been more appropriate to say "I've always heard that was bad and didn't want to play around in production to get hard numbers."

    – atxdba
    Oct 4 '11 at 3:31

















Hmm, what kind of traffic does your production server get that it can't handle the 'performance' hit of the general_log ? An interesting case-study at the worst difference from 'no logging' was cpu bound with 20 concurrent connections and it still was able to handle 22k transactions/minute and faired much better on i/o bound (ugh) server

– Derek Downey
Sep 30 '11 at 20:40





Hmm, what kind of traffic does your production server get that it can't handle the 'performance' hit of the general_log ? An interesting case-study at the worst difference from 'no logging' was cpu bound with 20 concurrent connections and it still was able to handle 22k transactions/minute and faired much better on i/o bound (ugh) server

– Derek Downey
Sep 30 '11 at 20:40













Also don't want to be misunderstood...I am curious, not meant to be attacking.

– Derek Downey
Sep 30 '11 at 20:41





Also don't want to be misunderstood...I am curious, not meant to be attacking.

– Derek Downey
Sep 30 '11 at 20:41













After rereading how I worded it I feel I kind of lied. It would have been more appropriate to say "I've always heard that was bad and didn't want to play around in production to get hard numbers."

– atxdba
Oct 4 '11 at 3:31





After rereading how I worded it I feel I kind of lied. It would have been more appropriate to say "I've always heard that was bad and didn't want to play around in production to get hard numbers."

– atxdba
Oct 4 '11 at 3:31










2 Answers
2






active

oldest

votes


















0














I believe that pt-query-digest from the Percona Toolkit (http://www.percona.com/doc/percona-toolkit/2.0/pt-query-digest.html) is what you are looking for.






share|improve this answer































    0














    I actually have a suggestion that will take some work but is doable.



    THEORETICAL SCENARIO




    • You have 3 databases: db1, db2, db3

    • Production DB had binary logging enabled


    What you need are the following to create enough traffic:




    • mysqldump of Production DB two weeks old (TwoWeekOldData.sql) of db1, db2 and db3

    • Binary logs on Production DB that are two or more weeks old

    • Four DB Servers running MySQL


      • DBServers 1-3 are Slaves to Production DB

      • DBServers 1-4 have Binary Logging Enabled

      • DBServer1 has replicate-do-db=db1 and has only db1 data loaded

      • DBServer2 has replicate-do-db=db2 and has only db2 data loaded

      • DBServer3 has replicate-do-db=db3 and has only db3 data loaded

      • DBServer4 has all data from TwoWeekOldData.sql loaded




    You will need the following:




    • Run mysqlbinlog against all binary logs on Slave1 and collect SQL into db1traffic.sql

    • Run mysqlbinlog against all binary logs on Slave2 and collect SQL into db2traffic.sql

    • Run mysqlbinlog against all binary logs on Slave3 and collect SQL into db3traffic.sql


    • Run these three traffic SQL files in parallel like this




      • mysql -hdb4 -u... -p... -A -Ddb1 < db1traffic.sql &

      • mysql -hdb4 -u... -p... -A -Ddb2 < db2traffic.sql &

      • mysql -hdb4 -u... -p... -A -Ddb3 < db3traffic.sql &




    • You can use DBServer4 to measure




      • server load

      • query performance

      • I/O utilization

      • Whatever you want to measure




    As I mentioned in the outset, this takes some setup work along with access to old data and all binary log entriers since the old data's last backup.



    I hope this gives you some guidance on producing real traffic rather that simulating ordinary load.



    GIVE IT A TRY!!






    share|improve this answer


























    • FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

      – timetofly
      Jan 3 '18 at 17:55












    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f6391%2fmulti-threaded-replay-of-master-activity-for-load-performance-testing%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    I believe that pt-query-digest from the Percona Toolkit (http://www.percona.com/doc/percona-toolkit/2.0/pt-query-digest.html) is what you are looking for.






    share|improve this answer




























      0














      I believe that pt-query-digest from the Percona Toolkit (http://www.percona.com/doc/percona-toolkit/2.0/pt-query-digest.html) is what you are looking for.






      share|improve this answer


























        0












        0








        0







        I believe that pt-query-digest from the Percona Toolkit (http://www.percona.com/doc/percona-toolkit/2.0/pt-query-digest.html) is what you are looking for.






        share|improve this answer













        I believe that pt-query-digest from the Percona Toolkit (http://www.percona.com/doc/percona-toolkit/2.0/pt-query-digest.html) is what you are looking for.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 21 '12 at 23:34









        malonsomalonso

        16125




        16125

























            0














            I actually have a suggestion that will take some work but is doable.



            THEORETICAL SCENARIO




            • You have 3 databases: db1, db2, db3

            • Production DB had binary logging enabled


            What you need are the following to create enough traffic:




            • mysqldump of Production DB two weeks old (TwoWeekOldData.sql) of db1, db2 and db3

            • Binary logs on Production DB that are two or more weeks old

            • Four DB Servers running MySQL


              • DBServers 1-3 are Slaves to Production DB

              • DBServers 1-4 have Binary Logging Enabled

              • DBServer1 has replicate-do-db=db1 and has only db1 data loaded

              • DBServer2 has replicate-do-db=db2 and has only db2 data loaded

              • DBServer3 has replicate-do-db=db3 and has only db3 data loaded

              • DBServer4 has all data from TwoWeekOldData.sql loaded




            You will need the following:




            • Run mysqlbinlog against all binary logs on Slave1 and collect SQL into db1traffic.sql

            • Run mysqlbinlog against all binary logs on Slave2 and collect SQL into db2traffic.sql

            • Run mysqlbinlog against all binary logs on Slave3 and collect SQL into db3traffic.sql


            • Run these three traffic SQL files in parallel like this




              • mysql -hdb4 -u... -p... -A -Ddb1 < db1traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb2 < db2traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb3 < db3traffic.sql &




            • You can use DBServer4 to measure




              • server load

              • query performance

              • I/O utilization

              • Whatever you want to measure




            As I mentioned in the outset, this takes some setup work along with access to old data and all binary log entriers since the old data's last backup.



            I hope this gives you some guidance on producing real traffic rather that simulating ordinary load.



            GIVE IT A TRY!!






            share|improve this answer


























            • FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

              – timetofly
              Jan 3 '18 at 17:55
















            0














            I actually have a suggestion that will take some work but is doable.



            THEORETICAL SCENARIO




            • You have 3 databases: db1, db2, db3

            • Production DB had binary logging enabled


            What you need are the following to create enough traffic:




            • mysqldump of Production DB two weeks old (TwoWeekOldData.sql) of db1, db2 and db3

            • Binary logs on Production DB that are two or more weeks old

            • Four DB Servers running MySQL


              • DBServers 1-3 are Slaves to Production DB

              • DBServers 1-4 have Binary Logging Enabled

              • DBServer1 has replicate-do-db=db1 and has only db1 data loaded

              • DBServer2 has replicate-do-db=db2 and has only db2 data loaded

              • DBServer3 has replicate-do-db=db3 and has only db3 data loaded

              • DBServer4 has all data from TwoWeekOldData.sql loaded




            You will need the following:




            • Run mysqlbinlog against all binary logs on Slave1 and collect SQL into db1traffic.sql

            • Run mysqlbinlog against all binary logs on Slave2 and collect SQL into db2traffic.sql

            • Run mysqlbinlog against all binary logs on Slave3 and collect SQL into db3traffic.sql


            • Run these three traffic SQL files in parallel like this




              • mysql -hdb4 -u... -p... -A -Ddb1 < db1traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb2 < db2traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb3 < db3traffic.sql &




            • You can use DBServer4 to measure




              • server load

              • query performance

              • I/O utilization

              • Whatever you want to measure




            As I mentioned in the outset, this takes some setup work along with access to old data and all binary log entriers since the old data's last backup.



            I hope this gives you some guidance on producing real traffic rather that simulating ordinary load.



            GIVE IT A TRY!!






            share|improve this answer


























            • FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

              – timetofly
              Jan 3 '18 at 17:55














            0












            0








            0







            I actually have a suggestion that will take some work but is doable.



            THEORETICAL SCENARIO




            • You have 3 databases: db1, db2, db3

            • Production DB had binary logging enabled


            What you need are the following to create enough traffic:




            • mysqldump of Production DB two weeks old (TwoWeekOldData.sql) of db1, db2 and db3

            • Binary logs on Production DB that are two or more weeks old

            • Four DB Servers running MySQL


              • DBServers 1-3 are Slaves to Production DB

              • DBServers 1-4 have Binary Logging Enabled

              • DBServer1 has replicate-do-db=db1 and has only db1 data loaded

              • DBServer2 has replicate-do-db=db2 and has only db2 data loaded

              • DBServer3 has replicate-do-db=db3 and has only db3 data loaded

              • DBServer4 has all data from TwoWeekOldData.sql loaded




            You will need the following:




            • Run mysqlbinlog against all binary logs on Slave1 and collect SQL into db1traffic.sql

            • Run mysqlbinlog against all binary logs on Slave2 and collect SQL into db2traffic.sql

            • Run mysqlbinlog against all binary logs on Slave3 and collect SQL into db3traffic.sql


            • Run these three traffic SQL files in parallel like this




              • mysql -hdb4 -u... -p... -A -Ddb1 < db1traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb2 < db2traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb3 < db3traffic.sql &




            • You can use DBServer4 to measure




              • server load

              • query performance

              • I/O utilization

              • Whatever you want to measure




            As I mentioned in the outset, this takes some setup work along with access to old data and all binary log entriers since the old data's last backup.



            I hope this gives you some guidance on producing real traffic rather that simulating ordinary load.



            GIVE IT A TRY!!






            share|improve this answer















            I actually have a suggestion that will take some work but is doable.



            THEORETICAL SCENARIO




            • You have 3 databases: db1, db2, db3

            • Production DB had binary logging enabled


            What you need are the following to create enough traffic:




            • mysqldump of Production DB two weeks old (TwoWeekOldData.sql) of db1, db2 and db3

            • Binary logs on Production DB that are two or more weeks old

            • Four DB Servers running MySQL


              • DBServers 1-3 are Slaves to Production DB

              • DBServers 1-4 have Binary Logging Enabled

              • DBServer1 has replicate-do-db=db1 and has only db1 data loaded

              • DBServer2 has replicate-do-db=db2 and has only db2 data loaded

              • DBServer3 has replicate-do-db=db3 and has only db3 data loaded

              • DBServer4 has all data from TwoWeekOldData.sql loaded




            You will need the following:




            • Run mysqlbinlog against all binary logs on Slave1 and collect SQL into db1traffic.sql

            • Run mysqlbinlog against all binary logs on Slave2 and collect SQL into db2traffic.sql

            • Run mysqlbinlog against all binary logs on Slave3 and collect SQL into db3traffic.sql


            • Run these three traffic SQL files in parallel like this




              • mysql -hdb4 -u... -p... -A -Ddb1 < db1traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb2 < db2traffic.sql &

              • mysql -hdb4 -u... -p... -A -Ddb3 < db3traffic.sql &




            • You can use DBServer4 to measure




              • server load

              • query performance

              • I/O utilization

              • Whatever you want to measure




            As I mentioned in the outset, this takes some setup work along with access to old data and all binary log entriers since the old data's last backup.



            I hope this gives you some guidance on producing real traffic rather that simulating ordinary load.



            GIVE IT A TRY!!







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Oct 27 '16 at 14:55









            Philᵀᴹ

            25.7k65591




            25.7k65591










            answered Oct 26 '11 at 20:45









            RolandoMySQLDBARolandoMySQLDBA

            144k24228385




            144k24228385













            • FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

              – timetofly
              Jan 3 '18 at 17:55



















            • FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

              – timetofly
              Jan 3 '18 at 17:55

















            FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

            – timetofly
            Jan 3 '18 at 17:55





            FYI This simulates only commands that update the db as it only uses the binlogs, it doesn't simulate select statement, which is where a lot of load comes from.

            – timetofly
            Jan 3 '18 at 17:55


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f6391%2fmulti-threaded-replay-of-master-activity-for-load-performance-testing%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Liste der Baudenkmale in Friedland (Mecklenburg)

            Single-Malt-Whisky

            Czorneboh