pg_repack slows down PostgreSQL replication

I have a master PostgreSQL 9.5 server and a standby server. For replication I use repmgr (WAL streaming). Typically the delay between master and standby is <5s:



$ psql -t -c "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp());"
0.044554


pg_repack is invoked periodically on the master to optimize indexes and tables. Repacking tables generates a large amount of WAL and significantly slows down replication, so the standby can fall more than an hour behind the master.
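For reference, while a repack is running the lag can also be observed from the master side, in bytes, with something like the following (run as a superuser; these are the 9.x function and column names):

$ psql -t -c "SELECT application_name, pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes FROM pg_stat_replication;"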



Is there a way to reduce this delay? Is it possible to replicate newly incoming data with a higher priority than the repack changes?

      postgresql replication postgresql-9.5 repmgr

asked Jan 31 '17 at 9:16 by Tombart

1 Answer

I was about to ask a similar question. The issue is inherent to streaming replication, which is physical (block-level): it replicates the actual data writes to disk. With VACUUM FULL, REINDEX, truncates/restores, and now pg_repack, the tables are rewritten on disk, which produces a large volume of writes that has to be streamed to the other side.



So no, I don't believe you can do that kind of prioritization: the moment the rebuilt table is swapped in as the active table, new updates and writes on the master go to the rebuilt table, not the old one, and the replica then needs the rebuilt table to be available.



I've gotten into the habit of stopping replication, doing the major data changes (it is arguably good practice to keep the old standby around as a backup while the changes run), and then re-seeding the standby with a fresh pg_basebackup before restarting replication.
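A minimal sketch of that re-seed step, assuming a standby data directory of /var/lib/postgresql/9.5/main and a replication role named repmgr (both are placeholders for your own setup; repmgr also provides a standby clone command that wraps pg_basebackup):

# on the standby, with PostgreSQL stopped and the old data directory moved out of the way
pg_basebackup -h master.example.com -U repmgr -D /var/lib/postgresql/9.5/main -X stream -P -R
# -X stream also copies the WAL needed for a consistent base backup; -R writes a recovery.conf pointing back at the master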



Hope this helps explain the situation you are in and how I've been handling it until now. :)



That said, do go and read https://www.depesz.com/2013/06/21/bloat-removal-by-tuples-moving/



Depesz describes a mechanism that moves rows towards the beginning of the table "on the fly", with the data available the whole time, using code similar to:



with x as (
    delete from test where id in (999997, 999998, 999999) returning *
)
insert into test
select * from x;


This is then run in batches, with VACUUM statements in between to reclaim the freed space. Done slowly and in a managed way, you can get the effect of a "repack" without the replicas falling too far behind.
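A rough sketch of such a batch loop, using the test table from the example above (the database name and id ranges are made up for illustration):

# move rows in small batches, vacuuming after each batch so the freed space can be reused
for start in 999000 999100 999200; do
    psql -d mydb -c "WITH x AS (DELETE FROM test WHERE id BETWEEN $start AND $start + 99 RETURNING *) INSERT INTO test SELECT * FROM x;"
    psql -d mydb -c "VACUUM test;"
done

Keeping each batch small keeps the WAL generated per step modest, which is what lets the standby keep up.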



Just be careful with triggers on UPDATE/INSERT/DELETE!

answered Aug 14 '17 at 12:01 by Hvisage (edited Aug 14 '17 at 12:13)
