Optimizing table/indices for getting the latest row (filtering by one additional column)












5















I have a SQL Server table (SQL Server 2012 SP3 Standard edition) that stores a bunch of configuration information (basically text blobs) for different organizations. The schema is something like this:



[ConfigurationID]       INT IDENTITY (1,1) NOT NULL,
[OrganizationID] INT NOT NULL,
[TimestampUtc] DATETIME NOT NULL,
[ConfigurationData] NVARCHAR (MAX) NOT NULL,
[ChangedBy] NVARCHAR (256) NOT NULL,
[Comment] NVARCHAR (MAX) NOT NULL,
[ChangeType] INT NOT NULL


The TimestampUtc will always be increasing (I won't ever be INSERTing "back-dated" entries into the table), and the rows won't ever be UPDATEd (I'm only INSERTing new rows). For some OrganizationIDs there will be lots of rows, for some very few, and a new row for any OrganizationID may be INSERTed at any time.



If needed, I can guarantee uniqueness of TimestampUtc (but it would be great to have a solution that didn't need that).



INSERTs are relatively rare (at most dozens of times per day, but typically much less than that), reads are very frequent (essentially on every web request to my application).



My goals are:




  • Getting the ConfigurationData with the latest TimestampUtc for a given OrganizationID should be extremely fast regardless of the size of the table

  • INSERT performance doesn't matter too much, but I'd like to avoid horrible index fragmentation if at all possible (so my first idea of a unique clustered index on OrganizationID ASC, TimestampUtc DESC is probably not a great idea).


Questions



I know I can denormalize and just store the latest ConfigurationData in one table and the historical log of previous values in another table, but is it possible to meet my goals with just one table? What's the best way to do it? (I.e. what's the best index structure? do I need to change anything about the table schema, etc.?)










share|improve this question
















bumped to the homepage by Community 26 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Which version and edition of sql server are you using?

    – SqlZim
    Feb 16 '17 at 17:06











  • SQL Server 2012 SP3 Standard, I've added this to the question

    – Eugene O
    Feb 16 '17 at 17:13








  • 4





    Why do you place index fragmentation in such high regard? The index you describe sounds perfect (though if you are going after a specific OrgID, I wonder why that wouldn't be a non-clustered index, or why it has to be unique, since timestamps aren't guaranteed to be unique). Worry about fragmentation when it actually proves to be an obvious performance problem. Until then, design for workload performance, not to avoid potential fragmentation goblins.

    – Aaron Bertrand
    Feb 16 '17 at 17:42













  • With my proposed index, wouldn't it add a whole new page on literally every insert? Is this not something I should be worried about? Also, why would you recommend a non-clustered index?

    – Eugene O
    Feb 16 '17 at 18:47











  • Will your two NVARCHAR(MAX) columns potentially be very long? I assume that if so, you will not be indexing those columns. Check the discussion on NVARCHAR(MAX) at: stackoverflow.com/questions/148398/…

    – RLF
    Feb 16 '17 at 22:05


















5















I have a SQL Server table (SQL Server 2012 SP3 Standard edition) that stores a bunch of configuration information (basically text blobs) for different organizations. The schema is something like this:



[ConfigurationID]       INT IDENTITY (1,1) NOT NULL,
[OrganizationID] INT NOT NULL,
[TimestampUtc] DATETIME NOT NULL,
[ConfigurationData] NVARCHAR (MAX) NOT NULL,
[ChangedBy] NVARCHAR (256) NOT NULL,
[Comment] NVARCHAR (MAX) NOT NULL,
[ChangeType] INT NOT NULL


The TimestampUtc will always be increasing (I won't ever be INSERTing "back-dated" entries into the table), and the rows won't ever be UPDATEd (I'm only INSERTing new rows). For some OrganizationIDs there will be lots of rows, for some very few, and a new row for any OrganizationID may be INSERTed at any time.



If needed, I can guarantee uniqueness of TimestampUtc (but it would be great to have a solution that didn't need that).



INSERTs are relatively rare (at most dozens of times per day, but typically much less than that), reads are very frequent (essentially on every web request to my application).



My goals are:




  • Getting the ConfigurationData with the latest TimestampUtc for a given OrganizationID should be extremely fast regardless of the size of the table

  • INSERT performance doesn't matter too much, but I'd like to avoid horrible index fragmentation if at all possible (so my first idea of a unique clustered index on OrganizationID ASC, TimestampUtc DESC is probably not a great idea).


Questions



I know I can denormalize and just store the latest ConfigurationData in one table and the historical log of previous values in another table, but is it possible to meet my goals with just one table? What's the best way to do it? (I.e. what's the best index structure? do I need to change anything about the table schema, etc.?)










share|improve this question
















bumped to the homepage by Community 26 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Which version and edition of sql server are you using?

    – SqlZim
    Feb 16 '17 at 17:06











  • SQL Server 2012 SP3 Standard, I've added this to the question

    – Eugene O
    Feb 16 '17 at 17:13








  • 4





    Why do you place index fragmentation in such high regard? The index you describe sounds perfect (though if you are going after a specific OrgID, I wonder why that wouldn't be a non-clustered index, or why it has to be unique, since timestamps aren't guaranteed to be unique). Worry about fragmentation when it actually proves to be an obvious performance problem. Until then, design for workload performance, not to avoid potential fragmentation goblins.

    – Aaron Bertrand
    Feb 16 '17 at 17:42













  • With my proposed index, wouldn't it add a whole new page on literally every insert? Is this not something I should be worried about? Also, why would you recommend a non-clustered index?

    – Eugene O
    Feb 16 '17 at 18:47











  • Will your two NVARCHAR(MAX) columns potentially be very long? I assume that if so, you will not be indexing those columns. Check the discussion on NVARCHAR(MAX) at: stackoverflow.com/questions/148398/…

    – RLF
    Feb 16 '17 at 22:05
















5












5








5


1






I have a SQL Server table (SQL Server 2012 SP3 Standard edition) that stores a bunch of configuration information (basically text blobs) for different organizations. The schema is something like this:



[ConfigurationID]       INT IDENTITY (1,1) NOT NULL,
[OrganizationID] INT NOT NULL,
[TimestampUtc] DATETIME NOT NULL,
[ConfigurationData] NVARCHAR (MAX) NOT NULL,
[ChangedBy] NVARCHAR (256) NOT NULL,
[Comment] NVARCHAR (MAX) NOT NULL,
[ChangeType] INT NOT NULL


The TimestampUtc will always be increasing (I won't ever be INSERTing "back-dated" entries into the table), and the rows won't ever be UPDATEd (I'm only INSERTing new rows). For some OrganizationIDs there will be lots of rows, for some very few, and a new row for any OrganizationID may be INSERTed at any time.



If needed, I can guarantee uniqueness of TimestampUtc (but it would be great to have a solution that didn't need that).



INSERTs are relatively rare (at most dozens of times per day, but typically much less than that), reads are very frequent (essentially on every web request to my application).



My goals are:




  • Getting the ConfigurationData with the latest TimestampUtc for a given OrganizationID should be extremely fast regardless of the size of the table

  • INSERT performance doesn't matter too much, but I'd like to avoid horrible index fragmentation if at all possible (so my first idea of a unique clustered index on OrganizationID ASC, TimestampUtc DESC is probably not a great idea).


Questions



I know I can denormalize and just store the latest ConfigurationData in one table and the historical log of previous values in another table, but is it possible to meet my goals with just one table? What's the best way to do it? (I.e. what's the best index structure? do I need to change anything about the table schema, etc.?)










share|improve this question
















I have a SQL Server table (SQL Server 2012 SP3 Standard edition) that stores a bunch of configuration information (basically text blobs) for different organizations. The schema is something like this:



[ConfigurationID]       INT IDENTITY (1,1) NOT NULL,
[OrganizationID] INT NOT NULL,
[TimestampUtc] DATETIME NOT NULL,
[ConfigurationData] NVARCHAR (MAX) NOT NULL,
[ChangedBy] NVARCHAR (256) NOT NULL,
[Comment] NVARCHAR (MAX) NOT NULL,
[ChangeType] INT NOT NULL


The TimestampUtc will always be increasing (I won't ever be INSERTing "back-dated" entries into the table), and the rows won't ever be UPDATEd (I'm only INSERTing new rows). For some OrganizationIDs there will be lots of rows, for some very few, and a new row for any OrganizationID may be INSERTed at any time.



If needed, I can guarantee uniqueness of TimestampUtc (but it would be great to have a solution that didn't need that).



INSERTs are relatively rare (at most dozens of times per day, but typically much less than that), reads are very frequent (essentially on every web request to my application).



My goals are:




  • Getting the ConfigurationData with the latest TimestampUtc for a given OrganizationID should be extremely fast regardless of the size of the table

  • INSERT performance doesn't matter too much, but I'd like to avoid horrible index fragmentation if at all possible (so my first idea of a unique clustered index on OrganizationID ASC, TimestampUtc DESC is probably not a great idea).


Questions



I know I can denormalize and just store the latest ConfigurationData in one table and the historical log of previous values in another table, but is it possible to meet my goals with just one table? What's the best way to do it? (I.e. what's the best index structure? do I need to change anything about the table schema, etc.?)







sql-server database-design sql-server-2012 performance index






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Feb 16 '17 at 18:31









MDCCL

6,85331745




6,85331745










asked Feb 16 '17 at 16:54









Eugene OEugene O

1263




1263





bumped to the homepage by Community 26 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 26 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • Which version and edition of sql server are you using?

    – SqlZim
    Feb 16 '17 at 17:06











  • SQL Server 2012 SP3 Standard, I've added this to the question

    – Eugene O
    Feb 16 '17 at 17:13








  • 4





    Why do you place index fragmentation in such high regard? The index you describe sounds perfect (though if you are going after a specific OrgID, I wonder why that wouldn't be a non-clustered index, or why it has to be unique, since timestamps aren't guaranteed to be unique). Worry about fragmentation when it actually proves to be an obvious performance problem. Until then, design for workload performance, not to avoid potential fragmentation goblins.

    – Aaron Bertrand
    Feb 16 '17 at 17:42













  • With my proposed index, wouldn't it add a whole new page on literally every insert? Is this not something I should be worried about? Also, why would you recommend a non-clustered index?

    – Eugene O
    Feb 16 '17 at 18:47











  • Will your two NVARCHAR(MAX) columns potentially be very long? I assume that if so, you will not be indexing those columns. Check the discussion on NVARCHAR(MAX) at: stackoverflow.com/questions/148398/…

    – RLF
    Feb 16 '17 at 22:05





















  • Which version and edition of sql server are you using?

    – SqlZim
    Feb 16 '17 at 17:06











  • SQL Server 2012 SP3 Standard, I've added this to the question

    – Eugene O
    Feb 16 '17 at 17:13








  • 4





    Why do you place index fragmentation in such high regard? The index you describe sounds perfect (though if you are going after a specific OrgID, I wonder why that wouldn't be a non-clustered index, or why it has to be unique, since timestamps aren't guaranteed to be unique). Worry about fragmentation when it actually proves to be an obvious performance problem. Until then, design for workload performance, not to avoid potential fragmentation goblins.

    – Aaron Bertrand
    Feb 16 '17 at 17:42













  • With my proposed index, wouldn't it add a whole new page on literally every insert? Is this not something I should be worried about? Also, why would you recommend a non-clustered index?

    – Eugene O
    Feb 16 '17 at 18:47











  • Will your two NVARCHAR(MAX) columns potentially be very long? I assume that if so, you will not be indexing those columns. Check the discussion on NVARCHAR(MAX) at: stackoverflow.com/questions/148398/…

    – RLF
    Feb 16 '17 at 22:05



















Which version and edition of sql server are you using?

– SqlZim
Feb 16 '17 at 17:06





Which version and edition of sql server are you using?

– SqlZim
Feb 16 '17 at 17:06













SQL Server 2012 SP3 Standard, I've added this to the question

– Eugene O
Feb 16 '17 at 17:13







SQL Server 2012 SP3 Standard, I've added this to the question

– Eugene O
Feb 16 '17 at 17:13






4




4





Why do you place index fragmentation in such high regard? The index you describe sounds perfect (though if you are going after a specific OrgID, I wonder why that wouldn't be a non-clustered index, or why it has to be unique, since timestamps aren't guaranteed to be unique). Worry about fragmentation when it actually proves to be an obvious performance problem. Until then, design for workload performance, not to avoid potential fragmentation goblins.

– Aaron Bertrand
Feb 16 '17 at 17:42







Why do you place index fragmentation in such high regard? The index you describe sounds perfect (though if you are going after a specific OrgID, I wonder why that wouldn't be a non-clustered index, or why it has to be unique, since timestamps aren't guaranteed to be unique). Worry about fragmentation when it actually proves to be an obvious performance problem. Until then, design for workload performance, not to avoid potential fragmentation goblins.

– Aaron Bertrand
Feb 16 '17 at 17:42















With my proposed index, wouldn't it add a whole new page on literally every insert? Is this not something I should be worried about? Also, why would you recommend a non-clustered index?

– Eugene O
Feb 16 '17 at 18:47





With my proposed index, wouldn't it add a whole new page on literally every insert? Is this not something I should be worried about? Also, why would you recommend a non-clustered index?

– Eugene O
Feb 16 '17 at 18:47













Will your two NVARCHAR(MAX) columns potentially be very long? I assume that if so, you will not be indexing those columns. Check the discussion on NVARCHAR(MAX) at: stackoverflow.com/questions/148398/…

– RLF
Feb 16 '17 at 22:05







Will your two NVARCHAR(MAX) columns potentially be very long? I assume that if so, you will not be indexing those columns. Check the discussion on NVARCHAR(MAX) at: stackoverflow.com/questions/148398/…

– RLF
Feb 16 '17 at 22:05












2 Answers
2






active

oldest

votes


















0














Given the low rate of inserts, your proposed index will be completely fine, and perfect for the usage goal.



Given a fresh index with fill factor 100%, and sufficient history for each organisation to fill a page, there will be a page split on the first subsequent insert for each organisation. But then there won't be another page split for that organisation until the new page is filled up.

Even these splits and fragmentation can be mitigated by starting with a <100 fill factor, and regular reorganise.



Aaron's thought regarding using non-clustered was probably because that would allow you to cluster on the always ascending identity column, ensuring the only splits and fragmentation you will get is in a very compact separate index. But I suspect he was only mentioning that in the context of you wanting very much to avoid fragmentation, and not as something really necessary in this scenario.






share|improve this answer































    0














    If you alter your table and add a flag for the active records you can use a filtered index, in this case a unique on OrganizationID where IsActive=1. Of course you had to update the currently active record for an Organization before you can insert the new active entry.



    CREATE UNIQUE INDEX UI_test ON dbo.test (OrganizationID) WHERE IsActive=1





    share|improve this answer























      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "182"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f164568%2foptimizing-table-indices-for-getting-the-latest-row-filtering-by-one-additional%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      Given the low rate of inserts, your proposed index will be completely fine, and perfect for the usage goal.



      Given a fresh index with fill factor 100%, and sufficient history for each organisation to fill a page, there will be a page split on the first subsequent insert for each organisation. But then there won't be another page split for that organisation until the new page is filled up.

      Even these splits and fragmentation can be mitigated by starting with a <100 fill factor, and regular reorganise.



      Aaron's thought regarding using non-clustered was probably because that would allow you to cluster on the always ascending identity column, ensuring the only splits and fragmentation you will get is in a very compact separate index. But I suspect he was only mentioning that in the context of you wanting very much to avoid fragmentation, and not as something really necessary in this scenario.






      share|improve this answer




























        0














        Given the low rate of inserts, your proposed index will be completely fine, and perfect for the usage goal.



        Given a fresh index with fill factor 100%, and sufficient history for each organisation to fill a page, there will be a page split on the first subsequent insert for each organisation. But then there won't be another page split for that organisation until the new page is filled up.

        Even these splits and fragmentation can be mitigated by starting with a <100 fill factor, and regular reorganise.



        Aaron's thought regarding using non-clustered was probably because that would allow you to cluster on the always ascending identity column, ensuring the only splits and fragmentation you will get is in a very compact separate index. But I suspect he was only mentioning that in the context of you wanting very much to avoid fragmentation, and not as something really necessary in this scenario.






        share|improve this answer


























          0












          0








          0







          Given the low rate of inserts, your proposed index will be completely fine, and perfect for the usage goal.



          Given a fresh index with fill factor 100%, and sufficient history for each organisation to fill a page, there will be a page split on the first subsequent insert for each organisation. But then there won't be another page split for that organisation until the new page is filled up.

          Even these splits and fragmentation can be mitigated by starting with a <100 fill factor, and regular reorganise.



          Aaron's thought regarding using non-clustered was probably because that would allow you to cluster on the always ascending identity column, ensuring the only splits and fragmentation you will get is in a very compact separate index. But I suspect he was only mentioning that in the context of you wanting very much to avoid fragmentation, and not as something really necessary in this scenario.






          share|improve this answer













          Given the low rate of inserts, your proposed index will be completely fine, and perfect for the usage goal.



          Given a fresh index with fill factor 100%, and sufficient history for each organisation to fill a page, there will be a page split on the first subsequent insert for each organisation. But then there won't be another page split for that organisation until the new page is filled up.

          Even these splits and fragmentation can be mitigated by starting with a <100 fill factor, and regular reorganise.



          Aaron's thought regarding using non-clustered was probably because that would allow you to cluster on the always ascending identity column, ensuring the only splits and fragmentation you will get is in a very compact separate index. But I suspect he was only mentioning that in the context of you wanting very much to avoid fragmentation, and not as something really necessary in this scenario.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Feb 17 '17 at 15:18









          T.H.T.H.

          81536




          81536

























              0














              If you alter your table and add a flag for the active records you can use a filtered index, in this case a unique on OrganizationID where IsActive=1. Of course you had to update the currently active record for an Organization before you can insert the new active entry.



              CREATE UNIQUE INDEX UI_test ON dbo.test (OrganizationID) WHERE IsActive=1





              share|improve this answer




























                0














                If you alter your table and add a flag for the active records you can use a filtered index, in this case a unique on OrganizationID where IsActive=1. Of course you had to update the currently active record for an Organization before you can insert the new active entry.



                CREATE UNIQUE INDEX UI_test ON dbo.test (OrganizationID) WHERE IsActive=1





                share|improve this answer


























                  0












                  0








                  0







                  If you alter your table and add a flag for the active records you can use a filtered index, in this case a unique on OrganizationID where IsActive=1. Of course you had to update the currently active record for an Organization before you can insert the new active entry.



                  CREATE UNIQUE INDEX UI_test ON dbo.test (OrganizationID) WHERE IsActive=1





                  share|improve this answer













                  If you alter your table and add a flag for the active records you can use a filtered index, in this case a unique on OrganizationID where IsActive=1. Of course you had to update the currently active record for an Organization before you can insert the new active entry.



                  CREATE UNIQUE INDEX UI_test ON dbo.test (OrganizationID) WHERE IsActive=1






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Jun 16 '17 at 11:03









                  Stefan WilmsStefan Wilms

                  954




                  954






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Database Administrators Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f164568%2foptimizing-table-indices-for-getting-the-latest-row-filtering-by-one-additional%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Liste der Baudenkmale in Friedland (Mecklenburg)

                      Single-Malt-Whisky

                      Czorneboh