Query to obtain weighted percentile

I am trying to write SQL that computes a weighted, continuous value at a given set of percentiles (the 25%, 50%, and 75% levels are used below, but the solution should accept an arbitrary percentile level as a parameter). In other words, for each site and date in the "source" test table below, I want the interpolated "raw" value, weighted by "cnt", at each of the 25%, 50%, and 75% cumulative percentiles.

Test data (table: source):

|  site  |  dateval   |  raw  |   cnt   |
+--------+------------+-------+---------+
|   A    | 2019-01-05 |    45 |      14 |
|   A    | 2019-01-05 |    52 |     178 |
|   A    | 2019-01-05 |    45 |       9 |
|   A    | 2019-01-05 |    37 |      75 |
|   A    | 2019-01-05 |    23 |      98 |
|   A    | 2019-01-05 |    78 |     102 |
|   A    | 2019-01-05 |    56 |       9 |
|   A    | 2019-01-05 |    17 |      54 |
|   A    | 2019-01-05 |    56 |       8 |
|   A    | 2019-01-06 |    33 |      35 |
|   A    | 2019-01-06 |    67 |      45 |
|   A    | 2019-01-06 |    65 |      93 |
|   A    | 2019-01-06 |    89 |     113 |
|   A    | 2019-01-06 |    52 |      64 |
|   A    | 2019-01-06 |   101 |      12 |
|   B    | 2019-01-05 |     5 |      25 |
|   B    | 2019-01-05 |    16 |      48 |
|   B    | 2019-01-05 |    12 |     107 |
|   B    | 2019-01-05 |    25 |      78 |
|   B    | 2019-01-05 |    44 |      53 |
|   B    | 2019-01-05 |     8 |      12 |
|   B    | 2019-01-05 |    31 |      32 |
|   B    | 2019-01-06 |    34 |      87 |
|   B    | 2019-01-06 |    18 |      35 |
|   B    | 2019-01-06 |    51 |      17 |
|   B    | 2019-01-06 |    22 |      23 |
|   B    | 2019-01-06 |    14 |      52 |
|   B    | 2019-01-06 |     6 |      34 |
+--------+------------+-------+---------+
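To reproduce the question in a scratch database, the test data can be loaded with a script like the one below. The column types (text, date, integer, integer) are assumptions, since the question only shows the values.

```sql
-- Assumed column types; the question does not state them.
CREATE TABLE source (
    site    text,
    dateval date,
    raw     integer,
    cnt     integer
);

-- The 28 test rows from the question, unchanged.
INSERT INTO source (site, dateval, raw, cnt) VALUES
    ('A', '2019-01-05',  45,  14),
    ('A', '2019-01-05',  52, 178),
    ('A', '2019-01-05',  45,   9),
    ('A', '2019-01-05',  37,  75),
    ('A', '2019-01-05',  23,  98),
    ('A', '2019-01-05',  78, 102),
    ('A', '2019-01-05',  56,   9),
    ('A', '2019-01-05',  17,  54),
    ('A', '2019-01-05',  56,   8),
    ('A', '2019-01-06',  33,  35),
    ('A', '2019-01-06',  67,  45),
    ('A', '2019-01-06',  65,  93),
    ('A', '2019-01-06',  89, 113),
    ('A', '2019-01-06',  52,  64),
    ('A', '2019-01-06', 101,  12),
    ('B', '2019-01-05',   5,  25),
    ('B', '2019-01-05',  16,  48),
    ('B', '2019-01-05',  12, 107),
    ('B', '2019-01-05',  25,  78),
    ('B', '2019-01-05',  44,  53),
    ('B', '2019-01-05',   8,  12),
    ('B', '2019-01-05',  31,  32),
    ('B', '2019-01-06',  34,  87),
    ('B', '2019-01-06',  18,  35),
    ('B', '2019-01-06',  51,  17),
    ('B', '2019-01-06',  22,  23),
    ('B', '2019-01-06',  14,  52),
    ('B', '2019-01-06',   6,  34);
```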

Expected output (rounded to two decimal places):

|  site  |   dateval  |   p00   |   p25   |   p50   |   p75   |   p100  |
+--------+------------+---------+---------+---------+---------+---------+
|   A    | 2019-01-05 |   17.00 |   22.07 |   45.92 |   51.30 |   78.00 |
|   A    | 2019-01-06 |   33.00 |   49.48 |   63.46 |   73.72 |  101.00 |
|   B    | 2019-01-05 |    5.00 |    9.93 |   14.79 |   24.57 |   44.00 |
|   B    | 2019-01-06 |    6.00 |   10.31 |   18.52 |   27.79 |   51.00 |
+--------+------------+---------+---------+---------+---------+---------+
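Read back against the test data, every p25/p50/p75 cell above appears consistent with the following rule (inferred from the numbers here, not stated in the question). Within one (site, dateval) group, sum "cnt" per distinct "raw" value to get levels r_1 < r_2 < ... < r_n with running weights C_1 <= ... <= C_n = N. For a percentile p, locate the weight position t = pN among the C_k and interpolate linearly, clamping to r_1 below C_1 (p00) and to r_n at N (p100):

$$
t = p\,N, \qquad
x(p) =
\begin{cases}
r_1, & t < C_1,\\
r_k + (r_{k+1} - r_k)\,\dfrac{t - C_k}{C_{k+1} - C_k}, & C_k \le t < C_{k+1},\\
r_n, & t = N.
\end{cases}
$$

For site A on 2019-01-05, the two lowest levels 17 and 23 have running weights C_1 = 54 and C_2 = 152 out of N = 547, so for p = 0.25:

$$
t = 0.25 \times 547 = 136.75, \qquad
x(0.25) = 17 + (23 - 17)\,\frac{136.75 - 54}{152 - 54} = 17 + 6 \cdot \frac{82.75}{98} \approx 22.07
$$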

Current SQL

The query below computes the cumulative percentile only at discrete points, namely at the "raw" values actually present in table "source". What I need instead is the "raw" value that corresponds to a given percentile on a continuous basis (for simplicity, linear interpolation between adjacent discrete "raw" levels is fine; splines or other curves are not needed). Frankly, I am not sure the following approach is the most appropriate path:

WITH raw_lvl AS (
    -- collapse duplicate "raw" values within each site/date
    SELECT "site", "dateval", "raw", sum("cnt") AS "sumcnt"
    FROM source
    GROUP BY "site", "dateval", "raw"
), cum_raw AS (
    -- running total of the weights, in ascending "raw" order
    SELECT tlr.*, sum(tlr."sumcnt") OVER "win_cr" AS "cumsumcnt"
    FROM raw_lvl AS "tlr"
    WINDOW "win_cr" AS (PARTITION BY tlr."site", tlr."dateval" ORDER BY tlr."raw" ASC)
)
-- share of the group's total weight reached at each "raw" level
SELECT cr.*, cr."cumsumcnt" / (sum(cr."sumcnt") OVER "win_pr") AS "percentile"
FROM cum_raw AS cr
WINDOW "win_pr" AS (PARTITION BY cr."site", cr."dateval");
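One possible way to extend this query to linear interpolation, offered as a sketch rather than a definitive answer: keep the per-level running weight, attach the next "raw" level and its weight with lead(), and for each requested percentile p keep the last level whose running weight does not exceed t = p × total, then interpolate toward the next level. The clamping at p00/p100 is read off the expected output. The requested levels live in a params CTE, so arbitrary percentiles can be passed by editing the array. The temporary view name weighted_pct and the output columns pct and raw_value are hypothetical, and the sketch assumes the "source" table from the question with integer or numeric "raw" and "cnt" columns.

```sql
-- Sketch: assumes table "source" (site, dateval, raw, cnt) from the question.
-- weighted_pct, pct and raw_value are hypothetical names; edit the ARRAY for other levels.
CREATE TEMP VIEW weighted_pct AS
WITH params(pct) AS (
    SELECT unnest(ARRAY[0.00, 0.25, 0.50, 0.75, 1.00]::numeric[])
), raw_lvl AS (
    -- one row per distinct raw value in each (site, dateval) group
    SELECT site, dateval, raw, sum(cnt) AS sumcnt
    FROM source
    GROUP BY site, dateval, raw
), cum_raw AS (
    SELECT tlr.*,
           sum(tlr.sumcnt)  OVER win_cr AS cumsumcnt,    -- running weight through this level
           sum(tlr.sumcnt)  OVER (PARTITION BY tlr.site, tlr.dateval) AS totcnt,
           row_number()     OVER win_cr AS rn,
           lead(tlr.raw)    OVER win_cr AS next_raw,     -- NULL on the highest level
           lead(tlr.sumcnt) OVER win_cr AS next_sumcnt   -- weight of the next level
    FROM raw_lvl AS tlr
    WINDOW win_cr AS (PARTITION BY tlr.site, tlr.dateval ORDER BY tlr.raw)
), targets AS (
    -- t = weight position of each requested percentile within its group
    SELECT cr.*, p.pct, p.pct * cr.totcnt AS t
    FROM cum_raw AS cr
    CROSS JOIN params AS p
)
SELECT site, dateval, pct,
       round((CASE
                  -- t at or below this level's running weight, or no higher level: no interpolation
                  WHEN cumsumcnt >= t OR next_raw IS NULL THEN raw
                  -- linear interpolation between this level and the next one
                  ELSE raw + (next_raw - raw) * (t - cumsumcnt) / next_sumcnt
              END)::numeric, 2) AS raw_value
FROM targets
-- keep one row per (group, pct): the last level whose running weight is <= t,
-- or the first level when t lies below every running weight
WHERE (cumsumcnt <= t AND (next_raw IS NULL OR cumsumcnt + next_sumcnt > t))
   OR (rn = 1 AND cumsumcnt > t);

SELECT * FROM weighted_pct ORDER BY site, dateval, pct;
```

The result is in long form (one row per group and percentile), which is what makes an arbitrary list of levels possible. To get the wide layout of the expected output, it can be pivoted with aggregate FILTER clauses (available since PostgreSQL 9.4), e.g. `max(raw_value) FILTER (WHERE pct = 0.25) AS p25` in a query over weighted_pct grouped by site, dateval.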


Postgres version 10.3

Tags: postgresql, postgresql-10

asked 3 mins ago by Whee