Query to obtain weighted percentile
Trying to generate SQL that computes a weighted, continuous value at a given set of percentile levels (the 25%, 50%, and 75% levels are used below, but the solution should accept an arbitrary level as a parameter). In other words, I want to find the interpolated "raw" values, weighted by "cnt", at each of the 25%, 50%, and 75% cumulative percentiles for the test data in the "source" table below.
Test data: (Table: source)
| site | dateval | raw | cnt |
+--------+------------+-------+---------+
| A | 2019-01-05 | 45 | 14 |
| A | 2019-01-05 | 52 | 178 |
| A | 2019-01-05 | 45 | 9 |
| A | 2019-01-05 | 37 | 75 |
| A | 2019-01-05 | 23 | 98 |
| A | 2019-01-05 | 78 | 102 |
| A | 2019-01-05 | 56 | 9 |
| A | 2019-01-05 | 17 | 54 |
| A | 2019-01-05 | 56 | 8 |
| A | 2019-01-06 | 33 | 35 |
| A | 2019-01-06 | 67 | 45 |
| A | 2019-01-06 | 65 | 93 |
| A | 2019-01-06 | 89 | 113 |
| A | 2019-01-06 | 52 | 64 |
| A | 2019-01-06 | 101 | 12 |
| B | 2019-01-05 | 5 | 25 |
| B | 2019-01-05 | 16 | 48 |
| B | 2019-01-05 | 12 | 107 |
| B | 2019-01-05 | 25 | 78 |
| B | 2019-01-05 | 44 | 53 |
| B | 2019-01-05 | 8 | 12 |
| B | 2019-01-05 | 31 | 32 |
| B | 2019-01-06 | 34 | 87 |
| B | 2019-01-06 | 18 | 35 |
| B | 2019-01-06 | 51 | 17 |
| B | 2019-01-06 | 22 | 23 |
| B | 2019-01-06 | 14 | 52 |
| B | 2019-01-06 | 6 | 34 |
+--------+------------+-------+---------+
Expected output (rounded to nearest 1/100th):
| site | dateval | p00 | p25 | p50 | p75 | p100 |
+--------+------------+---------+---------+---------+---------+---------+
| A | 2019-01-05 | 17.00 | 22.07 | 45.92 | 51.30 | 78.00 |
| A | 2019-01-06 | 33.00 | 49.48 | 63.46 | 73.72 | 101.00 |
| B | 2019-01-05 | 5.00 | 9.93 | 14.79 | 24.57 | 44.00 |
| B | 2019-01-06 | 6.00 | 10.31 | 18.52 | 27.79 | 51.00 |
+--------+------------+---------+---------+---------+---------+---------+
Current SQL
The query below computes percentile values at discrete points, based on the "raw" values present in the "source" table. However, the desired output is the "raw" value that corresponds to a given percentile on a continuous basis (for simplicity, the interpolation between discrete "raw" levels is linear rather than spline-based or otherwise). Frankly, I am not sure the following approach is the most appropriate path:
WITH raw_lvl AS (
    -- Total weight per distinct "raw" value within each site/date
    SELECT "site", "dateval", "raw", sum("cnt") AS "sumcnt"
    FROM source
    GROUP BY "site", "dateval", "raw"
), cum_raw AS (
    -- Running total of weight, ordered by "raw"
    SELECT tlr.*, sum(tlr."sumcnt") OVER "win_cr" AS "cumsumcnt"
    FROM raw_lvl AS "tlr"
    WINDOW "win_cr" AS (PARTITION BY tlr."site", tlr."dateval" ORDER BY tlr."raw" ASC)
)
-- Cumulative share of the total weight at each discrete "raw" value
SELECT cr.*, cr."cumsumcnt"/(sum(cr."sumcnt") OVER "win_pr") AS "percentile"
FROM cum_raw AS cr
WINDOW "win_pr" AS (PARTITION BY cr."site", cr."dateval");
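For illustration, one way the discrete percentiles above might be extended to the linear interpolation described is sketched below. This is only a sketch under assumptions, not a tested solution: it assumes "raw" is an integer or numeric column, that every "cnt" is positive, and that the value at a cumulative weight lying inside a "raw" step is interpolated linearly between the previous "raw" value and the current one. The names pct, interp, "prev_raw", "total", and val are hypothetical and were introduced for this example.

```sql
-- Sketch only: pct, interp, "prev_raw", "total", and val are hypothetical names.
WITH pct(p) AS (
    VALUES (0.00), (0.25), (0.50), (0.75), (1.00)   -- arbitrary percentile levels
), raw_lvl AS (
    SELECT "site", "dateval", "raw", sum("cnt") AS "sumcnt"
    FROM source
    GROUP BY "site", "dateval", "raw"
), cum_raw AS (
    SELECT rl.*,
           sum(rl."sumcnt") OVER w AS "cumsumcnt",          -- running weight
           lag(rl."raw") OVER w AS "prev_raw",              -- previous distinct "raw"
           sum(rl."sumcnt") OVER (PARTITION BY rl."site", rl."dateval") AS "total"
    FROM raw_lvl AS rl
    WINDOW w AS (PARTITION BY rl."site", rl."dateval" ORDER BY rl."raw")
), interp AS (
    -- For each level, keep the lowest "raw" step whose running weight reaches the target,
    -- then interpolate linearly between the previous "raw" and this one.
    SELECT DISTINCT ON (cr."site", cr."dateval", pct.p)
           cr."site", cr."dateval", pct.p,
           CASE WHEN cr."prev_raw" IS NULL THEN cr."raw"::numeric
                ELSE cr."prev_raw"
                     + (pct.p * cr."total" - (cr."cumsumcnt" - cr."sumcnt"))
                       / cr."sumcnt" * (cr."raw" - cr."prev_raw")
           END AS val
    FROM cum_raw AS cr
    CROSS JOIN pct
    WHERE cr."cumsumcnt" >= pct.p * cr."total"
    ORDER BY cr."site", cr."dateval", pct.p, cr."raw"
)
-- Pivot one row per level into one column per level
SELECT "site", "dateval",
       round(max(val) FILTER (WHERE p = 0.00), 2) AS p00,
       round(max(val) FILTER (WHERE p = 0.25), 2) AS p25,
       round(max(val) FILTER (WHERE p = 0.50), 2) AS p50,
       round(max(val) FILTER (WHERE p = 0.75), 2) AS p75,
       round(max(val) FILTER (WHERE p = 1.00), 2) AS p100
FROM interp
GROUP BY "site", "dateval"
ORDER BY "site", "dateval";
```

As a hand check of the interpolation rule: for site A on 2019-01-05 the total weight is 547, so the 25% level targets 0.25 * 547 = 136.75. That falls in the raw = 23 step (cumulative weight 54 to 152), giving 17 + (136.75 - 54) / 98 * (23 - 17) ≈ 22.07, which agrees with the expected output. Levels other than the five listed would need both a new row in pct and a matching output column, so a fully parameterized version would probably return one row per level instead of pivoting.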
Postgres version 10.3
postgresql postgresql-10
asked 3 mins ago
WheeWhee