Understanding percentile computation
I understand percentiles in the context of test scores with many examples (e.g., your SAT score falls in the 99th percentile), but I am not sure I understand what is going on in the following context. Imagine a model outputs probabilities (on some days we have a lot of new data and output probabilities, and on some days we don't), and I want to compute the 99th percentile of those probabilities. Here are the probabilities for today:
    import numpy as np
    a = np.array([0, 0.2, 0.4, 0.7, 1])
    p = np.percentile(a, 99)
    print(p)
    0.988
I don't understand how the 99th percentile is computed in this situation, where there are only 5 output probabilities. How was the output computed? Thanks!
statistics descriptive-statistics python percentile
edited Jan 14 at 18:02 by gt6989b
asked Jan 14 at 17:55 by Jane Sully
2 Answers
Under the definition below, the correct result would be the number at position $5$: $a_5 = 1$.
A $p$-th percentile $P_p$ is characterized by the following two properties:
- At most $p\%$ of the data is less than $P_p$.
- At most $(100-p)\%$ of the data is greater than $P_p$.
Let $n$ be the number of data items, sorted in ascending order. There are two cases:
- If $n\cdot\frac{p}{100}$ is not an integer, then $P_p$ is uniquely determined: it is the value of the data item at position $\left\lceil n\cdot\frac{p}{100} \right\rceil$ (rounding up). In your case
$$5\cdot\frac{99}{100}=4.95 \longrightarrow \left\lceil n\cdot\frac{p}{100} \right\rceil = 5$$
- If $n\cdot\frac{p}{100}$ is an integer, then any value from the data item at position $n\cdot\frac{p}{100}$ up to the item at position $n\cdot\frac{p}{100}+1$ satisfies the two properties above. This is the only case where interpolation might be applied.
Summary:
Under this definition, the value $0.988$ returned by the percentile function in "numpy" (np) is not a 99th percentile of the data, because $20\%$ of the data (the item $1$) is greater than it. By default, numpy interpolates linearly between data items.
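To make the rounding-up rule concrete, here is a minimal sketch in plain Python. The function name `nearest_rank_percentile` is my own (hypothetical), and in the integer case it simply returns the item at position $n\cdot\frac{p}{100}$, which is only one of the admissible values.

```python
import math


def nearest_rank_percentile(data, p):
    """Return a p-th percentile using the rounding-up rule (hypothetical helper).

    Positions are 1-based, as in the answer text. When n * p / 100 is an
    integer, the item at that position is returned; any value up to the
    next item would also satisfy the two properties.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    n = len(ordered)
    position = math.ceil(n * p / 100)  # 1-based position in the sorted data
    return ordered[position - 1]


a = [0, 0.2, 0.4, 0.7, 1]
result = nearest_rank_percentile(a, 99)  # 5 * 0.99 = 4.95, rounded up to 5
print(result)  # 1
```

With only 5 items, every percentile above the 80th lands on the largest item under this rule.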
answered Jan 15 at 12:27 by trancelocation
HINT
Look at the documentation of your percentile function, and notice that it uses linear interpolation between data points.
Indeed, the $5$ sorted values are placed at the evenly spaced fractions $0, 0.25, 0.5, 0.75, 1$, so if the points $(0.75, 0.7)$ and $(1, 1)$ are joined by a line, what will you get at $0.99$?
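The interpolation can also be reproduced by hand. This is a sketch of the linear rule as I understand numpy's default behaviour, not numpy's actual code; the function name `linear_percentile` is hypothetical.

```python
import math


def linear_percentile(data, p):
    """Percentile by linear interpolation between sorted items (hypothetical helper).

    The sorted items are treated as sitting at 0-based indices 0..n-1,
    and the p-th percentile is read off at fractional index (n - 1) * p / 100.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(data)
    n = len(ordered)
    h = (n - 1) * p / 100      # fractional index, e.g. 4 * 0.99 = 3.96
    lo = math.floor(h)         # index of the item just below
    hi = min(lo + 1, n - 1)    # index of the item just above (clamped at the end)
    frac = h - lo              # how far to move from the lower to the upper item
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


a = [0, 0.2, 0.4, 0.7, 1]
result = linear_percentile(a, 99)  # 0.7 + 0.96 * (1 - 0.7)
print(round(result, 3))  # 0.988
```

The fractional index 3.96 lies between the items 0.7 and 1, which is where the 0.988 in the question comes from.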
answered Jan 14 at 18:02 by gt6989b