Understanding percentile computation

I understand percentiles in the context of test scores with many examples (e.g., your SAT score falls in the 99th percentile), but I am not sure I understand what a percentile means in the following context. Imagine a model outputs probabilities (on some days we have a lot of new data and many output probabilities, and on other days we don't). Suppose I want to compute the 99th percentile of the output probabilities. Here are the probabilities for today:

import numpy as np

a = np.array([0, 0.2, 0.4, 0.7, 1])
p = np.percentile(a, 99)
print(p)

0.988

I don't understand how the 99th percentile is computed when there are only 5 output probabilities. How was the output computed? Thanks!


      statistics descriptive-statistics python percentile






edited Jan 14 at 18:02
gt6989b

asked Jan 14 at 17:55
Jane Sully

2 Answers

Under the definition below, the result would be the number at position $5$: $a_5 = 1$.

A $p$-th percentile $P_p$ is characterized by the following two properties:

• At most $p\%$ of the data is less than $P_p$
• At most $(100-p)\%$ of the data is greater than $P_p$

Let $n$ be the number of data items, sorted in ascending order. There are two cases:

• If $n\cdot\frac{p}{100}$ is not an integer, then $P_p$ is uniquely determined: it is the value of the data item at position $\left\lceil n\cdot\frac{p}{100} \right\rceil$ (rounding up). In your case
  $$5\cdot\frac{99}{100}=4.95 \longrightarrow \left\lceil n\cdot\frac{p}{100}\right\rceil = 5$$

• If $n\cdot\frac{p}{100}$ is an integer, then any value from the data item at position $n\cdot\frac{p}{100}$ up to the item at position $n\cdot\frac{p}{100}+1$ satisfies the characterization given above. This is the only case where interpolation might be applied.

Summary:
Under this definition, the value $0.988$ returned by the percentile function in numpy (np) is not a 99th percentile, because $20\%$ of the data (the single value $1$) is greater than it. By default numpy interpolates linearly between neighboring data items, which is a different convention.
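
The rounding-up rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `percentile_ceiling_rule` is made up for this sketch, `p` is restricted to $0 < p < 100$, and in the integer case, where any value between two neighboring items qualifies, the midpoint is returned as one arbitrary choice.

```python
def percentile_ceiling_rule(data, p):
    """Sketch of the rounding-up percentile rule (hypothetical helper, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")

    # n * p / 100 = k + remainder / 100, with k the integer part.
    k, remainder = divmod(n * p, 100)
    k = int(k)

    if remainder != 0:
        # Not an integer: take the item at 1-based position ceil(n*p/100) = k + 1,
        # which is 0-based index k.
        return xs[k]

    # Integer: any value between 1-based positions k and k + 1 qualifies.
    # Assumption for this sketch: return the midpoint of those two items.
    return (xs[k - 1] + xs[k]) / 2


a = [0, 0.2, 0.4, 0.7, 1]
print(percentile_ceiling_rule(a, 99))  # 1
```

For the data in the question this returns the largest item, in contrast to the interpolated value 0.988 reported there.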








answered Jan 15 at 12:27
trancelocation

HINT

Look at the documentation of your percentile function, and notice that it uses linear interpolation at positions where no data point is available.

Indeed, the five sorted values sit at the fractions $0, 0.25, 0.5, 0.75, 1$; if the points $(0.75, 0.7)$ and $(1, 1)$ are joined by a line, what do you get at $0.99$?
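
To see where 0.988 comes from, the interpolation hinted at above can be reproduced by hand. The sketch below assumes the convention that the $n$ sorted values sit at the evenly spaced fractions $0, \frac{1}{n-1}, \dots, 1$, which to my knowledge is what numpy's default linear method does; the helper name `percentile_linear` is hypothetical and not part of numpy.

```python
def percentile_linear(data, p):
    """Sketch of linear-interpolation percentiles (hypothetical helper, 0 <= p <= 100)."""
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")

    # Fractional 0-based index: 0 maps to the smallest item, n - 1 to the largest.
    h = (n - 1) * p / 100
    lo = int(h)                 # index of the item just below (floor, since h >= 0)
    hi = min(lo + 1, n - 1)     # index of the item just above, clamped at the end
    frac = h - lo               # how far we are between the two items

    return xs[lo] + frac * (xs[hi] - xs[lo])


a = [0, 0.2, 0.4, 0.7, 1]
# h = 4 * 0.99 = 3.96, so the result is 0.7 + 0.96 * (1 - 0.7)
print(round(percentile_linear(a, 99), 3))  # 0.988
```

With only five values, the 99th percentile is therefore a point 96% of the way from the fourth value to the fifth, not an observed probability.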








answered Jan 14 at 18:02
gt6989b
