How to convert a pandas MultiIndex DataFrame into a 3D array





Suppose I have a MultiIndex DataFrame:

                                     c        o        l        u
    major timestamp
    ONE   2019-01-22 18:12:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:13:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:14:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:15:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:16:00  0.00008  0.00008  0.00008  0.00008
    TWO   2019-01-22 18:12:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:13:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:14:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:15:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:16:00  0.00008  0.00008  0.00008  0.00008


I want to generate a 3-dimensional NumPy array from this DataFrame. The full DataFrame has 15 categories in the major index level, 4 columns, and a time index of length 5, so the array should have shape (4, 15, 5), denoting (columns, categories, time_index) respectively.



For the two categories shown above, this should create an array of shape (4, 2, 5):

    array([[[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

           [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

           [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

           [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]]])


One used to be able to do this with pd.Panel, which has since been deprecated:

    panel = pd.Panel(items=[columns], major_axis=[categories], minor_axis=[time_index], dtype=np.float32)
    ...

How can I most effectively accomplish this with a MultiIndex DataFrame?
Thanks










Tags: python, arrays, pandas, numpy






Asked Feb 10 at 11:25 by Brad; edited Feb 10 at 11:34.
























2 Answers


















How about using xarray?

    res = df.to_xarray().to_array()

The result is an xarray DataArray of shape (4, 15, 5); res.values gives the underlying NumPy array.

In fact, the pandas docs now recommend xarray as an alternative to pandas Panel. Note that you must have the xarray package installed.
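To make the axis order concrete, here is a minimal sketch of the xarray route on a small frame. The frame is a hypothetical stand-in for the one in the question (2 categories, 5 timestamps, made-up values), and it assumes xarray is installed alongside pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 2 categories x 5 timestamps.
index = pd.MultiIndex.from_product(
    [["ONE", "TWO"], pd.date_range("2019-01-22 18:12", periods=5, freq="min")],
    names=["major", "timestamp"],
)
df = pd.DataFrame(
    np.arange(40, dtype=float).reshape(10, 4), index=index, columns=list("colu")
)

# Requires the xarray package. The columns become a new leading dimension.
res = df.to_xarray().to_array()
arr = res.values  # plain numpy.ndarray

print(res.dims)
print(arr.shape)  # (4, 2, 5)
```

With this layout, arr[1, 1, 3] is column "o", category "TWO", fourth timestamp.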







Answered Feb 10 at 11:40 by Josh Friedlander.

























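Since df.values is a (15*100, 4)-shaped array (15 categories and, in the benchmark below, a time index of length 100; use 5 for the DataFrame in the question), you can call reshape to make it a (15, 100, 4)-shaped array. This relies on the rows being sorted by category and on every category having the same timestamps:

    arr = df.values.reshape(15, 100, 4)

Then call transpose to rearrange the order of the axes from (categories, time_index, columns) to (columns, categories, time_index):

    arr = arr.transpose(2, 0, 1)

Now arr has shape (4, 15, 100).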
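As a sanity check on the axis order, here is a small sketch under stated assumptions: the sizes (3 categories, 5 timestamps) and the level names are illustrative only, and the reindex step is an optional guard I have added for indexes that are unsorted or missing combinations. It is not part of the two-line recipe above.

```python
import numpy as np
import pandas as pd

n_cat, n_time = 3, 5  # illustrative sizes, not the 15 x 100 used above
index = pd.MultiIndex.from_product(
    [range(n_cat), range(n_time)], names=["major", "timestamp"]
)
df = pd.DataFrame(
    np.arange(n_cat * n_time * 4).reshape(-1, 4), index=index, columns=list("colu")
)

# Optional guard: reindex to the full, sorted product of the index levels.
# Missing (category, timestamp) pairs would show up as NaN rows.
levels = df.index.remove_unused_levels().levels
full = pd.MultiIndex.from_product(levels, names=df.index.names)
df = df.reindex(full)

# Derive the sizes from the index instead of hard-coding them.
shape = tuple(len(level) for level in levels) + (df.shape[1],)
arr = df.to_numpy().reshape(shape).transpose(2, 0, 1)

print(arr.shape)  # (4, 3, 5)
# arr[c, i, t] is column c, category i, timestamp t.
print(arr[1, 2, 3] == df.loc[(2, 3), "o"])  # True
```

Deriving the shape from the index levels avoids silently mislabelled axes when the number of categories or timestamps changes.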





On a 15 x 100 example, reshape/transpose is ~960x faster than to_xarray().to_array():

    In [21]: df = pd.DataFrame(np.random.randint(10, size=(15*100, 4)), index=pd.MultiIndex.from_product([range(15), range(100)], names=['A','B']), columns=list('colu'))

    In [22]: %timeit arr = df.values.reshape(15, 100, 4).transpose(2, 0, 1)
    3.31 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    In [24]: %timeit df.to_xarray().to_array()
    3.18 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [25]: 3180/3.31
    Out[25]: 960.7250755287009
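One likely reason the reshape/transpose route is so cheap is that NumPy can return views for both steps instead of copying data. The sketch below illustrates this on a plain array standing in for df.values (the name flat is my own, not from the benchmark); whether df.values itself is a view depends on how the DataFrame stores its columns.

```python
import numpy as np

# Stand-in for df.values in the benchmark: 15 * 100 rows, 4 columns.
flat = np.arange(15 * 100 * 4).reshape(15 * 100, 4)

arr = flat.reshape(15, 100, 4).transpose(2, 0, 1)

# Both steps return views here, so no element is copied.
print(np.shares_memory(arr, flat))   # True
print(arr.flags["C_CONTIGUOUS"])     # False

# Code that needs contiguous memory pays for the copy at this point instead.
packed = np.ascontiguousarray(arr)
print(np.shares_memory(packed, flat))  # False
print(np.array_equal(packed, arr))     # True
```

So the timing gap partly reflects deferred work: if a later step needs a contiguous (4, 15, 100) array, the copy happens there.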






Answered Feb 10 at 11:32 by unutbu; edited Feb 10 at 11:52.





























