How to convert a pandas MultiIndex DataFrame into a 3D array

Suppose I have a MultiIndex DataFrame:

                                c       o       l       u
major       timestamp
ONE         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008
TWO         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

I want to generate a 3-dimensional NumPy array from this DataFrame. Given that the DataFrame has 15 categories in the major index level, 4 columns, and a time index of length 5, I would like to create a NumPy array with shape (4, 15, 5), denoting (columns, categories, time_index) respectively.

For the two categories shown above, this should create an array:

array([[[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]]])

One used to be able to do this with pd.Panel:

panel = pd.Panel(items=[columns], major_axis=[categories], minor_axis=[time_index], dtype=np.float32)

...

How can I accomplish this most effectively with a MultiIndex DataFrame?
Thanks

edited Feb 10 at 11:34

asked Feb 10 at 11:25

Brad

python arrays pandas numpy


2 Answers

How about using xarray?

res = df.to_xarray().to_array()

The result is an xarray.DataArray of shape (4, 15, 5); use res.values if you need a plain NumPy array.

In fact, the pandas docs now recommend xarray as an alternative to pandas Panel. Note that you must have the xarray package installed.
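As a rough end-to-end sketch of the xarray route: the sample frame below is a made-up stand-in for the one in the question (two categories, five timestamps, constant values), so the resulting shape is (4, 2, 5) rather than (4, 15, 5). The name `arr` is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 2 categories x 5 timestamps, 4 columns.
times = pd.date_range("2019-01-22 18:12", periods=5, freq="min")
index = pd.MultiIndex.from_product(
    [["ONE", "TWO"], times], names=["major", "timestamp"]
)
df = pd.DataFrame(np.full((10, 4), 0.00008), index=index, columns=list("colu"))

# DataFrame.to_xarray() needs the xarray package to be installed.
# Each column becomes a data variable; to_array() stacks them on a new
# leading dimension, which xarray names "variable" by default.
res = df.to_xarray().to_array()
print(res.dims)   # ('variable', 'major', 'timestamp')

# .values exposes the underlying NumPy array.
arr = res.values
print(arr.shape)  # (4, 2, 5)
```

Because xarray aligns on the index labels, this route should not depend on the row order of the frame.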

answered Feb 10 at 11:40

Josh Friedlander


Since df.values is a (15*100, 4)-shaped array (15 categories, each with a time index of length 100 in this example), you can call reshape to make it a (15, 100, 4)-shaped array:

arr = df.values.reshape(15, 100, 4)

Then call transpose to rearrange the order of the axes:

arr = arr.transpose(2, 0, 1)

Now arr has shape (4, 15, 100).
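The reshape trick silently assumes that the rows are grouped by category, ordered by time within each category, and that every category has every timestamp. Here is a minimal sketch that reads the sizes from the index and checks those assumptions first; `frame_to_3d` and the sample frame are hypothetical names and data, not part of the question.

```python
import numpy as np
import pandas as pd


def frame_to_3d(df):
    """Return an array of shape (n_columns, n_categories, n_times)."""
    # Sort so rows are grouped by category, then ordered by time.
    df = df.sort_index()
    n_cat = df.index.get_level_values(0).nunique()
    n_time = df.index.get_level_values(1).nunique()
    # Cheap sanity check: reshape is only meaningful for a full,
    # duplicate-free category x time grid.
    if not df.index.is_unique or len(df) != n_cat * n_time:
        raise ValueError("index is not a complete category x time grid")
    # (n_cat * n_time, n_cols) -> (n_cat, n_time, n_cols) -> (n_cols, n_cat, n_time)
    return df.to_numpy().reshape(n_cat, n_time, -1).transpose(2, 0, 1)


# Made-up sample: 2 categories x 5 timestamps, values 0..39 so cells are traceable.
times = pd.date_range("2019-01-22 18:12", periods=5, freq="min")
index = pd.MultiIndex.from_product(
    [["ONE", "TWO"], times], names=["major", "timestamp"]
)
df = pd.DataFrame(
    np.arange(40, dtype=float).reshape(10, 4), index=index, columns=list("colu")
)

arr = frame_to_3d(df)
print(arr.shape)     # (4, 2, 5)
print(arr[1, 0, 0])  # 1.0 -> column 'o', category ONE, first timestamp
```

With this layout, arr[i, j, k] holds the value of column i for category j at timestamp k.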

Using reshape/transpose is ~960x faster than to_xarray().to_array():

In [21]: df = pd.DataFrame(np.random.randint(10, size=(15*100, 4)), index=pd.MultiIndex.from_product([range(15), range(100)], names=['A','B']), columns=list('colu'))

In [22]: %timeit arr = df.values.reshape(15, 100, 4).transpose(2, 0, 1)
3.31 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [24]: %timeit df.to_xarray().to_array()
3.18 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [25]: 3180/3.31
Out[25]: 960.7250755287009
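One likely reason the reshape/transpose route is so cheap is that, on a plain C-contiguous NumPy array, neither call copies any data. The sketch below shows this on a bare NumPy array standing in for df.values (whether df.values itself is a view depends on how pandas stores the frame, so that part is an assumption). It also shows that the transposed result is not C-contiguous, which matters if later code expects contiguous memory.

```python
import numpy as np

# Stand-in for df.values: a C-contiguous (15*100, 4) array.
flat = np.arange(15 * 100 * 4, dtype=np.float64).reshape(15 * 100, 4)

arr = flat.reshape(15, 100, 4).transpose(2, 0, 1)
print(arr.shape)                       # (4, 15, 100)
print(np.shares_memory(flat, arr))     # True: still a view of the same buffer
print(arr.flags["C_CONTIGUOUS"])       # False: only the strides were permuted

# Make an explicit contiguous copy when downstream code needs one.
packed = np.ascontiguousarray(arr)
print(packed.flags["C_CONTIGUOUS"])    # True
print(np.shares_memory(flat, packed))  # False: this is a real copy
```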

edited Feb 10 at 11:52

answered Feb 10 at 11:32

unutbu
