What is the faster way to count occurrences of equal sublists in a nested list?
I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.
I have done that with some pandas
data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.
So for instance my nested list is here:
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
I need to have:
res = [[1, 3, 2, 2], [1, 3, 5, 1]]
EDIT
Order in res
does not matter at all.
python
add a comment |
I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.
I have done that with some pandas
data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.
So for instance my nested list is here:
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
I need to have:
res = [[1, 3, 2, 2], [1, 3, 5, 1]]
EDIT
Order in res
does not matter at all.
python
add a comment |
I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.
I have done that with some pandas
data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.
So for instance my nested list is here:
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
I need to have:
res = [[1, 3, 2, 2], [1, 3, 5, 1]]
EDIT
Order in res
does not matter at all.
python
I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.
I have done that with some pandas
data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.
So for instance my nested list is here:
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
I need to have:
res = [[1, 3, 2, 2], [1, 3, 5, 1]]
EDIT
Order in res
does not matter at all.
python
python
edited Jan 25 at 10:58
Muhammad Ahmad
2,1321422
2,1321422
asked Jan 25 at 10:45
Léo JoubertLéo Joubert
149210
149210
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:
from collections import Counter
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)
Output
[[1, 3, 5, 1], [1, 3, 2, 2]]
1
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
add a comment |
This is quite an odd output to want but it is of course possible. I suggest using collections.Counter()
, no doubt others will make different suggestions and a timeit
style comparison would reveal the fastest of course for particular data sets:
>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter
recipe.
add a comment |
If numpy
is an option, you could use np.unique
setting axis to 0
and return_counts
to True
, and concatenate the unique rows and counts using np.vstack
:
l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T,c]).T
array([[1, 3, 2, 2],
[1, 3, 5, 1]])
add a comment |
Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict()
as following:
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for sub in l:
...: d[tuple(sub)] += 1
...:
In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})
This will give you a dictionary of your sub-lists as the key and their counts as the value.
Another way is to create your own dictionary object:
In [9]: class customdict(dict):
...:
...: def __getitem__(self, key):
...: try:
...: val = super(customdict, self).__getitem__(key)
...: except KeyError:
...: self[key] = [*key, 0]
...: else:
...: val[-1] += 1
...: self[key] = val
...: return val
...:
...:
In [10]: m = customdict()
In [11]: for sub in l:
...: m[tuple(sub)]
...:
In [12]:
In [12]: m
Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}
In [13]: m.values()
Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54363705%2fwhat-is-the-faster-way-to-count-occurrences-of-equal-sublists-in-a-nested-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:
from collections import Counter
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)
Output
[[1, 3, 5, 1], [1, 3, 2, 2]]
1
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
add a comment |
If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:
from collections import Counter
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)
Output
[[1, 3, 5, 1], [1, 3, 2, 2]]
1
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
add a comment |
If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:
from collections import Counter
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)
Output
[[1, 3, 5, 1], [1, 3, 2, 2]]
If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:
from collections import Counter
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)
Output
[[1, 3, 5, 1], [1, 3, 2, 2]]
edited Jan 25 at 11:13
answered Jan 25 at 10:51
Daniel MesejoDaniel Mesejo
18.6k21432
18.6k21432
1
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
add a comment |
1
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
1
1
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course
– Chris_Rands
Jan 25 at 10:57
add a comment |
This is quite an odd output to want but it is of course possible. I suggest using collections.Counter()
, no doubt others will make different suggestions and a timeit
style comparison would reveal the fastest of course for particular data sets:
>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter
recipe.
add a comment |
This is quite an odd output to want but it is of course possible. I suggest using collections.Counter()
, no doubt others will make different suggestions and a timeit
style comparison would reveal the fastest of course for particular data sets:
>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter
recipe.
add a comment |
This is quite an odd output to want but it is of course possible. I suggest using collections.Counter()
, no doubt others will make different suggestions and a timeit
style comparison would reveal the fastest of course for particular data sets:
>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter
recipe.
This is quite an odd output to want but it is of course possible. I suggest using collections.Counter()
, no doubt others will make different suggestions and a timeit
style comparison would reveal the fastest of course for particular data sets:
>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter
recipe.
answered Jan 25 at 10:50
Chris_RandsChris_Rands
16.7k53975
16.7k53975
add a comment |
add a comment |
If numpy
is an option, you could use np.unique
setting axis to 0
and return_counts
to True
, and concatenate the unique rows and counts using np.vstack
:
l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T,c]).T
array([[1, 3, 2, 2],
[1, 3, 5, 1]])
add a comment |
If numpy
is an option, you could use np.unique
setting axis to 0
and return_counts
to True
, and concatenate the unique rows and counts using np.vstack
:
l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T,c]).T
array([[1, 3, 2, 2],
[1, 3, 5, 1]])
add a comment |
If numpy
is an option, you could use np.unique
setting axis to 0
and return_counts
to True
, and concatenate the unique rows and counts using np.vstack
:
l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T,c]).T
array([[1, 3, 2, 2],
[1, 3, 5, 1]])
If numpy
is an option, you could use np.unique
setting axis to 0
and return_counts
to True
, and concatenate the unique rows and counts using np.vstack
:
l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T,c]).T
array([[1, 3, 2, 2],
[1, 3, 5, 1]])
answered Jan 25 at 10:53
yatuyatu
11.5k31137
11.5k31137
add a comment |
add a comment |
Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict()
as following:
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for sub in l:
...: d[tuple(sub)] += 1
...:
In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})
This will give you a dictionary of your sub-lists as the key and their counts as the value.
Another way is to create your own dictionary object:
In [9]: class customdict(dict):
...:
...: def __getitem__(self, key):
...: try:
...: val = super(customdict, self).__getitem__(key)
...: except KeyError:
...: self[key] = [*key, 0]
...: else:
...: val[-1] += 1
...: self[key] = val
...: return val
...:
...:
In [10]: m = customdict()
In [11]: for sub in l:
...: m[tuple(sub)]
...:
In [12]:
In [12]: m
Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}
In [13]: m.values()
Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])
add a comment |
Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict()
as following:
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for sub in l:
...: d[tuple(sub)] += 1
...:
In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})
This will give you a dictionary of your sub-lists as the key and their counts as the value.
Another way is to create your own dictionary object:
In [9]: class customdict(dict):
...:
...: def __getitem__(self, key):
...: try:
...: val = super(customdict, self).__getitem__(key)
...: except KeyError:
...: self[key] = [*key, 0]
...: else:
...: val[-1] += 1
...: self[key] = val
...: return val
...:
...:
In [10]: m = customdict()
In [11]: for sub in l:
...: m[tuple(sub)]
...:
In [12]:
In [12]: m
Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}
In [13]: m.values()
Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])
add a comment |
Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict()
as following:
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for sub in l:
...: d[tuple(sub)] += 1
...:
In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})
This will give you a dictionary of your sub-lists as the key and their counts as the value.
Another way is to create your own dictionary object:
In [9]: class customdict(dict):
...:
...: def __getitem__(self, key):
...: try:
...: val = super(customdict, self).__getitem__(key)
...: except KeyError:
...: self[key] = [*key, 0]
...: else:
...: val[-1] += 1
...: self[key] = val
...: return val
...:
...:
In [10]: m = customdict()
In [11]: for sub in l:
...: m[tuple(sub)]
...:
In [12]:
In [12]: m
Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}
In [13]: m.values()
Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])
Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict()
as following:
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for sub in l:
...: d[tuple(sub)] += 1
...:
In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})
This will give you a dictionary of your sub-lists as the key and their counts as the value.
Another way is to create your own dictionary object:
In [9]: class customdict(dict):
...:
...: def __getitem__(self, key):
...: try:
...: val = super(customdict, self).__getitem__(key)
...: except KeyError:
...: self[key] = [*key, 0]
...: else:
...: val[-1] += 1
...: self[key] = val
...: return val
...:
...:
In [10]: m = customdict()
In [11]: for sub in l:
...: m[tuple(sub)]
...:
In [12]:
In [12]: m
Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}
In [13]: m.values()
Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])
edited Jan 25 at 14:08
answered Jan 25 at 13:24
KasrâmvdKasrâmvd
79k1091128
79k1091128
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54363705%2fwhat-is-the-faster-way-to-count-occurrences-of-equal-sublists-in-a-nested-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown