What is the faster way to count occurrences of equal sublists in a nested list?

I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.

I have done that with some pandas data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.

So for instance my nested list is here:

l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

I need to have:

res = [[1, 3, 2, 2], [1, 3, 5, 1]]

EDIT

Order in res does not matter at all.

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

asked Jan 25 at 10:45

Léo Joubert

149210

add a comment |

I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.

So for instance my nested list is here:

l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

I need to have:

res = [[1, 3, 2, 2], [1, 3, 5, 1]]

EDIT

Order in res does not matter at all.

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

asked Jan 25 at 10:45

Léo Joubert

149210

add a comment |

I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.

So for instance my nested list is here:

l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

I need to have:

res = [[1, 3, 2, 2], [1, 3, 5, 1]]

EDIT

Order in res does not matter at all.

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

asked Jan 25 at 10:45

Léo Joubert

149210

I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.

So for instance my nested list is here:

l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

I need to have:

res = [[1, 3, 2, 2], [1, 3, 5, 1]]

EDIT

Order in res does not matter at all.

python

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

asked Jan 25 at 10:45

Léo Joubert

149210

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

asked Jan 25 at 10:45

Léo Joubert

149210

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

edited Jan 25 at 10:58

Muhammad Ahmad

2,1321422

asked Jan 25 at 10:45

Léo Joubert

149210

asked Jan 25 at 10:45

Léo Joubert

149210

asked Jan 25 at 10:45

Léo Joubert

149210

add a comment |

4 Answers
4

active

oldest

votes

If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:

from collections import Counter



l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]



result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]

print(result)

Output

[[1, 3, 5, 1], [1, 3, 2, 2]]

edited Jan 25 at 11:13

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

1

This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

– Chris_Rands
Jan 25 at 10:57

add a comment |

This is quite an odd output to want but it is of course possible. I suggest using collections.Counter(), no doubt others will make different suggestions and a timeit style comparison would reveal the fastest of course for particular data sets:

>>> from collections import Counter

>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]

[[1, 3, 2, 2], [1, 3, 5, 1]]

Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

add a comment |

If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:

l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])

x, c = np.unique(l, axis=0, return_counts=True)

np.vstack([x.T,c]).T



array([[1, 3, 2, 2],

       [1, 3, 5, 1]])

answered Jan 25 at 10:53

yatu

11.5k31137

add a comment |

Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:

In [5]: from collections import defaultdict



In [6]: d = defaultdict(int)



In [7]: for sub in l:

   ...:     d[tuple(sub)] += 1

   ...:     



In [8]: d

Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary of your sub-lists as the key and their counts as the value.

Another way is to create your own dictionary object:

 In [9]: class customdict(dict):

    ...:        

    ...:     def __getitem__(self, key):

    ...:         try:

    ...:             val = super(customdict, self).__getitem__(key)

    ...:         except KeyError:

    ...:             self[key] = [*key, 0]

    ...:         else:

    ...:             val[-1] += 1

    ...:             self[key] = val

    ...:             return val

    ...:         

    ...:    



 In [10]: m = customdict()



 In [11]: for sub in l:

     ...:     m[tuple(sub)]

     ...:     



 In [12]: 



 In [12]: m

 Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}



 In [13]: m.values()

 Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])

edited Jan 25 at 14:08

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54363705%2fwhat-is-the-faster-way-to-count-occurrences-of-equal-sublists-in-a-nested-list%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:

from collections import Counter



l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]



result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]

print(result)

Output

[[1, 3, 5, 1], [1, 3, 2, 2]]

edited Jan 25 at 11:13

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

1

This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

– Chris_Rands
Jan 25 at 10:57

add a comment |

If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:

from collections import Counter



l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]



result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]

print(result)

Output

[[1, 3, 5, 1], [1, 3, 2, 2]]

edited Jan 25 at 11:13

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

1

This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

– Chris_Rands
Jan 25 at 10:57

add a comment |

If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:

from collections import Counter



l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]



result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]

print(result)

Output

[[1, 3, 5, 1], [1, 3, 2, 2]]

edited Jan 25 at 11:13

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:

from collections import Counter



l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]



result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]

print(result)

Output

[[1, 3, 5, 1], [1, 3, 2, 2]]

edited Jan 25 at 11:13

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

edited Jan 25 at 11:13

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

answered Jan 25 at 10:51

Daniel Mesejo

18.6k21432

1

This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

– Chris_Rands
Jan 25 at 10:57

add a comment |

1

This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

– Chris_Rands
Jan 25 at 10:57

This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

– Chris_Rands
Jan 25 at 10:57

add a comment |

>>> from collections import Counter

>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]

[[1, 3, 2, 2], [1, 3, 5, 1]]

Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

add a comment |

>>> from collections import Counter

>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]

[[1, 3, 2, 2], [1, 3, 5, 1]]

Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

add a comment |

>>> from collections import Counter

>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]

[[1, 3, 2, 2], [1, 3, 5, 1]]

Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

>>> from collections import Counter

>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]

[[1, 3, 2, 2], [1, 3, 5, 1]]

Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

answered Jan 25 at 10:50

Chris_Rands

16.7k53975

add a comment |

If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:

l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])

x, c = np.unique(l, axis=0, return_counts=True)

np.vstack([x.T,c]).T



array([[1, 3, 2, 2],

       [1, 3, 5, 1]])

answered Jan 25 at 10:53

yatu

11.5k31137

add a comment |

If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:

l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])

x, c = np.unique(l, axis=0, return_counts=True)

np.vstack([x.T,c]).T



array([[1, 3, 2, 2],

       [1, 3, 5, 1]])

answered Jan 25 at 10:53

yatu

11.5k31137

add a comment |

If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:

l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])

x, c = np.unique(l, axis=0, return_counts=True)

np.vstack([x.T,c]).T



array([[1, 3, 2, 2],

       [1, 3, 5, 1]])

answered Jan 25 at 10:53

yatu

11.5k31137

If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:

l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])

x, c = np.unique(l, axis=0, return_counts=True)

np.vstack([x.T,c]).T



array([[1, 3, 2, 2],

       [1, 3, 5, 1]])

answered Jan 25 at 10:53

yatu

11.5k31137

answered Jan 25 at 10:53

yatu

11.5k31137

answered Jan 25 at 10:53

yatu

11.5k31137

answered Jan 25 at 10:53

yatu

11.5k31137

add a comment |

Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:

In [5]: from collections import defaultdict



In [6]: d = defaultdict(int)



In [7]: for sub in l:

   ...:     d[tuple(sub)] += 1

   ...:     



In [8]: d

Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary of your sub-lists as the key and their counts as the value.

Another way is to create your own dictionary object:

 In [9]: class customdict(dict):

    ...:        

    ...:     def __getitem__(self, key):

    ...:         try:

    ...:             val = super(customdict, self).__getitem__(key)

    ...:         except KeyError:

    ...:             self[key] = [*key, 0]

    ...:         else:

    ...:             val[-1] += 1

    ...:             self[key] = val

    ...:             return val

    ...:         

    ...:    



 In [10]: m = customdict()



 In [11]: for sub in l:

     ...:     m[tuple(sub)]

     ...:     



 In [12]: 



 In [12]: m

 Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}



 In [13]: m.values()

 Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])

edited Jan 25 at 14:08

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

add a comment |

Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:

In [5]: from collections import defaultdict



In [6]: d = defaultdict(int)



In [7]: for sub in l:

   ...:     d[tuple(sub)] += 1

   ...:     



In [8]: d

Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary of your sub-lists as the key and their counts as the value.

Another way is to create your own dictionary object:

 In [9]: class customdict(dict):

    ...:        

    ...:     def __getitem__(self, key):

    ...:         try:

    ...:             val = super(customdict, self).__getitem__(key)

    ...:         except KeyError:

    ...:             self[key] = [*key, 0]

    ...:         else:

    ...:             val[-1] += 1

    ...:             self[key] = val

    ...:             return val

    ...:         

    ...:    



 In [10]: m = customdict()



 In [11]: for sub in l:

     ...:     m[tuple(sub)]

     ...:     



 In [12]: 



 In [12]: m

 Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}



 In [13]: m.values()

 Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])

edited Jan 25 at 14:08

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

add a comment |

Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:

In [5]: from collections import defaultdict



In [6]: d = defaultdict(int)



In [7]: for sub in l:

   ...:     d[tuple(sub)] += 1

   ...:     



In [8]: d

Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary of your sub-lists as the key and their counts as the value.

Another way is to create your own dictionary object:

 In [9]: class customdict(dict):

    ...:        

    ...:     def __getitem__(self, key):

    ...:         try:

    ...:             val = super(customdict, self).__getitem__(key)

    ...:         except KeyError:

    ...:             self[key] = [*key, 0]

    ...:         else:

    ...:             val[-1] += 1

    ...:             self[key] = val

    ...:             return val

    ...:         

    ...:    



 In [10]: m = customdict()



 In [11]: for sub in l:

     ...:     m[tuple(sub)]

     ...:     



 In [12]: 



 In [12]: m

 Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}



 In [13]: m.values()

 Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])

edited Jan 25 at 14:08

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:

In [5]: from collections import defaultdict



In [6]: d = defaultdict(int)



In [7]: for sub in l:

   ...:     d[tuple(sub)] += 1

   ...:     



In [8]: d

Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary of your sub-lists as the key and their counts as the value.

Another way is to create your own dictionary object:

 In [9]: class customdict(dict):

    ...:        

    ...:     def __getitem__(self, key):

    ...:         try:

    ...:             val = super(customdict, self).__getitem__(key)

    ...:         except KeyError:

    ...:             self[key] = [*key, 0]

    ...:         else:

    ...:             val[-1] += 1

    ...:             self[key] = val

    ...:             return val

    ...:         

    ...:    



 In [10]: m = customdict()



 In [11]: for sub in l:

     ...:     m[tuple(sub)]

     ...:     



 In [12]: 



 In [12]: m

 Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}



 In [13]: m.values()

 Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])

edited Jan 25 at 14:08

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

edited Jan 25 at 14:08

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

answered Jan 25 at 13:24

Kasrâmvd

79k1091128

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Dtyjlui