What is the faster way to count occurrences of equal sublists in a nested list?












9















I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.



I have done that with some pandas data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.



So for instance my nested list is here:



l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]


I need to have:



res = [[1, 3, 2, 2], [1, 3, 5, 1]]


EDIT



Order in res does not matter at all.










share|improve this question





























    9















    I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.



    I have done that with some pandas data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.



    So for instance my nested list is here:



    l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]


    I need to have:



    res = [[1, 3, 2, 2], [1, 3, 5, 1]]


    EDIT



    Order in res does not matter at all.










    share|improve this question



























      9












      9








      9


      1






      I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.



      I have done that with some pandas data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.



      So for instance my nested list is here:



      l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]


      I need to have:



      res = [[1, 3, 2, 2], [1, 3, 5, 1]]


      EDIT



      Order in res does not matter at all.










      share|improve this question
















      I have a list of lists in Python and I want to (as fastly as possible : very important...) append to each sublist the number of time it appear into the nested list.



      I have done that with some pandas data-frame, but this seems to be very slow and I need to run this lines on very very large scale. I am completely willing to sacrifice nice-reading code to efficient one.



      So for instance my nested list is here:



      l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]


      I need to have:



      res = [[1, 3, 2, 2], [1, 3, 5, 1]]


      EDIT



      Order in res does not matter at all.







      python






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jan 25 at 10:58









      Muhammad Ahmad

      2,1321422




      2,1321422










      asked Jan 25 at 10:45









      Léo JoubertLéo Joubert

      149210




      149210
























          4 Answers
          4






          active

          oldest

          votes


















          10














          If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:



          from collections import Counter

          l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

          result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
          print(result)


          Output



          [[1, 3, 5, 1], [1, 3, 2, 2]]





          share|improve this answer





















          • 1





            This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

            – Chris_Rands
            Jan 25 at 10:57



















          8














          This is quite an odd output to want but it is of course possible. I suggest using collections.Counter(), no doubt others will make different suggestions and a timeit style comparison would reveal the fastest of course for particular data sets:



          >>> from collections import Counter
          >>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
          >>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
          [[1, 3, 2, 2], [1, 3, 5, 1]]


          Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.






          share|improve this answer































            1














            If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:



            l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
            x, c = np.unique(l, axis=0, return_counts=True)
            np.vstack([x.T,c]).T

            array([[1, 3, 2, 2],
            [1, 3, 5, 1]])





            share|improve this answer































              0














              Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:



              In [5]: from collections import defaultdict

              In [6]: d = defaultdict(int)

              In [7]: for sub in l:
              ...: d[tuple(sub)] += 1
              ...:

              In [8]: d
              Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})


              This will give you a dictionary of your sub-lists as the key and their counts as the value.



              Another way is to create your own dictionary object:



               In [9]: class customdict(dict):
              ...:
              ...: def __getitem__(self, key):
              ...: try:
              ...: val = super(customdict, self).__getitem__(key)
              ...: except KeyError:
              ...: self[key] = [*key, 0]
              ...: else:
              ...: val[-1] += 1
              ...: self[key] = val
              ...: return val
              ...:
              ...:

              In [10]: m = customdict()

              In [11]: for sub in l:
              ...: m[tuple(sub)]
              ...:

              In [12]:

              In [12]: m
              Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}

              In [13]: m.values()
              Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])





              share|improve this answer

























                Your Answer






                StackExchange.ifUsing("editor", function () {
                StackExchange.using("externalEditor", function () {
                StackExchange.using("snippets", function () {
                StackExchange.snippets.init();
                });
                });
                }, "code-snippets");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "1"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54363705%2fwhat-is-the-faster-way-to-count-occurrences-of-equal-sublists-in-a-nested-list%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                4 Answers
                4






                active

                oldest

                votes








                4 Answers
                4






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                10














                If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:



                from collections import Counter

                l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

                result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
                print(result)


                Output



                [[1, 3, 5, 1], [1, 3, 2, 2]]





                share|improve this answer





















                • 1





                  This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

                  – Chris_Rands
                  Jan 25 at 10:57
















                10














                If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:



                from collections import Counter

                l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

                result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
                print(result)


                Output



                [[1, 3, 5, 1], [1, 3, 2, 2]]





                share|improve this answer





















                • 1





                  This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

                  – Chris_Rands
                  Jan 25 at 10:57














                10












                10








                10







                If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:



                from collections import Counter

                l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

                result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
                print(result)


                Output



                [[1, 3, 5, 1], [1, 3, 2, 2]]





                share|improve this answer















                If order does not matter you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands solution:



                from collections import Counter

                l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

                result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
                print(result)


                Output



                [[1, 3, 5, 1], [1, 3, 2, 2]]






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Jan 25 at 11:13

























                answered Jan 25 at 10:51









                Daniel MesejoDaniel Mesejo

                18.6k21432




                18.6k21432








                • 1





                  This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

                  – Chris_Rands
                  Jan 25 at 10:57














                • 1





                  This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

                  – Chris_Rands
                  Jan 25 at 10:57








                1




                1





                This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

                – Chris_Rands
                Jan 25 at 10:57





                This is a reasonable (although mostly cosmetic) variant of my solution, assuming Python 3 of course

                – Chris_Rands
                Jan 25 at 10:57













                8














                This is quite an odd output to want but it is of course possible. I suggest using collections.Counter(), no doubt others will make different suggestions and a timeit style comparison would reveal the fastest of course for particular data sets:



                >>> from collections import Counter
                >>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
                >>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
                [[1, 3, 2, 2], [1, 3, 5, 1]]


                Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.






                share|improve this answer




























                  8














                  This is quite an odd output to want but it is of course possible. I suggest using collections.Counter(), no doubt others will make different suggestions and a timeit style comparison would reveal the fastest of course for particular data sets:



                  >>> from collections import Counter
                  >>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
                  >>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
                  [[1, 3, 2, 2], [1, 3, 5, 1]]


                  Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.






                  share|improve this answer


























                    8












                    8








                    8







                    This is quite an odd output to want but it is of course possible. I suggest using collections.Counter(), no doubt others will make different suggestions and a timeit style comparison would reveal the fastest of course for particular data sets:



                    >>> from collections import Counter
                    >>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
                    >>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
                    [[1, 3, 2, 2], [1, 3, 5, 1]]


                    Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.






                    share|improve this answer













                    This is quite an odd output to want but it is of course possible. I suggest using collections.Counter(), no doubt others will make different suggestions and a timeit style comparison would reveal the fastest of course for particular data sets:



                    >>> from collections import Counter
                    >>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
                    >>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
                    [[1, 3, 2, 2], [1, 3, 5, 1]]


                    Note to preserve the insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Jan 25 at 10:50









                    Chris_RandsChris_Rands

                    16.7k53975




                    16.7k53975























                        1














                        If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:



                        l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
                        x, c = np.unique(l, axis=0, return_counts=True)
                        np.vstack([x.T,c]).T

                        array([[1, 3, 2, 2],
                        [1, 3, 5, 1]])





                        share|improve this answer




























                          1














                          If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:



                          l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
                          x, c = np.unique(l, axis=0, return_counts=True)
                          np.vstack([x.T,c]).T

                          array([[1, 3, 2, 2],
                          [1, 3, 5, 1]])





                          share|improve this answer


























                            1












                            1








                            1







                            If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:



                            l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
                            x, c = np.unique(l, axis=0, return_counts=True)
                            np.vstack([x.T,c]).T

                            array([[1, 3, 2, 2],
                            [1, 3, 5, 1]])





                            share|improve this answer













                            If numpy is an option, you could use np.unique setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:



                            l = np.array([[1, 3, 2], [1, 3, 2] ,[1, 3, 5]])
                            x, c = np.unique(l, axis=0, return_counts=True)
                            np.vstack([x.T,c]).T

                            array([[1, 3, 2, 2],
                            [1, 3, 5, 1]])






                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Jan 25 at 10:53









                            yatuyatu

                            11.5k31137




                            11.5k31137























                                0














                                Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:



                                In [5]: from collections import defaultdict

                                In [6]: d = defaultdict(int)

                                In [7]: for sub in l:
                                ...: d[tuple(sub)] += 1
                                ...:

                                In [8]: d
                                Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})


                                This will give you a dictionary of your sub-lists as the key and their counts as the value.



                                Another way is to create your own dictionary object:



                                 In [9]: class customdict(dict):
                                ...:
                                ...: def __getitem__(self, key):
                                ...: try:
                                ...: val = super(customdict, self).__getitem__(key)
                                ...: except KeyError:
                                ...: self[key] = [*key, 0]
                                ...: else:
                                ...: val[-1] += 1
                                ...: self[key] = val
                                ...: return val
                                ...:
                                ...:

                                In [10]: m = customdict()

                                In [11]: for sub in l:
                                ...: m[tuple(sub)]
                                ...:

                                In [12]:

                                In [12]: m
                                Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}

                                In [13]: m.values()
                                Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])





                                share|improve this answer






























                                  0














                                  Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:



                                  In [5]: from collections import defaultdict

                                  In [6]: d = defaultdict(int)

                                  In [7]: for sub in l:
                                  ...: d[tuple(sub)] += 1
                                  ...:

                                  In [8]: d
                                  Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})


                                  This will give you a dictionary of your sub-lists as the key and their counts as the value.



                                  Another way is to create your own dictionary object:



                                   In [9]: class customdict(dict):
                                  ...:
                                  ...: def __getitem__(self, key):
                                  ...: try:
                                  ...: val = super(customdict, self).__getitem__(key)
                                  ...: except KeyError:
                                  ...: self[key] = [*key, 0]
                                  ...: else:
                                  ...: val[-1] += 1
                                  ...: self[key] = val
                                  ...: return val
                                  ...:
                                  ...:

                                  In [10]: m = customdict()

                                  In [11]: for sub in l:
                                  ...: m[tuple(sub)]
                                  ...:

                                  In [12]:

                                  In [12]: m
                                  Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}

                                  In [13]: m.values()
                                  Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])





                                  share|improve this answer




























                                    0












                                    0








                                    0







                                    Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:



                                    In [5]: from collections import defaultdict

                                    In [6]: d = defaultdict(int)

                                    In [7]: for sub in l:
                                    ...: d[tuple(sub)] += 1
                                    ...:

                                    In [8]: d
                                    Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})


                                    This will give you a dictionary of your sub-lists as the key and their counts as the value.



                                    Another way is to create your own dictionary object:



                                     In [9]: class customdict(dict):
                                    ...:
                                    ...: def __getitem__(self, key):
                                    ...: try:
                                    ...: val = super(customdict, self).__getitem__(key)
                                    ...: except KeyError:
                                    ...: self[key] = [*key, 0]
                                    ...: else:
                                    ...: val[-1] += 1
                                    ...: self[key] = val
                                    ...: return val
                                    ...:
                                    ...:

                                    In [10]: m = customdict()

                                    In [11]: for sub in l:
                                    ...: m[tuple(sub)]
                                    ...:

                                    In [12]:

                                    In [12]: m
                                    Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}

                                    In [13]: m.values()
                                    Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])





                                    share|improve this answer















                                    Since your items are mutable objects and you have to convert them to an immutable object to be used as a mapping key, an optimized approach is to use defaultdict() as following:



                                    In [5]: from collections import defaultdict

                                    In [6]: d = defaultdict(int)

                                    In [7]: for sub in l:
                                    ...: d[tuple(sub)] += 1
                                    ...:

                                    In [8]: d
                                    Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})


                                    This will give you a dictionary of your sub-lists as the key and their counts as the value.



                                    Another way is to create your own dictionary object:



                                     In [9]: class customdict(dict):
                                    ...:
                                    ...: def __getitem__(self, key):
                                    ...: try:
                                    ...: val = super(customdict, self).__getitem__(key)
                                    ...: except KeyError:
                                    ...: self[key] = [*key, 0]
                                    ...: else:
                                    ...: val[-1] += 1
                                    ...: self[key] = val
                                    ...: return val
                                    ...:
                                    ...:

                                    In [10]: m = customdict()

                                    In [11]: for sub in l:
                                    ...: m[tuple(sub)]
                                    ...:

                                    In [12]:

                                    In [12]: m
                                    Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}

                                    In [13]: m.values()
                                    Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])






                                    share|improve this answer














                                    share|improve this answer



                                    share|improve this answer








                                    edited Jan 25 at 14:08

























                                    answered Jan 25 at 13:24









                                    KasrâmvdKasrâmvd

                                    79k1091128




                                    79k1091128






























                                        draft saved

                                        draft discarded




















































                                        Thanks for contributing an answer to Stack Overflow!


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.




                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54363705%2fwhat-is-the-faster-way-to-count-occurrences-of-equal-sublists-in-a-nested-list%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        Human spaceflight

                                        Can not write log (Is /dev/pts mounted?) - openpty in Ubuntu-on-Windows?

                                        張江高科駅