Python 3 pandas.groupby.filter












11















I am trying to perform a groupby filter that is very similar to the example in this documentation: pandas groupby filter



>>> import pandas as pd
>>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                           'foo', 'bar'],
...                    'B' : [1, 2, 3, 4, 5, 6],
...                    'C' : [2.0, 5., 8., 1., 2., 9.]})
>>> grouped = df.groupby('A')
>>> grouped.filter(lambda x: x['B'].mean() > 3.)
     A  B    C
1  bar  2  5.0
3  bar  4  1.0
5  bar  6  9.0


I am trying to return a DataFrame that has all 3 columns, but only 2 rows. Those 2 rows contain the minimum values of column B, after grouping by column A. I tried the following line of code:



grouped.filter(lambda x: x['B'] == x['B'].min())


But this doesn't work, and I get this error:
TypeError: filter function returned a Series, but expected a scalar bool



The DataFrame I am trying to return should look like this:



     A  B    C
0  foo  1  2.0
1  bar  2  5.0


I would appreciate any help you can provide. Thank you, in advance, for your help.










python pandas dataframe






edited Feb 16 at 2:28
weliketocode

asked Feb 15 at 21:45
FinProg








  • 3





    The docstring wording can seem a bit ambiguous: "Return a copy of a DataFrame excluding elements from groups that do not satisfy..." You aren't excluding individual elements within groups; you are excluding from the DataFrame every element of the groups that do not satisfy the single condition.

    – ALollz
    Feb 15 at 22:33











  • @ALollz: please file a docbug to improve the docstring

    – smci
    Feb 16 at 2:41














5 Answers


















                3







>>> # sort=False keeps the groups in the order in which they first occur
>>> df.loc[df.groupby("A", sort=False)["B"].idxmin()]
     A  B    C
0  foo  1  2.0
1  bar  2  5.0
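A minimal sketch of why the one-liner works, assuming the df from the question (the intermediate name labels is only there for illustration): idxmin() returns, for each group, the index label of the row holding the smallest B, and df.loc then looks those labels up in the original frame.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

# One index label per group: the row where B is smallest in that group.
# sort=False keeps the groups in order of first appearance (foo, then bar).
labels = df.groupby('A', sort=False)['B'].idxmin()

# Look the labels up in the original frame to get the full rows back.
result = df.loc[labels]
```

Keep in mind that idxmin() reports only the first label when several rows tie for the minimum, so this approach gives exactly one row per group.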





edited Feb 18 at 14:59

answered Feb 16 at 0:20
BallpointBen

























                        5







No need for groupby :-)

df.sort_values('B').drop_duplicates('A')
Out[288]:
     A  B    C
0  foo  1  2.0
1  bar  2  5.0
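For what it's worth, this works because drop_duplicates keeps the first occurrence of each A by default, and after sorting by B that first occurrence is the group's minimum. A small sketch that spells this out, assuming the df from the question; the explicit keep='first', the stable sort kind and the final sort_index() are additions for illustration, not part of the one-liner above:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

result = (
    df.sort_values('B', kind='mergesort')   # stable sort: ties keep their original order
      .drop_duplicates('A', keep='first')   # the first row per A now holds the smallest B
      .sort_index()                         # restore the original row order
)
```

Like idxmin, this returns a single row per group even when the minimum is tied.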





answered Feb 15 at 22:39
Wen-Ben























                                4







There's a fundamental difference: in the documentation example there is a single Boolean value per group, i.e. the entire group is returned if its mean is greater than 3. In your example, you want to filter specific rows within a group.

For your task, the usual trick is to sort the values and use .head or .tail to pick the row with the smallest or largest value, respectively:

df.sort_values('B').groupby('A').head(1)

#      A  B    C
# 0  foo  1  2.0
# 1  bar  2  5.0

For more complicated queries you can use .transform or .apply to build a Boolean Series to slice with. That is also the safer option here if multiple rows share the minimum and you need all of them:

df[df.groupby('A').B.transform(lambda x: x == x.min())]

#      A  B    C
# 0  foo  1  2.0
# 1  bar  2  5.0
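To make the point about ties concrete, here is a rough sketch on made-up data (the frame tied and the variable names are hypothetical, not from the question). It uses the built-in 'min' aggregation in transform instead of a lambda, which should give the same mask:

```python
import pandas as pd

# Hypothetical data: group 'foo' has two rows tied for the minimum of B.
tied = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                     'B': [1, 2, 1, 4],
                     'C': [2.0, 5.0, 8.0, 1.0]})

# One row per group: only one of the tied 'foo' rows survives.
one_per_group = tied.sort_values('B').groupby('A').head(1)

# Boolean mask aligned with the original index: every tied row is kept.
mask = tied['B'] == tied.groupby('A')['B'].transform('min')
all_minima = tied[mask]
```

Here one_per_group has two rows, while all_minima has three, because both 'foo' rows with B equal to 1 pass the mask.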





edited Feb 15 at 22:44

answered Feb 15 at 22:19
ALollz























                                        3







df.groupby('A').apply(lambda x: x.loc[x['B'].idxmin(), ['B', 'C']]).reset_index()





answered Feb 15 at 21:54
kudeh























                                            3














The short answer:

grouped.apply(lambda x: x[x['B'] == x['B'].min()])

... and the longer one:

Your grouped object has 2 groups:

In [25]: for item in grouped:
    ...:     print(item)
    ...:
('bar',
     A  B    C
1  bar  2  5.0
3  bar  4  1.0
5  bar  6  9.0)

('foo',
     A  B    C
0  foo  1  2.0
2  foo  3  8.0
4  foo  5  2.0)


The filter() method of a GroupBy object filters groups as whole entities, NOT their individual rows. So with filter() you can obtain only 4 possible results:

• an empty DataFrame (0 rows),

• rows of the group 'bar' (3 rows),

• rows of the group 'foo' (3 rows),

• rows of both groups (6 rows)

Nothing else, regardless of the Boolean function you pass to filter().
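A quick sketch of those four outcomes, assuming the df from the question (the predicates and variable names are chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
grouped = df.groupby('A')

# Each predicate returns one bool per group, so whole groups are kept or dropped.
none_kept = grouped.filter(lambda g: False)
bar_only = grouped.filter(lambda g: g['B'].mean() > 3.0)    # mean of bar is 4.0
foo_only = grouped.filter(lambda g: g['B'].mean() <= 3.0)   # mean of foo is 3.0
all_kept = grouped.filter(lambda g: True)

row_counts = [len(none_kept), len(bar_only), len(foo_only), len(all_kept)]
```

The row counts come out as 0, 3, 3 and 6; there is no predicate that would make filter() return just one row from each group.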





So you have to use some other method. An appropriate one is the very flexible apply() method, which lets you apply an arbitrary function that

• takes a DataFrame (one group of the GroupBy object) as its only parameter,

• returns either a Pandas object or a scalar.

In your case that function should return (for each of your 2 groups) the 1-row DataFrame having the minimal value in the column 'B', so we will use the Boolean mask

group['B'] == group['B'].min()

to select that row (or possibly several rows, if the minimum is tied):

In [26]: def select_min_b(group):
    ...:     return group[group['B'] == group['B'].min()]

Passing this function to the apply() method of the GroupBy object grouped, we obtain

In [27]: grouped.apply(select_min_b)
Out[27]:
         A  B    C
A
bar 1  bar  2  5.0
foo 0  foo  1  2.0

Note:

The same, but as a single command (using a lambda function):

grouped.apply(lambda group: group[group['B'] == group['B'].min()])




























                                                edited Feb 15 at 23:55

























                                                answered Feb 15 at 22:50









                                                MarianD






























