Trying to find files that contain only NULs, but getting some others












7































The files I am trying to find/list are:




  • Any size (0 bytes accepted)

  • Consist only of ASCII NUL characters (0x00)

  • If there are any characters other than 0x00, the file shouldn't be listed.


The command I have now is:



grep -RLP '[^\x00]' .


This works, but it also lists a file that consists of only two bytes: 0xFF, 0xFE. I don't know why.



Is there a better command to find such files?







command-line text-processing














edited Aug 17 '18 at 1:32
muru

asked Aug 16 '18 at 22:27
pbies






























Note the default system encoding for Ubuntu is UTF-8, not ASCII. Though up to byte 0x7F, they're identical.

– wjandrea
Aug 17 '18 at 0:12















3 Answers








      9







In short, grep is trying to interpret your file as Unicode text. The sequence 0xFF, 0xFE is the byte order mark (BOM) for little-endian UTF-16, and neither byte is valid in UTF-8.

(In my testing, other sequences involving two 0xFF's or two 0xFE's etc. also failed to match the '[^\x00]' regex, since under UTF-8 they are treated as invalid bytes rather than as characters.)

Using a locale that doesn't use Unicode for character types should fix this, which you can accomplish by setting the LC_CTYPE environment variable. Use the C locale to force plain single-byte (ASCII) handling, with no Unicode decoding:

LC_CTYPE=C grep -RLP '[^\x00]' .
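To make the locale effect concrete, here is a small Python sketch. It is not part of the original answer, and the helper names are made up for illustration. It shows that the bytes 0xFF 0xFE are clearly non-NUL when looked at as raw bytes, yet cannot be decoded as UTF-8, which is consistent with a UTF-8-aware matcher finding no character to match in them:

```python
def has_non_nul_byte(data: bytes) -> bool:
    """Byte-wise check: True if any byte differs from 0x00."""
    return any(data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


bom = b"\xff\xfe"
print(has_non_nul_byte(bom))       # True: the file is not all NULs
print(is_valid_utf8(bom))          # False: these bytes are not valid UTF-8
print(is_valid_utf8(b"\x00" * 4))  # True: NUL bytes are valid UTF-8
```

A check that works on raw bytes, which is roughly what the C locale gives you, does not depend on whether the content happens to be valid text.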




UPDATE: As pointed out by @steeldriver, grep still works line by line, so files that contain only NUL bytes and newlines will still be listed.

@DavidFoerster's solution using grep's -z option solves this problem: it makes grep use NUL bytes as the line separators.
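The line-based caveat can be illustrated with a simplified Python model. This is only a sketch of the idea, not a description of grep's internals, and both function names are hypothetical. In the line-based view, newlines merely separate lines and are never examined, so a file of NULs and newlines looks the same as a file of NULs:

```python
def listed_by_line_check(data: bytes) -> bool:
    """Mimic a line-based search for a non-NUL byte.

    Newlines only separate lines and are never examined, so the file is
    "listed" when no line contains a byte other than 0x00.
    """
    return not any(any(line) for line in data.split(b"\n"))


def is_all_nul(data: bytes) -> bool:
    """Strict check: every byte, newlines included, must be 0x00."""
    return not any(data)


sample = b"\x00\x00\n\x00\n"
print(listed_by_line_check(sample))  # True: each line holds only NULs
print(is_all_nul(sample))            # False: the newlines are not NUL bytes
```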



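Alternatively, I came up with a short Python 3 script (allzeroes.py) to check whether the file's contents are all zeroes:

#!/usr/bin/python3
# Exit with status 0 if the file is empty or contains only NUL bytes, 1 otherwise.
import sys
assert len(sys.argv) == 2
with open(sys.argv[1], 'rb') as f:
    # Read 4 KiB blocks until f.read() returns b'' at end of file.
    for block in iter(lambda: f.read(4096), b''):
        # any() is true as soon as the block holds a non-zero byte.
        if any(block):
            sys.exit(1)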


You can use it with find to locate all matching files recursively (this assumes allzeroes.py is executable and on your PATH):

$ find . -type f -exec allzeroes.py {} \; -print
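If starting one Python process per file is too slow, the same block-wise check can be combined with a directory walk in a single script. The following is a minimal sketch under that assumption. The names is_all_nul and find_all_nul_files are illustrative and do not come from the original script:

```python
import os


def is_all_nul(path, block_size=4096):
    """Return True if the file is empty or contains only 0x00 bytes."""
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            if any(block):
                return False
    return True


def find_all_nul_files(root):
    """Yield paths of regular files under root that contain only NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only look at regular files, similar to find's -type f.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if is_all_nul(path):
                yield path
```

Calling `for path in find_all_nul_files("."): print(path)` would then print the matching files below the current directory. Unreadable files raise an exception in this sketch, so real use may need error handling.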


      I hope that helps.














edited Aug 17 '18 at 16:16

answered Aug 16 '18 at 23:23
filbranden













      +1 although since grep is line-based, this will also output files that consist entirely of newlines - you may be able to work around that by specifying null-terminated mode using -z (although that will slurp any regular text files wholly into memory). Also I don't think -P is required here?

      – steeldriver
      Aug 17 '18 at 1:23


















      2



























You can abuse grep’s alternative null-terminated line mode and thus search for files that contain only empty lines:

grep -L -z -e . ...

Replace ... with the file set that you want to scan (here: -R .).

Explanation

  • -z, --null-data – Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. [1]
  • -e . – Use . as the search pattern, i.e. match any character.
  • -L, --files-without-match – Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match. [1]

Test case

Set-up:

: > empty
truncate -s 100 zero
printf '%s' foo bar > foobar

Run test:

$ grep -L -z -e . empty zero foobar
empty
zero

[1] From the grep(1) manual page.
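The idea behind the -z approach can be sketched in Python as splitting the data on NUL bytes and checking whether any record is non-empty. This is an illustrative model only: the function name is hypothetical, and it assumes every non-NUL byte counts as something the pattern can match, which is an assumption about byte-wise matching rather than a statement about every locale:

```python
def has_nonempty_nul_record(data: bytes) -> bool:
    """Split on 0x00 and report whether any record is non-empty.

    A non-empty record is what an "any character" pattern would hit when
    NUL is used as the line terminator.
    """
    return any(record for record in data.split(b"\x00"))


# The same three cases as in the test above, as in-memory byte strings.
cases = {
    "empty": b"",
    "zero": b"\x00" * 100,
    "foobar": b"foobar",
}
for name, data in cases.items():
    if not has_nonempty_nul_record(data):
        print(name)  # prints "empty" and "zero"
```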

















answered Aug 17 '18 at 9:18
David Foerster























              0



























I'll add another answer: the script I am using. Run from a given folder, it recurses and lists all the NUL-only files:

shopt -s globstar
for file in ./**
do
    [ -d "$file" ] || LC_CTYPE=C grep -qP '[^\x00]' "$file" || echo "$file"
done
















answered Jan 17 at 16:23
pbies





























