Trying to find files that contain only NULs, but getting some others

The files I am trying to find/list are:

Any size (0 bytes accepted)

Consist only of ASCII NUL characters (0x00)

If there are any characters other than 0x00, the file shouldn't be listed.

The command I have now is:

grep -RLP '[^\x00]' .

This works, but it also finds a file that consists of only two bytes, 0xFF 0xFE, and I don't know why.

Is there any better command to find such files?

edited Aug 17 '18 at 1:32

muru

asked Aug 16 '18 at 22:27

pbies

1406

Note that the default system encoding on Ubuntu is UTF-8, not ASCII, although the two are identical for bytes up to 0x7F.

– wjandrea
Aug 17 '18 at 0:12
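To illustrate the point about encodings, here is a small Python 3 sketch. The function names are hypothetical, chosen only for this example. It checks that ASCII and UTF-8 decode every byte from 0x00 to 0x7F identically, and that a pair like 0xFF 0xFE is not ASCII at all:

```python
def ascii_utf8_agree() -> bool:
    # Every byte 0x00-0x7F decodes to the same character in ASCII and UTF-8.
    return all(bytes([b]).decode("ascii") == bytes([b]).decode("utf-8") for b in range(0x80))


def is_ascii(raw: bytes) -> bool:
    # True only if every byte is in the ASCII range 0x00-0x7F.
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(ascii_utf8_agree())     # True
print(is_ascii(b"\x00\x00"))  # True
print(is_ascii(b"\xff\xfe"))  # False
```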

command-line text-processing
3 Answers

In short, what is happening here is that grep is trying to interpret your file as Unicode data. The sequence 0xFF, 0xFE is the UTF-16 (little-endian) byte order mark.

(In my testing, other sequences, such as two 0xFF bytes or two 0xFE bytes, also failed to match the '[^\x00]' regex, because in UTF-8 these bytes are invalid and grep does not treat them as characters.)
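Here is a short Python 3 sketch of the explanation above. Python's decoders stand in for grep's locale handling, so this is an analogy rather than grep's actual code path, and the helper name utf8_chars is made up for this example. It shows that FF FE equals the UTF-16 little-endian BOM defined in Python's codecs module, that it does not decode as UTF-8, and that only a one-byte-per-character reading sees two non-NUL characters:

```python
import codecs

data = b"\xff\xfe"

# FF FE is exactly the UTF-16 little-endian byte order mark.
print(data == codecs.BOM_UTF16_LE)  # True


def utf8_chars(raw: bytes):
    # Returns the decoded text, or None if the bytes are not valid UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# 0xFF and 0xFE never occur in valid UTF-8, so there are no characters to match.
print(utf8_chars(data))  # None

# Read one character per byte (Latin-1 here, standing in for a byte-oriented
# locale such as C), the same input is two ordinary non-NUL characters.
print([hex(ord(c)) for c in data.decode("latin-1")])  # ['0xff', '0xfe']
```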

Using a locale that doesn't use Unicode for character types should fix this, which you can do by setting the LC_CTYPE environment variable. Use the C locale to force ASCII encoding (no Unicode). If LC_ALL is set in your environment, it takes precedence over LC_CTYPE, so in that case set LC_ALL=C instead:

LC_CTYPE=C grep -RLP '[^\x00]' .

UPDATE: As pointed out by @steeldriver, grep still works line by line, so files consisting only of NUL bytes and newlines will still be listed.

@DavidFoerster's solution, which uses grep's -z option to treat NUL bytes as line separators, solves this problem nicely.
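To see why -z helps, here is a simplified Python 3 model of the two modes. It approximates grep's behaviour and is not its implementation; the function names are hypothetical. A bytes regex plays the role of the C locale, where every byte counts as one character:

```python
import re

# Bytes pattern: each byte is one "character", roughly like the C locale.
NON_NUL = re.compile(rb"[^\x00]")


def listed_line_mode(data: bytes) -> bool:
    # Models `LC_CTYPE=C grep -L '[^\x00]'`: input is split into lines on b"\n"
    # and the newline itself is not part of any line, so it can never match.
    return not any(NON_NUL.search(line) for line in data.split(b"\n"))


def listed_nul_mode(data: bytes) -> bool:
    # Models the same search with -z: records are split on NUL instead,
    # so a newline is ordinary data and does match [^\x00].
    return not any(NON_NUL.search(rec) for rec in data.split(b"\x00"))


samples = {"only NULs": b"\x00" * 8, "only newlines": b"\n\n", "FF FE": b"\xff\xfe"}
for label, data in samples.items():
    print(f"{label}: line mode={listed_line_mode(data)}, -z mode={listed_nul_mode(data)}")
# only NULs: line mode=True, -z mode=True
# only newlines: line mode=True, -z mode=False
# FF FE: line mode=False, -z mode=False
```

The "only newlines" row is the false positive steeldriver describes: in line mode the file is listed even though it is not all NULs.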

Alternatively, I came up with a short Python 3 script (allzeroes.py) to check whether the file's contents are all zeroes:

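#!/usr/bin/python3
# allzeroes.py: exit with status 0 if FILE is empty or contains only NUL bytes, 1 otherwise.
import sys

if len(sys.argv) != 2:
    sys.exit('usage: allzeroes.py FILE')

with open(sys.argv[1], 'rb') as f:
    # Read 4 KiB at a time; any(block) is true if the block has a non-zero byte.
    for block in iter(lambda: f.read(4096), b''):
        if any(block):
            sys.exit(1)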

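You can use it with find to locate all matching files recursively (this assumes allzeroes.py is executable and on your PATH; otherwise, give its path, e.g. ./allzeroes.py):

$ find . -type f -exec allzeroes.py {} \; -print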
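If you would rather not start one Python process per file, a possible alternative is to do the directory walk in Python as well. This is a sketch that assumes os.walk is an acceptable stand-in for "find . -type f" (symlinks and special files are skipped); find_all_zero and is_all_zero are hypothetical names:

```python
import os
import sys


def is_all_zero(path, blocksize=4096):
    # Same check as allzeroes.py: True for empty files and files of only NUL bytes.
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            if any(block):
                return False
    return True


def find_all_zero(top="."):
    # Rough equivalent of `find TOP -type f -exec allzeroes.py {} \; -print`:
    # os.walk does not follow directory symlinks by default, and file symlinks
    # are skipped explicitly, like find's -type f without -L.
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if is_all_zero(path):
                    yield path
            except OSError:
                # Unreadable files are skipped, as a failing -exec would be.
                continue


if __name__ == "__main__":
    for p in find_all_zero(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(p)
```

Saved under a name of your choice, it takes an optional starting directory, e.g. "python3 findzeroes.py /some/dir" (the file name is only an example).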

I hope that helps.

edited Aug 17 '18 at 16:16

answered Aug 16 '18 at 23:23

filbranden

7378


+1 although since grep is line-based, this will also output files that consist entirely of newlines - you may be able to work around that by specifying null-terminated mode using -z (although that will slurp any regular text files wholly into memory). Also I don't think -P is required here?

– steeldriver
Aug 17 '18 at 1:23


You can abuse grep’s alternative null-terminated line mode and thus search for files that contain only empty lines:

grep -L -z -e . ...

Replace ... with the file set that you want to scan (here: -R .).

Explanation

-z, --null-data – Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.¹

-e . – Use . as the search pattern, i.e. match any character.

-L, --files-without-match – Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.¹

Test case

Set-up:

: > empty
truncate -s 100 zero
printf '%s' foo bar > foobar

Run test:

$ grep -L -z -e . empty zero foobar
empty
zero

¹ From the grep(1) manual page.
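The same logic can be modelled in Python 3 to reproduce the test case above. This is a simplified sketch that assumes the C locale and that '.' matches any byte other than the NUL separator. In a UTF-8 locale, GNU grep's '.' may not match invalid bytes such as 0xFF, so prefixing the command with LC_ALL=C is probably the safer choice. The function name is hypothetical:

```python
import os
import tempfile


def listed_by_grep_Lz_dot(data: bytes) -> bool:
    # Simplified model of `grep -L -z -e .` (assumption: C locale): records are
    # NUL-terminated and '.' matches any byte in a record, so a file is listed
    # only if every record is empty, i.e. it holds nothing but NUL bytes.
    return all(len(rec) == 0 for rec in data.split(b"\x00"))


# Reproduce the test case from the answer in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "empty"), "wb").close()   # like `: > empty`
    with open(os.path.join(d, "zero"), "wb") as f:
        f.truncate(100)                             # like `truncate -s 100 zero`
    with open(os.path.join(d, "foobar"), "wb") as f:
        f.write(b"foobar")                          # like `printf '%s' foo bar > foobar`
    for name in ("empty", "zero", "foobar"):
        with open(os.path.join(d, name), "rb") as f:
            if listed_by_grep_Lz_dot(f.read()):
                print(name)
# prints:
# empty
# zero
```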

answered Aug 17 '18 at 9:18

David Foerster

28.2k1365111


Here is another answer: the script I am using. When run from a specific folder, it recurses into all subfolders and lists every file that contains only NULs:

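#!/bin/bash
# Requires bash: globstar makes ./** match all files and directories recursively
# (hidden files are skipped unless dotglob is also set).
shopt -s globstar

for file in ./**
do
    # Skip directories; print the file if grep (byte-wise C locale) finds no non-NUL byte.
    # Note: grep is line-based, so files made only of NULs and newlines are listed too.
    [ -d "$file" ] || LC_CTYPE=C grep -qP '[^\x00]' "$file" || echo "$file"
done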

answered Jan 17 at 16:23

pbies

1406
