Trying to find files that contain only NULs, but getting some others
The files I am trying to find/list are:
- Any size (0 bytes accepted)
- Consist only of ASCII NUL characters (0x00)
- If there are any characters other than 0x00, the file shouldn't be listed.
The command I have now is:
grep -RLP '[^\x00]' .
This works, but it also lists a file that consists of only two bytes: 0xFF, 0xFE. I don't know why.
Is there any better command to find such files?
command-line text-processing
Note the default system encoding for Ubuntu is UTF-8, not ASCII. Though up to byte 0x7F, they're identical.
– wjandrea
Aug 17 '18 at 0:12
asked Aug 16 '18 at 22:27 by pbies; edited Aug 17 '18 at 1:32 by muru
3 Answers
In short, grep is trying to interpret your file as Unicode data. The sequence 0xFF, 0xFE is the byte order mark (BOM) for little-endian UTF-16.
(In my testing, other sequences involving two 0xFF's or two 0xFE's etc. also failed to match the '[^\x00]' regex, since under UTF-8 decoding these bytes are not valid characters.)
Using a locale that doesn't use Unicode for character types should fix this; you can do that by setting the LC_CTYPE environment variable. Use the C locale to force ASCII encoding (so no Unicode enabled):
LC_CTYPE=C grep -RLP '[^\x00]' .
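To make the encoding point concrete, here is a small Python sketch. It uses Python's own UTF-8 decoder as a stand-in, so it only illustrates why those two bytes are not characters under UTF-8; it does not reproduce grep's internals. The variable names are just for illustration.

```python
import codecs

data = b"\xff\xfe"  # the two-byte file from the question

# 0xFF 0xFE is the UTF-16 little-endian byte order mark.
is_utf16_le_bom = data == codecs.BOM_UTF16_LE

# Neither byte can occur in well-formed UTF-8, so decoding fails.
try:
    data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Looked at byte by byte (roughly what the C locale amounts to),
# both bytes are plainly non-NUL.
has_non_nul_byte = any(b != 0 for b in data)

print(is_utf16_le_bom, valid_utf8, has_non_nul_byte)
```

Seen as raw bytes the file clearly contains something other than NUL, which is why switching to a byte-oriented locale should change the result.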
UPDATE: As pointed out by @steeldriver, grep still works line by line, so files containing only NUL bytes and newlines will still be listed.
@DavidFoerster's solution using grep's -z option solves this problem by using NUL bytes as the line separators.
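Alternatively, I came up with a short Python 3 script (allzeroes.py) to check whether a file's contents are all zeroes:
#!/usr/bin/python3
import sys

# Expect exactly one argument: the file to check.
assert len(sys.argv) == 2
with open(sys.argv[1], 'rb') as f:
    # Read in 4 KiB blocks so large files are never loaded whole.
    for block in iter(lambda: f.read(4096), b''):
        # any() is true as soon as a block has a non-zero byte.
        if any(block):
            sys.exit(1)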
You can use it with find to locate all matches recursively (this assumes allzeroes.py is executable and on your PATH):
$ find . -type f -exec allzeroes.py {} \; -print
I hope that helps.
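If you would rather avoid starting one Python process per file, the same check can be folded into a single script that does the directory walk itself. This is only a sketch of that idea: the names is_all_nul and find_nul_files are made up here, and skipping symlinks is my assumption about mirroring find's -type f.

```python
import os
import tempfile


def is_all_nul(path, block_size=4096):
    """Return True if the file is empty or contains only 0x00 bytes."""
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            if any(block):  # a non-zero byte was found
                return False
    return True


def find_nul_files(root):
    """Yield regular files under root (symlinks skipped) that are all NUL."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if is_all_nul(path):
                yield path


# Small self-check in a throwaway directory (hypothetical sample files).
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "empty": b"",
        "zero": b"\0" * 10000,
        "bom": b"\xff\xfe",
        "nul_and_newline": b"\0\n\0",
    }
    for name, content in samples.items():
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(content)
    found = sorted(os.path.relpath(p, tmp) for p in find_nul_files(tmp))

print(found)
```

To scan a real tree you would loop over find_nul_files(".") and print each path.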
+1 although since grep is line-based, this will also output files that consist entirely of newlines - you may be able to work around that by specifying null-terminated mode using -z (although that will slurp any regular text files wholly into memory). Also I don't think -P is required here?
– steeldriver
Aug 17 '18 at 1:23
You can abuse grep’s alternative null-terminated line mode and thus search for files that contain only empty lines:
grep -L -z -e . ...
Replace ... with the file set that you want to scan (here: -R .).
Explanation
- -z, --null-data – Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. [1]
- -e . – Use . as the search pattern, i.e. match any character.
- -L, --files-without-match – Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match. [1]
Test case
Set-up:
: > empty
truncate -s 100 zero
printf '%s' foo bar > foobar
Run test:
$ grep -L -z -e . empty zero foobar
empty
zero
[1] From the grep(1) manual page.
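For intuition, the null-terminated trick can be modelled at the byte level in a few lines of Python. This is a rough sketch, not a description of how grep is implemented: it ignores locale and encoding handling entirely, and listed_by_null_mode is a made-up name. The sample data mirrors the three test files above.

```python
def listed_by_null_mode(data: bytes) -> bool:
    """Rough model of `grep -L -z -e .`: split the input into NUL-terminated
    records and list the file only if every record is empty."""
    records = data.split(b"\0")
    return not any(records)  # an empty bytes object is falsy


samples = {
    "empty": b"",           # : > empty
    "zero": b"\0" * 100,    # truncate -s 100 zero
    "foobar": b"foobar",    # printf '%s' foo bar > foobar
}

listed = [name for name, data in samples.items() if listed_by_null_mode(data)]
print(listed)
```

A file that holds even one non-NUL byte produces a non-empty record, so it drops out of the list.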
I'll provide another answer: the script I am using. Run from a given folder, it recurses and lists all the NUL-only files:
shopt -s globstar
for file in ./**
do
    [ -d "$file" ] || LC_CTYPE=C grep -qP '[^\x00]' "$file" || echo "$file"
done
First answer (LC_CTYPE=C and allzeroes.py): answered Aug 16 '18 at 23:23 by filbranden, edited Aug 17 '18 at 16:16
Second answer (grep -L -z): answered Aug 17 '18 at 9:18 by David Foerster
Third answer (globstar loop): answered Jan 17 at 16:23 by pbies