Difference between revisions of "Find notes"

From Noah.org

Revision as of 14:32, 13 August 2008


exec versus xargs

You may notice that some people pipe `find` output into `xargs`, while other people tell `find` to start a command using -exec. What is the difference? With `-exec ... \;`, `find` starts a new instance of the command for every file it finds. `xargs` is usually faster because it groups arguments and feeds them to the subcommand in batches, so it doesn't have to start a new instance of the subcommand for every argument.

Generally I find -exec easier to use because you can easily repeat the found filename in the exec argument. It's easier for me to express exactly what I want to be executed. Of course, some people think the `find` syntax is wacky. The `xargs` command also comes in handy elsewhere, such as with a stream that was not generated by `find`, but when using `find` I stick with -exec unless I have a good reason not to.
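
A rough way to see the batching difference is to count how many times the subcommand gets started. The sketch below builds a throwaway directory with three files and uses `sh -c 'echo run'` as a stand-in command. The function name `count_runs_demo` is made up for illustration. It also includes the `-exec ... {} +` form, which batches arguments much like `xargs` does.

```shell
# Count how many times a command is started by each approach.
# count_runs_demo and the temporary directory are illustrative only.
count_runs_demo() {
    dir=$(mktemp -d) || return 1
    touch "$dir/a" "$dir/b" "$dir/c"

    # -exec ... \; starts the command once per file found.
    per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

    # xargs collects the names and passes them to the command in batches.
    batched=$(find "$dir" -type f | xargs sh -c 'echo run' sh | wc -l | tr -d ' ')

    # -exec ... {} + also batches, without needing a pipe.
    plus=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

    rm -rf "$dir"
    echo "exec-semicolon: $per_file  xargs: $batched  exec-plus: $plus"
}
```

With only three files everything fits in one batch, so the two batching forms each start the command once while `-exec ... \;` starts it three times.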

You can always do it in a shell loop too:

  for filename in *.png ; do convert "$filename" "$(basename "$filename" .png).jpg"; done

Delete old files with find and cron

Put this script in /etc/cron.daily. It automatically deletes spam older than 30 days from my Spam folder.

#!/bin/sh
find /home/vpopmail/domains/noah.org/noah/Maildir/.Spam/cur/ -type f -mtime +30 -exec rm -f {} \;

More CPU efficient:

#!/bin/sh
find /home/vpopmail/domains/noah.org/noah/Maildir/.Spam/cur/ -type f -mtime +30 -print0 | xargs -0 rm -f

copy user permissions to group permissions

Often you want group permissions to be identical to user permissions for an entire directory structure. This often happens with htdoc directories on web sites. The typical newbie mistake is to execute a massive `chmod -R a+rwx .` in an attempt to "get rid of permission problems". The following is slightly more surgical:

find . -exec /bin/sh -c 'chmod g=`ls -ld "$1" | cut -c2-4 | tr -d "-"` "$1"' sh {} \;

This is also really slow. It forks a shell for every single file and directory under the current directory. I tested it on a directory tree with 111645 files. The drive was slow (disk read: 8.43 MB/sec), but even so the performance is not impressive -- real time: 95m 36.895s
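
If the goal is only to make group permissions match user permissions, the symbolic mode of `chmod` can copy them directly: in `g=u`, the `u` on the right-hand side stands for whatever the user bits currently are. The following is a sketch of an alternative, not the command timed above, and its speed has not been measured here. The helper name `copy_user_to_group` is made up for illustration.

```shell
# Copy the user permission bits to the group bits for everything under a directory.
# copy_user_to_group is a hypothetical helper name.
copy_user_to_group() {
    # Skip symbolic links (chmod would act on their targets), and use "{} +"
    # so many paths go to each chmod run instead of one shell per file.
    find "$1" ! -type l -exec chmod g=u {} +
}
```

Usage would be `copy_user_to_group .` from the top of the tree. A plain `chmod -R g=u .` should have much the same effect, though how it treats symbolic links may differ.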

List all extensions in the current directory

This came in handy when I was trying to find out exactly what mime-types I need to care about.

find . -print0 | xargs -L 1 -0 basename | sed -e "s/.*\(\\.\\s*\)/\\1/" | sort | uniq > /tmp/types

The -print0 option tells find to null-terminate filenames. The -0 option for xargs tells it to read null-terminated strings. These two options are used to handle filenames that have special characters such as quotes or line-feeds. If you don't do this then you may get the following error:

xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option

massive recursive grep

Grep has a recursive option, but you can fine-tune a recursive grep with `find` -- you can make much more complicated expressions for the types of files you want to grep through. The main thing to remember when using `grep` with `find` is that you probably want the -H option on grep. This prints the filename along with the match.

find . -exec grep -H PatternToFind {} \;
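
As an example of the kind of fine tuning mentioned above, the sketch below greps only regular files with certain extensions and skips .svn directories. The function name `grep_web_files` and the choice of `*.html` and `*.php` are assumptions made for illustration. Swap in whatever tests you need.

```shell
# Search only regular *.html and *.php files, skipping .svn directories.
# grep_web_files and the file name patterns are illustrative choices.
grep_web_files() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -name .svn -prune -o \
        -type f \( -name '*.html' -o -name '*.php' \) \
        -exec grep -H -- "$pattern" {} +
}
```

Usage would be something like `grep_web_files PatternToFind .`. The `{} +` form hands many files to each grep run, and -H keeps the filename in the output even when a batch happens to contain a single file.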

find duplicates of files

This will list files with duplicates. It compares all files under the given directory. This ignores .svn directories and files of size 0.

This needs a little more work... It would be more efficient if it ignored all files that have a unique size, but then it's a slippery slope into writing a full-blown script. I would also like to get rid of the tmp file.

find . -name .svn -prune -o -size +0 \! -type d -exec cksum {} \; | sort | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | uniq -d | grep -hif - /tmp/f.tmp
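
One possible way to drop the tmp file is to let `awk` remember the previous line and print every group of lines that share a checksum and size. This is only a sketch: the function name `find_dups` is made up, it does not add the skip-unique-sizes optimization, and it assumes filenames contain no newlines.

```shell
# List files that share a checksum and size with at least one other file.
# find_dups is a hypothetical helper name; output lines are "cksum size path".
find_dups() {
    find "${1:-.}" -name .svn -prune -o -type f -size +0 -exec cksum {} + |
        sort -k1,1n -k2,2n |
        awk '{
            key = $1 " " $2                 # checksum and size
            if (key == prev) {
                if (!shown) print prevline  # first member of the group
                print
                shown = 1
            } else {
                shown = 0
            }
            prev = key
            prevline = $0
        }'
}
```

Usage would be `find_dups .` or `find_dups /some/dir`.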

find unique files in between two directories of hard-linked copies

Backups that create a rotating backup often hard-link unchanged files between each rotation set. This saves disk space. Files that have changes are copied normally, so they don't have additional hard-links. Sometimes it is useful to compare two backup sets and generate a list of the files that changed between each set. This could be done with file hashes, but that would be slow. We can use the fact that files that are identical in two separate backup sets will have the same inode number. Files that are changed will have different inode numbers.

This is a work in progress...

cat <(find BACKUP_SET_1 ! -type d -exec ls -1i "{}" \;) <(find BACKUP_SET_2 ! -type d -exec ls -1i "{}" \;) | sort | cut -d ' ' -f 2-,1 | uniq -u -f 1
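
One thing that holds the pipeline above back is that `cut` always prints fields in their original order, so `-f 2-,1` does not move the inode number to the end of the line. The sketch below is one possible way to finish the idea: count how often each inode number shows up across both sets and print the entries whose inode appears only once. The function name `changed_between` is made up. It assumes both sets live on the same filesystem and that filenames contain no newlines. A file that is hard-linked twice inside a single set would be wrongly treated as unchanged.

```shell
# Print "inode path" for files whose inode appears in only one of two backup sets.
# changed_between is a hypothetical helper name.
changed_between() {
    {
        find "$1" ! -type d -exec ls -1di {} +
        find "$2" ! -type d -exec ls -1di {} +
    } |
        awk '{ count[$1]++; line[$1] = $0 }
             END { for (i in count) if (count[i] == 1) print line[i] }' |
        sort -k2
}
```

Usage would be `changed_between BACKUP_SET_1 BACKUP_SET_2`. A file that changed between the sets shows up twice, once under each set, because each copy has its own inode.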