Saturday, March 28, 2009

How to find files easy way in linux

Searching on steroids: find

At the other end of the scale is the top-of-the-range search tool, find. In addition to filename-based searching, find is able to locate files based on ownership, access permissions, time of last access, size, and much else besides. Of course, the price you pay for all this flexibility is a rather perplexing command syntax. We'll dive into the details later, but here's an example to give you the idea:

$ find /etc -name '*.conf' -user cupsys -print
find: /etc/ssl/private: Permission denied
find: /etc/cups/ssl: Permission denied
/etc/cups/cupsd.conf
/etc/cups/printers.conf

In this example, find is searching in (and below) the directory /etc for files whose name ends in .conf and that are owned by the cupsys account.

Generally, the syntax of the find command is of the form:

$ find   

The "where to look" part is simply a space-separated list of the directories we want find to search. For each one, find will recursively descend into every directory beneath those specified. Our table below, titled Search Criteria For Find, lists the most useful search criteria (the "what to look for" part of the command).

Search criteria for find

-name string File name matches string (wildcards are allowed) -name '*.jpg'
-iname string Same as -name but not case sensitive -iname '*tax*'
-user username File is owned by username -user chris
-group groupname File has group groupname -group admin
-type x File is of type 'x', one of:

f - regular file

d - directory

l - symbolic link

c - character device

b - block device

p - named pipe (FIFO)

-type d
-size +N File is bigger than N 512-byte blocks (use suffix c for bytes, k for kilobytes, M for megabytes) -size +100M
-size -N File is smaller than N blocks (use suffix c for bytes, k for kilobytes, M for megabytes) -size -50c
-mtime -N File was last modified less than N days ago -mtime -1
-mtime +N File was last modified more than N days ago -mtime +14
-mmin -N File was last modified less than N minutes ago -mmin -10
-perm mode The files permissions exactly match mode. The mode can be specified in octal, or using the same symbolic notation that chmod supports -perm 644
-perm -mode All of the permission bits specified by mode are set. -perm -ugo=x
-perm /mode Any of the permission bits specified by mode is set -perm /011

And the smaller table Actions For Find, below, lists the most useful actions (the "what to do with it" part of the command). Neither of these is a complete list, so check the manual page for the full story.

Actions for find

-print Print the full pathname of the file to standard output
-ls Give a full listing of the file, equivalent to running ls -dils
-delete Delete the file
-exec command Execute the specified command. All following arguments to find are taken to be arguments to the command until a ';' is encountered. The string {} is replaced by the current file name.

If no other action is specified, the -print action is assumed, with the result that the pathname of the selected file is printed (or to be more exact, written to standard output). This is a very common use of find. I should perhaps point out that many of the search criteria supported by find are really intended to help in rounding up files to perform some administrative operation on them (make a backup of them, perhaps) rather than helping you find odd files you happen to have mislaid.

Why is this not a command?

The which command can - occasionally - give a misleading answer, if the command in question also happens to be a built-in command of the bash shell. For example:

$ which kill
/bin/kill

tells us that the kill command lives in /bin. However, kill is also a built-in bash command, so if I enter a command like

$ kill -HUP 1246

it will actually run the shell's built-in kill and not the external command.

To find out whether a command is recognised as a shell built-in, an alias, or an external command, you can use the type command, like this:

$ type kill
kill is a shell builtin

Learning by Example

It takes a while to get your head around all this syntax, so maybe a few examples would help ...

Example 1 This is a simple name-based search, starting in my home directory and looking for all PowerPoint (.ppt) files. Notice we've put the filename wildcard expression in quotes to stop the shell trying to expand it. We want to pass the argument '*.ppt' directly and let find worry about the wildcard matching.

$ find ~ -name '*.ppt'

Example 2 You can supply multiple "what to look for" tests to find and by default they will be logically AND-ed, that is, they must all be true in order for the file to match. Here, we look for directories under /var that are owned by daemon:

$ find /var -type d -user daemon

Example 3 This shows how you can OR tests together rather than AND-ing them. Here, we're looking in /etc for files that are either owned by the account cupsys or are completely empty:

$ find /etc -user cupsys -or -size 0

Example 4 This uses the '!' operator to reverse the sense of a test. Here, we're searching /bin for files that aren't owned by root:

$ find /usr/bin ! -user root

Example 5 The tests that make numeric comparisons are especially confusing. Just remember that '+' in front of a number means 'more than', '-' means 'less than', and if there is no '+' or '-', find looks for an exact match. These three example search for files that have been modified less than 10 minutes ago, more than 1 year ago, and exactly 4 days ago. (This third example is probably not very useful.)

$ find ~ -mmin -10
$ find ~ -mtime +365
$ find ~ -mtime 4

Example 6 Perhaps the most confusing tests of all are those made on a file's access permissions. This example isn't too bad, it looks for an exact match on the permissions 644 (which would be represented symbolically by ls -l as rw-r--r--:

$ find ~ -perm 644

Example 7 Here we look for files that are writeable by anybody (that is, either the owner, the group, or rest-of-world). The two examples are equivalent; the first uses the traditional octal notation, the second uses the same symbolic notation for representing permissions that chmod uses:

$ find ~ -perm -222
$ find ~ -perm -ugo=w

Example 8 Here we look for files that are writeable by everybody (that is, by the owner and the group and the rest-of-world):

$ find ~ -perm /222
$ find ~ -perm /ugo=w

Example 9 So far we've just used the default -print action of find to display the names of the matching files. Here's an example that uses the -exec option to move all matching files into a backup directory. There are a couple of points to note here. First, the notation {} gets replaced by the full pathname of the matching file, and the ';' is used to mark the end of the command that follows -exec. Remember: ';' is also a shell metacharacter, so we need to put the backslash in front to prevent the shell interpreting it.

$ find ~ -mtime +365 -exec mv {} /tmp/mybackup \;

Never mind the file name, what's in the file?

As we've seen, tools such as find can track down files based on file name, size, ownership, timestamps, and much else, but find cannot select files based on their content. It turns out that we can do some quite nifty content-based searching using grep in conjunction with the shell's wildcards. This example is taken from my personal file system:

$ grep -l Hudson */*
Desktop/suse_book_press_release.txt
google-earth/README.linux
Mail/inbox.ev-summary
Mail/sent-mail.ev-summary
snmp_training/enterprise_mib_list

Here, we're asking grep to report the names of the files containing a match for the string Hudson. The wildcard notation */* is expanded by the shell to a list of all files that are one level below the current directory. If we wanted to be a bit more selective on the file name, we could do something like:

$ grep -l Hudson */*.txt
Desktop/search_tools.txt
Desktop/suse_book_press_release.txt

which would only search in files with names ending in .txt. In principal you could extend the search to more directory levels, but in practice you may find that the number of file names matched by the shell exceeds the number of arguments that can appear in the argument list, as happened when I tried it on my system:

$ grep -l Hudson  */*  */*/*
bash: /bin/grep: Argument list too long

A more powerful approach to content-based searching is to use grep in conjunction with find. This example shows a search for files under my home directory ('~') whose names end in .txt, that contain the string Hudson.

$ find ~ -name '*.txt' -exec grep -q Hudson {} \; -print
/home/chris/Desktop/search_tools.txt
/home/chris/Desktop/suse_book_press_release.txt

This approach does not suffer from the argument list overflow problem that our previous example suffered from. Remember, too, that find is capable of searching on many more criteria that just file name, and grep is capable of searching for regular expressions not just fixed text, so there is a lot more power here than this simple example suggests.

If you're unclear about the syntax of this example, read The truth about find, below left. In this example, the predicate -exec grep -q Hudson {} \; returns true if grep finds a match for the string Hudson in the specified file, and false if not. If the predicate is false, find does not continue to evaluate any following expressions, that is, it does not execute the -print action.

The truth about find

The individual components of a find command are known as expressions, (or more technically, as predicates). For example, -uname cupsys is a predicate. The find command operates by examining each and every file under the directory you ask it to search and evaluating each of the predicates in turn against that file.

Each predicate returns either true or false, and the results of the predicates are logically AND-ed together. If one of the predicates returns a false result, find does not evaluate the remaining predicates. So for example in a command such as:

$ find . -user chris -name '*.txt' -print
if the predicate -user chris is false (that is, if the file is not owned by chris) find will not evaluate the remaining predicates. Only if -user chris and -name '*.txt' both return true will find evaluate the -print predicate (which writes the file name to standard output and also returns the result 'true')

No comments: