Wednesday, August 12, 2009

Firefox-3.0.10 Installation on Puppy Linux OS (en)

Firefox

1. Download Firefox.
2. Install using the following code:

mv firefox* /usr/local/
cd /usr/local/
tar -zxvf firefox*
rm /usr/local/*.tar.gz

3. You will want to place a symbolic link to the Firefox executable in /usr/local/bin/ with a command similar to this:

ln -s /usr/local/firefox/firefox /usr/local/bin/firefox
Flash

1. Download Flash Player.
2. Decompress it, then copy libflashplayer.so to your Firefox plugins directory and flashplayer.xpt to your Firefox components directory.
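
If you are unsure what step 2 looks like in practice, here is a minimal sketch. It assumes Firefox was unpacked to /usr/local/firefox as in the steps above; the exact name of the Flash archive will differ depending on the version you downloaded:

tar -zxvf install_flash_player_*_linux.tar.gz
cp libflashplayer.so /usr/local/firefox/plugins/
cp flashplayer.xpt /usr/local/firefox/components/

Restart Firefox afterwards and check about:plugins to confirm Flash is listed.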



=====================================================

PuppyLinuxOS-4.20 came with Mozilla SeaMonkey-1.1.15 as the default browser. SeaMonkey is nicely integrated with an email client and an HTML composer (packages I rarely used), but I'm not familiar with it, so I needed to install Firefox on my new Puppy. Below is my experience installing Firefox on Puppy Linux OS.

You can find Firefox installation instructions for Puppy Linux on this wiki; they follow the default Firefox installation. On Puppy Linux, however, the Firefox application and libraries should be placed in an /opt/mozilla.org/lib/firefox-xxx folder, and a firefox softlink should then be made in /root/my-applications/bin/. In this article, I'll also show you how to create a firefox-3.0.10-i686.pet file.

Installation:

  1. Download firefox-3.0.10
  2. Extract firefox-3.0.10.tar.gz into the /opt/mozilla.org/lib directory. Rename the extracted folder to firefox-3.0.10 (optional)
  3. Make a softlink from the /opt/mozilla.org/lib/firefox-3.0.10/firefox file to /root/my-applications/bin/:
    ln -sf /opt/mozilla.org/lib/firefox-3.0.10/firefox /root/my-applications/bin/
  4. Edit moz_libdir in the /opt/mozilla.org/lib/firefox-3.0.10/firefox script. It should be:
    moz_libdir=/opt/mozilla.org/lib/firefox-3.0.10
  5. Type 'firefox' to run the program (a consolidated command sketch follows below)
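
For reference, here is a sketch of steps 2-4 as shell commands. The tarball name and the name of the folder it extracts to ("firefox") are assumptions based on the usual Mozilla release archives, so adjust them to match what you actually downloaded:

mkdir -p /opt/mozilla.org/lib
tar -zxvf firefox-3.0.10.tar.gz -C /opt/mozilla.org/lib
mv /opt/mozilla.org/lib/firefox /opt/mozilla.org/lib/firefox-3.0.10
ln -sf /opt/mozilla.org/lib/firefox-3.0.10/firefox /root/my-applications/bin/
# then edit moz_libdir in /opt/mozilla.org/lib/firefox-3.0.10/firefox as in step 4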
===================================================

Want to make a pet file for this firefox-3.0.10? Follow these steps:

  1. Create a directory (for example, firefox-3.0.10-i686)
  2. Copy all Firefox files and directories:
    cp --parents -av /opt/mozilla.org/lib/firefox-3.0.10/ ./firefox-3.0.10-i686
  3. Copy the /root/my-applications/bin/firefox softlink (if it doesn't exist, create it using step 3 of the install instructions above):
    cp --parents -av /root/my-applications/bin/firefox ./firefox-3.0.10-i686
  4. Run the command:
    dir2pet firefox-3.0.10-i686
    Step1: press enter
    Step1b: (category) WebBrowser

    Step1c: (path, executable) firefox (because the /root/my-applications/bin directory is already in the path env)

    Step1d: (path, icon) /opt/mozilla.org/lib/firefox-3.0.10/icons/mozicon16.xpm

    Step1e: (name of application) Firefox

    Step2: (description) web browser

    Press enter 2 times

    Step3: (dependency) leave empty

    Step4: press enter

  5. The firefox-3.0.10-i686.pet file is now ready to be installed

Download Firefox pet files:

Click this thread post for details.

Monday, August 10, 2009

How to update/reinstall grub boot loader

Way-01:-

(This how-to is written for Ubuntu but should work on other systems. The only thing to note: when you see "sudo", the following command should be entered at a root terminal.)

Boot into the live Ubuntu CD. This can be the live installer CD or one of the older live-session Ubuntu CDs.

When you get to the desktop, open a terminal and enter the following. (I am going to give you the commands first and explain them later.)

Code:
sudo grub
This will get you a "grub>" prompt (i.e. the grub shell). At the grub> prompt, enter these commands

Code:
find /boot/grub/stage1
This will return a location. If you have more than one, select the installation that you want to provide the grub files.
Next, THIS IS IMPORTANT: whatever was returned by the find command, use it in the next line (you are still at the grub> prompt when you enter the next 3 commands)

Code:
root (hd?,?)
Again use the value from the find command i.e. if find returned (hd0,1) then you would enter root (hd0,1)

Next enter the command to install grub to the mbr

Code:
setup (hd0)
Finally exit the grub shell
Code:
quit
That is it. Grub will be installed to the mbr.
When you reboot, you will have the grub menu at startup.

Now the explanation.
Sudo grub gets you the grub shell.
Find /boot/grub/stage1 has grub locate the file stage1. What this does is tell us where grub's files are. Only a small part of grub is located on the mbr, the rest of grub is in your boot folder. Grub needs those files to run the setup. So you find the files and then you tell grub where to locate the files it will need for setup.
So root (hd?,?) tells grub it's files are on that partition.
Finally setup (hd0) tells grub to setup on hd0. When you give grub the parameter hd0 with no following value for a partition, grub will use the mbr. hd0 is the grub label for the first drive's mbr.
Quit will exit you from the grub shell.
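
Putting it all together, a typical session looks like this (assuming find returned (hd0,1); substitute whatever it returns on your system):

Code:
sudo grub
grub> find /boot/grub/stage1
 (hd0,1)
grub> root (hd0,1)
grub> setup (hd0)
grub> quit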
==============================

Way 2:

When the busted system is in a single ext3 root partition (say, /dev/sda5)...
mkdir /mnt/temp
mount /dev/sda5 /mnt/temp
grub-install --root-directory=/mnt/temp /dev/sda
When the busted system has separate ext3 boot and root partitions (say, /dev/sda5 and /dev/sda6)...
mkdir -p /mnt/temp/boot
mount /dev/sda6 /mnt/temp
mount /dev/sda5 /mnt/temp/boot
grub-install --root-directory=/mnt/temp /dev/sda
When the busted system has a separate ext3 boot partition (say, /dev/sda5), and the root partition is a logical volume (say, /dev/VolGroup00/LogVol00 according to lvdisplay)...
mkdir -p /mnt/temp/boot
vgchange -a y
lvdisplay
mount /dev/VolGroup00/LogVol00 /mnt/temp
mount /dev/sda5 /mnt/temp/boot
grub-install --root-directory=/mnt/temp /dev/sda
If grub-install fails with a read error or the boot loader still doesn't work, always try the GRUB shell (and vice versa). They both accomplish the same thing, but different things occur in the background. It's not necessary to mount partitions when using the GRUB shell commands in the LiveCD. Change x & y in the example to the busted system's boot partition...
/sbin/grub
grub> root (hdx,y)
grub> setup (hd0)
grub> quit


Restart, and you're all done.

Thursday, June 18, 2009

Webserver - Apache2 with PHP, Ruby, Python in Fedora 10

Apache2 With PHP, Ruby, Python
Now we install Apache with PHP5 (this is PHP 5.2.6):
yum install php php-devel php-gd php-imap php-ldap php-mysql php-odbc php-pear php-xml \
php-xmlrpc php-eaccelerator php-magickwand php-magpierss php-mapserver php-mbstring \
php-mcrypt php-mhash php-mssql php-shout php-snmp php-soap php-tidy curl curl-devel \
perl-libwww-perl ImageMagick libxml2 libxml2-devel

Then edit /etc/httpd/conf/httpd.conf:
vi /etc/httpd/conf/httpd.conf

and change DirectoryIndex to
DirectoryIndex index.html index.htm index.shtml index.cgi index.php index.php3 index.pl


Now configure your system to start Apache at boot time:
chkconfig --levels 235 httpd on

Start Apache:
/etc/init.d/httpd start
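
A quick way to confirm that Apache and PHP are working together is to drop a phpinfo() page into the default document root (normally /var/www/html on Fedora; the file name info.php here is just an example), open it in a browser, and remove it again afterwards:

echo '<?php phpinfo(); ?>' > /var/www/html/info.php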

12.1 Disable PHP Globally
(If you do not plan to install ISPConfig on this server, please skip this section!)
In ISPConfig you will configure PHP on a per-website basis, i.e. you can specify which website can run
PHP scripts and which one cannot. This can only work if PHP is disabled globally because otherwise all
websites would be able to run PHP scripts, no matter what you specify in ISPConfig.
To disable PHP globally, we edit /etc/httpd/conf.d/php.conf and comment out the AddHandler and
AddType lines:
vi /etc/httpd/conf.d/php.conf
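
After editing, the relevant part of /etc/httpd/conf.d/php.conf should look roughly like this (the exact lines may differ slightly between Fedora releases; the point is that the AddHandler and AddType directives are commented out):

#AddHandler php5-script .php
#AddType text/html .php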

Afterwards we restart Apache:
/etc/init.d/httpd restart

12.2 Ruby
Starting with version 2.2.20, ISPConfig has built-in support for Ruby. Instead of using CGI/FastCGI,
ISPConfig depends on mod_ruby being available in the server's Apache.
For Fedora 10, there's no mod_ruby package available, so we must compile it ourselves. First we
install some prerequisites:
yum install httpd-devel ruby ruby-devel

Next we download and install mod_ruby as follows:
cd /tmp

wget http://www.modruby.net/archive/mod_ruby-1.3.0.tar.gz
tar zxvf mod_ruby-1.3.0.tar.gz

cd mod_ruby-1.3.0/
./configure.rb --with-apr-includes=/usr/include/apr-1

make
make install

Finally we must add the mod_ruby module to the Apache configuration, so we create the file
/etc/httpd/conf.d/ruby.conf


vi /etc/httpd/conf.d/ruby.conf
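
The guide doesn't show the file's contents, so here is a minimal sketch based on mod_ruby's standard directives; adjust the handler mapping (here, .rbx files) to your own needs:

LoadModule ruby_module modules/mod_ruby.so
<IfModule mod_ruby.c>
    RubyRequire apache/ruby-run
    <Files *.rbx>
        SetHandler ruby-object
        RubyHandler Apache::RubyRun.instance
    </Files>
</IfModule>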

... and restart Apache:
/etc/init.d/httpd restart

You can find more details about mod_ruby in this article.
12.3 Installing mod_python
To install mod_python, we simply run...
yum install mod_python

... and restart Apache afterwards:
/etc/init.d/httpd restart

13 ProFTPd
ISPConfig has better support for proftpd than vsftpd, so let's remove vsftpd and install proftpd:
yum remove vsftpd
yum install proftpd

Now we can create the system startup links for Proftpd and start it:

chkconfig --levels 235 proftpd on

/etc/init.d/proftpd start

14 Webalizer
To install webalizer, just run
yum install webalizer

15 Synchronize The System Clock
If you want to have the system clock synchronized with an NTP server, do the following:
yum install ntp
chkconfig --levels 235 ntpd on

ntpdate 0.pool.ntp.org
/etc/init.d/ntpd start
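
To verify that the daemon is actually synchronizing, you can query its peers (ntpq ships with the ntp package); after a few minutes one of the listed servers should be marked with a * as the current sync source:

ntpq -p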

16 Install Some Perl Modules
ISPConfig comes with SpamAssassin which needs a few Perl modules to work. We install the required
Perl modules with a single command:
yum install perl-HTML-Parser perl-DBI perl-Net-DNS perl-Digest-SHA1 perl-ExtUtils-AutoInstall

17 ISPConfig
The configuration of the server is now finished. You can now install ISPConfig on it, following these
instructions: http://www.ispconfig.org/manual_installation.htm
17.1 A Note On SuExec
If you want to run CGI scripts under suExec, you should specify /var/www as the web root for websites
created by ISPConfig as Fedora's suExec is compiled with /var/www as Doc_Root. Run
/usr/sbin/suexec -V

and the output should look like this:

[root@server1 ~]# /usr/sbin/suexec -V
-D AP_DOC_ROOT="/var/www"
-D AP_GID_MIN=100
-D AP_HTTPD_USER="apache"
-D AP_LOG_EXEC="/var/log/httpd/suexec.log"
-D AP_SAFE_PATH="/usr/local/bin:/usr/bin:/bin"
-D AP_UID_MIN=500
-D AP_USERDIR_SUFFIX="public_html"
[root@server1 ~]#

So if you want to use suExec with ISPConfig, don't change the default web root (which is /var/www) if
you use expert mode during the ISPConfig installation (in standard mode you can't change the web
root anyway so you'll be able to use suExec in any case).

Saturday, June 6, 2009

Squid in 5 minutes

Squid in 5 minutes
by Noah Gift

Why Squid? Why only five minutes?

There are many great tools that Squid has to offer, but when I need to redirect http traffic to a caching server for performance increases or security, squid’s my pick. Squid has built in proxy and caching tools that are simple, yet effective.

I recently used Squid for a secure subnet that did not allow outgoing port 80 http access to external IP addresses. Many organizations will block external port 80 access at the router level. This is a great way to eliminate a huge security hole, but a headache when a systems administrator needs to reach the outside world temporarily to download a file. Another scenario: redirect all computers in a home network to a local caching server to increase website query performance and save on bandwidth.

The situations described above are when the five minute Squid configuration comes in very handy. All requests for external http access can be handled by squid through a simple proxy configuration on each client machine. Sounds complicated? It isn’t. Let’s get into the details next.
Install

On a Red Hat® Enterprise Linux® or Fedora™ Core operating system, it is easy to check if Squid is installed using the rpm system. Type the command:

rpm -q squid


If Squid is already installed, you will get a response similar to:

squid-2.5.STABLE6-3.4E.12


If Squid isn’t installed, then you can use Yum to install it. Thanks to Yum the installation is quite easy.

Just type at a command line:

yum install squid


If you happen to have downloaded the rpm you can also type something like:

rpm -ivh squid-2.5.STABLE6-3.4E.12.i386.rpm


Configure

Squid's main configuration file lives in /etc/squid/squid.conf. The 3,339-line configuration file is intimidating, but the good news is that it is very simple to set up a proxy server that forwards http, https, and ftp requests to Squid on the default port of 3128 and caches the data.
Back up the configuration file

It is always good policy to backup a configuration file before you edit it. If you haven’t been burned yet, you haven’t edited enough configuration files. Make a backup from the command line or the gui and rename the original file something meaningful. I personally like to append a bck.datestamp. For example:

cp /etc/squid/squid.conf /etc/squid/squid.conf.bck.02052007

If it is the original configuration file you might choose to do:

cp /etc/squid/squid.conf /etc/squid/squid.conf.org.02052007


Edit the file

Open /etc/squid/squid.conf with your favorite text editor. I use vim, but nano is a good beginner's command line text editor. If you do use nano, make sure you use the --nowrap option to turn off line wrapping when editing things like configuration files. A gui editor like Gedit will also work.
Five minute configuration

There are many fancy options for squid that we will not enable, specifically acls (access control lists) or authentication. We are going to set up a caching proxy server with no access control. This server would be suitable for a home network behind a firewall.

The default squid configuration is almost complete, but a few small changes should be made. You will need to either find and uncomment entries, or modify existing uncommented lines in the squid configuration file. Use your favorite text editor or a text find to quickly locate these lines:

visible_hostname machine-name
http_port 3128
cache_dir ufs /var/spool/squid 1000 16 256
cache_access_log /var/log/squid/access.log

In the acl section near the bottom add:

acl intranet 192.168.0.0/24
http_access allow intranet


Let me explain what each of these six lines means:

visible_hostname – Create this entry and set this to the hostname of the machine. To find the hostname, use the command hostname. Not entering a value may cause squid to fail as it may not be able to automatically determine the fully qualified hostname of your machine.

http_port 3128 – Uncomment this line but there is no need to edit it unless you want to change the default port for http connections.

cache_dir ufs /var/spool/squid 1000 16 256 – Uncomment this line. You may want to append a zero to the default value 100, which will make the cache size 1000MB instead of 100MB. The last two values stand for the default folder depth the cache will create for the top-level and subdirectories respectively. They do not need modification.

cache_access_log – Uncomment this line. This is where all requests to the proxy server will get logged.

acl intranet 192.168.0.0/24 – This entry needs to be added. It should correspond to whatever your local network range is. For example, if your Fedora server is 192.168.2.5 then the entry should be acl intranet 192.168.2.0/24

http_access allow intranet – This allows the acl named intranet to use the proxy server. Make sure to put allow directives above the final 'http_access deny all' entry, as that deny will override any allow directives below it (see the sketch below).
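
To make the ordering concrete, the relevant part of squid.conf should end up looking something like this sketch (with your own network range substituted):

acl intranet 192.168.0.0/24
http_access allow intranet
http_access deny all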
Turning on squid

Enable the proper run levels:

chkconfig squid on

Start the service:

service squid start

Verify that squid is running:

service squid status


Note, if you have problems starting squid, open a separate shell and run:

tail -f /var/log/messages


Then start the squid service in your original window:

service squid start


The tail command should show an error for squid that can help you solve the problem. One common error is that the swap (cache) directory doesn’t exist. To solve this problem, run squid with the -z option to automatically create the directories:

/usr/sbin/squid -z


Make sure that squid has write permission to the swap directory or this command won’t work.
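
On Red Hat-style systems Squid normally runs as the squid user, so if the -z run complains about permissions, something like the following usually fixes it (check which user your squid actually runs as before copying this blindly):

chown -R squid:squid /var/spool/squid
/usr/sbin/squid -z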
Configuring the clients

If you are using Firefox or Mozilla you will need to add the proxy server as follows:

Go to Preferences>Network>Settings

Add the name of your new proxy server and port 3128 to the http proxy field (under manual configuration).

Open a shell to your proxy server so you can observe the log file being written to. Use tail, as before:

tail -f /var/log/squid/access.log


Now surf the web through your proxy server. You should see entries flying by in real time as you surf different http addresses. Congratulations, you now have a caching proxy server setup!
Summary

Quick recap: You installed squid with a simple yum command. You backed up the default configuration file, then edited just 6 lines. You enabled the proper run levels. You started the squid service. You configured a client to use the proxy server, and then you verified it was working properly by tailing the log. To top it all off, you did it in 5 minutes. Now who says Linux isn't fun?

This entry was posted by Noah Gift on Wednesday, April 11th, 2007.

Wednesday, May 27, 2009

How to install microsoft core fonts

Microsoft Core Fonts

These fonts are fetched from the web and built locally, so you will build an RPM package based on a SPEC file.

su -c 'yum install wget rpmdevtools rpm-build cabextract ttmkfdir'
  • Then create the rpmbuild directory tree
rpmdev-setuptree
  • Switch to your SPECS directory that you created in the previous step.
cd ~/rpmbuild/SPECS/
  • Download the chkfontpath RPM and the spec file.
wget http://dl.atrpms.net/all/chkfontpath-1.10.1-2.fc9.x86_64.rpm
wget http://corefonts.sourceforge.net/msttcorefonts-2.0-1.spec
  • Run the following command to build the RPM
rpmbuild -bb msttcorefonts-2.0-1.spec
  • Move to where the msttcorefonts rpm was created
cd ~/rpmbuild/RPMS/noarch/
  • Download and install the msttcorefonts and dependencies
su -c 'rpm -ivh chkfontpath-1.10.1-2.fc9.x86_64.rpm'
su -c 'rpm -ivh msttcorefonts-2.0-1.noarch.rpm'
  • Then run the following code to restart the font server.
su -c '/sbin/service xfs reload'

Or

su -c '/etc/init.d/xfs reload'
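
To confirm the fonts are now visible to applications, you can ask fontconfig to list them - a quick sanity check (fc-list comes with the fontconfig package):

fc-list | grep -i arial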

Tuesday, May 19, 2009

How to Copy Files Across a Network/Internet Over ssh with rsync/scp/tar

How to Copy Files Across a Network/Internet in UNIX/LINUX (Redhat, Debian, FreeBSD, etc) - scp tar rsync

One of the many advantages of Linux/UNIX is how many ways you can do one thing. This tutorial is going to show you some of the many ways you can transfer files over a network connection.

In this article/tutorial we will cover rsync, scp, and tar. Please note that there are many other ways; these are just some of the more common ones. The methods covered assume that SSH is used in all sessions. These methods are all much more secure and reliable than using rcp or ftp. This tutorial is a great option for those looking for an FTP alternative for transferring files over a network.

scp

scp, or secure copy, is probably the easiest of all the methods; it is designed as a replacement for rcp, which was essentially cp with network functionality.

scp syntax

scp [-Cr] /some/file [ more ... ] host.name:/destination/file

-or-

scp [-Cr] [[user@]host1:]file1 [ more ... ] [[user@]host2:]file2

Before scp does any copying it first connects via ssh. Unless proper keys are in place, you will be asked for a password. You can test if this is working by using ssh -v hostname

The -r switch is used when you want to recursively go through directories. Please note you must specify the source file as a directory for this to work.

scp encrypts data over your network connection, but by using the -C switch you can compress the data before it goes over the network. This can significantly decrease the time it takes to copy large files.

Tip: By default scp uses the 3DES encryption algorithm. All encryption algorithms are slow, but some are faster than others; using -c blowfish can speed things up.
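
Putting the switches together, a typical invocation to copy a whole directory with compression might look like this sketch (host name and paths are placeholders):

scp -Cr /home/user/photos user@remote.example.com:/backup/photos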

What scp shouldn't be used for:
1. When you are copying more than a few files, as scp spawns a new process for each file and can be quite slow and resource intensive when copying a large number of files.
2. When using the -r switch, scp does not know about symbolic links and will blindly follow them, even if it has already made a copy of the file. This can lead to scp copying an infinite amount of data and can easily fill up your hard disk, so be careful.

rsync

rsync has very similar syntax to scp:

rsync -e ssh [-avz] /some/file [ more ... ] host.name:/destination/file

-or-

rsync -ave ssh source.server:/path/to/source /destination/dir

rsync's speciality lies in its ability to analyse files and only copy the changes made to files rather than all files. This can lead to enormous improvements when copying a directory tree a second time.

Switches:

-a Archive mode, most likely you should always keep this on. Preserves file permissions and does not follow symlinks.

-v Verbose, lists files being copied

-z Enable compression; this will compress each file as it gets sent over the pipe. This can greatly decrease time depending on what sort of files you are copying.

-e ssh Uses ssh as the transport; this should always be specified.
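
A complete invocation combining these switches might look like the following sketch (host and paths are placeholders; note the trailing slash on the source, which tells rsync to copy the directory's contents rather than the directory itself):

rsync -avz -e ssh /home/user/project/ user@backup.example.com:/srv/backups/project/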

Disadvantages of using rsync:
1. Picky syntax, use of trailing slashes can be confusing.
2. Have to remember that you are using ssh.
3. rsync is not installed on all computers.

tar

tar is usually used for archiving, but what we are going to do in this case is create an archive and pipe it over an ssh connection. tar handles large file trees quite well and preserves all file permissions, etc., including on those UNIX systems which use ACLs, and works quite well with symlinks.

The syntax is slightly different as we are piping it to ssh:

tar -cf - /some/file | ssh host.name tar -xf - -C /destination

-or with compression-

tar -czf - /some/file | ssh host.name tar -xzf - -C /destination

The -c switch for tar creates an archive, and -f - tells tar to send the new archive to stdout.

The second tar command uses the -C switch which changes directory on the target host. It takes the input from stdin. The -x switch extracts the archive.

The second form adds the -z option, which compresses the stream, decreasing the time it takes to transfer over the network.

Some people may ask why tar is used: it is great for large file trees, as it simply streams the data from one host to another without having to do intensive per-file operations.

If using the -v (verbose) switch, be sure only to include it on the second tar command, otherwise you will see double output.
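
For example, to watch the files as they arrive, add -v only to the extracting side:

tar -czf - /some/file | ssh host.name tar -xzvf - -C /destination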

Using tar and piping can also be a great way to transfer files locally to be sure that file permissions are kept correctly:

tar cf - /some/file | (cd /destination; tar xf -)

This may seem like a long command, but it is great for making sure all file permissions are kept intact. What it is doing is streaming the files and then untarring them in the target directory in a sub-shell. Please note that the -z option should not be used for local copies; no performance increase will be visible, and the extra CPU overhead will actually slow down the copy.

Why tar shouldn't be used:
1. The syntax can be hard to remember
2. It's not as quick to type as scp for a small number of files
3. rsync will beat it hands down for a tree of files that already exist in the destination.

There are several other ways of copying over a network, such as FTP, NAS, and NFS, but these all require specialised software installed on either the receiving or sending end, and hence are not as useful as the above commands.

Thursday, April 23, 2009

Record streaming radio with Streamripper and streamtuner

Record streaming radio with Streamripper and streamtuner


Streamripper and streamtuner are Linux applications that work in tandem, allowing you to listen to and record streaming radio.

Streamtuner is a basic application used primarily for playing internet radio from any number of sources including Xiph, basic.ch, punkcast.com, Google Stations, Live365, and SHOUTcast. Streamripper, on the other hand, sits stealthily in the terminal recording and parsing all of the music piped through your system into MP3s. Leaving Streamripper running all day will rack you up a healthy dose of your favorite tunes. What's particularly nice about the combination of Streamripper and streamtuner is that once you choose a station to record, you can exit streamtuner leaving only a tiny, non-obtrusive terminal window for Streamripper.

Note that Streamripper doesn't do a perfect job parsing your songs into individual MP3s. Unfortunately, there tends to be a few seconds of overlap at the beginning and/or end of the songs. If this bothers you, you can always edit your newly recorded songs with Audacity. Ubuntu users can grab Streamripper and streamtuner out of the repositories with this command:

sudo apt-get install streamripper streamtuner

Other Linux users should grab the appropriate package for your distro at the streamtuner and Streamripper homepages.
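
If you'd rather skip streamtuner, Streamripper can also be pointed at a stream URL directly from a terminal; a minimal sketch (the URL is a placeholder, and -d sets the directory the ripped MP3s are saved to):

streamripper http://example.com:8000 -d ~/Music/rips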

Don't worry Windows users, you're not left out in the cold. Stationripper lets you do the same thing.

Monday, April 20, 2009

Show all running processes in Linux and kill any process you want to ...

Show all running processes in Linux

by Vivek Gite

Q. How do I see all running processes in Linux?

A. Both Linux and UNIX support the same commands to display information about all running processes.

The ps command gives a snapshot of the current processes. If you want a repetitive update of this status, use the top command.

Task: Use ps command

Type the following ps command to display all running processes
# ps aux | less

Where,

  • -A: select all processes
  • a: select all processes on a terminal, including those of other users
  • x: select processes without controlling ttys

Task: see every process on the system

# ps -A
# ps -e

Task: See every process except those running as root

# ps -U root -u root -N

Task: See process run by user vivek

# ps -u vivek

Task: Use top command

The top program provides a dynamic real-time view of a running system. Type top at the command prompt:
# top

To quit press q; for help press h.

Task: display a tree of processes

pstree shows running processes as a tree. The tree is rooted at either pid or init if pid is omitted. If a user name is specified, all process trees rooted at processes owned by that user are shown.
$ pstree


Task: Lookup process

Use the pgrep command. pgrep looks through the currently running processes and lists the process IDs which match the selection criteria. For example, to display the firefox process id:
$ pgrep firefox

The following command will list the process called sshd which is owned by the root user.
$ pgrep -u root sshd
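
Once you have a process ID from ps or pgrep, you can terminate it with kill, or by name with pkill (the PID 1234 below is just an example; use -9 only as a last resort, since it gives the process no chance to clean up):
$ kill 1234
$ kill -9 1234
$ pkill firefox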

Say hello to htop

htop is an interactive process viewer, just like top, but it allows you to scroll the list vertically and horizontally to see all processes and their full command lines. Tasks related to processes (killing, renicing) can be done without entering their PIDs. To install htop, type:
# apt-get install htop
or
# yum install htop
Now type htop to run the command:
# htop


Tuesday, April 7, 2009

Exploring filters and pipes



When many newbies first encounter Linux, the 'cool stuff' that often gets their attention is the incredible array of command line tools, and something called a pipe that allows you to connect them together. Together, these provide an incredibly powerful component-based architecture designed to process streams of text-based data.

If you've never dabbled with filters and pipes before, or perhaps you've just been too scared, we want to help you out, so read on to learn how you can make powerful Linux commands just by stringing smaller bits together...

A filter is a program that reads a single input stream, transforms it in some way, and writes the result to a single output stream, as shown in Figure 1, below. By default, the output stream (called standard output or just stdout) is connected to the terminal window that the program is running in, and the input stream (standard input, or just stdin) is connected to the keyboard, though in practice filters are rarely used to process data typed in manually at the keyboard.

Figure 1: A filter has a single input stream and a single output stream.


If a filter is given a file name as a command line argument, it will open that file and read from it instead of reading from stdin, as shown in Figure 2 below. This is a much more common arrangement. There are a few sample commands in the table at the end of this article. By themselves, many individual filters don't do anything exciting. It's what you can do with them in combination that gets interesting.

Figure 2: Given a filename, a filter reads it, ignoring its standard input.


Combining filters: pipes

A pipe allows data to flow from an output stream in one process to an input stream in another process. Usually, it's used to connect a standard output stream to a standard input stream. Pipes are very easy to create at the shell command line using the | (vertical bar) symbol. For example, when the shell sees this command line:

$ sort foo.txt | tr '[A-Z]' '[a-z]'

it runs the programs sort and tr in two separate, concurrent processes, and creates a pipe that connects the standard output of sort (the 'upstream' process) to the standard input of tr (the 'downstream' process). The plumbing for this is shown in Figure 3, below.


Figure 3: Plumbing for the command "sort foo.txt | tr '[A-Z]' '[a-z]'".

This particular tr command, if you're interested, replaces all characters in the set [A-Z] by the corresponding character from the set [a-z]; that is, it translates upper-case to lower-case.

Capture standard output

Most of our examples assume that standard output is connected to the terminal window, which is the default behaviour. However, it's easy to send stdout to a file instead using the shell's > operator. For example:

$ sort foo.txt | tr '[A-Z]' '[a-z]' > sorted.txt

...runs the same pipeline as before but redirects the stdout from tr into the file sorted.txt. Note: it's the shell that performs this re-arrangement of the plumbing; the tr command simply writes to stdout and neither knows nor cares where this is directed.

Semi-filters

Linux has many command-line tools that aren't really filters, but which nonetheless write their results to stdout and so can be used at the head of a pipeline. Examples include ls, ps, df, du, but there are lots of others. So for example,

$ ps aex | wc

will count the number of processes running on the machine, and

$ ls -l | grep '^l'

will show just the symbolic links in the current directory. (Regular expression ^l matches those lines that begin with the letter 'l'.)

The other case of a semi-filter (that is, programs that read stdin but don't write to stdout) is less common. The only ones that come to mind are the browsing program less, the printing utility lpr, and the command-line mailer mail. Such programs can be used at the downstream end of a pipeline, for example:

grep syslog /var/log/messages | less

...searches for entries in /var/log/messages relating to syslog, and browses the output using less. Using less on the downstream end of a pipeline in this way is very common.

You can perform complex and powerful actions by trying out some of the filter commands in the table below.

Commonly used filters

Filter What it does
cat Copies input direct to output.
head Shows beginning of a file (default 10 lines).
tail Shows end of a file (default 10 lines).
wc Counts characters, words and lines.
sort Sorts input lines.
grep Shows lines that match a regular expression.
tr Translates or deletes specified character sets.
sed Stream editor.
uniq Discards all but one of successive identical lines.
awk Highly-programmable field-processing.

sed - the stream editor

The stream editor sed supports automated editing of text and is a more flexible filter than most. It streams its input (either stdin, or a specified input file) line by line, applies one or more editing operations, and writes the resulting lines to stdout.

Actually, sed has quite a repertoire of commands, but the most useful is the substitute command. As a simple example to get us started, let's use sed to perform a simple substitution on text entered directly from the keyboard (stdin):

$ sed 's/rich/poor/g'
He wished all men as rich as he
He wished all men as poor as he
And he was as rich as rich could be
And he was as poor as poor could be
^D
$

A little explanation may be in order. The sed command used here replaces all occurrences of the string 'rich' by the string 'poor'. The /g suffix on the command causes the replacement to be "global" - that is, if there is more than one occurrence of the original string within the line, they will all be replaced. (Without the /g suffix, only the first one would be replaced.) Following the sed command we see two pairs of lines. The first of each pair is the text we typed in on the keyboard, and the second of each pair is the edited line that sed has written to its standard output.

Here's a more useful example, in which we use a regular expression to strip out all fields, except the first, from the file /etc/passwd:

$ sed 's/:.*//' /etc/passwd
root
daemon
bin
sys
... more lines deleted ...

Here, the text we're replacing is the regular expression :.*, which matches from the first colon to the end of the line. The replacement string (between the second and third slashes) is empty, so that whatever the regular expression matches is deleted from the line. In this case, we end up with a list of the account names in the passwd file.

Let's be clear about one thing, though - sed is not actually making any changes to /etc/passwd. It is simply reading the file as input and writing the modified lines to stdout.

awk

Named after its inventors (Aho, Weinberger and Kernighan) awk is really in a league of its own, being a fully-fledged programming language with variables, loops, branches, functions, and more. An awk program consists of one or more pattern-action pairs like this:

pattern { action }
pattern { action }

The pattern is some sort of test that is applied to each line; if the pattern matches, the corresponding action is taken. If the pattern is omitted, the action is taken for every line. If the action is omitted, the default action is to print the entire line. Awk excels at processing textual data that's organised into fields (ie columns); it reads its input line by line and automatically splits it into fields, making them available in the special variables $1, $2, $3 etc.

To demonstrate awk we'll use a small sample of geographic data for the accuracy of which I put my trust in Collins Complete World Atlas. The data shows countries, areas, populations (in thousands), languages spoken, and currency. For brevity we're restricting ourselves to four lines of data that look like this:

Country      Area    Population  Languages              Currency
Albania 28748 3130 Albanian,Greek Lek
Greece 131957 11120 Greek Euro
Luxembourg 2586 465 German,French Euro
Switzerland 41293 7252 German,French,Italian Franc

The data is in the file geodata.

Many awk programs are simple enough to enter as a command line argument to awk. Here's an example:

$ awk '{ print $1, $5 }' geodata
Country Currency
Albania Lek
Greece Euro
Luxembourg Euro
Switzerland Franc

This awk program has just a single pattern-action. The pattern is missing, so the action is taken on every line. The action is to print the first and fifth fields; ie, the country name and the currency.

Next, let's find those countries that use Euros. We can use a pattern to make a test on the value in the fifth field, like this:

$ awk ' $5=="Euro" { print $1 }' geodata
Greece
Luxembourg

What about calculating the total population? This needs two pattern-actions, one that fires on every line to accumulate the running sum of the population (the values in the third field), and one that fires right at the end to print out the result. Although it would be possible to enter this program directly on the command line, as we have in the examples so far, it's getting a little longer so it's more convenient to put it into a separate file, which I called totalpop.awk. It looks like this:

{ sum += $3 }
END { print sum }

The first action has no pattern, so it takes place on every line. The variable sum is just a name I made up. Variables do not need to be pre-declared in awk; they spring into existence (and are initialised to zero) at the mere mention of their name. The second action uses the special pattern 'END' that fires just once, after all the input has been processed, to print out the result.

Now, I can run awk and have it take its program from the file totalpop.awk like this:

$ awk -f totalpop.awk geodata
21967

Notice the use of the -f flag in the first line above to specify the file containing the awk program. How would we set about finding those countries in which a specified language is spoken? This is a little harder because each country has a comma-separated list of one or more languages and we will have to split this list apart. Fortunately, awk provides a built-in function for doing just that. Here's the complete program, which I called language.awk:

{ NL = split($4, langs, ",");
for (i=1; i<=NL; i++)
if ( langs[i] == "Greek")
print $1
}

There's only one action, and it has no associated pattern (so it fires on every line); however the action is starting to look a bit more like a conventional program, with a function call, a couple of variables, a loop, and an if test. Here's a sample run:

$ awk -f language.awk geodata
Albania
Greece

Back to a one-liner to show that awk can do arithmetic. This example shows all entries that have a population density greater than 150 per square kilometre:

$ awk '$3*1000/$2 > 150' geodata
Luxembourg 2586 465 German,French Euro
Switzerland 41293 7252 German,French,Italian Franc

Notice that this awk program has a pattern but no action. As you can see, the default action is to print the entire line. Awk has lots more features than shown here, and many people write awk programs much longer than four lines! My mission, though, is not to show how long an awk program can get, but rather to show how short it can get, and still do something useful.

A worked example

Let's bring several of the tools we've discussed together in one final example. Our mission is to count the word frequencies in a sample of text, and our text this morning is taken from Mark, Chapter 6 (The King James version, downloadable from the Electronic Text Center of the University of Virginia Library - see http://etext.virginia.edu/kjv.browse.html).

Now, it's possible, in theory, to type the entire solution interactively at the command prompt, but we'll build it up as a shell script called wordfreq.sh, adding processing steps one by one and showing what the output looks like at each stage.

The file, as I retrieved it from the University of Virginia website, consists of lines of text, one for each bible verse, with the verse number and a colon at the start. So for example, the line for verse 42 looks like:

42: And they did all eat, and were filled.

I placed this text into a file called mark.txt.

We're going to use awk's associative arrays to perform the word count, but the input could do with a little tidying first. For starters, we need to get rid of those verse numbers. We can easily do this with sed's substitute command, which is the first line to go into our wordfreq.sh script:

#!/bin/bash
sed 's/^[0-9]*:\ //' $1

The first line is part of the shell scripting machinery - it tells Linux to use the bash shell as the interpreter for the script. The second line is a classic use of sed. This use of the substitute command should be clear from our earlier examples; the 'old pattern' uses regular expressions to say "beginning of line, zero or more digits, colon, literal space" and the 'new pattern' (between the second and third forward slashes) is empty.

Thus, those leading verse numbers are stripped away. The $1 at the end of the line is more shell scripting machinery - it will be substituted by the command-line argument we supply to the script. This is the file we want sed to process. With our two line script created, we can turn on execute permission:

chmod u+x wordfreq.sh

Now we can run the script, specifying the input file name as argument:

./wordfreq.sh mark.txt

Our sample verse 42 now looks like this:

And they did all eat, and were filled.

Next, I decided to make the word counting case-insensitive. The easiest way to do this is to convert all upper-case letters in the input to lower-case. This is easily done with tr. So now our script is three lines long:

#!/bin/bash
sed 's/^[0-9]*:\ //' $1 | \
tr '[A-Z]' '[a-z]'

We have appended a pipe symbol and a backslash on line 2, allowing us to continue the command on line 3. (There is no need to split the pipeline across multiple lines; I just figured it would be easier to read that way.) The tr command on the third line is another classic: it says "replace each character in the set A-Z by the corresponding character in the set a-z". If we run our new, three-line script, our sample verse 42 looks like this:

and they did all eat, and were filled.

The next step is to get rid of those pesky punctuation characters. Again, this is easily done with tr, using the -d option. So now our script looks like this:

#!/bin/bash
sed 's/^[0-9]*:\ //' $1 | \
tr '[A-Z]' '[a-z]' | \
tr -d '[.,;:]'

The last line simply removes all characters in the set [.,;:]. After applying this version of the script, our sample verse looks like this:

and they did all eat and were filled

Now we have something fit for awk to process. This is where we do the actual word frequency counting. The basic idea is to loop over each individual word in the document, using the word itself as an index into an associative array, and simply incrementing the corresponding element of the array for each word.

Once the entire document has been processed, we can print out the index and value of each element of the array - that is, the word and the number of times that word occurs. I decided to put the awk program into a separate file called wordfreq.awk, so now we have two scripts to deal with, the shell script wordfreq.sh and the awk program wordfreq.awk

The shell script now looks like this:

sed 's/^[0-9]*:\ //' $1 | \
tr '[A-Z]' '[a-z]' | \
tr -d '[.,;:]' | \
awk -f wordfreq.awk

...and the awk program looks like this:

{ for (i=1; i<=NF; i++)
w[$i]++
}
END { for (word in w)
print word, w[word]
}

This awk program has two actions. Easier to spot is the first, which is taken on all input lines (there is no selection pattern), loops over all fields in the input line (ie all words in the input) and increments the corresponding element of the associative array w.

The variable names i and w were my own choice; the variable NF is a built-in awk variable and contains the number of fields in the current line. The statement w[$i]++, which increments the appropriate element of the associative array w, is really the core of this solution. Everything else is there just to allow this statement to work.

The second action in this awk program only occurs once, at the end, after all input has been processed. It simply loops over the elements of the w array, printing out the index into the array (that is, the word) and the content of the array (that is, a count of how many times that word occurred).

The output from our script is now fundamentally transformed and looks like this:

themselves 4
would 3
looked 1
taken 1
of 27
sit 1
privately 1
abroad) 1
name 1
and 134

...with a further 400-odd lines deleted.

Finally, we can make the output more useful by doing a reverse numeric sort on the second field, (so that the most common words appear at the top of the list) and using head to pick off the first ten lines. Our final script looks like this:

#!/bin/bash
sed 's/^[0-9]*:\ //' $1 | \
tr '[A-Z]' '[a-z]' | \
tr -d '[.,;:]' | \
awk -f wordfreq.awk | \
sort -nr -k2 | \
head

Notice the flags on the sort command. -n forces a numeric sort, -r reverses the results of the sort, and -k2 sorts on the second field. Now we have the word frequency report we were looking for:

and 134         of 27
the 64 him 26
he 38 unto 23
they 31 to 22
them 31 his 21

There's more than one way to do it

As with most things in life, there are other ways to solve this particular problem. Rather than using awk's associative arrays, we can split the file up as one word per line (using tr) then sort it alphabetically using sort (so that repeated instances of the same word will appear on successive lines) then use uniq to report how many times each line appears, then do a reverse numeric sort, then pick off the first ten lines. The script looks like this:

#!/bin/bash
sed 's/^[0-9]*:\ //' $1 | \
tr '[A-Z]' '[a-z]' | \
tr -d '[.,;:]' | \
tr ' ' '\n' | \
sort | \
uniq -c | \
sort -nr | \
head
and the output looks like this:
134 and 27 of
64 the 26 him
38 he 23 unto
31 they 22 to
31 them 21 his

...which is the same as we had before, except the fields are in the opposite order. To understand how this solution works, try building it up step by step using a sample of input of your choice, and observing the output at each stage.

Simple regular expressions

Command Line What it does
head /etc/passwd Show the first ten lines of /etc/passwd.
grep '/bin/bash$' /etc/passwd Show the lines in /etc/passwd relating to accounts that use bash as their login shell.
sort /etc/services Sort the services from /etc/services in alphabetical order.
wc /etc/* 2> /dev/null Count the lines, words and characters of all the files in /etc; throw away any error messages.
First published in Linux Format

Tuesday, March 31, 2009

21 of the Best Free Linux DVD Tools

21 of the Best Free Linux DVD Tools


The venerable DVD is a removable media format that was conceived over 15 years ago. However, it still remains today the ideal format for many uses, although blu-ray is starting to make significant inroads in the field of multimedia playback.

DVDs remain the most popular removable format for Linux distributions, primarily because CDs do not have the capacity to carry a large number of applications, and blu-ray discs and players remain expensive.

This article focuses on selecting the best free software which relies on the DVD format. We select the best-of-breed Linux software to play DVD multimedia, to create DVD videos, software to write to them, as well as applications that copy video content from DVD media to a hard disk. Furthermore, we also examine software which helps users to catalogue their DVD collection, as well as software for working with subtitles (a textual version of the dialogue).

To provide an insight into the quality of software that is available, we have compiled a list of 21 high quality Linux DVD tools. Hopefully, there will be something of interest for anyone who wants to revel in multimedia.

Now, let's explore the 21 DVD tools at hand. For each title we have compiled its own portal page, providing a screenshot of the software in action, a full description with an in-depth analysis of its features, together with links to relevant resources and reviews.

Players
xine Powerful multimedia program, specializing in video playback
MPlayer Very popular movie player
Totem Official movie player of the GNOME desktop environment
VLC Multimedia player and streamer


Burning
Brasero Easy to use CD/DVD burning application
GnomeBaker CD/DVD creation in the GNOME desktop
K3b Sophisticated KDE CD/DVD burning application


Ripping / Transcoding
AcidRip Ripping and encoding DVD tool using MPlayer and Mencoder
dvd::rip Perl front end for transcode
HandBrake Multithreaded DVD to MPEG-4 converter
Thoggen DVD backup utility ('DVD ripper'), based on GStreamer and Gtk+


Authoring
DeVeDe Create video DVDs
K9Copy DVD backup and authoring
KMediaFactory Template based DVD authoring tool for KDE
Q DVD-Author GUI frontend for dvdauthor and related tools


Cataloging
GCstar Manages a movie collection
Griffith Film collection manager
Tellico Collection manager for KDE


Subtitles
Gaupol Subtitle editor for text-based subtitle files
GNOME Subtitles Subtitles editor for the GNOME Desktop environment
Subtitle Editor Graphical subtitle editor with sound waves representation

Sunday, March 29, 2009

Configuring and running Mono ASP.NET 3.5 (AJAX.NET) on Linux computers


Before we start, we should install Linux. To do this, you can download any of the LiveCDs with live installation. Officially, Mono is supported on only one free Linux distribution - openSUSE. However, you can make it work on any Red Hat derivative and on OpenSolaris. It works, but is unsupported, on Debian, Ubuntu and Maemo. We'll stick with openSUSE for now.


Let’s install it - OS

I'm assuming that you're installing Linux as the only OS on the machine, so insert the LiveCD (it can be either GNOME or KDE) and wait for Linux to run. It will run the live image directly on your machine without installation.

When it's up, you'll see a LiveInstall icon on your desktop. Click it and skip the first two screens (they only cover language and locale settings). The next screen (disk partitions) is the one that matters to us.

On this screen, first delete all automatic partitions. Only the main disk will remain (/dev/sda or /dev/hda). Next choose the non-LVM option and then start creating partitions.

Create the first partition with mount point /boot and a size of 100MB. The file system for this partition should be ext3.

Create the second partition with file system SWAP (it will not have a mount point) and set its size to twice the amount of RAM.

Create the last partition with mount point / and all the remaining space on the disk.

All other steps are optional; you can just click the Next button.

After about 10 minutes you'll have an up-and-running openSUSE system. (If you forgot to remove the CD, choose Hard Disk as the boot option.)

Web Server installation

Now we have to install a web server. You can choose Apache, FastCGI, or the built-in server within Mono - XSP. We'll choose Apache.

Go to "Computer" (it's in the same place as the Start button :)) and choose YaST. You'll be asked for the admin password you entered while installing the system.

Now in the Filter field, type "Install". Choose "Software Management" from the available programs at the right. When the Package Selection dialog opens, type "apache" and you'll find apache2. Select it and click Install. Apache will move to the right column. Optionally, you can also install the prefork and utils packages.

Now hit “Apply” to install it. Within two minutes, you’ll be asked to log off and log on. Do it.

Right now Apache is not running; you should start it and set it to start automatically. To do this, open a terminal window (you can do it either from the "Computer" menu or by right-clicking the desktop).

You need elevation to administer startup programs, so type "su -" and enter your password. The terminal color turns red. Type "chkconfig apache2 on". Now check whether it worked by typing "chkconfig apache2 --list". You should see "on" next to run levels 3 and 5.

To run Apache manually, just type "/etc/init.d/apache2 start"; to stop it, "/etc/init.d/apache2 stop"; to restart it, "/etc/init.d/apache2 restart"; and to check its status, "/etc/init.d/apache2 status".

We're done; Apache is up and running. Now we should install Mono.

Mono installation

Start with the same YaST, but this time type "mono" - you'll get a lot of packages. To simplify installation, choose (or type) mono-complete. This will install all available Mono modules.

After Mono is installed, you should also install apache2-mod_mono to make it possible to run ASP.NET pages in Apache. Do this.

Log off, log on, and move on to configuration.

Mono configuration

Now it's time to configure which ASP.NET pages you want to run. We want ASP.NET 2.0, so we should run the Mono Apache module for this version. To do this, go to the terminal, elevate yourself (su -) and type the following: "vi /etc/apache2/httpd.conf". This will open the VI editor with the Apache configuration file in it.

Now it's time to learn VI a little. To start editing, type "A" - it will show "INSERT" in the lower left corner. To return to command mode, hit the Escape key. To save (from command mode), type ":w"; to save and exit, ":wq"; to exit without saving, ":q!". To find, type "/" followed by the pattern you are looking for.

Now go to the very end of the file and write the following under the line Include /etc/apache2/vhosts.d/*.conf:
(To keep it short: "[D]" is your virtual directory (a plain slash means the root), and "[P]" is the physical path to your site without a trailing slash.)

MonoServerPath default /usr/bin/mod-mono-server2
Alias [D] "[P]"
AddMonoApplications default "[D]:[P]"
<Location [D]>
SetHandler mono
</Location>

So, if your site is MySite and it is in /srv/www/htdocs/MySite, this section will look as follows:

MonoServerPath default /usr/bin/mod-mono-server2
Alias /MySite "/srv/www/htdocs/MySite"
AddMonoApplications default "/MySite:/srv/www/htdocs/MySite"
<Location /MySite>
SetHandler mono
</Location>

If you want to make it the root site, it will look like this:

MonoServerPath default /usr/bin/mod-mono-server2
AddMonoApplications default "/:/srv/www/htdocs/MySite"
<Location />
SetHandler mono
</Location>

Now we'll add the Mono administrative handler, so we can restart Mono without touching Apache itself. To do this, after the last <Location> block you should add the following (the /mono path used here is the conventional control URL):


<Location /mono>
SetHandler mono-ctrl
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>

I think it’s very clear what it did :)

If you have more than one site and want to configure Mono differently for each of them, you should add a VirtualHost section. To do this, wrap your configuration in something like:


<VirtualHost *:80>
ServerName [Name you want]
... your Mono configuration for this site ...
</VirtualHost>

We're done. Restart Apache and enter the URL you set (for example http://localhost/MySite/).

Working? Good. You're finished.

Not working (familiar yellow error 500 screen)? Keep reading…

Debugging Mono website

Do you remember that you have no development environment on this machine? You can install one, or download the Mono LiveCD with openSUSE. But before doing that, please note that the GTK# devenv is not very user friendly - it's even worse than Eclipse. So let's first try to understand whether we can fix small compatibility problems without touching the code.

The most convenient way to debug a web site on Mono is by using the XSP and XSP2 mini web servers. Just enter the directory of the site and run the server. By default you'll be able to access the site at "http://localhost:8080" (the URL will also be printed for you). Browse to it and check whether you have any errors in the console. No? Keep going.

The most common problem is "error 500" with a nonsense stack trace. If it contains a ScriptManager "Type not found" error, the problem is in the Web.config file. Try to regenerate it to be compatible with Mono (for example, Mono has a different version of the System.Web.Extensions assembly: ASP.NET 3.5 has version 3.5, while Mono only has 1.0.61025.0, the old AJAX.NET). To recreate your web.config, all you have to do is execute "mconfig af AJAX Web.config". It will create a default web.config file that supports System.Web.Extensions (AJAX features).

That didn't help? Keep going. Let's look at the stack again - if it contains errors in "EnablePageMethods", "ShouldGenerateScript" or "EncryptString", the problem is serialization. Mono has very limited support for JSON, XML and SOAP serialization. Look into your code and check whether you have classes marked with [Serializable], or whether you are transferring your own classes by using PageMethods. If so, replace them with regular strings (my grandma's serialization):

Person p = new Person();
string sstr = string.Format("{0}|{1}|{2}|{3}", p.FirstName, p.LastName, p.Age, p.Wage);
return sstr;

var sstr = persons[i].split("|");
var p = {};
p.FirstName = sstr[0];
p.LastName = sstr[1];
p.Age = sstr[2];
p.Wage = sstr[3];

That didn't help? Try renaming the "Bin" directory to "bin": "mv Bin bin". Actually this was fixed in the latest versions of Mono, but who knows?...

No? Check whether you have partial classes, which are not supported by Mono. If so, recompile them like this:

mcs /t:library /out:bin/test.dll -r:System.Web -r:System.Data -r:System.Web.Services -r:System.Web.UI.Controls test.aspx.cs

If you have generics in your code, you should use gmcs rather than mcs.

Didn't help? It looks like you'll have to either install Mono on your Windows machine and debug your code with it, or install the GTK# development environment and do it on Linux.

But wait: before taking such a big step, check the binary compatibility of your code. To do this you need "Moma" (the Mono Migration Analyzer), a simple tool that tells you whether everything in your assemblies is OK for Mono.

Good luck, and see you at my forthcoming TechEd session, where I'm presenting openSUSE running a UDP multicast server with ASP.NET 3.5 extension methods (it uses recompiled ISAPI filters for Apache, rather than the regular, limited AJAX support in Mono).

Have a nice day and be good people.


Saturday, March 28, 2009

How to find files the easy way in Linux

Searching on steroids: find

At the other end of the scale is the top-of-the-range search tool, find. In addition to filename-based searching, find is able to locate files based on ownership, access permissions, time of last access, size, and much else besides. Of course, the price you pay for all this flexibility is a rather perplexing command syntax. We'll dive into the details later, but here's an example to give you the idea:

$ find /etc -name '*.conf' -user cupsys -print
find: /etc/ssl/private: Permission denied
find: /etc/cups/ssl: Permission denied
/etc/cups/cupsd.conf
/etc/cups/printers.conf

In this example, find is searching in (and below) the directory /etc for files whose name ends in .conf and that are owned by the cupsys account.

Generally, the syntax of the find command is of the form:

$ find <where to look> <what to look for> <what to do with it>

The "where to look" part is simply a space-separated list of the directories we want find to search. For each one, find will recursively descend into every directory beneath those specified. Our table below, titled Search Criteria For Find, lists the most useful search criteria (the "what to look for" part of the command).
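
You can also give find more than one starting directory; for instance (a hypothetical illustration):

$ find /etc /usr/local/etc -name '*.conf'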

Search criteria for find

-name string      File name matches string (wildcards are allowed). Example: -name '*.jpg'
-iname string     Same as -name but not case sensitive. Example: -iname '*tax*'
-user username    File is owned by username. Example: -user chris
-group groupname  File has group groupname. Example: -group admin
-type x           File is of type 'x', one of: f (regular file), d (directory),
                  l (symbolic link), c (character device), b (block device),
                  p (named pipe/FIFO). Example: -type d
-size +N          File is bigger than N 512-byte blocks (use suffix c for bytes,
                  k for kilobytes, M for megabytes). Example: -size +100M
-size -N          File is smaller than N blocks (same suffixes). Example: -size -50c
-mtime -N         File was last modified less than N days ago. Example: -mtime -1
-mtime +N         File was last modified more than N days ago. Example: -mtime +14
-mmin -N          File was last modified less than N minutes ago. Example: -mmin -10
-perm mode        The file's permissions exactly match mode. The mode can be given in
                  octal, or using the same symbolic notation that chmod supports.
                  Example: -perm 644
-perm -mode       All of the permission bits specified by mode are set. Example: -perm -ugo=x
-perm /mode       Any of the permission bits specified by mode is set. Example: -perm /011
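
To give a flavour of how these combine, here's a hypothetical search for regular files bigger than 10 megabytes under /var/log:

$ find /var/log -type f -size +10M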

And the smaller table Actions For Find, below, lists the most useful actions (the "what to do with it" part of the command). Neither of these is a complete list, so check the manual page for the full story.

Actions for find

-print          Print the full pathname of the file to standard output
-ls             Give a full listing of the file, equivalent to running ls -dils
-delete         Delete the file
-exec command   Execute the specified command. All following arguments to find are taken
                to be arguments to the command until a ';' is encountered. The string {}
                is replaced by the current file name.
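
For instance, the first of these hypothetical commands gives an ls-style listing of week-old temporary files, and the second (destructive, so test with -print first) deletes them:

$ find /tmp -name '*.tmp' -mtime +7 -ls
$ find /tmp -name '*.tmp' -mtime +7 -delete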

If no other action is specified, the -print action is assumed, with the result that the pathname of the selected file is printed (or to be more exact, written to standard output). This is a very common use of find. I should perhaps point out that many of the search criteria supported by find are really intended to help in rounding up files to perform some administrative operation on them (make a backup of them, perhaps) rather than helping you find odd files you happen to have mislaid.

Why is this not a command?

The which command can - occasionally - give a misleading answer, if the command in question also happens to be a built-in command of the bash shell. For example:

$ which kill
/bin/kill

tells us that the kill command lives in /bin. However, kill is also a built-in bash command, so if I enter a command like

$ kill -HUP 1246

it will actually run the shell's built-in kill and not the external command.

To find out whether a command is recognised as a shell built-in, an alias, or an external command, you can use the type command, like this:

$ type kill
kill is a shell builtin
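
type also identifies aliases. Assuming you have defined, say, alias ll='ls -l', you'd see something like:

$ type ll
ll is aliased to `ls -l'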

Learning by Example

It takes a while to get your head around all this syntax, so maybe a few examples would help ...

Example 1 This is a simple name-based search, starting in my home directory and looking for all PowerPoint (.ppt) files. Notice we've put the filename wildcard expression in quotes to stop the shell trying to expand it. We want to pass the argument '*.ppt' directly and let find worry about the wildcard matching.

$ find ~ -name '*.ppt'

Example 2 You can supply multiple "what to look for" tests to find and by default they will be logically AND-ed, that is, they must all be true in order for the file to match. Here, we look for directories under /var that are owned by daemon:

$ find /var -type d -user daemon

Example 3 This shows how you can OR tests together rather than AND-ing them. Here, we're looking in /etc for files that are either owned by the account cupsys or are completely empty:

$ find /etc -user cupsys -or -size 0

Example 4 This uses the '!' operator to reverse the sense of a test. Here, we're searching /usr/bin for files that aren't owned by root:

$ find /usr/bin ! -user root

Example 5 The tests that make numeric comparisons are especially confusing. Just remember that '+' in front of a number means 'more than', '-' means 'less than', and if there is no '+' or '-', find looks for an exact match. These three examples search for files that have been modified less than 10 minutes ago, more than a year ago, and exactly 4 days ago. (The third one is probably not very useful.)

$ find ~ -mmin -10
$ find ~ -mtime +365
$ find ~ -mtime 4

Example 6 Perhaps the most confusing tests of all are those made on a file's access permissions. This example isn't too bad: it looks for an exact match on the permissions 644 (which would be represented symbolically by ls -l as rw-r--r--):

$ find ~ -perm 644

Example 7 Here we look for files that are writeable by everybody (that is, by the owner and the group and the rest-of-world). The two examples are equivalent; the first uses the traditional octal notation, the second uses the same symbolic notation for representing permissions that chmod uses:

$ find ~ -perm -222
$ find ~ -perm -ugo=w

Example 8 Here we look for files that are writeable by anybody (that is, by at least one of the owner, the group, or the rest-of-world):

$ find ~ -perm /222
$ find ~ -perm /ugo=w

Example 9 So far we've just used the default -print action of find to display the names of the matching files. Here's an example that uses the -exec option to move all matching files into a backup directory. There are a couple of points to note here. First, the notation {} gets replaced by the full pathname of the matching file, and the ';' is used to mark the end of the command that follows -exec. Remember: ';' is also a shell metacharacter, so we need to put the backslash in front to prevent the shell interpreting it.

$ find ~ -mtime +365 -exec mv {} /tmp/mybackup \;
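
As an aside, GNU find lets you end -exec with '+' instead of ';', which batches the matching files into as few invocations of the command as possible (here relying on GNU mv's -t option to name the destination directory first). A hypothetical equivalent of the command above:

$ find ~ -mtime +365 -exec mv -t /tmp/mybackup {} +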

Never mind the file name, what's in the file?

As we've seen, tools such as find can track down files based on file name, size, ownership, timestamps, and much else, but find cannot select files based on their content. It turns out that we can do some quite nifty content-based searching using grep in conjunction with the shell's wildcards. This example is taken from my personal file system:

$ grep -l Hudson */*
Desktop/suse_book_press_release.txt
google-earth/README.linux
Mail/inbox.ev-summary
Mail/sent-mail.ev-summary
snmp_training/enterprise_mib_list

Here, we're asking grep to report the names of the files containing a match for the string Hudson. The wildcard notation */* is expanded by the shell to a list of all files that are one level below the current directory. If we wanted to be a bit more selective on the file name, we could do something like:

$ grep -l Hudson */*.txt
Desktop/search_tools.txt
Desktop/suse_book_press_release.txt

which would only search in files with names ending in .txt. In principle you could extend the search to more directory levels, but in practice you may find that the number of file names matched by the shell exceeds the number of arguments that can appear in the argument list, as happened when I tried it on my system:

$ grep -l Hudson  */*  */*/*
bash: /bin/grep: Argument list too long

A more powerful approach to content-based searching is to use grep in conjunction with find. This example shows a search for files under my home directory ('~') whose names end in .txt and that contain the string Hudson.

$ find ~ -name '*.txt' -exec grep -q Hudson {} \; -print
/home/chris/Desktop/search_tools.txt
/home/chris/Desktop/suse_book_press_release.txt

This approach does not suffer from the argument list overflow problem that our previous example suffered from. Remember, too, that find is capable of searching on many more criteria than just file name, and grep is capable of searching for regular expressions, not just fixed text, so there is a lot more power here than this simple example suggests.

If you're unclear about the syntax of this example, read The truth about find, below. In this example, the predicate -exec grep -q Hudson {} \; returns true if grep finds a match for the string Hudson in the specified file, and false if not. If the predicate is false, find does not continue to evaluate any following expressions, that is, it does not execute the -print action.
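
And since grep understands regular expressions, the same find pipeline can match patterns rather than fixed strings; a hypothetical variation:

$ find ~ -name '*.txt' -exec grep -qE 'Huds[eo]n' {} \; -print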

The truth about find

The individual components of a find command are known as expressions (or, more technically, as predicates). For example, -user cupsys is a predicate. The find command operates by examining each and every file under the directory you ask it to search and evaluating each of the predicates in turn against that file.

Each predicate returns either true or false, and the results of the predicates are logically AND-ed together. If one of the predicates returns a false result, find does not evaluate the remaining predicates. So for example in a command such as:

$ find . -user chris -name '*.txt' -print

if the predicate -user chris is false (that is, if the file is not owned by chris), find will not evaluate the remaining predicates. Only if -user chris and -name '*.txt' both return true will find evaluate the -print predicate (which writes the file name to standard output and also returns the result 'true').