Introduction
The find command finds files matching search criteria. This paper gives some useful examples, especially with regular expressions, which are underused but very powerful.
The grep command, on the other hand, searches one or more files for lines of text that match a given regular expression. grep is the acronym for "Globally search for a Regular Expression and Print matching lines".
The find command and its options
find command syntax
find directory criteria
Some common search criteria:
-name | search by file name
-perm | search by file permissions
-user | search by file owner
-group | search by group
-type | search by type (d=directory, l=symbolic link, f=regular file)
-size | search by file size
-mtime | search by last modification time
-ctime | search by last status change time (inode change: content, permissions, owner…), not creation time
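Search criteria can be combined using logical operators:
\( criteria1 criteria2 \) or \( criteria1 -a criteria2 \): logical AND
\( ! criteria \): logical NOT
\( criteria1 -o criteria2 \): logical OR
The parentheses are escaped so that the shell passes them to find. Since -a takes precedence over -o, parentheses are needed when the two operators are mixed.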
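A minimal sketch of why grouping matters when -o is combined with an action, assuming a POSIX sh and find; the scratch directory and file names are hypothetical. The implicit -a binds more tightly than -o, so an action written after an unparenthesized -o only applies to the last alternative.

```shell
export LC_ALL=C
dir=$(mktemp -d)              # scratch directory with hypothetical files
touch "$dir/app.js" "$dir/style.css" "$dir/notes.txt"
cd "$dir" || exit 1

# Parsed as: -name "*.js" -o ( -name "*.css" -a -print )
# so only the .css file is printed.
without=$(find . -name "*.js" -o -name "*.css" -print | sort)

# The parentheses make -print apply to both alternatives.
with=$(find . \( -name "*.js" -o -name "*.css" \) -print | sort)

echo "without parentheses: $without"
echo "with parentheses:"; echo "$with"
cd / && rm -rf "$dir"
```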
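On older Unix systems, the find command had to be used with at least the -print option: without it, find displayed nothing on the standard output, even when the search was successful. On POSIX-compliant implementations (GNU findutils, BSD find), -print is the default action when the expression contains no action such as -exec, but writing it explicitly is harmless and portable.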
The find command is recursive: the starting directory and all its subdirectories are scanned.
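A quick check of both points, assuming a POSIX-compliant find and hypothetical file names: the file two levels down is found, and omitting -print gives the same result as writing it out.

```shell
export LC_ALL=C
dir=$(mktemp -d)
mkdir -p "$dir/sub/deeper"
touch "$dir/top.log" "$dir/sub/deeper/nested.log"
cd "$dir" || exit 1

explicit=$(find . -name "*.log" -print | sort)   # -print written out
implicit=$(find . -name "*.log" | sort)          # -print added by find

echo "$implicit"
cd / && rm -rf "$dir"
```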
Searching by file name
Option -name
To find all files with the .c extension in the directory /usr (the pattern is quoted so that the shell does not expand it):
find /usr -name "*.c" -print
/usr/share/bison/yacc.c
/usr/share/bison/glr.c
…
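A small sketch, with hypothetical files, of what can go wrong when the -name pattern is left unquoted: if the current directory contains a matching file, the shell expands the glob before find even runs.

```shell
export LC_ALL=C
dir=$(mktemp -d)
mkdir "$dir/src"
touch "$dir/main.c" "$dir/src/util.c" "$dir/src/parser.c"
cd "$dir" || exit 1

# Unquoted: the shell rewrites *.c as main.c, so find receives
# "-name main.c" and finds nothing under src/.
unquoted=$(find src -name *.c -print | sort)

# Quoted: find receives the pattern itself and matches both files.
quoted=$(find src -name "*.c" -print | sort)

echo "unquoted: ${unquoted:-nothing}"
echo "quoted:"; echo "$quoted"
cd / && rm -rf "$dir"
```

With two or more matching files in the current directory, GNU find usually fails with an error instead, which at least makes the problem visible.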
To find, in the current directory, all files with the .jpg or .gif extension whose name does not contain the keyword gimp:
find . \( ! -name "*gimp*" -a \( -name "*.jpg" -o -name "*.gif" \) \) -print | sort
…
./images/conception-html-dynamique-suppression-document.write-01.jpg
./images/conception-html-equations-math-mathjax-asciimath-01.jpg
./images/google-analytics-optimisation-mesure-audience-01.jpg
./images/google-analytics-optimisation-mesure-audience-02.jpg
…
By default, results are not sorted (they appear in directory traversal order), which is why sort is applied to the output of find in the example above.
To search the current directory, use . as the directory: find . criteria
Searching by dates
Option -mtime +/-
To find *.js or *.css files modified less than 2 days ago:
find . -mtime -2 -a \( -name "*.js" -o -name "*.css" \) -print
./css/style-df.css
./css/style.css
./js/resources/nohttp.js
Option -ctime +/-
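To find *.json files in the directory $LOG whose status last changed more than 30 days ago (-ctime is the time of the last status change, not the creation time; for files written once and never modified afterwards, such as these logs, the two are close):
find "$LOG" -ctime +30 -name "*.json" | sort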
./postgresql-9.6-setup-installation-rapide_20200929030800.json
./postgresql-9.6-setup-installation-rapide_20200929023720.json
…
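The option -mtime -2 means less than 48 hours ago, and the option -ctime +30 means more than 30 × 24 hours ago. The fractional part of the age in days is ignored, so in practice a file must be at least 31 days old to match -ctime +30. By default, find computes ages from the current date and time.
Specify -daystart (GNU find) to measure times from the beginning of today instead of from the current time; it must be placed before the time tests it applies to: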
find . -daystart -mtime -2 -name "*.html"
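A sketch of how -mtime counts full 24-hour periods, assuming GNU coreutils touch (for -d "N hours ago") and hypothetical file names. It also shows that touch can backdate the modification time but not the status change time, which is one more reason not to read -ctime as a creation date.

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1

# GNU touch: backdate the modification time by a number of hours.
touch -d "30 hours ago" recent.js    # 1 full day old
touch -d "80 hours ago" mid.css      # 3 full days old
touch -d "130 hours ago" old.css     # 5 full days old

# -mtime -2: fewer than 2 full 24-hour periods since the last modification.
recent=$(find . -mtime -2 \( -name "*.js" -o -name "*.css" \) | sort)

# -mtime +2: more than 2 full 24-hour periods (fractions are dropped).
older=$(find . -mtime +2 -name "*.css" | sort)

# touch sets ctime to "now" even when it backdates mtime,
# so nothing here matches -ctime +2.
changed=$(find . -ctime +2 -name "*.css")

echo "recent: $recent"
echo "older:"; echo "$older"
echo "ctime +2: ${changed:-none}"
cd / && rm -rf "$dir"
```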
Searching by size
Option -size +/-
To identify *.html files larger than 50 KiB (i.e. 100 blocks of 512 bytes):
find . -size +100 -name "*.html" -print
./influxdb-v1.7-architecture-installation-configuration-utilisation.html
./sybase-ase-iq-comparaison.html
…
In practice, a unit suffix (k, M or G, for kibibytes, mebibytes and gibibytes) is specified to avoid computing multiples of 512 bytes.
find . -size +50k -name "*.html" -print
find . -size +100M -print
find . -size +2G -print
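A sketch checking both notations against files of known size created with dd (the file names are hypothetical). The comparison is strict, so a file of exactly 50 KiB matches neither +100 nor +50k.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# dd writes files of exact sizes: 60 KiB, exactly 50 KiB and 10 KiB.
dd if=/dev/zero of=big.html bs=1024 count=60 2>/dev/null
dd if=/dev/zero of=edge.html bs=1024 count=50 2>/dev/null
dd if=/dev/zero of=small.html bs=1024 count=10 2>/dev/null

# No suffix: size in 512-byte blocks (rounded up), here more than 100 blocks.
blocks=$(find . -size +100 -name "*.html")
# k suffix: size in KiB (rounded up), here more than 50 KiB.
kib=$(find . -size +50k -name "*.html")

echo "blocks: $blocks"; echo "kib: $kib"
cd / && rm -rf "$dir"
```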
Redirecting error messages
Because of access rights on some directories, the find command may produce a large number of error messages (permission denied, etc.). To avoid this, redirect the standard error (file descriptor 2) to /dev/null, or to a regular file if the errors need to be kept.
Example:
find . \( -name a.out -o -name "*.c" \) -print 2> /dev/null
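A sketch of the two variants, discarding the errors or saving them in a file. Since permission errors cannot be reproduced reliably (root can read everything), a deliberately missing, hypothetical path ./no-such-dir is used to trigger a message on standard error.

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.out main.c

# ./no-such-dir does not exist: find reports it on standard error
# (descriptor 2) and still searches ".".
found=$(find . ./no-such-dir \( -name a.out -o -name "*.c" \) -print 2> errors.log | sort)
nerr=$(grep -c 'no-such-dir' errors.log)     # the message went to the file

# Same search with the errors discarded.
quiet=$(find . ./no-such-dir \( -name a.out -o -name "*.c" \) -print 2> /dev/null | sort)

echo "$found"; echo "errors logged: $nerr"
cd / && rm -rf "$dir"
```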
find and the -exec option
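The -print option displays the results on the standard output.
The -exec option executes a command on each file found by find. When an action such as -exec is present, the implicit -print is no longer applied, so the file names are not displayed unless -print is also specified; the two options can be combined. In the command, {} is replaced by the current file name, and the command is terminated by \; (the semicolon is escaped so that the shell passes it to find):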
find directory criteria -exec command {} \;
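A minimal sketch, with hypothetical files, showing that -print and -exec are not mutually exclusive: with -exec alone only the command output appears, and adding -print brings the file names back.

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt

# -exec alone: no implicit -print, only the output of echo appears.
exec_only=$(find . -name "*.txt" -exec echo found: {} \; | sort)

# -print and -exec together: find prints each name, then runs echo on it.
both=$(find . -name "*.txt" -print -exec echo found: {} \; | sort)

echo "$exec_only"; echo "---"; echo "$both"
cd / && rm -rf "$dir"
```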
The output of the find command with the -print option is very basic:
find . -type f -size +100k -print
./sybase-replication-server-guide-pratique.pdf
./images/gimp-supprimer-couleur-arriere-plan-fond-09.jpg
…
With the -exec option, find is often combined with ls to display more details:
find . -type f -size +100k -exec ls -lh {} \; 2> /dev/null
-rw-r--r-- 1 sqlpac wapp 118K Jun 15 11:34 ./sybase-replication-server-guide-pratique.pdf
-rw-r--r-- 1 sqlpac wapp 104K Jun 15 11:33 ./images/gimp-supprimer-couleur-arriere-plan-fond-09.jpg
…
Other common examples:
To remove all files named core with the rm command:
find . -name core -exec rm {} \;
To remove, with the rm command, all *.json files in the directory $LOG whose status last changed more than 10 days ago:
find "$LOG" -name "*.json" -ctime +10 -exec rm {} \;
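Because rm cannot be undone, a common precaution is to run the same expression with -print first. A sketch with a hypothetical $LOG directory, assuming GNU touch; -mtime replaces -ctime here only because touch can backdate the modification time but not the status change time.

```shell
export LC_ALL=C
LOG=$(mktemp -d)                            # hypothetical log directory
touch -d "300 hours ago" "$LOG/old.json"    # GNU touch: 12.5 days old
touch "$LOG/new.json" "$LOG/keep.txt"

# 1. Dry run: list what the expression selects.
candidates=$(find "$LOG" -name "*.json" -mtime +10 -print)

# 2. Same expression, this time removing the files.
find "$LOG" -name "*.json" -mtime +10 -exec rm {} \;

remaining=$(ls "$LOG" | sort)
echo "removed: $candidates"; echo "remaining:"; echo "$remaining"
rm -rf "$LOG"
```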
More concrete examples:
The encoding of a file is displayed by the file command with the -i option, so to list all *.htm, *.html, *.inc, *.php, *.css, *.json and *.xml files whose encoding is iso-8859-1:
find . -type f \( -name "*.html" -o -name "*.htm" -o -name "*.json" -o -name "*.php" -o -name "*.inc" -o -name "*.css" -o -name "*.xml" \) -exec file -i {} \; | grep -i 'iso-8859-1'
./admpmgportal/config.inc: text/x-php; charset=iso-8859-1
./admpmgportal/include/rules.inc: text/x-php; charset=iso-8859-1
./admpmgportal/include/treeview.inc: text/html; charset=iso-8859-1
...
To find the string '79.13' in non-binary files:
find . -type f -exec grep -Il '79\.13' {} \;
./redis/dba/srvrdisqlpac/cfg/srvrdisqlpac.conf ...
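This is useful to find values (IP addresses, function names…) anywhere in a directory tree, in any file that is not binary.
The -I option of grep skips binary files, and -l prints only the names of the matching files. The dot is escaped (\.) because an unescaped . matches any character.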
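A sketch of the effect of -I and of the escaped dot, with three hypothetical files: a text file containing the value, a file made binary by NUL bytes, and a decoy that only an unescaped dot would match. GNU grep is assumed (it treats files containing NUL bytes as binary).

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'bind 10.0.79.13\n' > server.conf       # text file with the value
printf 'ip=79.13\000\001\002' > dump.bin       # NUL bytes: seen as binary
printf 'version 79x13\n' > decoy.txt           # only matches an unescaped dot

# Escaped dot, binary files skipped (-I), names only (-l).
strict=$(find . -type f -exec grep -Il '79\.13' {} \; | sort)
# Unescaped dot: "." matches any character, so decoy.txt matches too.
loose=$(find . -type f -exec grep -Il '79.13' {} \; | sort)

echo "strict: $strict"; echo "loose:"; echo "$loose"
cd / && rm -rf "$dir"
```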
The find command and regular expressions (-regex and -regextype)
Option -regex
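The iso-8859-1 example above is not very elegant (-o -name "*.css" -o -name "*.php" …).
Regular expressions are available in the find command through the -regex option, which makes the command much more readable. Note that -regex matches the whole path (for example ./css/style.css), not just the file name, hence the leading .* in the pattern: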
find . -regex '.*\.\(css\|htm\|html\|inc\|js\|json\|php\|xml\)' -exec file -i {} \;
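A sketch, assuming GNU find with its default (emacs-style) regular expression type and hypothetical files, showing that -regex is matched against the whole path: a pattern written for the bare file name matches nothing.

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir css js
touch css/style.css js/app.js index.html notes.txt

# The regex is matched against the whole path, e.g. "./css/style.css".
whole=$(find . -regex '.*\.\(css\|html\|js\)' | sort)

# A pattern for the bare file name never matches, since every path
# starts with "./" here.
name_only=$(find . -regex 'style\.css')

echo "$whole"; echo "name only: ${name_only:-nothing}"
cd / && rm -rf "$dir"
```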
Several regular expression types exist (POSIX, GNU awk…), and their syntaxes differ.
The -regextype option specifies the type to use; it must appear before the -regex test it applies to. For example, to find *.txt and *.inc files using the posix-basic type:
find . -regextype posix-basic -regex '.*\.\(txt\|inc\)' -print
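A sketch comparing the posix-basic and posix-extended types in GNU find, with hypothetical files. In posix-basic, grouping and alternation are written \( \) and \| (the latter a GNU extension); posix-extended drops the backslashes. A pattern without the escaped dot also matches names that merely end in "inc".

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
touch readme.txt config.inc foo_inc data.csv

# posix-basic: grouping and alternation need backslashes.
basic=$(find . -regextype posix-basic -regex '.*\.\(txt\|inc\)' | sort)
# posix-extended: the same pattern without backslashes.
extended=$(find . -regextype posix-extended -regex '.*\.(txt|inc)' | sort)
# Without the escaped dot, foo_inc matches as well.
loose=$(find . -regextype posix-extended -regex '.*(txt|inc)' | sort)

echo "$basic"; echo "---"; echo "$loose"
cd / && rm -rf "$dir"
```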
The regular expression types and their syntaxes are numerous and are not detailed here; that is beyond the scope of this paper.
A trick to list the types available on the platform: call the find command with an invalid -regextype value.
find . -regextype dummy
find: Unknown regular expression type `dummy'; valid types are `findutils-default', `awk', `egrep', `ed', `emacs', `gnu-awk', `grep', `posix-awk', `posix-basic', `posix-egrep', `posix-extended', `posix-minimal-basic', `sed'.
For case-insensitive matching, use the -iregex option instead of -regex:
find . -iregex '.*\.\(txt\)' -print
./README.TXT
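A minimal comparison of -regex and -iregex on two hypothetical files, assuming GNU find.

```shell
export LC_ALL=C
dir=$(mktemp -d)
cd "$dir" || exit 1
touch README.TXT notes.txt

sensitive=$(find . -regex '.*\.txt' | sort)      # lower case only
insensitive=$(find . -iregex '.*\.txt' | sort)   # any case

echo "$sensitive"; echo "---"; echo "$insensitive"
cd / && rm -rf "$dir"
```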
Combining find and xargs commands
To find the PHP function ereg_replace in the source code of *.php and *.inc files:
find . -type f \( -name "*.php" -o -name "*.inc" \) -print | xargs grep -ni "ereg_replace"
...
./sqlpacv2/prp_article.php5:216: $caption = ereg_replace("\.","",_USRDIR_DOC)."/".$datadoc[1]["fichier"];
./sqlpacv2/prp_glossaire.php5:95: $caption = ereg_replace("\.","",__USRDIR_DOC)."/".$article["fichier"];
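The xargs command reads the file names written by find on its standard input and appends them as arguments to the command that follows it (grep in the example above), so that grep is executed on the files found, in as few invocations as possible. When no command is given, xargs runs echo.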
Find and links
Option -xtype
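The find command is very useful to find broken symbolic links. -xtype checks the type of the file a link points to; when the target does not exist, it cannot be examined, so the link itself is reported as type l:
find . -xtype l -exec ls -l {} \;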
lrwxrwxrwx 1 sqlpac wapp 26 Sep 3 14:33 ./postmenu.php -> ../../engines/postmenu.php
If the -xtype option is not available on the platform, use the test command: test -e follows the link and fails when its target does not exist.
find . -type l ! -exec test -e {} \; -exec ls -l {} \;
lrwxrwxrwx 1 sqlpac wapp 26 Sep 3 14:33 ./postmenu.php -> ../../engines/postmenu.php
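A sketch checking that both approaches report the same link, with a valid link and a broken one (hypothetical names); -xtype assumes GNU find.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch target.php
ln -s target.php good-link.php              # valid link
ln -s no-such-target.php broken-link.php    # target does not exist

gnu=$(find . -xtype l)                                     # GNU find
portable=$(find . -type l ! -exec test -e {} \; -print)    # test -e fallback

echo "xtype: $gnu"; echo "test -e: $portable"
cd / && rm -rf "$dir"
```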
A useful example: computing the size of images in a directory
To compute the total size of the images in a directory tree with a single command line combining find and awk:
find . -regex '.*\.\(png\|gif\|jpg\|jpeg\)' -exec ls -l {} \; | \
awk 'BEGIN {sum=0} {sum+=$5} END { printf("%.2f %s\n",sum/1024000,"Mb") }'
18.16 Mb
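A variant that avoids parsing ls output, assuming GNU find for -printf (where %s is the file size in bytes). It also restricts the search to regular files and divides by 1048576 so that the result is expressed in MiB; the images are hypothetical files of known size created with dd.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir images
dd if=/dev/zero of=images/a.png bs=1024 count=1024 2>/dev/null   # 1 MiB
dd if=/dev/zero of=images/b.jpg bs=1024 count=512 2>/dev/null    # 0.5 MiB
touch notes.txt                                                  # ignored

# GNU find: -printf '%s\n' prints each file size in bytes.
total=$(find . -type f -regex '.*\.\(png\|gif\|jpg\|jpeg\)' -printf '%s\n' |
  awk '{ sum += $1 } END { printf("%.2f MiB\n", sum / 1048576) }')

echo "$total"
cd / && rm -rf "$dir"
```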
The grep command and its options
grep command syntax
grep [option(s)] expression file(s)
Some common options:
-v | display the lines that do not match the expression
-c | count the lines matching the expression, without displaying them
-n | display the lines matching the expression, preceded by their line number
-i | ignore case
Some examples:
To search for tables (HTML <table> tags) in *.html files:
grep "<table" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html: <table>
sybase-iq-12.7-migration-ase-vers-iq.html: <table class="alt- r-brdr">
…
With line numbers, ignoring case:
grep -ni "<table" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html:232: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:512: <table>
sybase-iq-12.7-migration-ase-vers-iq.html:661: <table class="alt- r-brdr">
…
Only the number of matching lines per file, ignoring case:
grep -ci "<table" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html:6
…
sybase-iq-index-cardinalite-sp_dba_helpcolumn.html:0
…
In the output of grep, the file name and the result are separated by a colon (:), which makes the results easy to process with the awk utility:
grep -ci "<table" *.html | \
awk -F":" 'BEGIN { hf=0; tb=0; } { if ($2 != 0) { hf++; tb +=$2 } } END { print tb" tables in "hf" files"}'
549 tables in 187 files
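A sketch of the same pipeline on three hypothetical HTML files with a known number of matching lines. Note that grep -c counts matching lines, so two tables opened on the same line would be counted once.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '<table>\n<TABLE class="x">\n' > a.html   # 2 matching lines
printf '<p>no tables</p>\n' > b.html             # 0 matching lines
printf '<table>\n' > c.html                      # 1 matching line

summary=$(grep -ci "<table" *.html |
  awk -F":" '{ if ($2 != 0) { hf++; tb += $2 } } END { print tb " tables in " hf " files" }')

echo "$summary"
cd / && rm -rf "$dir"
```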
Regular expressions with grep
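The -E option tells grep to interpret the expression as an extended regular expression (ERE), in which characters such as |, +, ? and parentheses are special without a backslash.
To find the strings mysql_connect, mysql_query and mysql_close in *.php files, with line numbers: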
grep -ni -E "mysql_connect|mysql_query|mysql_close" *.php
cls_database_myisam.php5:59: $objRessource = mysql_connect(_SGBD_SERVER,_SGBD_USER,$pwdUnCrypted);
cls_database_myisam.php:81: $ret=mysql_close($objRessource);
cls_database_myisam.php5:108: $return = mysql_query($queryString,$objRessource);
cls_database_myisam.php5:114: $reserrors = mysql_query('select @errno as errno , @errmsg as errmsg'
cls_database_myisam.php5:341: $resultPlan=mysql_query("EXPLAIN ".$query,$objRessource);
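The egrep command is none other than the grep command with the -E option (recent GNU grep versions flag egrep as obsolescent and print a warning, so grep -E is preferable in scripts):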
egrep -ni "mysql_connect|mysql_query|mysql_close" *.php
A more elegant equivalent syntax, grouping the common prefix:
egrep -ni "mysql_(connect|query|close)" *.php
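A sketch, on a hypothetical db.php, checking that the flat alternation and the grouped form select the same lines, along with the BRE spelling where grouping and alternation need backslashes (\| being a GNU extension). grep -E is used rather than egrep for the reason given above.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > db.php <<'EOF'
<?php
$c = mysql_connect($host, $user, $pwd);
$r = mysql_query($sql, $c);
$n = mysql_num_rows($r);
mysql_close($c);
EOF

flat=$(grep -c -E "mysql_connect|mysql_query|mysql_close" db.php)
grouped=$(grep -c -E "mysql_(connect|query|close)" db.php)
# BRE equivalent (GNU grep): \( \) for grouping, \| for alternation.
basic=$(grep -c "mysql_\(connect\|query\|close\)" db.php)

echo "$flat $grouped $basic"
cd / && rm -rf "$dir"
```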
The list of regular expression terms can become long for a single command line. The terms can then be stored in a text file, one pattern per line, and passed to grep with the -f option:
grep -ni -f regex.txt *.php
Contents of regex.txt:
mysql_connect
mysql_query
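A sketch of -f with the same two patterns on a hypothetical db.php: a line is printed when it matches any pattern of the file, so mysql_close, absent from regex.txt, is not reported.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'mysql_connect\nmysql_query\n' > regex.txt      # one pattern per line
printf '<?php\n$c = mysql_connect();\nmysql_close($c);\n$r = mysql_query($s);\n' > db.php

# Each line of regex.txt is a pattern; a line matching any of them is printed.
out=$(grep -ni -f regex.txt db.php)

echo "$out"
cd / && rm -rf "$dir"
```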
As with the find command, the grep command supports different regular expression syntaxes:
- Option -E: ERE, Extended Regular Expressions
- Option -G: BRE, Basic Regular Expressions (the default)
- Option -P: PCRE, Perl-compatible Regular Expressions (GNU grep)
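A sketch of one practical difference between BRE and ERE on a hypothetical sample file: in BRE, + is an ordinary character, while in ERE it means "one or more". -P is left out here because it depends on grep being built with PCRE support.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a+b\naaab\n' > sample.txt

bre=$(grep -G 'a+b' sample.txt)   # BRE: "+" is an ordinary character
ere=$(grep -E 'a+b' sample.txt)   # ERE: "+" means "one or more"

echo "BRE: $bre"; echo "ERE: $ere"
cd / && rm -rf "$dir"
```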
Some other examples:
The character ^ in a regular expression matches the beginning of a line; to find lines starting with an asterisk:
grep -ni -E "^\*" *.php
The character $ in a regular expression matches the end of a line; to find lines ending with a semicolon, the $ must come after the semicolon:
grep -ni -E ";$" *.php
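A sketch of both anchors on a hypothetical sample.php, assuming GNU grep. It also shows that in an ERE, a $ followed by more characters is still an end-of-line anchor, so a pattern such as '$;' can never match.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > sample.php <<'EOF'
/*
* generated file
*/
$a = 1;
echo $a; // done
EOF

star=$(grep -n -E '^\*' sample.php)   # lines starting with "*"
semi=$(grep -n -E ';$' sample.php)    # lines ending with ";"
# "$" before ";" requires text after the end of the line: never matches.
wrong=$(grep -c -E '$;' sample.php)

echo "$star"; echo "$semi"; echo "wrong order: $wrong"
cd / && rm -rf "$dir"
```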
Building on the first examples, to find the tables in *.html files that have the CSS class rco- and/or r-brdr:
egrep -ni "<table.*class.*(rco-|r-brdr).*>" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html:53: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:121: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:152: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:232: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:661: <table class="alt- r-brdr">
…
To list *.html files that do not contain any table (option -L):
grep -L "<table" *.html
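A minimal sketch contrasting -l (files with at least one match) and -L (files with no match) on two hypothetical HTML files.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '<table>\n</table>\n' > with.html
printf '<p>plain text</p>\n' > without.html

has=$(grep -l "<table" *.html)      # files containing at least one match
hasnot=$(grep -L "<table" *.html)   # files containing no match

echo "with tables: $has"; echo "without tables: $hasnot"
cd / && rm -rf "$dir"
```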