Test of agrep

Search for word:
with agrep options:
in Books: I II III IV

NAME

agrep - search a file for a string or regular expression, with approximate matching capabilities

SYNOPSIS

agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename ".\|.\|. ]"

DESCRIPTION

agrep searches the input filenames (standard input is the default, but see a warning under LIMITATIONS) for records containing strings which either exactly or approximately match a pattern. A record is by default a line, but it can be defined differently using the -d option (see below). Normally, each record found is copied to the standard output. Approximate matching allows finding records that contain the pattern with several errors including substitutions, insertions, and deletions. For example, Massechusets matches Massachusetts with two errors (one substitution and one insertion). Running
agrep -2 Massechusets foo
outputs all lines in foo containing any string with at most 2 errors from Massechusets.
agrep supports many kinds of queries including arbitrary wild cards, sets of patterns, and in general, regular expressions. See PATTERNS below. It supports most of the options supported by the grep family plus several more (but it is not 100% compatible with grep).
For more information on the algorithms used by agrep see Wu and Manber, "Fast Text Searching With Errors," Technical report #91-11, Department of Computer Science, University of Arizona, June 1991 (available by anonymous
from cs.arizona.edu in agrep/agrep.ps.1), and Wu and Manber, "Agrep -- A Fast Approximate Pattern Searching Tool", To appear in USENIX Conference 1992 January (available by anonymous
from cs.arizona.edu in agrep/agrep.ps.2).

As with the rest of the grep family, the characters

can cause unexpected results when included in the pattern as these characters are also meaningful to the

ell. To avoid these problems, one should always enclose the entire pattern argument in single quotes, i.e., 'pattern'. Do not use double quotes (").
When agrep is applied to more than one input file, the name of the file is displayed preceding each line which matches the pattern. The filename is not displayed when processing a single file, so if you actually want the filename to appear, use /dev/null as a second file in the list.

OPTIONS

)

ould be preceded by `\\' if they are to be matched as regular characters. For example, \\^abc\\\\ corresponds to the string ^abc\\, whereas ^abc corresponds to the string abc at the beginning of a line.

Classes of characters

a list of characters inside [] (in order) corresponds to any character from the list. For example, [a-ho-z] is any character between a and h or between o and z. The symbol `^' inside [] complements the list. For example, [^i-n] denote any character in the character set except character 'i' to 'n'. The symbol `^' thus has two meanings, but this is consistent with egrep. The symbol `.' (don't care) stands for any symbol (except for the newline symbol).

Boolean operations

agrep supports an `and' operation `;' and an `or' operation `,', but not a combination of both. For example, 'fast;network' searches for all records containing both words.

Wild cards

The symbol '#' is used to denote a wild card. # matches zero or any number of
  • itrary characters. For example, ex#e matches example. The symbol # is equivalent to .* in egrep. In fact, .* will work too, because it is a valid regular expression (see below), but unless this is part of an actual regular expression, # will work faster.
    Combination of exact and approximate matching: any pattern inside angle brackets <> must match the text exactly even if the match is with errors. For example, ics matches mathematical with one error (replacing the last s with an a), but mathe does not match mathematical no matter how many errors we allow.

    Regular expressions

    The syntax of regular expressions in agrep is in general the same as that for egrep. The union operation `|', Kleene closure `*', and parentheses () are all supported. Currently '+' is not supported. Regular expressions are currently limited to approximately 30 characters (generally excluding meta characters). Some options ( -d, -w, -f, -t, -x, -D, -I, -S) do not currently work with regular expressions. The maximal number of errors for regular expressions that use '*' or '|' is 4.

    EXAMPLES


    agrep -2 -c ABCDEFG foo
    gives the number of lines in file foo that contain ABCDEFG within two errors.

    agrep -1 -D2 -S2 'ABCD#YZ' foo
    outputs the lines containing ABCD followed, within

  • itrary distance, by YZ, with up to one additional insertion (-D2 and -S2 make deletions and substitutions too "expensive").

    agrep -5 -p abcdefghij /usr/dict/words
    outputs the list of all words containing at least 5 of the first 10 letters of the
    habet in order\fR. (Try it: any list starting with academia and ending with sacrilegious must mean something!)

    agrep -1 'abc[0-9](de|fg)*[x-z]' foo
    outputs the lines containing, within up to one error, the string that starts with abc followed by one digit, followed by zero or more repetitions of either de or fg, followed by either x, y, or z.

    agrep -d '^From\ ' 'breakdown;internet' mbox
    outputs all mail messages (the pattern '^From\ ' separates mail messages in a mail file) that contain keywords 'breakdown' and 'internet'.

    agrep -d '$$' -1 ' ' foo
    finds all paragraphs that contain word1 followed by word2 with one error in place of the blank. In particular, if word1 is the last word in a line and word2 is the first word in the next line, then the space will be substituted by a newline symbol and it will match. Thus, this is a way to overcome separation by a newline. Note that -d '$$' (or another delim which spans more than one line) is necessary, because otherwise agrep searches only one line at a time.

    agrep '^agrep'
    outputs all the examples of the use of agrep in this man pages.

    BUGS/LIMITATIONS

    Any bug reports or comments will be appreciated! Please mail them to sw@cs.arizona.edu or udi@cs.arizona.edu
    Regular expressions do not support the '+' operator (match 1 or more instances of the preceding token). These can be searched for by using this syntax in the pattern: .sp .in 1.0i \&'pattern(pattern)*\fR' .in .sp (search for strings containing one instance of the pattern, followed by 0 or more instances of the pattern).

    The following can cause an infinite loop:
    agrep pattern * > out_file.
    If the number of matches is high, they may be deposited in
    out_file before it is completely read leading to more matches of the pattern within output_file (the matches are against the whole directory). It's not clear whether this is a "bug" (grep will do the same), but be warned.
    The maximum size of the patternfile is limited to be 250Kb, and the maximum number of patterns is limited to be 30,000.
    Standard input is the default if no input file is given. However, if standard input is keyed in directly (as opposed to through a pipe, for example) agrep may not work for some non-simple patterns.
    There is no size limit for simple patterns. More complicated patterns are currently limited to approximately 30 characters. Lines are limited to 1024 characters. Records are limited to 48K, and may be truncated if they are larger than that. The limit of record length can be changed by modifying the parameter Max_record in agrep.h.

    > DIAGNOSTICS

    Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible files.

    AUTHORS Sun Wu and Udi Manber, Department of Computer Science, University of Arizona, Tucson, AZ 85721. {sw|udi}@cs.arizona.edu.