Data input types

From GWAVA4


Information entered into the GWAVA configuration pages is generally simple and straightforward, and the fields will typically accept what you expect to enter.

However, data entered into the GWAVA pattern matching screens can take several formats, each of which is automatically converted within the GWAVA scanner engine.

This search-style input applies to areas of GWAVA where information is compared in order to act upon it, such as text filtering, IP address filtering and address matching. Any configuration screen that takes a value to be compared with some input accepts entries in this fashion.


There are three types of valid data for these input boxes:

  1. Basic text
  2. Wildcard text
  3. Regular expressions



Basic Text

The simplest form of input is straightforward text. Information entered as basic text is matched as complete words, without case sensitivity.

'Complete word' means that the word must start and end without being part of another word. Case insensitivity means the text will match whether it appears in capital letters, small letters, or any combination of the two.

Examples of basic text searches:

cialis will find cialis and CIALIS, but not specialist

sex will find sex, Sex and SEX but not Essex, sexual or desexed
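
The whole-word, case-insensitive behaviour described above can be sketched in Python. This is only an illustration and not GWAVA's actual implementation: the helper name basic_text_matches is hypothetical, and the use of \b word boundaries with a case-insensitive flag follows the equivalence given in the footnote on this page.

```python
import re

def basic_text_matches(word: str, text: str) -> bool:
    """Hypothetical helper: whole-word, case-insensitive search.

    Assumes basic text behaves like the regular expression /\\bword\\b/i.
    """
    # re.escape keeps characters such as '.' from acting as regex syntax.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

print(basic_text_matches("cialis", "Buy CIALIS now"))    # True
print(basic_text_matches("cialis", "see a specialist"))  # False
print(basic_text_matches("sex", "Essex county"))         # False
```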

Note: Basic text patterns cannot take the * and ? characters literally, as these are reserved for wildcard text searches. Read the footnote below for a brief guide on working around this.

Wildcard Text

Wildcards are easy-to-use additions to basic text searches that provide flexibility in the search criteria.

There are two wildcard characters that change the search criteria, ? and *, which will be familiar to anyone with experience using the DOS operating system.

When used within basic text, these characters take on special meanings:

? = any single character can go here

* = any number of characters can go here

The * wildcard is particularly useful at either end of a search word as it allows the preceding or following text to be taken as part of the search pattern.

Examples of wildcard searches:

sex* will find sex, sexual and sexy, but not Essex

c?al?s will find cialis, c1al1s and chalks

house*loans will find the word house followed by loans with any number of letters or characters in between

*.pif will catch all pif files that might be viruses

192.168.* could be used to trap an IP network segment

Be careful when using wildcards, as they can cause you to catch words that you don't want to:

*cialis* will find cialis and CIALIS and specialist

*semen* will find semen and advertisement
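
To see why these patterns behave as they do, here is a rough Python sketch of how a wildcard entry might be turned into a regular expression. It is an approximation under stated assumptions, not GWAVA's real conversion: the function name wildcard_to_regex is hypothetical, ? is assumed to become "any one character", * is assumed to become "any run of characters", and a * at either end is assumed to drop the word boundary on that side (as the footnote's *mortgage* example suggests).

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: approximate a wildcard entry as a regex."""
    # A wildcard at an edge means the word may continue on that side,
    # so no word boundary is required there (assumption).
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern.strip("*")

    parts = []
    for ch in core:
        if ch == "*":
            parts.append(".*")           # any number of characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal

    prefix = "" if leading else r"\b"
    suffix = "" if trailing else r"\b"
    return re.compile(prefix + "".join(parts) + suffix, re.IGNORECASE)

print(bool(wildcard_to_regex("sex*").search("sexual")))          # True
print(bool(wildcard_to_regex("sex*").search("Essex")))           # False
print(bool(wildcard_to_regex("*cialis*").search("specialist")))  # True
```

The last line shows the over-matching risk mentioned above: once both boundaries are dropped, the pattern matches inside unrelated words.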


Regular Expressions

Regular expressions are advanced pattern matching descriptions that are used to capture complex patterns of characters in the search data. Regular Expressions are very powerful, but can be confusing to a novice.

Does an administrator need to know regular expressions? Usually not. If an entry is not typed in regular expression format (i.e. it is not enclosed in '/' characters), GWAVA automatically converts it internally to a regular expression following these rules:

  • All matches are case insensitive - 'dog' and 'DOG' match.
  • All matches apply to a whole word only - 'dog' doesn't match 'doggy'.
  • You can use the '*' character as a wildcard - dog* matches 'doggy', 'dog and cow', etc.

For most administrators, this provides sufficient power and flexibility without needing to know more about Regular Expressions.

However, some cases call for case sensitivity or for an efficient way to specify ranges of characters. For these purposes, among many others, a regular expression is useful. For example, the filter /dog/ will match 'dog' but not 'DOG'.

To use regular expression mode, enclose the text in / characters. Any search pattern enclosed this way is treated as a regular expression and parsed accordingly. Modifiers, such as i for case insensitivity, are added after the closing slash:

/\bc[i1][a@]l[i1][s$]\b/i is used to catch many variations of the word cialis

/(.*\.pif|.*\.com|.*\.scr|.*\.vbs)/i could be used to trap many common virus types in a single rule
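
The /pattern/modifiers notation can be tried outside GWAVA with a small Python sketch. The helper parse_slash_regex is hypothetical, only the i modifier is handled, and Python's re module is assumed to be close enough to GWAVA's regular expression dialect for these examples, which this page does not confirm.

```python
import re

def parse_slash_regex(entry: str) -> re.Pattern:
    """Hypothetical helper: compile an entry written as /pattern/modifiers."""
    end = entry.rfind("/")
    if not entry.startswith("/") or end == 0:
        raise ValueError("entry must be enclosed in / characters")
    body, modifiers = entry[1:end], entry[end + 1:]

    flags = 0
    for m in modifiers:
        if m == "i":
            flags |= re.IGNORECASE  # ignore upper/lower casing
        else:
            raise ValueError(f"unsupported modifier: {m!r}")
    return re.compile(body, flags)

cialis = parse_slash_regex(r"/\bc[i1][a@]l[i1][s$]\b/i")
print(bool(cialis.search("cheap C1AL1S here")))  # True

dog = parse_slash_regex("/dog/")
print(bool(dog.search("DOG")))  # False: no i modifier, so case matters
```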

Footnote

If you do not want to deal with regular expressions (and we don't blame you), here are a couple of tips that add a little extra flexibility to your searches.

Any basic or wildcard text pattern can be represented as a regular expression as follows:

Basic text cialis is equivalent to /\bcialis\b/i

Wildcard text *mortgage* is equivalent to /mortgage/i

The \b part of the expression means 'word boundary' which ensures that the word is not part of another word, and the i at the end tells the system to ignore upper/lower casing.

If you want to search for a pattern and match the case (be case sensitive), it is necessary to create a simple regular expression as follows.

To find the word 'MORTGAGE' in capitals only, the following search pattern needs to be applied: /\bMORTGAGE\b/

To extend this to catch the same word with anything following, such as 'MORTGAGES', take off the trailing \b: /\bMORTGAGE/

If you want to match the characters * or ? literally, you must also use a regular expression, and each special character must be preceded by a backslash:

To find 'AllSt*r', the expression would be: /\bAllSt\*r\b/
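
Escaping by hand is easy to get wrong, so the following Python sketch builds such an entry automatically. The helper literal_to_entry is hypothetical, and it assumes that the backslash escapes produced by Python's re.escape are also accepted by GWAVA, which this page does not confirm.

```python
import re

def literal_to_entry(text: str) -> str:
    """Hypothetical helper: build a case-sensitive, whole-word regex entry."""
    # re.escape puts a backslash in front of reserved symbols such as * and ?.
    return "/" + r"\b" + re.escape(text) + r"\b" + "/"

entry = literal_to_entry("AllSt*r")
print(entry)  # /\bAllSt\*r\b/

# Strip the enclosing slashes to test the expression in Python.
pattern = re.compile(entry[1:-1])
print(bool(pattern.search("the AllSt*r team")))  # True
print(bool(pattern.search("the AllStar team")))  # False
```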

Be forewarned, however, that many symbols are reserved in regular expressions, so if you want to find anything more than basic text with a regular expression, you should read the regular expression guide.
