Regular Expressions - Tutorial Part 7: Modifiers

 


 

Type: Tutorial for Beginners
Category: Tools & Utilities
Language: English
Author: Stefan Trost Media
Date: 13.07.2011

 
 


About the author

Stefan Trost develops software and web solutions and is happy to take care of your needs and wishes as well.


In some, but not all, programs you can use so-called modifiers, which change the behaviour of regular expressions in the way you want. Whether your program supports modifiers is something you have to try out. The program used in this tutorial, the Text Converter, supports them. So far, the following parts have been published:

Part 1: Basics | Part 2: Normal strings, grouping and repetitions | Part 3: Meta characters and combinations | Part 4: Selections of characters and alternatives | Part 5: Character groups and classes | Part 6: Reusing and backward references | Part 7: Modifiers | Part 8: Usage and Examples

Important Note

To try out the examples and to test your own regular expressions, you can use the software Text Converter in its free Basic version. The first part of this tutorial explains how to use the application.

Usage of modifiers

Before we come to the individual modifiers, we will have a look at how they are used. Unfortunately, modifiers are not handled uniformly: some programs support them, others do not. Because we have used the Text Converter up to now, we will also explain how to use modifiers in this software:

You can set modifiers globally like this: click on the button "RegEx.." under the search and replace box. A new window opens in which you can activate or deactivate the global modifiers as you like. In the same window, you can also define custom character classes.

Another possibility is to use modifiers directly in the regular expression. This is the variant we will use in the rest of this tutorial. Take the regular expression [a-z] and the modifier "i", for example. To apply it, you can write (?i)[a-z]. If you want to use more than one modifier, you can write (?ims)[a-z]. If you want to activate the modifiers i and m and deactivate s, you can use (?im-s)[a-z]. If you want to apply modifiers only to a part of a regular expression, you can use round brackets like this: ((?i)[a-z])[a-z]. Here, the modifier i applies only to the first [a-z] inside the brackets.

The following sections show further examples of activating and deactivating modifiers.
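To see the difference between a modifier for the whole expression and a modifier for only a part of it outside of the Text Converter, here is a minimal sketch in Python. It assumes Python's re module (version 3.6 or later), which is not the engine of the Text Converter: Python writes a partial modifier as (?i:...) instead of ((?i)...), and the sample text is made up for this illustration.

```python
import re

# Made-up sample text for this illustration.
text = "ab AB aB Ab"

# Modifier for the whole expression: both letters are matched
# without regard to case.
whole = re.sub(r"(?i)[a-z][a-z]", "X", text)

# Modifier for one part only (Python syntax, an assumption that differs
# from the Text Converter): the first letter ignores case, the second
# letter must really be lowercase.
scoped = re.sub(r"(?i:[a-z])[a-z]", "X", text)

print(whole)   # X X X X
print(scoped)  # X AB aB X
```

In the second call, only "ab" and "Ab" are replaced, because only there the second letter is lowercase.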

Modifier i: Case Insensitive

Example 1

Search for:         Replacement:   Original:        Replaced:

[A-Z]               X              abcd ABCD        abcd XXXX

(?i)[A-Z]           X              abcd ABCD        XXXX XXXX

(?-i)[A-Z]          X              abcd ABCD        abcd XXXX

(?i)[A-Z](?-i)[A-Z] X              aB AB Ab         X X Ab

The modifier i determines whether a distinction is made between lowercase and uppercase letters. By default, this modifier is deactivated, so there is a difference between "A" and "a". In the first line, we search for [A-Z] with the default setting and without a modifier. As expected, only the uppercase letters are replaced. In the second line, we activate the modifier i by writing (?i) in front of the regular expression (you can also select the option in the settings). With that, there is no difference between lowercase and uppercase letters, and both lowercase and uppercase letters are replaced. If you write (?-i) in front of the expression, the modifier i is deactivated. The result is the same as in the first line, because the modifier i is deactivated by default.

In the last line, we activate the modifier for the first part and search for [A-Z], and after that we deactivate the modifier and search for [A-Z] again. So we are searching for a combination of two letters: the first letter can be lowercase or uppercase, the second letter must be uppercase. This applies to "aB" and "AB" but not to "Ab".
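The four lines of Example 1 can be reproduced roughly as follows. This is only a sketch under the assumption that Python's re module (3.6 or later) is used instead of the Text Converter. Python does not allow switching a modifier on and off in the middle of an expression with (?i) and (?-i), so the sketch uses the group forms (?i:...) and (?-i:...); the helper name replace_with_x is hypothetical.

```python
import re

def replace_with_x(pattern, text):
    """Replace every match of pattern in text with "X" (hypothetical helper)."""
    return re.sub(pattern, "X", text)

# Line 1: default setting, case matters.
line1 = replace_with_x(r"[A-Z]", "abcd ABCD")

# Line 2: modifier i active for the whole expression.
line2 = replace_with_x(r"(?i)[A-Z]", "abcd ABCD")

# Line 3: modifier i explicitly switched off (same as the default).
line3 = replace_with_x(r"(?-i:[A-Z])", "abcd ABCD")

# Line 4: first letter in any case, second letter uppercase only.
line4 = replace_with_x(r"(?i:[A-Z])[A-Z]", "aB AB Ab")

print(line1)  # abcd XXXX
print(line2)  # XXXX XXXX
print(line3)  # abcd XXXX
print(line4)  # X X Ab
```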

Modifiers m and s: Multi Line and Single Line

Example 2

Search for:    Replacement:  Original:           Replaced:

(?s-m)^.*$     X             This is a           X

                             multiline text.

(?m-s)^.*$     X             This is a           X

                             multiline text.     X

The modifiers m and s define how multiple lines in a text document are handled. The regular expression ^.*$ matches all characters in a string from the beginning to the end: ^ stands for the beginning of a string, $ stands for the end of a string, and .* means that an arbitrary character can be repeated any number of times.

In Example 2, we use this regular expression in the first line with the modifier s (and m switched off) and in the second line with m (and s switched off). In the mode s, the whole text is treated as a single line. The complete multiline text is matched and replaced with a single "X". In the mode m, that is different. Here, the text is processed line by line, and the meta characters ^ and $ stand for the beginning and the end of a line. Accordingly, "This is a" is matched in the first line of the text and replaced with an "X", and "multiline text." is matched in the second line and replaced with another "X". So after the replacement there is an "X" in each line, while in the mode s the whole text "This is a multiline text." is replaced with one "X".
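The behavior described for Example 2 can be checked with a short sketch, assuming Python's re module rather than the Text Converter. In Python, both modifiers are off by default, so it is enough to switch on the one that is needed instead of writing (?s-m) or (?m-s).

```python
import re

# The two-line text from Example 2; "\n" is the line break.
text = "This is a\nmultiline text."

# Mode s: the dot also matches the line break, so the whole text
# is one match and is replaced by a single "X".
single_line = re.sub(r"(?s)^.*$", "X", text)

# Mode m: ^ and $ match at the beginning and end of every line,
# so each line is replaced by its own "X".
multi_line = re.sub(r"(?m)^.*$", "X", text)

print(repr(single_line))  # 'X'
print(repr(multi_line))   # 'X\nX'
```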

Modifier g: Greedy Mode

Example 3

Search for:    Replacement:    Original:            Replaced:

(?-g)a.+x      X               abcxfghixj           Xfghixj

(?g)a.+x       X               abcxfghixj           Xj

 

a.+x           X               abcxfghixj           Xfghixj

a.+?x          X               abcxfghixj           Xj

The regular expression "a.+x" searches for a string beginning with "a" and ending with "x". Between these characters, any other characters can appear. But what about an "x"? Does an "x" count as "any other character", or does the match end as soon as the first "x" appears?

In Example 3, we have a string that begins with "a" and has one "x" in the middle and another "x" near the end. In the first line, the Greedy Mode is deactivated. With this, the regular expression matches all characters up to the first "x". In the Greedy Mode, it behaves differently: as many characters as possible are matched, so the last "x" is always taken. Depending on what you want to achieve, the activated or the deactivated Greedy Mode is the right choice for you.

By default, the Greedy Mode is deactivated. So, if you use the expression "a.+x" without changing the mode, you get the same result as in the first line, as the third line shows. But even with Greedy Mode deactivated, you can reproduce the behavior of the Greedy Mode with meta characters: simply add a ? to each + or *, and the expression works greedily although the Greedy Mode is deactivated, as the fourth line with "a.+?x" shows.
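The two possible results of Example 3 can also be produced in Python, but please note the assumption behind this sketch: Python's re module has no modifier g, its quantifiers are always greedy by default, and an added ? makes them non-greedy. This is the opposite of the default described above for the Text Converter, so the roles of "a.+x" and "a.+?x" are swapped compared to the table.

```python
import re

text = "abcxfghixj"

# In Python, a plain quantifier is greedy: the match runs to the last "x".
greedy = re.sub(r"a.+x", "X", text)

# In Python, the added ? makes the quantifier non-greedy:
# the match stops at the first "x".
non_greedy = re.sub(r"a.+?x", "X", text)

print(greedy)      # Xj
print(non_greedy)  # Xfghixj
```

If you move an expression from one program to another, it is therefore worth testing which of the two behaviors is the default there.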

Modifier x: Extended Syntax

The modifier x stands for extended syntax and is supported by only a few programs. This modifier only contributes to the readability and ease of use of regular expressions; it plays no role in what an expression matches. If x is active, you can use whitespace such as spaces or line breaks in your regular expressions, and you can insert comments starting with #. Because regular expressions (like those in the next part) can become very long and complex, the modifier x is a good way to make expressions easier to understand. When using x, note that real whitespace that you want to match in your expression has to be escaped with \ in order to be interpreted.
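As an illustration of the extended syntax, here is a sketch that assumes Python's re module, which supports the modifier x; whether the details are identical in the Text Converter has not been checked here. In Python, a comment runs from # to the end of the line. The patterns and the text "hello world" are made up for this example.

```python
import re

# The compact form of an expression ...
compact = r"a[a-z]*x"

# ... and the same expression in extended syntax with comments.
# Whitespace and comments are ignored by the regex engine.
extended = r"""(?x)
    a          # a literal "a" at the beginning
    [a-z]*     # any number of lowercase letters
    x          # a literal "x" at the end
"""

text = "abcxfghixj"
same_result = re.sub(compact, "X", text) == re.sub(extended, "X", text)

# A real space that should be matched has to be escaped with \ .
# The unescaped spaces around it are ignored.
spaced = r"(?x) hello \  world"
matches_space = re.fullmatch(spaced, "hello world") is not None

print(same_result)    # True
print(matches_space)  # True
```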

Combination of different modifiers

Example 4

Search for:        Replacement:   Original:        Replaced:

(?is-g)a[A-Z]*x    X              abcxfghixj       Xfghixj

In most of the previous examples, we have activated or deactivated only one modifier at a time. But you can also activate or deactivate several modifiers at once. The box shows an example of how to do that. Here, we activate the modifiers i and s and deactivate the Greedy Mode. So, the expression behaves non-greedily, it makes no distinction between uppercase and lowercase letters, and it treats the whole text as one line. In the replacement, you see the result: the match only goes up to the first "x" (non-greedy), and although there are only lowercase letters in the original string, the expression [A-Z] matches them.
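Example 4 can be approximated as follows, again assuming Python's re module and not the Text Converter. Python has no modifier g, so the deactivated Greedy Mode is written as the non-greedy quantifier *? instead of -g. The modifier s is kept for completeness, although it cannot change the result here because the expression contains no dot.

```python
import re

text = "abcxfghixj"

# i: [A-Z] also matches lowercase letters.
# s: the dot would match line breaks (no visible effect in this pattern).
# *?: non-greedy repetition, Python's counterpart to a deactivated Greedy Mode.
combined = re.sub(r"(?is)a[A-Z]*?x", "X", text)

# Without i, [A-Z] cannot match the lowercase letters, so nothing is replaced.
without_i = re.sub(r"(?s)a[A-Z]*?x", "X", text)

print(combined)   # Xfghixj
print(without_i)  # abcxfghixj
```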

Summary

  • Modifiers make it possible to interpret regular expressions in different ways
  • The modifier i switches the distinction between uppercase and lowercase letters on and off
  • The modifiers m and s define whether a string consisting of multiple lines is treated as one whole string or as a group of several strings (with each line as a new string)
  • The modifier g defines whether a repetition stops at the first possible occurrence of the following character or extends to the last one
  • If you activate the modifier x, you can add whitespace and comments to your regular expressions that will not be interpreted. This makes regular expressions easier to read and to understand.
  • Modifiers can be combined and switched on and off within a regular expression


 

© Stefan Trost - The usage of this tutorial, even in parts, is prohibited without prior written consent of Stefan Trost. But of course, you are welcome to link to this tutorial.

 
  
 
