Regular Expressions - Tutorial Part 3: Meta characters and combinations

 

Type: Tutorial for Beginners
Category: Tools & Utilities
Language: English
Author: Stefan Trost Media
Date: 13.07.2011


About the author

Stefan Trost develops software and web solutions and is happy to take care of your requirements and wishes.


In the first two parts of this regular expression tutorial, we have learned about several variants of regular expressions. In this part, we want to deepen this knowledge with a few examples and have a look at further meta characters. The following parts have been published so far:

Part 1: Basics | Part 2: Normal strings, grouping and repetitions | Part 3: Meta characters and combinations | Part 4: Selections of characters and alternatives | Part 5: Character groups and classes | Part 6: Reusing and backward references | Part 7: Modifiers | Part 8: Usage and Examples

Important Note

To try out the examples and to test your own regular expressions, you can use the software Text Converter in its free Basic version. The first part of this tutorial explains how to use the application.

Meta characters

Up to now, we have seen the meta characters point, plus and asterisk as well as the round and curly brackets. The point stands for any single character. The plus means that the preceding character has to appear at least once, but it can also appear more often. The asterisk means that the preceding character does not have to appear at all, but it can also appear multiple times. With round brackets, we can group several characters, and curly brackets specify exactly how often the preceding character should appear.
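As a quick recap, the following sketch shows these meta characters in action. It uses Python's standard `re` module instead of the Text Converter (an assumption of this example; for simple patterns like these, both should behave alike). `replace` is a hypothetical helper that replaces every match with "X", just like the example tables in this tutorial.

```python
import re

# Sketch using Python's re module as a stand-in for the Text Converter.
def replace(pattern: str, text: str) -> str:
    # Hypothetical helper: replace every match of `pattern` in `text` with "X".
    return re.sub(pattern, "X", text)

print(replace(r"a.c", "abc a-c ac"))    # X X ac   (point: any single character)
print(replace(r"ab+", "a ab abbb"))     # a X X    (plus: b at least once)
print(replace(r"ab*", "a ab abbb"))     # X X X    (asterisk: b zero or more times)
print(replace(r"(ab){2}", "ab abab"))   # ab X     (group "ab" exactly twice)
```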

All in all, regular expressions use the following meta characters: [ ] ( ) { } | ? + - * ^ $ . and \. All other characters, such as digits and letters but also / or ', have no special meaning in regular expressions and simply stand for themselves. For example, if you want to replace a letter, a digit or a /, you only have to write this character in the regular expression.
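A small sketch with Python's `re` module (used here in place of the Text Converter, which is an assumption of this example) illustrates that such ordinary characters simply match themselves:

```python
import re

# Sketch using Python's re module as a stand-in for the Text Converter.
# "/", "'" and digits are ordinary characters: each one matches itself.
slashes = re.sub(r"/", "X", "a/b/c")
quote = re.sub(r"'", "X", "it's")
digits = re.sub(r"07", "X", "13.07.2011")

print(slashes, quote, digits)  # aXbXc itXs 13.X.2011
```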

Search for meta characters with regular expressions

Example 1

Search for:    Replace with:    Original:        After replacement:

.              X                abc. ab.cd       XXXXXXXXXX

\.             X                abc. ab.cd       abcX abXcd

1\+1           X                1+1 1\+1         X 1\+1

\\             X                abc\def          abcXdef

If we want to use a regular expression to search for a character that is itself a meta character, that is, a character with a special meaning, we have to escape it. This is done with the meta character \. As soon as you write a \ in front of a meta character, that character is no longer interpreted as a meta character but as the literal character itself. The first example shows how this works. We replace the same original text twice: once searching for "." and once searching for "\.". In the first line, the point is interpreted as a meta character and stands for an arbitrary character, so every character, including the space, is replaced. In the second line, the point is escaped, so only the real points are replaced.

Since the plus + is also a meta character, it must be escaped whenever we want to replace a literal plus. The third line shows an example: only "1+1" is replaced, but not "1\+1", because the pattern 1\+1 searches for exactly the three characters "1+1". If we want to replace a \ itself, we also have to escape it with a second \. The last line shows an example.
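The rows of example 1 can be reproduced with Python's `re.sub`. This is only a sketch, assuming that Python's regular expression engine is an acceptable stand-in for the Text Converter for these simple patterns; `replace` is a hypothetical helper, not part of any library. Note the raw strings (`r"..."`): without them, Python itself would already interpret some of the backslashes before the pattern reaches the regular expression engine.

```python
import re

# Sketch using Python's re module as a stand-in for the Text Converter.
def replace(pattern: str, text: str) -> str:
    # Hypothetical helper: replace every match of `pattern` in `text` with "X".
    return re.sub(pattern, "X", text)

print(replace(r".", "abc. ab.cd"))    # XXXXXXXXXX  (point = any character)
print(replace(r"\.", "abc. ab.cd"))   # abcX abXcd  (escaped point = literal ".")
print(replace(r"1\+1", r"1+1 1\+1"))  # X 1\+1      (escaped plus = literal "+")
print(replace(r"\\", r"abc\def"))     # abcXdef     (escaped backslash = literal "\")

# re.escape() adds the escaping backslashes automatically.
print(re.escape("1+1"))               # 1\+1
```

`re.escape()` is handy when the text you are searching for is not known in advance, for example when it comes from user input.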

Optional characters

Example 2

Search for:    Replace with:    Original:        After replacement:

ab?            X                abcdacd af ab    XcdXcd Xf X

The meta character question mark ? stands for an optional character: if there is a question mark after a character, this character may appear, but it does not have to. Thus, the regular expression in example 2 means: first, we need an "a"; after that, there may be a "b", but the "b" is not required. So in the example, every "ab" and every single "a" is found and replaced with an "X".
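The following Python sketch (again assuming Python's `re` module as a stand-in for the Text Converter) compares the pattern from example 2 with the same pattern without the question mark:

```python
import re

# Sketch using Python's re module as a stand-in for the Text Converter.
text = "abcdacd af ab"

optional_b = re.sub(r"ab?", "X", text)  # "b" may or may not follow the "a"
required_b = re.sub(r"ab", "X", text)   # "b" must follow the "a"

print(optional_b)  # XcdXcd Xf X
print(required_b)  # Xcdacd af X
```

Without the question mark, the lone "a" characters are no longer matched.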

Start and end of a string

Example 3

      Search for:    Replace with:    Original:        Replaced:

3.1   ^ab$           X                abc def          abc def

3.2   ^ab$           X                ab               X

3.3   ^abc def$      X                abc def          X

3.4   ^ab            X                abc def          Xc def

3.5   ^ab            X                ab               X

3.6   ab$            X                ab               X

3.7   ab$            X                abc def          abc def

3.8   ef$            X                abc def          abc dX

The meta characters ^ and $ stand for the start and the end of a string. The replacements in example 3 demonstrate their behaviour. In 3.1 and 3.2, we are searching for ^ab$, that is, for a string that begins with "ab" and ends directly after it, so the string must consist of exactly "ab". In 3.1, the beginning would match, but only in 3.2 does the end match as well. So a replacement only takes place in 3.2 and not in 3.1.

In example 3.3, we are searching for the complete string from 3.1, so it matches and the whole string is replaced. In 3.4 and 3.5, we are searching for a string beginning with "ab"; now the end does not matter. Thus, in 3.4 only the "ab" at the beginning is replaced, and in 3.5 the complete string is replaced (because there is nothing after "ab"). The examples 3.6, 3.7 and 3.8 deal with the end of the string. In 3.6, the complete string matches the search pattern, so everything is replaced. In 3.7, the original does not end with "ab", so nothing happens. In 3.8, we are searching for "ef" at the end, so a replacement is carried out.
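As a sketch, all eight rows of example 3 can be checked in Python. The list `rows` is our own notation for the table (pattern, original, expected result), and each expected value is what Python's `re.sub` returns for that row; using Python instead of the Text Converter is an assumption of this example.

```python
import re

# Sketch using Python's re module as a stand-in for the Text Converter.
# Our own encoding of example 3: (pattern, original, expected result).
rows = [
    (r"^ab$", "abc def", "abc def"),   # 3.1
    (r"^ab$", "ab", "X"),              # 3.2
    (r"^abc def$", "abc def", "X"),    # 3.3
    (r"^ab", "abc def", "Xc def"),     # 3.4
    (r"^ab", "ab", "X"),               # 3.5
    (r"ab$", "ab", "X"),               # 3.6
    (r"ab$", "abc def", "abc def"),    # 3.7
    (r"ef$", "abc def", "abc dX"),     # 3.8
]

for number, (pattern, original, expected) in enumerate(rows, start=1):
    result = re.sub(pattern, "X", original)
    print(f"3.{number}  {pattern:<10} {original!r:<10} -> {result!r}")
    assert result == expected  # stops if a row does not behave as listed
```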

If you want to test these examples in the Text Converter, pay attention to where the string begins and ends: the start and the end of the string may be the start and the end of the complete file rather than of each line. Later in this tutorial, we will learn about a way to treat each line of a text file as a separate string.
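The difference between the start of a string and the start of a line can be seen in a short Python sketch. The Text Converter is not involved here; Python's `re.MULTILINE` flag is just one example of the kind of option mentioned above, and other tools may name or handle it differently.

```python
import re

# Sketch using Python's re module, not the Text Converter.
text = "ab cd\nab ef"  # one string that contains two lines

# Default: ^ matches only at the very start of the whole string.
whole_string = re.sub(r"^ab", "X", text)

# re.MULTILINE: ^ (and $) also match at the start (end) of every line.
every_line = re.sub(r"^ab", "X", text, flags=re.MULTILINE)

print(repr(whole_string))  # 'X cd\nab ef'
print(repr(every_line))    # 'X cd\nX ef'
```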

An example

We have now seen quite a lot about regular expressions, so it is time for a short break and an example that combines what we have learned. Example 4 shows some combinations of the previous rules.

Example 4

      Search for:    Replace with:   Original:        Replaced:

4.1   .ab            X               ab caab .ab      ab cX X

4.2   \.ab           X               ab caab .ab      ab caab X

Example 4.1 searches for a combination of three characters: the first character is arbitrary, while the last two have to be "a" and "b". The first "ab" is not replaced, because there is no character before it, whereas in "caab" the "aab" matches and is replaced. The point in "Search for" is a meta character that stands for any character, so it also matches the point in ".ab". In example 4.2, the point is escaped. It is therefore no longer a meta character, and only ".ab" is replaced.
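A Python sketch of example 4, again using the standard `re` module as a stand-in for the Text Converter (an assumption of this example):

```python
import re

# Sketch using Python's re module as a stand-in for the Text Converter.
original = "ab caab .ab"

# Unescaped point: any character followed by "ab" ("aab" and ".ab" match).
any_char = re.sub(r".ab", "X", original)

# Escaped point: only a literal "." followed by "ab" matches.
literal_point = re.sub(r"\.ab", "X", original)

print(any_char)       # ab cX X
print(literal_point)  # ab caab X
```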

Summary

  • Regular expressions use the meta characters [ ] ( ) { } | ? + - * ^ $ . and \. These characters have a special meaning within regular expressions.
  • If you want a meta character to stand for itself, write a \ in front of it. The point alone stands for any character, but \. only stands for a point.
  • You can make a character optional by putting a question mark ? after it.
  • If characters should appear at the beginning or the end of a string, you can use the characters ^ and $.


© Stefan Trost - The usage of this tutorial, even in parts, is prohibited without prior written consent of Stefan Trost. But of course, you are welcome to link to this tutorial.

 
  
 
