Regular Expressions - Tutorial Part 6: Reusing and backward references

 


 

Type

Tutorial for Beginners

Category

Tools & Utilities

Language

English

Author

Stefan Trost Media

Date

13.07.2011

 
 



Regular expressions offer many more possibilities than the previous parts have covered. In this part, we look at reusing parts of a found string and at backward references. So far, the following parts have been published:
Part 1: Basics | Part 2: Normal strings, grouping and repetitions | Part 3: Meta characters and combinations | Part 4: Selections of characters and alternatives | Part 5: Character groups and classes | Part 6: Reusing and backward references | Part 7: Modifiers | Part 8: Usage and Examples

Important Note

To try out the examples and to test your own regular expressions, you can use the software Text Converter in its free Basic version. The first part of this tutorial explains how to use the application.

Reusing found characters

In our previous examples, we have only used regular expressions in the search box of the Text Converter. That means we searched for a string with the help of a regular expression and replaced whatever we found with a fixed string. But it is also possible to reuse parts of the found string in the replace box. The following examples show what this makes possible.

Example 1
Search for:    Replacement:   Original:         After replacement:
(.*)           $1             abc               abc
(.*)c          $1             abc               ab
(.*c)          $1             abc               abc

First of all, we have to create a reference in our search term. This is done by putting round brackets around the part of the string that should be reused. The simplest case is line 1 of example 1. We search for all characters with .*, and with the brackets around .* we create a reference to the found string. As a replacement, we use $1. $1 stands for the content that was found within the first bracket. Thus, the text "abc" is found completely, and then "abc" is replaced with "abc". So, after the replacement, we see no difference from the original text.

In the second line, we search for any number of arbitrary characters in front of a "c". The arbitrary characters are surrounded with brackets because we want to insert their content again. The "c" is lost in this procedure because it does not stand inside the brackets. This regular expression finds "abc", but we only insert the part from the brackets, which is "ab". The next line behaves differently. Here, the "c" is also written inside the brackets. Again, "abc" is found and replaced with the content of the bracket. So, once more, there is no difference between the original and the modified text.
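The three lines of example 1 can also be reproduced outside the Text Converter. The following is only a sketch in Python, under the assumption that Python's standard `re` module behaves like the Text Converter for these simple patterns. Note that Python writes `\1` in the replacement where this tutorial writes `$1`. The function name `replace_with_group` is made up for this illustration.

```python
import re


def replace_with_group(pattern: str, text: str) -> str:
    """Replace every match of `pattern` with the content of its first bracket."""
    # r"\1" is Python's spelling of the tutorial's $1.
    return re.sub(pattern, r"\1", text)


print(replace_with_group(r"(.*)", "abc"))   # abc  (everything is inside the bracket)
print(replace_with_group(r"(.*)c", "abc"))  # ab   (the "c" is outside the bracket and is lost)
print(replace_with_group(r"(.*c)", "abc"))  # abc  (the "c" is inside the bracket and is kept)
```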

Example 2
Search for:     Replacement:   Original:       After replacement:
(.*)x(.*)       $2x$1          axc             cxa

In the first example, we used $1 to fetch the content of the first bracket in our regular expression. If we use more than one pair of brackets, we can fetch the content of these brackets with $1, $2, $3 and so on. In example 2, we search for arbitrary characters in front of an "x" and for arbitrary characters after an "x". In the original text, "axc" matches this condition, so this string is found. We replace it with $2x$1. That means: first, we take the content of the second bracket, then we write an "x", and after that, we take the content of the first bracket. The result is "cxa": we have swapped "a" and "c" around the "x".
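As a small sketch of example 2, again assuming Python's `re` module as a stand-in for the Text Converter, the two brackets can be inspected individually before they are swapped. `\2x\1` is Python's spelling of `$2x$1`, and `swap_around_x` is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"(.*)x(.*)")


def swap_around_x(text: str) -> str:
    """Swap the parts before and after the "x", like the replacement $2x$1."""
    return PATTERN.sub(r"\2x\1", text)


match = PATTERN.fullmatch("axc")
print(match.group(1))        # a  (content of the first bracket)
print(match.group(2))        # c  (content of the second bracket)
print(swap_around_x("axc"))  # cxa
```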

Example: Date rewriting

Example 3
Search for:          (\d{4})/(\d{2})/(\d{2})
Replace with:        $3.$2.$1
Original:            2011/07/13
After replacement:   13.07.2011

Let us look at an example in which we can apply this knowledge in practice. We would like to rewrite a date of the form 2011/07/13 into a date of the form 13.07.2011. For this, we use the regular expression shown in example 3. \d stands for a digit (see the last part). We have three groups of digits: the first group has a length of 4, followed by two groups of 2 digits each. These groups are separated by the character /. This character is not a meta character, so we can use it directly in our regular expression. With that, we have a description of our date. We have surrounded the digit blocks with round brackets so that we can reuse them with $1, $2 and $3. We do exactly this in the replace box: we reverse the order of the digit blocks and write dots between them. So, we have formatted our date as desired.
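The date rewriting from example 3 can be sketched in Python as follows. This assumes Python's `re` module, where the replacement is written `\3.\2.\1` instead of `$3.$2.$1`. The function name `reformat_dates` and the second sample sentence are invented for illustration.

```python
import re

# Year (4 digits), month (2 digits), day (2 digits), separated by "/".
DATE_PATTERN = re.compile(r"(\d{4})/(\d{2})/(\d{2})")


def reformat_dates(text: str) -> str:
    """Rewrite every date of the form YYYY/MM/DD in `text` to DD.MM.YYYY."""
    return DATE_PATTERN.sub(r"\3.\2.\1", text)


print(reformat_dates("2011/07/13"))
# 13.07.2011
print(reformat_dates("Published 2011/07/13, updated 2012/01/05"))
# Published 13.07.2011, updated 05.01.2012
```

Text that does not contain a date in this form is returned unchanged, because the pattern finds nothing to replace.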

Backward references

We remember the expression [ab]+ from one of the earlier parts of this tutorial. This expression matches strings like "aaaa" and "bb", but also "abbaa" or "ababaa". [ab]+ only means that an element from the group [ab] has to appear one or more times. Which element of the group that is, is not specified: it can always be the same element, or the elements can differ.

Example 4
Search for:   Replacement:  Original:           After replacement:
[ab]+         X             abab aaaa bb aabb   X X X X
([ab])\1+     X             abab aaaa bb aabb   abab X X XX

If you want to be sure that only the same element from the group repeats, you can use backward references. Backward references are similar to the reuse of characters we have seen in the last section. The difference is that we now reuse characters within the regular expression itself and not from the search box in the replace box.

Example 4 shows how we can use such a reference. First, we have the character group [ab]. Again, we surround this group with round brackets to create a reference. Because this is the first bracket, we use reference 1. Above, we used the expression $1 to fetch the reference; within regular expressions, we use \1 instead. Behind the reference \1, we have written a +. That means that the reference should repeat one or more times.

The expression has the following meaning: the first bracket matches a single a or a single b, and the reference \1 must then repeat exactly this character one or more times. So only runs of the same letter with a length of at least two are found. The example shows that this works. The expression finds "aaaa", "aa" and "bbb", but not a single "b" and never a mixed string such as "aab" or "baba" as a whole. In "abab", no letter is directly followed by itself, so it stays unchanged, while "aaaa" is replaced with only one "X" because the same letter repeats. In "aabb", "aa" and "bb" are found separately, which gives "XX". If single letters should be found as well, write ([ab])\1* instead. Then "abab" is seen as four single occurrences and is replaced with "XXXX".
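To see how the quantifier behind the reference changes the result, the following sketch runs three patterns on the original text of example 4. It assumes Python's `re` module, and the helper name `replace_with_x` is hypothetical. Backward references inside the pattern are written `\1` in Python, just as in this tutorial.

```python
import re

TEXT = "abab aaaa bb aabb"


def replace_with_x(pattern: str, text: str = TEXT) -> str:
    """Replace every match of `pattern` in `text` with a single X."""
    return re.sub(pattern, "X", text)


# Any mix of a and b, one or more characters long.
print(replace_with_x(r"[ab]+"))      # X X X X

# One letter followed by at least one repetition of the same letter.
print(replace_with_x(r"([ab])\1+"))  # abab X X XX

# One letter followed by zero or more repetitions of the same letter.
print(replace_with_x(r"([ab])\1*"))  # XXXX X X XX
```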

Multiple use of a backward reference

Example 5
Search for:     Replacement:  Original:          After replacement:
([ab])x\1x\1    X             axaxa abab bxbxb   X abab X

It is possible to use a backward reference several times. Example 5 shows this. Our string should begin with a character from the group [ab]. With the round brackets, we have turned this group into the reference \1. After the character from the group, an "x" has to occur, then a reference to the group (thus an "a" or a "b", depending on what was found in the first bracket), then another "x" and once more the character from the first bracket. So, this expression only matches "axaxa" and "bxbxb", but no other strings or combinations such as "abab".
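A short sketch of example 5, assuming Python's `re` module. It also checks a string in which the letters around the "x" differ. The name `replace_triples` is made up for this illustration.

```python
import re

# One letter from [ab], then "x", the same letter, "x", and the same letter again.
TRIPLE = re.compile(r"([ab])x\1x\1")


def replace_triples(text: str) -> str:
    """Replace every match of ([ab])x\\1x\\1 with a single X."""
    return TRIPLE.sub("X", text)


print(replace_triples("axaxa abab bxbxb"))  # X abab X
print(TRIPLE.search("axbxa") is None)       # True, because the letters differ
```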

Several backreferences

Example 6
Search for:          Replacement:  Original:           After replacement:
([ab])x([cd])x\1\2   X             axdxad ab bxdxbd    X ab X
([ab])x([cd])x\2     X             axcxc ab bxcxc      X ab X

In the previous two examples, we have only used one backward reference, \1. If you use more than one pair of brackets, you get several backward references. Example 6 shows this. Here, we would like to have a character from the group [ab], then an "x", then a character from the group [cd] and another "x". At the end, the character from the first group and the character from the second group should appear once more. With this, we have defined two backward references. This regular expression matches "axdxad" and "bxdxbd" from the example, but also "axcxac" or "bxcxbc".

We are not forced to use every backward reference we have created. In the second line, we define a reference to the group [ab], but we do not use \1. Instead, we only use the reference to the second group with \2. Sometimes, you need round brackets to combine parts of the expression, but you do not want to use these parts in a reference.
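Both lines of example 6 can be sketched in Python, again assuming the `re` module as a stand-in for the Text Converter. The third pattern uses Python's non-capturing bracket `(?:...)`, which groups without creating a reference. This syntax is not covered in this tutorial, and whether the Text Converter supports it is an assumption that would have to be checked. All function names are hypothetical.

```python
import re


def replace_both_refs(text: str) -> str:
    """Line 1 of example 6: both references \\1 and \\2 are used."""
    return re.sub(r"([ab])x([cd])x\1\2", "X", text)


def replace_second_ref_only(text: str) -> str:
    """Line 2 of example 6: \\1 is defined but never used."""
    return re.sub(r"([ab])x([cd])x\2", "X", text)


def replace_non_capturing(text: str) -> str:
    """Same matches as line 2, but [ab] creates no reference, so [cd] becomes \\1."""
    return re.sub(r"(?:[ab])x([cd])x\1", "X", text)


print(replace_both_refs("axdxad ab bxdxbd"))      # X ab X
print(replace_second_ref_only("axcxc ab bxcxc"))  # X ab X
print(replace_non_capturing("axcxc ab bxcxc"))    # X ab X
```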

Summary

  • Characters that should be reused or recalled with a reference have to be enclosed in round brackets
  • The references are numbered, starting with 1 for the first bracket
  • If you want to use references from the search expression in the replacement (when searching and replacing), you can call them with $1, $2, $3 and so on
  • References within a regular expression are also possible. Here, you can use \1, \2, \3 and so on

© Stefan Trost - The usage of this tutorial, even in parts, is prohibited without prior written consent of Stefan Trost. But of course, you are welcome to link to this tutorial.

 
  
 
