C# Regular Expression (Regex) Examples in .NET
More Advanced Regular Expression Syntax
This article continues from Learn Regular Expression (Regex) syntax with C# and .NET and covers
character escapes, match grouping, some C# code examples, matching boundaries and RegexOptions.
Matching special characters with character escapes
Special characters such as Tab and carriage return are matched using character escapes. The syntax is similar to C and C#. The common
character escapes are listed below.
| Special Character | Description |
|---|---|
| \t | Matches a tab |
| \r | Matches a carriage return |
| \n | Matches a new line |
| \u0020 | Matches a Unicode character using hexadecimal representation. Exactly four digits must be specified. |
In this example, the Regular Expression pattern matches one or more word characters followed by a carriage return then a new line.
Text: an anaconda ate
Anna Jones
Regex: \w+\r\n
Match:
ate
Depending on your operating system you might have to combine the \r and \n character escapes to create the
correct new line sequence for your platform. For Microsoft Windows systems you should generally use \r\n which is a carriage
return then line feed (CRLF). To simply match the end of the string use the dollar sign ($). With RegexOptions.Multiline it also
matches at the end of each line. Note that $ matches just before the \n, so in CRLF text the \r still counts as part of the line.
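When the input may come from either Windows or Unix systems, one option is to make the carriage return optional with \r?. The following is a minimal sketch of that idea, not a definitive approach; the class name LineEndingExample and the method WordsBeforeLineBreak are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineEndingExample
{
    // Returns every word that is immediately followed by a line break.
    // \r? makes the carriage return optional, so both CRLF and LF endings match.
    public static List<string> WordsBeforeLineBreak(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\w+\r?\n"))
        {
            // The match value includes the line-break characters, so trim them off.
            words.Add(m.Value.TrimEnd('\r', '\n'));
        }
        return words;
    }

    public static void Main()
    {
        // The first line ends with CRLF, the second with LF only.
        string text = "an anaconda ate\r\nAnna Jones\nthe end";
        foreach (string word in WordsBeforeLineBreak(text))
        {
            Console.WriteLine(word); // prints "ate" then "Jones"
        }
    }
}
```

The last line ("the end") has no trailing line break, so it produces no match.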
Match Grouping
Groups perform a few different functions. They allow the quantifiers (such as plus and star) to be applied to sections of the match
instead of just individual characters.
A group is specified by the round brackets ( and ). If you want to match the round bracket characters you
must use the escape character before the bracket e.g. \( or \).
This regex matches 'http://', optionally followed by 'www.'. It then starts a group that matches one or more of
any character that is not a full stop/period (.), closes the group, and finally matches '.com'.
Text: http://www.yahoo.com/index.html and http://yahoo.com
Regex: http://(www\.)?([^\.]+)\.com
Matches:
http://www.yahoo.com
http://yahoo.com
The question mark after the group (www\.) applies to the whole group making it optional.
An example in C#
The regular expression classes are in the System.Text.RegularExpressions namespace.
using System.Text.RegularExpressions;
The Regex class represents a regular expression. A regular expression pattern must be specified when creating a Regex
object. The pattern cannot be changed once the object has been created.
Regex exp = new Regex(
@"http://(www\.)?([^\.]+)\.com",
RegexOptions.IgnoreCase);
string InputText = "http://www.yahoo.com/";
The MatchCollection class stores a list of successful matches found by applying the regular expression pattern to an input
string.
MatchCollection MatchList = exp.Matches(InputText);
Match FirstMatch = MatchList[0];
Console.WriteLine(FirstMatch.Value);
The Group class represents a group within the regex pattern. Each Match object has a Groups collection.
Group GroupCurrent;
for (int i = 1; i < FirstMatch.Groups.Count; i++)
{
    GroupCurrent = FirstMatch.Groups[i];
The Success property on the group can be used to check if the Group matched or not.
    if (GroupCurrent.Success)
    {
        Console.WriteLine("\tMatched:" + GroupCurrent.Value);
    }
    else
    {
        Console.WriteLine("\tGroup didn't match");
    }
}
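Groups within a Match can be referenced by number or by name (see below). Group 0 is always the whole match, so the first
group in the pattern is number 1.
if (MatchList.Count > 0)
{
    // Index the Groups collection of the first match, not the MatchCollection itself.
    if (MatchList[0].Groups[1].Success)
    {
        Console.WriteLine("Group 1 matched");
    }
}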
Groups also allow sections of the match to be used in replacement expressions when using Regex.Replace().
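As an illustration of groups in a replacement expression, the sketch below reuses the pattern from the example above and refers to the second group as $2 in the replacement string. The class name ReplaceExample and the method ToHttpsWithoutWww are hypothetical, and rewriting the address to https is only an assumption made to give the replacement something to do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceExample
{
    // Rewrites each matched address, keeping only the text captured by group 2
    // (the domain name) and dropping the optional "www." captured by group 1.
    public static string ToHttpsWithoutWww(string input)
    {
        return Regex.Replace(
            input,
            @"http://(www\.)?([^\.]+)\.com",
            "https://$2.com",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string text = "http://www.yahoo.com/index.html and http://yahoo.com";
        // prints "https://yahoo.com/index.html and https://yahoo.com"
        Console.WriteLine(ToHttpsWithoutWww(text));
    }
}
```

Text that the pattern does not match, such as "/index.html and ", is copied to the result unchanged.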
Named Groups
Groups can be named to allow easier identification with the following syntax.
(?<NameOfGroup>expression)
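A named group is read back through the Groups collection using its name as the index. The sketch below applies this to the earlier pattern; the group names prefix and domain, the class NamedGroupExample and its methods are all hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupExample
{
    // The same pattern as before, with both groups given names.
    private static readonly Regex UrlPattern = new Regex(
        @"http://(?<prefix>www\.)?(?<domain>[^\.]+)\.com",
        RegexOptions.IgnoreCase);

    // Returns the text captured by the "domain" group, or an empty string if there is no match.
    public static string DomainOf(string url)
    {
        Match m = UrlPattern.Match(url);
        return m.Success ? m.Groups["domain"].Value : string.Empty;
    }

    // Reports whether the optional "prefix" group took part in the match.
    public static bool HasWwwPrefix(string url)
    {
        Match m = UrlPattern.Match(url);
        return m.Success && m.Groups["prefix"].Success;
    }

    public static void Main()
    {
        Console.WriteLine(DomainOf("http://www.yahoo.com/"));     // prints "yahoo"
        Console.WriteLine(HasWwwPrefix("http://www.yahoo.com/")); // prints "True"
        Console.WriteLine(HasWwwPrefix("http://yahoo.com"));      // prints "False"
    }
}
```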
Matching boundaries between words
To match a boundary between a word character (\w) and a non-word character (\W) use \b. It matches a position rather than
a character, so it consumes no text. The match occurs immediately before the first or after the last character of a word, including
at the start or end of the string. Note that \w covers letters, digits and the underscore. For example, the following Regular Expression
matches one or more word characters followed by a word boundary followed by a hyphen (-) followed by another word boundary followed
by one or more word characters.
Text: Anna Jones and John William-Scott went to lunch- with an anaconda
Regex: \w+\b-\b\w+
Options: IgnoreCase
Matches:
William-Scott
Use \B to specify that a match must not occur on a \b boundary.
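The difference between \b and \B can be seen by counting matches of the same word with each. This is a small sketch under assumed names: the class BoundaryExample, its methods and the sample text "cat concatenate cat." are all made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryExample
{
    // Counts occurrences of "cat" that stand alone as a whole word.
    public static int CountWholeWord(string text)
    {
        return Regex.Matches(text, @"\bcat\b").Count;
    }

    // Counts occurrences of "cat" that sit strictly inside a longer word.
    public static int CountInsideWord(string text)
    {
        return Regex.Matches(text, @"\Bcat\B").Count;
    }

    // Returns the first hyphenated word, using the pattern from the example above.
    public static string FirstHyphenatedWord(string text)
    {
        Match m = Regex.Match(text, @"\w+\b-\b\w+");
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        string text = "cat concatenate cat.";
        Console.WriteLine(CountWholeWord(text));  // prints 2
        Console.WriteLine(CountInsideWord(text)); // prints 1

        // prints "William-Scott"
        Console.WriteLine(FirstHyphenatedWord(
            "Anna Jones and John William-Scott went to lunch- with an anaconda"));
    }
}
```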
Regular Expression Options
Regular Expression Options can be passed to the constructor of the Regex class. Several options can be combined with the bitwise
OR operator (|).
RegexOptions.None - Specifies that no options are set.
RegexOptions.IgnoreCase - Specifies case-insensitive matching.
RegexOptions.Multiline - Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively,
of any line, and not just the beginning and end of the entire string.
RegexOptions.Singleline - Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character
(instead of every character except \n).
RegexOptions.ExplicitCapture - Specifies that the only valid captures are explicitly named or numbered groups of the form
(?<name>...). Plain round brackets then group without capturing.
RegexOptions.IgnorePatternWhitespace - Eliminates unescaped white space from the pattern and enables comments marked with the
hash sign (#).
RegexOptions.Compiled - Specifies that the regular expression is compiled to MSIL code instead of being interpreted. The regular
expression will be faster to match but it takes more time to construct initially. This option (although tempting) should only be used
when the same expression will be used many times, e.g. inside a foreach loop.
RegexOptions.ECMAScript - Enables ECMAScript-compliant behavior for the expression. This flag can be used only in conjunction
with the IgnoreCase, Multiline, and Compiled flags. The use of this flag with any other flags results in an exception.
RegexOptions.RightToLeft - Specifies that the search will be from right to left instead of from left to right.
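To show how some of these options behave in practice, here is a short sketch that combines them with the bitwise OR operator. The class OptionsExample, its method names and the sample input are hypothetical; the pattern in IsDotComUrl is the one used earlier in this article with an added ^ anchor.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsExample
{
    // Multiline lets ^ match at the start of every line, and IgnoreCase
    // lets "anna" match "Anna" as well.
    public static int CountLinesStartingWithAnna(string text)
    {
        return Regex.Matches(
            text,
            "^anna",
            RegexOptions.Multiline | RegexOptions.IgnoreCase).Count;
    }

    // IgnorePatternWhitespace allows the pattern to be laid out over several
    // lines with a # comment next to each part.
    public static bool IsDotComUrl(string url)
    {
        string pattern = @"
            ^http://     # scheme at the start of the input
            (www\.)?     # optional www. prefix
            ([^\.]+)     # domain name
            \.com        # literal .com";
        return Regex.IsMatch(
            url,
            pattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string text = "an anaconda ate\nAnna Jones\nanna smith";
        Console.WriteLine(CountLinesStartingWithAnna(text));   // prints 2
        Console.WriteLine(IsDotComUrl("HTTP://www.yahoo.com/")); // prints "True"
        Console.WriteLine(IsDotComUrl("ftp://yahoo.com"));       // prints "False"
    }
}
```

Without RegexOptions.Multiline the ^ anchor would only match at the very start of the string, and the first method would find no matches in the sample text.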