Regular Expressions in C#

Published 19th January 2009 by

Regular expressions are special strings that are used to describe a search pattern. They can be used for data validation, data processing and pattern matching.

I love it when people say "a simple way to do XYZ is to use regular expressions" and then offer what amounts to a string of indecipherable hieroglyphics to answer the question. However, once you know how to leverage the power of regular expressions, they can be very useful tools.

Firstly, let's talk about what exactly a regular expression is. A regular expression is a string of characters that forms what is known as a pattern. This pattern can then be used to match a part, or parts, of another string. In some languages, such as Perl, PHP and JavaScript, there are start and end delimiter characters to indicate where the pattern starts and stops, with a seemingly random bunch of characters in between; in C# the pattern is simply a string. These seemingly random characters are in fact representations of smaller patterns to match, for example, letters, numbers, punctuation or whitespace.

Regular expressions are a fast and efficient method for string manipulation and can save tens of lines of code for complex operations. For example, an email address can be validated in just 4 lines of code, and that's with the lines split up. You could do it in one line!

One thing that is very frustrating is that while regular expressions are fairly generic, each "engine" has its own implementation, so it is rarely a simple case of copying and pasting a pattern and expecting it to work. Examples of languages and platforms with their own regular expression engines are Perl, PHP, .NET, Java, JavaScript, Python, Ruby and POSIX.

The term regular expression is often shortened to just regex. It's easier to say and type, so I'm going to use that from now on.

Regular expressions in C# are defined within the System.Text.RegularExpressions namespace, which provides a Regex class. When instantiating the class you need to pass the expression string to the constructor. The examples below use a verbatim string (prefixed with @) where the regex contains backslashes, as the regex is easier to read if you don't have to escape them.
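
As a rough illustration of why verbatim strings help, the sketch below builds the same pattern twice, once as a regular string literal and once as a verbatim string. The class name VerbatimPatternDemo, the method BothMatch and the sample inputs are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPatternDemo
{
    // Returns true when both spellings of the pattern match the input.
    public static bool BothMatch(string input)
    {
        // Regular string literal: every backslash has to be doubled.
        Regex escaped = new Regex("\\d+\\s\\w+");
        // Verbatim string literal: backslashes are written once.
        Regex verbatim = new Regex(@"\d+\s\w+");
        return escaped.IsMatch(input) && verbatim.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(BothMatch("42 years"));   // True
        Console.WriteLine(BothMatch("no digits"));  // False
    }
}
```

Both Regex instances describe the same pattern (digits, one whitespace character, then word characters); only the C# string syntax differs.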

Finding Values with RegEx

One of the basic methods of the Regex class is called IsMatch. It simply returns true or false, depending on whether at least one match is found in the test string.

For our first RegEx example, we can use IsMatch to see if a string contains a number.

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Regex regex = new Regex("[0-9]+");
if (regex.IsMatch(testString))
    Console.WriteLine("String contains numbers!");
else
    Console.WriteLine("String does NOT contain numbers!");

Capture Values with RegEx

In this example, we'll capture the number found in the test string and present it to the user, instead of just verifying that it's there.

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Regex regex = new Regex("[0-9]+");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Number found: " + match.Value);

The Index and Length properties of match can be used to find out the location of the match in the string and length of the match.
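
A minimal sketch of reading those two properties is shown below. The class MatchPositionDemo and the helper DescribeFirstNumber are hypothetical names used only for this example, and the static Regex.Match method is used instead of creating a Regex instance.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchPositionDemo
{
    // Describes where the first run of digits sits in the input.
    public static string DescribeFirstNumber(string input)
    {
        Match match = Regex.Match(input, "[0-9]+");
        if (!match.Success)
            return "No number found";
        return "Value " + match.Value + " at index " + match.Index + ", length " + match.Length;
    }

    public static void Main()
    {
        string testString = "Franklin Moyer, 42 years old, born in Seattle";
        // Prints: Value 42 at index 16, length 2
        Console.WriteLine(DescribeFirstNumber(testString));
    }
}
```

Index is zero-based, so 16 means the match starts at the seventeenth character of the string.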

Group Matching with RegEx

In the previous two examples we saw how to find and extract a number from a string, so now let's look at groups and see how to extract both the age and the name.

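This new pattern captures everything up to the first comma (the name) in the first capture group. It then looks for the separating comma and a whitespace character and, after that, a number, which is placed in the second capture group.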

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Regex regex = new Regex(@"^([^,]+),\s([0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups[1].Value + ". Age: " + match.Groups[2].Value);

The Groups property is used to access the matched groups. Index 0 contains the entire match, while index 1 is for the name and 2 for the age.
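
To make the group numbering concrete, here is a small sketch that lists every group of a match in index order. GroupDemo and ExtractGroups are names invented for this example; the pattern is the name-and-age pattern described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the value of every group, with index 0 holding the whole match.
    public static string[] ExtractGroups(string input)
    {
        Match match = Regex.Match(input, @"^([^,]+),\s([0-9]+)");
        if (!match.Success)
            return new string[0];

        string[] values = new string[match.Groups.Count];
        for (int i = 0; i < match.Groups.Count; i++)
            values[i] = match.Groups[i].Value;
        return values;
    }

    public static void Main()
    {
        string[] groups = ExtractGroups("Franklin Moyer, 42 years old, born in Seattle");
        for (int i = 0; i < groups.Length; i++)
            Console.WriteLine("Group " + i + ": " + groups[i]);
        // Group 0: Franklin Moyer, 42
        // Group 1: Franklin Moyer
        // Group 2: 42
    }
}
```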

Validation with RegEx

Regular expressions can also be used for input validation. In this example, we test two strings to see if they contain a valid email address. The emailRegex pattern matches most common email address formats, although no single regex covers every address the standards allow.

string ValidEmailAddress = "somebody@somedomain.com";
string InvalidEmailAddress = "invalid.email-address.com&somedomann..3";
string emailRegex = @"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$";

Regex RegularExpression = new Regex(emailRegex);

if (RegularExpression.IsMatch(ValidEmailAddress))
  Console.WriteLine("{0}: is Valid Email Address", ValidEmailAddress);
else
  Console.WriteLine("{0}: is NOT a Valid Email Address", ValidEmailAddress);

if (RegularExpression.IsMatch(InvalidEmailAddress))
  Console.WriteLine("{0}: is Valid Email Address", InvalidEmailAddress);
else
  Console.WriteLine("{0}: is NOT a Valid Email Address", InvalidEmailAddress);
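
One way to package this check for reuse is sketched below. The names EmailCheckDemo and LooksLikeEmail are hypothetical, and RegexOptions.IgnoreCase is an addition that is not part of the example above: the pattern only lists lowercase letters for the domain, so without it an address with uppercase characters in the domain would be rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheckDemo
{
    // Same pattern as emailRegex, created once and reused for every call.
    // IgnoreCase is an assumption added for this sketch.
    private static readonly Regex EmailPattern = new Regex(
        @"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$",
        RegexOptions.IgnoreCase);

    // Returns true when the candidate matches the pattern; null is never valid.
    public static bool LooksLikeEmail(string candidate)
    {
        return candidate != null && EmailPattern.IsMatch(candidate);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeEmail("somebody@somedomain.com"));                 // True
        Console.WriteLine(LooksLikeEmail("Somebody@SomeDomain.COM"));                 // True
        Console.WriteLine(LooksLikeEmail("invalid.email-address.com&somedomann..3")); // False
    }
}
```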

In ASP.Net you can use a RegularExpressionValidator to validate user input in forms on the client and server side. The pattern to test against is set in the ValidationExpression attribute.

<asp:TextBox ID="txtEmail" runat="server" ></asp:TextBox>  
    <asp:RegularExpressionValidator ID="RegularExpressionValidator1" runat="server"   
        ErrorMessage="Please enter a valid email address"   
        ToolTip="Please enter a valid email address"   
        ValidationExpression="^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$"
        ControlToValidate="txtEmail" ForeColor="Red">Please enter a valid email address</asp:RegularExpressionValidator>

Search/Replace with RegEx

Another powerful feature of regular expressions is the ability to perform complex search and replace operations. We'll use the Replace() method to remove whitespace (spaces, tabs) from a string.

string stringValue = "Hello World 12345, Testing! - We are good!";
Regex regex = new Regex(@"\s+");
stringValue = regex.Replace(stringValue, string.Empty);

How about removing anything that is not alphanumeric (useful for input sanitisation)?

string stringValue = "Hello World 12345, Testing! - We are good!";
Regex regex = new Regex("[^a-zA-Z0-9]");
stringValue = regex.Replace(stringValue, string.Empty);

Next, we'll see how to strip HTML tags from a string using RegEx.

string stringValue = "<b>Hello, <i>world</i></b>";
Regex regex = new Regex("<[^>]+>");
string cleanString = regex.Replace(stringValue, string.Empty);
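
The three replacements above can be collected into one small program to see what each produces. This is only a sketch: the class ReplaceDemo and its method names are invented here, and the tag-stripping pattern is a simple approach that is not a substitute for a real HTML parser or sanitiser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Removes every run of whitespace (spaces, tabs, newlines).
    public static string RemoveWhitespace(string input)
    {
        return Regex.Replace(input, @"\s+", string.Empty);
    }

    // Removes every character that is not a letter a-z, A-Z or a digit 0-9.
    public static string KeepAlphanumeric(string input)
    {
        return Regex.Replace(input, "[^a-zA-Z0-9]", string.Empty);
    }

    // Removes anything that looks like a tag: '<', some non-'>' characters, '>'.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<[^>]+>", string.Empty);
    }

    public static void Main()
    {
        string text = "Hello World 12345, Testing! - We are good!";
        Console.WriteLine(RemoveWhitespace(text));   // HelloWorld12345,Testing!-Wearegood!
        Console.WriteLine(KeepAlphanumeric(text));   // HelloWorld12345TestingWearegood
        Console.WriteLine(StripTags("<b>Hello, <i>world</i></b>")); // Hello, world
    }
}
```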

You can even use DataAnnotations on your models to validate user input on both the client and server sides.

[RegularExpression(@"^([\w\.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", ErrorMessage = "Please enter a valid email address")]  
public string EmailAddress { get; set; }
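
Outside of MVC model binding, the same attribute can be exercised by hand with the Validator class from System.ComponentModel.DataAnnotations. The sketch below assumes a hypothetical ContactModel class and an IsValid helper, neither of which comes from the original example.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical model used only for this sketch.
public class ContactModel
{
    [RegularExpression(@"^([\w\.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", ErrorMessage = "Please enter a valid email address")]
    public string EmailAddress { get; set; }
}

public static class ModelValidationDemo
{
    // Runs every validation attribute on the model and reports the overall result.
    public static bool IsValid(ContactModel model)
    {
        List<ValidationResult> results = new List<ValidationResult>();
        ValidationContext context = new ValidationContext(model, null, null);
        // The final 'true' asks the validator to check all properties,
        // not only those marked [Required].
        return Validator.TryValidateObject(model, context, results, true);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid(new ContactModel { EmailAddress = "somebody@somedomain.com" })); // True
        Console.WriteLine(IsValid(new ContactModel { EmailAddress = "not-an-email" }));            // False
    }
}
```

Note that RegularExpression treats a null or empty value as valid, so a [Required] attribute would be needed as well if the field must be filled in.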

Formatting Data with RegEx

You can also use Regex.Replace to format strings. This example takes a series of digits and formats them as a telephone number.

string number = "0123456789";
string formatted = Regex.Replace(number, @"(\d{3})(\d{3})(\d{4})", "($1) $2-$3");
// formatted = "(012) 345-6789"
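
The replacement string refers to the capture groups by number: $1 is the first group, $2 the second and $3 the third. A slightly more defensive sketch is shown below; the PhoneFormatDemo class, the FormatPhone helper and the decision to return non-matching input unchanged are all assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormatDemo
{
    // Formats exactly ten digits as "(xxx) xxx-xxxx"; anything else is returned unchanged.
    public static string FormatPhone(string digits)
    {
        if (!Regex.IsMatch(digits, @"^\d{10}$"))
            return digits;
        return Regex.Replace(digits, @"^(\d{3})(\d{3})(\d{4})$", "($1) $2-$3");
    }

    public static void Main()
    {
        Console.WriteLine(FormatPhone("0123456789")); // (012) 345-6789
        Console.WriteLine(FormatPhone("12345"));      // 12345
    }
}
```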

Common RegEx Patterns

You can use any of these patterns to match common input formats for validation and sanitisation.

Pattern     Description
\d          Any digit
\D          Any non-digit character
.           Any character
[abc]       Only a, b, or c
[^abc]      Not a, b, nor c
[a-z]       Characters a to z
[0-9]       Numbers 0 to 9
\w          Any alphanumeric character
\W          Any non-alphanumeric character
{m}         m repetitions
{m,n}       m to n repetitions
*           Zero or more repetitions
+           One or more repetitions
?           Optional character
\s          Any whitespace
\S          Any non-whitespace character
^...$       Starts and ends
(...)       Capture group
(a(bc))     Capture sub-group
(.*)        Capture all
(abc|def)   Matches abc or def
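
A few of the building blocks listed above can be tried out quickly with a sketch like the following. PatternTableDemo and its Matches helper are hypothetical names, and the inputs are arbitrary samples.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternTableDemo
{
    // Thin wrapper so each pattern/input pair reads on one line.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(Matches(@"^\d{3}$", "123"));             // True: exactly three digits
        Console.WriteLine(Matches(@"^\d{3}$", "12a"));             // False: 'a' is not a digit
        Console.WriteLine(Matches(@"^[^abc]+$", "xyz"));           // True: no a, b or c present
        Console.WriteLine(Matches(@"^(abc|def)$", "def"));         // True: second alternative
        Console.WriteLine(Matches(@"^\w+\s\w+$", "Hello World"));  // True: word, whitespace, word

        // Nested groups are numbered by the position of their opening bracket.
        Match nested = Regex.Match("abc", "(a(bc))");
        Console.WriteLine(nested.Groups[1].Value + " / " + nested.Groups[2].Value); // abc / bc
    }
}
```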

Common Validation Patterns

Email address:
^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$

Website:
((www\.|(http|https|ftp|news|file)+://)[_.a-z0-9-]+\.[a-z0-9/_:@=.+?,##%&~-]*[^.|'|# |!|(|?|,| |>|<|;|)])

Tutorial Series

This post is part of the series Introduction to Programming with C#.