Email regex usage
Before jumping into C# regular expressions, it helps to understand the context, because it broadens the range of situations in which you can apply an email regex. Let's look at when you would want to use one.
Email format validation
C# powers countless web applications around the world. When you first visit a website and want to use its service, it commonly asks you to create an account, and one of the details you have to provide is your email address. To businesses, your email address is an important detail for communication and marketing purposes. If a company fails to collect valid email addresses, it loses a way to reach out to that customer. Furthermore, hackers and scammers may sign up with an invalid email address so that they can explore the website and look for a way to infiltrate the system.
Valid email patterns
Regular expressions in C# can be used to validate email addresses. The full email format is defined by international standards, which are quite permissive. The simplified rules that this article uses for the prefix (the part before "@") are as follows.
- It may contain letters, numbers, and a few special characters: underscores, periods, and dashes.
- Underscores, periods, and dashes must be followed by one or more letters or numbers.
Let's have a look at some examples of valid emails.
- example@email.com
- example.abc@email.com
- example_abc@email.com
Some of the invalid email addresses look like the following.
- example-@email.com
- example..abc@email.com
- .example@email.com
- example#example@email.com
All these rules can be expressed as a regular expression. Because regex syntax is largely shared across programming languages, once you define the pattern you can reuse it wherever you need to check the validity of email addresses.
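As a rough illustration, the rules above could be encoded in a pattern like the one below. This is a minimal sketch rather than a full implementation of the email standards: the pattern itself, the `EmailRules` class, and the `IsValid` helper are assumptions made for this example, and the domain part is checked with a similarly simplified rule.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRules
{
    // Prefix: runs of letters/digits, optionally joined by a single '_', '.', or '-'.
    // Domain: labels joined by '.' or '-', ending in a dot plus at least two letters.
    public const string Pattern =
        @"^[A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9]+([.-][A-Za-z0-9]+)*\.[A-Za-z]{2,}$";

    public static bool IsValid(string email) => Regex.IsMatch(email, Pattern);

    public static void Main()
    {
        // The valid and invalid examples listed above.
        string[] samples =
        {
            "example@email.com", "example.abc@email.com", "example_abc@email.com",
            "example-@email.com", "example..abc@email.com", ".example@email.com",
            "example#example@email.com"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine($"{sample}: {IsValid(sample)}");
        }
    }
}
```

Running it should report the first three samples as matching and the remaining four as not matching.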
Email extraction out of unstructured text string
Regex is not only used in web applications to validate email addresses; it is also useful in the data domain. Data teams often have to handle unstructured data such as plain text, and a large body of free text can contain important information such as email addresses. You can use a regular expression to detect and extract them. After extraction, you can store the email addresses in structured storage such as a relational database, or in a semi-structured format such as CSV or JSON.
Personal information protection
Identity theft is an important topic in data security. As enterprises deal with ever-growing data, it becomes a good practice to de-personalize customer details to prevent personal data from being used illegally. Among many types of personal details, email addresses can be extremely useful to hackers since they can be used as usernames for other online services.
If your customers' email addresses are mixed in a long string or plain text, you can use a regular expression to find and replace them so that you can either remove all email addresses or switch them to de-personalized email addresses. This makes sure that even if there is a data breach, the infiltrator cannot get real personal information.
Prefix split for username creation or duplication check
A valid email address has a prefix and a domain part and the two sections are split by the symbol, "@". When you sign up for a service using your email, some companies use the prefix as a username. When you log into the service, the company uses that username and displays it under your profile.
Another use case is a duplication check. To attract more customers, companies often provide a free trial for a certain period. To keep taking advantage of the free trial, some people create several email addresses that share the same prefix but use different domains. By splitting an email address at the "@" symbol and comparing the prefix with historical records, you can flag sign-ups that likely come from the same person.
Domain validation to block suspicious activity
Similarly, you can use the same split strategy to get the domain name and check whether it is a trustworthy domain. If you do not check the domain that a user enters, you may let users register with a random email address before they carry out suspicious activities. To reduce that risk, it is best to stop them at an early stage, and splitting the address to obtain the domain is an important step in that validation.
Email regex in C#
We have seen when an email regex is useful and how it helps in the five scenarios above. Let's now look at how to implement each of them in C#.
Email format validation
In C#, the Regex class is available in the System.Text.RegularExpressions namespace. The sample code below shows how you can validate an email address.
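A minimal sketch of such a check might look like the following. The pattern and the names `EmailFormatCheck`, `EmailRegex`, and `Check` are assumptions for this example; the pattern follows the simplified rules from earlier in the article, not the full email standards.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailFormatCheck
{
    // Simplified pattern (an assumption for this sketch).
    public const string EmailRegex =
        @"^[A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9]+([.-][A-Za-z0-9]+)*\.[A-Za-z]{2,}$";

    // Regex.Match always returns a Match object; inspect Success to see the outcome.
    public static Match Check(string email) => Regex.Match(email, EmailRegex);

    public static void Main()
    {
        string email = "example@email.com";
        Match match = Check(email);

        if (match.Success)
        {
            Console.WriteLine("This is a valid email");
            Console.WriteLine(match.Value); // the matched text
        }
        else
        {
            Console.WriteLine("This is not a valid email");
        }
    }
}
```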
If the email entered by a user is valid, the program prints "This is a valid email" because the Match object returned by Regex.Match() has its Success property set to true. If not, execution goes to the else branch. Regex.Match() also exposes the matched text through the Value property. For example, for the input example@email.com, printing Value outputs "example@email.com".
To try an invalid email, change the email variable to an invalid address such as example..abc@email.com and run the program again.
When nothing matches, Regex.Match() still returns a Match object, but its Success property is false and its Value is an empty string.
Email extraction out of unstructured text string
Data engineers and software developers sometimes have to handle unstructured data. There can be a situation where you have to extract email addresses from plain text and store extracted emails in a nice clean way.
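One way to sketch this is with Regex.Matches() and the RegexOptions.IgnoreCase option. The sample text, the unanchored pattern, and the names `EmailExtraction`, `SamplePattern`, and `Extract` are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailExtraction
{
    // Unanchored, lowercase-only pattern; IgnoreCase lets it match uppercase input too.
    public const string SamplePattern =
        @"[a-z0-9]+(?:[._-][a-z0-9]+)*@[a-z0-9]+(?:[.-][a-z0-9]+)*\.[a-z]{2,}";

    public static List<string> Extract(string plainText, string emailRegex)
    {
        var found = new List<string>();
        // Regex.Matches returns every non-overlapping match as a collection.
        foreach (Match match in Regex.Matches(plainText, emailRegex, RegexOptions.IgnoreCase))
        {
            found.Add(match.Value);
        }
        return found;
    }

    public static void Main()
    {
        string plainText =
            "Contact Alice at alice.smith@example.com or Bob at BOB_JONES@Example.org. " +
            "Send invoices to billing@shop-example.co.uk by Friday.";
        string emailRegex = SamplePattern;

        foreach (string email in Extract(plainText, emailRegex))
        {
            Console.WriteLine(email);
        }
    }
}
```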
The sample code above prints the three email addresses hidden in the text. Note that the Regex.Matches() call uses the case-insensitive option and returns all matched strings in a collection.
Personal information protection
There can be a case where we want to de-identify or remove personal email addresses in plain text. This can be done by the Regex.Replace() function.
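A possible sketch of this replacement is shown here. The `EmailRedaction` class, the `Redact` helper, the sample text, and the "[email removed]" placeholder are assumptions chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRedaction
{
    // Same simplified, unanchored pattern as used for extraction (an assumption).
    public const string SamplePattern =
        @"[a-z0-9]+(?:[._-][a-z0-9]+)*@[a-z0-9]+(?:[.-][a-z0-9]+)*\.[a-z]{2,}";

    // Replaces every match of emailRegex in plainText with the given replacement.
    public static string Redact(string plainText, string emailRegex, string replacement = "")
    {
        return Regex.Replace(plainText, emailRegex, replacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string plainText =
            "Contact Alice at alice.smith@example.com or Bob at BOB_JONES@Example.org. " +
            "Send invoices to billing@shop-example.co.uk by Friday.";
        string emailRegex = SamplePattern;

        // Remove the addresses entirely.
        Console.WriteLine(Redact(plainText, emailRegex));

        // Or swap them for a de-personalized placeholder.
        Console.WriteLine(Redact(plainText, emailRegex, "[email removed]"));
    }
}
```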
The plainText and emailRegex variables remain the same as in the extraction sample above. The function first detects the strings that match the regex and then replaces them with an empty string. Using this pattern, you can replace emails with any new string you want.
Prefix split for username creation or duplication check
We can use a C# function to split an email address at the "@" symbol and then take the first element of the result (index 0), which is the prefix.
You can use the string.Split() function to split an email. The function returns an array with each value divided by the splitting symbol.
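The split and the duplication check could be sketched as follows. The `EmailPrefix` class, the `GetPrefix` helper, and the `knownPrefixes` set standing in for historical records are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EmailPrefix
{
    // Assumes the address was already validated and contains an "@".
    public static string GetPrefix(string email) => email.Split('@')[0];

    public static void Main()
    {
        string email = "example@email.com";
        string prefix = GetPrefix(email);
        Console.WriteLine(prefix); // example

        // Hypothetical historical records used for the duplication check.
        var knownPrefixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "example",
            "jane.doe"
        };

        Console.WriteLine(knownPrefixes.Contains(prefix)
            ? "Prefix already seen"
            : "New prefix");
    }
}
```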
Domain validation to block suspicious activity
Using the same function, you can extract the domain. This time, read the element at index 1 instead of index 0, as shown below.
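A short sketch of the domain lookup follows. The `EmailDomain` class, its helpers, and the small blocklist are hypothetical; a real list of untrusted domains would come from your own data.

```csharp
using System;
using System.Collections.Generic;

public static class EmailDomain
{
    // Hypothetical blocklist, for illustration only.
    private static readonly HashSet<string> BlockedDomains =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "mailinator.com",
            "yopmail.com"
        };

    // Assumes the address was already validated and contains exactly one "@".
    public static string GetDomain(string email) => email.Split('@')[1];

    public static bool IsBlocked(string email) => BlockedDomains.Contains(GetDomain(email));

    public static void Main()
    {
        string email = "example@email.com";
        Console.WriteLine(GetDomain(email)); // email.com
        Console.WriteLine(IsBlocked(email)); // False
        Console.WriteLine(IsBlocked("someone@Mailinator.com")); // True
    }
}
```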
Handy C# libraries for email validation
We learned the email regex for the various operations. Alternatively, if you are interested in using C# libraries for email validation, you can explore several options here.
1. System.Net.Mail
System.Net.Mail is a namespace built into .NET that lets you validate an email address by constructing a MailAddress object. As you can see below, when you initialize the MailAddress class with an email string, it throws a FormatException if the format is invalid. By wrapping the call in a try-catch block, we can check the validity of the input email.
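A minimal sketch of the try-catch approach is given here. The `MailAddressCheck` class and `IsValid` helper are names invented for this example. The extra comparison against the raw input is a design choice of this sketch, added because MailAddress also accepts strings that include a display name.

```csharp
using System;
using System.Net.Mail;

public static class MailAddressCheck
{
    public static bool IsValid(string email)
    {
        try
        {
            var address = new MailAddress(email);
            // MailAddress also parses forms such as "Name <user@host>",
            // so require that the parsed address equals the raw input.
            return address.Address == email;
        }
        catch (FormatException)
        {
            // The string is not in a recognized email format.
            return false;
        }
        catch (ArgumentException)
        {
            // Thrown for null or empty input.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("example@email.com")); // True
        Console.WriteLine(IsValid("not-an-email"));      // False
    }
}
```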
2. EmailValidation
EmailValidation is a third-party library that you can install using the following PackageReference:
Or, by the Package Manager:
You can test it out using the following sample code.
The sample code asks for your email and returns if it is valid or not.
3. System.ComponentModel.DataAnnotations
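System.ComponentModel.DataAnnotations is a namespace provided by Microsoft as part of .NET. Its data annotations let you apply attributes, or rules, to your fields and properties to check, for example, required fields or the validity of string input. For email addresses, the relevant attribute is EmailAddressAttribute.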
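The attribute can also be called directly, outside of model validation, as in this small sketch. The `DataAnnotationsCheck` class, the `IsValid` helper, and the two sample values are assumptions for the example. On recent .NET versions the attribute appears to perform only a basic format check, so a true result is best treated as a weak guarantee.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

public static class DataAnnotationsCheck
{
    public static bool IsValid(string email) => new EmailAddressAttribute().IsValid(email);

    public static void Main()
    {
        string invalidEmail = "example.email.com"; // contains no "@"
        string validEmail = "example@email.com";

        Console.WriteLine(IsValid(invalidEmail));
        Console.WriteLine(IsValid(validEmail));
    }
}
```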
When you test the invalidEmail value, it will return false and the second valid email address will return true.
4. AbstractAPI Email Validation
Using C# libraries for email validation might look simple. However, the regular expression may not be able to capture all possible email addresses, and using the same validation rule across different classes and languages can be difficult. As an alternative, you can use an email validation API like AbstractAPI.
How to use AbstractAPI
- Go to the sign-up page and join the website.
- Once you finish joining, log into the website.
- On the main dashboard, find and click the Email validation menu.
- On the Email validation page, you can see the Try it out section. It gives you the API key and a sample URL that contains the API key and your email address for testing.
- You can also check the Documentation menu to learn more details.
The request URL follows the pattern below.
https://emailvalidation.abstractapi.com/v1/?api_key=YOUR_API_KEY&email=emailprefix@sample.com
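A request along these lines could be sketched in C# with HttpClient. The `AbstractApiSketch` class and its two helpers are hypothetical, YOUR_API_KEY remains a placeholder, and the response is returned as raw JSON because its exact structure is not covered here. The sketch also URL-encodes the query values, so the "@" appears as %40 in the built URL.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class AbstractApiSketch
{
    private const string BaseUrl = "https://emailvalidation.abstractapi.com/v1/";

    // Builds the request URL, escaping the query values.
    public static string BuildRequestUrl(string apiKey, string email) =>
        BaseUrl + "?api_key=" + Uri.EscapeDataString(apiKey) +
        "&email=" + Uri.EscapeDataString(email);

    // Sends the GET request and returns the raw JSON response body.
    // Requires a real API key and network access.
    public static Task<string> ValidateAsync(HttpClient client, string apiKey, string email) =>
        client.GetStringAsync(BuildRequestUrl(apiKey, email));

    public static void Main()
    {
        // Only prints the URL; call ValidateAsync once you have a real key.
        Console.WriteLine(BuildRequestUrl("YOUR_API_KEY", "emailprefix@sample.com"));
    }
}
```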
What's interesting is that, in the response body, you can find a comprehensive analysis of the requested email. The AbstractAPI returns fields such as:
- autocorrect: auto-corrected email address if your requested email has an incorrect value. For example, if you send sample@gmali.com, it will auto-correct it as sample@gmail.com.
- deliverability: if a requested email is not valid, it will say "UNDELIVERABLE".
- quality_score: Abstract's confidence in the email address, from 0.01 to 0.99.
- is_valid_format: it tells you if it is a valid email address.
- is_free_email: it tells you if the requested email domain is free. (e.g. Gmail, Yahoo, etc)
- is_disposable_email: it tells you if it is a disposable email. (e.g. Mailinator, Yopmail, etc)
- is_role_email: it tells you if it is a group email.
- is_catchall_email: if a requested email is configured to catch all emails, it will be true.
- is_mx_found: it returns true if MX Records for the domain can be detected.
- is_smtp_valid: it is true if the SMTP check of the email goes through successfully.
Validation checks of this kind are hard to find in the built-in C# classes, and implementing them on your own can be demanding. Using an email validation API such as AbstractAPI can save developers time.
Wrapping up
We learned various ways of validating email addresses using regular expressions in C# and C# libraries. With regex, you can not only validate email addresses but also manipulate them, for example by removing or replacing them. Whichever email regex operation you need, getting familiar with regular expression patterns will be extremely handy for your future work. Regex is a general syntax that can be used across different programming languages and in different fields such as software development or data mining. Try the sample code above, and try other patterns as well to hone your skills.
Frequently asked questions
We prepared the FAQ section to expand your regex and relevant knowledge beyond this article.
What factors do we need to consider when using regex in C#?
When you use regex in C#, whether for email address validation or for other cases, one thing you need to consider is the input source. Knowing the input source helps you build an efficient and accurate regex. Is your input constrained or unconstrained? Constrained input is text that comes from a reliable source and follows certain rules. For example, the email field of an online registration form is mostly constrained input. Unconstrained input comes from an unreliable source, such as a web user who does not have to follow any rules.
What are other helpful string methods in C#?
In C#, there are multiple string methods that can help developers to handle strings.
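A few of the commonly used ones are shown in this short sketch, applied to an email string. The `StringHelpers` class and the `Normalize` helper are hypothetical. Lowercasing the whole address is a simplification that many applications make, not a requirement.

```csharp
using System;

public static class StringHelpers
{
    // Trim surrounding whitespace and lowercase so comparisons are consistent.
    public static string Normalize(string email) => email.Trim().ToLowerInvariant();

    public static void Main()
    {
        string raw = "  Example@Email.com ";
        string email = Normalize(raw);

        Console.WriteLine(email);                                          // example@email.com
        Console.WriteLine(email.Contains("@"));                            // True
        Console.WriteLine(email.EndsWith(".com", StringComparison.Ordinal)); // True

        int at = email.IndexOf('@');
        Console.WriteLine(email.Substring(0, at));  // example (the prefix)
        Console.WriteLine(email.Substring(at + 1)); // email.com (the domain)

        Console.WriteLine(string.IsNullOrWhiteSpace(raw)); // False
    }
}
```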
Where are good C# regex references?
To deepen your regex knowledge in C#, refer to the following sources.
- Microsoft Regex Reference: the page explains available regex syntax in C# including character escapes, character classes, anchors, grouping constructs, lookarounds, quantifiers, and many more. Although you don't need to remember all of them, it would be good to know what you can do with regex in C#.
- Regex best practices in .NET: this is another Microsoft document that explains the best practices to use regex in C#. The document covers comprehensive aspects of using regular expression in the language.
- Regex101: this website is not a reference website, but a great online tool where you can practice your regex. You can copy and paste your text and insert your regex to test it. The website gives you a quick reference and the interpretations of your regex.