Complete code for regex email validation
TL;DR time! Here is the comprehensive Java email validation regex, which covers a series of validations:
The above regex pattern enforces:
- Domain name syntax restrictions.
- No consecutive, trailing, or leading periods (.) in the email address.
- Local part and domain part restrictions.
- Support for non-Latin (Unicode) characters in addition to Latin characters.
We’re about to take a deep dive into this regular expression and build it step-by-step from the ground up in the coming sections; but, in case you’re short of time or you want something a bit simpler, feel free to jump straight to a simple email validation regex in Java, an OWASP-provided email validation regex, or start using an email validation API.
Simple email validation regex in Java
First, let us start building our regular expression by enforcing the most basic structure a valid email address should have (note that a dot is not required, and more than one dot is also allowed):
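A minimal sketch of such a pattern, assuming the simplest possible shape: one or more non-whitespace characters on each side of the @ sign. The class, constant, and method names here are illustrative assumptions, not part of any library.

```java
// Illustrative holder class; the names are assumptions for this sketch.
final class SimpleEmailRegex {
    // ^      start of input
    // (\S+)  local part: one or more non-whitespace characters
    // @      the literal separator
    // (\S+)  domain: one or more non-whitespace characters
    // $      end of input
    static final String SIMPLE_EMAIL_REGEX = "^(\\S+)@(\\S+)$";

    static boolean matches(String email) {
        // String.matches succeeds only if the whole input matches the regex.
        return email.matches(SIMPLE_EMAIL_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(matches("username@domain.com"));  // true
        System.out.println(matches("user name@domain.com")); // false: contains a space
    }
}
```

Because the domain is just `\S+`, an address such as user@localhost passes as well, which matches the remark that a dot is not required.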
Valid email address syntax according to RFC 5322
As per section 3.4.1 (Addr-Spec Specification) of RFC 5322, which specifies the syntax of an email address, an address should follow this basic pattern:
addr-spec = local-part "@" domain
Note: ‘local-part’ denotes the section occurring before the @ sign in an email address.
Our simple regex checks that this structure is preserved and that no whitespace character is present in the email address string.
Java email validation using the Pattern class
Let’s write a simple Java method with the help of the Pattern class readily provided in the java.util.regex package:
- Compile the emailValidationRegex String into a Pattern object.
- Call the matcher method of the Pattern object returned on step 1 passing in the emailAddrToValidate String as an argument.
- Execute the Matcher.matches method of the Matcher object returned on the previous step; this step returns a boolean indicating if emailAddrToValidate String matched the regex in the emailValidationRegex String.
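The three steps above can be sketched as follows. The parameter names emailAddrToValidate and emailValidationRegex come from the text; the class and method names are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; only Pattern and Matcher come from java.util.regex.
final class PatternEmailCheck {
    static boolean isValidEmail(String emailAddrToValidate, String emailValidationRegex) {
        // Step 1: compile the regex String into a Pattern object.
        Pattern pattern = Pattern.compile(emailValidationRegex);
        // Step 2: create a Matcher for the input String.
        Matcher matcher = pattern.matcher(emailAddrToValidate);
        // Step 3: matches() is true only if the entire input matches the pattern.
        return matcher.matches();
    }

    public static void main(String[] args) {
        String simpleRegex = "^(\\S+)@(\\S+)$"; // assumed simple pattern for the demo
        System.out.println(isValidEmail("username@domain.com", simpleRegex));
    }
}
```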
The Pattern class also supplies a static Pattern.matches method, letting us simplify the previous lines to a single line:
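A sketch of that one-line variant, again with an assumed class and method name:

```java
import java.util.regex.Pattern;

// Illustrative class; the names are assumptions for this sketch.
final class PatternMatchesCheck {
    static boolean isValidEmail(String emailAddrToValidate, String emailValidationRegex) {
        // Pattern.matches(regex, input) compiles the regex and matches the
        // whole input against it in a single call.
        return Pattern.matches(emailValidationRegex, emailAddrToValidate);
    }
}
```

Pattern.matches compiles the expression on every call, so for repeated validations it is usually cheaper to compile the Pattern once and reuse it.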
We can test out this method using a simple JUnit assertion:
Test results show our regex pattern succeeded in identifying an email ID:
But note that at the moment our regular expression is far from being “production-ready” because it would still accept invalid email addresses such as:
- Email addresses with local-parts past their maximum character limit.
- Leading, trailing, and consecutive dots in the email address.
So, let’s find out how we can make our Java email validation regex abide by the Internet Message Format rules, shall we?
Adding restrictions for the domain
As our next step, let us tackle these domain restrictions in our regular expression:
- Allow Latin characters (“a” to “z”, both lowercase and uppercase) in any position of the domain.
- Let digits from 0 to 9 reside anywhere within the domain.
- Allow hyphen (“-”) characters within the domain, but not as a leading or trailing character.
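One way to express these domain rules is sketched below. It assumes each dot-separated domain label must start and end with a letter or digit, with hyphens allowed only in between. The local part is left as the permissive \S+, and all names are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative class; the pattern is a sketch, not the only possible form.
final class DomainRestrictedRegex {
    // One domain label: starts and ends with a letter or digit; hyphens only inside.
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    // Permissive local part, then "@", then one or more dot-separated labels.
    static final Pattern EMAIL_PATTERN =
            Pattern.compile("^\\S+@" + LABEL + "(?:\\." + LABEL + ")*$");

    static boolean isValidEmail(String email) {
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("username@my-domain.com")); // true
        System.out.println(isValidEmail("username@do_main.com"));   // false: "_" not allowed
        System.out.println(isValidEmail("username@-domain.com"));   // false: leading hyphen
    }
}
```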
Why not try passing in an email ID with a special character to see if our email regex doesn’t allow it anymore?
As we can see, the assertion fails as expected:
Checks for consecutive, trailing, or leading dots
RFC 5322 treats addresses containing periods as valid as long as a period isn’t the first or last character of the local part or the domain, and periods aren’t repeated one after the other (“..”).
Let’s modify our regular expression to cater to these requirements:
Since we had already handled the restrictions on domain names in the previous section, we only had to make a slight change to the section of our regex validating the local part of the email address.
In detail, we replaced the \S+ allowing one or more non-whitespace characters before the @ sign (which permitted a period at the start) with a (?:[^.\s])\S* regex allowing a single character that isn’t a period or whitespace followed by zero or more non-whitespace characters.
Now, our regular expression shouldn’t allow trailing, leading, or consecutive dots in either the local part or the domain of an email address.
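Note that the (?:[^.\s])\S* fragment on its own only rules out a leading period. The sketch below therefore uses a stricter local part (an assumption of this example, not the pattern described above): runs of non-dot characters separated by single periods. The domain reuses the label-based form, and all names are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative class; a stricter variant chosen for this sketch.
final class DotCheckRegex {
    // Local part: one or more chunks without ".", whitespace, or "@",
    // separated by single periods (so no leading, trailing, or double dots).
    private static final String LOCAL = "[^.\\s@]+(?:\\.[^.\\s@]+)*";

    // Domain label: letters/digits at both ends, hyphens only inside.
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    static final Pattern EMAIL_PATTERN =
            Pattern.compile("^" + LOCAL + "@" + LABEL + "(?:\\." + LABEL + ")*$");

    static boolean isValidEmail(String email) {
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("user.name@domain.com"));  // true
        System.out.println(isValidEmail(".username@domain.com"));  // false: leading dot
        System.out.println(isValidEmail("user..name@domain.com")); // false: consecutive dots
    }
}
```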
Shall we test our new regular expression to see if it works as expected?
Test results show that the valid email ID on testEmailRegex1 asserts as true while the other tests with consecutive, trailing, or leading periods assert as false:
Restrictions for the local part and domain part
Until now, our email regex placed no limit on the number of characters in the email address and did not explicitly allow any special characters either. But RFC 5321 specifically limits the length of a local-part to 64 octets, and RFC 5322 allows a set of special characters ("!", "#", "$", "%", "&", "'", "*", "+", "-", "/", "=", "?", "^", "_", "`", "{", "|", "}", "~") in the local part of an email address.
Let us go ahead and add support for these on our regular expression:
As you can see, we’ve added the aforementioned restrictions to our email regex along with an additional restriction allowing a top-level domain to only contain 2 to 7 letters.
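A sketch of how these restrictions could be combined, assuming a lookahead is used to cap the local part at 64 characters (equal to 64 octets here, since every allowed character is ASCII). The names are illustrative and the exact pattern is an assumption.

```java
import java.util.regex.Pattern;

// Illustrative class; one possible encoding of the restrictions above.
final class LengthLimitedRegex {
    // Letters, digits, and the special characters RFC 5322 allows in the local part.
    private static final String ATEXT = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+";

    // Domain label: letters/digits at both ends, hyphens only inside.
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^(?=.{1,64}@)"                     // lookahead: at most 64 characters before "@"
            + ATEXT + "(?:\\." + ATEXT + ")*"   // dot-separated runs of allowed characters
            + "@"
            + "(?:" + LABEL + "\\.)+"           // one or more labels, each followed by a dot
            + "[a-zA-Z]{2,7}$");                // top-level domain of 2 to 7 letters

    static boolean isValidEmail(String email) {
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("user+tag@domain.com")); // true
        System.out.println(isValidEmail("username@domain.c"));   // false: TLD too short
    }
}
```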
Testing out our regular expression shows it follows all the new local part and domain part restrictions we introduced:
Validating email addresses with non-Latin or Unicode characters
Up until this point, our regular expression only accepted letters from the basic Latin (English) alphabet. But, with the expansion of internationalization in the world, the IETF has adapted and already produced multiple RFCs passing addresses with Unicode characters as valid email addresses.
So, why don’t we see how we can adapt as well? By simply exchanging all occurrences of a-zA-Z in our email regex with \p{L}, which matches any character in the Unicode letter category, we should be able to pass email addresses holding Unicode letters through our regular expression:
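The substitution can be illustrated literally: the sketch below takes an assumed ASCII-only pattern and swaps every a-zA-Z range for \p{L} with String.replace. The ASCII pattern and all names are assumptions made for this example. Also note that the 64-character lookahead counts Java chars rather than octets once non-ASCII letters are involved.

```java
import java.util.regex.Pattern;

// Illustrative class; demonstrates the a-zA-Z -> \p{L} substitution.
final class UnicodeEmailRegex {
    private static final String ATEXT = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+";
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    // Assumed ASCII-only starting point.
    static final String ASCII_REGEX =
            "^(?=.{1,64}@)" + ATEXT + "(?:\\." + ATEXT + ")*@(?:" + LABEL + "\\.)+[a-zA-Z]{2,7}$";

    // Replace each literal "a-zA-Z" range with \p{L} (any Unicode letter).
    static final String UNICODE_REGEX = ASCII_REGEX.replace("a-zA-Z", "\\p{L}");

    private static final Pattern ASCII_PATTERN = Pattern.compile(ASCII_REGEX);
    private static final Pattern UNICODE_PATTERN = Pattern.compile(UNICODE_REGEX);

    static boolean isValidAscii(String email) {
        return ASCII_PATTERN.matcher(email).matches();
    }

    static boolean isValidUnicode(String email) {
        return UNICODE_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        String accented = "jos\u00e9@example.com"; // "e" with an acute accent in the local part
        System.out.println(isValidAscii(accented));   // false
        System.out.println(isValidUnicode(accented)); // true
    }
}
```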
Time to test if the above regex accepts Unicode characters as well!
As we can see in the results, our email regex matched all email IDs containing or not containing Unicode characters as valid email addresses:
Regular Expression RFC 5322 for Email Validation
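RFC 5322 defines the address syntax as a grammar rather than as a regular expression, but a rather large, widely circulated regular expression approximates that grammar and can be used for validating email addresses: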
This regular expression allows:
- Two options for the local part:
- One or more sequences of characters comprising letters “a” to “z”, digits 0 to 9, or special characters !, #, $, %, &, ', *, +, /, =, ?, ^, _, `, {, |, }, ~, or -. A period character may exist in the local part as long as it doesn’t lead, trail, or appear consecutively.
- One or more sequences of any ASCII characters that are placed within double quotes.
- Two options for the domain:
- A domain name following the domain name syntax.
- A specific internet address provided within square brackets.
Note that the above regex doesn’t enforce any length limits; this is simply because the RFC 5322 message format doesn’t place any length limits on any part of the email address.
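The structure described above can be sketched in a simplified form. This is not the full expression: as assumptions of this example, the quoted local part only accepts printable ASCII, and the bracketed domain only accepts an IPv4 literal. All names are illustrative.

```java
import java.util.regex.Pattern;

// Simplified, illustrative approximation of the RFC 5322 address structure.
final class Rfc5322StyleRegex {
    // Local part option 1: dot-separated runs of allowed characters.
    private static final String ATOM = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+";
    private static final String DOT_ATOM = ATOM + "(?:\\." + ATOM + ")*";

    // Local part option 2: double-quoted printable ASCII; '"' and '\' must be
    // escaped with a backslash.
    private static final String QUOTED =
            "\"(?:[\\x20\\x21\\x23-\\x5b\\x5d-\\x7e]|\\\\[\\x20-\\x7e])*\"";

    // Domain option 1: dot-separated labels (at least two).
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";
    private static final String DOMAIN_NAME = "(?:" + LABEL + "\\.)+" + LABEL;

    // Domain option 2: IPv4 address literal in square brackets.
    private static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";
    private static final String IPV4_LITERAL = "\\[" + OCTET + "(?:\\." + OCTET + "){3}\\]";

    static final Pattern EMAIL_PATTERN = Pattern.compile(
            "(?:" + DOT_ATOM + "|" + QUOTED + ")@(?:" + DOMAIN_NAME + "|" + IPV4_LITERAL + ")");

    static boolean isValidEmail(String email) {
        // matches() requires the whole input to match, so no ^ or $ is needed.
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("\"john doe\"@domain.com")); // true
        System.out.println(isValidEmail("user@[192.168.1.1]"));      // true
    }
}
```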
Let’s test out this regular expression in our Java application:
Our test results show the regex matched with the valid email IDs holding RFC 5322-allowed syntaxes:
Ergo, we explored a collection of Java email validation regular expressions that would surely prove helpful in your email address validating endeavors. Feel free to use these regular expressions straight away instead of wasting time writing your own regular expression! Plus, if you’d like to explore how regular expressions are used in other languages, check out our ultimate guide to validating emails with Regex.
Using OWASP email validation regex
The Open Web Application Security Project® — or in short, OWASP — leads various initiatives to improve web application security. Courteously, they maintain a validation regex repository holding just what you might need for your email address validations:
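A sketch using the email pattern as it has been listed in the OWASP Validation Regex Repository. Treat the exact string as an assumption and check the repository for the current version. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative wrapper around the OWASP-style email pattern.
final class OwaspEmailRegex {
    // Local part: letters, digits, and _ + & * - in dot-separated runs;
    // domain: dot-terminated labels followed by a TLD of 2 to 7 letters.
    static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$");

    static boolean isValidEmail(String email) {
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("user_name+tag@domain.com")); // true
        System.out.println(isValidEmail("user@[192.168.1.1]"));       // false: no address literals
    }
}
```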
Although the OWASP email address verification regex isn’t as verbose as the RFC 5322 regex we discussed previously, it holds all the basic validations for a standard email address.
Shall we test out how the OWASP regex performs on an email ID String in a Java application?
Affirming the points we discussed earlier, the OWASP email validation regex matches:
- basic email addresses
- email addresses with a set of special characters (“_”, “+”, “&”, “*”, “-”) in the local part
- email addresses with 2 to 7 letters in their top-level domain
But essentially fails to match email addresses with Unicode characters, double-quoted local parts, or ones that have internet addresses for the domain:
Apache Commons Email Validation
Apache provides a Commons Validator package to help in various validation scenarios in Java applications. Notably, inside its org.apache.commons.validator.routines package is an EmailValidator class letting us validate if an email address conforms to the RFC 822 format.
Setting up the Apache Commons Validator package inside our Java app is as simple as adding its Maven dependency to the project’s pom.xml file:
Then, all we have to do is import the EmailValidator class inside the Commons Validator package on any class we’d like to make email address verifications:
Let us test this EmailValidator using a few JUnit tests as usual:
Test results show that the Apache Commons library’s EmailValidator successfully matches an email ID holding:
- the basic structure
- special characters (RFC 822 allows any character except specials ("(", ")", "<", ">", "@", ",", ";", ":", "\", the double quote, ".", "[", "]"), space, delete, or ASCII control characters)
- Unicode characters
- a double-quoted local part
But, since the EmailValidator follows RFC 822, it doesn’t match with email IDs holding internet addresses for their domains:
Using an API to validate emails
If you want to validate email addresses with your hands off of lengthy regexes and your Java project relieved from the weight of yet another library, chances are you’d be better off with a proper Java email validation API.
An easier way to find valid email addresses
Say hello to Abstract Email Validation and Verification API. Using this email verification API you’d not only get a simple email address syntax verification but along with it obtain:
- MX record and SMTP server validation
- Automatic email address misspelling detection and smart suggestions to correct them
- Detection of free and/or disposable email provider domains
- All of the above verifications and features in a privacy-friendly (GDPR, CCPA) manner
Why not give Abstract Email Validation API a try? Signing up takes only as long as typing in your email address and a password, with no confirmation emails required.
Additionally, don’t hesitate to scope out the Best Email Validation and Verification APIs list for a detailed look into the API options available along with their pros and cons.
Which email validation method should you use?
As we saw when exploring multiple methods for Java email validation, each approach comes with its own complexity and a different level of support for additional features such as special characters and Unicode characters. Hence, the ultimate choice of which email regex or technique to use for your email address validation boils down to which specific characteristics and features you need your valid email addresses to adhere to.
Fundamental email validation
As the simplest email validation, you can check for an “@” sign between two words using the simple regex validation.
Safest method to validate your email address
But, in case you’re looking for exhaustive validation for your emails, the RFC 5322 validation we discussed would be your safest bet. Although, keep in mind that it doesn’t allow for non-Latin or Unicode characters while those letters in email addresses could become quite the norm in the internationalizing days we live in.
If you’re interested, feel free to check out a similar guide on how to do email validation in jQuery.