Last updated: August 4, 2023

The Complete Guide to Email Validation in Java [Regex + Apache Commons]

Emma Jagger

Even if you’re the most careful person on Earth, chances are that when signing up for a site, platform, or service on the web you’ve mistyped an email address and been met with a validation pop-up asking you to “enter a valid email address and try again”. Although the syntax of an email address looks simple enough that validation might seem unnecessary, in reality an email address must follow a collection of rules published by the IETF, specifically RFC 5322, which defines the currently used Internet Message Format.

Hence the need for email address validation. As Java developers, we have a repertoire of options, ranging from regular expression (regex) based checks as the primary technique to the Apache Commons EmailValidator library and email validation APIs.

So, without further ado, how about we figure out in detail how each of the aforementioned approaches helps us Java coders perform email address validation and avoid invalid emails?

Complete code for regex email validation

TL;DR time! Here is the comprehensive Java email validation regex, which covers a series of validations:



^[\p{L}0-9!#$%&'*+\/=?^_`{|}~-][\p{L}0-9.!#$%&'*+\/=?^_`{|}~-]{0,63}@[\p{L}0-9-]+(?:\.[\p{L}0-9-]{2,7})*$

The regex pattern above covers:

  • Domain name syntax restrictions.
  • No leading period (.) in the local part, and no leading, trailing, or consecutive periods in the domain.
  • Length and character restrictions on the local part and the domain part.
  • Non-Latin (Unicode) letters in addition to Latin letters.

We’re about to take a deep dive into this regular expression and build it step-by-step from the ground up in the coming sections; but, in case you’re short of time or you want something a bit simpler, feel free to jump straight to a simple email validation regex in Java, an OWASP-provided email validation regex, or start using an email validation API.

Simple email validation regex in Java

Firstly, let us start building our regular expression enforcing the most basic structure a valid email address should hold (note that at least one dot is not required, and more than one dot is also possible):



^\S+@\S+$

Valid email address syntax according to RFC 5322

As per section 3.4.1. Addr-Spec Specification on RFC 5322 specifying the syntax of an email, an email address should carry the following basic pattern:

addr-spec = local-part "@" domain

Note: ‘local-part’ denotes the section occurring before the @ sign in an email address.

Our simple regex checks that this structure is preserved in addition to validating no white space character is present in the email address string.

Java email validation using the Pattern class

Let’s write a simple Java method with the help of the Pattern class readily provided in the java.util.regex package:



import java.util.regex.Pattern;

public static boolean isValidEmailAddrRegex(String emailValidationRegex, String emailAddrToValidate) {
    return Pattern.compile(emailValidationRegex) // 1
            .matcher(emailAddrToValidate) // 2
            .matches(); // 3
}

  1. Compile the emailValidationRegex String into a Pattern object.
  2. Call the matcher method of the Pattern object returned on step 1 passing in the emailAddrToValidate String as an argument.
  3. Execute the Matcher.matches method of the Matcher object returned on the previous step; this step returns a boolean indicating if emailAddrToValidate String matched the regex in the emailValidationRegex String.

The Pattern class also supplies a static Pattern.matches method that lets us reduce the previous lines to a single line:



public static boolean isValidEmailAddrRegex(String emailValidationRegex, String emailAddrToValidate) {
    return Pattern.matches(emailValidationRegex, emailAddrToValidate);
}
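Both variants above compile the regex on every call. If the same expression is checked many times, one option is to compile it once and reuse the Pattern object, which the Java documentation describes as immutable and safe for use by multiple threads. The following is only a sketch: the class name PrecompiledEmailRegex and the hard-coded simple regex are assumptions made for illustration, not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
final class PrecompiledEmailRegex {
    // Compiled once when the class is loaded and reused for every check.
    private static final Pattern SIMPLE_EMAIL = Pattern.compile("^\\S+@\\S+$");

    static boolean isValidEmailAddr(String emailAddrToValidate) {
        // Treat null as invalid instead of letting matcher() fail on it.
        return emailAddrToValidate != null
                && SIMPLE_EMAIL.matcher(emailAddrToValidate).matches();
    }
}
```

Passing the regex as a parameter, as the original method does, stays convenient for experimenting with several expressions; a fixed constant tends to suit production code where the expression rarely changes.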

We can test out this method using a simple JUnit assertion:



private static final String REGEX = "^\\S+@\\S+$";

@Test
public void testEmailRegex() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com"));
}

The test passes, showing that our regex pattern accepts this email address.

But note that at the moment our regular expression is far from “production-ready”, because it would still pass invalid email addresses such as:

  • Email addresses with local-parts past their maximum character limit.
  • Leading, trailing, and consecutive dots in the email address.
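To make these gaps concrete, here is a small sketch that runs the simple regex against one address of each kind. The class and method names (SimpleRegexGaps, matchesSimple, localPartOfLength) are hypothetical and exist only for this illustration.

```java
import java.util.Collections;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class SimpleRegexGaps {
    private static final Pattern SIMPLE = Pattern.compile("^\\S+@\\S+$");

    static boolean matchesSimple(String candidate) {
        return SIMPLE.matcher(candidate).matches();
    }

    // Builds a local part of the requested length out of the letter "a".
    static String localPartOfLength(int length) {
        return String.join("", Collections.nCopies(length, "a"));
    }

    public static void main(String[] args) {
        String[] shouldBeRejected = {
            localPartOfLength(65) + "@gmail.com", // local part over the 64-character limit
            ".test123@gmail.com",                 // leading dot
            "test123@gmail.com.",                 // trailing dot
            "test..123@gmail.com"                 // consecutive dots
        };
        for (String candidate : shouldBeRejected) {
            // The simple regex still reports a match for each of these.
            System.out.println(matchesSimple(candidate) + "  " + candidate);
        }
    }
}
```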

So, let’s find out how we can make our Java email validation regex follow the Internet Message Format rules more closely, shall we?

Adding restrictions for the domain

As our next step, let us add these domain restrictions to our regular expression:

  • Allow Latin letters (“a” to “z”, both lowercase and uppercase) anywhere in the domain.
  • Allow digits from 0 to 9 anywhere in the domain.
  • Allow hyphen (“-”) characters within the domain. Domain labels shouldn’t start or end with a hyphen, but note that the simple character class used below doesn’t enforce that yet.


^\S+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

Why not try passing in an email ID with a special character in its domain to see that our email regex doesn’t allow it anymore?



private static final String REGEX = "^\\S+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$";

@Test
public void testEmailRegex() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@g$mail.com"));
}

As we can see, the regex no longer matches this address, so the assertTrue assertion fails as expected.
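One thing worth checking is how the domain part of this regex treats hyphens. The sketch below compares it with an assumed stricter variant in which every label must begin and end with a letter or digit. The names DomainRegexCheck, SECTION_REGEX, and STRICT_HYPHENS are hypothetical, and the stricter pattern is an illustration rather than the article's final regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class DomainRegexCheck {
    // The regex from this section: letters, digits, or hyphens in each label.
    static final Pattern SECTION_REGEX = Pattern.compile(
            "^\\S+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$");

    // Assumed stricter label: starts and ends with a letter or digit.
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";
    static final Pattern STRICT_HYPHENS = Pattern.compile(
            "^\\S+@" + LABEL + "(?:\\." + LABEL + ")*$");

    static boolean matches(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(SECTION_REGEX, "test123@g$mail.com"));   // false: "$" is not allowed
        System.out.println(matches(SECTION_REGEX, "test123@-gmail-.com"));  // true: hyphen position is unchecked
        System.out.println(matches(STRICT_HYPHENS, "test123@-gmail-.com")); // false
        System.out.println(matches(STRICT_HYPHENS, "test123@my-mail.com")); // true
    }
}
```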

Checks for consecutive, trailing, or leading dots

RFC 5322 treats addresses with periods in the local part as valid as long as a period isn’t the first or last character of the local part and periods aren’t repeated one after the other (“..”).

Let’s modify our regular expression to cater to these requirements:



^(?:[^.\s])\S*@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

Since we had already handled the restrictions on domain names in the previous section, we only had to make a slight change to the section of our regex validating the local part of the email address.

In detail, we replaced the \S+ allowing one or more non-whitespace characters before the @ sign (which permitted a period at the start) with a (?:[^.\s])\S* regex allowing a single character that isn’t a period or whitespace followed by zero or more non-whitespace characters.

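Now, our regular expression rejects a leading dot in the local part, as well as leading, trailing, or consecutive dots in the domain. Note that trailing or consecutive dots in the local part (for example, test..123@gmail.com) still pass at this stage, because \S* accepts any non-whitespace character.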

Shall we test our new regular expression to see if it works as expected?



private static final String REGEX = "^(?:[^.\\s])\\S*@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$";

@Test
public void testEmailRegex1() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com"));
}
@Test
public void testEmailRegex2() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, ".test123@gmail.com"));
}
@Test
public void testEmailRegex3() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail..com"));
}
@Test
public void testEmailRegex4() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com."));
}

Test results show that the valid email ID in testEmailRegex1 asserts as true, while the other tests, with leading, consecutive, or trailing periods, fail their assertions because the regex rejects those addresses.
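The tests above only place extra dots at the very start of the address or in the domain. As a rough sketch, the following compares this section's regex with an assumed variant whose local part is written as dot-separated runs of non-dot characters, so that consecutive and trailing dots before the @ sign are rejected too. LocalPartDots, SECTION_REGEX, and STRICT_DOTS are hypothetical names, and the variant is one possible approach rather than the article's own.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class LocalPartDots {
    // The regex from this section: only the first character is checked for a dot.
    static final Pattern SECTION_REGEX = Pattern.compile(
            "^(?:[^.\\s])\\S*@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$");

    // Assumed variant: local part is one or more dot-separated runs without dots, spaces, or "@".
    static final Pattern STRICT_DOTS = Pattern.compile(
            "^[^.\\s@]+(?:\\.[^.\\s@]+)*@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$");

    static boolean matches(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(SECTION_REGEX, "test..123@gmail.com")); // true: consecutive dots slip through
        System.out.println(matches(SECTION_REGEX, "test123.@gmail.com"));  // true: trailing dot slips through
        System.out.println(matches(STRICT_DOTS, "test..123@gmail.com"));   // false
        System.out.println(matches(STRICT_DOTS, "test123.@gmail.com"));    // false
        System.out.println(matches(STRICT_DOTS, "test.123@gmail.com"));    // true
    }
}
```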

Restrictions for the local part and domain part

Until this moment our email regex allowed a local part of any length and accepted any non-whitespace character in it. But RFC 5321 specifically limits the length of a local part to 64 octets, and RFC 5322 allows only a specific set of special characters ("!", "#", "$", "%", "&", "'", "*", "+", "-", "/", "=", "?", "^", "_", "`", "{", "|", "}", "~") alongside letters and digits in an unquoted local part of an email address.

Let us go ahead and add support for these on our regular expression:



^[a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-][a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]{0,63}@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]{2,7})*$

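As you can see, we’ve added the aforementioned restrictions to our email regex, along with an additional restriction that limits each dot-separated label after the first, including the top-level domain, to 2 to 7 characters (letters, digits, or hyphens). Keep in mind that this also rejects longer top-level domains.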

Testing out our regular expression shows it follows all the new local part and domain part restrictions we introduced.
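A few boundary cases can illustrate these limits. The sketch below checks the regex from this section against local parts of 64 and 65 characters and against top-level domains of different lengths. The names LengthLimits, RESTRICTED, and localPartOfLength are hypothetical, chosen only for this example.

```java
import java.util.Collections;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class LengthLimits {
    // The regex from this section, escaped for a Java string literal.
    static final Pattern RESTRICTED = Pattern.compile(
            "^[a-zA-Z0-9!#$%&'*+\\/=?^_`{|}~-][a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]{0,63}"
            + "@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]{2,7})*$");

    static boolean matches(String candidate) {
        return RESTRICTED.matcher(candidate).matches();
    }

    // Builds a local part of the requested length out of the letter "a".
    static String localPartOfLength(int length) {
        return String.join("", Collections.nCopies(length, "a"));
    }

    public static void main(String[] args) {
        System.out.println(matches(localPartOfLength(64) + "@gmail.com")); // true: exactly at the limit
        System.out.println(matches(localPartOfLength(65) + "@gmail.com")); // false: one character too long
        System.out.println(matches("test!#$123@gmail.com"));               // true: allowed special characters
        System.out.println(matches("test123@gmail.c"));                    // false: label shorter than 2
        System.out.println(matches("test123@gmail.technology"));           // false: label longer than 7
    }
}
```

The last case shows the trade-off of the 2 to 7 character rule: an address under a longer top-level domain is rejected by this regex, so the upper bound may need adjusting for your audience.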

Validating email addresses with non-Latin or Unicode characters

Up until this point, our regular expression only accepted Latin letters. But with the growth of internationalization, the IETF has adapted and produced multiple RFCs that treat addresses with Unicode characters as valid email addresses.

So, why don’t we see how we can adapt as well? By simply exchanging all occurrences of a-zA-Z in our email regex with \p{L}, which matches any character in the Unicode letter category, we should be able to accept email addresses holding Unicode letters:



^[\p{L}0-9!#$%&'*+\/=?^_`{|}~-][\p{L}0-9.!#$%&'*+\/=?^_`{|}~-]{0,63}@[\p{L}0-9-]+(?:\.[\p{L}0-9-]{2,7})*$

Time to test if the above regex accepts Unicode characters as well!



private static final String REGEX = "^[\\p{L}0-9!#$%&'*+\\/=?^_`{|}~-][\\p{L}0-9.!#$%&'*+\\/=?^_`{|}~-]{0,63}@[\\p{L}0-9-]+(?:\\.[\\p{L}0-9-]{2,7})*$";

@Test
public void testEmailWithoutUnicodeCharacter() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com"));
}
@Test
public void testEmailWithUnicodeCharacter() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test1Ὠ23@gmail.com"));
}

As we can see in the results, our email regex matched email IDs both with and without Unicode characters as valid email addresses.
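Two caveats may be worth keeping in mind with \p{L}. The {0,63} quantifier counts characters, whereas the RFC 5321 limit mentioned earlier is stated in octets, and \p{L} covers letters but not combining marks. The sketch below illustrates both points; the UnicodeLocalPart class and its method names are hypothetical, and measuring octets as UTF-8 bytes is an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class UnicodeLocalPart {
    // The Unicode-aware regex from this section.
    static final Pattern UNICODE_EMAIL = Pattern.compile(
            "^[\\p{L}0-9!#$%&'*+\\/=?^_`{|}~-][\\p{L}0-9.!#$%&'*+\\/=?^_`{|}~-]{0,63}"
            + "@[\\p{L}0-9-]+(?:\\.[\\p{L}0-9-]{2,7})*$");

    static boolean matches(String candidate) {
        return UNICODE_EMAIL.matcher(candidate).matches();
    }

    // Size of the local part in octets, assuming UTF-8 encoding.
    static int localPartOctets(String email) {
        String localPart = email.substring(0, email.lastIndexOf('@'));
        return localPart.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String omega = "\u1F68"; // the Greek letter used in the test above
        String sixtyFourOmegas = String.join("", Collections.nCopies(64, omega)) + "@gmail.com";

        System.out.println(matches(sixtyFourOmegas));          // true: 64 characters
        System.out.println(localPartOctets(sixtyFourOmegas));  // 192: three UTF-8 bytes per letter
        // "e" followed by a combining acute accent (U+0301) is not matched by \p{L}.
        System.out.println(matches("te\u0301st@gmail.com"));   // false
    }
}
```

If the octet limit matters for your use case, a separate byte-length check after the regex match is one way to cover it.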

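RFC 5322 regular expression for email validation

RFC 5322 defines the address grammar rather than a regular expression, but a widely circulated, rather large regex implements most of that grammar, and we could use it for validating email addresses: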



(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

This regular expression allows:

  • Two options for the local part:
      • One or more sequences of characters comprising letters “a” to “z”, digits 0 to 9, or the special characters !, #, $, %, &, ', *, +, /, =, ?, ^, _, `, {, |, }, ~, or -. A period character may exist in the local part as long as it doesn’t lead, trail, or appear consecutively.
      • A quoted string: a sequence of permitted ASCII characters placed within double quotes.
  • Two options for the domain:
      • A domain name following the domain name syntax.
      • A specific internet address provided within square brackets.

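Note that the above regex doesn’t enforce any length limits, simply because the RFC 5322 message format doesn’t designate length limits for any part of the email address. Note also that it lists only lowercase letters, so uppercase input matches only if the pattern is compiled case-insensitively.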

Let’s test out this regular expression in our Java application:



private static final String REGEX = "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

@Test
public void testValidEmail() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com"));
}
@Test
public void testValidEmailDoubleQuotedLocalPart() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "\"test123\"@gmail.com"));
}
@Test
public void testValidInternetAddressForDomain() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@127.0.0.1"));
}

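Our test results show the regex matched the valid email IDs holding RFC 5322-allowed syntaxes. Note that test123@127.0.0.1 matches through the domain name branch, since its labels consist of digits; the address literal branch expects square brackets, as in test123@[127.0.0.1].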
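Since this regex spells out only lowercase letters, mixed-case input is a case worth checking. The sketch below compiles the same expression twice, once as is and once with Pattern.CASE_INSENSITIVE, and also tries a bracketed address literal. The class name Rfc5322CaseCheck and the split into LOCAL and DOMAIN constants are assumptions made for readability; the expression itself is the one shown above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class Rfc5322CaseCheck {
    // Local-part half of the regex shown above.
    private static final String LOCAL =
            "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
            + "|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]"
            + "|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")";

    // Domain half of the regex shown above.
    private static final String DOMAIN =
            "(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
            + "|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}"
            + "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:"
            + "(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]"
            + "|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

    static final Pattern AS_WRITTEN = Pattern.compile(LOCAL + "@" + DOMAIN);
    static final Pattern IGNORE_CASE = Pattern.compile(LOCAL + "@" + DOMAIN, Pattern.CASE_INSENSITIVE);

    static boolean matches(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(AS_WRITTEN, "Test123@Gmail.com"));     // false: uppercase not listed
        System.out.println(matches(IGNORE_CASE, "Test123@Gmail.com"));    // true
        System.out.println(matches(IGNORE_CASE, "test123@[127.0.0.1]"));  // true: bracketed address literal
    }
}
```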

With that, we’ve explored a collection of Java email validation regular expressions that should prove helpful in your email address validation endeavors. Feel free to use these regular expressions straight away instead of writing your own! Plus, if you’d like to explore how regular expressions are used in other languages, check out our ultimate guide to validating emails with Regex.

Using OWASP email validation regex

The Open Web Application Security Project® — or in short, OWASP — leads various initiatives to improve web application security. Courteously, they maintain a validation regex repository holding just what you might need for your email address validations:



^[a-zA-Z0-9_+&*-]+(?:\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,7}$

Although the OWASP email address verification regex isn’t as verbose as the RFC 5322 regex we discussed previously, it holds all the basic validations for a standard email address.

Shall we test out how the OWASP regex performs on an email ID String in a Java application?



private static final String REGEX = "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$";

@Test
public void testValidEmail() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com"));
}
@Test
public void testEmailWithSpecialCharacter() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test1&23@gmail.com"));
}
@Test
public void testEmailWithin2To7LettersOnTopLevelDomain() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.com"));
}
@Test
public void testEmailLessThan2LettersOnTopLevelDomain() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.c"));
}
@Test
public void testEmailMoreThan7LettersOnTopLevelDomain() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@gmail.comcomco"));
}
@Test
public void testEmailWithUnicodeCharacter() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test1Ὠ23@gmail.com"));
}
@Test
public void testValidEmailDoubleQuotedLocalPart() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "\"test123\"@gmail.com"));
}
@Test
public void testValidInternetAddressForDomain() {
    assertTrue(Main.isValidEmailAddrRegex(REGEX, "test123@127.0.0.1"));
}

Affirming the points we discussed earlier, the OWASP email validation regex matches:

  • basic email addresses
  • email addresses with a set of special characters (“_”, “+”, “&”, “*”, “-”) in the local part
  • email addresses with 2 to 7 letters in their top-level domain

But it fails to match email addresses with Unicode characters, double-quoted local parts, or IP addresses as the domain, so the assertTrue assertions in those tests fail.
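Because the tests above signal a rejection through a failing assertion, it can be easier to read the outcomes as a table of expected results. The following sketch lists each address from those tests next to the outcome described in this section. The class name OwaspOutcomes and its methods are hypothetical and serve only this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
final class OwaspOutcomes {
    // The OWASP regex from this section.
    static final Pattern OWASP = Pattern.compile(
            "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$");

    static boolean matches(String candidate) {
        return OWASP.matcher(candidate).matches();
    }

    // Each test address mapped to whether the regex is expected to match it.
    static Map<String, Boolean> expectedOutcomes() {
        Map<String, Boolean> expected = new LinkedHashMap<>();
        expected.put("test123@gmail.com", true);         // basic address
        expected.put("test1&23@gmail.com", true);        // allowed special character
        expected.put("test123@gmail.c", false);          // top-level domain shorter than 2 letters
        expected.put("test123@gmail.comcomco", false);   // top-level domain longer than 7 letters
        expected.put("test1\u1F6823@gmail.com", false);  // Unicode letter in the local part
        expected.put("\"test123\"@gmail.com", false);    // double-quoted local part
        expected.put("test123@127.0.0.1", false);        // IP address as the domain
        return expected;
    }

    public static void main(String[] args) {
        expectedOutcomes().forEach((address, expected) ->
                System.out.println(address + "  expected=" + expected + "  actual=" + matches(address)));
    }
}
```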

Apache Commons Email Validation

Apache provides a Commons Validator package to help in various validation scenarios in Java applications. Notably, inside its org.apache.commons.validator.routines package is an EmailValidator class letting us validate if an email address conforms to the RFC 822 format.

Setting up the Apache Commons Validator package inside our Java app is as simple as adding its Maven dependency to the project’s pom.xml file:



<dependencies>
    ...
    <dependency>
        <groupId>commons-validator</groupId>
        <artifactId>commons-validator</artifactId>
        <version>1.7</version>
    </dependency>
</dependencies>

Then, all we have to do is import the EmailValidator class from the Commons Validator package in any class where we’d like to validate email addresses:



import org.apache.commons.validator.routines.EmailValidator;

Let us test this EmailValidator using a few JUnit tests as usual:



@Test
public void testEmailValidatorValidEmail() {
    assertTrue(EmailValidator.getInstance().isValid("test123@gmail.com"));
}
@Test
public void testEmailValidatorInvalidEmail() {
    assertTrue(EmailValidator.getInstance().isValid("test1..23@gmail.com"));
}
@Test
public void testEmailWithSpecialCharacter() {
    assertTrue(EmailValidator.getInstance().isValid("test1&23@gmail.com"));
}
@Test
public void testEmailWithUnicodeCharacter() {
    assertTrue(EmailValidator.getInstance().isValid("test1Ὠ23@gmail.com"));
}
@Test
public void testValidEmailDoubleQuotedLocalPart() {
    assertTrue(EmailValidator.getInstance().isValid("\"test123\"@gmail.com"));
}
@Test
public void testValidInternetAddressForDomain() {
    assertTrue(EmailValidator.getInstance().isValid("test123@127.0.0.1"));
}

Test results show that the Apache Commons library’s EmailValidator successfully matches an email ID holding:

  • the basic structure
  • special characters (RFC 822 allows any character except specials ("(", ")", "<", ">", "@", ",", ";", ":", "\", “"”, ".", "[", "]"), space, delete, or ASCII control characters)
  • Unicode characters
  • a double-quoted local part

But it rejects the address with consecutive periods in the local part, and it doesn’t match the test email ID that uses a bare internet address as its domain.

Using an API to validate emails

If you want to validate email addresses with your hands off of lengthy regexes and your Java project relieved from the weight of yet another library, chances are you’d be better off with a proper Java email validation API.

An easier way to find valid email addresses

Say hello to the Abstract Email Validation and Verification API. Using this email verification API, you’d get not only simple email address syntax verification but also:

  • MX record and SMTP server validation
  • Automatic detection of misspelled email addresses, with smart suggestions to correct them
  • Detection of free and/or disposable email provider domains
  • All of the above verifications and features in a privacy-friendly (GDPR, CCPA) manner

Why not give the Abstract Email Validation API a try? With no confirmation emails, signing up takes only as long as typing in your email address and a password!

Additionally, don’t hesitate to scope out the Best Email Validation and Verification APIs list for a detailed look into the API options available along with their pros and cons.

Which email validation method should you use?

As we saw when exploring multiple methods for Java email validation, each approach comes with its own complexity and different support for additional features such as special characters and Unicode characters. Hence, the ultimate choice of which email regex or technique to use for your email address validation boils down to which specific characteristics and features you’d need your valid email addresses to adhere to.

Fundamental email validation

As the simplest email validation, you can check for an “@” sign between two words using the simple regex validation.

Safest method to validate your email address

But in case you’re looking for exhaustive validation for your emails, the RFC 5322 validation we discussed would be your safest bet. Keep in mind, though, that it doesn’t allow non-Latin or Unicode characters, and such letters in email addresses could become quite the norm in the increasingly international world we live in.

If you’re interested, feel free to check out a similar guide on how to do email validation in jQuery.

Emma Jagger

Emma Jagger is an experienced engineer and Google alumna with a degree from Carnegie Mellon University. She specializes in email validation, IP geolocation, and API integration, focusing on creating practical and scalable solutions through her technical writing.
