The best all-around regex to find valid email addresses
Before we go down the rabbit hole, let’s jump straight to the best regex for email validation to save you time:
You may simply copy-paste the above regular expression, but we strongly recommend that you keep reading. As the upcoming sections show, there’s more to validating user-supplied email addresses with regular expressions than copying a snippet of code.
This somewhat complex regex for validating email addresses:
- allows Latin characters ("a" - "z" or "A" - "Z") within the email address.
- permits digits (0 - 9) in the email address.
- enforces domain part restrictions.
- allows hyphens (-) inside the domain as long as they don't lead or trail the domain.
- allows IP address literals surrounded with square brackets ([]) for the domain names.
- restricts sub-domains to a maximum length of 63 characters.
- applies local part restrictions.
- permits the set of special characters allowed by RFC 5322 ("!", "#", "$", "%", "&", "'", "*", "+", "-", "/", "=", "?", "^", "_", "`", "{", "|", "}", "~") in the local part.
- lets the local part be a quoted string: double quotes enclosing one or more ASCII characters.
- doesn't allow trailing, leading, or consecutive periods anywhere within the email address.
While the above regular expression covers most of the email address-related rules and regulations, it does have some shortcomings:
- doesn't allow Unicode characters.
- doesn't check that the entire email address is at most 254 characters long (the limit implied by the SMTP path length in RFC 5321).
- ignores obsolete syntax-related rules.
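Some of these gaps can be covered, or at least made explicit, outside the regex itself. The following is a minimal Python sketch, not part of the regex above: the function name `precheck_problems` and the default cap of 254 characters are our own assumptions, so adjust the limit to whatever your system enforces.

```python
from typing import List


def precheck_problems(address: str, max_length: int = 254) -> List[str]:
    """Return the problems a regex alone would not report (hypothetical helper)."""
    problems = []
    # Overall length cap; the default of 254 is an assumption, not a given.
    if len(address) > max_length:
        problems.append("too long")
    # An ASCII-only regex rejects Unicode anyway; flag it with a clearer reason.
    if not address.isascii():
        problems.append("non-ASCII characters")
    return problems


print(precheck_problems("user@example.com"))
print(precheck_problems("jörg@example.com"))
```

Running cheap checks like these before the regex also keeps very long inputs away from an expensive pattern.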
Want to get a bit more practical? Refer to our email regex guide for a full list of code examples by language.
General user input email patterns and regular expressions
Before your company’s corporate mail gets swarmed with SQL injection attacks or your personal emails get sent to the wrong recipients, let’s get a brief idea of email addresses and regular expressions just so you’ll know exactly what to look out for.
A general email address looks like this
According to RFC 5322, which defines the currently used Internet Message Format (IMF), a general email address takes this form: local-part@domain
The elements that make up this email pattern are:
- Local-part – a locally interpreted string constrained by a collection of rules the currently active IMF enforces. RFC 5322 lets the local-part conform to a dot-atom or a quoted string.
- “@” sign – symbol separating the local-part from the domain. An ASCII character of value 64 is used to represent this element.
- Domain – a string naming the host or mail domain to which the email should be delivered. RFC 5322 requires the domain of a valid email address to be either a dot-atom or a domain-literal within square brackets ([]).
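To make the three elements concrete, here is a small Python sketch that separates an address into its local-part and domain. The helper name `split_address` is hypothetical, and the sketch assumes the domain itself contains no "@", which holds for ordinary domain names but not for every possible domain-literal.

```python
from typing import Optional, Tuple


def split_address(address: str) -> Optional[Tuple[str, str]]:
    """Split 'local-part@domain' into its two parts, or return None."""
    # Split on the LAST "@": a quoted local-part may itself contain "@".
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign or not local_part or not domain:
        return None
    return local_part, domain


print(split_address("user@example.com"))
print(split_address('"a@b"@example.com'))
print(split_address("no-at-sign"))
```

This only separates the parts; it says nothing about whether either part is valid.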
What about regular expressions?
A regular expression (more commonly called a regex) is a search pattern that defines what a string must and must not contain in order to match it.
Use cases of regular expressions
Regular expressions are widely used for string searching and string replacing tasks such as:
- Validating email addresses.
- Scraping web pages.
- Validating credit card number formats.
- Validating password input against complexity requirements.
- Removing unwanted characters from strings, e.g., punctuation or extra spaces.
For a detailed look into how different programming languages handle regular expressions, have a look at our email validation regex guide.
The basic format of a regular expression
Regular expressions are textual patterns made up of two kinds of characters:
Metacharacters
A set of characters and character sequences that regular expressions reserve to represent specific patterns.
For instance, a caret symbol (^) and a dollar symbol ($) mean the start and the end of a string respectively. Similarly, a period (.) inside a regular expression means “any character”.
Hence, a regular expression such as ^.$ matches any string consisting of exactly one character, such as “D”, “g”, or “5”.
A quick web search will turn up cheat sheets listing these regular expression metacharacters for easy reference.
Regular characters
Ordinary characters that are matched for their literal value.
Building on the regex above, a regular expression like ^.ed$ matches strings such as “bed”, “fed”, or “Ted”.
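The two patterns above are easy to try out. The sketch below uses Python’s built-in `re` module; the helper `matches` is our own name, introduced only for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found in the text."""
    return re.search(pattern, text) is not None


# ^.$ : start of string, any one character, end of string.
for text in ["D", "g", "5", "Dg", ""]:
    print(repr(text), matches(r"^.$", text))

# ^.ed$ : any one character followed by the literal characters "ed".
for text in ["bed", "fed", "Ted", "bED", "shed"]:
    print(repr(text), matches(r"^.ed$", text))
```

Note that the literal characters are case-sensitive by default, so “bED” does not match ^.ed$ even though the leading period accepts any character.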
RFC 5322 official standard regular expression to validate email addresses
The above regular expression follows the RFC 5322 grammar and matches basic email addresses.
Let’s walk through each section of this regular expression, shall we?
- The local-part matches one of two subsections:
  - [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)* – match with a dot-atom-text.
  - "([]!#-[^-~ \t]|(\\[\t -~]))+" – match with a quoted-string within double-quotes. This subsection leaves out the whitespace-related rules the RFC enforces for a quoted-string since they’re irrelevant when validating emails.
- Similarly, the domain matches one of two subsections:
  - [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)* – match with a dot-atom-text.
  - \[[\t -Z^-~]*] – match a domain-literal; note that this subsection ignores the whitespace-related rules the RFC defines for domain-literals, which are irrelevant when you need to match email addresses.
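As a rough illustration, the four subsections listed above can be joined around an "@" and tried out in Python. This is a sketch assembled from those pieces, assuming Python’s `re` syntax; the names `ATEXT`, `DOT_ATOM`, `QUOTED_STRING`, `DOMAIN_LITERAL`, and `is_rfc5322_like` are ours, not the source’s.

```python
import re

# Characters allowed in an unquoted local-part or domain (from the list above).
ATEXT = r"[-!#-'*+/-9=?A-Z^-~]"
# One or more runs of those characters, separated by single periods.
DOT_ATOM = ATEXT + r"+(\." + ATEXT + r"+)*"
# A double-quoted local-part, with backslash escapes allowed inside.
QUOTED_STRING = r'"([]!#-[^-~ \t]|(\\[\t -~]))+"'
# A bracketed domain-literal such as [192.168.0.1].
DOMAIN_LITERAL = r"\[[\t -Z^-~]*]"

EMAIL = re.compile(
    "(" + DOT_ATOM + "|" + QUOTED_STRING + ")@(" + DOT_ATOM + "|" + DOMAIN_LITERAL + ")"
)


def is_rfc5322_like(address: str) -> bool:
    """True if the whole string matches the assembled pattern."""
    return EMAIL.fullmatch(address) is not None


for sample in ["user@example.com", '"john doe"@example.com',
               "user@[192.168.0.1]", ".user@example.com", "user..name@example.com"]:
    print(sample, is_rfc5322_like(sample))
```

Using `fullmatch` rather than `search` matters here: the pattern has no anchors of its own, so a partial match inside a longer string would otherwise count as valid.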
As for further limitations, note that
- according to its source, this regular expression isn’t “optimized for performance”.
- this regular expression overlooks the rules related to the RFC’s obsolete syntax.
Supplemental additions
A few more changes to the previous regular expression could improve its accuracy:
Let’s explore the additions and modifications made to the above regular expression:
- (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} – match with IPv4-address-literals.
- IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::) – match with IPv6-address-literals.
- [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ – match with general-address-literals.
- [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])? – enforce subdomains to have a maximum length of 63 characters.
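Two of these additions are short enough to check in isolation. The Python sketch below copies the IPv4 and subdomain patterns from the list above; the constant names `LABEL`, `OCTET`, and `IPV4` are our own.

```python
import re

# Subdomain label: starts and ends with a letter or digit, at most 63 characters.
LABEL = re.compile(r"[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?")

# One IPv4 octet (0 to 255), then the full dotted quad built from it.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4 = re.compile(OCTET + r"(\." + OCTET + r"){3}")

for label in ["a" * 63, "a" * 64, "a-b", "-abc", "abc-"]:
    print(len(label), label[:8], LABEL.fullmatch(label) is not None)

for candidate in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, IPV4.fullmatch(candidate) is not None)
```

The 63-character limit falls out of the quantifiers: one leading character, up to 61 in the middle, and one trailing character.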
Why regex might not be your best friend for validating email addresses
Time for the plot twist! Up until this point, we laid the groundwork for validating email addresses through regular expressions and explored the best regular expressions to do so. But what if I told you that using regular expressions to validate email addresses is actually more hazardous than you might’ve imagined?
A hit on performance
Given the complexity of RFC 5322, a regular expression that honors all of its rules can end up as one large expression that is CPU-intensive to evaluate. Such regexes can slow down your company’s servers, and what’s worse, an attacker could exploit this with a regular expression denial-of-service (ReDoS) attack that brings your web service to a halt.
An inconvenience to maintain
Assume you’re currently using the most optimal and up-to-date regular expression to validate email addresses. Since a regular expression isn’t a program you can add modules to or remove them from, the moment the IMF moves to a new standard you’d be back at square one, searching for the new “best regex” once again.
What’s better?
Due to the aforementioned drawbacks among others, it could be more appropriate to resort to an API to find valid email addresses. Just to help you out, here are the best email validation and verification APIs that currently exist on the market.
Conclusion
Because of its precise pattern matching and compact nature, a regular expression can be your best bet for validating user-supplied email addresses in most everyday scenarios. Having said that, more often than not, an API is a good alternative for the same purpose.
If you want a deeper dive into how specific programming languages and frameworks find valid email addresses, check out how the Python, Ruby, and PHP programming languages and the jQuery framework handle email address verification.
We’ve covered the essentials of regular expressions and how they can help with email address verification. Let us end this article with one important guideline:
Always test your chosen regular expression in the website, app, or server where you’ll actually use it instead of simply copy-pasting it, so that you don’t end up matching invalid addresses and dealing with the problems that follow.
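One lightweight way to follow this guideline is to keep a list of addresses that must be accepted and a list that must be rejected, and run any candidate regex against both. Below is a minimal Python sketch; the function `find_failures` and the deliberately simple `CANDIDATE` pattern are hypothetical stand-ins for whatever regex you pick.

```python
import re
from typing import Iterable, List


def find_failures(pattern: re.Pattern, valid: Iterable[str],
                  invalid: Iterable[str]) -> List[str]:
    """List every sample the pattern gets wrong (hypothetical helper)."""
    failures = []
    for address in valid:
        if pattern.fullmatch(address) is None:
            failures.append(f"rejected valid address: {address}")
    for address in invalid:
        if pattern.fullmatch(address) is not None:
            failures.append(f"accepted invalid address: {address}")
    return failures


# A deliberately simple placeholder pattern, NOT a recommended regex.
CANDIDATE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

for failure in find_failures(
    CANDIDATE,
    valid=["user@example.com"],
    invalid=["no-at-sign", "user@", "user@example..com"],
):
    print(failure)
```

Here the simple pattern lets the consecutive-period address through, which is exactly the kind of surprise a sample list catches before your users do.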