Last updated November 21, 2023

How to Validate Emails with SQL

Brian Wu


SQL, or structured query language, is as popular as it is flexible. No matter how much technology continues to evolve, we keep coming back to this foundational and simple database language. It's used almost anywhere you need to query, manipulate, or validate data in a database.

One of SQL's most valuable capabilities is data validation, which makes it a good fit for a use case like email validation.

In this blog, we're going to dive into the ways you can use SQL to validate your email list. So whether you're a database administrator, SQL developer, or a marketer with some technical chops, you'll want to check this out.


What is SQL: A Refresher

SQL is more than just a way to retrieve information from a database. It lets you perform calculations, analyze large amounts of data, and power integrations between systems. Most backends built on relational databases rely on SQL to retrieve and manage data.

  • Database Management: SQL is commonly used to manage and manipulate relational databases. It allows users to create, modify, and query databases, making it an essential tool for data storage and retrieval.
  • Data Analysis: SQL is used for data analysis tasks, especially in business intelligence and data analytics. Analysts use SQL to extract valuable insights from large datasets by writing complex queries to filter, aggregate, and join data.
  • Web Development: SQL is a fundamental component of web applications. It's used to interact with databases and retrieve dynamic content for websites. Popular web frameworks often incorporate SQL to handle database interactions.
  • Reporting and Dashboards: SQL is essential for generating reports and building interactive dashboards. Tools like Tableau, Power BI, and others rely on SQL queries to fetch and present data in a meaningful way.
  • Data Integration: SQL plays a key role in data integration processes. It's used to extract data from various sources, transform it, and load it into data warehouses or other storage systems. This is crucial for businesses aiming to consolidate and analyze data from multiple platforms.

These are just a few examples, and SQL's versatility extends to many other domains, including data science, data engineering, and more. Its ability to manage and query structured data makes it a fundamental tool in the world of data and information management.

The Syntax

SQL uses a readable syntax that specifies the columns, tables, and conditions to query or manipulate data.

  1. SELECT: Retrieves data from a database. SELECT is the statement you'll see most often because it only reads data and never modifies it; to change data, you use other commands such as INSERT, UPDATE, and DELETE. Example: SELECT first_name, last_name FROM users;
  2. FROM: Specifies which table to retrieve the data from. It's used together with SELECT, as in: SELECT * FROM users;
  3. WHERE: Filters the results based on a condition. Example: SELECT * FROM users WHERE age > 30;

Notice that the columns to retrieve are listed directly after SELECT. To retrieve all columns, use * instead of a column list.
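If you'd like to try these clauses without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The users table and its rows are made-up sample data, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# In-memory SQLite database with a small, made-up users table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Ana", "Silva", 34), ("Ben", "Okafor", 28), ("Chen", "Li", 45)],
)

# SELECT specific columns FROM a table
names = conn.execute("SELECT first_name, last_name FROM users").fetchall()

# SELECT * returns every column; WHERE keeps only rows that match the condition
over_30 = conn.execute("SELECT * FROM users WHERE age > 30").fetchall()

print(names)
print(over_30)
```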

Cleaning and Validating Emails

Now, let's look at different ways to validate emails using SQL. I’ll start with simple validations, then move on to more complex ways to verify syntax using regex.

Deduping Emails

"Deduping" data involves removing duplicate records from a dataset. When working with SQL and emails, you might want to dedupe entries to make sure an email address appears only once in your result.

Example Table

Imagine we have a table named users with the following structure:


CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(255)
);

Here are some methods you can use to dedupe your email list.

Method 1: Using DISTINCT

If you simply want to retrieve unique email addresses without concern for other columns, you can use DISTINCT:


SELECT DISTINCT email
FROM users;

This will return the unique email values.

The notable drawback of this approach is that you’ll only be able to retrieve the email and not the entire row. To retrieve the entire row, which may contain more pertinent data, you can use the following method.
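A quick way to see what DISTINCT does is to run it against a few sample rows. This sketch assumes SQLite via Python's sqlite3 module and a made-up users table; ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255))")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

# DISTINCT collapses the two 'a@example.com' rows, but only the email column comes back
unique_emails = conn.execute("SELECT DISTINCT email FROM users ORDER BY email").fetchall()
print(unique_emails)  # [('a@example.com',), ('b@example.com',)]
```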

Method 2: Using ROW_NUMBER() with a CTE

If you want to retrieve whole records while removing duplicates based on email addresses, you might use ROW_NUMBER() and a Common Table Expression (CTE) to identify and remove duplicates:


WITH CTE AS (
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY email ORDER BY id) AS rn
    FROM users
)
SELECT *
FROM CTE
WHERE rn = 1;

The PARTITION BY clause divides the rows into partitions, and ROW_NUMBER() restarts its numbering at 1 within each one. Partitioning by email means each distinct email gets its own sequence. ORDER BY id determines which of the duplicate rows is numbered first (the one with the lowest id in this case). Then the WHERE rn = 1 clause filters out the duplicates, keeping only the first occurrence.
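Here is the same CTE run against sample data in SQLite (window functions need SQLite 3.25 or later, which most current Python builds include). The extra name column is a hypothetical stand-in for the other data you'd want to keep with each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255), name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", "First A"), (2, "b@example.com", "B"), (3, "a@example.com", "Second A")],
)

# Number rows within each email partition (lowest id first), then keep rn = 1
query = """
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
)
SELECT id, email, name
FROM CTE
WHERE rn = 1
ORDER BY id
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)  # [(1, 'a@example.com', 'First A'), (2, 'b@example.com', 'B')]
```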

Method 3: Delete Duplicate Rows

If you want to permanently remove duplicate emails from the table, you can use a DELETE statement with a CTE and ROW_NUMBER(). In SQL Server, you can delete directly through the CTE:

WITH CTE AS (
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY email ORDER BY id) AS rn
    FROM users
)
DELETE FROM CTE
WHERE rn > 1;

Here, rn > 1 ensures that all duplicates except the first occurrence (per the ORDER BY id ordering) are deleted.

Keep in mind that you’ll want to check with your data team before truncating or removing data without properly merging records.
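Not every database lets you delete through a CTE the way SQL Server does. A common alternative, sketched here in SQLite with made-up rows, keeps the lowest id for each email and deletes the rest; it has the same effect as deleting rows with rn > 1, though you may need to adapt it for your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255))")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"), (4, "a@example.com")],
)

# Keep the lowest id per email and delete every other row with that email
conn.execute("""
DELETE FROM users
WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY email)
""")
conn.commit()

remaining = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```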

Checking for the Right Syntax (Without Regex)

If you aren't using regex, your validation queries will instead combine several string-function conditions, each checking one property of the address.

Validating email addresses without using regular expressions in SQL involves using string functions to check for certain conditions that are generally true for valid email addresses. Here’s a simplistic approach for SQL Server:

Basic Email Validation in SQL Server

Consider that a basic email address has the form local-part@domain.tld, where:

  • There are characters before and after "@"
  • There is at least one period after "@"

With this in mind, the following SQL query checks for these basic characteristics:


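SELECT *
FROM Users
WHERE
    CHARINDEX('@', Email) > 1 AND -- "@" exists and is not the first character
    CHARINDEX('.', Email, CHARINDEX('@', Email)) > CHARINDEX('@', Email) + 1 AND -- a "." follows "@", with at least one character between them
    CHARINDEX('.', REVERSE(Email)) >= 3 AND -- at least two characters after the last "."
    CHARINDEX('@', REVERSE(Email)) - CHARINDEX('.', REVERSE(Email)) > 1; -- the last "." comes after "@", with at least one character between them

Explanation

  • CHARINDEX('@', Email) > 1: "@" must be present and must not be the first character. CHARINDEX returns a 1-based position, or 0 if the character is missing.
  • CHARINDEX('.', Email, CHARINDEX('@', Email)) > CHARINDEX('@', Email) + 1: Starting the search at "@" finds the first period after it, so a period in the local part (as in first.last@example.com) doesn't break the check. The period must not immediately follow "@".
  • CHARINDEX('.', REVERSE(Email)) >= 3: In the reversed string, the last period's position minus one equals the number of characters after it, so this requires a top-level domain of at least two characters.
  • CHARINDEX('@', REVERSE(Email)) - CHARINDEX('.', REVERSE(Email)) > 1: "@" must sit further from the end than the last period, with at least one character between them.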

Caveats

This approach is quite basic. It accepts some invalid addresses (for example, ones containing spaces or more than one "@") and may reject some valid ones.

Without regular expressions, thorough email validation quickly becomes complex.
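To see how those checks behave on concrete inputs, here is a minimal Python sketch that mirrors the intent of each condition. The charindex helper is hypothetical: it imitates T-SQL's CHARINDEX by returning a 1-based position, or 0 when the character isn't found. The last two sample addresses show the kind of invalid input these checks let through.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical helper imitating T-SQL CHARINDEX: 1-based position, 0 if absent."""
    pos = haystack.find(needle, max(start, 1) - 1)
    return pos + 1  # find() returns -1 when absent, so this yields 0


def basic_email_check(email: str) -> bool:
    """Mirror of the string-function checks (no regex)."""
    at = charindex("@", email)
    rev = email[::-1]
    return (
        at > 1                                              # "@" present, not first
        and charindex(".", email, at) > at + 1              # a "." after "@", not adjacent
        and charindex(".", rev) >= 3                        # >= 2 chars after the last "."
        and charindex("@", rev) - charindex(".", rev) > 1   # >= 1 char between "@" and last "."
    )


for address in ["first.last@example.com", "user2@invalid", "@example.com",
                "user@.com", "user@example.c", "has space@example.com", "a@b@c.com"]:
    print(f"{address:24} {basic_email_check(address)}")
```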

Regular Expression and SQL

Regular expressions (regex) are patterns used to match character combinations in strings. They offer a powerful and flexible way to identify strings of text, such as particular characters, words, or patterns of characters. Regex is widely used across software applications and programming languages to perform searches, replace text, and validate data formats.

Regex in SQL

Many SQL dialects let you use regex directly in queries, so you can validate, search, and replace text without leaving the database.

Key Syntax of Regex

  • Literals: Regular characters that match themselves (e.g., a, 1).
  • Metacharacters: Special symbols with distinct meanings (e.g., . matches any character).
  • Character Classes: [abc] matches a, b, or c. [^abc] excludes these characters.
  • Quantifiers: Define how often a character or group appears (e.g., + means one or more times).
  • Positional: ^ and $ match the start and end of the string (or of each line, in multiline mode).
  • Grouping: (abc) groups characters into a single unit; a|b (alternation) matches either a or b.
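Python's re module shares these basic syntax elements with the regex engines in MySQL and PostgreSQL, so it's a convenient place to experiment. This sketch runs one small example per element; the flavors differ in advanced features, so treat it as an illustration rather than a spec for any particular database.

```python
import re

# (pattern, text) pairs illustrating each syntax element; re.search looks for
# the pattern anywhere in the text unless it is anchored with ^ or $
examples = [
    (r"a1", "xa1y"),          # literals match themselves
    (r"c.t", "cot"),          # "." matches any single character (except newline)
    (r"[abc]", "b"),          # character class: one of a, b, or c
    (r"[^abc]", "b"),         # negated class: anything except a, b, or c
    (r"ab+", "abbb"),         # quantifier: + means one or more "b"
    (r"^bc", "abc"),          # positional: ^ requires the match at the start
    (r"^(abc)+$", "abcabc"),  # grouping: (abc) repeated, covering the whole string
    (r"^(cat|dog)$", "dog"),  # alternation: either side of |
]

results = {pattern: bool(re.search(pattern, text)) for pattern, text in examples}
for pattern, matched in results.items():
    print(f"{pattern!r:16} -> {matched}")
```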

At a high level, SQL systems that support regex provide a way to perform advanced string matching and manipulation directly within database queries. This advanced matching can be especially useful when dealing with textual data, such as logs, free-form text entries, or any dataset where patterns in the data might exist.

You can use regex for data validation, confirming that values conform to an expected pattern. Common uses include:

  • Validate Emails: Check if email addresses in a database are in the proper format.
  • Phone Numbers: Ensure phone numbers conform to a particular pattern, whether it's (xxx) xxx-xxxx, xxx-xxx-xxxx, etc.
  • Dates: Confirm date strings match expected formats like YYYY-MM-DD or MM/DD/YYYY.
  • Password Policies: Check that a password has the required combination of characters, digits, and special symbols. Do this before the password is hashed; passwords themselves should never be stored in plaintext.
  • Custom Patterns: Maybe you have product codes, serial numbers, or other custom strings that follow specific formats. Regex can validate these as well.

When using SQL and regular expressions to validate email addresses, you should strive to check for common patterns that legitimate emails follow. Note that strictly validating an email address according to the official standard (RFC 5322) is quite complex and often unnecessary.

A practical regular expression might look for the following patterns:

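  1. One or more letters, digits, or the symbols ".", "_", "%", or "-" before the "@" symbol.
  2. The "@" symbol.
  3. One or more letters, digits, dots, or hyphens after the "@" symbol (the domain name).
  4. The "." symbol (dot).
  5. Two to six letters after the final "." (the top-level domain).

The example patterns that follow are a pragmatic compromise rather than a complete check. As written, they reject plus-addressed mail such as user+tag@example.com and top-level domains longer than six letters (.photography, for example); add + to the first character class, or change {2,6} to {2,}, if you need to accept those.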
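Here is a minimal sketch of that pattern in Python, useful for checking addresses before they reach the database. It uses the same character classes as the SQL examples; the function name looks_like_email is just illustrative.

```python
import re

# Local part, "@", domain, ".", 2-6 letter TLD (same classes as the SQL examples).
# fullmatch() anchors both ends, so ^ and $ aren't needed (in Python, "$" would
# also accept a trailing newline).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}")


def looks_like_email(address: str) -> bool:
    """Return True if the whole address matches the practical pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


samples = [
    "jane.doe@example.com",   # ordinary address
    "user2@invalid",          # no "." in the domain
    "a@b.c",                  # one-letter TLD
    "user+tag@example.com",   # "+" isn't in the local-part class
]
for s in samples:
    print(f"{s:24} {looks_like_email(s)}")
```

The last sample is a real address that this pattern rejects, because "+" isn't in the local-part character class.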

Examples

MySQL

In MySQL, the REGEXP operator allows you to filter based on a regular expression:


SELECT *
FROM users
-- '\\.' in a MySQL string literal becomes the regex \. (a literal dot)
WHERE email REGEXP '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}$';
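SQLite accepts the same WHERE email REGEXP syntax as MySQL but ships without a regexp implementation; Python's sqlite3 lets you register one. This sketch assumes that setup and made-up rows. Because the pattern is passed as a bound parameter rather than written inside a SQL string literal, it needs only a single backslash before the dot (the doubled \\. in the MySQL query is how a MySQL string literal spells \.).

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X), i.e. (pattern, value)
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255))")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("jane@example.com",), ("user2@invalid",), ("bob@example.org",)],
)

pattern = r"^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$"
valid = conn.execute(
    "SELECT email FROM users WHERE email REGEXP ? ORDER BY id", (pattern,)
).fetchall()
print(valid)  # [('jane@example.com',), ('bob@example.org',)]
```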

PostgreSQL

PostgreSQL uses the ~ operator for case-sensitive regular expression matching:


SELECT *
FROM users
-- PostgreSQL string literals don't treat "\" as an escape, so a single backslash is enough
WHERE email ~ '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$';

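And ~* for case-insensitive matching. Because the pattern above already lists both A-Z and a-z, ~* makes no difference there; it's useful when you write the pattern in one case only:

SELECT *
FROM users
WHERE email ~* '^[a-z0-9._%-]+@[a-z0-9.-]+\.[a-z]{2,6}$';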
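The difference between ~ and ~* is easiest to see with a lowercase-only pattern. This Python sketch uses re.IGNORECASE as a stand-in for ~*; it illustrates the idea rather than PostgreSQL's own regex engine.

```python
import re

# Lowercase-only pattern: case-sensitive by default, like PostgreSQL's ~
LOWER_PATTERN = r"[a-z0-9._%-]+@[a-z0-9.-]+\.[a-z]{2,6}"

address = "Jane.Doe@Example.COM"
case_sensitive = re.fullmatch(LOWER_PATTERN, address) is not None
# re.IGNORECASE plays the role of ~*
case_insensitive = re.fullmatch(LOWER_PATTERN, address, re.IGNORECASE) is not None
print(case_sensitive, case_insensitive)  # False True
```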

SQL Server

SQL Server does not natively support regular expressions, but you can use LIKE and PATINDEX for basic pattern matching. For complex regex, you'd typically need to use a CLR (Common Language Runtime) integration solution or manage validation at the application layer.

Here's a basic example using LIKE:


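SELECT *
FROM users
WHERE email LIKE '%_@_%._%'  -- text before "@", between "@" and a ".", and after that "."
AND email NOT LIKE '% %'     -- no spaces
AND email NOT LIKE '%@%@%';  -- no more than one "@"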
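The % and _ wildcards behave the same way in SQLite's LIKE as in SQL Server's (SQL Server's [ ] character classes aren't supported in SQLite, and these patterns don't use them), so this sketch runs a few LIKE filters against made-up rows: text around "@" and ".", no spaces, and only one "@".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO users (email) VALUES (?)", [
    ("jane@example.com",), ("user2@invalid",), ("has space@example.com",),
    ("a@b@example.com",), ("@example.com",),
])

# % = any run of characters (including none), _ = exactly one character
rows = conn.execute("""
    SELECT email FROM users
    WHERE email LIKE '%_@_%._%'    -- at least one character before "@", after "@", and after the "."
      AND email NOT LIKE '% %'     -- no spaces
      AND email NOT LIKE '%@%@%'   -- only one "@"
    ORDER BY id
""").fetchall()
print(rows)  # [('jane@example.com',)]
```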

Abstract API and SQL together

When you use an email verification API, you can make your SQL email verification efforts even more robust.

A good email verification API can run checks like these, most of which SQL alone can't perform:

  • Syntax Check: Check that the email address follows the correct format and doesn't contain any obvious errors.
  • Domain Verification: Confirm that the domain in the email address exists and has valid DNS records.
  • Disposable Email Detection: Identify if the email address is from a disposable or temporary email service.
  • Role Account Detection: Check if the email address is associated with a generic role (e.g., support@company.com).
  • MX Record Validation: Verify the presence of Mail Exchange (MX) records for the email domain.
  • SMTP Verification: Attempt to connect to the mail server to check if the email address is deliverable.

The good news is you can leverage a tool like Abstract API’s email verification API to validate your email list.

To verify and clean emails using the Abstract API together with SQL commands and pattern matching, you can follow these steps:

1. Set Up Abstract API Account

Sign up for an account on the Abstract API website and obtain your API key.

2. Create a SQL Stored Procedure

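Create a stored procedure in SQL Server that calls the Abstract API to verify and clean emails. This example uses T-SQL and makes the HTTP request through sp_execute_external_script, which requires SQL Server Machine Learning Services with R, the R httr package, and the 'external scripts enabled' server option.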


CREATE PROCEDURE VerifyAndCleanEmails
    @ApiKey NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- Temporary table for the emails, their format check, and the raw API response
    CREATE TABLE #Emails (
        Email NVARCHAR(255) NOT NULL,
        IsFormatValid BIT NULL,
        ApiResponse NVARCHAR(MAX) NULL
    );

    -- Insert email addresses to be verified and cleaned (sample data)
    INSERT INTO #Emails (Email)
    VALUES
        ('user1@example.com'),
        ('user2@invalid'),
        ('user3@example.org'),
        ('user4@invalidemail.org');

    DECLARE @Email NVARCHAR(255);
    DECLARE @IsValid BIT;
    -- Holds the single-row result returned by the external script
    DECLARE @Response TABLE (response NVARCHAR(MAX));

    -- STATIC cursor iterates over a snapshot, so updating #Emails inside the loop is safe
    DECLARE email_cursor CURSOR LOCAL STATIC FORWARD_ONLY FOR
        SELECT Email FROM #Emails;

    OPEN email_cursor;
    FETCH NEXT FROM email_cursor INTO @Email;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Basic format check with LIKE and CHARINDEX (T-SQL has no built-in regex)
        SET @IsValid = IIF(@Email LIKE '%_@_%._%' AND CHARINDEX(' ', @Email) = 0, 1, 0);

        UPDATE #Emails SET IsFormatValid = @IsValid WHERE Email = @Email;

        -- If the format is valid, call Abstract API for verification
        IF @IsValid = 1
        BEGIN
            DELETE FROM @Response;

            -- httr builds the URL and URL-encodes api_key and email;
            -- @params exposes @api_key and @email to R as api_key and email
            INSERT INTO @Response (response)
            EXEC sp_execute_external_script
                @language = N'R',
                @script = N'
library(httr)
resp <- GET("https://api.abstractapi.com/v1/email/verify",
            query = list(api_key = api_key, email = email))
OutputDataSet <- data.frame(response = content(resp, "text", encoding = "UTF-8"),
                            stringsAsFactors = FALSE)
',
                @params = N'@api_key NVARCHAR(50), @email NVARCHAR(255)',
                @api_key = @ApiKey,
                @email = @Email;

            -- Store the raw JSON; extract individual fields later with JSON_VALUE
            UPDATE #Emails
            SET ApiResponse = (SELECT TOP (1) response FROM @Response)
            WHERE Email = @Email;
        END;

        FETCH NEXT FROM email_cursor INTO @Email;
    END;

    CLOSE email_cursor;
    DEALLOCATE email_cursor;

    -- Return the results, then drop the temporary table
    SELECT Email, IsFormatValid, ApiResponse FROM #Emails;
    DROP TABLE #Emails;
END;


The procedure creates a temporary table, inserts test data, and calls the email verification API for each address that passes a basic format check, then records the API's response.
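If you call the API from application code instead of from SQL Server, build the query string with proper encoding: a raw "+" in an email address is commonly decoded as a space. This Python sketch uses the endpoint URL shown in the stored procedure (check Abstract's documentation for the current one); build_verify_url and verify_email are hypothetical helper names, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as used in this guide's stored procedure; confirm against Abstract's docs
ENDPOINT = "https://api.abstractapi.com/v1/email/verify"


def build_verify_url(api_key: str, email: str) -> str:
    """Build the verification URL with URL-encoded query parameters."""
    return f"{ENDPOINT}?{urlencode({'api_key': api_key, 'email': email})}"


def verify_email(api_key: str, email: str, timeout: float = 10.0) -> dict:
    """Call the API and return the parsed JSON response (requires network access)."""
    with urlopen(build_verify_url(api_key, email), timeout=timeout) as resp:
        return json.load(resp)


url = build_verify_url("YOUR_API_KEY", "user+tag@example.com")
print(url)
# https://api.abstractapi.com/v1/email/verify?api_key=YOUR_API_KEY&email=user%2Btag%40example.com
```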

Execute the Stored Procedure

To execute the stored procedure, provide your Abstract API key:


DECLARE @ApiKey NVARCHAR(50) = 'YOUR_API_KEY';
EXEC VerifyAndCleanEmails @ApiKey;

In this stored procedure, we use a temporary table to store the email addresses to be verified and cleaned.

  • It uses a cursor to loop through each email, checking the format's basic validity with LIKE and CHARINDEX (T-SQL has no built-in regex).
  • If the format is valid, it calls the Abstract API to verify the email.
  • The API's response is recorded in the temporary table.
  • Finally, the verified emails are returned as a result.

SQL Best Practices

When implementing email validation in SQL, here are some best practices to keep performance optimal and your users happy.

Use String Functions for Basic Validation

For basic email format validation, use SQL string functions like LIKE, CHARINDEX, and LEN to check the presence and position of "@" and "." symbols.

Keep It Simple

Avoid overly complex regular expressions for email validation in SQL. Complex regex patterns can negatively impact performance and increase the likelihood of false positives or negatives.

Validate at the Application Layer

For comprehensive and accurate email validation, consider performing it in your application layer using a language with robust regex support (e.g., Python, JavaScript). This allows you to leverage libraries and tools specifically designed for email validation.

Use External Libraries

If your database system allows external libraries or functions, consider using a well-established library for email validation, such as the Python package email-validator. These libraries are often more accurate and better tested than hand-written patterns.

Test Thoroughly

Always test your email validation logic with a wide range of test cases, including both valid and invalid email addresses. Testing helps identify false positives and negatives and ensures the accuracy of your validation method.
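A table-driven check makes it easy to keep adding edge cases. This sketch runs the practical regex pattern used in the MySQL and PostgreSQL examples against a handful of hand-picked addresses and reports any mismatch; the cases and helper names are illustrative.

```python
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}")


def looks_like_email(address: str) -> bool:
    return EMAIL_PATTERN.fullmatch(address) is not None


# (address, should_be_valid) pairs; extend this list as you find edge cases
CASES = [
    ("jane@example.com", True),
    ("first.last@sub.example.co.uk", True),
    ("user+tag@example.com", True),   # plus addressing is valid
    ("plainaddress", False),
    ("@example.com", False),
    ("user@example", False),
    ("a..b@example.com", False),      # consecutive dots aren't allowed in the local part
]


def failing_cases(cases):
    """Return (address, expected, actual) for every case the validator gets wrong."""
    results = [(address, expected, looks_like_email(address)) for address, expected in cases]
    return [r for r in results if r[1] != r[2]]


for failure in failing_cases(CASES):
    print("MISMATCH:", failure)
```

Run as-is, it reports one false negative (plus addressing, user+tag@example.com) and one false positive (consecutive dots, a..b@example.com), which is the kind of gap this testing is meant to surface.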

Optimize Queries

Optimize your SQL queries to reduce unnecessary database calls or processing. Consider using appropriate indexes on columns involved in email validation.
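Whether an index helps depends on the query shape. As an illustration, this SQLite sketch (made-up table and index names) uses EXPLAIN QUERY PLAN to show that an exact match on email can search an index, while a LIKE pattern with a leading wildcard has to scan; the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255))")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql: str) -> str:
    """Return SQLite's query-plan text for a statement (detail column only)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# An exact match, such as a duplicate check before inserting, can search the index
exact_plan = plan("SELECT id FROM users WHERE email = 'jane@example.com'")

# A pattern with a leading wildcard can't seek into the index, so SQLite scans
suffix_plan = plan("SELECT id FROM users WHERE email LIKE '%@example.com'")

print(exact_plan)
print(suffix_plan)
```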

Monitor and Analyze

Monitor the performance of your email validation processes in a production environment. Use database monitoring tools with full stack observability to identify any bottlenecks or inefficiencies in your validation logic.

Document Validation Rules

Maintain clear documentation for your email validation rules. This documentation can help your team understand and maintain the validation process effectively.

Conclusion

There are more than enough ways to clean and validate emails using SQL. The approaches range in complexity. Without regex, you can dedupe, perform rudimentary syntax validation, and check for invalid characters.

Regex opens up more possibilities, letting you replace a chain of string-function checks with a single pattern.

Finally, if you’re looking for complete email verification, you can use a combination of SQL commands with the power of an email verification API like Abstract.

FAQ: Email Validation Using SQL and Regex

How can I validate an email address using SQL?

You can use SQL string functions (CHARINDEX, LIKE, etc.) or, where supported, regular expressions to validate email formats. With string functions, you check for the existence and position of "@" and ".", making sure they appear in the correct order.

What is the best regex for email validation in SQL?

A common regex pattern for basic email validation is: ^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$. Inside a MySQL string literal, the backslash must be doubled (\\.). Remember that email validation regex can vary in complexity based on your specific needs.

How does SQL Server validate email addresses?

SQL Server doesn't support full regular expressions natively, so email validation there typically relies on string functions. You might check that an email contains "@" and ".", and that they're in appropriate positions, with functions like CHARINDEX and LIKE.

Why is it important to validate email address format in SQL?

Validating emails ensures data integrity and quality. It helps avoid sending communications to non-existent addresses, safeguarding against bounces and protecting sender reputation.

What are the common challenges in SQL email validation and how to overcome them?

Challenges include limited regex support in some SQL dialects, the complexity of accurate email validation, and the variety of valid formats. Overcome them by using string functions (or regex, where available) for basic validation and handling thorough validation at the application layer.

Can I use SQL queries to validate email addresses?

Yes, you can use SQL queries with string functions to perform basic email validations. For more intricate validation, use regular expressions, either in SQL dialects that support them (such as MySQL and PostgreSQL) or in the application code that interacts with your database.

What are the best practices for email validation in SQL?

Stick to basic validation with SQL string functions, and for complex validations, perform these in your application logic or use a verified library. Always consider performance impact and test your validation logic to prevent false positives/negatives.

How does email validation impact the efficiency of SQL databases?

Validating emails can consume resources, especially with large datasets. Optimal validation methods are crucial to mitigate performance impact. Inefficient validations can slow down data retrieval and insertion operations.

Are there different methods to validate email addresses in SQL?

Yes. Methods include using string functions for basic validations, and where possible, regex for more comprehensive checks. Some database systems allow integration with programming languages for more sophisticated validation techniques.

Can regular expressions be used for email format validation in SQL?

It depends on the SQL variant. Some, like MySQL and PostgreSQL, support regex natively. SQL Server does not support full regex, so you'd typically use string functions or handle detailed regex validation within your application or via SQL CLR procedures.


Brian Wu

Brian Wu is the SEO lead at OpenPhone and has some experience with JavaScript, PHP, Python, and more. Brian has previously worked as Head of SEO at RapidAPI, where he integrated SEO with cutting-edge API technologies. Renowned for advancing digital marketing through innovative API use, his strategies significantly enhance online visibility and user engagement.
