Email Validation Using Regular Expressions (the right way… really)

OK, I know this is the millionth blog post claiming to have the right way to validate email addresses, but hear me out:

Regular expressions are awesome, and I mean Wrath of Kaaaaahn! awesome. They wield the unholy power to make or break your system, to secure or rip apart your entry points. Woven wisely, they can be magical. Woven foolishly, they can destroy you.

After spending way too long sifting through broken regular expressions, expecting that someone, somewhere had solved the obvious need for the ultimate email validation RegEx, I gave up on searching and created my own. Yes, you may have solved it too, but searching Google for ‘email validation regular expression’ gives some very poor answers. Look at all the variations on RegexLib.

I do recommend reading Regular-Expressions.info’s take on email regex. They make great points about trade-offs when using the RFC spec for emails. But I’m still not happy with their ‘practical’ email regex:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

— from regular-expressions.info

This version may work for you, but I have two issues with it:
1. It supports characters that most other websites will reject as invalid, even though they are RFC compliant (e.g. a lone backtick as the username: `@asdf.com)
2. It’s overly complex for what it’s checking.
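To make the first point concrete, here is a minimal sketch that runs the pattern quoted above against a backtick-only username. The `^`/`$` anchors and the `i` flag are my assumptions (the quoted pattern has neither), and the variable name is made up for illustration.

```javascript
// The "practical" pattern quoted above, wrapped in ^...$ and made
// case-insensitive. Both additions are assumptions, not part of the quote.
// The "/" inside the character classes is escaped for the regex literal.
const practicalEmail = /^[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

// A username made of a single backtick is accepted.
console.log(practicalEmail.test('`@asdf.com')); // true

// So is one made entirely of punctuation.
console.log(practicalEmail.test('{|}~@asdf.com')); // true
```

Both addresses are RFC compliant, but plenty of sign-up forms elsewhere would refuse them.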

I have a simpler version, which doesn’t support everything that the above regex supports (though, you could add the extra characters if you like), but it does support everything I consider to be reasonable for an email address while being light, fast, and easy to understand. It also simplifies much of the regex by shortening collections like [a-zA-Z0-9_] by using the RegEx shortcut for word characters ([\w]) instead.

My Regular Expression Breakdown

Here it is:

/^[\w.%&=+$#!'-]+@[\w.-]+\.[a-zA-Z]{2,4}$/

  1. /^ – the beginning of the string (must not have a newline character or some other invalid character before the email address starts)
  2. [\w.%&=+$#!'-]+ – the username portion, which allows any word character (a-z, A-Z, 0-9, underscore) plus a few other characters. The hyphen sits last in the class so that it is matched literally rather than read as a range. You can pick and choose what you want to remove from this list, but keep in mind that some people really do create crazy emails. Remember too that spam bots tend to use emails like skiwytyru32hh@mail.ru, which will pass any validator (unless you are checking emails against a spam database on the back-end).
  3. @ – gotta have this (and only 1)
  4. [\w.-]+ – the domain or subdomain + domain (asdf.com, or sub.asdf.com)
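  5. \.[a-zA-Z]{2,4} – the top-level domain, two to four letters (.com, .info, or the .uk in .co.uk, where the .co is matched by step 4). Longer TLDs such as .museum are rejected unless you raise the upper bound.
  6. $/ – the end of the string (must end once it’s considered valid: no trailing spaces, line breaks or other data)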
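One thing to watch when adding or removing characters in the class from step 2: inside square brackets, a hyphen between two characters is read as a range. A literal hyphen is safest at the very end of the class (or escaped). A minimal sketch of the difference, using two made-up patterns:

```javascript
// A hyphen between two characters inside [...] is a range, not a literal.
const asRange = /^[!-']+$/;   // range from "!" (0x21) to "'" (0x27)
const asLiteral = /^[!'-]+$/; // "!", "'" and a literal "-"

console.log(asRange.test('-'));   // false: "-" (0x2D) lies outside the range
console.log(asRange.test('"'));   // true: '"' (0x22) falls inside the range
console.log(asLiteral.test('-')); // true
console.log(asLiteral.test('"')); // false
```

The range form silently lets double quotes through and rejects the hyphen it was meant to allow.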

The functions below are documented PHPDoc-style, with valid test data and known unsupported emails. They also carry code snippet GUID identifiers (the @codesnippit tag), which I recommend for all code snippets (read Jeff Atwood’s “A Modest Proposal for the Copy and Paste School of Code Reuse” for more info).

JavaScript implementation (not for security!):

/**
 * validEmail checks for a single valid email
 *
 * supported RFC valid email addresses (test data):
 * a@a.com
 * A_B@A.co.uk
 * a@subdomain.adomain.com
 * abc.123@a.net
 * O'Connor@a.net
 * 12+34-5+1=42@a.org
 * me&mywife@a.co.uk
 * root!@a_b.com
 * _______@a-b.la
 * %&=+.$#!-@a.com
 *
 * Current known unsupported (but are RFC valid):
 * abc+mailbox/department=shipping@example.com
 *  !#$%&'*+-/=?^_`.{|}~@example.com (all of these characters are RFC compliant)
 * "abc@def"@example.com (anything goes inside quotation marks)
 * "Fred \"quota\" Bloggs"@example.com (however, quotes need escaping)
 *
 * @param string email The supposed email address to validate
 * @return bool valid
 * @author Adam Eivy
 * @version 2.1
 * @codesnippit bcd71ab9-dc05-45af-9855-abb57c0cf0ab
 */
function validEmail(email){
   // the hyphen is last in the username class so it is matched literally
   var re = /^[\w.%&=+$#!'-]+@[\w.-]+\.[a-zA-Z]{2,4}$/;
   return re.test(email);
}
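A quick way to check the documented test data is to loop over it. The sketch below repeats the pattern so it runs on its own, with the literal hyphen placed last in the username class. The array and variable names are made up for illustration.

```javascript
// Pattern repeated here so the snippet is self-contained; the hyphen sits
// last in the username class so it is matched literally.
const emailPattern = /^[\w.%&=+$#!'-]+@[\w.-]+\.[a-zA-Z]{2,4}$/;

// Addresses listed as supported in the doc comment.
const supported = [
  'a@a.com', 'A_B@A.co.uk', 'a@subdomain.adomain.com', 'abc.123@a.net',
  "O'Connor@a.net", '12+34-5+1=42@a.org', 'me&mywife@a.co.uk',
  'root!@a_b.com', '_______@a-b.la', '%&=+.$#!-@a.com',
];

// RFC-valid addresses listed as known unsupported.
const unsupported = [
  'abc+mailbox/department=shipping@example.com',
  "!#$%&'*+-/=?^_`.{|}~@example.com",
  '"abc@def"@example.com',
  '"Fred \\"quota\\" Bloggs"@example.com',
];

console.log(supported.every((address) => emailPattern.test(address)));  // true
console.log(unsupported.some((address) => emailPattern.test(address))); // false
```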

PHP implementation:

I use the same regular expression in PHP, but PHP also allows us to check the domain for a mail server, which has a two-part advantage:
1. we can be sure that the domain exists
2. we can be sure that the domain has a mail server set up on it (using PHP’s checkdnsrr to look for an MX record)

This is where the real validation happens. Remember: You CANNOT trust client-side code. This means you can never assume that your JavaScript code validated your email address. It’s a nicety for the user but not a security checkpoint. You need to check this stuff on the back end, always.

/**
 * validEmail checks for a single valid email
 *
 * supported RFC valid email addresses (test data):
 * a@a.com
 * A_B@A.co.uk
 * a@subdomain.adomain.com
 * abc.123@a.net
 * O'Connor@a.net
 * 12+34-5+1=42@a.org
 * me&mywife@a.co.uk
 * root!@a_b.com
 * _______@a-b.la
 * %&=+.$#!-@a.com
 *
 * Current known unsupported (but are RFC valid):
 * abc+mailbox/department=shipping@example.com
 *  !#$%&'*+-/=?^_`.{|}~@example.com (all of these characters are RFC compliant)
 * "abc@def"@example.com (anything goes inside quotation marks)
 * "Fred \"quota\" Bloggs"@example.com (however, quotes need escaping)
 *
 * @param string $email The supposed email address to validate
 * @param bool $validateDomain (default true): look up the domain's MX record in DNS to confirm a mail server
 * @return bool valid
 * @author Adam Eivy
 * @version 2.1
 * @codesnippit fa5a06bf-2bce-41a8-a2e0-2f6db7dd22f9
 */
function validEmail($email,$validateDomain=true){
   // D modifier: $ matches only at the very end, so a trailing newline is rejected
   if(preg_match('/^[\w.%&=+$#!\'-]+@[\w.-]+\.[a-zA-Z]{2,4}$/D' , $email)) {
      if(!$validateDomain)   return true; // not testing mailserver but regex passed
      // now test mail server on supplied domain (split() was removed in PHP 7; use explode())
      list($username,$domain)=explode('@',$email,2);
      if(checkdnsrr($domain,'MX')) return true; // domain has mail record
   }
   return false; // either failed to match regex or mailserver check failed
}
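If your back end runs on Node.js rather than PHP, the same MX lookup can be sketched with the built-in dns module. This is only a rough equivalent under my own assumptions: hasMailServer and domainOf are hypothetical helper names, not part of the functions above, and the resolver is injectable so the sketch can run without network access.

```javascript
const dns = require('dns');

// Hypothetical helper: resolves to true when the domain publishes at least
// one MX record. Any lookup error (unknown domain, no MX data, timeout) is
// treated as "no mail server", which is an assumption you may want to relax.
async function hasMailServer(domain, resolveMx = (d) => dns.promises.resolveMx(d)) {
  try {
    const records = await resolveMx(domain);
    return records.length > 0;
  } catch (err) {
    return false;
  }
}

// Returns everything after the last "@", or null when there is no "@".
function domainOf(email) {
  const at = email.lastIndexOf('@');
  return at === -1 ? null : email.slice(at + 1);
}

// Usage with a stubbed resolver (no network). Drop the second argument to
// query real DNS instead.
const stubResolver = async () => [{ exchange: 'mx.example.com', priority: 10 }];
hasMailServer(domainOf('a@example.com'), stubResolver)
  .then((ok) => console.log(ok)); // true
```

As with the PHP version, run the regex check first and only do the DNS lookup for addresses that pass it.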
