Is Valid Internet Email Address
Validating internet email addresses is a very difficult thing to do. For example, name@domain.com is valid, but there are many possible final qualifiers (.com, .net, .tv, country qualifiers like .uk and .ca, and many others). The part after the @ sign could also be an IP address instead of a host name, and many different characters are allowed before the @ sign. About the only thing you know has to be there is the @ sign itself. Almost every regular expression we've found on the internet for validating email addresses has some sort of fault: either it isn't comprehensive enough, or it goes way over the top and is difficult to understand. This one may not be 100% comprehensive, but it's pretty close, and it's fairly easy to understand (especially after we explain it).
Here's the whole function to validate an internet email address:
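function isValidEmail(emailAddress) {
  // Local part (dot-separated runs of allowed characters, or a quoted string),
  // then @, then either a bracketed IP address or a dotted domain name.
  var re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(emailAddress);
}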
OK, let's look at this in pieces. The ^ character anchors the match to the start of the string, so the address must begin with the group in parentheses that follows it. That group is itself made up of two groups with a vertical bar ("or") between them, so either one can form the start of the email address. Let's take a look at the two groups independently:
([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)
The square brackets define a character class: a set of characters, any one of which can match a single position. The + after the closing bracket says that characters from the class must appear at least once, but they can repeat as many times as you want. The class starts with a caret (^), which negates it, so we are listing characters that CANNOT appear: less-than and greater-than signs, open and close parentheses, open and close square brackets (the backslash before the close bracket makes it a literal character instead of the end of the class), backslash (it needs another backslash in front to be taken literally), period, comma, semicolon, colon, any whitespace character (space, tab, line feed, form feed, and so on), at sign, and double quote. None of those characters can appear in this run of characters. Every character not listed is allowed, and at least one such character must be present (because of the +).
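To see the negated character class on its own, here is a small sketch that anchors just that class with ^ and $ and tries it on a few strings. The name atom is our own label for illustration, not something from the original function. Inside a character class the double quote needs no backslash, so we drop the \" escape here; it behaves the same either way.

```javascript
// One or more characters that are NOT in the excluded list.
// Anchored with ^ and $ so the whole string must be such a run.
const atom = /^[^<>()[\]\\.,;:\s@"]+$/;

const samples = ["john", "j_doe+tag", "john doe", "a,b", "john.doe", "x@y"];
for (const s of samples) {
  console.log(JSON.stringify(s), atom.test(s));
}
// "john"      true
// "j_doe+tag" true   (underscore and plus are not excluded)
// "john doe"  false  (whitespace)
// "a,b"       false  (comma)
// "john.doe"  false  (a period can't appear inside a single run)
// "x@y"       false  (at sign)
```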
After that run of valid characters comes another group in parentheses, followed by a star (*). The star indicates that the preceding group can appear zero or more times: if it's omitted that's OK, but it can also repeat. The star applies to the whole group. Inside the group is a period (escaped with a backslash so it means a literal period rather than "any character"), followed by one or more valid characters from the same list as before.
All this means that there has to be at least one valid character before the @ sign, and any period before the @ sign must be followed by one or more valid characters. Since periods are in the list of disallowed characters, a period cannot be the first or last character before the @ sign, and there cannot be two consecutive periods.
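Putting the run and the starred group together gives the unquoted form of the part before the @ sign. The sketch below anchors that combination on its own so you can check the period rules just described; localDotAtom is a name we made up for this example.

```javascript
// A run of allowed characters, then zero or more ".run" repetitions.
const localDotAtom = /^[^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*$/;

for (const s of ["john", "john.doe", "j.r.r.tolkien", ".john", "john.", "john..doe"]) {
  console.log(s, localDotAtom.test(s));
}
// john          true
// john.doe      true
// j.r.r.tolkien true
// .john         false  (period can't come first)
// john.         false  (period must be followed by valid characters)
// john..doe     false  (no consecutive periods)
```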
(\".+\")
This group is a bit easier to understand. A literal double quote must appear, followed by one or more characters (here, the period means any single character other than a line break), followed by another double quote. So "johndoe"@ is a valid start to an email address. And if there truly were a case where consecutive periods appeared before the @ sign, the whole part could be enclosed in quotes to make it valid ("john..doe"@ would be a valid first part, but john..doe@ would not).
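Here is a sketch of the complete "or" for the part before the @ sign: the unquoted form or the quoted form, anchored so it must cover the whole test string. The name localPart is ours, for illustration only.

```javascript
// Unquoted dot-separated runs, OR a double-quoted string of 1+ characters.
const localPart = /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))$/;

for (const s of ['"johndoe"', '"john..doe"', "john..doe", '""', '"john doe"']) {
  console.log(s, localPart.test(s));
}
// "johndoe"   true
// "john..doe" true   (quotes allow consecutive periods)
// john..doe   false
// ""          false  (.+ needs at least one character between the quotes)
// "john doe"  true   (any character, including a space, inside the quotes)
```

Because the period inside the quotes matches any character, this form is quite permissive: for example, '"a"b"' is also accepted, since .+ is free to swallow the middle quote.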
After one of the first two groups appears at the start of the email address, the @ character must follow. Then the next big group must run all the way to the end of the address (the $ after the big group indicates that). That big group is made up of two sub-groups "or"ed together. Let's go over each one.
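The ^ and $ anchors are what make the pattern check the whole string rather than find an address somewhere inside it. The sketch below uses a shortened pattern (single-run local part and domain name only, a simplification of ours) with and without the anchors; anchored and unanchored are illustrative names.

```javascript
// Simplified: one run of allowed characters, @, then a dotted domain name.
const anchored = /^[^<>()[\]\\.,;:\s@"]+@([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}$/;
const unanchored = /[^<>()[\]\\.,;:\s@"]+@([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}/;

const input = "Contact: john@example.com!!";
console.log(anchored.test(input));   // false: extra text before and after
console.log(unanchored.test(input)); // true: "john@example.com" is found inside
console.log(anchored.test("john@example.com")); // true
```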
(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])
This group matches the IP address form of the domain. It says that a left square bracket (the backslash again makes it a literal character) must be followed by anywhere from 1 to 3 digits ([0-9] is a character class allowing any digit, and {1,3} says the preceding class must appear between 1 and 3 times). After the bracket and 1 to 3 digits must come a period, then another 1 to 3 digits, another period, another 1 to 3 digits, another period, another 1 to 3 digits, and finally the right square bracket. Note that this only checks the shape of the address: nothing limits each number to 0-255, so [999.999.999.999] would also be accepted.
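A quick sketch of the IP address group on its own, anchored so it must match the whole string. The helper isIpLiteralInRange is hypothetical and not part of the original function; it shows one way you might add the 0-255 check afterwards if you wanted it, by splitting out the four numbers.

```javascript
// "[", four groups of 1 to 3 digits separated by periods, then "]".
const ipLiteral = /^\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\]$/;

console.log(ipLiteral.test("[192.168.0.1]"));     // true
console.log(ipLiteral.test("192.168.0.1"));       // false: brackets required
console.log(ipLiteral.test("[192.168.0]"));       // false: only three groups
console.log(ipLiteral.test("[999.999.999.999]")); // true: shape only

// Hypothetical extra step: also require each number to be 0-255.
function isIpLiteralInRange(s) {
  if (!ipLiteral.test(s)) return false;
  // Strip the brackets, split on periods, and range-check each number.
  return s.slice(1, -1).split(".").every((part) => Number(part) <= 255);
}

console.log(isIpLiteralInRange("[192.168.0.1]"));     // true
console.log(isIpLiteralInRange("[999.999.999.999]")); // false
```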
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})
The + after the first group says that the group must appear one or more times. That group is one or more valid characters followed by a period, where valid characters are letters, digits, and hyphens. Domain names can only contain letters, digits, and hyphens (go to www.godaddy.com and try to register a domain name with any other character in it). So every period must be preceded by one or more valid characters, and there can be as many of these character-and-period groups as you like. At the very end must come a run of 2 or more letters, which handles the final qualifier of the domain name. It used to be that final qualifiers were 2 to 4 characters long (.tv has 2, .name has 4), but longer qualifiers such as .museum have since been introduced, so we decided to require at least 2 letters and not put an upper limit on the length.
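Here is the domain-name group on its own, anchored, with a few test strings. The name domainName is ours, for illustration.

```javascript
// One or more "label." groups, then a final qualifier of 2+ letters.
const domainName = /^([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}$/;

const samples = [
  "example.com",        // true
  "mail.example.co.uk", // true: any number of labels
  "my-site.museum",     // true: no upper limit on the qualifier length
  "example",            // false: needs at least one period
  "example.c",          // false: qualifier must be 2+ letters
  "exa_mple.com",       // false: underscore not allowed
  "example..com",       // false: empty label between periods
  "-example.com",       // true: see note below
];
for (const s of samples) {
  console.log(s, domainName.test(s));
}
```

The last sample shows one gap: the character class doesn't care where the hyphen sits, so a label that starts or ends with a hyphen is accepted even though host names aren't allowed to have one there.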
Those two sub-groups are "or"ed together, so whatever follows the @ sign must match one of them: either the IP address form or the domain name form.
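To tie the pieces back together, this sketch rebuilds the full pattern from the four sub-groups discussed above, using each regular expression's .source text and the RegExp constructor, and then tries a handful of addresses. The variable names are our own; the resulting pattern has the same ^((A)|(B))@((C)|(D))$ shape as the one in isValidEmail, with the IP address written in the simple four-groups-of-digits form explained above. You can run it in any JavaScript console to try your own addresses.

```javascript
// Sub-patterns from the walkthrough, unanchored, as source strings.
const localUnquoted = /[^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*/.source;
const localQuoted = /".+"/.source;
const ipLiteral = /\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\]/.source;
const domainName = /([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}/.source;

// ^(unquoted | quoted)@(IP address | domain name)$
const emailRe = new RegExp(
  "^((" + localUnquoted + ")|(" + localQuoted + "))@((" +
    ipLiteral + ")|(" + domainName + "))$"
);

function isValidEmail(emailAddress) {
  return emailRe.test(emailAddress);
}

console.log(isValidEmail("name@domain.com"));           // true
console.log(isValidEmail('"john..doe"@example.com'));   // true
console.log(isValidEmail("john..doe@example.com"));     // false
console.log(isValidEmail("john@[192.168.0.1]"));        // true
console.log(isValidEmail("john@example"));              // false
```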