Protecting Email Addresses on Your Web Site

You probably know by now that the bad email guys out there get many of the email addresses they use by harvesting them from sites on the Web. "Bots" constantly roam the Web, reading Web sites and looking for anything that resembles an email address. The simplest attack looks for MailTo links in the HTML code of a page but, importantly, plain-text email addresses like nomail@ptcto.com will be found as well.
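To make the threat concrete, here is a minimal sketch of the two searches described above. It is not taken from any real bot; the function name harvestAddresses, the sample page, and the example.com and example.org addresses are all assumptions made for illustration, and the patterns are deliberately simple.

```javascript
// Hypothetical sketch of a harvesting pass; real bots will differ.
function harvestAddresses(html) {
    var found = [];
    var match;
    // Search 1: the target of a MailTo link, e.g. href="mailto:someone@example.com"
    var mailtoPattern = /mailto:([^"'?\s>]+)/gi;
    // Search 2: any text shaped like name@domain.tld
    var textPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

    while ((match = mailtoPattern.exec(html)) !== null) {
        found.push(match[1]);
    }
    while ((match = textPattern.exec(html)) !== null) {
        // Skip addresses already collected from a MailTo link
        if (found.indexOf(match[0]) === -1) {
            found.push(match[0]);
        }
    }
    return found;
}

// Made-up page fragment with one MailTo link and one plain-text address
var page = '<p>Write to <a href="mailto:sales@example.com">us</a> or to info@example.org.</p>';
var harvested = harvestAddresses(page);
```

With this sample page, both the linked address and the plain-text address end up in the harvested list, which is exactly the problem the rest of this essay tries to avoid.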

To avoid this problem, many Web site authors have eliminated MailTo links and any text using the email address format. A common technique is to create a small image containing the email address; it looks like text, but you cannot put your cursor in it.

A simple text technique is to eliminate the commercial at sign (@). You'll see addresses like "nomail at ptcto.com" or "nomail [at] ptcto.com". The first example illustrates the fatal flaw of this approach: if everyone started using the technique, the bots would quickly be adapted to treat "nomail at ptcto.com" as simply another valid form of email address. The second example tries to be different from the basic approach but, again, if it were widely adopted it would simply become another valid form.
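The flaw is easy to see in code. The sketch below shows how little a bot author would need to add in order to treat the "at" spellings as ordinary addresses. The function name normalizeObfuscated and the example.com addresses are assumptions for illustration only, and the pattern covers just a few spellings.

```javascript
// Hypothetical sketch: rewrite common "at" spellings back to an at sign.
function normalizeObfuscated(text) {
    // Matches "[at]", "(at)", or the bare word "at" between spaces,
    // along with any spaces around it.
    return text.replace(/\s*(?:\[at\]|\(at\)|\sat\s)\s*/gi, "@");
}

// A deliberately simple name@domain.tld test
var addressPattern = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

var first = normalizeObfuscated("nomail at example.com");
var second = normalizeObfuscated("nomail [at] example.com");
var bothLookValid = addressPattern.test(first) && addressPattern.test(second);
```

Both spellings collapse to the same ordinary address after a single replace call, so any spelling that becomes popular is only one more alternative in the pattern.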

"Live" or "hot" email links on a Web page are a great convenience to site visitors. The techniques above eliminate the convenience. However, there are ways to provide hot MailTo links while befuddling the bots.

Dan Benjamin, formerly of Automatic Labs and HiveWare, has created an online application called Enkoder. Enter an email address and Enkoder will create a chunk of JavaScript that contains an encrypted form of that email address. Plop this down in your HTML at the right spot and you're in business. If you look at the code, you'll see that no bot is likely to "dekode" this slug.

I learned about Enkoder from my old friend Brian Livingston, who among other things is the editor of the superb Web site and newsletter WindowsSecrets.com. He wrote the eBook "Spam-Proof Your E-Mail Address", which I own and can recommend.

I like Benjamin's approach but the code is big and ugly, making a mess of the HTML in your pages. Here's a simple JavaScript technique I developed to accomplish more or less the same thing while leaving the MailTo link readable and reasonably easy to change.

<script type="text/javascript"><!--
var ats = "\x40";
var dot = "\x2E";
var nam = "widgets";
var dom = "fastie";
var tld = "net";
document.write("<a href=\"mailto:" + nam + ats + dom + dot + 
  tld + "\">" + "Click Here to E-mail Me" + "</a>"); //-->
</script>

Once again, you can simply copy the code above into your Web page at the correct spot and edit it to change the components of the email address. Note that the dot and at sign characters are represented by their hexadecimal equivalents in an attempt to thwart detection and so that no clear text is present in the MailTo link itself.
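As a quick check of the escape idea, the sketch below confirms that the hexadecimal escapes, and the Unicode escapes used later in this essay, evaluate to the ordinary characters once the script runs. The example domain pieces are placeholders, not a real address.

```javascript
// The escapes keep "@" and "." out of the page source,
// but the running script sees the normal characters.
var ats = "\x40";        // hexadecimal escape for the at sign
var dot = "\x2E";        // hexadecimal escape for the period
var atsUnicode = "\u0040"; // Unicode escape for the same at sign

// Placeholder address pieces, assumed for illustration
var nam = "widgets";
var dom = "example";
var tld = "com";

var address = nam + ats + dom + dot + tld;
var sameCharacter = (ats === "@") && (atsUnicode === ats) && (dot === ".");
```

The assembled string is a normal address, yet the at sign never appears literally in the source that a bot reads.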

Like the simple text technique discussed above, this simple JavaScript technique is vulnerable. A bot could be written that detected pieces of this code, such as the "var ats" declaration. Therefore, if you use this technique you should change the name of all the variables to your own random choices. If millions of sites used this technique but all the names were different, it would be much more difficult to write bot code to recognize the link.

This wasn't good enough for me. I wanted to institutionalize the technique in such a way that I would have the minimal amount of code on my pages with nothing that could be used as a clue for bots. The result is that an ordinary email link is much smaller:

<script type="text/javascript"> <!--
  nbmf("widgets");
//--> </script>

Here's the way the full version of the function is called. This looks more like an email address but it is still hard for a bot to parse. Note that this call omits the last two optional arguments.

<script type="text/javascript"> <!--
  nbm("widgets", "fastie", "net");
//--> </script>

Unless a bot is programmed to know my custom JavaScript function names "nbm" and "nbmf", this cannot be parsed as an email address.

My full function is called "nbm" for No Bot Mail. nbmf is a shorthand version that always assumes the domain is fastie.net and that I want to display the email address in italics. Both functions are shown below. I do not believe that a bot will find anything it can parse into a valid email address in this code. If you think I'm wrong, please let me know. On the other hand, if you think this code is the greatest thing since sliced bread, I'd like to hear that too.

I hope these functions prove helpful to you. Feel free to use them; I'd appreciate but do not require credit. If you do use them, change the function names. And please, consider using the revised, production version shown in the addendum below. I do.

function nbm(emlname, emldomain, emltld, emldisplay, emlsubj) {
// Written 20 Feb 2004 by Will Fastie 
// Edited  11 Jul 2004
// May be used freely with attribution.
//
// This function creates a MailTo link from the passed arguments.
// Using this function instead of embedding MailTo links in HTML prevents email
// bots from seeing any email addresses on a site.
// The function may be called with 3, 4, or 5 arguments.
// The emlname, emldomain, and emltld arguments are combined to create the address.
// If present, the emldisplay argument becomes the visible text of the link; if
// it is not present or is blank, the email address is used instead.
// When the email address is used for the display text, it is italicized.
var txtdisplay;
var txtsubj = "?subject=";
var ats = "\u0040";
var dot = "\u002E";
var itopen = "<i>";
var itclose = "</i>";
// Construct the email address
var txtaddr = emlname + ats + emldomain + dot + emltld;
// Examine other arguments and act accordingly
if (arguments.length == 5) { // there is a subject argument
  if (emldisplay == "") {
    // display text is blank, use email address and italicize
    txtdisplay = itopen + txtaddr + itclose;
  } else {
    txtdisplay = emldisplay;
  }
  // append the subject to the email address
  txtaddr = txtaddr + txtsubj + emlsubj;
} else if (arguments.length == 4) { // there is a display argument
  if (emldisplay == "") {
    // display text is blank, use email address and italicize 
    txtdisplay = itopen + txtaddr + itclose;
  } else {
    txtdisplay = emldisplay;
  }
} else { // three arguments only
  // display text is not provided, use email address and italicize 
  txtdisplay = itopen + txtaddr + itclose;
}
document.write("<a href=\"mailto:" + txtaddr + "\">" + txtdisplay + "</a>");
}

function nbmf(emlname) {
// This version of nbm always assumes the fastie domain.
var f = "fastie";
var n = "net";
  nbm(emlname, f, n);
}

Keep in mind that not every browser will have JavaScript enabled. As long as the function is called as shown above, this just means that the email address will not appear. If your Web skills are up to it, you can write the HTML so that it uses another technique, perhaps the graphical version, if JavaScript is not available.

Addendum - A Cleaner Version

I wrote the code above some time ago. Since then, I've come to better understand the advantages of CSS-based, XHTML-compatible page design and my JavaScript skills have improved just a tiny bit. My original code uses a very poor technique - it incorporates HTML formatting tags for italics. My new version is format neutral. The generated email address will take on the formatting provided by the surrounding XHTML content or the CSS styles for an <a> tag.

In addition, the call to the nbm function contains the email address in pieces: "widgets," "fastie," and "net." While I doubt most bots will actually put this together, I liked the nbmf function that separated the domain from the email name and wanted to make it standard.

Here is what a simple call to the new version looks like on the page:

<script type="text/javascript">NOBOTS.nb("widgets", NOBOTS.myd("fn"));</script>

The nb function requires a minimum of two arguments, the email name and the domain. The function myd ("my domain") is called with an ID string and returns the domain. Unlike the nbm function, only one component of a full email address would be visible to a bot, making it very difficult to construct an email address from this code.

An important difference between this code and the nbm and nbmf examples above is that the nb call does not use the long-standing HTML comment trick to hide the script code from browsers that do not support the script tag. Why? Because the trick is not XHTML-compliant.

Here is the new code. It is my production version and is thus more compact without comments.

// Written by Will Fastie, 19 Aug 2005
// Revised for generality and Global Abatement 10 Jul 2008

/*global NOBOTS */
NOBOTS = {};

NOBOTS.constructaddr = function (ename, edomain) {
    var atsign = "@";
    var addr = ename + atsign + edomain;
    return addr;
};

NOBOTS.nb = function (ename, edomain, edisplay, esubj) {
    var subj = "?subject=";
    var addr = NOBOTS.constructaddr(ename, edomain);
    var display = addr;
    var atag;
    if (((arguments.length === 4) || (arguments.length === 3)) && (edisplay !== "")) {
        display = edisplay;
    }
    if ((arguments.length === 4) && (esubj !== "")) {
        addr = addr + subj + esubj;
    }
    atag = "<a href=\"mailto:" + addr + "\">" + display + "</a>";
    document.write(atag);
    return null;
};

NOBOTS.myd = function (id) {
    var d;
    switch (id) {
    case "fn":
        d = "fastie.net";
        break;
    case "fc":
        d = "fastie.com";
        break;
    default:
        d = "fastie.net";
        break;
    }
    return d;
};

Like its predecessors, nb has optional arguments for the displayed portion of the <a> tag, which defaults to the email address if not supplied, and for the subject line of the email.
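To see what the optional arguments do without a browser, here is a minimal sketch that follows the same argument handling but returns the markup instead of writing it to the page. The name mailtoTag is hypothetical and is not part of my production code, the example.com domain is a placeholder, and the call to encodeURIComponent is an addition my nb function does not make; it keeps spaces in a subject from producing a malformed link.

```javascript
// Hypothetical helper mirroring nb's argument handling.
// Returns the <a> tag as a string rather than calling document.write.
function mailtoTag(ename, edomain, edisplay, esubj) {
    var addr = ename + "\x40" + edomain;
    // A missing or blank display argument falls back to the address
    var display = edisplay ? edisplay : addr;
    var href = "mailto:" + addr;
    if (esubj) {
        // Assumption: percent-encode the subject (not done in nb)
        href += "?subject=" + encodeURIComponent(esubj);
    }
    return "<a href=\"" + href + "\">" + display + "</a>";
}

// Two arguments: the address is also the visible text
var plain = mailtoTag("widgets", "example.com");
// Three arguments: custom visible text
var labeled = mailtoTag("widgets", "example.com", "Contact sales");
// Four arguments with a blank display: address shown, subject added
var withSubject = mailtoTag("widgets", "example.com", "", "Widget order");
```

Passing an empty string for the display text is how a caller supplies a subject while still showing the address itself as the link text.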

The myd function is very simple. It uses the id to select one of several domains. I wrote it this way because I have multiple domain names and wanted to use the same code everywhere. If you use this code and have only one domain name, you can simply change myd to return your domain name and delete the switch statement.
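For a site with only one domain, the simplified function might look like the sketch below. The example.com domain is a placeholder, and splitting it into pieces is my own assumption rather than something the production code does; it simply keeps the full domain from appearing as a single literal.

```javascript
// Reuse the NOBOTS object if it already exists on the page
var NOBOTS = NOBOTS || {};

// Single-domain version: the id argument is ignored, so existing
// calls such as NOBOTS.myd("fn") keep working unchanged.
NOBOTS.myd = function () {
    // Placeholder domain, assembled from pieces
    return "example" + "\x2E" + "com";
};

var domain = NOBOTS.myd("fn");
```

Because the function still accepts and ignores an argument, pages that already pass an ID string do not need to be edited.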

This new version of my bot-thwarting function is now in use throughout my sites and the sites I build for clients. I am no longer using nbm and nbmf.

The JavaScript function definitions used in my newest version do not resemble the definitions taught in most programming texts or shown in most code samples found on the Web. This summer I studied the advice of Douglas Crockford and commend it to you. His lectures are excellent training. My meager JavaScript efforts are the better for it.


Article Copyright ©2004-2008 by Will Fastie. All Rights Reserved.
Use of the code in this essay is at your own risk. No warranty, implied or otherwise, is given.