One of the main reasons for getting spam is the extraction and collection of email addresses from public web pages by spambots. There are many different approaches to hindering these spambots from acquiring the correct email addresses. In this little project we try to gather as many different approaches as possible and test them with randomly generated email addresses.
The presentation of an email address on a web page can be either
(1) plain text, so that the visitor can read and/or copy/paste the address,
(2) a mailto-link to the email address, so the visitor can click it, or
(3) a combination of both.
Examples (unprotected) for illustration purposes (and later for reference):
For this experiment we roughly divide the protection techniques into two groups: those which hide the links from the spambots and those which obfuscate the text.
Link Obfuscation (5):
The first of the two entity encoding methods. A web browser can interpret numerical character codes and replace them with the corresponding characters. &#64;, for example, is replaced by @. This method can be used anywhere in an HTML document.
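As a rough illustration of numerical character encoding, the following Python sketch turns every character of an address into its decimal character code. The function name entity_encode and the address user@example.com are placeholders chosen for this example and are not part of the experiment.

```python
import html


def entity_encode(text: str) -> str:
    """Replace every character with its decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


address = "user@example.com"  # placeholder address (assumption)
encoded = entity_encode(address)

# The encoded form can be used both in the href and as the visible link text.
link = f'<a href="mailto:{encoded}">{encoded}</a>'

# html.unescape decodes the references much like a browser would.
assert html.unescape(encoded) == address
```

The round trip through html.unescape also shows how little work a spambot needs to undo this encoding if it bothers to decode entities.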
The other entity encoding method. This is also referred to as URI escaped encoding, as described in RFC 2396, section 2.4.1. For instance, the sequence %40 is equivalent to an @ in a URL. This method can only be used in links (such as the mailto tags).
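A minimal sketch of URI escaped encoding in Python, assuming every character of the address is escaped (escaping only some of them would work as well). The name percent_encode and the placeholder address are assumptions made for this example.

```python
from urllib.parse import unquote


def percent_encode(text: str) -> str:
    """Escape every byte of the text as %XX (two uppercase hex digits)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


address = "user@example.com"  # placeholder address (assumption)

# Only the address is escaped; the "mailto:" scheme itself stays readable.
href = "mailto:" + percent_encode(address)
link = f'<a href="{href}">blah blah</a>'

# unquote reverses the escaping, as a browser does when following the link.
assert unquote(percent_encode(address)) == address
```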
A random mix of the two methods mentioned above.
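One possible reading of "a random mix" is to choose, character by character, between a numerical character code and a URI escape. The sketch below assumes exactly that; the function name mixed_encode, the 50/50 split and the fixed seed are illustrative choices, not details from the experiment.

```python
import html
import random
from urllib.parse import unquote


def mixed_encode(text: str, rng: random.Random) -> str:
    """Encode each character either as &#NN; or as %XX, chosen at random."""
    parts = []
    for ch in text:
        if rng.random() < 0.5:
            parts.append(f"&#{ord(ch)};")
        else:
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(parts)


rng = random.Random(0)  # fixed seed only to make the example repeatable
address = "user@example.com"  # placeholder address (assumption)
encoded = mixed_encode(address, rng)
link = f'<a href="mailto:{encoded}">blah blah</a>'

# Decoding order mirrors a browser: HTML entities first, then URI escapes.
assert unquote(html.unescape(encoded)) == address
```

Because the result contains URI escapes, it is only suitable inside a link, like the second method.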
This method uses a JavaScript function to construct the mailto-link while the web page is loading. Note that JavaScript-based obfuscation makes addresses unreadable to people who have JavaScript disabled.
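To give an idea of what such a page fragment might look like, this Python sketch generates a small script block that assembles the mailto-link in the browser. The helper name js_mailto_snippet, the use of document.write and String.fromCharCode, and the placeholder address are all assumptions; the scripts used in the experiment may look quite different.

```python
import json


def js_mailto_snippet(user: str, domain: str, label: str) -> str:
    """Return a <script> block that writes a mailto link when the page loads."""
    # json.dumps produces quoted literals that are also valid JavaScript strings.
    u, d, text = json.dumps(user), json.dumps(domain), json.dumps(label)
    return (
        "<script>\n"
        # The "@" is produced from its character code, so the complete
        # address never appears literally in the page source.
        f"  var addr = {u} + String.fromCharCode(64) + {d};\n"
        f"  document.write('<a href=\"mailto:' + addr + '\">' + {text} + '</a>');\n"
        "</script>"
    )


snippet = js_mailto_snippet("user", "example.com", "blah blah")
print(snippet)
```

A visitor without JavaScript sees nothing at all here, which is the drawback noted above.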
This method also uses a JavaScript function. But this time there is a link pointing to a web page, which is transformed into a mailto-link "on the fly" (when the visitor moves the mouse over the link and/or clicks it).
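A hedged sketch of the on-the-fly variant: the generated link initially points to an ordinary page and rewrites its own href in inline event handlers. The fallback page contact.html, the helper name on_the_fly_link and the inline-handler approach are assumptions for illustration only, and the sketch assumes that user and domain contain no quote characters.

```python
def on_the_fly_link(user: str, domain: str, label: str,
                    fallback: str = "contact.html") -> str:
    """Return a link whose href becomes a mailto address on hover or click."""
    # JavaScript run by the event handlers; single quotes are used inside
    # because the HTML attribute itself is delimited by double quotes.
    js = f"this.href='mailto:'+'{user}'+String.fromCharCode(64)+'{domain}'"
    return f'<a href="{fallback}" onmouseover="{js}" onclick="{js}">{label}</a>'


markup = on_the_fly_link("user", "example.com", "blah blah")
print(markup)
```

Visitors without JavaScript simply follow the link to the fallback page, so this variant degrades more gracefully than writing the link at load time.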
One of the first common techniques used to fool the spambots. While some visitors might be annoyed to find the "@" replaced by "AT" and the "." replaced by "DOT", it remains to be seen how many spambots still get fooled this way.
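The replacement itself is trivial, and so is its reversal, which is why its effectiveness is in question. The following sketch uses the hypothetical helpers at_dot and undo_at_dot and a placeholder address.

```python
def at_dot(address: str) -> str:
    """Spell out the "@" and "." characters as words."""
    return address.replace("@", " AT ").replace(".", " DOT ")


def undo_at_dot(text: str) -> str:
    """What a slightly smarter spambot could do to recover the address."""
    return text.replace(" AT ", "@").replace(" DOT ", ".")


obfuscated = at_dot("user@example.com")  # placeholder address (assumption)
print(obfuscated)  # user AT example DOT com
assert undo_at_dot(obfuscated) == "user@example.com"
```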
pxp07m1g AT maps DOT iguw DOT tuwien DOT ac DOT at
This method inserts a noise pattern into the text and uses a style tag to make the noise invisible to visitors whose browsers support the CSS display property. However, the noise is also copied when the visitor copies and pastes the address.
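A minimal sketch of the noise technique, assuming the noise is wrapped in a span with an inline display:none style. The helper names, the noise string NOISE and the insertion point in the middle of the address are illustrative assumptions. The second function mimics a naive harvester that merely strips tags and therefore collects the noise along with the address.

```python
import re


def add_hidden_noise(address: str, noise: str = "NOISE") -> str:
    """Insert a noise string, hidden via CSS, into the middle of the address."""
    pos = len(address) // 2
    hidden = f'<span style="display:none">{noise}</span>'
    return address[:pos] + hidden + address[pos:]


def strip_tags(markup: str) -> str:
    """Naive harvester: drop the tags but keep all text, including the noise."""
    return re.sub(r"<[^>]+>", "", markup)


markup = add_hidden_noise("user@example.com")  # placeholder address (assumption)
harvested = strip_tags(markup)
assert harvested != "user@example.com"  # the harvested text is polluted
```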
gxtr xlml@maps.ig uw.tuwien.ac.at
This technique reverses the text so that it displays right-to-left. In the HTML code the email address is typed backwards, and a style tag is used to make the browser reverse it again for display. Again, copy/pasting shows the true nature of this text.
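The reversal can be sketched as follows, assuming the CSS properties unicode-bidi: bidi-override and direction: rtl are used to flip the display order. The helper name reversed_span and the placeholder address are assumptions; the exact style rules used in the experiment are not given in the text.

```python
def reversed_span(address: str) -> str:
    """Store the address backwards and let CSS display it in reading order."""
    style = "unicode-bidi: bidi-override; direction: rtl;"
    return f'<span style="{style}">{address[::-1]}</span>'


markup = reversed_span("user@example.com")  # placeholder address (assumption)
print(markup)

# The page source only contains the reversed string.
assert "moc.elpmaxe@resu" in markup
assert "user@example.com" not in markup
```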
ta.ca.neiwut.wugi.spam@k4jxlj8h
The same numerical character encoding technique that is used above for links. This time it is the visible text that gets obfuscated.
i6k8gu7j@maps.iguw.tuwien.ac.at
In this method comments are inserted into the HTML text. These comments are not visible to the visitor, neither in the web browser nor when copy/pasting.
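A small sketch of the comment technique, assuming a comment is inserted after every few characters of the address. The helper names, the comment text and the chunk size of three are arbitrary choices for illustration. The second function shows that a spambot which removes comments before searching recovers the address unchanged.

```python
import re


def insert_comments(address: str, comment: str = "<!-- x -->",
                    every: int = 3) -> str:
    """Break the address into short chunks separated by HTML comments."""
    chunks = [address[i:i + every] for i in range(0, len(address), every)]
    return comment.join(chunks)


def strip_comments(markup: str) -> str:
    """Remove HTML comments, as a browser does when rendering the text."""
    return re.sub(r"<!--.*?-->", "", markup, flags=re.DOTALL)


address = "user@example.com"  # placeholder address (assumption)
markup = insert_comments(address)
assert address not in markup  # the raw source no longer contains it
assert strip_comments(markup) == address
```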
7jfo779j@maps.iguw.tuwien.ac.at
This method uses JavaScript to create the text with the email address while the web page is loading.
This method uses pictures that look like text instead of plain text and should be quite safe from spambot attacks.