DIYbanter

John Rumm

Mike Clarke wrote:
John Rumm wrote:

Interestingly, research suggests[1] that harvesting of email addresses
from Usenet is *far* more likely to happen with addresses that appear in
the headers than those that just appear in the body text.

Not surprising really. It's much quicker to download only the headers of
thousands of messages than the complete messages. Having downloaded them
it's a simple matter to filter out just the "From:" lines and extract the
address with a simple regex. Scanning the entire message body would be
much more time-consuming, and would not even yield any results for the
majority of messages.
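
For illustration, the header-only approach described above might look
roughly like the Python sketch below. The function name, the sample
headers and the deliberately loose address pattern are all assumptions
made for the example, not anything taken from a real harvester.

```python
import re

# Loose pattern for illustration only; real addresses are more varied.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest_from_headers(header_block):
    """Return addresses found on the From: lines of a raw header block."""
    found = []
    for line in header_block.splitlines():
        # Only From: lines are inspected; every other header is skipped.
        if line.lower().startswith("from:"):
            found.extend(ADDRESS.findall(line))
    return found

# Made-up headers for the example.
sample = (
    "From: Jane Example <jane@example.org>\n"
    "Newsgroups: uk.d-i-y\n"
    "Subject: Re: harvesting\n"
    "Message-ID: <abc123@news.example.net>\n"
)
print(harvest_from_headers(sample))
```

The message body is never fetched or scanned, which is the whole saving.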

The same logic seems to apply to web harvesting as well. Much less hassle
to get the HTML and read the source code for "mailto:" links, rather than
have to do any processing/unescaping or execution of JavaScript to render
obfuscated email addresses machine-readable.
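
As a rough sketch of that low-effort approach (the function name, regex
and sample page are assumptions for the example), a harvester that only
reads the raw source for "mailto:" links could be as simple as:

```python
import re

# Matches href="mailto:..." in raw HTML source, stopping at the closing
# quote or at any ?subject=... query part. No unescaping, no script execution.
MAILTO = re.compile(r"""href\s*=\s*["']mailto:([^"'?]+)""", re.IGNORECASE)

def harvest_mailto(html_source):
    """Return the addresses that appear in plain mailto: links."""
    return MAILTO.findall(html_source)

# Made-up page: one plain link, one script-built address, one munged address.
page = (
    '<a href="mailto:jane@example.org?subject=Hi">email Jane</a>\n'
    "<script>document.write('bob' + '@' + 'example.org');</script>\n"
    "<p>john(at)example(dot)org</p>\n"
)
print(harvest_mailto(page))
```

Only the plain link is picked up; the script-built and "(at)/(dot)" forms
are invisible to it unless the harvester does the extra work.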

--
Cheers,

John.

/=================================================================\
|           Internode Ltd - http://www.internode.co.uk            |
|-----------------------------------------------------------------|
|           John Rumm - john(at)internode(dot)co(dot)uk           |
\=================================================================/