Thread: Copy a website
Posted to rec.crafts.metalworking
Ignoramus3863

On 2008-08-31, Lloyd E. Sponenburgh <lloydspinsidemindspring.com> wrote:
> fired this volley in news:b6e6ea8f-a3fe-4f46-:
>
>>> If you try downloading my website algebra.com, you will get into an
>>> infinite recursion through millions of pages. That's why I prevent
>>> most such bots from accessing my site. This would work only on very
>>> simple sites.
>>
>> How does your web server differentiate between a bot and a human user
>> making http requests?
>
> Duh! It doesn't. The site has links back to the place where the link
> began. It wouldn't appear recursive to a human user, because that
> person would choose where he/she viewed. The spider can't tell, and
> ends up in recursions it can only abort by "counting out" repeats.
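Lloyd's "counting out" point is essentially what well-behaved crawlers do: keep a set of already-visited URLs and cap recursion depth, so link cycles (page A links to B, B links back to A) terminate instead of recursing forever. A minimal sketch, where `fetch_links` is a hypothetical fetch-and-parse helper, not part of any real mirroring tool:

```python
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch_links, max_depth=3):
    """Walk a site breadth of links, refusing to revisit pages or
    descend past max_depth -- the "counting out" that stops cycles."""
    visited = set()
    start_host = urlparse(start_url).netloc

    def walk(url, depth):
        if depth > max_depth or url in visited:
            return  # repeat or too deep: abort this branch
        visited.add(url)
        for link in fetch_links(url):
            absolute = urljoin(url, link)
            # stay on the same host so off-site links don't balloon the crawl
            if urlparse(absolute).netloc == start_host:
                walk(absolute, depth + 1)

    walk(start_url, 0)
    return visited
```

Without the `visited` set, a site whose pages link back to each other (as dynamically generated sites like algebra.com do) would drive the spider into exactly the infinite recursion described above.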


I actually have some smarts in the server that can tell a bot from a
human. But httrack is blocked on the spot in any case. I am not
against it as such, but it will not work on my site.
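One plausible part of blocking httrack "on the spot" (the actual server logic here isn't shown) is matching the User-Agent header, since HTTrack announces itself there by default. A minimal sketch, with an illustrative block list of my own choosing:

```python
# Illustrative User-Agent filter. This assumes the bot does not forge its
# header: HTTrack's default User-Agent contains the string "HTTrack",
# but any client can send whatever header it likes. Real bot detection
# also weighs request rate and link-traversal patterns.

BLOCKED_TOKENS = ("httrack", "wget", "webcopier")  # hypothetical list

def is_blocked(user_agent):
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_TOKENS)
```

Since the header is trivially forged, this only stops bots that identify themselves honestly, which is why server-side heuristics beyond the User-Agent are needed to tell a determined bot from a human.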

--
Due to extreme spam originating from Google Groups, and their inattention
to spammers, I and many others block all articles originating
from Google Groups. If you want your postings to be seen by
more readers you will need to find a different means of
posting on Usenet.
http://improve-usenet.org/