Thread: Copy a website
Posted to rec.crafts.metalworking
Gunner Asch

On Sun, 31 Aug 2008 17:51:00 -0500, Ignoramus3863
wrote:

On 2008-08-31, Lloyd E. Sponenburgh lloydspinsidemindspring.com wrote:
fired this volley in news:b6e6ea8f-a3fe-4f46-
:


If you try downloading my website algebra.com, you will get into an
infinite recursion through millions of pages. That's why I prevent
most such bots from accessing my site. This would work only on very
simple sites.

How does your web server differentiate between a bot and a human user
making HTTP requests?


Duh! It doesn't. The site has links back to the place where the link
began. It wouldn't appear recursive to a human user, because that
person would choose where he/she viewed. The spider can't tell, and
ends up in recursions it can only abort by "counting out" repeats.
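
For what it's worth, the usual way a spider avoids chasing its own tail is to remember every URL it has already queued and to cap depth and page count. A rough sketch, not how any particular tool does it: the `crawl` function, the `get_links` callback, and the three-page `SITE` map are all made up for illustration, and nothing here touches the network.

```python
from collections import deque

def crawl(start, get_links, max_pages=1000, max_depth=5):
    """Breadth-first crawl that never queues the same URL twice."""
    visited = {start}              # every URL queued so far
    queue = deque([(start, 0)])    # (url, depth) pairs still to process
    order = []                     # pages in the order they were "fetched"
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue               # do not follow links any deeper
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

# Hypothetical site where every page links back to the home page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b"],
    "/b": ["/", "/a"],
}

pages = crawl("/", lambda url: SITE.get(url, []))
print(pages)  # ['/', '/a', '/b']
```

The visited set only helps with links that loop back to pages already seen. A site that generates an effectively unlimited number of distinct URLs still needs the `max_pages` and `max_depth` caps to stop at all.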


I actually have some smarts in the server that can tell a bot from a
human. But httrack is blocked on the spot in any case. I am not
against it, as such, but it will not work on my site.
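
The post doesn't say how the server picks out httrack, so the following is only a guess at the simplest approach: refuse any request whose User-Agent header mentions it. The names `BLOCKED_AGENT_TOKENS`, `is_blocked` and `respond` are hypothetical, and the sample header in the usage note is just an example of the kind of string such a tool might send.

```python
# Hypothetical blocklist; the real site's rules are not described in the thread.
BLOCKED_AGENT_TOKENS = ("httrack",)

def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)

def respond(headers):
    """Return an HTTP status code for a request with the given headers."""
    return 403 if is_blocked(headers.get("User-Agent")) else 200
```

Calling `respond({"User-Agent": "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"})` returns 403, while an ordinary browser string gets 200. A client can send any User-Agent it likes, so a check like this only stops tools that identify themselves honestly.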



Why?

Gunner

"Confiscating wealth from those who have earned it, inherited it,
or got lucky is never going to help 'the poor.' Poverty isn't
caused by some people having more money than others, just as obesity
isn't caused by McDonald's serving super-sized orders of French fries.
Poverty, like obesity, is caused by the life choices that dictate
results." - John Tucci,