Metalworking (rec.crafts.metalworking) Discuss various aspects of working with metal, such as machining, welding, metal joining, screwing, casting, hardening/tempering, blacksmithing/forging, spinning and hammer work, sheet metal work.

#1
Posted to rec.crafts.metalworking
Copy a website

www.httrack.com/

Been using it for years. Works on MOST websites, though not all.

Gunner

"Confiscating wealth from those who have earned it, inherited it,
or got lucky is never going to help 'the poor.' Poverty isn't
caused by some people having more money than others, just as obesity
isn't caused by McDonald's serving super-sized orders of French fries.
Poverty, like obesity, is caused by the life choices that dictate
results." - John Tucci,
#2

On 2008-08-31, Gunner Asch wrote:
> www.httrack.com/
>
> Been using it for years. Works on MOST websites, though not all.


If you try downloading my website algebra.com, you will get into an
infinite recursion through millions of pages. That's why I prevent
most such bots from accessing my site. This would work only on very
simple sites.

--
Due to extreme spam originating from Google Groups, and their inattention
to spammers, I and many others block all articles originating
from Google Groups. If you want your postings to be seen by
more readers you will need to find a different means of
posting on Usenet.
http://improve-usenet.org/
#3

Some bots do, as you say. Limit them to only 1, 2, or 3 levels deep.
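A depth cap is what stops a crawler from wandering forever through a site that generates pages on the fly. Below is a minimal Python sketch, assuming a hypothetical `fake_links()` helper in place of real downloading and link extraction; it models a dynamic site where every page links to two brand-new pages, so without the cap the crawl would never end.

```python
from collections import deque


def fake_links(url):
    # Hypothetical stand-in for "download the page and extract its links".
    # It models a dynamic site where every page links to two brand-new pages,
    # so an unlimited crawl would never terminate.
    return [url + "/a", url + "/b"]


def crawl(start, max_depth, get_links=fake_links):
    """Breadth-first crawl that never follows links past max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue:
        url, depth = queue.popleft()
        fetched.append(url)
        if depth >= max_depth:
            continue  # at the limit: record the page but do not expand its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


if __name__ == "__main__":
    for limit in (1, 2, 3):
        print(limit, len(crawl("http://example.com", limit)))
```

On this toy site, limits of 1, 2 and 3 levels fetch 3, 7 and 15 pages respectively; each extra level doubles the work, which is why an unbounded crawl of a script-generated site never finishes.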

Martin
Martin H. Eastburn
@ home at Lions' Lair with our computer lionslair at consolidated dot net
TSRA, Endowed; NRA LOH & Patron Member, Golden Eagle, Patriot's Medal.
NRA Second Amendment Task Force Charter Founder
IHMSA and NRA Metallic Silhouette maker & member.
http://lufkinced.com/


Ignoramus4791 wrote:
> On 2008-08-31, Gunner Asch wrote:
>> www.httrack.com/
>>
>> Been using it for years. Works on MOST websites, though not all.
>
> If you try downloading my website algebra.com, you will get into an
> infinite recursion through millions of pages. That's why I prevent
> most such bots from accessing my site. This would work only on very
> simple sites.



#4

Ignoramus4791 wrote:
> On 2008-08-31, Gunner Asch wrote:
>> www.httrack.com/
>>
>> Been using it for years. Works on MOST websites, though not all.
>
> If you try downloading my website algebra.com, you will get into an
> infinite recursion through millions of pages. That's why I prevent
> most such bots from accessing my site. This would work only on very
> simple sites.


You must be very smart to have such a complex and sophisticated website.


#5

On Sat, 30 Aug 2008 21:10:32 -0500, Ignoramus4791 wrote:

> On 2008-08-31, Gunner Asch wrote:
>> www.httrack.com/
>>
>> Been using it for years. Works on MOST websites, though not all.
>
> If you try downloading my website algebra.com, you will get into an
> infinite recursion through millions of pages. That's why I prevent
> most such bots from accessing my site. This would work only on very
> simple sites.



Yours was one of the sites it doesn't work well on... chuckle.

Been there, tried that.

But it works quite well on most others.

Gunner


#6


> If you try downloading my website algebra.com, you will get into an
> infinite recursion through millions of pages. That's why I prevent
> most such bots from accessing my site. This would work only on very
> simple sites.


How does your web server differentiate between a bot and a human user
making http requests?

Regards,

Robin
#8

On Sat, 30 Aug 2008 18:11:21 -0700, Gunner Asch wrote:

> www.httrack.com/
>
> Been using it for years. Works on MOST websites, though not all.
>
> Gunner



I tend to just use wget. Helps if you've got *nix for an OS or the Cygwin
utilities for windoze though.


Mark Rand
RTFM
#10

Lloyd E. Sponenburgh writes:

> The spider can't tell,


For one, "wget" can certainly detect and ignore recursive loops.
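Loop detection in a mirroring tool largely comes down to remembering which URLs have already been fetched. A minimal Python sketch, assuming that URLs differing only in host case, query-parameter order, or `#fragment` are the same page (an assumption that does not hold on every site): normalize each URL with the standard `urllib.parse` module before checking a visited set. It also shows the limit of the approach: a script that keeps emitting genuinely new URLs, such as an ever-increasing page counter, never repeats, so no visited set can catch it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Canonical form: lower-case scheme/host, sorted query params, no fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", query, ""))


class Visited:
    """Set of already-fetched pages, keyed by normalized URL."""

    def __init__(self):
        self._seen = set()

    def first_visit(self, url):
        """Return True the first time a (normalized) URL is seen, False after."""
        key = normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A crawler calls `first_visit()` before each download and skips the URL when it returns False; that breaks cycles among a fixed set of pages, but not an endless stream of new ones.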


#11

Ignoramus3863 writes:

> I actually have some smarts in the server that can tell a bot from a
> human.


Not a bot attempting to look human. Just bots that advertise their
botness, by honest design or flawed hacking.
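Catching bots that advertise their botness is the easy case, because they name themselves in the User-Agent header. A minimal Python sketch; the token list is a hypothetical example (HTTrack and GNU Wget do put their names in their default User-Agent strings, but both let the user override them), and a bot sending a browser-like User-Agent sails straight past it.

```python
# Hypothetical token list: substrings that self-identifying crawlers commonly
# include in their User-Agent header. This is a crude substring match, so
# expect the occasional false positive.
BOT_TOKENS = ("httrack", "wget", "bot", "crawler", "spider")


def advertises_bot(user_agent):
    """Return True if the User-Agent header openly identifies a crawler."""
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return any(token in ua for token in BOT_TOKENS)
```

A check like this costs almost nothing per request, which makes it a reasonable first filter, but it only catches the honest (or sloppy) bots.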
#12

On 2008-09-01, Richard J Kinch wrote:
> Ignoramus3863 writes:
>
>> I actually have some smarts in the server that can tell a bot from a
>> human.
>
> Not a bot attempting to look human. Just bots that advertise their
> botness, by honest design or flawed hacking.


Yes, even a bot trying to look like a human (i.e. one supplying a Referer
and a browser-like User-Agent): I can still detect that it is a bot.

The way I detect it is that there is a hidden link that humans cannot
see, and cannot click, but bots would follow it. The hidden link is
disallowed by robots.txt, so it catches every bot that ignores
robots.txt and follows the link.
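The hidden-link trap described above can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual server code: the trap path `/trap/`, the CSS `display:none` hiding, and the in-memory ban set are all hypothetical choices. robots.txt disallows the trap, the page links to it invisibly, and any client that requests it is banned. The standard `urllib.robotparser` module stands in for a well-behaved crawler to show that a compliant bot never takes the bait.

```python
from urllib.robotparser import RobotFileParser

TRAP_PATH = "/trap/"  # hypothetical honeypot URL, never shown to humans

ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""

# A link a human cannot see or click; a crawler that parses the raw HTML
# and ignores robots.txt will still follow it.
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none"></a>'

banned = set()  # client addresses caught requesting the trap


def handle_request(client_ip, path):
    """Return an HTTP status code; ban any client that fetches the trap."""
    if client_ip in banned:
        return 403
    if path.startswith(TRAP_PATH):
        banned.add(client_ip)
        return 403
    return 200


def compliant_bot_may_fetch(url, agent="ExampleBot"):
    """What a robots.txt-respecting crawler would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

Because the trap is invisible to people and forbidden to compliant crawlers, spoofing the Referer or User-Agent does not help a bot that walks into it.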
#13

Ignoramus3863 writes:

> The way I detect it is that there is a hidden link that humans cannot
> see, and cannot click, but bots would follow it.


Yes, that would be difficult to defeat.
#14

On 2008-09-01, Richard J Kinch wrote:
> Ignoramus3863 writes:
>
>> The way I detect it is that there is a hidden link that humans cannot
>> see, and cannot click, but bots would follow it.
>
> Yes, that would be difficult to defeat.


I had a lot of trouble with httrack and other bots like this. The
people who run them usually do not mean anything bad; they just do
not realize that they should not run them against dynamic sites like
mine. They may not even realize that my site is dynamic, because it
tries not to look dynamic (search engine friendly and all).

I spent a very long time trying to 1) make a website that hopefully
does not lead crawlers into too many infinite loops, and 2) detect and
stop bad bots early enough. But I still get problems from time to time.
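One common way to stop a runaway crawler early (not necessarily what this poster's server does) is to count requests per client over a short sliding window and refuse service once a threshold is crossed. A minimal Python sketch; the `limit` and `window` defaults are arbitrary assumptions to be tuned per site, and timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per client."""

    def __init__(self, limit=30, window=60.0):  # assumed thresholds
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client, now):
        q = self.hits[client]
        # Forget requests that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # too many requests recently: probably a crawler
        q.append(now)
        return True
```

A person clicking through manuals by hand stays far below such a limit, while a mirroring tool fetching pages back to back trips it within the first minute.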

#16

On Sun, 31 Aug 2008 23:40:35 -0500, Ignoramus3863 wrote:

> On 2008-09-01, Richard J Kinch wrote:
>> Ignoramus3863 writes:
>>
>>> The way I detect it is that there is a hidden link that humans cannot
>>> see, and cannot click, but bots would follow it.
>>
>> Yes, that would be difficult to defeat.
>
> I had a lot of trouble with httrack and other bots like this. The
> people who run them usually do not mean anything bad; they just do
> not realize that they should not run them against dynamic sites like
> mine. They may not even realize that my site is dynamic, because it
> tries not to look dynamic (search engine friendly and all).
>
> I spent a very long time trying to 1) make a website that hopefully
> does not lead crawlers into too many infinite loops, and 2) detect and
> stop bad bots early enough. But I still get problems from time to time.



What's wrong with bots harvesting your manuals?

Frankly, on dialup, I don't have the time to hit each and every manual
and wait for a download to start and finish.

On sites such as yours, I run the program and go to bed.


Gunner
#17

On 2008-09-01, Gunner Asch wrote:
> On Sun, 31 Aug 2008 23:40:35 -0500, Ignoramus3863 wrote:
>
>> On 2008-09-01, Richard J Kinch wrote:
>>> Ignoramus3863 writes:
>>>
>>>> The way I detect it is that there is a hidden link that humans cannot
>>>> see, and cannot click, but bots would follow it.
>>>
>>> Yes, that would be difficult to defeat.
>>
>> I had a lot of trouble with httrack and other bots like this. The
>> people who run them usually do not mean anything bad; they just do
>> not realize that they should not run them against dynamic sites like
>> mine. They may not even realize that my site is dynamic, because it
>> tries not to look dynamic (search engine friendly and all).
>>
>> I spent a very long time trying to 1) make a website that hopefully
>> does not lead crawlers into too many infinite loops, and 2) detect and
>> stop bad bots early enough. But I still get problems from time to time.
>
> What's wrong with bots harvesting your manuals?

Nothing.

But if you go to algebra.com, you can accidentally go into an infinite
loop with various scripts.

> Frankly, on dialup, I don't have the time to hit each and every manual
> and wait for a download to start and finish.
>
> On sites such as yours, I run the program and go to bed.


You may need to sleep a lot longer than anticipated.

i


#18

On Sun, 31 Aug 2008 23:40:23 +0100, Mark Rand wrote:

> On Sat, 30 Aug 2008 18:11:21 -0700, Gunner Asch wrote:
>
>> www.httrack.com/
>>
>> Been using it for years. Works on MOST websites, though not all.
>>
>> Gunner
>
> I tend to just use wget. Helps if you've got *nix for an OS or the Cygwin
> utilities for windoze though.
>
> Mark Rand
> RTFM


I use wget for this too, when saving a few pages by hand doesn't
work out so well. There are Windows binaries available, so there's
no need for Cygwin. For example:

http://xoomer.alice.it/hherold/

Wget won't hold your hand, though: it's command line only, and a
little reading/homework is suggested for it to be really
useful...
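As a starting point for that homework, here is a minimal Python sketch that assembles one reasonable GNU wget command line for a polite, depth-limited mirror. The URL is a placeholder and this particular option set is an assumption, not the only sensible choice; check `wget --help` or the manual for your version before running it.

```python
import subprocess


def wget_mirror_cmd(url, depth=2, wait_seconds=1):
    """Build a GNU wget command line for a polite, depth-limited mirror."""
    return [
        "wget",
        "--recursive",             # follow links from the starting page
        f"--level={depth}",        # stop after `depth` levels of links
        "--no-parent",             # never climb above the starting directory
        "--page-requisites",       # also fetch images/CSS needed to display pages
        "--convert-links",         # rewrite links so the copy browses offline
        f"--wait={wait_seconds}",  # pause between requests to go easy on the server
        url,
    ]


def run_mirror(url):
    # Needs a wget executable on PATH (a native Windows build works too).
    subprocess.run(wget_mirror_cmd(url), check=True)
```

The `--level` cap is what keeps a run like this from wandering forever through a script-driven site, and `--wait` makes an overnight dialup run much kinder to the server.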

--
Leon Fisk
Grand Rapids MI/Zone 5b
Remove no.spam for email

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 DIYbanter.
The comments are property of their posters.
 
