Metalworking (rec.crafts.metalworking) Discuss various aspects of working with metal, such as machining, welding, metal joining, screwing, casting, hardening/tempering, blacksmithing/forging, spinning and hammer work, sheet metal work. |
#1
Posted to rec.crafts.metalworking
"DoN. Nichols" wrote:
According to David Merrill :

Unless I'm missing something, doing an advanced Google Groups search on author, "Robert Bastow" in "rec.crafts.metalworking" returns 2860 'threads' containing one or more messages from Teenut buried among numerous other messages. From these entire threads one would have to copy Teenut's individual messages and paste them into a text file; certainly possible, but a laborious process. Can anyone identify a more efficient way; possibly some of you in the Linux world, or are your Web/Usenet readers as insulated from scripting tools as seems to be the current case in the Windows world?

For some reason, I get 3010 or 3020 rather than 2860 threads, depending on quotes and spaces in search terms. E.g., there are 3010 shown at (1 line url)
http://groups.google.com/groups/sear...obert%20bastow

[...]

"lynx" is a text-only browser, which can work well for the task, and can be coupled to shell scripts to do quite complex things.

True enough, although I prefer wget for automated webpage downloading, in general. I presume lynx will save pages with html stripped out? I'd see that as an advantage in an application like this.

"wget" can download entire trees of web pages, or individual files, so a combination of lynx to find things, a shell script to run it, and wget to download to files could do it nicely.

DoN, perhaps you could try wget on the following url and the one below.
http://groups.google.com/group/rec.c...ab0c3e00380a5f
From here, I get ERROR 403: Forbidden, although in a browser they bring up R.B. pages ok. Maybe a cookie problem?

David, if you save the google search page in file t, the following all-on-1-line command will generate a list of individual-message urls in u from the thread urls in t:

    grep "/browse_thread/" t |sed -e "s|^.*/thread/[0-9a-f]*/|http://groups.google.com/group/rec.crafts.metalworking/msg/|" -e "s/?lnk.*$//" > u

(Install cygwin package to get grep and sed and bash if using Windows.)

For example, the first grepped line is

    <font size="+0"><a href="/group/rec.crafts.metalworking/browse_thread/thread/5e5973b836951947/bc0bf49e00214956?lnk=st&q=group%3Arec.crafts.metalworking+author%3Arobert+author%3Abastow&rnum=1#bc0bf49e00214956">Bolting down milling machine??</a></font>

and sed converts it to

    http://groups.google.com/group/rec.c...0bf49e00214956

However, for Teenut's postings, the way that *I* would go for it is to download the relevant years from the archives at the site which holds the official (and long un-updated) FAQs for the newsgroup. At the [snip details]

-jiw
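The grep/sed pipeline quoted above can be tried offline before pointing it at a real saved search page. The following is a minimal sketch, assuming the saved page contains anchor lines shaped like the sample in the post; the function name `extract_msg_urls` and the temporary directory are illustrative, not part of the original command.

```shell
#!/bin/sh
# Sketch: turn Google Groups thread links into per-message URLs,
# using the same grep and sed expressions as the command in the post.
# extract_msg_urls and workdir are hypothetical names for this example.

extract_msg_urls() {
    # $1: saved search-results page (file "t" in the post)
    grep '/browse_thread/' "$1" |
    sed -e 's|^.*/thread/[0-9a-f]*/|http://groups.google.com/group/rec.crafts.metalworking/msg/|' \
        -e 's/?lnk.*$//'
}

# One-line stand-in for the saved page, copied from the sample in the post.
workdir=$(mktemp -d)
cat > "$workdir/t" <<'EOF'
<font size="+0"><a href="/group/rec.crafts.metalworking/browse_thread/thread/5e5973b836951947/bc0bf49e00214956?lnk=st&q=group%3Arec.crafts.metalworking+author%3Arobert+author%3Abastow&rnum=1#bc0bf49e00214956">Bolting down milling machine??</a></font>
EOF

# Write the message URLs to "u", as in the post, then show them.
extract_msg_urls "$workdir/t" > "$workdir/u"
cat "$workdir/u"
```

The first sed expression swaps everything up to and including the thread id for the message-URL prefix, and the second drops the query string, so each output line ends in the message id alone.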
#2
Posted to rec.crafts.metalworking
On Tue, 16 May 2006 11:21:33 -0600, James Waldby wrote:
True enough, although I prefer wget for automated webpage downloading, in general. I presume lynx will save pages with html stripped out? I'd see that as an advantage in an application like this.

lynx -dump I think it is. Gives you a plain text version of it without the HTML. I use wget for some things, lynx for others, depends on how I want to parse it once I have it.

DoN, perhaps you could try wget on the following url and the one below.
http://groups.google.com/group/rec.c...ab0c3e00380a5f
From here, I get ERROR 403: Forbidden, although in a browser they bring up R.B. pages ok. Maybe a cookie problem?

I know you can do cookies with wget but I've never done it. It's in TFM though.
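For reference, the two ideas in this post might look roughly like the sketch below: `lynx -dump` redirected to a text file, and wget's `--load-cookies` option reading a cookies file exported from a browser. The function names, file names, and the example.com URL are placeholders, and whether cookies have anything to do with the 403 is not established by this post. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each command
# instead of running it; set DRY_RUN=0 to actually fetch.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Plain-text rendering of a page, HTML stripped, written to a file.
save_as_text() {
    # $1: URL, $2: output text file
    if [ "$DRY_RUN" = 1 ]; then
        echo "lynx -dump $1 > $2"
    else
        lynx -dump "$1" > "$2"
    fi
}

# Raw download that sends cookies from a browser-exported cookies file.
fetch_with_cookies() {
    # $1: cookies file (Netscape cookies.txt format), $2: URL, $3: output file
    run wget --load-cookies "$1" -O "$3" "$2"
}

# Placeholder URL and file names, not taken from the thread.
save_as_text "http://example.com/page" page.txt
fetch_with_cookies cookies.txt "http://example.com/page" page.html
```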
#3
Posted to rec.crafts.metalworking
According to James Waldby :
"DoN. Nichols" wrote:

[ ... ]

[...]

"lynx" is a text-only browser, which can work well for the task, and can be coupled to shell scripts to do quite complex things.

True enough, although I prefer wget for automated webpage downloading, in general. I presume lynx will save pages with html stripped out? I'd see that as an advantage in an application like this.

Yes -- it can. See this option:

======================================================================
-dump
    dumps the formatted output of the default document or one specified on the command line to standard out. This can be used in the following way:

    lynx -dump http://www.crl.com/~subir/lynx.html
======================================================================

"wget" can download entire trees of web pages, or individual files, so a combination of lynx to find things, a shell script to run it, and wget to download to files could do it nicely.

DoN, perhaps you could try wget on the following url and the one below.
http://groups.google.com/group/rec.c...ab0c3e00380a5f
From here, I get ERROR 403: Forbidden, although in a browser they bring up R.B. pages ok. Maybe a cookie problem?

I get the same response -- until I add the following option:

    --user-agent="Mozilla/4.0"

which causes it to identify itself to Google's server as a version of Mozilla. Google does not like wget for whatever reasons. :-)

However, what that brought up was some 30K of javascript, which wget can't process, so it won't get the rest of what you were after. Processing that through the -dump option of lynx shows that it has some 30+ links in it, which might have been what you were after.

[ ... ]

However, for Teenut's postings, the way that *I* would go for it is to download the relevant years from the archives at the site which holds the official (and long un-updated) FAQs for the newsgroup. At the [snip details]

It looks as though Scott Logan has posted the pointer to the collection again -- elsewhere in this thread.

Enjoy,
DoN.
-- Email: | Voice (all times): (703) 938-4564 (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html --- Black Holes are where God is dividing by zero --- |
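The two steps described in this post, fetching with a browser-like user agent and then listing the links in the result with lynx, could be combined roughly as follows. This is a sketch under assumptions: the function name, the output file name, and the example.com URL are placeholders (the thread's own URLs are truncated and not reproduced here), and `-listonly` is added to `-dump` so that lynx prints only the link list. It defaults to a dry run that prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each command
# instead of running it; set DRY_RUN=0 to actually fetch.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

fetch_and_list_links() {
    # $1: URL, $2: file to save the raw page in (hypothetical default name)
    url=$1
    dest=${2:-page.html}
    # Identify as Mozilla, as in the post, so the server does not reject wget.
    run wget --user-agent="Mozilla/4.0" -O "$dest" "$url" &&
    # Render the saved page and print only its list of links.
    run lynx -dump -listonly "$dest"
}

# Placeholder URL, not one from the thread.
fetch_and_list_links "http://example.com/some/page"
```

Saving the page first and running lynx on the local copy keeps the two tools separate, so the same saved file can also be fed to grep and sed.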