"DoN. Nichols" wrote:
According to David Merrill:
Unless I'm missing something, doing an advanced Google Groups search on
author, "Robert Bastow" in "rec.crafts.metalworking" returns 2860 'threads'
containing one or more messages from Teenut buried among numerous other
messages. From these entire threads one would have to copy Teenut's
individual messages and paste them into a text file; certainly possible, but
a laborious process.
Can anyone identify a more efficient way; possibly some of you in the Linux
world, or are your Web/Usenet readers as insulated from scripting tools as
seems to be the current case in the Windows world?
For some reason, I get 3010 or 3020 rather than 2860 threads, depending on
quotes and spaces in search terms. E.g., there are 3010 shown at (1 line url)
http://groups.google.com/groups/sear...obert%20bastow
[...] "lynx" is a text-only browser, which can work well for the
task, and can be coupled to shell scripts to do quite complex things.
True enough, although I prefer wget for automated webpage downloading,
in general. I presume lynx will save pages with html stripped out?
I'd see that as an advantage in an application like this.
"wget" can download entire trees of web pages, or individual
files, so a combination of lynx to find things, a shell script to run
it, and wget to download to files could do it nicely.
DoN, perhaps you could try wget on the following url and the one below.
http://groups.google.com/group/rec.c...ab0c3e00380a5f
From here, I get ERROR 403: Forbidden, although in a browser they bring
up R.B. pages ok. Maybe a cookie problem?
David, if you save the google search page in file t, the following
all-on-1-line command will generate a list of individual-message urls
in u from the thread urls in t:
grep "/browse_thread/" t |sed -e "s|^.*/thread/[0-9a-f]*/|http://groups.google.com/group/rec.crafts.metalworking/msg/|" -e "s/?lnk.*$//" >u
(Install cygwin package to get grep and sed and bash if using Windows.)
For example, the first grepped line is
<font size="+0"><a href="/group/rec.crafts.metalworking/browse_thread/thread/5e5973b836951947/bc0bf49e00214956?lnk=st&q=group%3Arec.crafts.metalworking+author%3Arobert+author%3Abastow&rnum=1#bc0bf49e00214956">Bolting down milling machine??</a></font>
and sed converts it to
http://groups.google.com/group/rec.c...0bf49e00214956
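To try the grep/sed step without saving a search page first, something like the sketch below should work. It feeds one sample line through the same two sed expressions. The function name extract_msg_urls and the variables line and result are made up for illustration, and the sample line's query string is shortened; it assumes the saved page has one thread link per line, as in the example above.

```shell
#!/bin/bash
# Sample input: one thread link in the shape shown above
# (query string shortened for readability).
line='<font size="+0"><a href="/group/rec.crafts.metalworking/browse_thread/thread/5e5973b836951947/bc0bf49e00214956?lnk=st&rnum=1#bc0bf49e00214956">Bolting down milling machine??</a></font>'

# Hypothetical helper: reads saved search-page HTML on stdin and
# writes one individual-message URL per thread link on stdout.
extract_msg_urls() {
  grep "/browse_thread/" |
    sed -e 's|^.*/thread/[0-9a-f]*/|http://groups.google.com/group/rec.crafts.metalworking/msg/|' \
        -e 's/?lnk.*$//'
}

# First expression: replace everything up to and including the thread id
# with the message-URL prefix. Second: drop the "?lnk=..." tail.
result=$(printf '%s\n' "$line" | extract_msg_urls)
printf '%s\n' "$result"
```

With a real saved page in t, the same function would be run as extract_msg_urls <t >u.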
However, for Teenut's postings, the way that *I* would go for it
is to download the relevant years from the archives at the site which
holds the official (and long un-updated) FAQs for the newsgroup. At the
[snip details]
-jiw