DIYbanter


Graeme[_7_] April 30th 20 11:27 AM

Creating a pdf file from web pages
 
I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?

Thanks!
--
Graeme

[email protected] April 30th 20 11:39 AM

Creating a pdf file from web pages
 
On Thursday, 30 April 2020 11:27:56 UTC+1, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.


Does the forum have a separate 'print view' mode? Or could you use the Stylus browser extension to rewrite the page style sheet to change the quoted text to 0pt white?

PDFsam and others will concatenate or merge PDFs into one file:

https://pdfsam.org/
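
For anyone happy to run a script instead, the merge step can be sketched in Python. This is only a sketch under assumptions: it relies on the third-party pypdf package being installed, and the folder and file names are invented for illustration. The natural-order sort is there so that "page10" lands after "page2" when there are 100+ files.

```python
import re
from pathlib import Path


def natural_key(name: str):
    """Split a name into text and number chunks so 'page10' sorts after 'page2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ordered_pdfs(folder: str) -> list[Path]:
    """Return the PDF files in `folder` in natural (human) order."""
    return sorted(Path(folder).glob("*.pdf"), key=lambda p: natural_key(p.name))


def merge_pdfs(folder: str, output: str) -> None:
    """Concatenate every PDF in `folder` into `output` (needs pypdf)."""
    # Third-party import kept local so the helpers above work without it.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for pdf in ordered_pdfs(folder):
        writer.append(str(pdf))  # appends all pages of this file
    writer.write(output)
    writer.close()


# Hypothetical usage: merge_pdfs("saved_pages", "forum_archive.pdf")
```

Write the output somewhere outside the input folder, otherwise a second run would pick up the merged file as well.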

Owain


AnthonyL April 30th 20 12:22 PM

Creating a pdf file from web pages
 
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme
wrote:

I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?


I've given up with "printing" from web pages and copy/paste into a
Word Processor then print (to pdf/file/printer).

html is no longer html.


--
AnthonyL

Why do scientists need to BELIEVE in anything?

newshound April 30th 20 01:41 PM

Creating a pdf file from web pages
 
On 30/04/2020 12:22, AnthonyL wrote:
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme
wrote:

I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?


I've given up with "printing" from web pages and copy/paste into a
Word Processor then print (to pdf/file/printer).

html is no longer html.


+1. Is this a forum with binaries? Collecting the text is relatively
straightforward. I sometimes use Notepad as a first stage.
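
Collecting just the visible text from a saved page can also be sketched in Python with the standard library. This is a rough sketch, not a tested tool for phpBB: it only treats an element as hidden if it is a script or style block, carries the `hidden` attribute, has an inline `display:none`, or uses a class name you pass in. The class names are assumptions you would have to find by inspecting the page, since text hidden by an external style sheet cannot be detected this way.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleText(HTMLParser):
    """Collect text, skipping anything inside an element judged hidden."""

    def __init__(self, hidden_classes=()):
        super().__init__()
        self.hidden_classes = set(hidden_classes)  # hypothetical class names
        self.stack = []    # (tag, is_hidden) for each open element
        self.hidden = 0    # how many hidden elements are currently open
        self.chunks = []

    def _is_hidden(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        classes = set((attrs.get("class") or "").split())
        return (tag in ("script", "style")
                or "hidden" in attrs
                or "display:none" in style
                or bool(classes & self.hidden_classes))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        hidden = self._is_hidden(tag, attrs)
        self.stack.append((tag, hidden))
        self.hidden += hidden

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray closing tag, ignore it
        # Pop back to the matching tag; this also closes unclosed <p>, <li> etc.
        while self.stack:
            open_tag, hidden = self.stack.pop()
            self.hidden -= hidden
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.hidden and data.strip():
            self.chunks.append(data.strip())


def visible_text(html, hidden_classes=()):
    """Return the visible text of `html`, one chunk per line."""
    parser = VisibleText(hidden_classes)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Each run of text comes out on its own line, so words inside bold or link tags get split from their sentence; the result is a first stage for pasting into a word processor, not finished output.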

Graeme[_7_] April 30th 20 02:05 PM

Creating a pdf file from web pages
 
In message ,
newshound writes
On 30/04/2020 12:22, AnthonyL wrote:
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?

I've given up with "printing" from web pages and copy/paste into a
Word Processor then print (to pdf/file/printer).
html is no longer html.

+1. Is this a forum with binaries? Collecting the text is relatively
straightforward. I sometimes use Notepad as a first stage.


Thanks for the suggestions. Yes, lots of large, full colour images.
The forum software is phpBB which seems to be popular, and works well.

I think straight copy/paste of the text is the way forward, with images
added where appropriate.

HTML is indeed no longer html, at least at my basic level. I note
Owain's suggestion to rewrite the style sheet, but that is way beyond my
experience. I do, though, like the idea of producing a pdf file for
each thread and then concatenating or merging the PDFs into one file.

--
Graeme

[email protected] April 30th 20 04:25 PM

Creating a pdf file from web pages
 
On Thursday, 30 April 2020 14:05:24 UTC+1, Graeme wrote:
HTML is indeed no longer html, at least at my basic level. I note
Owain's suggestion to rewrite the style sheet, but that is way beyond my
experience.


You can have a look at userstyles as it may have been done for you already, eg

https://userstyles.org/styles/57/sub...eme-beautifier

Owain


Fredxx[_3_] April 30th 20 04:52 PM

Creating a pdf file from web pages
 
On 30/04/2020 12:22:47, AnthonyL wrote:
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme
wrote:

I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?


I've given up with "printing" from web pages and copy/paste into a
Word Processor then print (to pdf/file/printer).

html is no longer html.


I used to copy and paste into Word, but now print direct to pdf, or save
the whole page if appropriate as HTML.

For printing and using the web style sheet I use the addon "Print Edit
WE". I have not looked back since. You can also de-select areas you
don't want to output.

I suggest you take a look before giving up.

polygonum_on_google[_2_] April 30th 20 05:08 PM

Creating a pdf file from web pages
 
On Thursday, 30 April 2020 11:27:56 UTC+1, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?

Thanks!


LyX does an amazing job of merging PDFs, but it does take quite some getting your head around.

Free - tick.


Richard[_10_] April 30th 20 05:41 PM

Creating a pdf file from web pages
 
On 30/04/2020 11:27, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?

Thanks!


Opera browser does a good job of saving web pages to pdf. It may not do
the number of pages you want, but might be worth a try.

Brian Gaff \(Sofa\) April 30th 20 09:05 PM

Creating a pdf file from web pages
 
If it's got to be accessible, i.e. not just a graphic of a page, I don't
actually think there is any obvious way to do what you want, since Adobe say
they want dosh when you try to do a conversion. <grin>
Brian

--
----- --
This newsgroup posting comes to you directly from...
The Sofa of Brian Gaff...

Blind user, so no pictures please
Note this Signature is meaningless.!
"Graeme" wrote in message
...
I'm trying to create a pdf file of online forum posts. Using the browser
print to pdf option works, but only up to a point in that the saved pdf
file includes all the hidden text in the online original rather than just
the text and images I see without clicking or hovering. Hope that makes
sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?

Thanks!
--
Graeme




Mathew Newton[_2_] April 30th 20 10:22 PM

Creating a pdf file from web pages
 
On Thursday, 30 April 2020 11:27:56 UTC+1, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?


You could try pasting the URLs into something like https://pdfcrowd.com/ - having just quickly tried it, it seems to do a reasonable job of 'printing' what you see (although it does add a small banner at the bottom). There are browser extensions to make it a bit more '1-click' too.

AnthonyL May 1st 20 12:35 PM

Creating a pdf file from web pages
 
On Thu, 30 Apr 2020 13:41:15 +0100, newshound
wrote:

On 30/04/2020 12:22, AnthonyL wrote:
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme
wrote:

I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the
saved pdf file includes all the hidden text in the online original
rather than just the text and images I see without clicking or hovering.
Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files?
The forum I want to save consists of 30 threads, each with 3 or 4 pages.
Saving as above means saving one page at a time which is OK, but would
leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e.
cheap or free?


I've given up with "printing" from web pages and copy/paste into a
Word Processor then print (to pdf/file/printer).

html is no longer html.


+1. Is this a forum with binaries? Collecting the text is relatively
straightforward. I sometimes use Notepad as a first stage.


This is NOT a forum.


--
AnthonyL

Why do scientists need to BELIEVE in anything?

Andy Burns[_13_] May 1st 20 12:37 PM

Creating a pdf file from web pages
 
AnthonyL wrote:

newshound wrote:

Is this a forum with binaries?


This is NOT a forum.


He wasn't talking about "here", he was talking about A N Other forum.


Graeme[_7_] May 1st 20 05:26 PM

Creating a pdf file from web pages
 
In message , Andy Burns
writes
AnthonyL wrote:

newshound wrote:

Is this a forum with binaries?

This is NOT a forum.


He wasn't talking about "here", he was talking about A N Other forum.

Funny how we read things. I read 'Is this a forum with binaries?' as
referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a
forum with binaries?', not least because newshound would know that
uk.d-i-y is not a binary group.

Thanks for all the comments. I have transferred the first two forum
threads to pdf by copying and pasting the text and images required which
gives a clean and satisfactory result, although somewhat laborious. Oh
well, perhaps what lockdown was designed for?

Cheers,

--
Graeme

Paul[_46_] May 1st 20 10:35 PM

Creating a pdf file from web pages
 
Graeme wrote:
In message , Andy Burns
writes
AnthonyL wrote:

newshound wrote:

Is this a forum with binaries?
This is NOT a forum.


He wasn't talking about "here", he was talking about A N Other forum.

Funny how we read things. I read 'Is this a forum with binaries?' as
referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a
forum with binaries?', not least because newshound would know that
uk.d-i-y is not a binary group.

Thanks for all the comments. I have transferred the first two forum
threads to pdf by copying and pasting the text and images required which
gives a clean and satisfactory result, although somewhat laborious. Oh
well, perhaps what lockdown was designed for?

Cheers,


You can create PDF files by hand, which would bring
the question perilously close to the group charter.

The following file can be copied into Notepad and
stored as "helloworld.pdf". The extension may help
the file's icon look like an Acrobat Reader icon.

The file is copied off the web, and I messed with it
a bit and screwed up the checksums (strictly, the byte offsets
in the xref table). I added two sentences, used some matrix
operators (Tm) to set the start of each following line, then
corrected the stream length to 112 characters (which includes
a line termination character per line).

If you screw up the file enough, Acrobat tries to repair it
internally before displaying it. This might cause a 20 second
delay until it opens.

----------------- Do not copy this line ------------------
%PDF-1.7

1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj

2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 5 0 R
>>
endobj

4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj

5 0 obj % page content
<<
/Length 112
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
1 0 0 1 70 40 Tm
(We meet again.) Tj
1 0 0 1 70 30 Tm
(The end.) Tj
ET
endstream
endobj

xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
----------------- Do not copy this line ------------------

It's a gnarly language, and barely feasible as a
means for humans to package stuff by hand. Real files
have a lot more baggage inside.
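
For what it's worth, the bookkeeping that is easy to get wrong by hand (the stream /Length, the byte offsets in the xref table, and the startxref value) is mechanical enough to leave to a script. The following is a minimal sketch in Python, not how any real PDF library does it: it writes the same five objects on one line each, assumes plain ASCII text, and places lines with relative Td moves instead of the Tm operators used above. The function name and layout values are made up for illustration.

```python
def build_pdf(lines):
    """Return the bytes of a one-page PDF showing `lines` in Times-Roman."""
    # Content stream: one text object, stepping down 12 units per line.
    ops = ["BT", "/F1 12 Tf", "70 150 Td"]
    for i, line in enumerate(lines):
        if i:
            ops.append("0 -12 Td")
        # Backslashes and parentheses must be escaped inside a PDF string.
        escaped = line.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        ops.append(f"({escaped}) Tj")
    ops.append("ET")
    stream = ("\n".join(ops) + "\n").encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ] >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 4 0 R >> >>"
        b" /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"endstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_at = len(out)  # byte offset of the xref keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


if __name__ == "__main__":
    with open("helloworld.pdf", "wb") as f:
        f.write(build_pdf(["Hello, world!", "We meet again.", "The end."]))
```

Because the offsets are measured from the bytes actually written, a viewer should not need to repair the file before opening it.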

And if you looked inside another PDF and your conclusion
is "Paul, a PDF doesn't look like this!", of course not.
PDF is available in binary and text format, and this is
a human readable example. What I don't understand about
this sample file is that it's missing a short "binary string"
that has appeared in some other so-called text ones. And the
file still seems to work.

Many modern documents contain "embedded fonts". Which would
ruin a simple example like this. This sample file relies
on the interpreter having a Times-Roman font. If you change
the declaration to ComicSans, the document will likely not
display (ComicSans not a part of a base set of fonts).

Refs:

Sample chit-chat:

https://stackoverflow.com/questions/...rs-for-ios-app

Where I got the sample file as my base file:

https://github.com/mozilla/pdf.js-sa...t_path=d98b4e1

Paul

AnthonyL May 2nd 20 01:05 PM

Creating a pdf file from web pages
 
On Fri, 1 May 2020 17:26:11 +0100, Graeme
wrote:

In message , Andy Burns
writes
AnthonyL wrote:

newshound wrote:

Is this a forum with binaries?
This is NOT a forum.


He wasn't talking about "here", he was talking about A N Other forum.

Funny how we read things. I read 'Is this a forum with binaries?' as
referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a
forum with binaries?', not least because newshound would know that
uk.d-i-y is not a binary group.


Yes, sorry, lost in translation.


--
AnthonyL

Why do scientists need to BELIEVE in anything?


All times are GMT +1.
