Creating a pdf file from web pages
I'm trying to create a pdf file of online forum posts. Using the
browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? Thanks! -- Graeme |
Creating a pdf file from web pages
On Thursday, 30 April 2020 11:27:56 UTC+1, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Does the forum have a separate 'print view' mode? Or could you use the Stylus browser extension to rewrite the page style sheet to change the quoted text to 0pt white? PDFSAM and others will concatenate or merge PDFs into one file https://pdfsam.org/ Owain |
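Before handing 100+ single-page files to PDFsam or a similar merge tool, it may help to sort them into one ordered batch per thread. The sketch below only builds that plan; it does not merge anything itself. The file naming scheme (`thread<NN>_page<N>.pdf`) and the function name `merge_plan` are assumptions made for illustration, not something from the thread.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme for the saved pages: "thread<NN>_page<N>.pdf"
NAME = re.compile(r"thread(\d+)_page(\d+)\.pdf$", re.IGNORECASE)


def merge_plan(filenames):
    """Group per-page PDFs by thread number, with pages in numeric order."""
    threads = defaultdict(list)
    for name in filenames:
        match = NAME.search(name)
        if match is None:
            continue  # ignore files that do not follow the assumed scheme
        thread, page = int(match.group(1)), int(match.group(2))
        threads[thread].append((page, name))
    # Sort numerically so that page 10 lands after page 2, not before it
    return {
        thread: [name for _, name in sorted(pages)]
        for thread, pages in sorted(threads.items())
    }
```

Each list in the returned dictionary is the order in which that thread's pages would be fed to the merge tool, giving one output PDF per thread.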
Creating a pdf file from web pages
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme
wrote: I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? I've given up with "printing" from web pages and copy/paste into a Word Processor then print (to pdf/file/printer). html is no longer html. -- AnthonyL Why do scientists need to BELIEVE in anything? |
Creating a pdf file from web pages
On 30/04/2020 12:22, AnthonyL wrote:
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme wrote: I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? I've given up with "printing" from web pages and copy/paste into a Word Processor then print (to pdf/file/printer). html is no longer html. +1. Is this a forum with binaries? Collecting the text is relatively straightforward. I sometimes use Notepad as a first stage. |
Creating a pdf file from web pages
In message ,
newshound writes On 30/04/2020 12:22, AnthonyL wrote: On Thu, 30 Apr 2020 11:27:47 +0100, Graeme Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? I've given up with "printing" from web pages and copy/paste into a Word Processor then print (to pdf/file/printer). html is no longer html. +1. Is this a forum with binaries? Collecting the text is relatively straightforward. I sometimes use Notepad as a first stage. Thanks for the suggestions. Yes, lots of large, full colour images. The forum software is phpBB, which seems to be popular and works well. I think straight copy/paste of the text is the way forward, with images added where appropriate. HTML is indeed no longer html, at least at my basic level. I note Owain's suggestion to rewrite the style sheet, but that is way beyond my experience. I do, though, like the idea of producing a pdf file for each thread, then concatenating or merging the PDFs into one file. -- Graeme |
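Since the forum runs phpBB, Owain's earlier question about a 'print view' may be worth a second look. As a rough sketch, and assuming the board has not disabled it, phpBB topics usually accept a `view=print` query parameter and a `start` offset for later pages. The base URL, topic id, post count and the 10-posts-per-page figure below are all placeholders; the real values depend on the board.

```python
from urllib.parse import urlencode


def print_view_urls(base, topic_id, total_posts, per_page=10):
    """List one print-view URL per page of a topic (assumed phpBB URL scheme)."""
    urls = []
    for start in range(0, max(total_posts, 1), per_page):
        query = {"t": topic_id, "view": "print"}
        if start:
            query["start"] = start  # offset of the first post on this page
        urls.append(f"{base}?{urlencode(query)}")
    return urls
```

Calling `print_view_urls("https://forum.example/viewtopic.php", 42, 35)` gives four addresses that could be opened in turn and printed to pdf. Whether the print view actually hides the unwanted text on this particular forum would need checking on one page first.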
Creating a pdf file from web pages
On Thursday, 30 April 2020 14:05:24 UTC+1, Graeme wrote:
HTML is indeed no longer html, at least at my basic level. I note Owain's suggestion to rewrite the style sheet, but that is way beyond my experience. You can have a look at userstyles as it may have been done for you already, eg https://userstyles.org/styles/57/sub...eme-beautifier Owain |
Creating a pdf file from web pages
On 30/04/2020 12:22:47, AnthonyL wrote:
On Thu, 30 Apr 2020 11:27:47 +0100, Graeme wrote: I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? I've given up with "printing" from web pages and copy/paste into a Word Processor then print (to pdf/file/printer). html is no longer html. I used to copy and paste into Word, but now print direct to pdf, or save the whole page if appropriate as HTML. For printing and using the web style sheet I use the addon "Print Edit WE". I have not looked back since. You can also de-select areas you don't want to output. I suggest you take a look before giving up. |
Creating a pdf file from web pages
On Thursday, 30 April 2020 11:27:56 UTC+1, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? Thanks! LyX does an amazing job of merging PDFs, but it takes quite some getting your head around. Free - tick. |
Creating a pdf file from web pages
On 30/04/2020 11:27, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? Thanks! Opera browser does a good job of saving web pages to pdf. It may not do the number of pages you want, but might be worth a try. |
Creating a pdf file from web pages
On Thursday, 30 April 2020 11:27:56 UTC+1, Graeme wrote:
I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? You could try pasting the URLs into something like https://pdfcrowd.com/ - having just quickly tried it, it seems to do a reasonable job of 'printing' what you see (although it does add a small banner on the bottom). There are browser extensions to make it a bit more '1-click' too. |
Creating a pdf file from web pages
On Thu, 30 Apr 2020 13:41:15 +0100, newshound
wrote: On 30/04/2020 12:22, AnthonyL wrote: On Thu, 30 Apr 2020 11:27:47 +0100, Graeme wrote: I'm trying to create a pdf file of online forum posts. Using the browser print to pdf option works, but only up to a point in that the saved pdf file includes all the hidden text in the online original rather than just the text and images I see without clicking or hovering. Hope that makes sense. How can I best save what I see? Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files. Any recommended prog or add on I could use? Ideally open source i.e. cheap or free? I've given up with "printing" from web pages and copy/paste into a Word Processor then print (to pdf/file/printer). html is no longer html. +1. Is this a forum with binaries? Collecting the text is relatively straightforward. I sometimes use Notepad as a first stage. This is NOT a forum. -- AnthonyL Why do scientists need to BELIEVE in anything? |
Creating a pdf file from web pages
AnthonyL wrote:
newshound wrote: Is this a forum with binaries? This is NOT a forum. He wasn't talking about "here", he was talking about A N Other forum. |
Creating a pdf file from web pages
In message , Andy Burns
writes AnthonyL wrote: newshound wrote: Is this a forum with binaries? This is NOT a forum. He wasn't talking about "here", he was talking about A N Other forum. Funny how we read things. I read 'Is this a forum with binaries?' as referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a forum with binaries?', not least because newshound would know that uk.d-i-y is not a binary group. Thanks for all the comments. I have transferred the first two forum threads to pdf by copying and pasting the text and images required which gives a clean and satisfactory result, although somewhat laborious. Oh well, perhaps what lockdown was designed for? Cheers, -- Graeme |
Creating a pdf file from web pages
Graeme wrote:
In message , Andy Burns writes AnthonyL wrote: newshound wrote: Is this a forum with binaries? This is NOT a forum. He wasn't talking about "here", he was talking about A N Other forum. Funny how we read things. I read 'Is this a forum with binaries?' as referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a forum with binaries?', not least because newshound would know that uk.d-i-y is not a binary group. Thanks for all the comments. I have transferred the first two forum threads to pdf by copying and pasting the text and images required which gives a clean and satisfactory result, although somewhat laborious. Oh well, perhaps what lockdown was designed for? Cheers,

You can create PDF files by hand, which would bring the question perilously close to the group charter.

The following file can be copied into Notepad and saved as "helloworld.pdf"; the extension may help the file's icon look like an Acrobat Reader icon.

The file is copied off the web, and I messed with it a bit and screwed up the checksums (strictly, the byte offsets in the xref table). I added two sentences, used some matrix operators to step the line beginning for the next line, then corrected the stream length to 112 characters (which includes a line termination character per line).

If you screw up the file enough, Acrobat tries to repair it internally before displaying it. This might cause a 20 second delay until it opens.

----------------- Do not copy this line ------------------
%PDF-1.7

1 0 obj  % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 4 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj  % page content
<<
  /Length 112
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
1 0 0 1 70 40 Tm
(We meet again.) Tj
1 0 0 1 70 30 Tm
(The end.) Tj
ET
endstream
endobj

xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
  /Size 6
  /Root 1 0 R
>>
startxref
492
%%EOF
----------------- Do not copy this line ------------------

It's a gnarly language, and barely feasible as a means for humans to package stuff by hand. Real files have a lot more baggage inside.

And if you looked inside another PDF and your conclusion is "Paul, a PDF doesn't look like this!": of course not. PDF is available in binary and text format, and this is a human readable example.

What I don't understand about this sample file is that it's missing a short "binary string" that has appeared in some other so-called text ones. And the file still seems to work.

Many modern documents contain "embedded fonts", which would ruin a simple example like this. This sample file relies on the interpreter having a Times-Roman font. If you change the declaration to ComicSans, the document will likely not display (ComicSans is not part of the base set of fonts).

Refs:

Sample chit-chat: https://stackoverflow.com/questions/...rs-for-ios-app

Where I got the sample file as my base file: https://github.com/mozilla/pdf.js-sa...t_path=d98b4e1

Paul |
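The two things Paul describes adjusting by hand, the stream length and the offsets in the xref table, are exactly the parts a short script can count for you. What follows is a minimal sketch rather than a proper PDF writer: the function name `build_pdf` is made up for illustration, every line is positioned with a `Tm` matrix (the post uses `TD` for the first line), and it only copes with plain ASCII text in the built-in Times-Roman font. Dictionaries are written with the `<<` and `>>` delimiters that PDF requires.

```python
def build_pdf(lines):
    """Return the bytes of a one-page, text-only PDF showing the given lines."""
    # Content stream: one Tm (text matrix) + Tj (show text) pair per line,
    # stepping 10 units down the page each time.
    ops = ["BT", "/F1 12 Tf"]
    y = 50
    for text in lines:
        # Backslashes and parentheses must be escaped inside a PDF string
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        ops.append(f"1 0 0 1 70 {y} Tm")
        ops.append(f"({escaped}) Tj")
        y -= 10
    ops.append("ET")
    # ASCII only: anything else raises UnicodeEncodeError in this sketch
    stream = ("\n".join(ops) + "\n").encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ] >>",
        b"<< /Type /Page /Parent 2 0 R"
        b" /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        # /Length is counted from the stream itself instead of by hand
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"endstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_at = len(out)  # byte position of the xref table, used by startxref
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

Writing the result of `build_pdf(["Hello, world!", "We meet again.", "The end."])` to a file opened in binary mode ("wb") as "helloworld.pdf" should give a file whose offsets match its contents, so a viewer ought not to need the repair step mentioned above. Binary mode matters: a text editor or text-mode write that changes line endings would shift the offsets again.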
Creating a pdf file from web pages
On Fri, 1 May 2020 17:26:11 +0100, Graeme
wrote: In message , Andy Burns writes AnthonyL wrote: newshound wrote: Is this a forum with binaries? This is NOT a forum. He wasn't talking about "here", he was talking about A N Other forum. Funny how we read things. I read 'Is this a forum with binaries?' as referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a forum with binaries?', not least because newshound would know that uk.d-i-y is not a binary group. Yes, sorry, lost in translation. -- AnthonyL Why do scientists need to BELIEVE in anything? |