Posted to uk.d-i-y
Paul[_46_]
Creating a pdf file from web pages

Graeme wrote:
> In message , Andy Burns
> writes
>> AnthonyL wrote:
>>
>>> newshound wrote:
>>>
>>>> Is this a forum with binaries?
>>> This is NOT a forum.
>>>
>>
>> He wasn't talking about "here", he was talking about A N Other forum.
>
> Funny how we read things. I read 'Is this a forum with binaries?' as
> referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a
> forum with binaries?', not least because newshound would know that
> uk.d-i-y is not a binary group.
>
> Thanks for all the comments. I have transferred the first two forum
> threads to pdf by copying and pasting the text and images required which
> gives a clean and satisfactory result, although somewhat laborious. Oh
> well, perhaps what lockdown was designed for?
>
> Cheers,


You can create PDF files by hand, which would bring
the question perilously close to the group charter.

The following file can be copied into Notepad and
saved as "helloworld.pdf". The extension should also
make the file show up with an Acrobat Reader icon.

The file is copied off the web, and I messed with it
a bit, which screwed up the byte offsets in the xref table
near the end (PDF has no checksums as such; that table records
where each object starts). I added two sentences, used the Tm
(text matrix) operator to step the start of each new line down,
then corrected the stream length to 112 characters (including
one line-termination character per line).
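
Keeping /Length and the xref offsets in step by hand is the fiddly bit. Here is a minimal Python 3 sketch that rebuilds the same five objects and counts the bytes itself, so both come out right every time. It assumes ASCII-only text; the helper names pdf_escape, content_stream and build_pdf are made up for illustration, and unlike the sample it places every line, the first included, with Tm instead of starting with TD.

```python
# Minimal sketch: rebuild the sample's five objects and let Python count the
# bytes, so /Length and the xref offsets are always consistent.
# pdf_escape, content_stream and build_pdf are hypothetical helper names.


def pdf_escape(text):
    # Backslash first, then the parentheses that delimit PDF literal strings
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def content_stream(lines, x=70, y=50, step=10, font="F1", size=12):
    """Text operators placing each line 'step' units below the previous one."""
    ops = ["BT", f"/{font} {size} Tf"]
    for i, text in enumerate(lines):
        # Tm sets the whole text matrix: scale 1, no rotation, origin (x, y)
        ops.append(f"1 0 0 1 {x} {y - i * step} Tm")
        ops.append(f"({pdf_escape(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops).encode("ascii")  # assumption: ASCII text only


def build_pdf(stream):
    """Wrap a content stream in Catalog/Pages/Page/Font objects plus an xref."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ] >>",
        b"<< /Type /Page /Parent 2 0 R"
        b" /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        # /Length counts only the stream data, not the EOL before endstream
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    pdf = build_pdf(content_stream(["Hello, world!", "We meet again.", "The end."]))
    with open("helloworld.pdf", "wb") as f:  # bytes written exactly as counted
        f.write(pdf)
```

The bytes go to disk unchanged, so the counted offsets match the file; re-saving it from an editor that converts line endings (Notepad on Windows uses CR LF) would throw them off again.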

If you screw up the file badly enough, Acrobat tries to repair it
internally before displaying it. That can cause a delay of 20 seconds
or so before it opens.
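
One likely trigger for that repair is an xref table whose offsets no longer match the file: each in-use entry's 10-digit byte offset should land exactly on its "N G obj" line. A rough Python sketch that checks this; check_xref is a hypothetical name, and it assumes a single classic xref table (not the compressed cross-reference streams many PDF 1.5+ files use) with single spaces in "N G obj".

```python
import re
import sys


def check_xref(pdf):
    """List xref entries whose offset does not point at 'N G obj'.

    Rough sketch: assumes one classic xref section with one subsection,
    not a compressed xref stream.
    """
    found = re.findall(rb"startxref\s+(\d+)", pdf)
    if not found:
        return ["no startxref keyword found"]
    start = int(found[-1])
    header = re.match(rb"xref\s+(\d+)\s+(\d+)\s+", pdf[start:])
    if header is None:
        return [f"startxref offset {start} does not point at 'xref'"]
    first, count = int(header.group(1)), int(header.group(2))
    body = pdf[start + header.end():]
    # Each entry: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free)
    entries = re.findall(rb"(\d{10}) (\d{5}) ([nf])", body)[:count]
    problems = []
    for i, (offset, gen, kind) in enumerate(entries):
        if kind != b"n":
            continue  # free entries don't point at an object
        num = first + i
        expected = b"%d %d obj" % (num, int(gen))
        if not pdf[int(offset):].startswith(expected):
            problems.append(
                f"object {num}: offset {int(offset)} does not start "
                f"with '{expected.decode()}'"
            )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for line in check_xref(f.read()) or ["xref offsets look consistent"]:
            print(line)
```

Run it with the PDF's path as the only argument; it prints each mismatch it finds.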

----------------- Do not copy this line ------------------
%PDF-1.7

1 0 obj % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 4 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj % page content
<<
  /Length 112
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
1 0 0 1 70 40 Tm
(We meet again.) Tj
1 0 0 1 70 30 Tm
(The end.) Tj
ET
endstream
endobj

xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
  /Size 6
  /Root 1 0 R
>>
startxref
492
%%EOF
----------------- Do not copy this line ------------------

It's a gnarly language, and barely feasible as a
means for humans to package stuff by hand. Real files
have a lot more baggage inside.

If you've looked inside another PDF and concluded
"Paul, a PDF doesn't look like this!", of course it doesn't.
A PDF can be plain readable text or can contain binary data
(most real files compress their streams), and this is
a human-readable example. What I don't understand about
this sample file is that it's missing the short "binary string"
line that appears near the top of some other so-called text ones,
and yet the file still seems to work.
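
The "binary string" is most likely the comment line many PDFs put straight after the %PDF header, such as %âãÏÓ. The PDF specification asks for that line, with at least four bytes of value 128 or more, when the file contains binary data, so that file-transfer tools sniffing the start of a file treat it as binary rather than text. This sample is pure ASCII, which would explain why it works without one. A small Python sketch, with hypothetical helper names, showing that a Flate (zlib) compressed stream, the usual way real files shrink their content, is binary even when the text inside it isn't:

```python
import zlib


def has_binary_bytes(data):
    """True if any byte is 128 or above, i.e. the data isn't plain 7-bit ASCII."""
    return any(b >= 128 for b in data)


def has_binary_marker(pdf):
    """True if the line after the %PDF header is a comment with >= 4 high bytes."""
    lines = pdf.splitlines()
    if len(lines) < 2 or not lines[1].startswith(b"%"):
        return False
    return sum(1 for b in lines[1] if b >= 128) >= 4


# The sample's first line of text, as it appears in the uncompressed stream
plain = b"BT\n70 50 TD\n/F1 12 Tf\n(Hello, world!) Tj\nET"
packed = zlib.compress(plain)  # what a /Filter /FlateDecode stream would hold

print(has_binary_bytes(plain))   # False
print(has_binary_bytes(packed))  # True: zlib output starts with bytes 0x78 0x9C
print(has_binary_marker(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"))  # True
```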

Many modern documents contain "embedded fonts", which would
ruin a simple example like this. This sample file relies
on the interpreter supplying a Times-Roman font itself. If you change
the declaration to ComicSans, the document will likely not
display as intended, because ComicSans is not one of the base set
of fonts a PDF reader is expected to provide.
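
Times-Roman works because it is one of the "standard 14" fonts (the Times, Helvetica and Courier families, plus Symbol and ZapfDingbats) that a PDF reader is expected to supply without embedding. A rough Python sketch that flags /BaseFont names outside that set; it is only a heuristic, since it does not check whether a non-standard font is actually embedded, and the function name is hypothetical:

```python
import re

# The 14 fonts a PDF reader is expected to supply itself
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def nonstandard_base_fonts(pdf):
    """Return /BaseFont names outside the standard 14, in order of appearance.

    Heuristic only: it does not check whether such a font is embedded.
    """
    found = []
    for raw in re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()%]+)", pdf):
        name = raw.decode("latin-1")
        if name not in STANDARD_14 and name not in found:
            found.append(name)
    return found


print(nonstandard_base_fonts(b"<< /Type /Font /BaseFont /Times-Roman >>"))  # []
print(nonstandard_base_fonts(b"<< /Type /Font /BaseFont /ComicSans >>"))  # ['ComicSans']
```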

Refs:

Sample chit-chat:

https://stackoverflow.com/questions/...rs-for-ios-app

Where I got the sample file as my base file:

https://github.com/mozilla/pdf.js-sa...t_path=d98b4e1

Paul