[gutvol-d] Re: Typesetting ("gods and fighting men")

27 Apr 2010

      ok, back to work...

***

have you ever tried to copy text out of a .pdf?

if you have, you know that it can be frustrating.

because a lot of information seems to get lost.

perhaps the most noticeable loss is the "empty" lines.

if you used a blank line between paragraphs,
all of those blank lines are lost, which means
that your paragraphs are now all run together.

compounding that problem, the "soft" linebreak
at the end of each mid-paragraph line is turned
into a "hard" linebreak.   it's a disaster...

you _can_ choose the "export as plain text" item
from one of the menus, and that does retain the
empty lines.   unfortunately, it also strips styling.

the copy-text route retains the styling, or at least
_some_ of it.   but not all of it.   italics are often lost.
so is the indentation on block-quotes, poetry, etc.

so no matter what you do, getting text from a .pdf
is a struggle.   that's one reason .pdf is called
"the roach motel of documents":  text can
go in, but it cannot come out again.

you can demonstrate this to yourself by using
the .pdf that michael mcd created...

***

i do some things with my .pdf tool to solve this
little problem.   for instance, it doesn't output a
"blank" line when it encounters one.   instead, it
outputs a double-colon -- "::" -- that is white,
thus _invisible._   (or i'll often make it light gray.)

but it's still there, and gets copied out when you
copy the text, so you can then do a global change
of "::" to nil, and voila, you have your blank lines.
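
the global change described above can be sketched in a few
lines of python.   this is only an illustration, not the actual
tool:  the function name is made up, and it assumes that each
invisible marker is copied out on a line of its own.

```python
# hypothetical helper, not part of the actual .pdf tool.
# assumption: every invisible marker lands on a line by itself.
MARKER = "::"

def restore_blank_lines(copied_text, marker=MARKER):
    """change every line that holds only the marker into an empty line."""
    lines = copied_text.split("\n")
    return "\n".join("" if line.strip() == marker else line for line in lines)

sample = "first paragraph,\nsecond line of it.\n::\nsecond paragraph."
print(restore_blank_lines(sample))
```

matching whole lines, rather than every "::" anywhere, leaves
alone any double-colon that happens to sit inside the real text.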

in z.m.l., italics are represented by _underbars_,
so i also have my program output the underbars,
again turning them white so they'll be invisible...

i haven't worked on this for a while, so i cannot
remember what state of success it's in right now,
but my goal is to create "round-tripping", so that
when you use z.m.l. to create a .pdf, the text you
copy out of that .pdf, after a few global changes,
can be used to create that exact same .pdf again.
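
a round-trip check along those lines might look like the sketch
below.   it is an assumption-laden illustration, not the real tool:
the names are invented, the source is assumed to hold one
unwrapped line per paragraph, and it handles plain prose only
(poetry and block-quotes would need their linebreaks kept).
the underbars need no handling, since they are copied out as-is.

```python
# hypothetical sketch; none of these names come from the actual tool.
MARKER = "::"   # the invisible stand-in for a blank line

def recover_source(copied, marker=MARKER):
    """rebuild paragraph text from text copied out of a .pdf."""
    paragraphs = []
    current = []
    for line in copied.split("\n"):
        stripped = line.strip()
        if stripped == marker or stripped == "":
            # a marker (or a real blank line) closes the paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
        else:
            # a "hard" linebreak inside a paragraph becomes a space again
            current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)

def round_trips(source, copied):
    """true if the recovered text matches the original source exactly."""
    return recover_source(copied) == source

source = "this is _one_ paragraph of prose.\n\nthis is another."
copied = "this is _one_\nparagraph of prose.\n::\nthis is another."
print(round_trips(source, copied))
```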

go ahead and copy the text out of one of my .pdfs,
and you'll get a good idea what i'm talking about...

***

the tricks that are built into my tool are ones that
you can do "manually" in your own wordprocessor,
if you'd like to create a "round-trip" capability too.

surround your italicized stuff with white underbars,
change your blank lines to a white double-colon,
and use white periods to create your indentation.
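
for the indentation trick, here is a minimal python sketch of
both directions.   the helper names are invented, and the rule
of one period per space of indentation is an assumption made
just for this example.

```python
# hypothetical helpers; the one-period-per-space rule is an assumption.

def mark_indentation(line):
    """replace leading spaces with periods (to be colored white in the .pdf)."""
    body = line.lstrip(" ")
    return "." * (len(line) - len(body)) + body

def restore_indentation(line):
    """replace the leading periods in copied-out text with spaces again."""
    body = line.lstrip(".")
    return " " * (len(line) - len(body)) + body

verse = "    and the hounds ran before him"
marked = mark_indentation(verse)
print(marked)
print(restore_indentation(marked) == verse)
```

one caveat:  a line that really does begin with a period
(like one starting with ".pdf") would be mistaken for an
indented line, so a real tool would need a less ambiguous rule.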

i did that in the next two .pdfs i will talk about, so
you can copy the text out of 'em to see this at work.

***

i used a text-editor to create two more .pdfs for us
in our experiments using "gods and fighting men".

i unwrapped the text, freeing it from p.g. linebreaks.
then i made the fontsize a more-readable 12-point.
i also put back in a more-spacious 14-point leading.

all these changes pushed the .pdf to some 567 pages,
from the previous 391, so that's an offsetting negative,
but the positive aspect is a much more readable .pdf...

i created a ragged-right version, and a justified one:
...
http://z-m-l.com/misc/14465-rewrapped-rag.pdf
http://z-m-l.com/misc/14465-rewrapped-just.pdf
these two are exactly the same, except for justification,
so you might find it odd that the first is just 1.5 megs,
while the second is almost twice as big, at 2.9 megs...

the reason for this discrepancy is that the ragged-right .pdf
stores the location of each line, rendering it right there,
while the justified .pdf has to store the location of each
_word_, to print it in the right place.   it's a big difference.
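
to get a feel for that difference, the sketch below counts how
many positions would have to be stored under each scheme.   it
is a toy model of the explanation above, with an invented
function name;  it says nothing about actual byte sizes, and
other .pdf generators may well place text differently.

```python
# toy model only: one stored position per line (ragged-right)
# versus one stored position per word (justified).

def count_positions(lines, justified):
    """count the positions a .pdf would store under this simple model."""
    if justified:
        return sum(len(line.split()) for line in lines)
    return sum(1 for line in lines if line.strip())

page = [
    "the quick brown fox",
    "jumps over the lazy dog",
]
print(count_positions(page, justified=False))
print(count_positions(page, justified=True))
```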

at any rate, maybe the mad scientist will look at these
and advise us on what pointsize he'd like to see "final",
what leading he wants, and if he prefers justification...

-bowerbird

Bowerbird＠aol.com