
ok, back to work...

***

have you ever tried to copy text out of a .pdf? if you have, you know that it can be frustrating, because a lot of information seems to get lost.

perhaps the most noticeable loss is the "empty" lines. if you used a blank line between paragraphs, all of those blank lines are lost, which means that your paragraphs are now all run together. compounding that problem is that the "soft" linebreak at the end of mid-paragraph lines is turned into a "hard" linebreak. it's a disaster...

you _can_ choose the "export as plain text" item from one of the menus, and that does retain the empty lines. unfortunately, it also strips styling. the copy-text route retains the styling, or at least _some_ of it. but not all of it. italics are often lost. so is the indentation on block-quotes, poetry, etc.

so no matter what you do, getting text from a .pdf is a struggle. that's one reason why .pdf is called "the roach motel of documents", because text can go in, but it cannot come out again. you can demonstrate this to yourself by using the .pdf that michael mcd created...

***

i do some things with my .pdf tool to solve this little problem.

for instance, it doesn't output a "blank" line when it encounters one. instead, it outputs a double-colon -- "::" -- that is white, thus _invisible._ (or i'll often make it light gray.) but it's still there, and gets copied out when you copy the text, so you can then do a global change of "::" to nil, and voila, you have your blank lines.

in z.m.l., italics are represented by _underbars_, so i also have my program output the underbars, again turning them white so they'll be invisible...

i haven't worked on this for a while, so i cannot remember what state of success it's in right now, but my goal is to create "round-tripping", so that when you use z.m.l. to create a .pdf, the text you copy out of that .pdf, after a few global changes, can be used to create that exact same .pdf again.
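for anyone who wants to try the "few global changes" on copied text, here is a minimal sketch in python. it is not the code from my tool, just an illustration. the function names are made up for this example, and the unwrapping step is an extra assumption: it treats every single linebreak inside a paragraph as a "soft" one, which would be wrong for poetry or anything else where the linebreaks matter.

```python
def restore_blank_lines(copied: str, marker: str = "::") -> str:
    """Turn each line holding only the marker back into an empty line.

    Hypothetical helper: assumes the invisible marker was copied out
    on a line of its own, as described in the post.
    """
    restored = []
    for line in copied.splitlines():
        # a line that is just "::" stands in for a lost blank line
        restored.append("" if line.strip() == marker else line)
    return "\n".join(restored)


def unwrap_paragraphs(text: str) -> str:
    """Rejoin the lines of each paragraph into one long line.

    Assumption (not from the post): paragraphs are separated by one
    blank line, and every linebreak inside a paragraph is a soft wrap.
    Underbars around italics are left untouched.
    """
    paragraphs = text.split("\n\n")
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    copied = "first line of\nparagraph _one._\n::\nsecond paragraph\nhere."
    print(unwrap_paragraphs(restore_blank_lines(copied)))
```

the two steps are kept separate on purpose, so you can restore the blank lines without unwrapping if the linebreaks in your text are meaningful.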
go ahead and copy the text out of one of my .pdfs, and you'll get a good idea what i'm talking about...

***

the tricks that are built into my tool are ones that you can do "manually" in your own wordprocessor, if you'd like to create a "round-trip" capability too. surround your italicized stuff with white underbars, change your blank lines to a white double-colon, and use white periods to create your indentation. i did that in the next two .pdfs i will talk about, so you can copy the text out of 'em to see this at work.

***

i used a text-editor to create two more .pdfs for us in our experiments using "gods and fighting men".

i unwrapped the text, freeing it from p.g. linebreaks. then i made the fontsize a more-readable 12-point. i also put back in a more-spacious 14-point leading.

all these changes pushed the .pdf to some 567 pages, from the previous 391, so that's an offsetting negative, but the positive aspect is a much more readable .pdf...

i created a ragged-right version, and a justified one:
http://z-m-l.com/misc/14465-rewrapped-rag.pdf
http://z-m-l.com/misc/14465-rewrapped-just.pdf
these two are exactly the same, except for justification, so you might find it odd that the first is just 1.5 megs, while the second is almost twice as big, at 2.9 megs...

the reason for this discrepancy is the ragged-right .pdf stores the location of each line, rendering it right there, while the justified .pdf has to store the location of each _word_, to print it in the right place. it's a big difference.

at any rate, maybe the mad scientist will look at these and advise us on what pointsize he'd like to see "final", what leading he wants, and if he prefers justification...

-bowerbird