re: [gutvol-d] roundtripping formatted text through a .pdf

jon gorman said:
> I think bowerbird just claimed the pdf that he exported was easy to import back as pdf (via copying out the text), not necessarily that he converted an existing pdf file. Of course, I'm probably wrong about that. Without capitals I sometimes get lost ;)
i understand... :+)

and what you've said is pretty-much correct. yes, i'm _only_ talking about .pdfs that _my_ viewer-app creates. (if some other program created the .pdf, then blame that program.)

so, you put plain-text into my viewer, and it formats it nicely. you can print that nice formatting to a .pdf (which looks nice). and then you can copy the text out of the .pdf. when you do that, much of the nice formatting has been stripped away, of course, and we're back to plain-text again.

(if i remember correctly, .pdf _does_ retain italicizing, but it _doesn't_ retain bolding. i don't have the faintest idea why, it's kinda weird like that. and it definitely stores the color of the text, which is cute. but it definitely strips the _size_ of the text, which is bad. all of this is in _my_ version of acrobat reader, which is v4. we talk about acrobat/.pdf like it's one straightforward thing, but it's a crazy mish-mash of different-and-changing versions, so all of our discussion needs to be couched in careful clauses.)

but the loss of formatting doesn't matter, because after you have made a few global changes (which, among other things, restore the blank lines between paragraphs that get stripped), you can put the text back into my viewer-program, and it will redo the nice formatting, just like it did it in the first place...

with zen markup, this is all pretty easy to accomplish... :+)
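bowerbird doesn't spell out his "few global changes", so what follows is only a minimal sketch of one of them, assuming a simple heuristic that is not taken from his viewer-program: a line copied out of the .pdf that ends in sentence-final punctuation and is noticeably shorter than the longest line is treated as the last line of a paragraph, and a blank line is put back after it. The function name `restore_paragraph_breaks` and the `short_ratio` threshold are hypothetical.

```python
def restore_paragraph_breaks(text: str, short_ratio: float = 0.8) -> str:
    """Re-insert blank lines between paragraphs in text copied out of a .pdf.

    Hypothetical heuristic (not bowerbird's actual global changes): a line
    that ends in sentence-final punctuation and is shorter than
    short_ratio * (longest line) is taken to be the last line of a paragraph.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        # Ignore closing quotes/brackets when looking for the final punctuation.
        ends_sentence = line.rstrip("\"')").endswith((".", "!", "?"))
        is_short = len(line) < short_ratio * width
        # Don't double up a blank line that survived the copy.
        next_is_blank = i + 1 < len(lines) and lines[i + 1] == ""
        if ends_sentence and is_short and not next_is_blank:
            out.append("")
    # Don't leave a blank line after the final paragraph.
    while out and out[-1] == "":
        out.pop()
    return "\n".join(out)


sample = (
    "This is a long line of text that wraps\n"
    "here and ends the paragraph.\n"
    "Second paragraph starts on this line\n"
    "and ends here."
)
print(restore_paragraph_breaks(sample))
```

Expect both false breaks (a short line that happens to end a sentence mid-paragraph) and missed ones (a paragraph whose last line runs nearly full width), so output from a heuristic like this may still need checking by hand.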
> If I had time, I'd write one by hand for ya that had none of the encoding mess.
i'd love to see that! -bowerbird

Bowerbird@aol.com wrote:
> but the loss of formatting doesn't matter, because after you have made a few global changes (which, among other things, restore the blank lines between paragraphs that get stripped), you can put the text back into my viewer-program, and it will redo the nice formatting, just like it did it in the first place...
It's not round-tripping if you have to hand-tweak the files. Rather than re-apply by hand everything your program fumbled along the way, I'd "round-trip" the pdf thru images and ABBYY FineReader. (That works for *any* pdf.)

What use is this feature anyway, if you just "round-trip" pdfs produced by your program? Then why not keep the zml file around? If you could convert *all* pdf files into zml, that would be something.

Or did you just learn a new buzz-word, "round-trip", and are milking it for what it's worth?

-- Marcello Perathoner webmaster@gutenberg.org