re: [gutvol-d] here's that alice pdf

joey said:
For those who are interested, I use the open-source library "itextsharp" to generate PDFs from an XML master that also provide what Bowerbird calls "clean copied text".
i knew somebody who wasn't thinking was gonna fall into that exact trap, and -- sure enough -- it took joey 8 minutes.
joey. maybe. you. should. reflect. on. things. for. a. few. minutes. more. just. a. suggestion. how long do you think it took me to _write_ that 55k thesis? let me assure you that it was more than 8 minutes. a lot more. so hey, next time, spend _at_least_ as much time ruminating about what i wrote as i spent writing it. ok? it'll keep you from falling into obvious pits like this.
you might be able to get "clean text" out of a .pdf. in and of itself, that's not all that difficult to achieve. after all, all you have to do is create some dummy lines. now that i've made it clear how to work around that flaw, and the "itextsharp" people have made it clear as well, i expect that everyone will be able to copy out clean text. if you cannot, even though the "solution" is known to all, then your app is particularly brain-dead, we would assess...
but "clean text" is _not_ the main achievement here. oh sure, it's nice and all, especially considering the pain that repurposing dirty text has imposed on people so far. but a more _important_ aspect of "round-tripping" is that the end-user is getting the _master_ copy from select-all.
i highlit that, but you weren't quick enough to get it: when _you_ select-all and copy out of a .pdf, do you get back _your_ "master" -- i.e, the original .tei file?
um, no, you don't. you might get out clean text. but you don't get out your "master". not even close. so you'd have to re-apply all your markup to that text.
that reapplication will take more than my 2 minutes...
but me? when i get out my clean text, i am getting back _my_master_. and if the clean-up is automatic? think about that. no need to "re-apply" a darn thing. it's ready to go. ready to go right into the zml-viewer. where -- just like in the earlier round -- formatting will be auto-applied to it, based on its structure, to make it pretty...
that's round-tripping. power in the hands of the end-user. in fact, you could call it "power tripping" for short. really!
so you weren't paying enough attention, joey.
hey wait! you're not the "joey" from that "friends" show, are you? because if you are, then i know that your "dense" mental capacities are just an act you use to get the chicks.
so tell me, are you _that_ joey. i might live near hollywood, but -- to tell the truth -- i only know 10 (or so) celebrities... but if you're _that_ joey, i'd peg my new number at 11...
-bowerbird

Bowerbird@aol.com wrote:
i highlit that, but you weren't quick enough to get it: when _you_ select-all and copy out of a .pdf, do you get back _your_ "master" -- i.e, the original .tei file?
You don't. You don't need to. You can download the master from PG. alice.tei is 180k, so on a 1 Mbit DSL line the download will take about 1.4 seconds.
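A quick check of that figure, as a minimal sketch. It assumes "180k" means 180 kilobytes of 1000 bytes each and that the line delivers its full nominal 1 Mbit/s with no protocol overhead; the function name `download_seconds` is made up for illustration.

```python
def download_seconds(size_kilobytes: float, link_mbit_per_s: float) -> float:
    """Idealized transfer time: file size in bits divided by link speed in bits/s."""
    size_bits = size_kilobytes * 1000 * 8            # assumption: 1 kB = 1000 bytes
    link_bits_per_s = link_mbit_per_s * 1_000_000    # assumption: full nominal rate
    return size_bits / link_bits_per_s


# 180 kB over a 1 Mbit/s line
print(f"{download_seconds(180, 1.0):.2f} s")
```

Under those assumptions the result is 1.44 seconds, in line with the figure quoted above; a real connection would be somewhat slower.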
that reapplication will take more than my 2 minutes...
If I do a "select all" in Acrobat and then paste it into OpenOffice, all I get is a jumble of words without any formatting at all. I don't see any line breaks. I don't see any italics, nor underscores in place of italics. I don't see any pictures. A run-of-the-mill user will not even try to do anything with this jumble of words spat across his screen.
that's round-tripping. power in the hands of the end-user. in fact, you could call it "power tripping" for short. really!
Or "power-cord tripping" for a better mental image.
--
Marcello Perathoner
webmaster@gutenberg.org

--- Bowerbird@aol.com wrote:
hey wait! you're not the "joey" from that "friends" show, are you? because if you are, then i know that your "dense" mental capacities are just an act you use to get the chicks.
If this sort of baseless personal stabbing is going to be accepted here, I'm going to just stick to DP, which is a shame since I like having exposure to the broader PG community. (Speaking of which, Jon, still working on the Sutra... every time I near completion, I come up with some other idea... it's been a fantastic project for thinking about improving images.)
The reason I say so instead of just unsubscribing is that I know many others have left this list for the same reason. This nonsense has dominated the list for a while.

On Wed, Sep 28, 2005 at 09:54:50AM -0400, Bowerbird@aol.com wrote:
joey said:
For those who are interested, I use the open-source library "itextsharp" to generate PDFs from an XML master that also provide what Bowerbird calls "clean copied text".
i knew somebody who wasn't thinking was gonna fall into that exact trap, and -- sure enough -- it took joey 8 minutes.
Bowerbird: Please read more carefully. My email was not addressed to you. It was addressed to people who are looking for a way to get "clean copied text" from an XML master. I didn't mention round-tripping because that's not something I have the least bit of interest in.
participants (4): Bowerbird@aol.com, joey, Jon Niehof, Marcello Perathoner