
jon gorman said:
Just wanted to clarify things.
that's good. i like clarification... :+)
I'm skeptical about bowerbird's claims as well
that's good. i like skeptics... ;+) but the proof is in the pudding, jon, the proof is in the pudding...
but it's misleading to say that Acrobat doesn't store text in the document.
i believe, like you, that that would be a misleading statement.
It is possible to make the text rather obscure
well, as i said, one _can_ make it rather totally "obscure" by converting it to graphic format before writing it to the .pdf. in that case, the user cannot copy out the text -- as text -- to the clipboard. such "text" is not found by "find" either.

(here i'm largely speaking, of course, as a _programmer_ who is actually outputting the content to the .pdf driver. most people creating a .pdf don't have that luxury, in that they're stuck with whatever their authoring tool might do. as a sidebar here, i will note that the problems involved in copying text from a .pdf are well-known and long-standing, so they _should_ have been addressed by the programmers of common authoring tools, like word-processors, by this time. in programming my tool, i have sought to empower my users, including in this arena of round-tripping text put into a .pdf.)
but that doesn't mean that if formatted correctly you could not scan through the file in a text editor and
read it. Granted, it's rarely done, but doesn't mean it's impossible.
well, i believe your statement is misleading as well, jon... (and if you're striving to "clarify" things, you really should try something to see if you _can_ do it before you _say_ you can...)

load a .pdf into an editor; you won't find much (if any) text qua text, not in a recognizable form you can easily copy out to the clipboard. (it's not _impossible_ you will find some text, depending upon how the .pdf was created, since there is text in some .ps files. but it's never a long unbroken stretch before it is interrupted by postscript commands, so this approach is doomed to failure.) so one shouldn't expect to find text -- stored as text -- in a .pdf, not in the traditional sense. (however, see the p.s. on this post.)

nonetheless, if the text wasn't stored in the .pdf in _some_ way, users wouldn't be able to copy it out to the clipboard, would they? and acrobat wouldn't be able to do "find" operations on it, would it? (notably, though, you'll discover that acrobat's "find" capabilities don't extend to whitespace. for instance, you can't do a search for two spaces, even if there were such instances in the original file.)

-bowerbird

p.s. it might be possible to store text in the comments of a .pdf, i'm not sure. if you could, then that _might_ be interesting to do. (i will explore the possibility, especially when my app starts to create .pdfs directly without running them through a .pdf driver.) with such storage, one wouldn't need to pull the .pdf into acrobat in order to retrieve the text from it, which might be a capability that some people would find useful. (it would also allow ordinary search programs to search the .pdf.) but that's just gravy to me; as long as users can "roundtrip" text out of a .pdf, my goal is met. once people get used to my viewer, they won't even _want_ .pdfs.
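
for anyone who wants to poke at the "load a .pdf into an editor" point, here is a minimal sketch in python. it is not how acrobat or any particular driver works. it assumes the writer deflates the page content stream (the /FlateDecode filter) and shows each string with the Tj operator, which is a common setup but not the only one. the stream content, the fragment, and the names `found_by_plain_search` and `shown_strings` are all made up for illustration, and the fragment is not a complete, valid .pdf. the `%` line shows the comment idea from the p.s.; whether a given tool keeps such comments when it rewrites a file is not something this sketch can tell you.

```python
import re
import zlib

# a tiny, made-up page content stream. BT/ET bracket a text object,
# Tf picks a font, Td moves the text position, Tj shows a string.
CONTENT = (
    b"BT\n"
    b"/F1 12 Tf\n"
    b"72 720 Td\n"
    b"(the proof is in the pudding, jon,) Tj\n"
    b"0 -14 Td\n"
    b"(the proof is in the pudding...) Tj\n"
    b"ET\n"
)

# assumption: the writer deflates the stream (the /FlateDecode filter),
# which python's zlib module can produce and undo.
STORED = zlib.compress(CONTENT)

# hypothetical fragment of a file body: one comment line holding plain
# text, followed by the compressed stream bytes. not a complete .pdf.
COMMENT = b"% text: the proof is in the pudding\n"
FRAGMENT = COMMENT + b"stream\n" + STORED + b"\nendstream\n"


def found_by_plain_search(raw: bytes, phrase: bytes) -> bool:
    """what an editor or grep-style tool does: look for the raw bytes."""
    return phrase in raw


def shown_strings(stored: bytes) -> list:
    """inflate the stream and pull out the operands of the Tj operator.
    (ignores escapes, nested parentheses, TJ arrays and font encodings.)"""
    plain = zlib.decompress(stored)
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", plain)]


if __name__ == "__main__":
    print(found_by_plain_search(STORED, b"pudding"))    # compressed stream only
    print(found_by_plain_search(FRAGMENT, b"pudding"))  # comment line included
    print(shown_strings(STORED))
```

the first search looks only at the deflated bytes, the second also sees the comment line, and the last call recovers the strings by inflating first. so under these assumptions the text is stored in the file, just not in a form a plain editor or search tool will show you, unless something like the comment line is there too.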