
David A. Desrosiers wrote:
I just ran strings(1) across about 40 of the PDFs I have here, from various clients, online resources, and PDFs I've created in Windows and with OpenOffice.org, and not a single one contained any readable strings that are actually in the _content_ of the documents themselves, other than the strings that make up URLs embedded in the document.
So where is the text of the document stored? If it's somewhere in here, why is it obfuscated by default in every single PDF I have?
The document content itself is most definitely NOT stored as "plain text" in the PDF documents I have here, which is a pretty broad sample set.
A PDF is a chunked file format, and each chunk (a stream object) can be compressed or even encrypted. A run-of-the-mill PDF almost always has at least its content streams compressed, which is why strings(1) finds nothing readable in them. If you create your own PDF with pdftex you can set the compression level to 0 (\pdfcompresslevel=0) and lo! the text magically appears inside the PDF.
-- Marcello Perathoner webmaster@gutenberg.org
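
To make the point about compressed chunks concrete, here is a minimal Python sketch. It is not a real PDF parser: it assumes the streams use the common Flate (zlib) filter, it finds them with a crude regular expression, and the names inflate_streams and make_sample_object are made up for this illustration. The sample object is built by hand so the example runs without any PDF file on disk.

```python
import re
import zlib

# Crude match for the bytes between the "stream" and "endstream" keywords.
# A real parser would read the /Length entry instead of searching like this.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed body of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            results.append(zlib.decompress(raw))
        except zlib.error:
            # Uncompressed, encrypted, or using another filter: skip it.
            continue
    return results


def make_sample_object(text, compress=True):
    """Build one hand-made PDF-style stream object (hypothetical test data)."""
    content = b"BT /F1 12 Tf (" + text + b") Tj ET"
    body = zlib.compress(content) if compress else content
    filt = b" /Filter /FlateDecode" if compress else b""
    header = b"1 0 obj\n<< /Length " + str(len(body)).encode() + filt + b" >>\n"
    return header + b"stream\n" + body + b"\nendstream\nendobj\n"


message = b"Hello from the page content"
compressed = make_sample_object(message)
plain = make_sample_object(message, compress=False)

# What strings(1) would see: the text is only visible in the uncompressed object.
print(message in compressed)
print(message in plain)

# After inflating the stream, the text-showing operator is readable again.
print(inflate_streams(compressed))
```

Run against the bytes of a real file (for example open("some.pdf", "rb").read(), where the file name is a placeholder), the same function would only recover Flate-compressed, unencrypted streams, and the text inside may still be encoded through the font's own character mapping.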