
At 07:18 PM 16/06/05 +0200, you wrote:
David A. Desrosiers wrote:
Acrobat doesn't store text in PDFs; it stores pixels and vectors and OCR'd coordinates. Most definitely not text.
You are most definitely wrong there. How else would the "find" function work?
[snip] And fonts are embedded in a PDF file!
You see that all the text is there. Spaces are simulated by horizontal movement, and so is kerning. It would not be too difficult to write a Perl script to recover the text from the PDF.
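To make that concrete, here is a rough sketch of the idea, in Python rather than Perl. It assumes an already-decompressed page content stream that uses only simple literal strings with the Tj and TJ text-showing operators. The function name extract_text and the SPACE_GAP threshold are made up for illustration; the threshold in particular is a guess, not a value from the PDF specification.

```python
import re

# Assumption: a gap of 200/1000 text-space units or more is treated as a word space.
SPACE_GAP = 200

STRING = r"\((?:\\.|[^\\()])*\)"   # literal string, no nested unescaped parentheses
NUMBER = r"[-+]?\d*\.?\d+"         # kerning adjustment inside a TJ array

# Either "(text) Tj" or "[(te) -20 (xt) ...] TJ".
SHOW = re.compile(
    rf"({STRING})\s*Tj|\[((?:\s*(?:{STRING}|{NUMBER}))*\s*)\]\s*TJ", re.S
)
TOKEN = re.compile(rf"{STRING}|{NUMBER}")


def unescape(literal: str) -> str:
    """Strip the outer parentheses and undo \\( \\) \\\\ escapes only."""
    return re.sub(r"\\([()\\])", r"\1", literal[1:-1])


def extract_text(stream: str) -> str:
    """Recover the shown text from a decompressed content stream (sketch)."""
    out = []
    for match in SHOW.finditer(stream):
        if match.group(1) is not None:       # (text) Tj
            out.append(unescape(match.group(1)))
            continue
        for tok in TOKEN.finditer(match.group(2)):   # [...] TJ
            item = tok.group(0)
            if item.startswith("("):
                out.append(unescape(item))
            elif float(item) <= -SPACE_GAP:
                # A large negative adjustment moves the next glyph to the
                # right, which is how a space is often simulated.
                out.append(" ")
    return "".join(out)


sample = "BT /F1 12 Tf 72 700 Td [(He) 15 (llo,) -300 (w) 20 (orld)] TJ ( again) Tj ET"
print(extract_text(sample))
```

Real files are messier than this: content streams are usually compressed (for example with FlateDecode), strings may be hex-encoded, and embedded fonts can use custom encodings, so a real script would need to handle those first.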
Or, if you have the full Adobe Acrobat programme, you can simply export to an RTF file. I did that sort of thing at work for three years. You may have to do some formatting to pretty it up, but it's definitely text.

JHowse

================================================================================
"I'm not likely to write a great novel or compose a song or save a baby from a burning building...but I can help make sure that there is an electronic library of free knowledge available for future people to access."--jhutch.
Preserving History One Page at a Time!!
Celebrating our 6750th book posted to Project Gutenberg
Join Project Gutenberg's Distributed Proofreaders
http://www.pgdp.net/c/
================================================================================