Re: [gutvol-d] roundtripping formatted text through a .pdf

16 Jun 2005

At 07:18 PM 16/06/05 +0200, you wrote:
...
David A. Desrosiers wrote:
...
Acrobat doesn't store text in PDFs; it stores pixels and vectors and
OCR'd coordinates. Most definitely not text.
You are most definitely wrong there. How else would the "find" function
work?
[snip]

And fonts are embedded in a PDF file!
...
You see that all the text is there. Spaces, and kerning as well, are
simulated by horizontal movement. It would not be too difficult to write a
Perl script to recover the text from the PDF.
Or, if you have the full Adobe Acrobat programme, you can simply export to
an RTF file. I did that sort of thing at work for three years. You may have
to do some formatting to pretty it up, but it's definitely text.
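
To illustrate the point about spaces being simulated by horizontal movement,
here is a rough sketch in Python (not the Perl script suggested above, and
nowhere near a real PDF parser). It assumes you already have an uncompressed
page content stream as a string, that the text uses plain literal strings
with a simple font encoding, and that any TJ adjustment more negative than
-200 is a word gap. The names extract_text and SPACE_THRESHOLD and the
threshold value itself are my own assumptions, not anything defined by
Acrobat or the PDF format.

```python
import re

# Assumed heuristic: inside a TJ array, numbers are in thousandths of a
# text-space unit and negative values move the next glyph to the right.
# Anything more negative than this is treated as a word gap, not kerning.
SPACE_THRESHOLD = -200

# One element of a TJ array: a literal string "(...)" or a number.
_TJ_ITEM = re.compile(
    r"\((?P<text>(?:\\.|[^\\()])*)\)|(?P<num>-?\d+(?:\.\d+)?)"
)

# Either "[ ... ] TJ" (strings mixed with adjustments) or "( ... ) Tj".
# Simplification: a "]" inside a string would confuse the array pattern.
_SHOW = re.compile(
    r"\[(?P<array>(?:\\.|[^\]\\])*)\]\s*TJ"
    r"|\((?P<simple>(?:\\.|[^\\()])*)\)\s*Tj"
)


def _unescape(s):
    # Only the escapes for parentheses and backslash are handled here.
    return re.sub(r"\\([()\\])", r"\1", s)


def extract_text(stream):
    """Recover text from the Tj/TJ operators of an uncompressed stream."""
    lines = []
    for m in _SHOW.finditer(stream):
        if m.group("simple") is not None:
            lines.append(_unescape(m.group("simple")))
            continue
        parts = []
        for item in _TJ_ITEM.finditer(m.group("array")):
            if item.group("text") is not None:
                parts.append(_unescape(item.group("text")))
            elif float(item.group("num")) < SPACE_THRESHOLD:
                # A large move to the right stands in for a space.
                parts.append(" ")
        lines.append("".join(parts))
    return "\n".join(lines)
```

Fed a line such as
[(Y) 80 (ou see that) -278 (all the) -278 (te) 15 (xt is there.)] TJ
it should treat the small numbers as kerning and the two -278 moves as
spaces. A real file would also need its streams decompressed and its font
encodings mapped back to characters, which this sketch does not attempt.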

JHowse

================================================================================
"I'm not likely to write a great novel or compose a song or save a baby from
a burning building...but I can help make sure that there is an electronic
library of free knowledge available for future people to access."--jhutch.
Preserving History One Page at a Time!!
Celebrating our 6750th book posted to Project Gutenberg
Join Project Gutenberg's Distributed Proofreaders http://www.pgdp.net/c/
================================================================================