
"Norm" == Norm Wolcott <nwolcott2ster@gmail.com> writes:
Norm> The gallica pdf's are very low resolution mostly. Where
Norm> there are diagrams they hardly come out at all, especially
Norm> mathematical ones with small letters on them. It may be
Norm> helpful to have a copy of the book nearby. OCR'ing pdf's is
Norm> not for the faint hearted, as they are not designed for this
Norm> purpose. However they are good for layout of the original
Norm> publications and for copyright use as the date of
Norm> publication is usually given. Also shows the title page
Norm> often omitted from other pdf files.

But why do you download pdf from gallica? For OCR you should download
tiff, which is perfectly suited and does not pose conversion problems.

The gallica pdf is just a wrapper for the tiff files (compare a
gallica pdf with a gallica tiff: the tiff is contained in full in the
pdf, with some extra wrapper for every page).

For example FineReader, if you feed it a pdf, passes it through
ghostscript, essentially "printing" the pdf and converting the
resulting bitmap; if you choose the wrong dpi while converting, you
lose resolution. If you feed it a tiff, it uses the file directly
(tiff is the internal image format in FineReader).

The gallica pdf is OK if you want to read (but a multipage tiff viewer
is even better). But not for OCR. You cannot blame gallica if you
cannot tick the correct box when you download.

Carlo Traverso