Wallace wrote:
Jon Noring wrote:
A lot of books we won't be able to get choppable copies of, and in a lot of cases won't even need to: I think a key priority should be to beg, bum, borrow, or steal microform scanning capability, and start working our way through the CIHM back-catalogue, supplemented with proofreading of the main existing Canadian image-libraries (ECO, BNQ, ourroots, etc.)
Certainly, other sources are welcome, so long as the quality of the page images meets minimal standards (to be established by the project) and the works, when public domain, are fully unencumbered, so the page images may be freely placed online. DP, as an example, sometimes gets scans from institutions under agreements which prevent DP from placing the scans publicly online. I've voiced my displeasure at this. Except in the most extraordinary circumstances, where it is otherwise impossible to ever find an unencumbered scan of a particular work (and this is rare), PGCan should avoid such restrictions as much as possible, and actively work to acquire only unencumbered scans of public domain works. I'd even refuse offers of encumbered scans of PD works, in the hope that once PGCan gets big enough, it can go back to the institution and get the scans with no encumbrances.

This brings up a new talking point: the requirements for scans. I believe the scans are just as important as the structured digital text (SDT) to be produced from them, and should be made public and referenced from the final SDT. This means the master scans need a minimum resolution and color depth suitable not only for OCR purposes, but also for scholarly-quality online reading (and other purposes). My current view, subject to change, is that all master scans of B&W pages should be done at 600 dpi (optical) in grey-scale. Current work I'm doing on a project indicates that 300 dpi is indeed insufficient for comfortable online viewing, especially for texts which have a lot of fine print. If 300 dpi is decided on anyway, then grey-scale is a *must*. But 600 dpi grey-scale is better (in a few rare cases it may be wise to go even higher -- and of course color originals require full-color scans). The downside, of course, is that the file sizes for the scans are much larger.
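The file-size concern is easy to quantify. Here is a rough back-of-the-envelope sketch, assuming an 8.5" x 11" page stored as uncompressed 8-bit grey-scale; real master files in a lossless format such as TIFF with LZW or Deflate compression would come in somewhat smaller:

```python
# Rough uncompressed sizes for an 8.5" x 11" page scanned in 8-bit grey-scale.
# Illustrative arithmetic only; actual TIFF files add headers and usually
# apply lossless compression, shrinking these figures.

def raw_grayscale_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed size in bytes of a grey-scale scan at a given resolution."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

for dpi in (300, 600):
    mb = raw_grayscale_bytes(8.5, 11, dpi) / (1024 ** 2)
    print(f"{dpi} dpi: ~{mb:.1f} MB per page")
# 300 dpi: ~8.0 MB per page
# 600 dpi: ~32.1 MB per page
```

Doubling the resolution quadruples the pixel count, so a 600 dpi master is roughly four times the size of a 300 dpi one.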
One could compress them using DjVu (which is impressive), but I still believe the original scans should be preserved in their original lossless form. Do not produce JPEGs as part of the scan-acquisition process (a concern, as IA is experimenting with using digital cameras for scanning).
We can further distribute that task through my "cells" idea, which takes the "team" concept over at PGDP one step further.
Cells would be groups that would work more closely together to collect works in a geographical area (the Halifax cell), from a given library (the Acadia University cell), or in a given field of interest (the Canadian Incunabula cell; the LOTE cell; the Genealogy cell; etc.)
Good idea. I'm definitely all for encouraging/catalyzing special-interest groups to digitally scan works of interest to them. Such groups usually bring in enthusiastic volunteers who will not only help to scan the works, but will help in the proofing process to produce SDT. Local historical societies, and genealogy groups (both by family surname and by locality), are the notable groups which come to mind. A startup project I'm working with, LibraryCity, has planned for a while to mobilize these local special interest groups to digitize their holdings and to get them online. LC plans to focus on the usability and enhanceability of the final digital products. Blogs, annotation, and collection interlinking are major features of the LC focus.
They would bootstrap themselves into existence, both on our site, and through outside contacts, and make conscious efforts to assimilate everything and anything that interests them and is clearable. Think LDS genealogists meet the Borg.
Laugh. I live in Salt Lake City, and am an avid amateur genealogist. I often do research in the Family History Library downtown. (I am NOT LDS.)
The second group, cataloging/copyright clearance, will take the scans which have been done and put together MARC (or equivalent) records for the works (much of the data can be taken from other libraries). In addition, the group can research the copyright status of the works, for which the cataloging information is of course important. And finally, this group can look over the scans to determine whether any pages are missing or badly scanned (a sort of QC function).
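The missing-page part of that QC function is mechanical enough to automate. A minimal sketch, assuming scan files are named with a zero-padded page number (the `pageNNN` filename pattern is an assumption, not an established PGCan convention):

```python
# Sketch of the "missing pages" QC check: given the scan filenames for a
# work, flag gaps in the page-number sequence. The pageNNN naming pattern
# is assumed for illustration.
import re

def missing_pages(filenames):
    """Return sorted page numbers absent from an otherwise contiguous scan set."""
    numbers = sorted(int(m.group(1))
                     for f in filenames
                     if (m := re.search(r"page(\d+)", f)))
    if not numbers:
        return []
    expected = set(range(numbers[0], numbers[-1] + 1))
    return sorted(expected - set(numbers))

print(missing_pages(["page001.tif", "page002.tif", "page004.tif"]))
# -> [3]
```

Badly scanned pages (skew, blur, cut-off margins) still need human eyes, but a gap check like this catches the most common acquisition error cheaply.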
Again, provisional publication of the scans could help accelerate and distribute that process.
Certainly. Placing the scans online (which I assume is what you mean by "publication") certainly *requires* that cataloging records first be generated from them, as well as copyright clearance. It is my belief that whoever accesses the PGCan repository of finished works should be able to push a button and get the catalog record in the format of interest to them, such as MARC-XML. (Lars Aronsson talked about using FRBR -- I don't know much about that.)

Jon Noring
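To make the push-button idea concrete, here is a minimal sketch of generating a MARC-XML record with the Python standard library. The field tags (100, 245, 260) and the MARC21/slim namespace are real, but the record content, the `marc_record` helper, and its parameters are invented for illustration:

```python
# Minimal sketch of push-button MARC-XML export for a cataloged work.
# Field tags 100 (author), 245 (title), 260 (publication) and the
# MARC21/slim namespace are real; the record content is hypothetical.
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

def marc_record(title, author, place, publisher, year):
    record = ET.Element(f"{{{MARC_NS}}}record")
    def datafield(tag, **subfields):
        df = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                           tag=tag, ind1=" ", ind2=" ")
        for code, value in subfields.items():
            sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", code=code)
            sf.text = value

    datafield("100", a=author)                      # main entry: personal name
    datafield("245", a=title, c=author)             # title statement
    datafield("260", a=place, b=publisher, c=year)  # publication information
    return record

rec = marc_record("A Hypothetical Canadian Work", "Doe, Jane",
                  "Halifax", "Example Press", "1890")
print(ET.tostring(rec, encoding="unicode"))
```

In practice the data would be pulled from the cataloging group's records rather than passed in by hand, and other serializations (Dublin Core, FRBR-based formats) could be emitted from the same underlying record.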