
here's one from last week that never got mailed out... i'll be leaving here again very shortly, since i have been reminded just why i had stayed away, because this place can be so negative and destructive and poisonous... ick! *** jon, you said the scanning took "much more than four hours". so how long _did_ it take? and if you were to do it again, with your present scanner, how long would it take you? also, how long did it take you to manipulate the images? and how did you do that? what specific steps did you take, in what order, and what program did you use to do all that? is there anything of all that which you'd do differently now? *** jon said:
OCR is quite fast. It's making and cleaning up the scans which is the human and CPU intensive part.
well, it all depends, jon, it all depends... with the right hardware -- like office-level machinery -- 60 pages a minute can get swallowed by the gaping maw. that's right. one page per second. that seems fast to me. that means your 450-page scan-job would take 7.5 minutes. probably took you more time than that to cut the cover off. and the machine will automatically straighten those pages, o.c.r. them, and upload them to the net, while you stare dumbfounded... likewise with the kirtas 1200, geared to scanning books. http://www.kirtas-tech.com/ it does "only" 20 pages a minute, but hey, 1200 pages/hour ain't nothing to sneeze at. they estimate that in a full-scale production environment, the price-per-scan is 3 cents a page. sounds like brewster should buy a half-dozen of these babies. so it all depends. the bottom line, though, is that if a person has experience, good equipment, solid software, and a concentrated focus, they can open a paper-book to start scanning it and move it all the way through to a finished, high-power, full-on e-book in one evening, maybe two. *** i said:
third, you used a reasonable naming-scheme for your image-files! the scan for page 3, for instance, is named 003.png! fantastic! and when you had a blank page, your image-file says "blank page"! please pardon me for making a big deal out of something so trivial -- and i'm sure some lurkers wrongly think i'm being sarcastic -- but most people have no idea how uncommon this common sense is! when you're working with hundreds of files, it _really_ helps you if you _know_ that 183.png is the image of page 183. immensely. even the people over at distributed proofreaders, in spite of their immense experience, haven't learned this first-grade lesson yet.
i forgot to mention earlier that my processing tool can automatically rename your image and text-files, based on the page-numbers that it finds right in the text-files (which it extends in sequence for those files without a page-number -- usually the section-heading pages). so even if you're dealing with someone else's scans, and _they_ didn't name their files wisely, you don't have to deal with the consequences. *** jon said:
I believe as you do that an error reporting system is a good idea so readers may submit errors they find in the texts they use -- sort of an ongoing post-DP proofing process.
i didn't elaborate earlier that it goes much deeper than that. a very important point here is that an error-reporting system -- over and above the obvious effect of getting errors fixed -- will actively incorporate readers into the entire infrastructure, making them active participants cumulating a world of e-books. if you have ever edited a page on a wiki, you're likely aware that the experience gives a very strong feeling of _empowerment_ -- because you can "leave your mark" right on a page, quite literally. if we set up a wiki-page to collect the error-reports for an e-text, in a system allowing people to check the text against a page-image, they'll be much more motivated to report errors than they are now, with the "send an e-mail" system. the feedback is more immediate, and compelling, with a wiki. furthermore, by collecting the reports, in the change-log right on the wiki, you can avoid duplicate reports. you can also give a rationale for rejecting any submitted error-reports, and/or engage people in a discussion about whether to act on a report. all of this makes your readers feel _responsible_ for the e-texts. a lifetime of experience with printed matter has made people very _passive_ about typographic errors. there's no reason to "report" an error they find in a newspaper, for instance, because hey, it's already been printed. the same with a magazine or a printed book. water under the bridge. and they translate that same attitude over to e-books, even though it _does_ do good to report errors there. so we need to do something to shake them out of their passivity, something to make them feel _responsible_ for helping fix errors. (just for the record, although i use the term "wiki", i don't mean it literally. what i have in mind is more of a "guestbook" type method, where people can _add_ their text to the page, but not necessarily _delete_ what other people have added.
it's thus more like a blog, where everyone can add their comments to the bottom of the page, but the top part stays constant, to list the "official" information. but i'll still use the term "wiki" to connote a free-flowing attitude.) in addition to the wiki, you can build an error-reporting capability into the viewer-program that you give people to display the e-texts. if they doubt something in the e-text, they click a button and boom!, that page-image is downloaded into the program so they can see it. if they have indeed found an error, they copy the line in its bad form, correct it to its good form, and then click another button and boom!, the error-report is e-mailed right off to the proper e-mail address. this symbolic (and real!) incorporation of readers into our processes is a rad thing to do. but it's not the _only_ benefit of such a system; it also facilitates the automation of the error-correction procedures. the error-report can be formatted such that your software can automatically summon the e-text _and_ the relevant page-scan. so you see a screen with the page-scan _and_ the error-report. you check its merit, and if it's good, click the "approve" button and the e-text is automatically edited. further, the change-log is updated right on the wiki-page for that e-text, and anyone who requested error-notification gets an e-mail describing the change. auxiliary versions of the e-text -- like the .html and .pdf files -- are automatically updated. and all you did was click one button... face it, if you're dealing with 15,000+ e-texts, doing it manually is a sure-fire way to burn yourself out. who needs that hassle? i mocked up a demo of this, using a simple a.o.l. guestbook script. i'm sure you versatile script-kiddies here could do something that was much more sophisticated, but my version will give you the idea: http://users.aol.com/bowerbird/proof_wiki.html -bowerbird
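
p.s. for what it's worth, here is a rough python sketch of the "approve" step described above. it is not code from any real tool. the ErrorReport class, its fields (etext_id, page, bad_line, good_line), and the apply_report function are all hypothetical names made up for illustration. the only ideas taken from the description are that a report carries the line in its bad form and its good form, and that approving it edits the e-text and adds an entry to the change-log. the rule that the bad line must match exactly once is an extra assumption, added so a stale or duplicate report can't quietly edit the wrong spot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorReport:
    # all field names are hypothetical, chosen only for this sketch
    etext_id: str   # which e-text the report is about
    page: int       # page-number, so the matching page-scan can be summoned
    bad_line: str   # the line as it currently appears in the e-text
    good_line: str  # the line as the reader believes it should read


def apply_report(text: str, report: ErrorReport, changelog: List[str]) -> str:
    """apply an approved report to the e-text and record it in the change-log.

    raises ValueError unless the bad line occurs exactly once in the text
    (an assumption of this sketch, not something the description requires).
    """
    lines = text.split("\n")
    hits = [i for i, line in enumerate(lines) if line == report.bad_line]
    if len(hits) != 1:
        raise ValueError(
            "expected exactly 1 match for the bad line, found %d" % len(hits)
        )
    # swap in the corrected line
    lines[hits[0]] = report.good_line
    # one change-log entry per approved report
    changelog.append(
        "%s p.%d: %r -> %r"
        % (report.etext_id, report.page, report.bad_line, report.good_line)
    )
    return "\n".join(lines)
```

to use it, you'd build an ErrorReport from the incoming e-mail, show it next to the page-scan, and call apply_report only after a human clicks "approve". a second copy of the same report then raises ValueError, because the bad line is no longer in the text, which is one crude way to catch duplicates. regenerating the .html and .pdf versions and sending notifications are left out of this sketch.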