Re: [gutvol-d] epubeditor.sourceforge.net

the good alex said:
Here's something someone at the archive is working on (after hours, since it's not an official project yet). He'd love to hear your thoughts. http://edwardbetts.com/correct
i'm not sure of the exact point being made, but a line-by-line proofing interface like the one betts made is simple enough to create with the page-scans and a correctly-created o.c.r. output text-file... ya don't need to plow through ten tons of x.m.l.

i even demonstrated it years ago here on this list, with an interface quite similar to the one shown (except without that too-funky font and spacing).

but i couldn't find the code or graphic right away, so i just went and rewrote it and regenerated it... here's a graphic showing baseline-determination:
i've appended the code which gets those baselines. after that, it is simple to _slice_the_pagescan_ into _lines_ and then interleave each of the lines of text.

(you can also probe the slices for paragraph indents, for the shortened last-line-of-paragraph lines, for run-heads with pagenumbers on the outside border, chapter-heads which start relatively low on the page and end-of-chapter pages which end relatively high, footnotes with their small text and thus long lines, title-lines indented on both sides, and blockquotes also indented on both sides, plus image insets which result in abbreviated blocks of paragraph lines, etc. and of course many of those findings can be used to sync up the text-file lines with their pagescan slices. and none of it is difficult at all, not in the slightest.)

as far as the interface of betts' methodology, i'd suggest a less-unpleasant correction modality...

..._except_...

...for the fact that this whole correction strategy is one that is severely misguided and wrong-headed.

line-by-line proofing is an unwise investment when the vast majority of lines in the o.c.r. (upwards of 90% in most cases) can _easily_ be made error-free...

...and of course i've been saying that for many years, having presented a veritable raft of supporting data...

instead, fix the lines which are _clearly_ incorrect -- i.e., which show up on spellcheck and easy probes -- and then move the near-perfect text into a pleasant smooth-reading environment used by actual readers. there's absolutely no need for a betts-style interface.

i mean, really, do what you like, people... but if you think you can find volunteers who will let you waste their time, i think you're likely wasting _your_ time... because d.p. has a lock on that kind of idiot.

-bowerbird

p.s. and yes, i realize that the interface shown is actually a word-based one, not a line-based one, which actually adds another layer of smudge, but we can safely ignore that for the bigger picture...

p.p.s.
here's that code...

    dim x as integer
    dim y as integer
    dim solid as color
    dim question as color
    dim consec as integer

    canvas1.graphics.textsize=18
    consec=0-99
    solid=(canvas1.graphics.pixel(10,10))

    for y=10 to canvas1.height-10
      for x=25 to canvas1.width-25
        question=(canvas1.graphics.pixel(x,y))
        if abs(question.red-solid.red)<30 then
          if abs(question.green-solid.green)<30 then
            if abs(question.blue-solid.blue)<30 then
              if x=canvas1.width-25 then
                consec=consec+1 ' was a solid slice
              end if
            else
              y=y+1 ' was not a solid slice
              if y<=canvas1.height-10 then
                x=25
                consec=0
              end if
            end if
          end if
        end if
      next x
      if consec=4 then
        canvas1.graphics.forecolor=rgb(0,0,255)
        canvas1.graphics.drawstring str(y),canvas1.width-15-canvas1.graphics.stringwidth(str(y)),y-3
        canvas1.graphics.forecolor=rgb(255,0,0)
        canvas1.graphics.drawline 0,y,canvas1.width,y
      end if
    next y
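
for anyone without realbasic handy, the slice-and-interleave step described above might look roughly like the python sketch below. it is not the code from this post: the function names (is_solid_row, find_text_bands, interleave) are hypothetical, the page is assumed to be a plain list of rows of (r, g, b) tuples rather than a canvas, and a row counts as "solid" only when all three channels are within the tolerance of the background color. instead of marking a rule after four consecutive solid rows, it simply returns each run of non-solid rows as one band of text.

```python
def is_solid_row(row, background, tolerance=30):
    """True when every pixel in the row is close to the background color.

    Assumption: "close" means all three channels differ by less than
    `tolerance`, mirroring the <30 threshold used in the post.
    """
    return all(
        abs(pixel[channel] - background[channel]) < tolerance
        for pixel in row
        for channel in range(3)
    )


def find_text_bands(pixels, background, tolerance=30):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A band is a run of consecutive rows that are not solid background.
    """
    bands = []
    start = None
    for y, row in enumerate(pixels):
        solid = is_solid_row(row, background, tolerance)
        if not solid and start is None:
            start = y                      # first inked row of a new band
        elif solid and start is not None:
            bands.append((start, y))       # band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(pixels)))
    return bands


def interleave(bands, text_lines):
    """Pair each pagescan band with its line of o.c.r. text, in order."""
    return list(zip(bands, text_lines))


# tiny hand-made "page": white background with two inked lines
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
page = [
    [WHITE] * 6,
    [WHITE, BLACK, BLACK, WHITE, BLACK, WHITE],
    [WHITE, BLACK, WHITE, WHITE, BLACK, WHITE],
    [WHITE] * 6,
    [WHITE] * 6,
    [WHITE, WHITE, BLACK, BLACK, WHITE, WHITE],
    [WHITE] * 6,
]

bands = find_text_bands(page, WHITE)
pairs = interleave(bands, ["first line", "second line"])
for (top, bottom), text in pairs:
    print(top, bottom, text)
```

with a real pagescan, the rows would come from whatever image library is at hand, and the band heights and left edges are the raw material for the probes mentioned earlier (indents, run-heads, footnotes and so on). a mismatch between the number of bands and the number of text lines is itself a useful signal that the page needs a closer look.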
participants (1)
-
Bowerbird@aol.com