the good alex said:
> Here's something someone at the archive is working on
> (after hours, since it's not an official project yet).
> He'd love to hear your thoughts.
> http://edwardbetts.com/correct
i'm not sure of the exact point being made, but
a line-by-line proofing interface like betts made
is simple enough to create with the page-scans
and a correctly-created o.c.r. output text-file...
ya don't need to plow through ten tons of x.m.l.
i even demonstrated it years ago here on this list,
with an interface quite similar to the one shown
(except without that too-funky font and spacing.)
but i couldn't find the code or graphic right away,
so i just went and rewrote it and regenerated it...
here's a graphic showing baseline-determination:
> http://z-m-l.com/misc/flatland.jpg
i've appended the code which gets those baselines.
after that, it is simple to _slice_the_pagescan_ into
_lines_ and then interleave each of the lines of text.
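to make that concrete, here's a rough sketch of the slice-and-interleave
step, written in python just for illustration. it assumes the pagescan is
already in memory as a list of pixel rows and that the baselines are given
as row numbers; the names slice_page and interleave are made up for this
example, not taken from any existing tool.

```python
# sketch only: `page` is a hypothetical stand-in for a pagescan,
# given as a list of pixel rows; `baselines` are the row numbers
# reported by a baseline pass (one per printed line, top to bottom).

def slice_page(page, baselines):
    """cut the page into one strip of rows per baseline."""
    slices = []
    top = 0
    for base in baselines:
        slices.append(page[top:base + 1])  # rows top..base, inclusive
        top = base + 1
    return slices


def interleave(slices, text_lines):
    """pair each strip with its o.c.r. text line, in page order."""
    if len(slices) != len(text_lines):
        raise ValueError("slice count and text-line count differ")
    return list(zip(slices, text_lines))


page = [[255] * 8 for _ in range(12)]  # 12 blank rows, 8 pixels wide
pairs = interleave(slice_page(page, [3, 7, 11]),
                   ["first line", "second line", "third line"])
for strip, text in pairs:
    print(len(strip), "rows ->", text)
```

a mismatch in the counts is simply treated as an error here;
a real tool would want to resync rather than give up.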
(you can also probe the slices for paragraph indents,
for the shortened last-line-of-paragraph lines, for
run-heads with pagenumbers on the outside border,
chapter-heads which start relatively low on the page
and end-of-chapter pages which end relatively high,
footnotes with their small text and thus long lines,
title-lines indented on both sides, and blockquotes
also indented on both sides, plus image insets which
result in abbreviated blocks of paragraph lines, etc.
and of course many of those findings can be used to
sync up the text-file lines with their pagescan slices.
and none of it is difficult at all, not in the slightest.)
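a rough python sketch of two of those probes (paragraph indents and
shortened last lines), under simplifying assumptions: each slice is a
list of grayscale rows, paper is 255, and a pixel counts as ink when it
differs from paper by 30 or more. the names ink_extent and probe, and
the slack value, are hypothetical.

```python
def ink_extent(strip, paper=255, tolerance=30):
    """return (leftmost, rightmost) ink column in a strip, or None if blank."""
    columns = [x for row in strip for x, pixel in enumerate(row)
               if abs(pixel - paper) >= tolerance]
    return (min(columns), max(columns)) if columns else None


def probe(strips, slack=2):
    """label each strip by comparing its ink extent to the widest extent seen."""
    extents = [ink_extent(strip) for strip in strips]
    inked = [e for e in extents if e is not None]
    left = min((e[0] for e in inked), default=0)    # assumed left text margin
    right = max((e[1] for e in inked), default=0)   # assumed right text margin
    labels = []
    for e in extents:
        if e is None:
            labels.append("blank")
            continue
        tags = []
        if e[0] > left + slack:
            tags.append("indented")   # starts late: paragraph indent?
        if e[1] < right - slack:
            tags.append("short")      # ends early: last line of paragraph?
        labels.append("+".join(tags) or "full")
    return labels


def make_strip(start, end, width=20):
    """toy one-row strip with ink from column start to end, inclusive."""
    return [[0 if start <= x <= end else 255 for x in range(width)]]


strips = [make_strip(5, 18), make_strip(1, 18),
          make_strip(1, 9), make_strip(5, 14)]
print(probe(strips))  # ['indented', 'full', 'short', 'indented+short']
```

a strip tagged on both sides is a candidate title-line or blockquote;
telling those apart would take further probes not shown here.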
as for the interface in betts' approach,
i'd suggest a less-unpleasant way of making corrections...
..._except_...
...for the fact that this whole correction strategy is
one that is severely misguided and wrong-headed.
line-by-line proofing is an unwise investment when
the vast majority of lines in the o.c.r. (upwards of
90% in most cases) can _easily_ be made error-free...
...and of course i've been saying that for many years,
having presented a veritable raft of supporting data...
instead, fix the lines which are _clearly_ incorrect --
i.e., which show up on spellcheck and easy probes --
and then move the near-perfect text into a pleasant
smooth-reading environment used by actual readers.
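a minimal sketch of that first pass, in python, assuming a plain
word list stands in for a real spellchecker; the tiny WORDS set and
the name suspect_lines are placeholders for illustration only.

```python
# toy word list; a real run would load a full dictionary instead
WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def suspect_lines(lines, words=WORDS):
    """return (line number, unknown words) for each line a spellcheck would flag."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        tokens = [t.strip(".,;:!?\"'()") for t in line.split()]
        unknown = [t for t in tokens if t and t.lower() not in words]
        if unknown:
            flagged.append((number, unknown))
    return flagged


ocr = ["the quick brown fox", "jumps ovcr the 1azy dog."]
print(suspect_lines(ocr))  # [(2, ['ovcr', '1azy'])]
```

only the flagged lines need a look against the pagescan;
the rest go straight through untouched.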
there's absolutely no need for a betts-style interface.
i mean, really, do what you like, people... but if you
think you can find volunteers who will let you waste
their time, i think you're likely wasting _your_ time...
because d.p. has a lock on that kind of idiot.
-bowerbird
p.s. and yes, i realize that the interface shown is
actually a word-based one, not a line-based one,
which adds another layer of smudge, but
we can safely ignore that for the bigger picture...
p.p.s. here's that code...
' find text baselines by scanning the pagescan row by row
dim x as integer
dim y as integer
dim solid as color      ' the paper color, sampled near the corner
dim question as color   ' the pixel being tested
dim consec as integer   ' consecutive solid (text-free) rows seen
dim rowsolid as boolean
canvas1.graphics.textsize=18
consec=-99   ' keeps the top margin from counting as a baseline
solid=canvas1.graphics.pixel(10,10)
for y=10 to canvas1.height-10
  rowsolid=true
  for x=25 to canvas1.width-25
    question=canvas1.graphics.pixel(x,y)
    ' a pixel is "not paper" if any one channel is off by 30 or more
    if abs(question.red-solid.red)>=30 or abs(question.green-solid.green)>=30 or abs(question.blue-solid.blue)>=30 then
      rowsolid=false
      exit   ' no need to scan the rest of this row
    end if
  next x
  if rowsolid then
    consec=consec+1   ' was a solid slice
  else
    consec=0   ' was not a solid slice
  end if
  ' the 4th solid row below a run of text marks that line's baseline
  if consec=4 then
    canvas1.graphics.forecolor=rgb(0,0,255)
    canvas1.graphics.drawstring str(y),canvas1.width-15-canvas1.graphics.stringwidth(str(y)),y-3
    canvas1.graphics.forecolor=rgb(255,0,0)
    canvas1.graphics.drawline 0,y,canvas1.width,y
  end if
next y
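for anyone without realbasic handy, here's a rough python rendering
of the same row-scan idea. it's a sketch, not a port: it assumes a
grayscale page held as a list of rows, takes the top-left pixel as
the paper color, and scans whole rows with no margins. find_baselines
and its parameters are names made up for this example.

```python
def find_baselines(page, tolerance=30, run=4):
    """return the row numbers where `run` blank rows have followed inked rows."""
    paper = page[0][0]   # assumption: the top-left pixel is plain paper
    baselines = []
    consec = None        # stays None until the first inked row is seen
    for y, row in enumerate(page):
        solid = all(abs(pixel - paper) < tolerance for pixel in row)
        if not solid:
            consec = 0                   # ink on this row: restart the count
        elif consec is not None:
            consec += 1                  # another blank row below the text
            if consec == run:
                baselines.append(y)
    return baselines


# toy page: 24 rows of 10 pixels, with "text" on rows 5-7 and 14-16
ink_rows = {5, 6, 7, 14, 15, 16}
page = [[0 if y in ink_rows else 255] * 10 for y in range(24)]
print(find_baselines(page))  # [11, 20]
```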