re: talking to the walls

1 Nov 2004

      carlo said:
...
How do you include the information in the files
  if it has been removed?
go back to a source and get it, that's how.

where applicable, information about page-breaks
can be obtained from the d.p. proofed text-files;
it's a simple matter of matching up image-scans
with the text they contained.  (see the page that
contains a table of the scans with their text-files.)
that's why i offered a demo using that specifically.
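
(a minimal sketch of that matching, in python. the names are assumptions
for illustration, not anything taken from d.p. or from the viewer:
`pages` is a list of (scan-name, proofed page text) pairs in reading order,
and `book_text` is the finished e-text. it looks for the first few words
of each page and records the character offset where they turn up.
it assumes those opening words came through post-processing unchanged;
a page that opens with the tail of a hyphenated word, or with moved
material, comes back as None and needs a look by hand.)

```python
import re


def words(text):
    """Return (word, start_offset) pairs for each whitespace-delimited word."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


def find_page_breaks(book_text, pages, key_len=4):
    """Locate where each proofed page begins inside the finished text.

    pages: list of (scan_name, page_text) in reading order (hypothetical layout).
    Returns a list of (scan_name, offset); offset is None when the opening
    words of a page were not found after the previous page-break.
    """
    book_words = words(book_text)
    breaks = []
    cursor = 0  # index into book_words; pages are expected in order
    for scan_name, page_text in pages:
        key = [w for w, _ in words(page_text)[:key_len]]
        found = None
        if key:
            for i in range(cursor, len(book_words) - len(key) + 1):
                if [w for w, _ in book_words[i:i + len(key)]] == key:
                    found = i
                    break
        if found is None:
            breaks.append((scan_name, None))  # blank page or no match
        else:
            breaks.append((scan_name, book_words[found][1]))
            cursor = found + 1
    return breaks
```

a longer `key_len` makes a false match less likely, at the price of
more misses when the text was altered near the top of a page.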

(nonetheless, it's positively _criminal_ that we
should even have to do _anything_ to re-gain this
information, since it was _willfully_ discarded.
when is this bad practice going to be halted?)

for books not done by distributed proofreaders,
it's as easy as loading the text-file into my viewer
and clicking on each word that starts a new page,
which you can tell by looking at a paper-copy.
(my viewer will then save an updated copy of the file.)
this process can be facilitated by setting the leading
so the lines-per-page is equivalent to the paper-copy,
making the task almost trivially easy (but still useful!).
...
And moreover, how do you find the correct page 
  when some material (e.g. the footnotes) has been moved, 
  and the page contents are no longer consecutive?
footnotes are easy.  (my viewer displays them on the page
where they are called anyway, so there's no problem there.)

and if you point me to some examples of the other "material"
that is moved, i'll be happy to tell you how i'd deal with that.
...
I have a solution to both problems for DP-produced books 
  using the files output by DP before the post-processing stage;
right.
...
these files correspond to individual pages of the original book, 
  and you can find the image corresponding to a fragment of text 
  through a grep on the DP-file.
that's one way of doing it.
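
(for the record, that grep-style lookup can be sketched in a few lines
of python. this is an illustration only, not the actual tool being
described: `page_texts` is a hypothetical dict mapping each per-page
file-name to its text, and whitespace and case are normalized so that
a fragment still matches when the line-breaks differ.)

```python
import re


def normalize(text):
    """Collapse runs of whitespace and lowercase, so re-wrapped text matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def pages_containing(fragment, page_texts):
    """Return the names of the per-page files whose text contains the fragment.

    page_texts: dict of file-name -> page text (hypothetical layout).
    """
    needle = normalize(fragment)
    if not needle:
        return []  # an empty fragment would otherwise match every page
    return [name for name, text in page_texts.items()
            if needle in normalize(text)]
```

a fragment that straddles two pages will match neither file,
so short fragments from the middle of a page work best.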

but why not run the process systematically, one time,
restoring the page-break information in the text-files,
and incorporating the ability to grab the image-scans
-- automatically and simply -- using that information.
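
(once the page-breaks are stored with the text, getting from any spot
in the text to its scan is a plain lookup, with no searching at all.
a sketch, assuming a hypothetical layout in which the breaks are kept
as a list of (scan-name, offset) pairs sorted by offset:)

```python
from bisect import bisect_right


def scan_for_offset(breaks, offset):
    """Return the scan whose page contains the given character offset.

    breaks: list of (scan_name, start_offset), sorted by start_offset
    (hypothetical layout). Returns None for an offset before the first page.
    """
    starts = [start for _, start in breaks]
    i = bisect_right(starts, offset) - 1  # last page starting at or before offset
    return breaks[i][0] if i >= 0 else None
```

a reading program could call this with the position of the word a user
clicked on, then fetch the scan by name from wherever the scans are kept.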

i'm sure you know that the eyes of most users 
glaze over when you start talking about "grep".

besides, what needs to be done is to _thoroughly_incorporate_ the
error-reporting process _into_ the end-user's reading-experience,
so as to make the most of the eyeballs of all the people reading the e-texts.
it's just a shame that -- at the same time readers are condemning
the e-texts because "they are full of errors" -- practically _nothing_
is being done to harness their ability to _catch_ and _report_ errors.
...
The concept has been implemented recently by a student, 
  and a test of 300 recently posted PG ebooks should be 
  publicly available before the end of this week. This is 
  a part of a system for ebook maintenance (a user can 
  submit a proposed correction to a text through a web page, 
  after consulting the original images, and an administrator later 
  can accept - or reject - the proposals and obtain automatically 
  a corrected version).
sounds like a process i described in great detail months ago here.

i'm glad somebody is programming it for you guys, because i'll be
leaving here shortly.  but i intend to write the app anyway, because
users who want to grab content from the million-book-project will
need it to turn those scans into nicely-proofed and formatted text...

-bowerbird

Bowerbird＠aol.com
