Re: [gutvol-d] Improving the PG library

24 Sep 2012

      On 2012-09-24, don kretz wrote:
...
> I'm not seeing the value of this RTT thing (but I'll admit I'm not
> sure what it is - maybe an example would be helpful.)
The RTT is more or less DP's P3 output without the clothing of eol
hyphens and dashes so that a line in the RTT will always correspond
directly to a line on the page. Rather than:

[OCR]--p1-->[P1]--p2-->[P2]--p3-->[P3]

the proposal is to do

[OCR]--p1-->[P1]--p2-->[P2]--Diff against extant PG text-->[RTT]

possibly repeating p1 and p2 to improve accuracy.
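
As a rough illustration of the "diff against extant PG text" step, here
is a minimal Python sketch. It is not an existing DP or PG tool: the
function names are hypothetical, and it assumes the proofed text keeps
the page's line breaks while the PG text has been rewrapped. It also
ignores eol hyphens, which a real comparison would have to handle.

```python
import difflib


def words_with_lines(page_text):
    """Return (word, line_number) pairs so each word keeps its page line."""
    pairs = []
    for line_no, line in enumerate(page_text.splitlines(), start=1):
        for word in line.split():
            pairs.append((word, line_no))
    return pairs


def diff_against_pg(proofed_text, pg_text):
    """List word-level differences, tagged with the proofed line number.

    Hypothetical helper: compares word sequences, so differences in
    line wrapping alone are not reported.
    """
    proofed = words_with_lines(proofed_text)
    proofed_words = [word for word, _ in proofed]
    pg_words = pg_text.split()
    matcher = difflib.SequenceMatcher(a=proofed_words, b=pg_words,
                                      autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Attribute the difference to the line of the nearest proofed word.
        line_no = proofed[min(i1, len(proofed) - 1)][1] if proofed else 0
        report.append((line_no, proofed_words[i1:i2], pg_words[j1:j2]))
    return report


if __name__ == "__main__":
    proofed = "It was the best of\ntimes, it was the worst\nof tirnes."
    pg = "It was the best of times, it was the worst of times."
    for line_no, ours, theirs in diff_against_pg(proofed, pg):
        print(line_no, ours, theirs)
```

Each reported difference is a place where a human would decide whether
the proofed page or the PG text is right, and the line number says where
on the page to look.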

If you follow a workflow where you do formatting separately, you will
likely at some stage in the process have something akin to the RTT.

The problem that the RTT seeks to solve is this: given that we
all have our own ideas about workflows and we will defend those ideas to
the death, what is the latest usable snapshot common to the vast
majority of these workflows?  What is the latest point that I can pick
off from your workflow that will allow me to continue with my workflow,
and vice versa?

For a subset of the workflows that can be based on the RTT, there will
be other usable snapshots further on. If someone is carrying out such a
workflow there is nothing stopping these being captured as well; other
compatible workflows can start from this snapshot instead. There is also
nothing precluding deriving from a derivative if that works for you. The
RTT is just a low-level foundation for all these things, and it is there
if you want to use it.

In general, the default answer to how any given thing is encoded into
the RTT is "like DP does it", since that is what people know, and DP or
a DP-like organisation will be needed to produce them. The exception is
the eol clothing, which actively destroys data and line correspondence,
and that exception is only possible because LOTE already uses the
exception. There are all sorts of things that can also be done in a
"post-process to RTT" sort of way, such as converting to curly quotes and
translating from "--" to "—", but, as you correctly point out, there is
no point getting to that level of detail until we at least sort out how
to select the right master scans.
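
For what it's worth, a post-process of that kind might look something
like the Python sketch below. The function names and the quote heuristic
are my own assumptions for illustration, not anything DP or PG actually
uses: a quote counts as an opening quote if it starts the line or follows
whitespace or an opening bracket, and as a closing quote otherwise. The
text is processed a line at a time, so line correspondence survives.

```python
import re

EM_DASH = "\u2014"
OPENERS = r"(^|[\s(\[])"  # start of line, whitespace, or opening bracket


def post_process_line(line):
    """Convert "--" to an em dash and straight quotes to curly quotes.

    Naive heuristic (an assumption, not a DP rule): a quote after an
    opener is an opening quote; every other quote is a closing quote
    or an apostrophe.
    """
    line = line.replace("--", EM_DASH)
    line = re.sub(OPENERS + '"', "\\1\u201c", line)  # opening double
    line = line.replace('"', "\u201d")               # closing double
    line = re.sub(OPENERS + "'", "\\1\u2018", line)  # opening single
    line = line.replace("'", "\u2019")               # closing / apostrophe
    return line


def post_process(text):
    """Apply the conversion line by line, keeping the line count intact."""
    return "\n".join(post_process_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    print(post_process('"Well--it\'s done," she said.'))
```

A heuristic this simple will get some cases wrong, such as a leading
apostrophe in 'tis, which is one more reason to keep it as a separate
step on top of the RTT rather than baking it in.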
...
> If there is to be a source text which is the canonical starting point
> for further work, it seems to me it needs to have been treated so as
> much implicit syntactical identification as possible has been
> explicated and disambiguated with some documented form of markup -
> which form doesn't matter much, because if it is sufficiently complete
> and unambiguous it can be converted into any other form.
This canonical starting point (CSP -- I love TLAs) is something like F2
output, although I'm sure you would come up with something a lot less
random. I suggest Bowerbird wouldn't like it: he would suggest you might
as well just use ZML. Marcello would probably point out that RST would
be the solution. Jeroen would point to TEI's superior gamut. If you can
all agree on a CSP format then that's brilliant: we can have an MS, an
RTT and a CSP. But I don't want to get involved in the flame war
(*cough* LaTeX).

Cheers

Jon

Jon Hurst