Why? To discuss it as an illustration of the proper handling of the issues you list, and others, and to see what alternative markup schemes look like in action.



From: James Adcock
Sent: 10/10/2012 4:53 AM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Basic simple test case.

Re 14668:

 

Well, the first question would be: Why?

 

Contrary to the idea that PG needs to scale up efforts 10X and “do everything,” maybe the right answer is to scale things DOWN by 10X and fix the books that people actually want to read but which have currently gone hopelessly moldy, rather than offer more kiddie readers?

 

Secondly, one needs to get page scans. These are at least available from Google in a variety of editions, so you’d have to pick one.

 

In terms of the current “automagic” HTML conversion from txt, this txt shows the problem that PG currently isn’t even specifying that the same <p> formatting be used on each device.  Given the PG txt conventions, it seems PG should be specifying “no indent, 1em of white space between paragraphs” for the <p> styling, so that at least the basics match the txt styling.  This is important because txt “formatters” implicitly use the txt formatting rules as an element of the formatting; i.e., syntax vs. semantics *cannot* be uniquely determined automagically by examining a PG txt file, so the best one can hope to do is to emulate the PG txt layout.
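
As a rough illustration of “emulate the PG txt layout,” here is a minimal Python sketch. It is not PG’s actual converter. The function name txt_to_html and the P_STYLE rule are my own assumptions, and the top-margin-only form of the 1em gap is just one possible choice. It treats blank lines as paragraph breaks, rejoins hard-wrapped lines, and emits <p> elements with no indent and 1em of space between them.

```python
import html
import re

# Assumed stylesheet (not PG's): no first-line indent, 1em gap above each
# paragraph. Putting the gap in the top margin only is one possible choice.
P_STYLE = "p { text-indent: 0; margin: 1em 0 0 0; }"


def txt_to_html(text: str) -> str:
    """Convert blank-line-separated plain text into styled <p> elements."""
    # Normalize line endings, then split on one or more blank lines.
    text = text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        if not block.strip():
            continue  # skip empty input
        # Rejoin hard-wrapped lines into one flowing paragraph.
        joined = " ".join(line.strip() for line in block.split("\n"))
        paragraphs.append(f"<p>{html.escape(joined)}</p>")
    body = "\n".join(paragraphs)
    return f"<style>{P_STYLE}</style>\n{body}"


if __name__ == "__main__":
    print(txt_to_html("First line\nwrapped.\n\nSecond paragraph."))
```

Note that this guesses nothing about semantics: anything that is not a plain paragraph (poetry, word lists, rules) would still need handling by hand.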

 

In terms of hand-recoding the html/epub/mobi, there appear to be no great problems other than understanding and dealing with the issue of whether top/bottom margins get merged/rounded or not, which can be dealt with in the standard manner of using top margins only.

 

In terms of design issues, there appear to be minor issues of poetry, which are not hard here since the poetry lines are short.  (How to “correctly” autowrap lines of poetry remains problematic in HTML, since HTML doesn’t support poetry.)

 

There are issues of quasi-table listings of words, where the traditional solution is simply to linearize the lists.  I.e., these word lists were “packed” on paper to save paper, but on ebook devices vertical landscape is “free” [horizontal landscape, however, definitely *is not*], so the word lists can simply be “unpacked.”
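
To make the “unpacking” concrete, here is a small Python sketch. It assumes the packed list appears in the txt as columns separated by runs of two or more spaces, and that the list is meant to be read down each column. Both are assumptions about the layout, not something the txt file itself can tell you, and the function name unpack_word_list is hypothetical.

```python
import re


def unpack_word_list(block: str, by_column: bool = True) -> list[str]:
    """Linearize a space-packed word table into one word per entry."""
    # Split each non-blank line into cells on runs of 2+ whitespace chars.
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in block.splitlines()
        if line.strip()
    ]
    if not rows:
        return []
    if not by_column:
        # Row-major: read across each line in turn.
        return [word for row in rows for word in row]
    # Column-major: read down the first column, then the second, and so on.
    width = max(len(row) for row in rows)
    words = []
    for col in range(width):
        for row in rows:
            if col < len(row):
                words.append(row[col])
    return words


if __name__ == "__main__":
    packed = "cat   dog   hen\nbat   fox   pig"
    print("\n".join(unpack_word_list(packed)))
```

Rows with an empty cell in the middle would be misaligned by this simple split, so a real list would still want a human check against the page scans.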

 

And there appears to be a minor issue of plain rules vs. decorative rules.

 

But all this would still beg the first question: Why?

 

Who is the customer?

 

What parent would want this for their kid today? Seriously?

 

Is some researcher interested in this for historical reasons?  Well – frankly they would be better off examining the bitmap scans.

 

Fundamentally, you can’t code anything reasonable unless you decide who the customer is and how they are actually going to be using your efforts.