
jeroen said:
Although I agree with Michael that there is no need to preserve things as linebreaks in most texts --
ok, well you and michael agree. that's good. :+) but what do you say to end-users who want that info? somehow, "tough luck, kid, _we_ don't think it's necessary" doesn't sound like the kind of thing _i_ want to tell people. because that's the type of statement that makes people go off to a different cyberlibrary. that's my whole point. (and to all of the other people who responded on similar "theoretical" grounds, i'm truly sorry you missed the point.)
if you really need to go to that level of detail, there is always the original or the scans to fall back upon
well, neither of those gives you the flexibility of digital text. but yes, a tight coupling of the two forms is the best method. you will note that those "digital reprints" from jose menendez allow a reader to summon the scan of the page with one click. (since the page already looks like the scan anyway, there might be little reason to do it, though, except to verify that similarity. but this constant willingness to demonstrate the verisimilitude will be the proof that makes people comfortable with the use of the smaller-sized digital reprint, with its expanded functionality, as opposed to the bigger, slower, dumber collection of scans. anyone who has proofread a scan against reflowed text knows the reflowing makes that task immensely more difficult though, so you'll never attain the same confidence in the text's accuracy.)
I want to make a case for preserving page numbers, if only as recognisable anchors in the text, and only for those books that are regularly referenced by other books.
page-numbers are retained in many e-texts these days... but i'm sure you remember we all had this same argument about page-numbers. i'm confident that -- down the line -- sentiment will similarly change to be in favor of line-breaks. in general, i've just been content to wait it out until the change; but seeing all the e-texts as they cross my screen downloading made me realize again the sadness of the discarded line-breaks.
This excludes most fiction, but is particularly important for scientific works, which have constructed a kind of paper web with cross references mainly based on page numbers.
there are plenty of cross-references made to works of fiction. and the concept of "books reading each other" would require that _all_ of our books are brought under the same umbrella...
In the long term, such references should of course give way to proper references to the actual paragraph or sentence being referenced
good! you recognize the need for a finer-grained pointer than the page. because that's the kind of thinking that leads to line-break retention. you can narrow things down rather specifically when you point to the range that's represented from page-19-line-7 to page-21-line-14, or from page-87-line-6 to page-87-line-8, can't you? not only that, this kind of reference also works for the person who only has the paper copy of the book, not the e-book, if the two are duplicates of each other. and that's precisely the type of capability i'll have in my viewer-program. even in a traditional browser, it wouldn't be hard to implement something roughly equivalent, though. the user could specify some text with a link, and after going to the precise point of the link, the browser could then execute a "find" command for the specified text. it wouldn't be hard at all, and would seem to give a rather exact form of pointing to a specific place. it has the benefit of being implemented entirely outside of the document, as well, which i see as being tremendously important. if all our links need markup in the original document to be implemented, as is the present case, we're _never_ going to be able to quickly get to a point of profuse interlinks. we'll get thoroughly bogged down in the quicksand of heavy markup first... (for an example of that, take a look at the markup which jon noring posted, and then read through that particular diversion of this thread. the horrors!)
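a minimal sketch of that kind of pointer, in python, just to make the idea concrete. everything here is an assumption made for illustration: the storage format (pages separated by form-feed characters, original line-breaks kept as newlines), the "page-19-line-7" spelling of a pointer, and the function names. it is not the code of any existing viewer-program or browser.

```python
import re

def parse_pointer(pointer):
    """turn a hypothetical 'page-19-line-7' pointer into the tuple (19, 7)."""
    match = re.fullmatch(r"page-(\d+)-line-(\d+)", pointer)
    if match is None:
        raise ValueError(f"not a page-line pointer: {pointer!r}")
    page_no, line_no = int(match.group(1)), int(match.group(2))
    if page_no < 1 or line_no < 1:
        raise ValueError("pages and lines are counted from 1")
    return page_no, line_no

def lines_in_range(book_text, start, end):
    """return the printed lines from pointer `start` through pointer `end`."""
    # assumed format: form-feed between pages, newline between lines
    pages = [page.split("\n") for page in book_text.split("\f")]
    (p1, l1), (p2, l2) = parse_pointer(start), parse_pointer(end)
    if (p1, l1) > (p2, l2):
        raise ValueError("start pointer comes after end pointer")
    picked = []
    for page_no in range(p1, p2 + 1):
        lines = pages[page_no - 1]
        first = l1 if page_no == p1 else 1
        last = l2 if page_no == p2 else len(lines)
        picked.extend(lines[first - 1:last])
    return picked

def offset_of(book_text, pointer):
    """character offset at which the pointed-to line begins."""
    page_no, line_no = parse_pointer(pointer)
    pages = book_text.split("\f")
    # each earlier page contributes its text plus one form-feed
    offset = sum(len(page) + 1 for page in pages[:page_no - 1])
    lines = pages[page_no - 1].split("\n")
    # each earlier line on this page contributes its text plus one newline
    offset += sum(len(line) + 1 for line in lines[:line_no - 1])
    return offset

def find_after(book_text, pointer, phrase):
    """go to the pointer, then run a plain 'find' for `phrase` from there.

    returns the character offset of the match, or -1 if there is none.
    """
    return book_text.find(phrase, offset_of(book_text, pointer))

# a tiny three-page "book" with its line-breaks intact
book = "one\ntwo\nthree\ffour\nfive\nsix\fseven\neight"
print(lines_in_range(book, "page-1-line-2", "page-2-line-1"))
print(find_after(book, "page-2-line-1", "six"))
```

note that both the pointer and the search phrase live entirely outside the book's text, so nothing has to be marked up in the document itself for the link to work.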
but as a practical ad-interim solution, staying with page numbers will increase the number of texts we can digitize with our limited means.
it doesn't cost anything to retain the line-break information.
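one way to read that claim: a reflowed version can always be produced from a text that kept its line-breaks, while the reverse is not possible once the breaks are gone. a minimal python sketch, assuming paragraphs are separated by blank lines and ignoring end-of-line hyphenation (the function name is made up for the example):

```python
def reflow(text_with_linebreaks):
    """join the original lines of each paragraph into one long line.

    assumes blank lines separate paragraphs; words hyphenated across
    a line-break are not rejoined by this sketch.
    """
    paragraphs = text_with_linebreaks.split("\n\n")
    return "\n\n".join(" ".join(paragraph.split()) for paragraph in paragraphs)

sample = "it was the best\nof times, it was\nthe worst of times\n\nnext paragraph"
print(reflow(sample))
```

the line-broken file can serve both readers, since the reflowed form is one function call away.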
I would, however, like to see the collection incorporated in a kind of wiki-like system, where people can add -- without tampering with the static source texts -- annotations, add tagging and create live cross references
i've had a demo up for some time now showing "continuous proofreading".
i also used a similar template in these demo-books:
http://www.greatamericannovel.com/mabie/mabiep001.html
http://www.greatamericannovel.com/myant/myantc001.html
http://www.greatamericannovel.com/ahmmw/ahmmwc001.html
http://www.greatamericannovel.com/sgfhb/sgfhbc001.html
this system could easily be elaborated upon to build what you requested here. indeed, i will be pouring all of the p.g. texts that i'll be handling -- perhaps some 5000-6000, as near as i can tell -- into just this type of system, within the next 6 months, and i would be open to any ideas that you might have... heck, design a webpage to do what you want, and i will use it as the template. you know me, i don't even care if it "validates", as long as it's easy and it works.
***
andrew said:
There are places such as wikisource.org, where you could add the texts and start providing links such as you mention here immediately.
i'll check out wikisource.org to see what kind of capabilities they offer. in the past, when i've looked at existing sites, it has seemed that wikis aren't geared to do things -- like populate pages -- on a massive scale. even rather fundamental things like batch f.t.p. are sometimes missing. and when you're dealing with thousands, or tens of thousands, of files, it becomes absolutely necessary to deal with them in a template fashion. i also think there's a good reason jeroen asked for a "wiki-like system", and not a wiki per se, as indicated by his concern about "tampering" with the static source texts. the thought is that the original source -- and indeed, the string of comments as well -- must be inviolate. that's because the idea is to build a body of thought around a text, of which links -- intrasystem, and outgoing and incoming -- are a very crucial aspect. and it's not possible to link into a wiki proper, because what was there yesterday might well be gone today, only to reappear in different form tomorrow. you can't link into a pile of sand. oh sure, you could instruct users to leave link markup untouched. and they might even follow your instructions. (yeah, right.) still, that will interfere with refactoring, and get very crufty before long. besides, a good part of the give-and-take of this kind of conversation involves letting all of the arguments stand, rather than editing them. (and especially rather than "editing them by deletion".) let the future examine all the arguments, and see which ones stand the test of time. so you need to have stability for the process itself, not just for the links.
-bowerbird
p.s. jeroen, if you want to provide me a template, i could use it sooner rather than later, the better to architect it into my overall work-flow...