James: Thanks for sharing more with us regarding what your UniBook system is like. However, we don't need to get too carried away yet. I'd like to point out (with all due respect) that many people have developed a "perfect markup system" for electronic books, and then become frustrated when it doesn't catch on with many (or any) other people. That's just the way things go. However, I am willing to put plenty of effort into this as a "proof of concept" of what a PG-type effort can be like. I do sincerely hope it will not be just another "let's reformat PG texts to our own specifications" effort. What I would ask for right now, in the beta stage, is a documented markup that we can use, and the ability to consistently produce a PG-type plain text file from it. I know that XML holds the promise of all sorts of wonderful possibilities, but let's leave that for the middle term.
> My system uses pseudo markup
I'm curious what you mean by _pseudo_ markup. I'd love to take a look at it soon.
> As long as the content is imported into UniBook using this syntax
Which brings up the inevitable possibility of human error: getting the syntax wrong, or enclosing the wrong material within the syntactical delimiters.
> should I mention some of the other cool things that can be done
I'll put it on a list of future plans...

Thanks,
Andrew

On Fri, 14 Jan 2005, James Linden wrote:
First off, Andrew is correct -- I do not want my project, UniBook, to be under any sort of PG umbrella -- I wrote it for a far bigger purpose. PG is just one of many projects that can make use of it. I have no problem providing the source code (once I clean it up a bit) under an open source license.
My system uses pseudo markup, and is actually _easier_ to do than PG's vanilla text (in my opinion). I still have to write full documentation on the syntax, something I've held off doing because of the aforementioned political BS.
As long as the content is imported into UniBook using this syntax, it can be automatically parsed with accuracy. Obviously, all imports would be vetted by humans, but that'd be a minimal amount of work.
I should mention that the demo at ibiblio.org/edison is very rough, and doesn't have all the format support that I've actually written and have backed up on CD. That CD also has the search engine, browsing by title/author/date/genre/LOC heading/style, etc.
When I last worked on the code (over a year ago), I had full output support for 6 formats, and beta-level output for another 3. There are 4 more still on my list to write after those 3 beta ones are finished. Once a text is in the system, outputting takes an average of half a second per format (TXT and XML are much faster, but TEI and PDF are a bit slower). So, assuming the code is done for all 13 formats, it would take less than 7 seconds to (re)generate all formats for each text (assuming the text is 1MB in size) in the archive. It averages out (based on current texts in PG) to about 3 seconds per text, because many of them are well under 1MB.
Assuming we have 15,000 items (as MH says), which we actually do NOT have, that'd take about 32 hrs to regenerate the entire library in 13 formats.
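As a rough sanity check on the timing claims above (taking the stated per-format and per-text averages at face value), the arithmetic can be sketched like this; the figures are back-of-envelope only, and slower formats like TEI and PDF would push the totals higher:

```python
# Back-of-envelope check of the regeneration estimates above.
# Assumes the stated average of 0.5 s per output format; TEI and
# PDF are slower, which would raise these totals.

FORMATS = 13
SECONDS_PER_FORMAT = 0.5          # stated average per format
TEXTS = 15_000                    # the "15,000 items" figure

per_text = FORMATS * SECONDS_PER_FORMAT          # seconds per 1 MB text
full_regen_hours = TEXTS * per_text / 3600       # every text at 1 MB
avg_regen_hours = TEXTS * 3.0 / 3600             # at the 3 s/text average

print(f"{per_text} s per text, "
      f"{full_regen_hours:.1f} h at 1 MB each, "
      f"{avg_regen_hours:.1f} h at the 3 s average")
```

At the stated averages this lands between roughly 12 and 27 hours; the slower TEI/PDF formats presumably account for the gap up to the quoted figure.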
Adding new output formats is very easy -- it's just a PHP class with a single required function which accepts one parameter -- the document content. What that function does is irrelevant as long as it returns the final output or filename as a string. This means it can either build the output itself, or call an external program, etc.
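The real plugins are PHP classes, but the contract described above (one required method that takes the document content and returns the output, or a filename, as a string) can be sketched in Python; all class and function names here are hypothetical, not UniBook's actual API:

```python
# Hypothetical sketch of the one-method output-plugin contract
# described above. The real UniBook plugins are PHP classes; the
# names here are illustrative only.

class TxtOutput:
    def render(self, content: str) -> str:
        # Build the output directly and return it as a string.
        return content.strip() + "\n"

class UpperOutput:
    def render(self, content: str) -> str:
        # A plugin may transform the content however it likes -- or
        # shell out to an external program and return a filename.
        return content.upper()

def regenerate(content: str, plugins: dict) -> dict:
    # The archive code only depends on the shared contract.
    return {name: p.render(content) for name, p in plugins.items()}

outputs = regenerate("Hello, Gutenberg",
                     {"txt": TxtOutput(), "upper": UpperOutput()})
print(outputs["upper"])   # HELLO, GUTENBERG
```

Because the archive code only calls the one shared method, a new format is a new class and nothing else has to change.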
Let's say that PG's desired master format is TEI; UniBook can output it as mentioned. If that TEI spec ever changes, we just have to change the output function and regenerate the archive in only that format.
Maintaining the archive becomes child's play as well -- make any edits to the database record(s) that are needed, then regenerate the output formats. This makes it extremely easy to implement a user-submitted error-correction system in which admins just verify the items to be changed, instead of having to go through the files manually.
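The correction workflow described above amounts to "edit the master record once, then rerun the output plugins." A minimal sketch of that loop, with hypothetical names (this is not the real admin interface or database schema):

```python
# Hypothetical sketch of the edit-then-regenerate workflow: fix the
# master database record once, and every derived format is rebuilt
# from it. Names and schema are illustrative, not UniBook's.

database = {101: "It was the best of tmies."}   # record with a typo

def regenerate_formats(text: str) -> dict:
    # Stand-ins for the real output plugins (TXT, XML, TEI, ...).
    return {"txt": text, "upper": text.upper()}

def apply_correction(record_id: int, old: str, new: str) -> dict:
    # An admin verifies the user-submitted correction, the master
    # record is edited, and all formats are regenerated from it.
    database[record_id] = database[record_id].replace(old, new)
    return regenerate_formats(database[record_id])

outputs = apply_correction(101, "tmies", "times")
print(outputs["txt"])
```

The point is that no one ever edits the generated files by hand; every format is always a pure function of the master record.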
Here's where UniBook currently stands:
1) Need some code cleanup (I pretty much have to do that since I wrote it). After that, we can CVS/SVN it for cooperative maintenance.
2) Need administration interface (web based) for importing files, confirming imports, managing extra catalog data (LOC headings, etc). I can handle this as well if needed.
3) Need GUI for building the importable files. I've written several different versions of such an app in VB, but it really needs to be done in Java, so it's portable as an app, and embeddable as an applet for web-based interface. This is where I need help -- I don't know enough Java to write GUIs from scratch. I can provide a fully functioning VB GUI (with code if desired) that would just need to be reproduced in Java. The whole interface is relatively simple - a WYSIWYG with limited functionality.
Once a GUI is written, it'd be child's play to get ALL of PG's current text imported into the system - by volunteers interested in doing it - along with all new text being done with it natively.
Oh yeah, should I mention some of the other cool things that can be done with this system as the base? Like automatically generating CD ISO images for any combination of texts? For example: we can do a CD for each year's new/updated texts, without wasting space on ones that haven't changed. Or, we can generate a CD image for all of Shakespeare, etc. People can build their own list and have an ISO automatically generated for them to download, with the texts in the format(s) of their choice...
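The "CD of each year's new/updated texts" idea reduces to a query over the catalog's modification dates, with the resulting file list handed to an ISO builder (e.g. mkisofs/genisoimage). A hedged sketch of just the selection step, with a hypothetical catalog layout:

```python
# Hypothetical sketch of picking the texts for a "2004 new/updated"
# CD image. The catalog layout is illustrative only; the selected
# IDs would then be passed to an external ISO builder.

catalog = [
    {"id": 1, "title": "Hamlet",       "updated": "2003-06-01"},
    {"id": 2, "title": "Frankenstein", "updated": "2004-02-14"},
    {"id": 3, "title": "Walden",       "updated": "2004-11-30"},
]

def texts_for_year(records: list, year: int) -> list:
    # Only texts new or changed in the given year go on the disc,
    # so no space is wasted on unchanged texts.
    return [r["id"] for r in records
            if int(r["updated"][:4]) == year]

print(texts_for_year(catalog, 2004))   # [2, 3]
```

The Shakespeare disc, or a user's custom list, is the same routine with a different filter.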
...the list goes on and on...
--
James

_______________________________________________
Project Gutenberg of Canada
Website: http://www.projectgutenberg.ca/
List: pgcanada@lists.pglaf.org
Archives: http://lists.pglaf.org/private.cgi/pgcanada/