Re: [gutvol-d] epubeditor.sourceforge.net

9 Nov 2011

On 10/30/2011 12:35 PM, Bowerbird@aol.com wrote:

[massive snippage]
...
> you know who's gonna work on your app, lee?
> you. and only you, lee. you. and nobody else.
Yes, that is what I have always believed. My main purpose in creating 
an "open source" project is not to attract "adherents" but simply 
transparency (and backups in "the cloud"). If someone who shares my 
vision wants to help out, I'm happy for the help. If someone just wants 
to "steal" my ideas (for whatever they're worth), that's fine too.

[much, much more snippage]
...
...
> > What /I/ want is the output from FineReader
> > as though the "Save as HTML" option was selected,
> > with all the markup that FineReader was able to intuit
> if i get "all the markup that finereader was able to intuit",
> then i can do the job just as well as you can. maybe better.
Then we're both in luck. As I've been looking more carefully at the HTML 
output produced by the IA script, I'm discovering more and more useful 
information.

When one uses FineReader, the post-recognition process brings up a 
side-by-side view of the image and the recognized text. In the 
recognized text, FineReader highlights words it is uncertain about or 
that do not appear in its dictionary.

I'm discovering that this data has been preserved in the IA output. I 
think with some effort it would be possible to use this data to build a 
web interface substantially identical to the proofreading interface 
provided by FineReader.

So Alex, all that talk a while back about how I wanted a "leaner, 
meaner" file? Forget about it. I think I like it just the way it is. I 
can select out what I need, and it has some potential.

Why don't you talk IA into hiring me, so I can work on this full time? :-)

[yet more snippage]
...
> what do the professionals advise us amateurs to do?
> they advise us to save the file as plain-ascii text, and
> then to apply the .html to that plain text, including
> the reapplication of styling (e.g., italics) which gets
> _lost_ when the file is saved in plain-ascii format...
An interesting assertion, although a bit thin on actual evidence. 
Apparently Liz Castro advocates using Adobe's InDesign (shudder) to 
generate the HTML to create ePubs, and Josh Tallent talks about using 
Microsoft Word's "Save as HTML" as the first step, and then cleaning up 
the resultant HTML (he goes on to point out that "HTML is a very simple 
language to learn").

I, personally, have started with "Just Bare" ASCII only one time, and 
gave it up before I was done because it was just too painful. Of course, 
I'm obviously not a professional, but I would /never/ advise an amateur 
to start with Just Bare ASCII; with macros and global search and 
replace, even cleaning up clumsy HTML is easier than adding it all back 
in by hand.
...
> the application of good solid .html, though, is wise,
> so _that_ part of the advice i can thoroughly second...
[snipped assertion I happen to disagree with]
...
> now, the truth is that those pros have "scripts" that
> apply the markup automatically. plus they _know_
> .html already, well, so this comes naturally to 'em,
> even if they have to do some of the work manually
You make a good point here. 99.9% of the time when I'm creating an 
e-book I start with the HTML output from FineReader. But as has been 
pointed out elsewhere, FineReader produces SGML-flavored HTML, not the 
XML-flavored XHTML that ePub requires. So the very first thing I have to 
do is convert the FineReader output to XHTML. (It's possible to use 
HTMLTidy to accomplish this, but I wrote my own program, derived from 
the Tidy code base, which not only does the conversion but also performs 
some other useful transformations.)
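To make the SGML-to-XML difference concrete, here is a minimal sketch of what such a conversion has to do at the very least: lowercase tag names, quote every attribute value, self-close empty elements like `<br>`, close paragraphs that SGML-style HTML leaves implicitly open, and close anything still open at the end. It is built on Python's standard `html.parser` and is not my Tidy-derived program nor HTML Tidy itself; the `to_xhtml` name, the element lists, and the choice to drop the DOCTYPE are assumptions for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content and must be self-closed in XHTML.
VOID = {"br", "hr", "img", "meta", "link", "input", "col", "area", "base", "param"}
# Block elements whose start implicitly ends an open <p> (a partial, assumed list).
CLOSES_P = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
            "table", "ul", "ol", "blockquote", "pre", "hr"}


class XhtmlWriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self.out = []
        self.stack = []  # currently open (non-void) elements

    def _attrs(self, attrs):
        # Quote every value; minimized attributes like "nowrap" become nowrap="nowrap".
        return "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )

    def _close_to(self, tag):
        # Emit end tags for open elements until `tag` itself has been closed.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):  # html.parser already lowercases names
        if tag in CLOSES_P and "p" in self.stack:
            self._close_to("p")
        if tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:  # stray end tags such as </br> are dropped
            self._close_to(tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape &, < and >

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    # DOCTYPE declarations are not handled, so they are silently dropped.

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def to_xhtml(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    return writer.result()


if __name__ == "__main__":
    print(to_xhtml("<P>one<P>two<BR>three"))
```

Running it prints `<p>one</p><p>two<br />three</p>`. A real converter (Tidy included) needs far more rules than this, such as table cells, list items, and nesting repair, which is exactly why building on an existing code base is attractive.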

When I first started designing ePubEditor, I made the conscious decision 
/not/ to try to write or integrate Yet Another HTML Editor (or Yet 
Another CSS Editor, or Yet Another JPEG Editor, or Yet Another SVG 
Editor, or Yet Another NCX editor, ...). While not a participant in the 
great vi vs. emacs religious war, I am aware of the history; I wanted a 
tool that could incorporate virtually any user's preferred HTML editor 
and not force her to accept my preferences. Thus, ePubEditor has an 
editable set of preferences in which a specific editor can be specified 
for each file's media-type.

Your comment got me thinking that perhaps in addition to 
media-type-specific editors I should have user-configurable, 
media-type-specific /transformers/ as well. So I added this 
configuration preference, together with a "Transform" button on the 
Manifest pane; these additions are included in the most recent changes I 
uploaded to SourceForge tonight.

The configuration dialog provides for a media-type, a transformation 
program, a program command line, a /transformed/ media-type, and a new 
extension. For example, in the case of FR output, I can set 
"text/html+vnd.abbyy" as the media-type, fr2html.exe as the program and 
"text/html" as the new media-type. Then I add the FR output file to the 
Manifest and set its media-type to "text/html+vnd.abbyy". When that file 
is selected and the "Transform" button is activated, my selected tool 
runs and transforms the file from FR format to XHTML. The media-type 
is then reset to "text/html", and I can either perform more 
transformations or open it in the associated "text/html" editor (in my 
case, Microsoft's Visual Web Developer Express).
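The transformer settings described above can be modeled roughly as follows. This is a minimal sketch, not ePubEditor's actual code: the class and function names, the `{input}`/`{output}` placeholders in the command line, and the rule that the output file takes the input's name with the new extension are all assumptions for illustration.

```python
import shlex
import subprocess
from dataclasses import dataclass, replace
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Transformer:
    media_type: str      # media-type this transformer accepts
    program: str         # executable to run
    command_line: str    # argument template; {input}/{output} are hypothetical placeholders
    new_media_type: str  # media-type assigned after a successful run
    new_extension: str   # extension given to the output file, e.g. ".html"


@dataclass(frozen=True)
class ManifestItem:
    href: str
    media_type: str


def build_command(t, item):
    """Return (argv, output_href) for running transformer t on a manifest item."""
    out_href = str(PurePosixPath(item.href).with_suffix(t.new_extension))
    args = [a.format(input=item.href, output=out_href) for a in shlex.split(t.command_line)]
    return [t.program, *args], out_href


def transform(item, transformers, run=subprocess.run):
    """Run the transformer registered for the item's media-type; return the updated item."""
    t = transformers.get(item.media_type)
    if t is None:
        raise LookupError(f"no transformer configured for {item.media_type!r}")
    argv, out_href = build_command(t, item)
    run(argv, check=True)  # raises CalledProcessError if the tool fails
    return replace(item, href=out_href, media_type=t.new_media_type)
```

With the FR example, a transformer `Transformer("text/html+vnd.abbyy", "fr2html.exe", "{input} {output}", "text/html", ".html")` applied to a manifest item `chapter1.htm` would run `fr2html.exe chapter1.htm chapter1.html` and hand back an item with href `chapter1.html` and media-type `text/html`, ready for another transform or for the `text/html` editor. Keying the table on media-type is what lets transforms chain: each run changes the media-type, which selects the next tool.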

As another example, if you had a corpus of works marked up with z.m.l., 
and had a script that would convert from z.m.l. markup to HTML markup, 
you could set that script as the transformer for files with the 
media-type of "text/plain+x-zml." You could add .zml files to the 
Manifest (setting the media-type appropriately, of course), and convert 
them to HTML using the transform function, resetting the media-type and 
renaming the file with a ".html" extension. You could then perform other 
transformations on those files, edit them with your choice of editor, or 
split them into more manageable segments using the "Split HTML" 
function. A simple "Save As..." and you would have an ePub file, 
although it would not be guaranteed to have valid content; for that, 
you would want to run the built-in "epubcheck" report.
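The only contract such a script needs is "read one file, write another". The sketch below shows that shape; it does NOT implement z.m.l.'s actual rules. Its stand-in rule (blank-line-separated blocks become paragraphs, soft line wraps are collapsed), the `zml2html.py` name, and the `INPUT OUTPUT` argument order are hypothetical, chosen only to show where real conversion logic would go.

```python
import html
import re
import sys


def zml_to_html(text, title="Untitled"):
    # Hypothetical rule set: every run of text separated by blank lines becomes one <p>;
    # line breaks inside a block are treated as soft wraps and collapsed to spaces.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paras = [
        "<p>" + html.escape(" ".join(block.split()), quote=False) + "</p>"
        for block in blocks
        if block.strip()
    ]
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head><title>" + html.escape(title, quote=False) + "</title></head>\n"
        "<body>\n" + "\n".join(paras) + "\n</body>\n</html>\n"
    )


def main(argv):
    # Invoked by a transformer entry as: zml2html.py INPUT OUTPUT
    if len(argv) != 3:
        sys.exit("usage: zml2html.py INPUT OUTPUT")
    with open(argv[1], encoding="utf-8") as src:
        text = src.read()
    with open(argv[2], "w", encoding="utf-8") as dst:
        dst.write(zml_to_html(text))


if __name__ == "__main__":
    main(sys.argv)
```

Because the script writes well-formed XHTML with the XHTML namespace, its output could go straight to the "text/html" editor or the "Split HTML" function; headings, italics and the rest of z.m.l. would be additional rules inside `zml_to_html`.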

Thanks for the idea.

[last of the snippage]
...
> so if you're one of the amateurs who are _struggling_
> with the proper creation of these files, lee's program
> would be a _godsend_ to you, saving time and hassle.
Thank you. That's what I am attempting to accomplish.

Lee Passey