Re: [gutvol-d] seriously, feel free to ask any questions

8 Dec 2011

      Hi Lee,

	For $100??? You can't be serious! Make that $1000 and I might think about it!

	If you had said substantially identical screen output, that would be simple enough.
	But substantially identical HTML? That is a very tall order. The problem is not the
	algorithm as such, but the heuristics involved in transforming the input into an
	intermediate form.
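
	To show what I mean by heuristics, here is a minimal sketch in Python of just the
	first step: guessing which markup a file uses before it can be turned into an
	intermediate form. The cue patterns and the name guess_format are my own
	illustration, not part of any existing tool, and real files will fool them
	(a row of dashes, for instance, is valid in both Markdown and reStructuredText).

```python
import re

# Hypothetical cue patterns, chosen only for illustration. Each one is a
# surface feature that is common in one format and rarer in the other.
CUES = {
    "markdown": [
        r"^#{1,6} \S",             # ATX heading: "# Title"
        r"\[[^\]]+\]\([^)]+\)",    # inline link: [text](url)
        r"^```",                   # fenced code block
    ],
    "restructuredtext": [
        r"^\.\. \w+::",            # directive: ".. note::"
        r"^(?:={4,}|-{4,}|~{4,})\s*$",  # section underline (ambiguous with Markdown rules)
        r"``[^`]+``",              # inline literal
        r"`[^`]+`_",               # hyperlink reference
    ],
}

def guess_format(text):
    """Return (best_guess, scores) by counting cue matches per format."""
    scores = {
        name: sum(len(re.findall(p, text, re.MULTILINE)) for p in patterns)
        for name, patterns in CUES.items()
    }
    best = max(scores, key=scores.get)
    # With no cues at all, fall back to treating the input as plain text.
    return (best if scores[best] > 0 else "plain"), scores

md_sample = "# Title\n\nSee [link](http://example.org).\n"
rst_sample = "Title\n=====\n\n.. note:: Hello\n\nUse ``code`` here.\n"

md_guess, md_scores = guess_format(md_sample)
rst_guess, rst_scores = guess_format(rst_sample)
plain_guess, _ = guess_format("Just some text.\n")
```

	Counting cues like this only picks a likely format. It says nothing about whether a
	given indented block was a quotation, a poem or a table in the original, and that is
	where the real guessing starts.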

	Anyone who has used commercial web editors can easily see that simple conversions
	can change the mark-up quite drastically for just subtle differences. A browser
	will render an almost identical display, yet the mark-up elements may be completely
	different.
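
	A small sketch in Python makes the point concrete. The two snippets are invented
	for illustration, and comparing the extracted text is only a rough stand-in for
	comparing what a browser would display. The class name Collector and the function
	summarize are likewise my own.

```python
from html.parser import HTMLParser

class Collector(HTMLParser):
    """Collect start-tag names and character data from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)

def summarize(html):
    """Return (visible_text, start_tags) with whitespace normalized."""
    collector = Collector()
    collector.feed(html)
    collector.close()
    visible = " ".join("".join(collector.text).split())
    return visible, collector.tags

# Two invented fragments that a browser would show the same way:
# one italic phrase followed by plain text.
a = '<p><em>Poor Richard</em> says so.</p>'
b = '<div><span style="font-style:italic">Poor Richard</span> says so.</div>'

text_a, tags_a = summarize(a)
text_b, tags_b = summarize(b)

same_text = text_a == text_b   # the reader sees the same words
same_tags = tags_a == tags_b   # but the element structure differs
```

	Here same_text is true and same_tags is false. A test for "substantially identical
	HTML" has to decide which of those two comparisons it cares about.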

	What would have to be done is a style analysis of the input. Maybe one could render the input to
	PDF and run that through a style analyzer for OCR. Sounds promising.

	regards
		Keith.

On 08.12.2011 at 03:39, Lee Passey wrote:
...
Here's my challenge to anyone who thinks this is an easy nut to crack.
I will take a moderately complex e-book (containing at least as much markup as PG 31103, which BowerBird seems to have a great deal of respect for) but also containing lists and tables (I'm thinking maybe the Autobiography of Benjamin Franklin, or maybe Pudd'nhead Wilson). I will mark those books up with reStructuredText, Markdown, and z.m.l. (just to keep the playing field level, I will start with identical HTML in all cases, and do an automated conversion).
I will pay $100 to the first person who can come up with a computer program that can reconstruct substantially identical HTML from those three text formats plus the Project Gutenberg version, without knowing which file is which and without human intervention.
When you have your program in a form that can be run by an independent third party, I will provide the files for testing.


Keith J. Schultz