Re: [gutvol-d] a review of some digitization tools -- 022

30 Dec 2011

      ...
...
pandoc -f html -t epub -o test.epub 38423-h.htm

Lee> Did you try the reverse process? That is, converting from HTML to one
of the subtle markup languages, like ReStructured Text or Markdown? If so,
does the output appear to be something that would pass the whitewashers'
scrutiny?

At your request I tried the following output formats:

RST: pandoc churns for a couple of minutes and then dies a horrible death
from memory exhaustion.

MARKDOWN: pandoc produces something that looks text-like and has forced
line breaks similar to PG txt70, but it doesn't particularly look like
markdown to me.  It would have to be extensively edited to pass muster with
the WW.

PLAIN: pandoc produces something that looks text-like and has forced
line breaks similar to PG txt70, but it doesn't particularly look like PG
txt70.  It would have to be extensively edited to pass muster with the WW.

RTF: pandoc produces gibberish which none of my RTF programs recognize as
being RTF.
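
For anyone who wants to repeat the experiment, the four conversions above
can be sketched as a small script. This is only a sketch: the input name
comes from the epub command quoted earlier, while the "test.<format>" output
names and the pandoc_cmd helper are placeholders invented here, not anything
from the original runs. The script prints the commands rather than running
them, so they can be inspected first.

```shell
#!/bin/sh
# Input file name taken from the epub command quoted earlier in this message.
INPUT="38423-h.htm"

# Print (do not run) the pandoc command for one output format.
# Hypothetical helper; the "test.<format>" output names are placeholders.
pandoc_cmd() {
    echo "pandoc -f html -t $1 -o test.$1 $INPUT"
}

# The four writers tried above: rst, markdown, plain, rtf.
for fmt in rst markdown plain rtf; do
    pandoc_cmd "$fmt"
done
```

Piping the script's output to sh would actually run the conversions. One
thing that may be worth trying for RTF: by default pandoc writes a document
fragment, and its -s (--standalone) option adds the header and footer.
Whether that accounts for the unreadable RTF reported above has not been
checked.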

In summary, the "plain" output format might be vaguely useful to some people
as an aid in moving from html-based development to txt70 in order to jump
through the hoop.


Jim Adcock