[gutvol-d] Re: In search of a more-vanilla vanilla TXT

15 Sep 2009


      Hi Everybody,

	I will step in here for a moment.

	As Bowerbird has mentioned, this discussion is as old as PG itself.

	The problems are:
		1) Plain Vanilla Texts cannot reproduce books (they are not meant to).
		2) PG does NOT have a comprehensive format for reproducing books.
		3) PG has not evolved with modern computer technology.
		4) Everybody wants their pet format for reading.
		5) PG does not have a consolidated following willing to build the
		   resources needed to solve the above.

	There are many reasons for the above problems. Yes, there ARE and have
	been efforts to solve them. Yet none of these has borne much fruit or
	been able to satisfy the needs of all of PG's contributors and users.

	So what is needed:
		1) A single modular and extensible format for encoding the books
			a) the structures in the book (text) need to be represented
			b) it does not presume a particular output format
			c) does not care about the size of files
			d) does not need to be easily readable

		2) a parser for creating output formats
			a) uses all information to create the best possible output for a
			   particular format

		3) an editor
			a) display the book
			b) allow for changes in the representation of the book
			c) must be modular and extensible

		4) a parser for creating the representation of the book in the
		   format from scans
			a) must be modular and extensible
			b) must be multi-pass
			c) flags possible conflicts with the format
			d) intelligent enough to do most markup by itself
			e) intelligent enough to correct common errors by itself

		5) parsers for converting older formats
			a) all of 4)
			b) does not expect particular information
			c) allows for presets in order to save time and get the desired
			   representation

		6) a proofing workflow
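
	To make point 4) a little more concrete, here is a minimal sketch in
	Python of what a modular, multi-pass parser could look like. Everything
	in it is an assumption made for illustration: the <head> tag, the tiny
	table of OCR misreadings and the names of the passes are hypothetical
	and not an existing PG tool or format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    """Text of one scanned page plus the notes collected by the passes."""
    lines: list
    flags: list = field(default_factory=list)


# Hypothetical table of common OCR misreadings; a real one would be far larger.
COMMON_ERRORS = {"tbe": "the", "aud": "and", "wbich": "which"}


def correct_common_errors(page):
    """Pass 1: replace whole words that are known OCR misreadings."""
    pattern = re.compile(r"\b(" + "|".join(COMMON_ERRORS) + r")\b")
    page.lines = [
        pattern.sub(lambda m: COMMON_ERRORS[m.group(1)], line)
        for line in page.lines
    ]
    return page


def mark_up_headings(page):
    """Pass 2: wrap lines that look like chapter headings in a <head> tag."""
    heading = re.compile(r"^CHAPTER\s+[IVXLC\d]+\.?$")
    marked = []
    for line in page.lines:
        stripped = line.strip()
        marked.append(f"<head>{stripped}</head>" if heading.match(stripped) else line)
    page.lines = marked
    return page


def flag_conflicts(page):
    """Pass 3: flag characters in the text that collide with the markup."""
    for number, line in enumerate(page.lines, start=1):
        bare = re.sub(r"</?head>", "", line)  # ignore tags added by pass 2
        if any(ch in bare for ch in "<>&"):
            page.flags.append((number, "unescaped markup character"))
    return page


# The pipeline is just a list, so passes can be added, removed or reordered.
PASSES = [correct_common_errors, mark_up_headings, flag_conflicts]


def run_passes(lines, passes=PASSES):
    """Run every pass in order over a copy of the input lines."""
    page = Page(list(lines))
    for one_pass in passes:
        page = one_pass(page)
    return page


result = run_passes(["CHAPTER I.", "It was tbe best of times, aud 2 < 3."])
```

	Because each pass only takes a page and returns a page, a new pass can
	be dropped into the list without touching the others, which is the
	modularity asked for in 4a).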


	So what do we have? We need a format that is not based on an existing
	format and that is modular and extensible. Either we start from scratch
	or we use a generic format. SGML or XML come to mind. We can then put in
	what we want and need, have a well-structured format, and extend it
	easily, and it is modular. Plus, XML can handle all kinds of information
	and data.
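
	As a rough illustration of the extensibility argument, the sketch below
	parses a small XML sample with Python's standard xml.etree.ElementTree
	module and separates core elements from extension elements. The element
	names (book, chapter, head, p, emph), the pg: prefix and the
	example.org namespace are all made up for this example; they are not a
	proposed PG vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a few core elements plus one element added by an extension.
SAMPLE = """\
<book xmlns:pg="http://example.org/pg-extension">
  <title>An Example Book</title>
  <chapter n="1">
    <head>Chapter I</head>
    <p>Some <emph>emphasised</emph> text.</p>
    <pg:marginnote>Added later by an extension module.</pg:marginnote>
  </chapter>
</book>
"""

# Assumed core vocabulary; anything else is treated as an extension.
CORE_TAGS = {"book", "title", "chapter", "head", "p", "emph"}


def split_core_and_extensions(xml_text):
    """Return (core tags, extension tags) in document order."""
    core, extensions = [], []
    for element in ET.fromstring(xml_text).iter():
        (core if element.tag in CORE_TAGS else extensions).append(element.tag)
    return core, extensions


core, extensions = split_core_and_extensions(SAMPLE)
```

	A tool that only knows the core elements can still read the whole file
	and simply skip what it does not recognise, so new markup can be added
	without breaking older tools.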

	Yes, we have to reinvent the wheel for markup, but we want a
	representation that contains as much information as possible. The
	question is how much is needed. At the least, the markup will be a
	layout format.

	It should only take about a month to create such a format. The other
	parts will take a little longer. The important thing is that everything
	has to be centered around the representation format and not the output.
	The output is handled by parsers. Whether a particular output format can
	handle or represent a particular feature cannot be a concern of the PG
	internal representation. The developers of an output format can convert
	it to whatever they see fit.
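
	The split between one internal representation and many output parsers
	could be sketched roughly as follows. This is only an illustration
	under assumptions of my own: the chapter, head, p and emph elements and
	the two renderers are hypothetical, and each renderer decides for itself
	how to show emphasis.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical internal representation of a very small book fragment.
SOURCE = (
    "<chapter><head>Chapter I</head>"
    "<p>Some <emph>emphasised</emph> text.</p></chapter>"
)


def inline_text(element, emph_open, emph_close, quote=lambda s: s):
    """Flatten one block element, wrapping <emph> children for the target format."""
    parts = [quote(element.text or "")]
    for child in element:
        inner = quote("".join(child.itertext()))
        parts.append(emph_open + inner + emph_close if child.tag == "emph" else inner)
        parts.append(quote(child.tail or ""))
    return "".join(parts)


def to_plain_text(root):
    """Plain text: headings in capitals, emphasis as _underscores_."""
    blocks = []
    for block in root:
        text = inline_text(block, "_", "_")
        blocks.append(text.upper() if block.tag == "head" else text)
    return "\n\n".join(blocks)


def to_html(root):
    """HTML: headings as <h2>, emphasis as <em>, text escaped."""
    tags = {"head": "h2", "p": "p"}
    blocks = []
    for block in root:
        tag = tags.get(block.tag, "div")
        body = inline_text(block, "<em>", "</em>", quote=escape)
        blocks.append(f"<{tag}>{body}</{tag}>")
    return "\n".join(blocks)


# New output formats are added here; the source format never changes.
RENDERERS = {"txt": to_plain_text, "html": to_html}


def render(xml_text, fmt):
    """Parse the internal representation and hand it to the chosen renderer."""
    return RENDERERS[fmt](ET.fromstring(xml_text))


plain = render(SOURCE, "txt")
html_out = render(SOURCE, "html")
```

	The source knows nothing about underscores or <em> tags; those choices
	live entirely in the renderers.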


	regards
		Keith.

Keith J. Schultz