
hmonroe>I would appreciate some guidance on what adjustments would need to be made to the HTML produced by Guiguts to make it more amenable to conversion to epub format by epubmaker or otherwise.

With apologies, it's been a couple of years since I looked at guiguts, so instead of guiguts-specific comments I will offer general comments about what I see "broken" in many of the html files on PG re "epub" -- including recent html submissions. I hope, perversely, that some of these problems are in fact coming from guiguts -- in which case "we" have some hope of solving them!

Now by "epub" I mean "anything that is submitted to PG in html form, which PG then converts to some flavor of epub and presents to the PG customer, OR which PG further converts from epub via kindlegen to target some flavor of mobi reader before presenting it to the customer." If you read the "epub" specs carefully I think you will find that "epub" includes devices such as mobi devices, where the "epub" is in turn compiled to an intermediate form (mobi, or kf8) [via kindlegen] before being presented to the final customer device.

1) There is one very mobi-specific problem, which can be decomposed into two parts:

a) Many mobi devices round vertical distances to the closest 1.0em, whether or not those distances were specified in ems.

b) Many mobi devices DO NOT merge top and bottom margins but rather follow the archaic approach of ADDING those margins.

Where this bites PG customers very, very frequently is when submitted HTML contains some form of "splitting the baby," i.e. specifying both top and bottom margins -- especially when those top and bottom margins are applied to the ubiquitous case of <p> paragraphs. For example, a "p { margin-top: 0.5em; margin-bottom: 0.5em; }" specification LOOKS innocuous, but in fact results in a *** 2.0em *** spacing between paragraphs on most mobi devices!
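
To make the arithmetic behind that 2.0em figure concrete, here is a minimal sketch written out as CSS comments. It assumes a device that behaves exactly as in a) and b) above (each vertical margin rounded to the nearest 1em, with 0.5em rounding up, and adjacent margins added rather than merged); real devices may well differ. The one-sided rule at the end is only an illustration of not splitting the margin, and has not been tested on any particular device:

```css
/* The "split the baby" rule from the example above. */
p { margin-top: 0.5em; margin-bottom: 0.5em; }
/* Desktop browser: the bottom margin of one paragraph and the top margin
   of the next collapse into a single 0.5em gap.
   Assumed round-and-add device:
     bottom margin 0.5em -> rounded to 1em
     top margin    0.5em -> rounded to 1em
     gap between paragraphs = 1em + 1em = 2em */

/* Illustrative alternative: put the whole gap on one side. */
p { margin-top: 0; margin-bottom: 0.5em; }
/* Desktop browser: still a 0.5em gap.
   Assumed round-and-add device: 0 + 1em = 1em gap. */
```
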
While ugly, even that is still not a killer unless the book being converted contains lots of dialog exchanges, in which case one gets a book which contains one line of dialog, followed by two lines of nothingness, another line of dialog, followed by two lines of nothingness, etc. -- which in practice one reads in the prosody of extremely slow, sluggish, stupid protagonists -- which is presumably NOT the original author's intent!

The only "reasonable" way around this problem is to understand the limits of these devices. For example, it is easy to "prove" in practice that changing the original specification of p { margin-top: 0.5em; margin-bottom: 0.5em; } to p { margin-top: 0.51em; margin-bottom: 0.49em; } is sufficient to "completely" solve the problem on many html books using this particular "split the baby" approach [which was really not needed in the first place!], presumably because 0.51em rounds up to 1.0em and 0.49em rounds down to nothing, leaving a single 1.0em gap.

1b) Actually there are many "epub" reader apps [on Android, say] which choose not to implement larger or smaller parts of the "html" spec, resulting in many of the same problems described as "mobi specific" in 1) above.

1c) I will not comment on how Apple chooses to implement "epub." Read Elizabeth Castro's blogs and books on this subject: she says it better than I could hope to.

2) The great majority of the remaining problems I commonly see are neither mobi-specific nor epub-specific. Rather, they are simply problems where the original author of the html mentally conceptualized the rendering of that html ONLY on their desktop monitor of, say, 20" width and 2048 pixels horizontal resolution, and didn't bother to think what would happen to that conceptualization when rendered on a device which has, say, a 3" horizontal width and 480 pixels of resolution. And/or they don't think about how their color choices will render on monochrome devices. Or frankly, they don't care.
Or frankly, they are openly hostile to PG customers who own one or another different "flavor" of machine and are going out of their way to sabotage owners of such machines, rather than trying to do "write once, read everywhere."

They could discover most of these problems for themselves, should they choose to fix them, simply by setting the window size of their html browser to, say, literally 3" wide by 4" high *without* making any other changes, such as changes in font size. Is what you see there still beautiful? No? Then you have written broken HTML code. Good HTML code still looks beautiful even when displayed in that small a window without making any font size changes or any other changes. And without doing any horizontal scrolling. Not "passable." Not "broken but who cares." But rather: "still beautiful even when viewed through a non-scrolling window port of 3" wide by 4" high and without changing font sizes."

3) A Common Problem: The original html author has a really wide 20" screen where the width is much greater than the height. They maximize their html browser and then try to read their work -- which looks really, really ugly on their screen with its 16 wide by 9 high aspect ratio. So they stick in a "body { margin-left: 16%; margin-right: 16%; }" statement which makes them feel better about how their work looks on their machine. They could have specified these margins in their own HTML browser preferences -- but most people don't know how to do that. Or they could demaximize their browser window and choose a window shape more closely simulating the shape of a typical printed book page. But they don't do that. OK, now how does this html display on our hypothetical small machine? The original 3" width display is now effectively only about 2" wide -- guaranteeing an unhappy PG customer! The 480 horizontal pixel resolution is now only roughly 320 pixels wide -- NOT a happy camper!
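
The numbers in 3) work out as follows, with one possible alternative sketched underneath. The max-width rule is an assumption for illustration only: the 36em figure is an arbitrary placeholder, and max-width support on older readers is not guaranteed and would need testing on real devices:

```css
/* The rule from 3): 16% + 16% = 32% of the width is spent on margins.
     3" screen:      3" x 0.68 = about 2.04" left for text
     480px screen:  480 x 0.68 = about 326px left for text */
body { margin-left: 16%; margin-right: 16%; }

/* Illustrative alternative: cap the line length instead of reserving a
   percentage of the screen. In a wide window the text column is centred;
   on a screen narrower than the cap the auto margins shrink to zero,
   so the small device keeps its full width. */
body { max-width: 36em; margin-left: auto; margin-right: auto; }
```
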

4) Related problem: Gee, now that I the html author have all this extra left and right margin space -- how about if I get really clever and stick page numbers in that space!? Wouldn't that be a contribution to humanity? Answer: Now the author has created a justification for mandating those large margins, making it very difficult to remove them. Some small machines automatically remove these margins, or give the device owner manual control to remove them. Do those page numbers then silently and thankfully get dumped into the big bit bucket of bad ideas in the sky? Nope. Instead, those page numbers now get rendered ON TOP OF the text the customer is trying to read. OR those page numbers get rendered randomly INTO the body of text being read: "Four score and seven years ago our fathers brought forth on this continent, a new natPage 6ion, conceived in Liberty, and dedicated to the proposition that all men are...."

5) Related problem: I the html author have this text from the mid-1800s which contains a quasi-illuminated letter as the first letter of each chapter, and I think it is really important to render my html as close to the original text as possible -- in spite of the fact that the quasi-illuminated letter is a stock printers' image, the "clip art of the day," which has no relationship to the text in question -- the book being transcribed is an "Everyman's Library" edition where the publisher thought he could gin up Xmas sales by sticking in "Red Letters" and gilding the binding for people buying Xmas presents to provide pretty padding for their library shelves. So I will stick in a gif of the quasi-illuminated letter doing a "float left" at the start of my paragraph. Q: Does this work? A: Not if you have any artistic sense.
In html there is no fixed relationship between the float and the rendered "normal" paragraph text, so even if it "looks good" on your particular choice of html browser on your particular desktop machine, that's just "dumb luck." And this has no chance of looking attractive on small machines, simply because the screen size relationships are totally different. And "float left" is really just another variation on the left and right margins problem listed above -- the three or four lines of text running to the right of the float are now reduced to 2" in length, guaranteeing they will look stupid, if not actively broken. And this will probably screw up accessibility.

6) Related problem: OK, not a gif-letter, but I still want to make that first letter red and 3em in height to make it really dramatic, and I want it to be a "real" dropped cap, so I will again do a float-left and I will play around with negative margins to make it drop. Q: Good idea, right? A: Again, there is no fixed relationship in html between your float-left and your following paragraph, so if it "works" on your desktop and browser it's just "dumb luck" and won't work on anyone else's browser, and certainly NOT on their small machine, and will look ugly and silly on these other machines -- if not totally broken. And it will probably screw up accessibility.

What you *can* probably "make work" is to enlarge the first letter (and/or word) of the first paragraph of each chapter using proper CSS, if you make it only *marginally* larger, say 2em, and you don't make it red, and you don't attempt to "drop" it, and if you use the CSS to tweak the line spacing a little bit for the first line of the first paragraph after a chapter break. But is it *really* worth it? And does it *really* make "artistic" sense??
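
The modest version described above might look something like this minimal sketch. The class name firstletter and the sample markup in the comment are hypothetical, and the exact values would need testing on real devices before anyone relied on them:

```css
/* Hypothetical markup for the first paragraph of a chapter:
     <p><span class="firstletter">F</span>our score and seven years ago ...</p> */
span.firstletter {
  font-size: 2em;   /* only marginally larger; not a true dropped cap */
  line-height: 1;   /* limits this span's inline box to its own font size,
                       so the first line is stretched no more than the
                       letter needs */
  /* deliberately no float, no negative margins, no color */
}
```
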

[Aside: A "real" dropped cap, if one were to read typographers' texts, whether of the "illuminated" or plain variety, is an enlarged letter whose baseline exactly matches the baseline of the 2nd or 3rd line of the paragraph. Its spacing has to be carefully adjusted so that the visual space to the right of that dropcap "looks right" in relationship to the following paragraph text. Even in the "good old days" typesetters had trouble making this look right, relying literally on a metal file to adjust the size of the drop cap to fit correctly. Good luck making it "fit right" in HTML -- there are no metal files in CSS.]

7) Related problem: The original text I am transcribing is an 1800s "photocopy" of a 1600s text in 9" x 14" size which contains editorial notes "in the margin," which I want to reproduce literally in the left margin of my html text. And of course I still want my page numbers in the right margin. I am not greedy -- I will only take 1" of left margin for my editorial notes and 1" of right margin for my page numbers, because I want my HTML to look just like the original. Q: Good idea, right? A: On our hypothetical 3" wide small device, your decision to take 1" of left margin for float-left editorial notes and 1" of right margin for float-right page numbers leaves the owner of our 3" small device with exactly 1" of effective display on which to actually read their choice of book. PG customer not happy -- what an ingrate!

8) Related problem: Producers of small devices, or of reader software for those devices, who run into these problems often enough decide they are simply better off NOT implementing some of this html "stupidity" and silently ignore the most troublesome html tags. With the tags ignored, the body text associated with those tags now gets inserted "at random" -- but at least the customer can read it!
And now the more technological customers (and authors) for those devices complain about how those devices "...don't implement all of the HTML tags in version 9.99.999 of the global HTML standard. How dare they!"

9) Related problem: poetry. There is no good way to display poetry in HTML. There is no good way to line-wrap-and-poetry-indent. Many people propose that they have "solved" this problem using negative margins. The behavior of negative margins is extremely problematic on many small machines. For example, you assume that you have a positive margin in which to poke back your negative margin. Except the small machine has been set in a mode to discard margins, so now your negative margin pokes off the screen. Except the small machine software isn't written to even conceive of that possibility. Or maybe in theory all the margin settings should add together, negatives and positives, to form an overall positive margin "which ought to work right." Except someone writing the device software has decided that there is no way that some intermediate term in the margin calculations should ever go negative, and if it does that's a bug that needs to be caught, so they turn that negative intermediate calculation into the number "0" -- and again, your carefully crafted poetry scheme is busted.

I have seen some schemes that use block inside of span which look like they might work if developed carefully to avoid negative margins. But one would have to test these ideas on a large number of actual machines while developing them. And the transcriber would have to understand the poem in question well enough to see that they shouldn't be trying to transcribe literally all the line-breaks and indents of the printed page in the first place -- because some of those indents and line-breaks are simply limitations of the original paper page width imposed by the printer and publisher on the original poet.
Except sometimes poetry *IS* literally the form of the visual image on the page.

10) And finally, but not finally, PG automagically inserts archaic boilerplate which breaks horribly on many if not most customer machines in the first place -- and insists on making this boilerplate the first thing the customer sees. Negative impressions first, please! Etc.

PS: The really sad thing is that it is really simple to do a really good job of rendering most books into html that looks attractive and is quite readable on "all" machines, including epub and mobi machines. The problems arise when transcribers try too hard to be too "clever" and too "literal" in their transcription. This leads to the over-general observation: "Naïve transcriptions look good, sophisticated transcriptions look bad."

PPS: Literally try opening some recent PG submissions in HTML mode on your desktop or laptop, setting the window port size to 3" wide by 4" high, and see what happens -- horizontal scrolling is cheating! Hmm, I grab, literally at random: http://www.gutenberg.org/files/38593/38593-h/38593-h.htm -- and does it work? Nope. Ugly and broken.