June 2005 - gutvol-d - lists.pglaf.org

www.gutenberg.org
by Marcello Perathoner 25 Dec '05

25 Dec '05

As of my request ibiblio has changed our apache virtual host ServerName from gutenberg.net to www.gutenberg.org. What? Nothing changes in web site operation except one little particular (if you hit a non-existing url you get redirected to www.gutenberg.org instead of gutenberg.net) Why? The .org top-level domain is meant for non-profit organizations while the .net domain is meant for network infrastructure such as providers, backbones etc. We own both gutenberg.net and gutenberg.org, but using the latter one is just more standard-compliant. Although both urls www.gutenberg.net and www.gutenberg.org give exactly the same results, you should start using www.gutenberg.org in all publications, papers etc. -- Marcello Perathoner webmaster(a)gutenberg.org

3 2

PG Cookbook
by Aaron Cannon 30 Jun '05

30 Jun '05

I was thinking a few days ago about how PG has attracted volunteers from all over the world from different backgrounds, and I got to thinking that it might be kind of fun if PG were to compile a cookbook containing the favorite recipes from our volunteers. Since, in most cases, recipes can't be copyrighted, there shouldn't be any problem in that regard. I'll bet we could get a pretty sizable and diverse collection if we put the word out on this list and at DP. Anyway, it's just an idea. Thoughts? Any interest? Sincerely Aaron Cannon -- E-mail: cannona(a)fireantproductions.com Skype: cannona MSN Messenger: cannona(a)hotmail.com (Do not send E-mail to the hotmail address.)

7 8

Re: [gutvol-d] PG Cookbook
by Gutenberg9443＠aol.com 29 Jun '05

29 Jun '05

Announcement: I am not going to edit that cookbook. If I ever edit a cookbook, it will be one I wrote. However, for anybody who actually makes a decision to make a cookbook, here is my recipe. I call it I-am-worn-out-and-it-is-hot-as-h***-high-protein high-fiber salad: Chill one can of pork'n'beans. Chill one can of whole-kernel corn. Drain corn and empty into large bowl. Add undrained pork'n'beans. Chop up however many tomatoes and fresh onions you want to put in it. Add black olives and/or green stuffed olives and/or whatever else you want. Add celery or lettuce or whatever else you want. Toss in mayonnaise or Russian dressing or Italian dressing or whatever else you want. Eat with corn chips or potato chips or no chips at all. I am the only person I know who eats this. Everybody gives me that "are you out of your mind" look if I tell them about it. But the combination of beans and corn creates complete protein. Whatever veggies you decide to put in it are, of course, veggies. So it's a reasonably high-nutrition meal. You might drink milk with it or add diced cheese, as it is low in calcium. Or you might get your calcium by eating Tums after it, if your digestion isn't as fond of fiber as mine is. Anne Anne Do you like to breathe? Then save the trees! Begin a personal relationship with an ebook TODAY!

1 0

re: [gutvol-d] Greetings ebook makers ;)
by Bowerbird＠aol.com 29 Jun '05

29 Jun '05

jeffery said: > This would be the first sign of a response > from the gutenberg community. well, "the gutenberg community" doesn't consider me to be a part of it, so i guess you are still waiting for them. :+) in fact, since you've now soiled your trousers by even speaking to me, they will probably tell you that they are ignoring you for _that_ reason. > Do you have any comments or suggestions? motivate yourself, because they won't give you any help. :+) -bowerbird p.s. sorry i spelled your name wrong before...

1 0

re: [gutvol-d] Greetings ebook makers ;)
by Bowerbird＠aol.com 29 Jun '05

29 Jun '05

jeffrey said: > Hello fellow ebook creators, hello jeffrey. did anyone respond? -bowerbird

2 1

Amazon offers 1082 volume Penguin Classics for $7,989
by John Hagerson 27 Jun '05

27 Jun '05

http://slashdot.org/article.pl?sid=05/06/27/0632258&from=rss

2 1

Amazon offering of the complete "Penguin Classics Library"
by Jon Noring 27 Jun '05

27 Jun '05

Refer to: http://online.wsj.com/public/article/0,,SB111921715006463546-S0zI_EVookezth… Fair Use snippet from above article: "We get a lot fewer random Amazon.com links sent to us since the great Henry Raddick stopped writing book reviews, something we're still mourning. But this one was jaw-dropping: The Penguin Classics Library Complete Collection, consisting of 1,082 books. List price: $13,317.74. Discount price: $7,989.99. Never has a 40% discount seemed quite so weighty." I think the interest to PG and DP is obvious. :^) Jon Noring

1 0

Re: [gutvol-d] PG Cookbook
by Tony Baechler 27 Jun '05

27 Jun '05

Hello. Well, while I don't specifically have any favorites, I have over 163,000 recipes I would be willing to donate if that helps. I have no idea of the copyright status of them though. Also, could you please elaborate on why recipes can't be copyrighted? Specifically, could you please tell me in which cases recipes can be protected by copyright? I have thought for many years about making recipes, either individually or in cookbook form available in Braille or similar formats for the blind, but I was always worried about the legal issues. The laws are very specific on how copyrighted works may be put into formats such as Braille and I have no money or means to defend myself in case of suits. You may write off list if you would like.

3 2

Derivative works, or, what is copyrightable?
by Robert Cicconetti 26 Jun '05

26 Jun '05

I've been going through my files trying to close out some partially finished projects. I have several Beatrix Potter books that had missing or damaged pages, and I went to the library today to try to fill in the blanks. Unfortunately, they only had the newer editions with a modern copyright, claiming a copyright because they had made a new transfer of the old watercolors. As far as I understand copyright law, this claim is bogus; a derivative work must be different enough from the original to be considered a new work; a slight technical improvement on the reproduction is not enough. An original lithograph, sure, but not making new screens. This may or may not be complicated by the fact that the publisher operates both out of the UK and the US. Is my understanding correct enough to go through with an official clearance request? Or shall I hunt for older copies? Potter books are not rare, but finding the older ones is more difficult. Thanks, R C

3 2

re: [gutvol-d] header detection revisited
by Bowerbird＠aol.com 23 Jun '05

23 Jun '05

lee said: > The question is a bit ambiguous. only if you haven't been following the drama for the last year-and-a-half. welcome to this listserve, lee. have you dropped the handle for good now? it's been some time since we chatted, especially frontchannel... > What are you trying to detect headers _from_? > AFAICT, Gutenberg e-texts don't have big and don't have bold, > so neither can be the hallmark of a header in Gutentexts. that's right. so for that i need to call on some of the other items in my 30-item checklist. the very best way to detect headers in a p.g. e-text is to test for blank-lines above the line in question. three blank lines will grab almost all of the headers, as well as a dose of false-alarms. the job then is to toss the false-alarms, and to do the best job possible of discerning the missed headers. and actually, in perhaps 25%-30% of project gutenberg's e-texts, pulling lines that start with "chapter" will net most headers. :+) > Presumably, therefore, you are trying to detect headers in some > marked-up text that uses some sort of presentational markup. "markup" doesn't usually enter into the equation. it can, of course, but if something has been marked up, a good way to find the headers is to examine the markup. nonetheless, i _can_ use my system on the _presentation_ of text that has been marked-up; many of my examples will be just that. as such, it can be used in cases where the mark-up is not available, for one reason (print) or another (.pdf), but its presentation is. but of more direct concern to this listserve, however, is its application toward the task that many people here do, which for the most part is to digitize text from scans of paper-books. a routine that recognizes headers in o.c.r. output -- because they are relatively big and/or set in bold -- saves the digitizer from that chore. i haven't discussed the importance revolving around header-recognition, so that might not seem like a big deal. but it is indeed rather important. (any e-book programmer, like yourself, lee, knows why it's important.) and, getting back again to the existing e-texts -- some 16,000+ now -- a routine for determining the headers in them would be quite valuable... if you're looking for a general overview, i focus on 3 distinct arenas: 1. strict z.m.l., where header-structure is defined by certain rules. 2. "fuzzy" mode, where texts are somewhat consistent, but not always. 3. "wild" texts, where all bets are off and you do the best that you can. project gutenberg's e-texts generally fall in the second category. as the examples i give will show, it would be relatively easy for me to make software that inputs text from the second category and then modifies it and outputs a file conforming to the strict first category. but nobody from project gutenberg took me up on my offer to do that... i've done enough work on arena #3 to know that it will be possible, although you can't expect perfect output from the tool on a wild text. i largely abandoned arena #2 when project gutenberg people passed, although there will be wide-ranging applicability of this arena on texts with some kind of regularity in them, such as listserve digests. but my main focus now is on spreading the gospel of arena #1 -- z.m.l. in z.m.l., headers are indicated simply by having blank lines above them. (and the more blank lines, the higher the priority-level of the heading, so it's a cinch to handle even the most complex of heading-structures.) this simplicity means that it's easy to write fast code to find headers in a z.m.l. file, and it's simple for users to understand how to make 'em. there is still a big explosion of self-publishing that will be happening, and i want to spare all those new writers the pains of doing mark-up. i'd much rather have them concentrating on their _content_ instead! once i've got all the tools in place to do what i want with arena #1, i'll return to arena #3. being able to take text "from the wild" and ascertain its underlying structure, and then output it in strict z.m.l., so it can be handled with my tools, will be an awesome achievement. again, this is an arena where markup is impractical, perhaps impossible. consider all the content that is being generated _every_single_day_ on yahoogroups. nobody's going to mark-up all that content, so we need to have a way of pulling it into our e-books and have it be nicely formatted. > Given your assumption that headers are > 1. conspicuous, > 2. hard to miss, and > 3. easy to find > (all variations on a theme) thanks for noticing the theme... ;+) but it's not really an _assumption_. (nice try to spin it that way, though.) it's actually an _observation_ on the very _nature_ of _being_ a _header_, one of those things that seems totally obvious once realized and verbalized. and of course, once you have realized that headers are _hard-to-miss_, it becomes very silly to maintain that it is "impossible" to detect them. of course you can detect them -- because they stick out like sore thumbs! > it seems to me that the best way to detect a header is to > determine the general characteristics of the majority of > all paragraphs in a document (size, indentation, amount of > punctuation, location of punctuation, capitalization, etc.) and > identify as headers any "paragraphs" which fall way outside the mean. now you're thinking. looks like you're on your way to replicating my 30-item checklist. > I presume you have a reliable way to identify paragraphs > (not always possible when using text derived from PDF files). well, yes. and the fact that text copied out of a .pdf loses its blank lines -- which then makes paragraph-detection exceedingly more difficult, -- does indeed make the detection of headers more difficult as well. which means you have to solve the paragraph-detection problem first, as best as you can, anyway, with text that you've copied out of a .pdf. restoring the paragraphs is a much bigger task than detecting headers. if you can't perform that hard task for end-users, why do the easy one? but the solution isn't as hard as you might think, although it's not 100%. when i'm done discussing headers, if you want to discuss this, we can... and besides, dealing with text copied out of a .pdf is not a high priority. the best way to deal with _that_ kind of text is to go to the producer and say, "can i instead have the file that you used to produce the .pdf?" but even without having solving this .pdf paragraph-detection problem, -- i.e., with all blank-lines removed -- my checklist does pretty well... > Consider the shortest verse of the Bible: "Jesus wept." > Biblical verses are merely numbered paragraphs. > Can your algorithm determined that it is a paragraph and not a header? um yeah. "headers" in the bible are "paragraphs" that are not numbered. and -- as you yourself just pointed out -- the actual verses are. voila. > This is the problem of the false positive: > it is as important to identify not-headers as it is to identify headers. yes it is. and much of the 30-item checklist is attuned to that issue. once you've accepted that this is part of the job, it's not all that hard. > You would be much more likely to > increase your list of special cases > if you would share the thirty-odd > special cases you have already identified. i haven't identified "thirty-odd special cases". i've abstracted 30 rules that act in combination to answer the question at hand -- is this a header? and it wasn't that hard. you can probably come up with 10-15 right off the top of your head, without even thinking too much. and if you subjected those to empirical testing on lots of e-texts, as i have over the course of the last 2-3 years, you would probably discover the rest of my 30 items. and then you too would be saying, "it's not impossible, folks, and in fact, it's not even all that difficult." there's no magic here. just hard work... -bowerbird

1 0