
ok, first i wasn't gonna even write this. but then i decided that i had to write it. and then i decided that i wouldn't send it. but now i've decided that i have to send it. but oh lord, i am so tired of the merry-go-round, and re-assert my resolution to get off of the thing. i mean, i do love to say "i told you so", a lot, but after a while, even _that_ gets old. but you guys need to be reminded of the facts of this matter... so i must... *** al said:
This is the convention used by Joshua Hutchinson, when he provided page scans from DP:
oh please. these lame attempts to rewrite history by reporting it selectively are a demand for a slapdown. that file-naming convention is one that _i_ developed, which josh then bastardized because he didn't grok it. and let's not forget that i had to rail for _years_ to install this common-sense file-naming convention, fighting _against_ josh and david "donovan" garcia and _many_ others over at distributed proofreaders, where some idiots _still_ use other naming schemes since d.p. doesn't demand consistent common-sense. (just in case you thought that flaw was unique to p.g.) the archives will prove that this fight lasted for years. ** so, what was the fight about? the basic notion is that the filename for a scan should include the pagenumber of the page the scan captured. d.p. doesn't require this, so there was some question about whether p.g. should also be lax in that regard, and whether d.p. should be "encouraged" to change... as i recently showed, d.p. still allows bad file-naming. you might remember i pointed to scans from the first 8 pages of "when it was dark", which is now at d.p.:
http://zenmagiclove.com/pgstupidity/012.png
http://zenmagiclove.com/pgstupidity/013.png
http://zenmagiclove.com/pgstupidity/014.png
http://zenmagiclove.com/pgstupidity/015.png
http://zenmagiclove.com/pgstupidity/016.png
http://zenmagiclove.com/pgstupidity/017.png
http://zenmagiclove.com/pgstupidity/018.png
http://zenmagiclove.com/pgstupidity/019.png
so the scan for page 1 is named "012.png". and the scan for page 2 is named "013.png". and the scan for page 3 is named "014.png". and so on. it's smarter to name the scan for page 1 as "001.png". page 2 will be "002.png", and "003.png" is page 3, etc. and yes, people fought against me for _years_ on this.

***

that's right -- i had to _fight_, for _years_and_years_, to get common-sense to prevail. i consider this to be a _victory_, since -- in this one case, anyway -- it did! there are too many others where a fight still goes on... but anyway, so we're in the middle of a years-long fight about this file-naming convention, or the _lack_ of it... over time, i've written up lots of careful documentation for the way that the convention _should_ be construed. and then one day, josh rewrote my careful system, adding a few twists of his own (which screwed it up), and everybody shifted to support "josh's proposal". as if he had invented it. yeah, right. ok, boys, do it your way. you assholes.

***

and just to give a flavor of some of the other history around this issue, the page-scan-submission process was the genesis for don kretz writing his "twisted" app. since d.p. refused to require its content providers use smart file-naming in the beginning, there were lots of scan-sets around (then, and now) with bad filenames, so don wrote this tool for people uploading scan-sets. it was originally called "twister", and its purpose was to aid in the file-renaming process. after coding that, don realized it was relatively simple to extend the app, to the point he had turned it into a very good start on a postprocessing program. unfortunately, the "powers" over at d.p. turned their noses up at it, so it never got the adoption that it should have, and d.p. was the loser. and the postprocessing queue lingers to this very day...

***

as for the issue of "omnibus" editions -- ones which combine input from several p-books -- i'll say this... in 1992, omnibus editions made sense.
for instance, "alice in wonderland" has a few places where it says "later editions added this" and it gives the addition... in that place and time, that was the perfect solution. it would've been stupid to create two different texts, which differed by only a few lines, absolutely stupid. 15 years ago, 1997, omnibus editions still made sense. scans were rare, and even when available, o.c.r. stunk. 10 years ago, 2002, they still made sense. that's why i staunchly defended you, greg, and michael hart, and p.g. in general, when some people (like jon noring and lee passey) were attacking you relentlessly because you weren't being "faithful" to this or that canonical version. o.c.r. had gotten better, and scans were more prevalent, but bandwidth still presented a large logistical obstacle. 5 years ago, 2007, google books and archive.org were still trying to establish a solid footprint on the ground, so omnibus editions could still argue they made sense. but we were starting to swim in scan-sets, and even the longstanding bandwidth logjam was promising to clear. so, today -- 2012 -- there is no argument that can be made in support of an omnibus edition. not any at all. "our policy is what it always was -- it hasn't changed" is hollow once we grant that the world _has_ changed. i've always said that, if the text was indeed based on a specific edition, and p.g. had scans for that edition, you should mount them. (again, just common sense!) indeed, it was my support for _that_ position which led jon noring to say "see, even bowerbird agrees with me" at the 10,000 celebration in san francisco back in 2003. (of course, in typical noring style, jon refused to see that i did _not_ agree with his overall position, which was that p.g. should cease making omnibus editions entirely, and base every book on one specific scan-set, and mount it. i only supported it _if_ a text was based on one p-book.) now, for many books, we have scan-sets for all versions. 
so project gutenberg needs to point to _one_ scan-set that's "the one" for each and every particular p.g. e-text. the alternative -- that a p.g. e-text is "another" version, but one which has no relationship to any printed book -- dooms the p.g. text to being neglected, then ostracized, because it pretends to document the past, but it cannot point to anything tangible as a proof of its provenance, while the scan-sets sitting online are self-documenting. *** carlo said:
I have two books that are really weird:
weird is not a problem.
one in which the page numbers are out of order in the original
the numbers which are actually printed on the pages can be printed in error, just like anything else that's printed. so, were the pages _bound_ in the incorrect order? if so, you should fix that error by rearranging them. but if the content appears in the proper sequence, then it is the _pagenumbers_ which were incorrect, and you would correct that error by changing them. you might also leave a note, to explain the situation, so users aren't confused by the apparent discrepancy.
and another in which a signature repeats the numbers of a previous one (after 1...208, we have 197a....208a, then 209...285).
this one is clearly an error. you should renumber the second sequence. in this case, since the extra pages _followed_ page 208, rename them this way:
197a -> 208a
198a -> 208b
199a -> 208c
200a -> 208d
201a -> 208e
202a -> 208f
203a -> 208g
204a -> 208h
205a -> 208i
206a -> 208j
207a -> 208k
208a -> 208l
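the renaming listed above can be generated instead of typed by hand. what follows is only a minimal sketch: the function name `renumber_repeated_signature` and its parameters are made up for illustration, and it assumes the repeated run fits within the 26 letters a-z.

```python
import string


def renumber_repeated_signature(first, last, anchor):
    """Map repeated labels such as '197a'..'208a' onto anchor + letter.

    first/last: page numbers of the repeated run (inclusive).
    anchor: the page number the repeated run physically follows.
    Hypothetical helper; only handles runs of up to 26 pages.
    """
    count = last - first + 1
    if count > len(string.ascii_lowercase):
        raise ValueError("more than 26 repeated pages; letters run out")
    mapping = {}
    for offset, page in enumerate(range(first, last + 1)):
        letter = string.ascii_lowercase[offset]
        mapping[f"{page}a"] = f"{anchor}{letter}"
    return mapping


mapping = renumber_repeated_signature(197, 208, 208)
for old, new in mapping.items():
    print(old, "->", new)

# the new names sort between 208 and 209 in a plain string sort
names = ["208.png"] + [f"{new}.png" for new in mapping.values()] + ["209.png"]
in_binding_order = sorted(names) == names
```

the last check only confirms that a plain character-by-character sort keeps the renamed pages between 208 and 209.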
as i will discuss shortly, one purpose of the filenames is to represent the binding order, via a sort of them... and that is the principle which is behind this solution. *** al said:
This is the convention used by Joshua Hutchinson, when he provided page scans from DP:
yeah, right. ok, so let's take a look, shall we?
Basic format: The prefix for the cover pages is: "c". The prefix for the roman pages is: "f". The prefix for the arabic pages is: "p".
so far, so good. but, for the record, i had to fight like a dog to get even something as basic as _this_ accepted. really! even after people accepted the general idea that the pagenumber should be reflected in the name, some of 'em wanted to name the cover as "cover", and "back-cover", and "spine", and what have you. they couldn't understand something as simple as _a_need_for_names_to_reflect_the_binding_order_. some of them wanted to prefix the front-matter with "r", to represent the "roman" numerals there. but of course then those files would sort _after_ the "p" prefix that everyone agreed on for "page". but these idiots couldn't grok that simple notion. ***
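the sorting point is easy to check. here is a small sketch, using made-up 3-digit filenames, that compares the "c"/"f"/"p" prefixes against a set where the front-matter got an "r" prefix instead. it relies on python's `sorted()`, which compares strings character by character.

```python
# hypothetical filenames: cover (c), front-matter (f or r), body pages (p)
with_f = ["p001.png", "f001.png", "c001.png", "f002.png", "p002.png", "c002.png"]
with_r = ["p001.png", "r001.png", "c001.png", "r002.png", "p002.png", "c002.png"]

sorted_f = sorted(with_f)
sorted_r = sorted(with_r)

print(sorted_f)  # cover, then front-matter, then body pages
print(sorted_r)  # cover, then body pages, then front-matter (wrong order)
```

with "f" the front-matter lands between the cover and the body. with "r" it lands after the body.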
For blank pages there should be no file and the page number should be skipped.
wrong. wrong wrong wrong. you need to include a file for _every_ page in the book, or else you will ruin the verso/recto left/right nature of the spread. just goes to show how josh failed to grok the basics. and "skipping" is just a big invitation for disaster. because then when you lose a file for any reason, people will just assume that it was a blank page... if a book does _skip_ pagenumbers, you should inject images into that range that inform users "the book skipped pagenumbers at this point", again taking care to preserve your verso/recto.
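if every page is supposed to have a file, then a gap in the numbering is always a problem, and a script can find it. this is a rough sketch only: the function name `missing_pages`, the "p" prefix, the 3-digit width and the ".png" extension are all assumptions made for the example.

```python
import re


def missing_pages(filenames, prefix="p", width=3):
    """Return page numbers with no file, between 1 and the highest page found.

    Assumes names like 'p001.png'; anything else is ignored.
    Cannot detect pages missing from the very end of the book.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d{{{width}}})\.png$")
    present = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            present.add(int(match.group(1)))
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]


gaps = missing_pages(["p001.png", "p002.png", "p004.png", "p005.png"])
print(gaps)
```

under the "one file per page" rule, any number this reports is a lost file, never a blank page.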
Optionally an image saying: "This page is blank in the original." may be inserted.
well, this isn't a "bad" thing to do. but neither is it a _necessary_ one. a blank scan speaks for itself... indeed, if you do things right, the space used by a blank-page scan should make it obvious that that specific page was indeed blank in the book. so a simple look at the size of your scans would be enough to tell an app which pages are blank.
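the size check could look something like this. it is a sketch under loud assumptions: the byte counts are invented, the name `likely_blank` is hypothetical, and the cutoff of 20% of the median size is an arbitrary guess that would need tuning for real scans.

```python
import statistics


def likely_blank(sizes, ratio=0.2):
    """Flag files much smaller than the median size.

    sizes: dict of filename -> size in bytes.
    ratio: assumed cutoff as a fraction of the median (arbitrary).
    """
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    return sorted(name for name, size in sizes.items() if size < ratio * median)


# invented sizes for illustration only
sizes = {
    "p001.png": 48000,
    "p002.png": 51000,
    "p003.png": 1200,
    "p004.png": 47000,
}
blank_candidates = likely_blank(sizes)
print(blank_candidates)
```

on a real folder, the dict could be filled from `os.path.getsize()` for each file.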
Example of file naming:
front cover    c0001.png
back cover     c0002.png
spine          c0003.png
again, more evidence of josh's utter stupidity. the back cover should be given a name that will sort it _after_ regular "p"-prefix pages. (and also after all of the back-matter pages.) c002 must be used for the inside front cover. yes, folks, if you're going to scan the cover, you must scan the verso of the cover as well. indeed, for _any_ thing you scan, you _must_ scan the recto side first, and then the verso, because we need to be able to show spreads. which means that the only logical name for a scan of the spine is the last one in the bunch. (if, as usual, the inside front cover is blank, you should substitute in a table-of-contents. in general, you can substitute in _anything_ that'll be useful to people, for a blank page.)
i    title page     f0001.png
ii   title verso    f0002.png
iii  dedication     f0003.png
iv   is blank
v    contents       f0005.png
idiots. oh, and the title-page is usually _not_ roman i. if you really have a roman i (for 1), and the title-page comes before it, then you should use the "c" prefix for the title and its verso.
page 1             p0001.png
page 2             p0002.png
image on page 2    p0002-image1.png
image on page 2    p0002-image2.png
page 3             p0003.png
wrong wrong wrong wrong wrong wrong wrong. the image-files for images on a page must be kept separate from the page-scans themselves. not necessarily in another folder (since that spoils the good idea of all files in one folder), but _definitely_ with a different naming scheme, one which sorts those names to a different place, out of the sorting for recto-verso binding-order.

***

al doesn't mention another shortcoming of the system that josh "borrowed" (so badly) from me. for any unnumbered "tip-in" illustration pages, my systems had the filenames append a letter... so, for a tip-in between, say, pages 198 and 199:
196.png 197.png 198.png 198a.png 198b.png 199.png 200.png 201.png
you'll notice this is what i suggested to carlo above. pretty straightforward, eh? hard to screw it up, yes? well, no, not for josh, apparently. he did it like so:
196.png 197.png 198.png 198-a.png 198-b.png 199.png 200.png 201.png
looks pretty close, don't you think? well, you're wrong. because if you sort those names, you'll find that they don't sort in that order, not on most systems anyway. try it, and you will see that they sort like this instead:
196.png 197.png 198-a.png 198-b.png 198.png 199.png 200.png 201.png
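anyone who wants to try it can do so in a few lines. this sketch uses python's `sorted()`, which compares strings by character code, where "-" comes before "." and "." comes before "a". other tools may collate differently, so treat it as one example of a common sort, not a claim about every system.

```python
hyphenated = ["196.png", "197.png", "198.png", "198-a.png", "198-b.png",
              "199.png", "200.png", "201.png"]
appended = ["196.png", "197.png", "198.png", "198a.png", "198b.png",
            "199.png", "200.png", "201.png"]

sorted_hyphenated = sorted(hyphenated)
sorted_appended = sorted(appended)

print(sorted_hyphenated)            # the tip-ins jump ahead of 198.png
print(sorted_appended == appended)  # the appended-letter names keep their order
```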
with that sort, we've destroyed the p197-p198 spread. and, of course, we've rearranged the content's order... it's funny how not understanding something fully leads an amateur to make a fundamental mistake.

***

and a couple other notes about stupidities here, which are not just "after-thoughts", but actually are aimed at the _most_ stupid things about this. first, it's stupid to pad the numbers to _4_ digits, since very, very few books go over 1,000 pages... so padding to 4 digits is unnecessary. it is also unsightly. but _worst_ of all, it's ungainly when people have to _type_ it, when they enter a u.r.l.

in the extremely rare cases where pagenumbers go over 1000, you just switch from "p" to "q" as the prefix, and bingo, you're back in business. (you might even wanna reserve "q" to mean that.)

but the most stupid thing of all is something that you can't even see here, because it's _not_ here... one of the most important rules for file-naming -- indeed, it's probably the _cardinal_ rule! -- is that a filename must be unique to its content. a filename _must_ point unequivocally to one thing. the reverse angle -- that each thing must have one and only one filename -- is a good _goal_, even though there are some worthy exceptions. but there is _no_exception_ to the cardinal rule: a filename must point unwaveringly to one thing. or, to put this in a different way, different content _must_always_ have a filename which reflects that. or, yet another way: _different_stuff_must_never_have_the_same_name._

if you look at p.g. image-files, however, you will discover that it has tons of different files that all have been given the same name -- p0001.png... likewise, you will find a ton of p0002.png, and p0003.png, and p0004.png, and p0005.png... it's stupid. it's ridiculous. it's ridiculously stupid. please don't demonstrate your stupidity by trying to argue that this is acceptable, because "the files are in different folders". that shows you miss the point.
besides, it just so happens that the _folders_ have _the_same_name_as_well_, for yet another violation. so to differentiate one p0001.png from another, you need the parent-folder name of the parent-folder... holy batman, talk about abstracting the abstraction! and the whole point is that you need to overcome the possibility of confusion in the event that your files are copied to the wrong folder. or to the _same_ folder... a solution is easy. append the 5-digit pg# to each file. so the files for pg#12345 might be named like this:
12345-f001.png
12345-f002.png
12345-f003.png
12345-p001.png
12345-p002.png
12345-p003.png
12345-p004.png
see how easy that is? take a good close look at it... so, did you take a good close look? really? if you did, you should have spotted a bad _error_... i had "12345-f003.png", but no "12345-f004.png". remember, we have to maintain the page-spreads... back to the point, though. this way if these files were to be accidentally copied into the folder for pg#23456, we'd know immediately, and nothing'd be overwritten. with non-unique filenames, you will have a mess and you won't even know right away that you screwed up... *** like i said, getting the pagenumber put in the filename was a _huge_ victory for me, one that i fought hard for, so i'm glad there was _some_ benefit from all my work. but if y'all think you did it right, you're badly mistaken. *** and finally, a few more things, while i'm at it... some of my antagonists try to have a field day with "oh, that bowerbird, he thinks he's so damn smart". and even some newcomers might be led to agree. well, first off, much of this is just _common_sense_. now, if you don't have enough basic intelligence to recognize _common_sense_ kicking you in the shin, don't try and blame me that you are such a retard... and second, the things which aren't "common sense" are things i learned from my _hard-won_ experience. you think i haven't screwed up, and given non-unique names to different stuff, and then overwritten the new with the old? think again. i've done it. several times. enough times that i _learned_ it is a mistake to break the cardinal rule, which is why it _is_ the cardinal rule. if you guys were smart, you'd learn from my mistakes. when i say "you sure don't wanna be doing it that way," you should be able to hear the pain of my experience. and when you ignore me, i just laugh at you, because i know the pain of _your_ experience _will_ teach you. the other thing is "that bowerbird is such a rude guy". and again, i can see newcomers feeling that way too... well, listen up. i explained _all_ of these things nicely. 
i was polite the first time, and the second, the third, fourth, fifth, and sixth. i was always calm and careful. i am _still_ calm and careful, to this very day, mind you. but -- after a dozen careful and calm expositions that explain little more than common sense and experience, and which were met with knock-down-drag-out _fights_, where i was insulted and my reputation was maligned -- it's no wonder that i've now developed the attitude that i will call _stupidity_ by the word that best describes it... and that is also why i rub it in and say "i told you so..." so if any of you want to indignantly label that as "rude", then i'll humbly suggest that you can take up the issue with the goddesses of honesty and integrity and truth... -bowerbird