Reporting errors in PG files (was Dim view of P3ers)

Jim Adcock wrote:
just as in PG-land the lack of standards is causing texts to be distributed to users frequently missing or duplicating letters and words and in some cases whole paragraphs.
Errors in PG's files should be reported to the Errata system: errata2010_AT_pglaf.org
Error reports should be as specific as possible. Mention the etext number, the line number(s), the line(s) of text in question, and the proposed correction(s) to each. If there are many errors, feel free to download and correct the existing files, and send them to the above address. (Don't re-wrap; don't touch the PG header or footer.)
If you feel that a text can be fixed only by a complete re-do (maybe it's missing the illustrations, the index, or whatever), feel free to download a scanset, get a copyright clearance, and have at it. When the new fileset is submitted through the normal process, mention the text number that it's an update/correction/replacement for. The original producer's credit will be added to yours, the original etext will be archived, and the new version posted (under the original etext number).
Simply complaining about errors isn't useful, nor are general complaints, especially concerning older texts, such as "italics aren't shown" or "all-caps are used for italics, not underscores".
Al
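
A report carrying the details listed above (etext number, line numbers, the lines in question, and a proposed correction for each) could be assembled along the following lines. This is only a minimal sketch: the class names, field names, and output layout are hypothetical and not part of any PG tool, and the etext number and text in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Correction:
    line_number: int  # line number in the posted text file
    current: str      # the line as it currently reads
    proposed: str     # the line as the reporter thinks it should read


@dataclass
class ErrataReport:
    etext_number: int
    corrections: List[Correction] = field(default_factory=list)

    def to_email_body(self) -> str:
        """Render the report as plain text suitable for an email body."""
        parts = [f"Etext number: {self.etext_number}", ""]
        for c in self.corrections:
            parts.append(f"Line {c.line_number}:")
            parts.append(f"  current:  {c.current}")
            parts.append(f"  proposed: {c.proposed}")
            parts.append("")
        return "\n".join(parts).rstrip() + "\n"


# Hypothetical example: the etext number and text are invented.
report = ErrataReport(
    etext_number=12345,
    corrections=[
        Correction(1042, "teh quick brown fox", "the quick brown fox"),
    ],
)
print(report.to_email_body())
```

Keeping one entry per affected line, with the current and proposed text side by side, would let a corrector locate each spot without re-reading the whole file.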

It seems to me that error identification, reporting, verification and repair would be a lot easier if PG provided easily accessible on-line access to the page images, and a form to provide the required information, at least for point cases. Then the reporting person could just find the page, check the image you're going to use for verification, and narrow things down for processing.
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On Sun, Apr 18, 2010 at 10:07:35AM -0700, don kretz wrote:
It seems to me that error identification, reporting, verification and repair would be a lot easier if PG provided easily-accessible on-line access to the page images,
We post 'em when we get 'em. There is guidance for the file naming convention on images. Mostly we do not get page images. In the case of DP, a few people have provided page images after the eBooks were posted. But this does not seem to be a part of the regular DP processing chain.
and a form to provide the required information, at least for point cases.
A form... maybe. I am not sure this would make things any easier to fix (for the fixers -- there are only three people who regularly apply fixes -- Al is one of them, so his views carry more weight than mine!). But it might make it easier for people to report errata.
Then the reporting person could just find the page, check the image you're going to use for verification, and narrow things down for processing.
Sure. Only some errors require checking page images, but it would be nice to have them. It would be nice to have them for numerous purposes to which our readers might put them.
-- Greg

Greg said:
A form... maybe. I am not sure this would make things any easier to fix (for the fixers -- there are only three people who regularly apply fixes -- Al is one of them, so his views carry more weight than mine!). But it might make it easier for people to report errata.
A webform would (hopefully) make reporting more consistent, possibly with such mandatory fields as etext number, title, and author. (Yes, the occasional report arrives with none of them, only a pre-10K filename, which has to be tracked down in the gutindex files to find the etext number.) However, the current volume of errata reports (several/week, if that) probably doesn't make the work of creating such a form worthwhile. And, agreed--it wouldn't help the actual correction process.
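
The mandatory-field idea could be checked with something as small as the sketch below. It assumes a form submission arrives as a plain dictionary; the field keys and function names are hypothetical, since no such form exists.

```python
# Hypothetical field keys for the three mandatory items mentioned above.
REQUIRED_FIELDS = ("etext_number", "title", "author")


def _text(form: dict, name: str) -> str:
    """Return the field as stripped text, treating absent or None as blank."""
    value = form.get(name)
    return "" if value is None else str(value).strip()


def validate(form: dict) -> list:
    """Return a list of problems; an empty list means the form is acceptable."""
    problems = [
        f"missing {name}" for name in REQUIRED_FIELDS if not _text(form, name)
    ]
    number = _text(form, "etext_number")
    if number and not number.isdigit():
        problems.append("etext_number must be a whole number")
    return problems


# A report that arrives without an etext number is rejected up front.
print(validate({"title": "Some Title", "author": "Some Author"}))
# A complete submission produces no problems.
print(validate({"etext_number": "12345", "title": "T", "author": "A"}))
```

Rejecting an incomplete submission at entry time would spare the Errata Team the lookup in the gutindex files, though, as noted, it would not help the correction work itself.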

The only page scans PG has are those that may have been submitted by the preparer. (Joshua Hutchinson has submitted many scansets of DP productions.) Some DP submitters provide page scans linked to page numbers in the HTML version, but this is rare. (I don't think I've ever seen a scanset from an independent producer.)
The Whitewashers, a.k.a. the Errata Team, simply aren't equipped to find, download, and process pagescans for the submissions they handle. Any questions/policy concerning making pagescans mandatory, e.g. the cost/amount of the increased drive space needed, I leave to Greg/Michael.
An errata submission webform would be useful. (Some emailed errata reports are sadly lacking in detail.) Maybe sometime when Greg has a student intern?

Errors in PG's files should be reported to the Errata system: errata2010_AT_pglaf.org
Not sure how that's going to help when the problems are pretty systematic?
If you feel that a text can be fixed only by a complete re-do (maybe it's missing the illustrations, the index, or whatever), feel free to download a scanset, get a copyright clearance, and have at it.
I'm doing one such right now, but I'm apprehensive of the flame-fest that will ensue if someone, namely me, actually tries to redo an old text. But I guess I'm willing to throw my body on the fire and see what happens *next*...
Simply complaining about errors isn't useful, nor are general complaints, especially concerning older texts, such as "italics aren't shown" or "all-caps are used for italics, not underscores".
The more general problem is that texts continue to be created that many users cannot read with fidelity on many different machines. The typical problem, as others have mentioned, lies in the choice of HTML coding techniques, and in a preference for visual cuteness on one or another HTML machine rather than fidelity on a wide variety of HTML and HTML-derived machines -- including issues of "accessibility."
participants (4)
- Al Haines (shaw)
- don kretz
- Greg Newby
- Jim Adcock