
I'm close enough to finishing my Betty Lee, Junior project to compare it to BB's posted version at http://z-m-l.com/go/betle/betle.zml. Most of these diffs are from the first 80 pages or so.

I'll be posting my version of Betty Lee, Junior somewhere as soon as I run the last few checks. Unfortunately, Project Gutenberg won't take it. Before the Rule 6 freeze, I posted the first two books in this series as PG texts #34605 and #34728. I did not get clearance on Betty Lee Jr. or Betty Lee Sr. in time. I'm open to suggestions as to where to put the final version of this book.

BB wrote:

> it would be nice if roger put out his actual o.c.r.
> it would also be great if roger put out his .rtf copy.
> but i'm not sure how interested he is in this stuff.

I *am* interested in "this stuff." Not so much this one book, but in the processes. I'm still learning (and re-learning) a lot on this project. So I'll do what I can to accommodate BB's request.

My "actual ocr" on this wasn't RTF, which is why I missed so many italics originally. In Betty Lee, Sr., I went from the RTF and retained the markup. But in this project, I didn't use it. The best I can do is go back to the original batch I used with Abbyy and save the text as an RTF, which I've done. I'll make that available to BB.

I post these diffs not to compare processes, since mine has had the benefit of a smoothreader. I post them because there are some diffs which perhaps BB would have caught if he improved his pre-smoothie process. Some are regex-catchable. Some are guiguts-catchable or detectable with one of my analysis programs. Many are smoothreader catches.

-----

One common class of error is scannos, which, as BB pointed out, should be caught by a smoothie.
Here are some examples ("RF" is me):

RF: some of them, and give Ramon's message, but I just can't show
BB: some of them, and give Earn on's message, but I just can't show

RF: Betty and next to Peggy Pollard, who, it
BB: Betty and nest to Peggy Pollard, who, it

RF: a thing to work for that being president
BB: a tiling to work for that being president

RF: the back. Mary Emma could not go with
BB: the back. Mary Emma could hot go with

RF: problems. From Lucia's manner, she
BB: problems. From Lucia's manlier, she

RF: of the page and below was a brief resume
BB: of the page and below war; a brief resume

And my favorite scanno in this text:

RF: I'm the crossest girl you ever saw, so far as mere looks
BB: I'm the Grossest girl you ever saw, so far as mere looks

-----

There are spelling discrepancies, which are findable with spellcheck:

RF: of those still, quiet stiletto exchanges
BB: of those still, quiet stilletto

RF: tonsillitis. Betty saw her and overheard
BB: tonsilitis. Betty saw her and overheard

-----

This scan had lots of stray marks which made it into the OCR text:

RF: packed a thin chiffon dress, while
BB: packed, a thin chiffon dress, while

RF: this, Miss Betty Lee!"
BB: this,' Miss Betty Lee!"

-----

Hard to find are missed italics:

RF: wouldn't do _one thing_. She is sweet
BB: wouldn't do one thing. She is sweet

RF: other times too, but _always_ then,
BB: other times too, but always then, before

-----

There are missing quote marks:

RF: won't you?"
BB: won't you?

RF: who sat down. "How is your mother
BB: who sat down. How is your mother

And there are extra quote marks:

RF: little habit of dropping in when
BB: little habit of 'dropping in' when

-----

This one is catchable by my analysis program, which does Levenshtein checks. The two spellings are one edit apart:

RF: are the Sevillas and where do they live?
BB: are the Savillas and where do they live?

-----

Guiguts-catchable errors:

RF: sometimes! I can't study! Come over here
BB: sometimes! I can't study I Come over

RF: who reads the sport page."
BB: who reads the! sport page."

RF: know."
BB: know," (at end of paragraph)

-----

There was one error that appears to be a bug in BB's generator, which puts spaced double quotes at the end of a line:

RF: like my residence here.
BB: like my residence here." "

There are many of these.

-----

That's it for my sampling of the kinds of errors left behind by BB's process as far as it went. With the addition of a good smoothreader, many of these diffs would have disappeared from BB's version.

Hope this helps.

--Roger
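
-----

For reference, the Levenshtein check mentioned above can be sketched roughly like this. It only illustrates the general idea and is not the analysis program referred to in the post; the function names (levenshtein, near_duplicates) and the sample word list are made up for the example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def near_duplicates(words, max_distance=1):
    """Return pairs of distinct words that are within max_distance edits."""
    unique = sorted(set(words))
    return [(x, y) for x, y in combinations(unique, 2)
            if levenshtein(x, y) <= max_distance]


# Hypothetical input: capitalized words harvested from the two texts.
names = ["Sevillas", "Savillas", "Betty", "Peggy", "Lucia"]
print(near_duplicates(names))  # [('Savillas', 'Sevillas')]
```

Run over the proper names in a book, a check like this flags pairs such as Sevillas/Savillas for a human to look at. It cannot say which spelling is the right one.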
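
-----

Several of the punctuation diffs above are the regex-catchable kind. A minimal sketch of such a scan follows. The patterns are guesses based only on the samples in this post, and they are not the checks that guiguts or any other tool actually runs; the names CHECKS and scan are hypothetical, and every hit would still need a human to look at it.

```python
import re

# Each entry: (description, compiled pattern). All patterns are heuristics.
CHECKS = [
    ("'!' before a lowercase word", re.compile(r"\w! [a-z]")),
    ("lone 'I' where '!' may be meant", re.compile(r"[a-z] I [A-Z]")),
    ("spaced double quotes", re.compile(r'" "')),
    ("comma-quote at end of paragraph", re.compile(r',"\s*$')),
]


def scan(paragraphs):
    """Return (paragraph number, description) for every check that fires."""
    hits = []
    for number, text in enumerate(paragraphs, start=1):
        for label, pattern in CHECKS:
            if pattern.search(text):
                hits.append((number, label))
    return hits


# Sample lines taken from the diffs in this post.
samples = [
    'who reads the! sport page."',
    "sometimes! I can't study I Come over",
    'like my residence here." "',
    'know,"',
    "sometimes! I can't study! Come over here",
]
for number, label in scan(samples):
    print(number, label)
```

The first four sample lines each trip one check, and the last one (the corrected reading) passes clean. On a full text the lone-'I' pattern in particular will produce false positives.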