
On 18-Feb-2010 21:21, Bowerbird@aol.com wrote:
it's usually the case that making those checks and fixes useful in the general case, against any random book, is a more difficult matter.
That's kind of my experience, I guess. Several fixes will suggest themselves in the context of a given specific text. The next one might need different fixes. But that doesn't mean a long list of fixups couldn't be tried, since there's no cost to just adding tests/fixes to the list.
for me, the interface is prime. if you're looking for tools that work on a command-line, in a text-in-text-out way, i'm the wrong tree for you to be barking up, that's for sure.
I see.
i'll send you both.
Better stick to Windoze, if it's a GUI.
ok, that was very useful... my tool assumes that the page-scans are in the same folder as the app, which is easy enough to satisfy.
the tool also assumes that your text is all in one file, and that the page-boundary is of a certain type. i'd assume that your vi skills will enable you to satisfy this assumption in a fairly simple manner.
other than that, i'd say you'll be good to go.
Text in one file -- check. I favour marking page boundaries with "===00123" these days, but a global search/replace can fix that.
i did a series here a couple years back where i collected a list of checks that was necessary for the book i tested, and somebody turned that list into a set of reg-ex tests.
you can find that set on the download page for don's app:
Yes. Looking at that. I am not 100% sure I want to mess with Twister exactly, but the list of regular expressions looks interesting. I'm picturing building a perl script that applies all of these fixes, then creates a patch set based on the differences it has introduced. I could then edit the patch set as a file, nuking changes that are wrong, and finally apply the patches for the changes I like.
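Roughly what I have in mind, as a minimal sketch. It is in Python rather than perl just for illustration, and the two patterns in FIXES and the names apply_fixes and make_patch are placeholders of my own, not entries from the regex list on the download page.

```python
import difflib
import re

# Hypothetical fix list: (compiled pattern, replacement) pairs.
# These two are illustrative only; the real list would come from
# whatever set of regular expressions you decide to trust.
FIXES = [
    (re.compile(r"\bthe the\b"), "the"),   # doubled word
    (re.compile(r" +([,;:!?])"), r"\1"),   # stray space before punctuation
]


def apply_fixes(lines):
    """Return a new list of lines with every fix applied in order."""
    fixed = []
    for line in lines:
        for pattern, replacement in FIXES:
            line = pattern.sub(replacement, line)
        fixed.append(line)
    return fixed


def make_patch(original, fixed, name="book.txt"):
    """Build a unified diff between the original and fixed lines.

    Both lists are expected to hold lines that keep their trailing
    newline, as returned by file.readlines().
    """
    diff = difflib.unified_diff(
        original, fixed, fromfile=name, tofile=name + ".fixed"
    )
    return "".join(diff)


if __name__ == "__main__":
    sample = [
        "It was the the best of times , he said.\n",
        "Nothing to fix on this line.\n",
    ]
    print(make_patch(sample, apply_fixes(sample)), end="")
```

The diff text can be saved to a file and reviewed by hand. Deleting whole hunks for changes that are wrong should leave a patch that still applies with patch(1); trimming lines inside a hunk would need the hunk header counts adjusted too.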
I rely a lot on jeebies and gutcheck.
so, when you get a report from them on the possible errors, you enter vi and use search to locate each one of the errors?
Kind of. Jeebies and gutcheck reference specific line numbers. So I go through the output of these bottom up. For each hit I go to the specified line number and see what's up, fix if needed and then move to the previous hit. I work bottom to top so that changes I make don't invalidate the line numbers in the gutcheck output as I go. I find it takes a good couple of passes before I am satisfied I have all the genuine hits covered. Invariably the WW finds things I've missed anyhow.
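The bottom-up ordering could be automated with something like the sketch below. It assumes each hit in the report contains the text "Line <number>"; that format, and the names LINE_REF and hits_bottom_up, are assumptions for illustration, so adjust the pattern to whatever the checker really prints.

```python
import re

# Assumed report format: each hit mentions "Line <number>" somewhere.
# This is a guess for illustration, not a documented output format.
LINE_REF = re.compile(r"\bLine (\d+)\b")


def hits_bottom_up(report_lines):
    """Return (line_number, message) pairs, highest line number first.

    Visiting hits from the bottom of the text upward means an edit
    that adds or removes lines cannot shift the line numbers of the
    hits that are still waiting to be looked at.
    """
    hits = []
    for text in report_lines:
        match = LINE_REF.search(text)
        if match:
            hits.append((int(match.group(1)), text.rstrip("\n")))
    # Stable sort: hits on the same line keep their report order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    report = [
        "Line 10 - first hit\n",
        "some header text without a line reference\n",
        "Line 250 - second hit\n",
        "Line 37 - third hit\n",
    ]
    for number, message in hits_bottom_up(report):
        print(number, message)
```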
sounds like you really want to use that reg-ex list that was based on the month-long series that i did.
Yeah. Got those. Like I say -- I will turn it into a perl script and see where that takes me.
i'll send mine to you too, but his is based on reg-ex checks...
Would be great. Thanks.
http://www.gutenberg.org/dirs/3/1/2/1/31212/31212-8.txt and just tell me what you find. I have no doubt there is lots
if the scans are online too, or can be, i'll certainly take a look at it...
Lots of choices there.
http://www.canadiana.org/ECO/ItemRecord/48293?id=16c79d4f15394e51
http://www.archive.org/details/advocateanovel00heavgoog
http://books.google.com/books?id=ot4OAAAAYAAJ&oe=UTF-8
There are no page numbers in the Gutenberg text though.
See you,
============================================================
Gardner Buchanan <gbuchana@teksavvy.com>
Ottawa, ON
FreeBSD: Where you want to go. Today.