
well, obviously, in prototyping a system, you use just part of the data, not all of it. especially on something like this. :+) i subset the content so i can _examine_ it, molding it into shape manually if necessary. in the process, i gain an understanding of what needs to be done, so i can program it, and i develop the first pass at those routines... for instance, as i said, one of the first tasks is whipping the catalog into the shape i want... that job has already taken me a number of hours, and it's not done yet. the catalog was quite a mess -- and hey, it's just titles and author-names! -- so all told, it'll probably take me some 20 hours, and maybe 30, just for this 1/5 subset... you can see the current state of my clean-up work here:
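to give a flavor of what those clean-up routines look like, here's a stripped-down sketch in python -- the specific rules (collapsing whitespace, splitting a subtitle on a colon) are just illustrations, not the actual routines:

    import re

    def clean_title(raw):
        # collapse runs of whitespace into a single space
        title = re.sub(r"\s+", " ", raw.strip())
        # split off a subtitle marked by a colon, if any
        if ":" in title:
            main, sub = [part.strip() for part in title.split(":", 1)]
            return main, sub
        return title, ""

    print(clean_title("the  origin of species:  by means of natural selection"))
    # -> ('the origin of species', 'by means of natural selection')

each pass over the data turns more of the manual fixes into rules like these.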
there's still work that needs to be done on subtitles, and on the "mirror" titles (which were a total disaster), but other than that, this data is now very consistent... by the time i'm done with this, i'll have good routines to clean it up automatically, to the extent that's possible. so i expect the next 1/5 of the catalog to be cleaned in half the time -- 10-15 hours. during each phase, i'll pick up more information on how to automate it. so the 1/5 after that will take half the time again, about 6-8 hours. and the next 1/5 will take half that, about 3-4 hours. and the last 1/5 will take 1-2 hours. and by then, i'll have some very well-polished routines, so if i decided to do the whole job over, from scratch, for maximal consistency, it'd take only about 4-10 hours... this time-savings, via automation, is what you're after. that's why you do just a subset of the data when you prototype. -bowerbird
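p.s. for the curious, the halving projection is just a geometric series. a quick back-of-the-envelope sketch in python, assuming strict halving of my rough 20-30 hour estimate for the first phase:

    # each 1/5 phase takes roughly half as long as the one before
    low, high = 20.0, 30.0  # hours for the first 1/5 subset
    total_low = total_high = 0.0
    for phase in range(1, 6):
        print("phase %d: %.1f to %.1f hours" % (phase, low, high))
        total_low, total_high = total_low + low, total_high + high
        low, high = low / 2.0, high / 2.0
    print("all five: %.1f to %.1f hours" % (total_low, total_high))

all five phases together come to just under twice the first one (about 39-58 hours), versus the 100-150 hours it would take if every fifth went as slowly as the first.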