don wants me to "slow down"

don said:
Slow down
let me come back to that...
read the *second* sentence.
i did read it. i wasn't impressed.
It's pure html in the mysql database.
precisely. that's my objection. and i stated it, as such, quite clearly, when i said:
so on top of the obfuscation of the text by the .html, it's now buried inside of a mysql database, meaning that anyone who wants to understand the system well needs to acquire an additional hairy set of experience.
in other words, you take clean text and gunk it up with .html, and then you also bury that gunked-up text in a database. and you want us to think that is a good system? that's like taking an art-book with nice pictures and drawing mustaches and beards on all the pretty women (that would be the .html coding), then putting the book into a bag (the database), and then expecting us to "appreciate" the artistry. um, no thanks. just leave the thing unmolested.
And if it's "impure", then blame me, because it's precisely what I imported from the DP project - for better or worse. Same for the CSS.
well, one of us is confused, that's for sure...

_i_ thought we were talking about a system that could be used by an entity like d.p. to do book-digitization, with a suggestion wordpress would do the .html/.css. in other words, we put in .text, and get out .html/.css.

but now _you_ seem to be discussing your own system, where you "imported" the .html (and the .css) from d.p. you seem to have already _started_ with the .html/.css. so one of us is confused about the topic of discussion.

i don't really care what you use for your own projects. whatever makes you happy is absolutely fine with me. but if you're going to propose a system for volunteers, i'd say it must take their skill-sets into consideration, and attempt to make things as simple as they can be.

and for book-digitization, text-files are all you need, a fact which i have demonstrated time and time again. text-files have the advantage of magical simplicity...

the weird thing is that most geeks know this very well, and practice it near-religiously in their own workflows. a very big part of the unix philosophy is "piping" text... but all of a sudden, in dealing with _books_, which are (and always have been) predominately text (with some pictures thrown in), all of a sudden we need _markup_. what's up with that?

especially since, in their own _manuals_, geeks such as the python community use -- ta-da! -- light-markup... i guess angle-bracket-tags are only for the suckers who are too stupid to realize that they don't really need 'em...

***

don said:
Slow down
um, again, no thanks, don. i have a much better suggestion -- flip your script. let's instead have _you_ "speed up" a bit -- a lot! -- along with the rest of this listserve, if it's possible... you've become accustomed to the glacial pace of distributed proofreaders and project gutenberg and this listserve, where the topics of conversation haven't changed all that much in a decade when -- all around us -- e-books have finally "caught on". 8 years ago, this list was hung up on .xml, acting as if that was "the important thing" for discussion... history proved that was one big stinking dead-end. and now you're back on .html as "the big thing", but that's a dead-end too. do you really believe that in 2025, people will still be fussing with angle-brackets? get off the merry-go-round and shoot for the moon. do you want to know the most exciting development in e-books in recent months? it's amazon's "x-ray".
Amazon invented X-Ray, a new feature that lets customers explore the "bones of the book." With a single tap, readers can see all the passages across a book that mention ideas, fictional characters, historical figures, places or topics that interest them, as well as more detailed descriptions from Wikipedia and Shelfari, Amazon's community-powered encyclopedia for book lovers. Amazon built X-Ray using its expertise in language processing and machine learning, access to significant storage and computing resources with Amazon S3 and EC2, and a deep library of book and character information. The vision is to have every important phrase in every book.
if amazon pulls off "x-ray", it could be phenomenal. this is the kind of thing that p.g. could've "invented", since it once had the biggest public-domain corpus. but you guys here were caught up in _file-formats_, among the most tedious and irrelevant e-book trivia. so while you've focused on markup, other entities are doing stuff that's far more interesting and important.

it's as if you were at the grand canyon, but instead of paying any attention to the marvelous natural features, you're all still at the tram arguing about what kind of upholstery will be best to use for the seats in the tram. and when i roll my eyes, you think i'm picking a fight. you're missing out on the real action, and you don't have the slightest clue that that's what's happening.

so no, don, i _ain't_ gonna "slow down", because i'm _tired_ of going nowhere and nowhere and nowhere. and i firmly believe that you, don, should also decide not to let your good imagination be held back by the foot-draggers over at d.p. you're smarter than them. so demonstrate it, don, by letting your creativity soar. and believe me, a suggestion to "use wordpress" does _not_ accomplish that objective; it doesn't come close.

-bowerbird

Wordpress takes in whatever you put into the edit screen. It could be z-m-l for all wordpress cares. Then it's up to you to provide software to produce what you want it to display in a web page or other output format. But if you retrieve it for purposes of editing again, it's still whatever you put in. Maybe z-m-l.

I put in html because that's what dp/pg provides that looks halfway decent in its native form and includes pictures. You know as well as I that I'd be much happier if it were otherwise. But I'm dealing with what's available in the way of publishable text (whatever format) and systems capable of supporting what DP does, without making preconditions about what the text format or the data storage technique might be. So far I haven't found much in the way of lightweight marked up text to work with.

So as far as a system which includes the ability to store work, share it, retrieve it, edit it, put it back, and assemble it into various output formats with more or less difficulty, then wp is on my list. It doesn't require html; it accepts html. Probably more people use wp than any other system I can think of for handling etexts other than email, and there isn't much in the way of ebook fodder in email format that I've seen so far. It's done a pretty good job so far keeping out the spammers, bots, and hackers, which is something I'd prefer not to spend my time on.

Apparently it's not on your list because it stores text in a database. For me it's unpleasant, but I'll work with it. Its implementation choices are better or worse than others, including how it stores text, what the default editors are, and what the standard input and output formats are.

Suggest some other platforms and maybe I'll add them to my list. The only ones that come to mind right now are DP, wordpress (and maybe other blog engines), and wiki software, including TIA and its kin. What do you recommend for a platform?
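The claim above, that the store hands back exactly what was put in and that rendering is a separate step, can be illustrated with a small sketch. This is not WordPress's code: SQLite stands in for MySQL, and the `posts` table and the `open_store`, `save_post`, `load_post`, and `render_html` names are all hypothetical, chosen only for illustration.

```python
import html
import sqlite3


def open_store():
    """Create an in-memory database with one table of stored texts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (title TEXT PRIMARY KEY, body TEXT)")
    return conn


def save_post(conn, title, body):
    """Store the body exactly as typed; the store does not care about its format."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (title, body) VALUES (?, ?)",
        (title, body),
    )
    conn.commit()


def load_post(conn, title):
    """Return the stored body unchanged, or None if the title is unknown."""
    row = conn.execute(
        "SELECT body FROM posts WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


def render_html(body):
    """Separate output step: turn blank-line-delimited plain text into <p> tags."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return "\n".join("<p>" + html.escape(p) + "</p>" for p in paragraphs)
```

In this sketch the text that comes back from `load_post` for editing is byte-for-byte what went in through `save_post`; only `render_html` knows anything about markup, and it could be swapped for a different output routine without touching the stored text.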

Slow down a bit, BB. You are misunderstanding something here. I leave out the html part because that is a different matter, which you can have out with don! It is the database part you do not understand. The user need not know anything of the database to extract, use, and save files, in whatever format! You see, the database is first and foremost just a filing system for the files! Just like the OS is!

Whether you believe me or not, there were OSs (please do not ask which, it has been a long time) that used a database for organizing the files and their usage. A file system, in a practical sense, is nothing else but a database. That is, when you say "open XYZ", the file system goes to a file/table, finds the file name XYZ, finds where it is stored on the storage device, goes there and gets it (o.k., just a data structure), and gives it to the program that is supposed to open it. That program, by the way, is also looked up in some kind of table/file, depending on your OS. So you can see you are surrounded by databases. Agreed, they may not be called databases, but it is the same principle!

So, if don wants to use a database (system) as a file (filing) system, what is wrong with that? Besides, do you KNOW the intrinsics of the OSs or file systems of all the servers that you use? Do you send them Unix, Windows, Apple, etc. commands to access the files you need? No! You use some program or script that goes and gets them for you. Or does it matter to YOU if the web server you are accessing uses a content management system (by the way, they are database-based)? The simple answer is NO! Why? Because you simply do not see that it is there! Your browser does not care either! You do not need to know anything about it at all. But the people administering the site know it is there and are glad they have it.

Basically, Don will have a front end to his database that will give you all you need, and it will be just as easy as (probably easier than) using your script-based system.
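The "open XYZ" lookup described above can be sketched as a toy model. It is only an illustration of the principle, not how any real file system is implemented; the `ToyFileSystem` and `Extent` names are hypothetical, and a single byte array plays the role of the storage device.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Where a file's bytes live on the toy storage device."""
    offset: int
    length: int


class ToyFileSystem:
    def __init__(self):
        self.device = bytearray()  # stands in for the storage device
        self.table = {}            # file name -> Extent (the "file/table")

    def save(self, name, data):
        """Append the bytes to the device and record their location under the name.

        Saving an existing name again simply points the name at the new copy.
        """
        self.table[name] = Extent(len(self.device), len(data))
        self.device.extend(data)

    def open(self, name):
        """Look the name up in the table, go to that location, return the bytes."""
        extent = self.table[name]
        return bytes(self.device[extent.offset:extent.offset + extent.length])
```

A caller only ever writes `fs.open("XYZ")`; the table and the offsets stay hidden behind that call, which is the point being made about database-backed storage.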
On the other hand, I would have to learn how to use your scripts and how to find them, and how to reprogram them to work for a particular book or text! I hope you can see that it does not matter that the details of how a file is saved are obscured. What is important is that the use of the file is transparent, just as your OS obscures the details of how it works, yet you have a simple interface to your files. To quote Steve Jobs: "It just works!"

regards
Keith.

On 12.11.2011 at 19:46, Bowerbird@aol.com wrote:
don said:
Slow down
let me come back to that...
read the *second* sentence.
i did read it. i wasn't impressed.
It's pure html in the mysql database.
precisely. that's my objection.
and i stated it, as such, quite clearly, when i said:
so on top of the obfuscation of the text by the .html, it's now buried inside of a mysql database, meaning that anyone who wants to understand the system well needs to acquire an additional hairy set of experience.
in other words, you take clean text and gunk it up with .html, and then you also bury that gunked-up text in a database.
and you want us to think that is a good system?

On 11/12/2011 11:46 AM, Bowerbird@aol.com wrote:
well, one of us is confused, that's for sure...
Umm, yes, that would be you.
_i_ thought we were talking about a system that could be used by an entity like d.p. to do book-digitization,
No, that's a different thread. We all know that the antiquated AOL software you use has some difficulty tracking discussion threads (if it can do it at all), so your confusion is somewhat understandable. I'm not going to go back and review all the messages in this thread, but let me see if I can clarify things.

This current (sub-)thread began when I posted a message explaining that I chose not to try and develop an HTML editor for ePubEditor because user preferences of correct editors tend to vary widely and border on religious convictions. I have no interest in getting into the middle of a religious war. Mr. Adcock and Mr. Kretz both promptly illustrated my point by suggesting different alternatives (yet other alternatives have since surfaced); Mr. Kretz's suggestion was to use the WordPress blog engine (I have no idea how I could interface to /that/ editor from inside ePubEditor, although I suspect he was recommending WordPress as a stand-alone editor, not integrated into ePubEditor). Mr. Kretz has gone on to try and explain the functioning of WordPress in light of your apparent lack of comprehension, but at no point in this conversation did Mr. Kretz suggest that WordPress should be adopted by Distributed Proofreaders as part of its work-flow or standard tools, nor, for that matter, has anyone else. In fact, DP has not been part of this conversation until you brought it up just now.

In my original post I also noted that the results of the Internet Archive's "fromabbyy.php" script were more valuable than I had originally thought. Mr. Buie responded with a link to a project by Open Library's Edward Betts that possibly (although not necessarily) used those same results to match a line of HTML with an image of that same line of text in the original scan. Mr. Traverso is the one that started a second (sub-)thread suggesting that Mr. Betts' process might be useful to Distributed Proofreaders' proofreading process, and Mr. Perathoner and Mr. Adcock continued the conversation on variations of /that/ theme.

You are apparently conflating the two discussions, attributing to Mr. Kretz suggestions that not only has he never made in this conversation, but that /no one/ has made.
with a suggestion wordpress would do the .html/.css. in other words, we put in .text, and get out .html/.css.
Again, until now this discussion has never suggested anything to do with the conversion of Just Bare text to HTML text. I understand that this is the topic you /want/ us to be discussing, but it's not the topic we /have been/ discussing. Now, I have no particular objection to discussion threads evolving to take up related issues; that's part of the useful dynamic of discussion lists. So unlike some I have seen on this list, I'm not going to say "this is my thread, if you want to talk about enhancing Just Bare text by using some heuristics go start your own thread." However, I do think it is extremely rude to criticize people simply for talking about the things that interest them instead of the things that interest /you/.
participants (4)
- Bowerbird@aol.com
- don kretz
- Keith J. Schultz
- Lee Passey