Mirroring the firehost? Re: Re: Many solo projects out there in gutvol-d land?

On Sat, 20 Feb 2010, Michael S. Hart wrote:
Does anybody have a serious objection to putting the 8,000, or so, books that were listed earlier as being in limbo, in something like our "PrePrints" section, where we put eBooks that are admittedly not ready for prime time???
Yeah, there are people arguing that it's a horrible thing to do, but I'm 100% with you on this. Available with a few errors is far more useful than unavailable. And it's not that they aren't available now; they are. DP has always had the concatenated text available for download. It's behind a sign-on and not indexed by any of the search engines, so if you don't already know it's there, you can't find it.
What's the URL? I could set up a nightly mirror... Do they automatically disappear from this area, after they are finally published? -- Greg

On Sun, 21 Feb 2010, Greg Newby wrote:
had the concatenated text available for download. It's behind a sign on and not indexed by any of the search engines, so if you don't know it's there already you can't find it.
What's the URL? I could set up a nightly mirror...
Do they automatically disappear from this area, after they are finally published?
There's not a single place; you have to walk the project lists using the search function. They do eventually disappear, but the status changes to posted when they are posted to PG.

Do you have a sign-on at DP? If so, try:

http://www.pgdp.net/c/tools/project_manager/projectmgr.php?show=search&title=&author=&language[]=&special_day[]=&projectid=&project_manager=&checkedoutby=&pp_er=&ppv_er=&postednum=&state[]=P3.proj_waiting&n_results_per_page=100

That's everything in the P3 waiting queue. Pick one from that list (I'm going to grab one of mine):

http://www.pgdp.net/c/project.php?id=projectID4b5e3e5a9b845&detail_level=3

There's a link titled "Download Concatenated Text" with a download button that will download a zip with the text from the last proofing round.

The two queues that are most interesting, because they are the largest, are the P3 waiting and F2 waiting.

-- Greg Weeks http://durendal.org:8080/greg/
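For anyone scripting the walk through the queues, here is a minimal sketch of how the search URL above could be assembled for a given queue. The parameter names and the P3.proj_waiting state are copied from the URL in this message; the function name queue_search_url is made up for illustration, and the state name for other queues (for example F2.proj_waiting) is only a guess by analogy. It builds the URL and nothing more; the sign-on is not handled.

```python
from urllib.parse import urlencode

# Search page taken from the URL quoted in the message above.
SEARCH_URL = "http://www.pgdp.net/c/tools/project_manager/projectmgr.php"


def queue_search_url(state, per_page=100):
    """Build the project-search URL for one queue state (hypothetical helper)."""
    # Same parameters, in the same order, as the URL quoted in the thread.
    params = [
        ("show", "search"),
        ("title", ""),
        ("author", ""),
        ("language[]", ""),
        ("special_day[]", ""),
        ("projectid", ""),
        ("project_manager", ""),
        ("checkedoutby", ""),
        ("pp_er", ""),
        ("ppv_er", ""),
        ("postednum", ""),
        ("state[]", state),
        ("n_results_per_page", str(per_page)),
    ]
    # safe="[]" keeps the PHP-style array brackets unescaped, as in the original URL.
    return SEARCH_URL + "?" + urlencode(params, safe="[]")


if __name__ == "__main__":
    print(queue_search_url("P3.proj_waiting"))
```

Whether the empty parameters are actually required by the search page is not known from this thread; they are kept only so the result matches the quoted URL.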

Here's the HTML form with the variables that make up the request. Note that the form submits them via POST, not GET.

<form method='post' action='http://www.pgdp.net/c/tools/project_manager/generate_post_files.php'>
  <input type='hidden' name='projectid' value='projectID4b7deca7757f8'>
  <input type='radio' name='round_id' value='[OCR]'>[OCR]
  <input type='radio' name='round_id' value='P1' >P1
  <input type='radio' name='round_id' value='P2' CHECKED>P2
  <br>For each page, use:<br>
  <input type='radio' name='which_text' value='EQ' CHECKED>the text (if any) saved in the selected round; or<br>
  <input type='radio' name='which_text' value='LE'>the latest text saved in any round up to and including the selected round.<br>
  (If every page has been saved in the selected round, then the two choices are equivalent.)<br>
  <input type='hidden' name='include_proofers' value='0' /><input type='hidden' name='save_files' value='0' />
  <input type='submit' value='Download'>
</form>

All you need then is a list of the project codes.
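The form above boils down to five fields, so the request it produces can be sketched in a few lines. The field names, values, and action URL come from the form; the helper name build_download_request and its defaults are assumptions for illustration. The sketch only constructs the request object and does not send it, and it does not deal with the DP sign-on, which a real download would need.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Action URL taken from the form quoted above.
DOWNLOAD_URL = "http://www.pgdp.net/c/tools/project_manager/generate_post_files.php"


def build_download_request(project_id, round_id="P2", which_text="EQ"):
    """Build (but do not send) the POST request the form would submit.

    which_text is "EQ" (text saved in the selected round) or "LE"
    (latest text saved in any round up to and including it).
    """
    if which_text not in ("EQ", "LE"):
        raise ValueError("which_text must be 'EQ' or 'LE'")
    fields = {
        "projectid": project_id,
        "round_id": round_id,
        "which_text": which_text,
        "include_proofers": "0",  # hidden field in the form
        "save_files": "0",        # hidden field in the form
    }
    body = urlencode(fields).encode("ascii")
    return Request(DOWNLOAD_URL, data=body, method="POST")


if __name__ == "__main__":
    req = build_download_request("projectID4b7deca7757f8")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```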

"Greg" == Greg Weeks <greg@durendal.org> writes:
Greg> On Sun, 21 Feb 2010, Greg Newby wrote:
>>> had the concatenated text available for download. It's behind
>>> a sign on and not indexed by any of the search engines, so if
>>> you don't know it's there already you can't find it.
>> What's the URL? I could set up a nightly mirror...
>>
>> Do they automatically disappear from this area, after they are
>> finally published?
Greg> There's not a single place, you have to walk the projects
Greg> lists using the search function. They do eventually
Greg> disappear, but the status changes to posted when they are
Greg> posted to PG.
Greg> Do you have a sign on at DP? If so try:
Greg> http://www.pgdp.net/c/tools/project_manager/projectmgr.php?show=search&title=&author=&language[]=&special_day[]=&projectid=&project_manager=&checkedoutby=&pp_er=&ppv_er=&postednum=&state[]=P3.proj_waiting&n_results_per_page=100
Greg> That's everything in the P3 waiting queue. If you pick one
Greg> from that list. (I'm going to grab one of mine.)
Greg> http://www.pgdp.net/c/project.php?id=projectID4b5e3e5a9b845&detail_level=3
Greg> There's a link titled "Download Concatenated Text" with a
Greg> download button that will download a zip with the text from
Greg> the last proofing round.
Greg> The two queues that are most interesting because they are
Greg> the largest are the P3 waiting and F2 waiting.
Greg> -- Greg Weeks http://durendal.org:8080/greg/

I have scripts that can download the concatenated text without manual handling and without a browser. They are quite tricky, and I am not willing to discuss them in public, but I will provide them to Greg Newby (as a DP board member) if he wants. Just send me an email.

Carlo

The two queues that are most interesting because they are the largest are the P3 waiting and F2 waiting.
Actually, the PP queue is the longest, but roughly 1/3 each fall on the P3, F2, and PP queues.
It would be cool if the PP queue could be presented with HTML headers, and with the ----File.... pagination stuff stripped -- since this queue already has much of the HTML markup. The alternative is to strip the HTML markup back out before presenting.

On Tue, 23 Feb 2010, Jim Adcock wrote:
The two queues that are most interesting because they are the largest are the P3 waiting and F2 waiting.
Actually, the PP queue is the longest, but roughly 1/3 each fall on the P3, F2, and PP queues.
It would be cool if the PP queue could be presented with HTML headers, and with the ----File.... pagination stuff stripped -- since this queue already has much of the HTML markup. The alternative is to strip the HTML markup back out before presenting.
I think the text file for all rounds should be processed a bit to strip out the page separators, as well as inline markup (e.g. <i> and its siblings) and proofer notes such as [** something].
I have no problems with doing the in-PP projects as well, but most of them will fairly quickly get posted. There are exceptions, but most will.
-- Greg Weeks http://durendal.org:8080/greg/
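As a rough illustration of the cleanup described above, the three kinds of residue can each be handled with one regular expression. This is a sketch under assumptions, not the DP tooling: the separator is assumed to be a line beginning with four or more dashes followed by "File" (going by the "----File...." mentioned earlier in the thread), the inline tags are assumed to be <i>, <b>, and <sc>, and proofer notes are assumed to look like [** something]. The function name clean_round_text is made up.

```python
import re

# Assumed shape of a page separator line, e.g. "-----File: 001.png---".
PAGE_SEPARATOR = re.compile(r"^-{4,}File.*\n?", re.MULTILINE)
# Assumed set of inline formatting tags; extend as needed.
INLINE_TAG = re.compile(r"</?(?:i|b|sc)>")
# Proofer notes of the form [** something], plus one preceding space if present.
PROOFER_NOTE = re.compile(r" ?\[\*\*[^\]]*\]")


def clean_round_text(text):
    """Strip page separators, inline markup, and proofer notes from round text."""
    text = PAGE_SEPARATOR.sub("", text)
    text = INLINE_TAG.sub("", text)
    text = PROOFER_NOTE.sub("", text)
    return text


if __name__ == "__main__":
    sample = "-----File: 001.png---\nThe <i>quick</i> fox[** typo?] ran.\n"
    print(clean_round_text(sample), end="")
```

The order matters only slightly: separators go first so that a tag or note on a separator line cannot leave a partial line behind.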

If you strip out the page delimiters you'll probably want to do something intelligent with text flow across the break for footnotes etc.

On Tue, Feb 23, 2010 at 7:41 AM, Greg Weeks <greg@durendal.org> wrote:
On Tue, 23 Feb 2010, Jim Adcock wrote:
The two queues that are most interesting because they are the largest are the P3 waiting and F2 waiting.
Actually, the PP queue is the longest, but roughly 1/3 each fall on the P3, F2, and PP queues.
It would be cool if the PP queue could be presented with HTML headers, and with the ----File.... pagination stuff stripped -- since this queue already has much of the HTML markup. The alternative is to strip the HTML markup back out before presenting.
I think the text file for all rounds should be processed a bit to strip out the page separators, as well as inline markup (e.g. <i> and its siblings) and proofer notes such as [** something].
I have no problems with doing the in-PP projects as well, but most of them will fairly quickly get posted. There are exceptions, but most will.
-- Greg Weeks http://durendal.org:8080/greg/
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On Tue, 23 Feb 2010, don kretz wrote:
If you strip out the page delimiters you'll probably want to do something intelligent with text flow across the break for footnotes etc.
If you can automate it, great; if not, then don't bother.
-- Greg Weeks http://durendal.org:8080/greg/
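One case that does look automatable is a word hyphenated across a page break. The sketch below rejoins such a word while dropping the separator. It assumes the same separator shape as the "----File...." lines mentioned in this thread (a line starting with four or more dashes and "File"), and the name join_pages is hypothetical. Footnotes that continue across pages are not handled, and a genuine compound that happens to break at its hyphen would be merged incorrectly.

```python
import re

# A word split as "beauti-" / <separator line> / "ful" (separator shape assumed).
BROKEN_WORD = re.compile(r"(\w)-\n-{4,}File.*\n(\w+)")
# Any remaining separator line.
PAGE_SEPARATOR = re.compile(r"^-{4,}File.*\n?", re.MULTILINE)


def join_pages(text):
    """Remove page separators, rejoining words hyphenated across a page break."""
    # First glue together words broken by a hyphen at the end of a page.
    text = BROKEN_WORD.sub(r"\1\2", text)
    # Then drop every other separator line.
    return PAGE_SEPARATOR.sub("", text)


if __name__ == "__main__":
    sample = "a beauti-\n-----File: 002.png---\nful day\n"
    print(join_pages(sample), end="")
```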
participants (5)
- don kretz
- Greg Newby
- Greg Weeks
- Jim Adcock
- traverso@posso.dm.unipi.it