Public beta of new website at https://dev.gutenberg.org

I tried several items in the Quick Search. I was occasionally surprised by the results, as the books initially appeared to be unrelated, but the search terms were always listed somewhere in the "Bibliographic Record".

The advanced search is slightly puzzling.

Subject: As expected, this appears to search the subject fields of the Bibliographic Record. For example, "Standard Oil" appeared in the subject lines of all four books. One of the books (17090) includes the contents in the Title field (see Screenshot_2020-01-17 Project Gutenberg.png). [After mentioning this, I've noticed that the same issue exists with many of the books.]

Title: Oddly, book 17090 was included in the title results even though neither word actually appears in the title, probably because of the title + contents conflation mentioned above. The third book (54399) was also unexpected, but the words did appear in its title, just not in that order.

So it immediately occurred to me to search for "Standard Oil" with the quotes. "No record found. Please retry." I expected to see 60692 in the results. But the Help screen does say to avoid punctuation characters, so it probably works as designed.

I noticed that the results tab always says "Search on Titles" (see Capture.png), even when the search did not include a term in the Title field. It could be difficult to decide which label should appear, so I suggest using the first field from the top. For example, if an Author search term is included, then always have the tab say "Search on Authors".

Cheers,
Rick
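Rick's Title results are consistent with an unquoted query being treated as an unordered AND of words: 54399 matched because both words appear somewhere in its title, just not together, and the search does not appear to offer a quoted phrase syntax. The minimal Python sketch below contrasts that behaviour with phrase matching, and also encodes Rick's suggested tab-label rule. It is an illustration under stated assumptions, not how autocat3 or PostgreSQL actually implement search; the tokenizer, the top-to-bottom field order (Author, Title, Subject), the "Search on Subjects" label and the example title are all hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode letters/digits, dropping punctuation
    # such as quotes (an assumption, not autocat3's real parser).
    return re.findall(r"[^\W_]+", text.lower())


def matches_all_terms(field: str, query: str) -> bool:
    # Unordered AND: every query word appears somewhere in the field.
    words = set(tokenize(field))
    return all(term in words for term in tokenize(query))


def matches_phrase(field: str, query: str) -> bool:
    # Phrase match: the query words appear consecutively and in order.
    words, terms = tokenize(field), tokenize(query)
    n = len(terms)
    return n > 0 and any(words[i:i + n] == terms for i in range(len(words) - n + 1))


# Hypothetical form order, top to bottom, and hypothetical tab labels.
FIELD_ORDER = ["author", "title", "subject"]
LABELS = {
    "author": "Search on Authors",
    "title": "Search on Titles",
    "subject": "Search on Subjects",
}


def tab_label(form: dict[str, str]) -> str:
    # Rick's rule: label the tab after the first non-empty field from the top.
    for field in FIELD_ORDER:
        if form.get(field, "").strip():
            return LABELS[field]
    return "Search"


if __name__ == "__main__":
    title = "The Standard History of Oil"  # hypothetical title
    print(matches_all_terms(title, "Standard Oil"))  # True: both words present
    print(matches_phrase(title, "Standard Oil"))  # False: not adjacent and in order
    print(tab_label({"author": "", "subject": "Standard Oil"}))  # Search on Subjects
```

Under the AND rule, any title that merely contains both words matches; a phrase rule would only return titles where "standard oil" occurs as a contiguous sequence.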

Thanks for this, Rick. We have had some reports of search anomalies over the years, and yours is thorough and useful.

The search functionality is provided directly by PostgreSQL, with a little parsing etc. by the autocat3 program mentioned in the website design page. Search isn't a focus of our current redesign (i.e., https://dev.gutenberg.org): it was not changed at all, other than updates to the search form layout. But I am adding notes for our future redesign work to include a look at search functionality.

My current inclination is to move away from PostgreSQL for search. It is rather opaque in how it orders results, and it has proven to have some real issues with stemming, including for languages other than English. Instead, we will take a look at Lucene, and perhaps a few other technologies. Lucene can handle fielded search. But there are lots of details: the catalog changes all the time, as new books are added and catalog edits are applied.

I would value hearing ideas and experiences about this, or recommendations on how to approach it.

Best,
Greg

On Fri, Jan 17, 2020 at 01:23:08PM -0800, Rick Tonsing wrote:
I tried several items in the Quick Search. I was occasionally surprised by the search results as the books initially appeared to be unrelated but the search terms were always listed somewhere in the "Bibliographic Record".
The advanced search is slightly puzzling.
Subject: As expected, this appears to search the subject fields of the Bibliographic Record. For example, "Standard Oil" appeared in the subject lines of all four books. One of the books (17090) includes the contents in the Title field (see Screenshot_2020-01-17 Project Gutenberg.png). [After mentioning this, I've noticed that the same issue exists with many of the books.]
Title: Oddly, book 17090 was included in the title results even though neither word actually appears in the title, probably because of the title + contents conflation mentioned above. The third book (54399) was also unexpected, but the words did appear in its title, just not in that order.
So it immediately occurred to me to search for "Standard Oil" with the quotes. "No record found. Please retry." I expected to see 60692 in the results. But the Help screen does say to avoid punctuation characters, so it probably works as designed.
I noticed that the results tab always says "Search on Titles" (see Capture.png), even when the search did not include a term in the Title field. It could be difficult to decide which label should appear, so I suggest using the first field from the top. For example, if an Author search term is included, then always have the tab say "Search on Authors".
Cheers,
Rick
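Greg's point that Lucene can handle fielded search, while the catalog keeps changing as books are added and records are edited, can be illustrated with a toy fielded inverted index. The Python sketch below is not Lucene's API; the class name, field names and book ids are hypothetical. It treats a catalog edit as delete-then-add (Lucene's own document updates work the same way), so terms from an old record, such as contents mistakenly stored in the Title field, stop matching once the record is corrected.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode letters/digits (an assumption).
    return re.findall(r"[^\W_]+", text.lower())


class FieldedIndex:
    """Toy fielded inverted index: (field, term) -> set of book ids."""

    def __init__(self) -> None:
        self.postings: dict[tuple[str, str], set[int]] = defaultdict(set)
        self.records: dict[int, dict[str, str]] = {}

    def remove(self, book_id: int) -> None:
        # Drop every posting contributed by the stored copy of this record.
        old = self.records.pop(book_id, None)
        if old is None:
            return
        for field, text in old.items():
            for term in set(tokenize(text)):
                key = (field, term)
                self.postings[key].discard(book_id)
                if not self.postings[key]:
                    del self.postings[key]

    def upsert(self, book_id: int, record: dict[str, str]) -> None:
        # A new book or a catalog edit: delete the old version, then add.
        self.remove(book_id)
        self.records[book_id] = dict(record)
        for field, text in record.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(book_id)

    def search(self, field: str, query: str) -> set[int]:
        # Unordered AND of all query terms, restricted to one field.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self.postings.get((field, term), set())
        return result


if __name__ == "__main__":
    idx = FieldedIndex()
    # Hypothetical record with contents wrongly folded into the title.
    idx.upsert(1, {"title": "A Hypothetical Report. Contents: Standard Oil"})
    print(idx.search("title", "standard oil"))  # {1}
    # Catalog edit moves the contents into their own field.
    idx.upsert(1, {"title": "A Hypothetical Report", "contents": "Standard Oil"})
    print(idx.search("title", "standard oil"))  # set()
    print(idx.search("contents", "standard oil"))  # {1}
```

A real engine adds ranking, stemming per language and on-disk segments, but the bookkeeping problem Greg describes (keeping per-field postings in step with a constantly edited catalog) is the same.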
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
https://lists.pglaf.org/mailman/listinfo/gutvol-d
Unsubscribe: https://lists.pglaf.org/mailman/options/gutvol-d
participants (2)
- Greg Newby
- Rick Tonsing