Automated book summaries: input requested
The content of this message was lost. It was probably cross-posted to multiple lists and previously handled on another list.
There are a few things to consider. I seem to recall that, years ago, we did include volunteer-contributed "blurbs" in the catalog for books in our collection. The quality, length and content varied a lot. It turns out that writing an effective, brief book summary is a specialized skill. In the end, I think they all got removed at some point.

Relating to automated book summaries, I wonder if there is a way to offer them on a trial basis and measure whether visitors to gutenberg.org are interested in them. I would be a fan of having automated book summaries presented in a way that makes it clear they are not part of the "traditional catalog" information.

This ties into another topic I've been thinking about: we are perhaps overdue for an effort to tweak the design of the PG catalog pages again. When we keep adding more little bits and pieces over time, the page gets crowded. I know we tried using a "tabs" system once, but did not keep it very long. I wonder if there is some other way to present a "basic record", and then have optional "Extended" information if a user wants to access it.

--Andrew

On Thu, 12 Sept 2024 at 10:33, Greg Newby <gbnewby@pglaf.org> wrote:
Your input is requested: I’ve been working with some programmers to build AI-based book summaries.
The intention is for these to be added to the landing pages for books on the Project Gutenberg website. We have iterated on the prompt to the AI (GPT-4o mini) and, to me, the summaries are pretty good. On landing pages, we’ll clearly label them as AI-generated summaries. In the future, we might replace them with improved summaries.
We are only able to feed the first 12K characters or so to the AI, due to the costs of the AI model (in the future, this will improve). The summaries all have a similar structure: the title & author, some basic background including the period it was written, and a second paragraph that characterizes the start of the book. (For just a few stories in these examples, the summary characterizes the whole book, because the text fits within the 12K-character limit.)
I’d value general feedback on this approach and the quality of the summaries.
If you have some specific book #s that you know well, and would like to see automated summaries of them, please let me know and we’ll add them to the list.
The summaries are here: https://displaysummaries-johannesseiko.replit.app/
Your input can go in this thread, or by email to me: gbnewby@pglaf.org
Thanks!
~ Greg
Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
www.gutenberg.org
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org
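To make the 12K-character limit described in the message above concrete, here is a minimal Python sketch of how the opening of a book might be cut and wrapped in a prompt. It is not the project's actual code: the function name `build_summary_prompt`, the exact cutoff of 12,000 characters, and the prompt wording are all assumptions for illustration. Only the "first 12K characters or so" limit and the two-paragraph structure come from the message.

```python
# Hypothetical sketch; the real prompt and cutoff used by the project are not shown in this thread.
MAX_CHARS = 12_000  # "first 12K characters or so"; the exact value is an assumption


def build_summary_prompt(title: str, author: str, book_text: str,
                         max_chars: int = MAX_CHARS) -> str:
    """Build a prompt that contains at most max_chars characters of the book."""
    excerpt = book_text[:max_chars]
    # Short works fit entirely, so the summary can cover the whole text.
    covers_whole_book = len(book_text) <= max_chars
    scope = "the complete text" if covers_whole_book else "only the opening of the book"
    return (
        f"Summarize '{title}' by {author}.\n"
        "Paragraph 1: title, author, and basic background, "
        "including the period it was written.\n"
        f"Paragraph 2: characterize {scope}.\n\n"
        f"Text:\n{excerpt}"
    )
```

The `covers_whole_book` check mirrors the remark that only a few short stories are summarized in full; for everything longer, the model sees nothing past the cutoff.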
Hi Andrew. Thanks for this. Responses below:
On Sun, Sep 22, 2024 at 11:08:30PM -0700, Andrew Sly wrote:
There are a few things to consider. I seem to recall that, years ago, we did include volunteer-contributed "blurbs" in the catalog for books in our collection. The quality, length and content varied a lot. It turns out that writing an effective, brief book summary is a specialized skill. In the end, I think they all got removed at some point.
Agreed. We have fewer than 100 hand-crafted summaries.
Relating to automated book summaries, I wonder if there is a way to offer them on a trial basis and measure whether visitors to gutenberg.org are interested in them. I would be a fan of having automated book summaries presented in a way that makes it clear they are not part of the "traditional catalog" information.
Summaries for books 1-15000 are online (skipping the "books" that don't have a .txt associated). We're working on the rest.

I responded to Joyce's message about testing the other day. I'm in favor, and don't know how to do it.

All summaries end in (This is an automatically generated summary.), so we will be able to remove, edit or replace as needed.
This ties into another topic I've been thinking about: we are perhaps overdue for an effort to tweak the design of the PG catalog pages again. When we keep adding more little bits and pieces over time, the page gets crowded. I know we tried using a "tabs" system once, but did not keep it very long. I wonder if there is some other way to present a "basic record", and then have optional "Extended" information if a user wants to access it.
I'm working with a web designer who, I hope, will have ideas about how to better present the landing pages. I think we can make them more similar to an online bookstore's page, to help people decide what to read and what format to view/download.

Related, it would be great to improve our searching capabilities, especially for searching within lists of results. That's potentially a pretty big effort.

~ Greg
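The point above about every automated summary ending in a fixed marker can be sketched in a few lines of Python. This is only an illustration of why the marker makes later removal or replacement easy; the helper names and the dictionary keyed by book number are hypothetical, and only the marker text itself comes from the message.

```python
# Marker text quoted from the message; everything else here is a hypothetical sketch.
AUTO_MARKER = "(This is an automatically generated summary.)"


def is_automated(summary: str) -> bool:
    """True if the summary ends with the automated-summary marker."""
    return summary.rstrip().endswith(AUTO_MARKER)


def strip_marker(summary: str) -> str:
    """Return the summary text without the trailing marker, if present."""
    text = summary.rstrip()
    if text.endswith(AUTO_MARKER):
        return text[: -len(AUTO_MARKER)].rstrip()
    return text


def remove_automated(summaries: dict[int, str]) -> dict[int, str]:
    """Drop automated summaries, keeping everything else (book number -> summary)."""
    return {book_id: s for book_id, s in summaries.items() if not is_automated(s)}
```

Because the check is a plain suffix match, it works only as long as the marker is appended verbatim and nobody edits it by hand.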
It seems that when the automated book summaries were loaded, the human-generated summaries in the 520 field were wiped.

Eric
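One way to avoid the kind of overwrite described above is to treat the load as a merge rather than a replacement. The Python sketch below assumes a hypothetical in-memory model in which each book number maps to a list of 520 entries (MARC 520 is a repeatable field); the function name `load_automated` and this data model are illustrative assumptions, not a description of the Project Gutenberg catalog code.

```python
# Hypothetical sketch of a non-destructive load; not the catalog's actual code.
AUTO_MARKER = "(This is an automatically generated summary.)"


def load_automated(existing: dict[int, list[str]],
                   generated: dict[int, str]) -> dict[int, list[str]]:
    """Merge new automated summaries into per-book lists of 520 entries.

    Human-written entries are kept. Only entries ending in the automated
    marker are replaced by the newly generated summary.
    """
    # Copy the lists so the caller's data is left untouched.
    merged = {book_id: list(entries) for book_id, entries in existing.items()}
    for book_id, summary in generated.items():
        kept = [s for s in merged.get(book_id, [])
                if not s.rstrip().endswith(AUTO_MARKER)]
        merged[book_id] = kept + [summary]
    return merged
```

With this approach, rerunning the load refreshes the automated summaries while any hand-crafted ones stay in place alongside them.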
On Sep 23, 2024, at 4:12 PM, Greg Newby <gbnewby@pglaf.org> wrote:
Participants (4): Andrew Sly, Eric Hellman, gbnewby@pglaf.org, Greg Newby