Questions on metadata in HTML files

A few weeks ago I got an email saying I was one of the last two PG contributors using RST and that RST would no longer be accepted by PG. It was suggested that if I wanted to use RST I could convert it to HTML using pandoc.
I have been experimenting with pandoc and rst2html and have gotten good results with both, so I have a pretty good procedure going where I can make RST files and generate usable HTML files from them. I think rst2html gives better results than pandoc, but both work OK and I don't miss the PG extensions to RST much if at all.
HOWEVER, I am not entirely clear on how to submit my HTML docs now. The ebookmaker page suggests that I can include metadata in my HTML files and ebookmaker will pick it up. That does not seem to be happening.
My HTML metadata looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.16: http://docutils.sourceforge.net/" />
<title>A PROSE ENGLISH TRANSLATION OF VISHNUPURĀNAM</title>
<meta content="A Prose English Translation of Vishnupurānam" name="pg.title" />
<meta content="99999" name="pg.id" />
<meta content="Public Domain" name="pg.rights" />
<meta content="James Simmons" name="pg.producer" />
<meta content="This file was produced from page images at Internet Archive." name="pg.credits" />
<meta content="Manmatha Nath Dutt, M. A., M.R.A.S." name="dc.creator" />
<meta content="A Prose English Translation of Vishnupurānam" name="dc.title" />
<meta content="en" name="dc.language" />
<meta content="1896" name="dc.created" />
<meta content="yyyy-mm-dd" name="pg.released" />
<meta content="images/cover.jpg" name="coverpage" />
This is pretty much what I used with RST, but with names all lower case. I've tried both ways and ebookmaker ignores them both. So how do I put metadata in HTML so ebookmaker can find it and use it?
James Simmons
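For anyone who wants to check what metadata an HTML file actually carries before uploading, here is a minimal Python sketch using only the standard library. It simply lists the <title> text and the named <meta> entries; it makes no claim about what ebookmaker itself reads. The class name `MetaCollector`, the function `collect_meta`, and the shortened sample document are assumptions made up for this illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # lower-cased meta name -> content
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            # Entries without a name (e.g. http-equiv) are skipped.
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def collect_meta(html_text):
    """Return (title, {meta name: content}) for an HTML string."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta


# Shortened, hypothetical sample based on the header quoted above.
sample = """<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>A PROSE ENGLISH TRANSLATION OF VISHNUPURĀNAM</title>
<meta content="Manmatha Nath Dutt, M. A., M.R.A.S." name="dc.creator" />
<meta content="en" name="dc.language" />
<meta content="images/cover.jpg" name="coverpage" />
</head>"""

title, meta = collect_meta(sample)
print(title)
for name, content in meta.items():
    print(f"{name}: {content}")
```

Lower-casing the names means `DC.Creator` and `dc.creator` are reported under the same key, which makes it easier to compare the two spellings tried above.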

I don't see that anyone's responded to this yet, so…
Unless I'm missing something, there shouldn't be much (any?) need to have "<meta content…" entries in the HTML file. Most of the information in your example will come from the upload form. That information gets to the WWers via the info.txt file that's generated by the upload process. The etext number will be unknown until it's generated by the WWers.
Re "The ebookmaker page suggests that I can include metadata in my HTML files and ebookmaker will pick it up."
The relevant paragraph on the ebookmaker page says:
"Ebookmaker will try to identify author, title, encoding and eBook number from your file, IF it includes the standard Project Gutenberg metadata as found in the published collection."
I read that as saying the information has to be in the form of the standard PG header, not a set of metadata entries in the HTML file.
All you really need to do is zip your text and HTML files, along with any /images folder, and submit the zip file the normal way.
Do not submit an epub file--one is auto-generated from the posted HTML file, or from the text file if an HTML file wasn't submitted. (PG has no mechanism for posting them anyway.) The same goes for creating your own PG header/footer--they'll just have to be removed so that PG's posting software can add its own.
Al Haines
Project Gutenberg
In reply to: James Simmons, August 26, 2021 12:35 PM, "[gutvol-d] Questions on metadata in HTML files"
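The packaging step described above (zip the text and HTML files together with any images folder, and leave out epub files) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `build_submission_zip`, the choice of file extensions, and the directory layout are hypothetical, and nothing here is an official PG tool or requirement.

```python
import zipfile
from pathlib import Path

# Assumed extensions for the text and HTML files at the top level.
DOC_SUFFIXES = {".txt", ".htm", ".html"}


def build_submission_zip(book_dir, zip_path):
    """Zip top-level text/HTML files plus everything under images/.

    Anything else (for example an .epub) is left out.
    Returns the list of names stored in the archive.
    """
    book_dir = Path(book_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(book_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(book_dir)
            is_top_level_doc = (
                len(rel.parts) == 1 and path.suffix.lower() in DOC_SUFFIXES
            )
            is_image = len(rel.parts) > 1 and rel.parts[0] == "images"
            if is_top_level_doc or is_image:
                # Store paths relative to the book folder, with "/" separators.
                zf.write(path, rel.as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

A call such as `build_submission_zip("vishnu", "vishnu.zip")`, with a hypothetical folder `vishnu` holding the text file, the HTML file and an `images` subfolder, returns the archive's contents so they can be checked before uploading.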

Al,
OK, that makes sense. The paragraph you quoted was the one that got me confused. It doesn't really make much sense to have it there.
I'm getting pretty good results using RST without PG extensions, which I didn't like much to begin with. In the old flow they only really benefited the PDF output, which I don't think anyone will miss.
Thanks.
James Simmons
In reply to: ajhaines <ajhaines@shaw.ca>, Fri, Aug 27, 2021 at 1:21 PM

Hi, James.
What Al wrote is correct. Apologies that I didn't get a chance to respond earlier.
In fact, META tags are added during the publication workflow. Submitters don't need to worry about this.
I'm glad RST is working out!
Best,
Greg
In reply to: James Simmons, Fri, Aug 27, 2021 at 01:34:24PM -0500
Participants (3): ajhaines, Greg Newby, James Simmons