Isn't it strange that the creators have birth and death dates included in the tag?
I suppose this is going to cause me problems later, since I'm using the Open Library web service to fetch book covers, and it's very sensitive to false information. I can't think of a way to remove the dates because of the separator used. It is just about the worst possible choice, since "," also appears as the delimiter between the first and last names. Not to mention the French and Latin names, or the organizations, that don't have a "," at all. The other possible match, "-", also appears in names... possibly a regex like this "[0-9]+* (B\.C\.)? [0-9]+*" ? Then there are cases like "Sunzi, 6th cent. B.C." Bad metadata!
I'm dumb. I didn't notice the friendlytitle triple. Disregard this.
Then again, don't. The friendly title doesn't have the last name. Bah.
In fact it has what appear to be automated parsing errors. For instance:
<dc:title rdf:parseType="Literal">The Strange Adventures of Captain Dangerous, Vol. 2 of 3 Who was a sailor, a soldier, a merchant, a spy, a slave among the moors...</dc:title>
<dc:creator rdf:parseType="Literal">Sala, George Augustus, 1828-1895</dc:creator>
<pgterms:friendlytitle rdf:parseType="Literal">The Strange Adventures of Captain Dangerous, Vol. </pgterms:friendlytitle>
It appears to be a hard-coded length limit, yet it seems to work correctly in the other cases... What's the algorithm you use?
I managed, but the special cases are driving me crazy. Some of the names look like this, which I believe is a clear error:
Headley, P. C. (Phineas Camp), 1819-1903, 1819-1903
Combs, Josiah Henry, 1886-1960, 1886-1960
Algie, R. M. (Ronald Macmillan), 1888-1978, 1888-1978
Then there are the various Eastern authors that only have a single date, with no leading "-" or anything, and the other abbreviations like *d. ca. fl. cent*
Actually, just searching for a digit in the last ","-separated string seems to do it, except in the three cases above. Can you fix the RDF?
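For reference, a minimal Python sketch of that last-segment check. The function name `strip_dates` is made up for illustration, and the only assumption is the one stated above: a trailing ","-separated segment containing a digit is date information, not part of the name. Repeating the check in a loop should also cope with doubled date ranges, though I have only tried it on the handful of strings quoted in this thread.

```python
def strip_dates(creator: str) -> str:
    """Drop trailing ','-separated segments that contain a digit.

    Hypothetical helper: assumes any trailing segment with a digit
    (e.g. '1828-1895', '6th cent. B.C.') is a date, not a name part.
    """
    parts = [p.strip() for p in creator.split(",")]
    # Always keep at least one segment so the name is never emptied.
    while len(parts) > 1 and any(ch.isdigit() for ch in parts[-1]):
        parts.pop()
    return ", ".join(parts)


if __name__ == "__main__":
    for raw in (
        "Sala, George Augustus, 1828-1895",
        "Sunzi, 6th cent. B.C.",
        "Headley, P. C. (Phineas Camp), 1819-1903, 1819-1903",
    ):
        print(repr(strip_dates(raw)))
```

Names without any digit come back unchanged (apart from normalized spacing after commas), so organizations and undated authors are left alone.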
Paulo Levi wrote:
I managed, but the special cases are driving me crazy. Some of the names look like this, which I believe is a clear error:
Headley, P. C. (Phineas Camp), 1819-1903, 1819-1903
Combs, Josiah Henry, 1886-1960, 1886-1960
Algie, R. M. (Ronald Macmillan), 1888-1978, 1888-1978
Fixed these. If you find errors in the catalog, report them to catalog@pglaf.org *AFTER* doing a diligent LoC search on http://catalog.loc.gov/ to make sure that you are reporting a real error.
Then there are the various Eastern authors that only have a single date, with no leading "-" or anything, and the other abbreviations like *d. ca. fl. cent*
The catalog has been edited by many different people over a period of nearly 40 years. We know it is not as consistent as we'd like, and it probably never will be. Welcome to the world of real programming ...
Paulo Levi wrote:
It appears to be a hard-coded length limit, yet it seems to work correctly in the other cases... What's the algorithm you use?
Friendlytitle is the title that appears on the bibrec page, so that users who bookmark the page will see something meaningful in their bookmark list. It's the first line of the book title followed by as many authors as will fit into 80 chars.
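As a rough illustration of that description, here is a hedged Python sketch. It is not the actual catalog code: the function name `friendly_title`, the " by " and ", " separators, and the rule that an author who does not fit is dropped whole are all assumptions made for the example.

```python
def friendly_title(title: str, authors: list[str], limit: int = 80) -> str:
    """First line of the title, then as many authors as fit in `limit` chars.

    Illustrative only: separators and truncation rules are assumed,
    not taken from the real implementation.
    """
    stripped = title.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    # Assumption: a first line longer than the limit is simply cut off.
    result = first_line[:limit]
    for index, author in enumerate(authors):
        separator = " by " if index == 0 else ", "
        candidate = result + separator + author
        if len(candidate) > limit:
            break  # this author does not fit, so stop adding names
        result = candidate
    return result


if __name__ == "__main__":
    print(friendly_title("Title\nSubtitle", ["A", "B"]))
```

Note that this sketch would not by itself produce the truncated "Vol. " title quoted earlier in the thread, so the real code evidently splits the title differently.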
participants (2)
- Marcello Perathoner
- Paulo Levi