Wednesday, September 02, 2009

in the old days these kinds of misfires were only courtesy of Indigo

At first glance, the Chronicle of Higher Education is underwhelmed by Google's book search handiwork.

"Google acknowledges the incorrect dates but says they came from the providers. It's true that Google has received some groups of books that are systematically misdated, like a collection of Portuguese-language works all dated 1899. But a very large proportion of the errors are clearly Google's own doing. A lot of them arise from uneven efforts to automatically extract a publication date from a scanned text. A 1901 history of bookplates from the Harvard University Library is correctly dated in the library's catalog. Google's incorrect date of 1574 for the volume is drawn from an Elizabethan armorial bookplate displayed on the frontispiece. An 1890 guidebook called London of To-Day is correctly dated in the Harvard catalog, but Google assigns it a date of 1774, which is taken from a front-matter advertisement for a shirt-and-hosiery manufacturer that boasts it was established in that year.

Then there are the classification errors, which taken together can make for a kind of absurdist poetry. H.L. Mencken's The American Language is classified as Family & Relationships. A French edition of Hamlet and a Japanese edition of Madame Bovary are both classified as Antiques and Collectibles (a 1930 English edition of Flaubert's novel is classified under Physicians, which I suppose makes a bit more sense.) An edition of Moby Dick is labeled Computers; The Cat Lover's Book of Fascinating Facts falls under Technology & Engineering. And a catalog of copyright entries from the Library of Congress is listed under Drama (for a moment I wondered if maybe that one was just Google's little joke)."

It's funny because it's true.
