setrrainbow.blogg.se - Ispeak google books

#ISPEAK GOOGLE BOOKS PORTABLE#

Mark has already extensively blogged the Google Books Settlement Conference at Berkeley yesterday, where he and I both spoke on the panel on "quality" - which is to say, how well is Google Books doing this and what if anything will hold their feet to the fire? This is almost certainly the Last Library, after all. There's no Moore's Law for capture, and nobody is ever going to scan most of these books again. So whoever is in charge of the collection a hundred years from now - Google? UNESCO? Wal-Mart? - these are the files that scholars are going to be using then. All of which lends a particular urgency to the concerns about whether Google is doing this right.

My presentation focussed on GB's metadata - a feature absolutely necessary to doing most serious scholarly work with the corpus. It's well and good to use the corpus just for finding information on a topic - entering some key words and barrelling in sideways. (That's what "googling" means, isn't it?) But for scholars looking for a particular edition of Leaves of Grass, say, it doesn't do a lot of good just to enter "I contain multitudes" in the search box and hope for the best. Ditto for someone who wants to look at early-19th century French editions of Le Contrat Social, or for linguists, historians or literary scholars trying to trace the development of words or constructions: Can we observe the way happiness replaced felicity in the seventeenth century, as Keith Thomas suggests? When did "the United States are" start to lose ground to "the United States is"? How did the use of propaganda rise and fall by decade over the course of the twentieth century? And so on for all the questions that have made Google Books such an exciting prospect for all of us wordinistas and wordastri. But to answer those questions you need good metadata. And Google's are a train wreck: a mish-mash wrapped in a muddle wrapped in a mess.

To take GB's word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux' La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams' Culture and Society, Robert Shelton's biography of Bob Dylan, Fodor's Guide to Nova Scotia, and the Portuguese edition of the book version of Yellow Submarine, to name just a few. (You can find images of most of these on my slides; I'm not giving the URLs, since I expect Google will fix most of these particular errors now that they're aware of them.)

And while there may be particular reasons why the 1899 date comes up so much, these misdatings are spread out all over the place. A book on Peter Drucker is dated 1905, a book of Virginia Woolf's letters is dated 1900, Tom Wolfe's The Bonfire of the Vanities is dated 1888, and an edition of Henry James's 1897 What Maisie Knew is dated 1848. It might seem easy to cherry-pick howlers from a corpus as extensive as this one, but these errors are endemic. Do a search on "internet" in books written before 1950 and Google Scholar turns up 527 hits. Or try searching on the names of writers or famous people, restricting your search to works published before the years of their birth. You turn up 182 hits for Charles Dickens, more than 80 percent of them misdated books referring to the writer as opposed to someone else of the same name. The same search turns up 81 hits for Rudyard Kipling, 115 for Greta Garbo, and 29 for Barack Obama. (Or maybe that was another Barack Obama.)

A search on books mentioning "candy bar" that were published before 1920 turns up 66 hits, of which 46, or 70 percent, are misdated. I'd be surprised if that proportion of errors or anything like it held up in general for books in that range, and dating errors are far denser for older works than for the ones Google received from publishers.