Now that Google has digitized millions of books, the next logical step is to sift through those books for interesting patterns of thought. Enter Google NGram Viewer, now in beta, which can identify how many times a specific word or phrase appears in millions of books over certain time periods. The digitized books can be searched in a variety of languages – American English, British English, French, German, Hebrew, Spanish, Russian, Chinese.
This fascinating new tool enables one to identify certain crude trends about the ideas floating around in published books, which may be a proxy for what was on the minds of educated people. The raw data invites us to make educated guesses about why certain words spike -- or disappear -- during certain time periods. (Thanks, Jim Boyce, for bringing this to my attention.)
So what happens when “commons” and “public goods” are put into Google’s magic database machine? You get the following chart:
What’s interesting is how the word “commons” has the most (relative) mentions from around 1760 through 1780, and a fairly good number of mentions through the 1840s. But then the frequency trails off to minimal levels through the year 2000. (Data from 2000 to 2010 are not yet available, but will be coming soon.) I surmise that with the rise of industrialism in the 1840s -- and therefore the many enclosures of the commons -- the visibility of the commons in people’s lives began to retreat.
I was wondering if a search of books from Great Britain would yield similar results. Here's the chart that you get if you search “British English” books from 1800 to 2000:
If anyone has any interesting theories that these charts provoke, please share them.
For more on how Google NGrams works, go to this webpage. Google has also assembled a proportionately correct but random sampling of books from 1500 to 2008 called “The Google Million,” which consists of a million English-language books. As Google explains:
“No more than about 6,000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year (so there are more computer books in 2000 than 1980). Books with low OCR quality were removed, and serials were removed.”
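For the curious, here is a rough Python sketch of how a per-year cap like the one Google describes might work. It is my own illustration, not Google's code: the function name `sample_books_per_year`, the `(book_id, year)` input format and the fixed random seed are all assumptions, and it leaves out the subject-distribution matching and the OCR-quality filtering mentioned in the quote.

```python
import random
from collections import defaultdict


def sample_books_per_year(books, cap=6000, seed=0):
    """Keep at most `cap` books per publication year.

    `books` is an iterable of (book_id, year) pairs (a hypothetical format).
    Years with `cap` books or fewer are kept whole; larger years are
    reduced to a uniform random sample of `cap` books.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group book ids by publication year.
    by_year = defaultdict(list)
    for book_id, year in books:
        by_year[year].append(book_id)

    sampled = {}
    for year, ids in by_year.items():
        if len(ids) <= cap:
            # Early years: few books, so all of them are present.
            sampled[year] = list(ids)
        else:
            # Later years: draw `cap` books without replacement.
            sampled[year] = rng.sample(ids, cap)
    return sampled
```

With a cap of 5, for example, a year with three books keeps all three, while a year with ten books is cut down to a random five. That is why the early centuries are fully represented in the sample and recent decades are not.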
Happy conjectures!