Here's one of Google's more obscure tools. "Google Books Ngram Viewer" -- Say that ten times fast! What the heck is an Ngram anyway?
An Ngram, also commonly called an N-gram, is a sequence of n (a number) consecutive items in a piece of text or speech, and N-gram analysis counts how often such sequences occur. The items could be all sorts of things, like phonemes, prefixes, words, or letters. Although the N-gram is somewhat obscure outside of research, it is actually used in a variety of fields, and it has a lot of implications for people making computer programs that understand and respond with natural spoken language. That, in a nutshell, would be Google's interest in the idea.
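To make the idea concrete, here is a minimal Python sketch that pulls N-grams out of a list of words or a string of letters. The function name `ngrams` and the sample sentence are made up for illustration; this is not how Google builds its data, just the basic idea of sliding a window of n items across the text.

```python
def ngrams(items, n):
    """Return every run of n consecutive items as a tuple."""
    # Slide a window of width n across the sequence, one step at a time.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level 2-grams ("bigrams") from a hypothetical sample sentence.
words = "the quick brown fox".split()
print(ngrams(words, 2))

# The same function works on letters, since a string is a sequence of characters.
print(ngrams("fox", 2))
```

A search for a two-word phrase like "Sherlock Holmes" is, in these terms, a search for one particular word-level 2-gram.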
In the case of Google Books Ngram Viewer, the text to be analyzed comes from the vast number of books Google has scanned in from public libraries to populate its Google Books search engine. Google refers to the body of text you are going to search as the "corpus." The corpora in the Ngram Viewer are divided up by language, although you can separately analyze British and American English or lump them together. It ends up being super interesting to toggle from British to American usage of terms and see the charts change.
How does it work?
- Go to Google Books Ngram Viewer at books.google.com/ngrams.
- Items are case-sensitive, unlike Google Web searches, so be sure to capitalize proper nouns.
- Type in any phrase or phrases you wish to analyze. Be sure to separate each phrase with a comma. Google suggests, "Albert Einstein, Sherlock Holmes, Frankenstein" to get you started.
- Next, type in a date range. The default is 1800 to 2000, but there are more recent books. (2011 was the most recent year listed in Google's documentation, but that may have changed.)
- Choose a corpus. You can search foreign language texts or English, and in addition to the standard choices, you may notice things like "English (2009)" or "American English (2009)" at the bottom. These are older corpora that Google has since updated, but you may have some reason to make your comparisons against old data sets. Most users can ignore them and focus on the most recent corpora.
- Set your smoothing level. Smoothing averages each year's value with the values from neighboring years, so a higher level gives a smoother line. A smoothing level of 0 shows the raw data and is the most accurate representation, but it can be difficult to read. The default is 3. In most cases, you don't need to adjust this.
- Press the "Search lots of books" button. (You can also just hit Enter at the search prompt.)
What is it showing you?
Google Books Ngram Viewer will output a graph that represents the use of a particular phrase in books through time. If you have entered more than one word or phrase, you will see color-coded lines to contrast the different search terms. This is pretty similar to Google Trends, only the search covers a longer period of time.
Here's a real-life example. I was curious about vinegar pies recently. They're mentioned in Laura Ingalls Wilder's Little House on the Prairie series, but I'd never heard of such a thing. I first used Google's Web search to learn more about vinegar pies. Apparently they're considered part of American Southern cuisine and really are made from vinegar. They hearken back to times when not everyone had access to fresh produce at all times of the year. Is that the whole story?
I searched Google Ngram Viewer, and there are some mentions of the pie in both the early and late 1800s, a lot of mentions in the 1940s, and an increasing number of mentions in recent times (perhaps some pie nostalgia). But there's a problem with the data at a smoothing level of 3: there's a plateau over the mentions in the 1800s. Surely there wasn't an equal number of mentions of one particular pie every year for five years? What's going on is that because not many books were published during that time, and because my data is set to smooth, the picture gets distorted. Probably there was one book that mentioned vinegar pie, and it just got averaged out to avoid a spike. By setting the smoothing to 0, I can see that this is exactly the case. The spike centers on 1869, and there are further spikes in 1897 and 1900.
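Here is a rough Python sketch of why one spike turns into a plateau. It assumes smoothing is a plain centered moving average, where a level of k averages each year with up to k years on either side; that is my reading of the setting, not a copy of Google's code. The function name `smooth` and the numbers in `raw` are invented for illustration and are not real Ngram data.

```python
def smooth(values, k):
    """Centered moving average: each point is averaged with up to k neighbors per side."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - k)                # clip the window at the start of the series
        hi = min(len(values), i + k + 1)  # clip the window at the end of the series
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Hypothetical yearly values: one book mentions the pie in a single year.
raw = [0.0] * 15
raw[7] = 7.0

print(smooth(raw, 0))  # level 0 leaves the single spike untouched
print(smooth(raw, 3))  # level 3 spreads the spike across the neighboring years
```

In this sketch the lone spike becomes a flat run of equal, smaller values across every year whose window touches it, which is the same kind of plateau the chart showed. The exact width on the real chart depends on how Google does its averaging.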
Did nobody talk about vinegar pies the rest of the time? They probably did talk about those pies. There were likely recipes floating all over the place. They just didn't write about them in books, and that's a limitation of these Ngram searches.
Remember how I said that Ngrams could consist of all sorts of different text searches? Google allows you to drill down quite a bit with the Ngram Viewer as well. If you'd like to search for fish the verb instead of fish the noun, you can do so by using tags. In this case you'd search for "fish_VERB".
Google provides a complete list of commands you can use and other advanced documentation on their website.