Spelling and Findability

July 3rd, 2012

As part of my research for The Metadata Handbook I’m exploring the impact of metadata quality on book sales (and, by extension, music sales, movie rentals and more).

I paused on the distinction between findability and discoverability. The two terms are often used vaguely, and frequently interchangeably.

Several entries ago I defined findability as “the challenge of locating exactly what you’re looking for (even if you have incomplete or inaccurate information about the book).” I next considered what happens to findability when metadata is missing even though the search terms are accurate. I followed that up with a glimpse at the problem of spelling errors in search terms, demonstrating that search engines built into ecommerce sites like Amazon are not as powerful as Google search.

In this final look at findability I’ll drill down on the intersection of spelling and findability.

My inspiration for this blog entry is a recent New York Times column by Jane E. Brody. Several commenters on her provocative column mention a book by Barbara Ehrenreich. One person calls the book Blind-Sided, although others use the correct (though less probable) title Bright-Sided (How Positive Thinking Is Undermining America). A search for Blind-Sided on Amazon delivers 73 results, none of them Ehrenreich’s book. Adding “Barbara” to the search term doesn’t help. “Ehrenreich” is a tough name for anyone to spell from memory.

Meanwhile a Google search for “Blind-Sided Barbara” gets it right as the fifth result, the first book result returned by the search. Advanced search engines allow for common misspellings. Google just goes much further than most commercial database search technology.
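To make "allowing for misspellings" concrete, here is a minimal sketch of fuzzy title matching using Python's standard difflib module. It is not how Amazon or Google actually work; the catalog, the function names and the 0.6 threshold are all assumptions chosen for illustration, and the catalog holds short titles only (a long subtitle would lower the similarity score).

```python
from difflib import SequenceMatcher

# Hypothetical mini-catalog of short titles, for illustration only.
CATALOG = ["Bright-Sided", "The Blind Side", "Nickel and Dimed"]

def normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())

def exact_search(query, titles):
    """Return only titles whose normalized form equals the normalized query."""
    return [t for t in titles if normalize(t) == normalize(query)]

def fuzzy_search(query, titles, threshold=0.6):
    """Return titles whose similarity to the query meets the threshold, best first.

    The threshold is an assumed value; a real system would tune it.
    """
    scored = []
    for title in titles:
        score = SequenceMatcher(None, normalize(query), normalize(title)).ratio()
        if score >= threshold:
            scored.append((score, title))
    scored.sort(reverse=True)
    return [title for _, title in scored]

if __name__ == "__main__":
    print("exact:", exact_search("Blind-Sided", CATALOG))
    print("fuzzy:", fuzzy_search("Blind-Sided", CATALOG))
```

In this sketch the exact search for Blind-Sided comes back empty, while the fuzzy search returns Bright-Sided among its candidates. It also returns other near matches, which is the trade-off: a looser match finds more of what the searcher meant, and more of what they did not.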

This example is not caused by a spelling error, per se, but the impact is similar. A small error, simple enough, human enough, and the real book isn’t findable. How common is this likely to be?

Spelling errors are a two-headed Hydra: the error can be in the database, in the search terms, or in both.

A 2005 study by Jeffrey Beall in the Journal of Digital Information (Metadata and Data Quality Problems in the Digital Library) delineates nine different types of possible spelling errors “that occur in digital libraries, both in full-text objects and in metadata.”

Further illustrating the challenge, Beall sampled the academic journal database JSTOR for spelling errors and found plenty.

Ryan James and Andrew Weiss looked at Google Books metadata and found that “36% of sampled books in the Google digitization project contained metadata errors…higher than one would expect.”

And so on.

The lesson for publishers is that a thorough spell check before texts and metadata leave your offices helps keep books findable. The spell-checker in Microsoft Word is professional quality and allows for the addition of custom dictionaries. It can be augmented by third-party software such as StyleWriter by Editor Software. Finally, the Wikipedia Typo Team is a source of suggestions, tools and inspiration.
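For publishers who hold their metadata in a database rather than a Word file, the same idea can be sketched in a few lines of Python: check every word in a record against a word list plus a custom dictionary of names. This is only an illustration. The word lists, the field names and the sample record (with two deliberately planted typos) are assumptions, and a real check would load a full dictionary.

```python
import re
from difflib import get_close_matches

# Tiny stand-in word list; a real check would load a full dictionary file.
BASE_WORDS = {"bright", "sided", "how", "positive", "thinking",
              "is", "undermining", "america"}
# Custom dictionary for author names and other house vocabulary.
CUSTOM_WORDS = {"barbara", "ehrenreich"}

def check_metadata(record, dictionary):
    """Return {field: [(word, suggestions), ...]} for words not in the dictionary."""
    known = sorted(dictionary)
    problems = {}
    for field, value in record.items():
        for word in re.findall(r"[A-Za-z]+", value):
            if word.lower() in dictionary:
                continue
            # Suggest close dictionary words; the 0.8 cutoff is an assumed value.
            suggestions = get_close_matches(word.lower(), known, n=3, cutoff=0.8)
            problems.setdefault(field, []).append((word, suggestions))
    return problems

if __name__ == "__main__":
    # Hypothetical record with planted typos in both fields.
    record = {
        "title": "Bright-Sided: How Positve Thinking Is Undermining America",
        "author": "Barbara Ehrenriech",
    }
    for field, issues in check_metadata(record, BASE_WORDS | CUSTOM_WORDS).items():
        for word, suggestions in issues:
            print(f"{field}: '{word}' not in dictionary; did you mean {suggestions}?")
```

The custom dictionary matters as much here as it does in Word: without it, a correctly spelled "Ehrenreich" would be flagged on every record, and real errors would get lost in the noise.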