How Metadata Gets Mangled

July 14th, 2012

My metadata colleague, Renée Register, uses Zappos boots to offer an easy-to-grasp example of metadata in action.

That works.

But as I was pondering the problem of mangled metadata I thought: media mavens say that “the ultimate product is YOU.” So what better metadata challenge to confront than your personal metadata, also known as your resume?

This afternoon I registered on Monster.com just to see how Monster handles metadata.

After filling in the pre-populated metadata fields I also uploaded the Word .doc version of my full resume (which Monster parses to match potential job opportunities). If Monster treated my resume the way book retailers treat metadata it would look like this:

These are the sorts of changes, insertions, arbitrary rules and simple neglect found in book metadata every day.

As Brian O’Leary points out in his June report to the Book Industry Study Group (BISG) and BookNet Canada:

  • Both distributors and data aggregators change the metadata they receive from publishers…these modified feeds can either replace or compete with publisher data.
  • Publishers’ metadata is modified or supplemented without their knowledge or consent.
  • Metadata recipients reported that they receive incomplete or incorrect metadata from publishers.
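To make those three points concrete, here is a minimal sketch of how a publisher might spot the changes, insertions and omissions in a downstream copy of one of its records. It is not taken from the report: the field names and both sample records are hypothetical, invented purely for illustration, and real feeds are far richer than a flat dictionary.

```python
def diff_records(publisher, downstream):
    """Return {field: (publisher_value, downstream_value)} for each field that differs.

    A value of None means the field is absent on that side.
    """
    changes = {}
    for field in sorted(set(publisher) | set(downstream)):
        before = publisher.get(field)
        after = downstream.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


# Hypothetical records, invented for illustration only.
publisher_record = {
    "title": "An Example Title",
    "contributor": "Jane Doe",
    "subject": "Fiction",
    "page_count": 320,
}
downstream_record = {
    "title": "Example Title, An",   # changed downstream
    "contributor": "Jane Doe",
    "subject": "Fiction",
    "series": "Example Series",     # inserted downstream
    # page_count was dropped downstream
}

for field, (before, after) in diff_records(publisher_record, downstream_record).items():
    print(f"{field}: {before!r} -> {after!r}")
```

A comparison like this only tells you that a field differs, not who changed it or why, which is exactly the visibility publishers say they lack.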

Brian’s 37-page report is titled Development, Use, and Modification of Book Product Metadata. He has done a fine job of uncovering the challenges faced by authors, publishers, service providers and resellers in their struggle to accurately describe each of the several million titles in the publishing supply chain today.

I got a look at the report, but it costs publishers $200 (for BISG members) or $500 (for non-members). It’s not kosher to report on a document that readers can’t judge for themselves, so I’ll limit my remarks here. In a blog post on my metadata web site I listed some of the publicly accessible sources of information about the study.

“I must go on. I can’t go on. I’ll go on.”
― Samuel Beckett

What continues to strike me about the state of book publishing metadata is that anybody can find anything at all. Yet some $20 billion worth of trade books manage to be sold in North America each year, many of them online. To read the BISG report, you’d think that half of those sales would be missed. But they’re not. The publishing industry succeeds despite the mangled metadata mess (though certainly not because of it).

There’s an informative diagram in the report (also publicly available on SlideShare) that illustrates the supply chain kerfuffle well.

With all of this back-and-forth between publishers, distributors, content converters, online reading sites and various resellers, the only surprise, once again, is that any accurate data makes its way to Amazon or Apple. But it does. And lots of books are also sold from entries full of egregious errors. How could this be possible?

That’s a subject for a subsequent blog entry.

July 15, 2012: Renée Register has published the first of three articles on the BISG study, Considering the BISG Report, “Development, Use, and Modification of Book Product Metadata.”