Metadata for Books—Better Living Through Blogging

November 2, 2015 by Thad McIlroy

There are not many people who blog about metadata. There are fewer still who blog about metadata for books. (*Note 1, below.)

I do. It’s my vice. And like many vices it’s a source both of stimulation and of solace. (And, of course, as a vice, it can get out of hand.)

I blog about metadata here at The Future of Publishing, but not often, knowing that for many of my readers metadata is a subject so dull that, well, you can get a sense of how dull it can seem in the note below. (*Note 2, below.)

Onix: "It is a very aggressive Pokémon..."

Nonetheless, there are many (OK, perhaps a few thousand) people who devote much (some) of their day to book publishing metadata concerns. There are a few thousand more quite interested in the topic, but spared from dealing with it as an everyday responsibility. And there is another bunch (a swarm? a murder?) who follow metadata from a distance of 30,000 feet. It is these groups whose attention I wish to draw to my metadata blog.

The blog appears, appropriately enough, on the website of our book (co-authored with Renée Register) The Metadata Handbook.

I cover all the hot topics: new ONIX for Books Codelists, moving from ONIX 2.1 to 3.0, “The Discoverability Problem,” standards activities at the publishing associations and, well, the latest ONIX for Books Codelists.

I just wanted to bring the blog to your attention because, well, because not enough people know about it and I wish more people would read it. Thanks.

Note 1. Other metadata blogs:

There are only three blogs that I’m aware of that consider metadata for book publishers.

BookNet Canada, “a non-profit organization that develops technology, standards, and education to serve the Canadian book industry,” has some of the best coverage of book publishing metadata, thanks to their very smart metadata guru, Tom Richardson.

Bibliocloud, the powerful publishing management system, rooted in metadata, maintains a blog, though it mostly promotes its services.

Metadata consultancy Jared & Perry blogs, though the last entry was in March 2014.

There’s also a public Yahoo group called onix_implement. It’s “an e-forum for implementation queries about ONIX for Books.” As such it is very technical and very situation-specific.

The library community lives and breathes metadata far more deeply than book publishers. Here is a sampling of library-focused metadata blogs:

Metadata Matters, the blog of Metadata Management Associates. The latest post, by Diane Hillmann, considers the politics of metadata in the library community.

Outgoing: Library metadata techniques and trends is a very technical blog by Thom Hickey.

The Metadata Discussion Group of the Indiana University Libraries is fairly active.

There is a Metadata Blog of the ALCTS (Association for Library Collections & Technical Services) Metadata Interest Group, although the last post was in May.

Dull, in a new way.

Note 2: Leave it to Charles Dickens to capture the pain of the truly dull. In October 1885 he wrote about Thomas Walker’s “threepenny weekly magazine.” It was, he complained, “so dull that it is hard to understand how it survived its first weeks at all; so dull that its decease, after a brief career of some six months, is no matter of wonder; so dull that it is, at first sight, difficult to make out why even its memory should have survived that of so much of the periodical literature which has succeeded it.”

March 20, 2016: A colleague suggested the other day that publishers feel the responsibility of creating good metadata is akin to mom saying, “Eat your vegetables!”



  • Peter

    Nov 13th, 2015 : 12:03 PM

    Thad, thanks for all your good work around book metadata. I’ve recently been hip-deep in Ingram’s book files and could only shake my head. I’ve been a book publisher and understand the profound value of good book metadata. “Good” wants definition here of course, but my bias is good for consumers of books, good for the growing diversity of eCommerce bookstore sites.

    The options for licensing data seem quite limited and the quality of the data highly varied and problematic for reasons that seem mysterious. Ingram’s data is rife with typos, mis-attributed authors, and provides only small cover files. Baker & Taylor is the same, it seems, but with less breadth than Ingram. Google’s API service seems limited in the scope of the title base. Indie Bound, Library Thing, also have real limitations. Zola Books and offer APIs but no data service, as near as I can tell.

    All this is surprising to me. It would certainly seem advantageous to publishers to have a rich diversity of online bookstore outlets available. But this requires an eCommerce-friendly book metadata service to be available.

    What am I missing?

  • Thad McIlroy

    Nov 13th, 2015 : 8:08 PM

    Hi Peter,

    Thanks for your comment: I’m much in agreement. “Good” when defined as “good for consumers of books, good for the growing diversity of eCommerce bookstore sites” is on the money.

    I’ve heard many complaints about the quality of metadata available for license. On the one hand the errors (typos, mis-attributed authors and more) are understandable when you’re trying to corral literally millions of bibliographic records. On the other hand it leaves eCommerce sites in the lurch, looking for something better. Amazon, apparently, originally licensed Bowker metadata, but they’ve made endless alterations to the data on their site, such that Amazon is, arguably, the best available source of book metadata today. Yet even a cursory glance turns up dozens of inconsistencies and errors.

    There is, as they say, a business opportunity here for someone to create a new cleaner set of records. The problem is that the cost is enormous and there aren’t going to be many new customers in the foreseeable future. You’re, sadly, stuck.

  • Peter

    Nov 14th, 2015 : 7:42 AM

    Thank you, sir. It does seem like it’s a solution in need of a market. And the “if we build it they will come” isn’t compelling to VCs.

    One additional query: do you have any insight or experience with the Google API yourself? If not do you know anyone who has worked with their data? Any insight would be much appreciated.