Future of Publishing: Is Metadata Magic? by Thad McIlroy

September 20th, 2012

When I began researching The Metadata Handbook I believed that metadata was magic. I believed that if you could add rich and accurate metadata to a title listing you’d all but guarantee big sales. I’ve since learned that while metadata is enchanting, its powers are far more down to earth.

It’s easy to see why metadata gets mistaken for magic. It’s complex, and extremely hard to execute well. Fluency in ONIX, the metadata standard, requires an understanding of XML and DTDs and ISTCs and… you get the point. It was Arthur C. Clarke who first said “Any sufficiently advanced technology is indistinguishable from magic.” ONIX-based metadata nearly qualifies. But it’s not magic. It promises to make your books discoverable, and discoverability implies many new sales once your wonderful books are found.

Room for 10,000 more volumes

What this fails to take into account is the sheer volume of books published. Laura Dawson recently reported that there are 32 million books in print. The Library of Congress “receives some 22,000 items each working day and adds approximately 10,000 items to the collections daily.” Amazon added 56,477 new Kindle ebooks in the last 30 days.

Numbers like these all but extinguish the magical flame of discoverability. Yes, metadata will make your book discoverable, but at the same time it’s making another 10, 20 or 30 thousand books discoverable. Take your pick.

In a recent post I looked at titles on Amazon that fall under the heading “baking bread.” Amazon offers over 3,000 titles in this category. When I search Google for “baking bread” I have to wade through 47 results before I hit the first book, Emmanuel Hadjiandreou’s How to Make Bread. Most of the other results link to cooking sites and to how-to videos on YouTube. To be fair, the same search on Bing returned Beth Hensperger’s Baking Bread: Old and New Traditions as the 17th result.

Over at O’Reilly’s Tools of Change, Joe Wikert notes that “today’s search engine access is generally limited to our metadata, not full book content. As a result, books are at a disadvantage to most other forms of content online.” This leads him to the question: “At what point do we expose the book’s entire contents to all the search engines?”

I think Joe has started an important discussion. A couple of points:

Paradoxically, the content of a book is also metadata about the book. Look at it this way: we already have defined metadata fields for “table of contents”, “excerpt” and “index”. The entire text of a book is just a very long excerpt from within the book. What better metadata about a book than all of the words and ideas in its text?
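To make the point concrete, here is a minimal Python sketch that assembles an ONIX 3.0-style `<CollateralDetail>` block with the standard library’s `xml.etree.ElementTree`, putting the full text in the same slot an excerpt would occupy. The TextType codes used (04 table of contents, 14 excerpt, 15 index) are my reading of ONIX code list 153 and should be checked against the current EDItEUR list; the function and dictionary key names are hypothetical, and a real feed would wrap this inside a complete `<Product>` record.

```python
import xml.etree.ElementTree as ET

# ONIX 3.0 TextType values (code list 153) as I understand them;
# assumption: verify against the current EDItEUR code list before use.
TEXT_TYPES = {
    "table_of_contents": "04",
    "excerpt": "14",
    "index": "15",
}


def collateral_detail(texts: dict[str, str]) -> ET.Element:
    """Build a <CollateralDetail> element with one <TextContent> per entry.

    Keys of `texts` are the hypothetical names in TEXT_TYPES."""
    detail = ET.Element("CollateralDetail")
    for kind, body in texts.items():
        content = ET.SubElement(detail, "TextContent")
        ET.SubElement(content, "TextType").text = TEXT_TYPES[kind]
        # "00" is the unrestricted-audience code in ONIX 3.0
        ET.SubElement(content, "ContentAudience").text = "00"
        ET.SubElement(content, "Text").text = body
    return detail


# Illustrative placeholder text, not taken from a real book.
book_text = "Chapter 1. Flour, water, salt and yeast..."
detail = collateral_detail({
    "table_of_contents": "1. Ingredients; 2. Kneading; 3. Baking",
    "excerpt": book_text,  # the full text treated as a very long excerpt
})
print(ET.tostring(detail, encoding="unicode"))
```

Nothing in the format stops the “excerpt” from being the whole book; whether retailers and aggregators accept or display a text of that length is a separate, untested question.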

The topic is controversial only, I think, because of concerns about theft of the content, that the book will be read online without payment. If you could index the content of a book so that it showed up in search engines without making it simple to download or read the book then why not just do so? There are lots of ways to handle this technically, whether using JavaScript, tagged images or… (technical experts chime in here). At the very least let Google Books index the contents.
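One possible approach is a snippet view, similar in spirit to what Google Books shows for many in-copyright titles: the server knows every word and where it occurs, but answers a query with only a few short passages. The sketch below assumes a plain-text copy of the book sits on the server; the class name, window size and per-query cap are hypothetical choices, and a production system would also need rate limiting so nobody can reassemble the text query by query.

```python
import re
from collections import defaultdict


class SnippetIndex:
    """Hypothetical sketch: index a book's full text so a search can confirm
    a match and show short snippets, without ever serving the whole text."""

    def __init__(self, text: str, window: int = 40, max_snippets: int = 3):
        self.text = text
        self.window = window              # characters shown on each side of a hit
        self.max_snippets = max_snippets  # cap per query to limit scraping
        self.positions = defaultdict(list)  # lowercased word -> start offsets
        for match in re.finditer(r"\w+", text):
            self.positions[match.group().lower()].append(match.start())

    def search(self, word: str) -> list[str]:
        """Return at most `max_snippets` short passages around `word`."""
        hits = self.positions.get(word.lower(), [])[: self.max_snippets]
        snippets = []
        for start in hits:
            lo = max(0, start - self.window)
            hi = min(len(self.text), start + len(word) + self.window)
            snippets.append("..." + self.text[lo:hi] + "...")
        return snippets


index = SnippetIndex(
    "Knead the dough for ten minutes, then let the dough rise.", window=10
)
for snippet in index.search("dough"):
    print(snippet)
```

The design choice is that the full text is indexed but never returned: a search engine or reader can learn that the book discusses kneading dough without being able to read the book.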

To me this points to a topic that’s fallen off the radar of late: creating a great website for every new book.

The Big Five publishers are still creating crummy websites, even for their big-name authors. Sure, it’s hard work to create a good website. But mostly it’s a challenge for the imagination, and it’s surely no harder than writing and publishing the book.

If authors and publishers want to maximize their sales opportunities in a desperately crowded market, they’ll make sure they get the metadata right in the one place entirely within their control: the book’s website. And they should take full advantage of a book’s most potent metadata: the full text.
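On the book’s own website, one widely used way to publish structured metadata is a schema.org `Book` description embedded as JSON-LD. The sketch below builds one with Python’s `json` module and puts the optional full text in the `text` property that schema.org defines for any `CreativeWork`. The title, author and ISBN are placeholders, the function name is hypothetical, and whether a given search engine actually reads a long `text` value is an assumption to test rather than a guarantee.

```python
import json
from typing import Optional


def book_jsonld(title: str, author: str, isbn: str, description: str,
                full_text: Optional[str] = None) -> str:
    """Return a schema.org Book description as a JSON-LD string, meant for a
    <script type="application/ld+json"> tag on the book's own web page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
        "description": description,
    }
    if full_text is not None:
        # schema.org's CreativeWork "text" property: the textual content itself
        data["text"] = full_text
    return json.dumps(data, ensure_ascii=False, indent=2)


markup = book_jsonld(
    title="Baking Bread at Home",   # hypothetical title
    author="A. Baker",              # hypothetical author
    isbn="9780000000000",           # placeholder, not a real ISBN
    description="A beginner's guide to sourdough and yeasted loaves.",
    full_text="Chapter 1. Flour, water, salt and yeast...",
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Keeping the full text optional lets a publisher start with the usual fields and add the text later, once it has settled how much of the book it is willing to expose.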