The Metadata Problem

I am not a metadata expert.  I have a couple of friends who could run circles around me in terms of depth and breadth of their experience.  But I do have opinions.

I’ve always thought that the logical person to append metadata (the person who brings the data in) is also the least likely person to know which metadata will be of interest.  Downstream, the consumers of data will have their separate, and diverse, metadata “agendas”, if you will.  The originator doesn’t know what those agendas are (and probably can’t know, since they change over time).  And, of course, the consumers of data don’t know what metadata apply to a particular dataset without examining it.

In addition, the task of appending metadata is an add-on: it’s something extra you have to do.  What incentive does the originator of a dataset have to do this, other than charity?

Tagging systems like del.icio.us have solved part of this problem with a bottom-up approach in which any user of the system can attach metadata to a dataset retroactively.  These systems don’t satisfy metadata zealots because the vocabularies aren’t controlled, but, as the Wikipedia article on tagging says, things work out.  The vocabularies are usable and typically converge, or at least don’t diverge too badly.  The crowd is, if not wise, at least not clueless.
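To make the bottom-up idea concrete, here is a rough sketch of how free-form tags from many users might be pooled per item.  This is not how any particular tagging service works; the names (tag_counts, add_tag, top_tags) and the sample tags are all made up for illustration, and the only "vocabulary control" is trimming whitespace and lowercasing.

```python
from collections import Counter, defaultdict

# Hypothetical store: item -> how many times each tag was applied to it.
tag_counts = defaultdict(Counter)

def add_tag(item, tag):
    """Record one user's free-form tag; only trivial normalization is applied."""
    tag_counts[item][tag.strip().lower()] += 1

def top_tags(item, n=3):
    """Return the n most frequently applied tags for an item."""
    return [tag for tag, _ in tag_counts[item].most_common(n)]

# Made-up tags from several users for the same item.
for tag in ["metadata", "Metadata", "tagging", " metadata", "folksonomy"]:
    add_tag("some-article", tag)

print(top_tags("some-article", 1))
```

If the vocabularies really do converge, the most common tags for an item are the ones worth surfacing, and the stragglers do little harm.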

It would be even better if there weren’t a separate tagging operation at all.  In a no-tagging system, some workflow that the user was going to carry out anyhow would implicitly add the metadata.

A typical use case: when a user drags an email to a “junk” or “spam” folder, the mail management system can infer that the email should be tagged as junk or spam.
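The junk-folder case can be sketched in a few lines.  This is only an illustration under my own assumptions: the Mailbox class, the FOLDER_TAGS mapping, and the message ids are hypothetical, and no real mail client is claimed to work this way.  The point is that the user only ever calls move; the tag falls out as a side effect.

```python
from collections import defaultdict

# Hypothetical mapping from folder names to the tags they imply.
FOLDER_TAGS = {"junk": {"junk"}, "spam": {"spam"}}

class Mailbox:
    def __init__(self):
        self.folder_of = {}            # message id -> current folder
        self.tags = defaultdict(set)   # message id -> inferred tags

    def move(self, message_id, folder):
        """Move a message; any tag implied by the folder is added implicitly."""
        self.folder_of[message_id] = folder
        self.tags[message_id] |= FOLDER_TAGS.get(folder.lower(), set())

mailbox = Mailbox()
mailbox.move("msg-1", "Junk")    # the user just drags the email
print(sorted(mailbox.tags["msg-1"]))
```

Folders with no entry in the mapping add no tags, so ordinary filing stays exactly what it was.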

I work hard to get proper metadata into my personal information cloud by dragging emails to folders and tagging.  The payoff is that search works pretty well for me when I need to track things down.

Your thoughts?