The Metadata Problem

I am not a metadata expert.  I have a couple of friends who could run circles around me in terms of depth and breadth of their experience.  But I do have opinions.

I’ve always thought that the logical person to append metadata (the person who brings the data in) is also the person least likely to know which metadata will be of interest.  Downstream, the consumers of data will have their separate, and diverse, metadata “agendas”, if you will.  The originator doesn’t know what those agendas are (and probably can’t know, since they change over time).  And, of course, the consumers of data don’t know what metadata apply to a particular dataset without examining it.

In addition, the task of appending metadata is an add-on: it’s something extra you have to do.  What incentive does the originator of a dataset have to do this, other than charity?

Tagging systems like del.icio.us have solved part of this problem with a bottom-up approach in which any user of the system can tag datasets with metadata retroactively.  These systems don’t satisfy metadata zealots because the vocabularies aren’t controlled, but, as the Wikipedia article on tagging says, things work out: the vocabularies are usable and typically converge, or at least don’t diverge too badly.  The crowd is, if not wise, at least not clueless.

It would be even better if there weren’t a separate tagging operation at all.  In a no-tagging operation, some workflow that the user was going to do anyhow would implicitly add metadata.

Typical use case here: when a user drags an email to a “junk” or “spam” folder, the mail management system can infer that the email should be tagged as junk or spam.
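
To make the junk-folder case concrete, here is a minimal sketch of implicit tagging. Everything in it is hypothetical (the Mailbox class, the FOLDER_TAGS mapping, the message ids); it isn't how any real mail client works, just an illustration of a tag falling out of an action the user was going to take anyhow.

```python
from collections import defaultdict

# Hypothetical mapping from folder names to the tags they imply.
FOLDER_TAGS = {"junk": "junk", "spam": "spam"}


class Mailbox:
    """Toy mailbox: filing a message is the only action the user takes."""

    def __init__(self):
        self.folder_of = {}           # message id -> current folder
        self.tags = defaultdict(set)  # message id -> inferred tags

    def move(self, message_id, folder):
        """Move a message and record any tag the destination folder implies."""
        self.folder_of[message_id] = folder
        tag = FOLDER_TAGS.get(folder.lower())
        if tag is not None:
            # No separate tagging step: the tag is a side effect of filing.
            self.tags[message_id].add(tag)


mailbox = Mailbox()
mailbox.move("msg-1", "Spam")      # picks up the "spam" tag implicitly
mailbox.move("msg-2", "Receipts")  # no tag is implied by this folder
```

The user only ever calls move(); the metadata accumulates on its own, which is the whole point of a no-tagging operation.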

I struggle a lot to get proper metadata in my personal information cloud, by dragging emails to folders and tagging.  The payoff is that search works pretty well for me in tracking things down when I need to.

Your thoughts?

Connected TV

Reading a bunch about marrying Internet and traditional TV today, trying, among other things, to suss out how the ecosystem is going to develop.

One insight I had today: if people end up preferring smart TVs at all, it won’t be because a smart TV means one fewer box. The history of phones, smartphones, and now tablets shows that people pick their boxes because of functionality, not box count. People cheerfully carried around BlackBerrys and dumb phones together for years, one for email, one for voice. Today people have a phone and a tablet and a laptop, all for slightly different use cases, each picked for excellence of function.

My guess would be that people will do the same for TVs. We will cheerfully combine a legacy set-top box, a new box, and maybe even a smart TV, if each excels at some purpose we want.

Money is a vector, not a scalar

We were having a discussion about “throwing good money after bad” the other day, and I found myself blurting out “well, after all, money is a vector, not a scalar.”

I’m sure you all remember (from your linear algebra class, perhaps) the difference between a vector quantity and a scalar. A scalar quantity has a magnitude while a vector has a magnitude and a direction.

“Good” money and “bad”. What are these but an additional dimension for money? In a bad investment the quantity of the money grows while its goodness shrinks; in a good investment they grow together. Additional money in a bad investment grows smoothly in quantity but has a discontinuity as it leaps from bad to good.

There are lots of discussions about money that acknowledge its vector nature. “Dumb” money and “smart”. “Patient” money. The “velocity” of money. “Easy” money (large first derivative of money with respect to effort).

Maybe just a dumb metaphor. Your thoughts?