Tagging by Doing

Part of building a better PIM infrastructure is undoubtedly solving the “tagging problem.”

The tagging problem is: how do you get users of a PIM system to tag their data 1) promptly, 2) accurately, 3) with regard to downstream users, and 4) MECE-ly, all without requiring them to use a heavy-duty ontology of some sort?

  1. Promptly.  Data is best tagged at the point of production, when it is most salient to the person entering it.  For downstream taggers, promptness is also of value: they should tag the information as it comes to them.
  2. Accurately.  There are two senses of accurate (maybe more?): Accuracy1 is making sure that data with the same semantics gets the same tags.  Accuracy2 is making sure that the tag is appropriate to the data.
  3. Helpful Downstream.  In a perfect world, the tagging will all be helpful to future users of the data.
  4. MECE-ly.  MECE is McKinsey’s acronym for “mutually exclusive, collectively exhaustive.”  It describes a set of tags that, for want of a better word, forms what linear algebra would call a basis for a knowledge space: every item fits under one of the tags, and no item fits under two of them.
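
To make the MECE criterion concrete, here is a minimal sketch of how one might check a tag set against a pile of tagged items.  The function names (`mece_report`, `is_mece`) and the dictionary layout are my own assumptions for illustration, not part of any particular PIM tool.

```python
def mece_report(item_tags, tag_set):
    """List the items that break MECE for the given tag set.

    item_tags: dict mapping an item name to the tags applied to it
               (hypothetical layout, assumed for this sketch).
    tag_set:   the candidate set of tags being checked.
    """
    allowed = set(tag_set)
    uncovered = []    # no tag from the set applies: not collectively exhaustive
    overlapping = []  # more than one tag applies: not mutually exclusive
    for item, tags in item_tags.items():
        used = set(tags) & allowed
        if len(used) == 0:
            uncovered.append(item)
        elif len(used) > 1:
            overlapping.append(item)
    return {"uncovered": sorted(uncovered), "overlapping": sorted(overlapping)}


def is_mece(item_tags, tag_set):
    """True when every item carries exactly one tag from tag_set."""
    report = mece_report(item_tags, tag_set)
    return not report["uncovered"] and not report["overlapping"]
```

A check like this only tells you whether the tags are MECE for the items seen so far; it says nothing about items that have not arrived yet.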

The tagging problem has no real solution, and the lack of a solution may be the downfall of many a knowledge system, whether personal or tribal.

Solutions are elusive because it’s not immediately in the interests of the tagger to do the tagging, even when it might be in their long-term interest.  Of course, it’s always in the long-term interest of any user of the system.

But what could you do if tagging occurred as an automatic by-product of some other operation?  One that users wanted to do?

I once trained an open-source Bayesian categorizer to distinguish spam from bacn from desirable mail in my inbox.  I did it by doing what I would have done anyhow — dragging the incoming mail into good, bacn, and spam folders.  But while I was doing that the classifier was changing its weights (or whatever classifiers do) and automatically improving its ability to triage my mail.
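
I don't remember the internals of the categorizer I used, so what follows is only a rough sketch of the idea, assuming a plain naive Bayes model with add-one smoothing.  The class name `FolderClassifier` and its methods are hypothetical.  The point is that the only training call is the one that fires when a message gets filed.

```python
import math
import re
from collections import Counter, defaultdict


class FolderClassifier:
    """Naive Bayes over words, trained as a side effect of filing mail."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> messages filed
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def file_message(self, text, folder):
        """Call this when the user drags a message into a folder."""
        words = self.tokenize(text)
        self.word_counts[folder].update(words)
        self.doc_counts[folder] += 1
        self.vocab.update(words)

    def suggest_folder(self, text):
        """Return the most probable folder, or None before any filing."""
        if not self.doc_counts:
            return None
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        words = self.tokenize(text)
        best, best_score = None, -math.inf
        for folder in sorted(self.doc_counts):
            # log prior: share of all filed messages that went to this folder
            score = math.log(self.doc_counts[folder] / total_docs)
            total_words = sum(self.word_counts[folder].values())
            for w in words:
                # add-one smoothing so unseen words do not zero out a folder
                count = self.word_counts[folder][w]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best, best_score = folder, score
        return best


clf = FolderClassifier()
clf.file_message("win free money now", "spam")
clf.file_message("your weekly newsletter digest", "bacn")
clf.file_message("lunch meeting tomorrow with the team", "good")
print(clf.suggest_folder("free money offer"))
```

The user never "tags" anything here.  They file mail, and the word counts (the closest thing this model has to weights) shift underneath them.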

Could that same technique be applied to more complicated tagging problems?  Not sure.  It needs some thought.

One thought on “Tagging by Doing”

  1. Can’t tell what will work without getting specific. The micro-standards movement, and Google for that matter, do quite well with entity identification and coding (names, places, events, …). I think that’s the approach to upfront tagging that works.

    Don’t ignore the value of emergent tagging. By watching how something is consumed, tagging can be quite powerful. That’s how relevance and associations are built.

    Finally, context matters a lot. TripIt does a great job parsing your reservations because it knows it’s looking at reservations.

    So what types of tags do you envision? What types of contexts? Does this beg a meta ontology?
