It occurred to me recently that Google can be, under certain conditions, an excellent tool for categorizing sites and media properties, though I realize it will not work in every case. I’ll talk a lot more about this in my book on Social Media Analytics (a new book site is in the works, where I will talk about the book as I’m writing it and ask for feedback).
But here’s one example. If I were to take a list of sites (any list: it could be Comscore’s 50,000 or so sites, or just a list of blogs, message boards, photo sharing sites, mainstream media outlets, whatever) and run a set of pre-canned queries against them while counting the results, I could tell you how relevant each site is to the subject of each query. Sure, there would be some issues with dynamic URLs and sites that serve up a lot of duplicate content (which Google tries to suppress with its duplicate content filter), but overall, with a good set of queries and enough time, I could categorize a bunch of sites by their relevancy to a particular subject (whatever the query is about).
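Here’s a minimal sketch in Python of that counting idea. It is an illustration under assumptions, not a finished tool: the function names, the example domains, and the counts are all made up, and `count_results` is a hypothetical stand-in for whatever search backend you have access to (I’m not assuming any particular Google API here). The only real search feature it relies on is the `site:` operator, which restricts a query to one domain.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_site_query(site: str, query: str) -> str:
    """Restrict a canned query to a single site with the `site:` operator."""
    return f"site:{site} {query}"


def score_sites(
    sites: Iterable[str],
    queries: Iterable[str],
    count_results: Callable[[str], int],
) -> Dict[str, int]:
    """Sum the result counts of every canned query for each site.

    `count_results` is a hypothetical callable: it takes a query string
    and returns the number of results a search backend reports for it.
    """
    query_list = list(queries)  # allow generators to be reused per site
    scores: Dict[str, int] = {}
    for site in sites:
        scores[site] = sum(
            count_results(build_site_query(site, q)) for q in query_list
        )
    return scores


def rank_sites(scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order sites from most to least relevant by total result count."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up counts for illustration only; a real run would call a search backend.
FAKE_COUNTS = {
    "site:example-blog.com social media analytics": 120,
    "site:example-blog.com media monitoring": 30,
    "site:example-news.com social media analytics": 15,
}


def fake_count_results(query: str) -> int:
    """Stand-in backend: look the query up in the made-up table."""
    return FAKE_COUNTS.get(query, 0)


if __name__ == "__main__":
    demo_scores = score_sites(
        ["example-blog.com", "example-news.com"],
        ["social media analytics", "media monitoring"],
        fake_count_results,
    )
    for site, total in rank_sites(demo_scores):
        print(site, total)
```

Raw counts favor big sites, so in practice you would probably want to divide each total by the number of pages the site has indexed, and the duplicate content and dynamic URL issues mentioned above would still skew the numbers.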
If I had enough different queries, and enough time, I could categorize the web (though right now, without a bit of programming, this would be impossible to scale). In fact, much of Comscore’s categorization is done manually, by a dictionary team that decides what categories a site is in. And Google collects information via Google Analytics Benchmarking, where sites that share data can compare themselves to other sites that also share anonymous data in a category (say, magazines) and see how they perform on 6 preset metrics.
Much of this was stimulated by looking at CisionPoint and Recorded Future, two platforms I’m playing with right now; I will have more to write about them in, let’s say, the near future. I’m also giving a webinar with Jay Krall of Cision in mid-November on all the neat things CisionPoint can do. It will be an interactive webinar where I’ll be asking Jay some cool questions and he’ll show people what CisionPoint actually does.
For example, CisionPoint now reads in Radian6 data and merges it with its Media Outlet database and Industry Segmentation. I bet a lot of people didn’t know that, or what to do with such information.
More about this shortly.