Been thinking about all the Geo-location data I’m collecting in Radian6 around 4SQ check-ins. Truth be told, I started tracking check-ins to see if I could do, more or less myself, what Dave Kerpen had a team of 6 staff members doing with his Likeable Social Media book (his team attempted to put a hint to download a free chapter of his book, or buy it, into every bookstore in the country).
Not sure what that attempt to use a social media clan to harness popularity achieved, but his book ranks very well on Amazon, esp. in the Web Marketing category.
But I quickly gave up on the idea, not because I could not do it, but simply because my approach is different: I’m an analyst, not a marketer. I can get others to do marketing for me, but fundamentally, my work is with revealing truth, not spreading it. I let the quality of my content take care of itself (too busy creating content to worry about putting hints all over the internet), but, I admit, Dave Kerpen, in attempting to use Foursquare the way he did, developed a service which is probably useful to other book marketers as a product, in and of itself. Still, that’s not where I wish to go (or really can go, to be honest).
Rather, I began to look at the data I was collecting, care of Radian6, who provides me with an Influencer Account allowing my mind to roam – which is what I present to you today.
Before you are all the check-ins from Foursquare over the last 3 months that have been captured, mostly on Twitter. Based on my numbers (any mention that has “4sq.com” in it) there are roughly 7 million check-ins that take place a month, over the last 3 months.
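To make the counting rule concrete, here is a minimal sketch in Python of the "any mention that has 4sq.com in it" filter. It is not the actual Radian6 topic profile: the function name and the sample mentions are made up for illustration, and only the total and the 3-month window come from this post.

```python
def is_foursquare_checkin(text: str) -> bool:
    """Crude rule: any mention containing '4sq.com' counts as a check-in."""
    return "4sq.com" in text.lower()

# Hypothetical sample mentions (invented for illustration only).
mentions = [
    "I'm at Joe's Pizza (New York, NY) http://4sq.com/abc123",
    "Great read on social analytics today",
    "I just became the mayor of Central Perk! http://4SQ.com/xyz789",
]
checkins = [m for m in mentions if is_foursquare_checkin(m)]

# Total reported in this post for the 3-month window.
TOTAL_CHECKINS = 19_307_702
MONTHS = 3
monthly_average = TOTAL_CHECKINS / MONTHS  # a little over 6.4 million

print(len(checkins), round(monthly_average))
```

The straight average comes out somewhat under the rough monthly figure quoted above, which fits with the later volumes being lower than the earlier ones.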
For some reason, possibly due to changes I made in the topic profile, the latest volumes aren’t up to what they were for most of the 3 months, but I think you can see check-ins tend to peak during weekends, particularly Saturdays (not unexpected).
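If you export daily volumes, the weekend peak is easy to check outside the dashboard. A minimal sketch in Python, assuming a simple mapping of date to check-in count; the daily numbers below are invented placeholders, not my Radian6 figures.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def checkins_by_weekday(daily_counts):
    """Sum daily check-in counts into one total per day of the week."""
    totals = Counter()
    for day, count in daily_counts.items():
        totals[DAY_NAMES[day.weekday()]] += count
    return totals

# Made-up daily volumes for one week (placeholders, not real data).
daily_counts = {
    date(2011, 10, 3): 200_000,  # Monday
    date(2011, 10, 4): 205_000,
    date(2011, 10, 5): 210_000,
    date(2011, 10, 6): 215_000,
    date(2011, 10, 7): 240_000,  # Friday
    date(2011, 10, 8): 290_000,  # Saturday
    date(2011, 10, 9): 260_000,  # Sunday
}

totals = checkins_by_weekday(daily_counts)
peak_day, peak_count = totals.most_common(1)[0]
print(peak_day, peak_count)
```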
My queries are based on what I’m finding in the data, but I admit they could be expanded; I don’t claim they are as good and clean as they could be. The numbers presented in the charts below therefore really depend on the queries, which will differ from analyst to analyst, since there is simply no standard yet for query writing. Social Analytics is a young field yet, as I pointed out in my book.
Still, there’s not much you can do with the information unless you’re willing to segment it into categories, and even subcategories. In fact, the more categorization one can do on data, the more actionable the data becomes, provided your categorizations are useful and fit the use-case you’re after (and I’ll have to explain that some other time). Without the categorization, it’s next to useless.
Here goes – I managed to categorize about 30% of all the check-ins I have been collecting in Radian6 over the last few months (decided not to resize the screen so you can see it in its full size).
Doing this wasn’t that hard, and generally speaking, working with Geo-located data has some advantages in that context and structure, to some extent, are provided by the check-in itself. Looking at the River of News for each category gave me, more or less, satisfactory results. Had I had another 20-30 hours I could have whittled away and categorized the roughly 2/3 of check-ins that are left, but I didn’t bother to do that.
A couple of observations – since we know the total number of check-ins was 19,307,702 (19.3 million) then
- 4% of all check-ins were about Mayors (awarded/ousted, etc).
- 5% of all check-ins were about Badge activity (awarded/achieved)
- 2% of all check-ins were about arriving @home and relaxing (finally)
- 2% of all check-ins were about Sports Activity (of some sort)
- about 1/2 of a percent of check-ins were about being @work (but more could be done here with a fuller set of linguisticVariants - see chapter 3 of Social Media Analytics)
- Restaurant/food/dining out has the most check-in activity at 5% of all check-ins.
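To get a feel for the scale behind these shares, here is a quick Python sketch that turns the percentages listed above into approximate counts. The percentages are rounded, so the counts are only ballpark figures; the dictionary keys are just labels I picked for the sketch.

```python
TOTAL_CHECKINS = 19_307_702  # total stated above

# Rounded shares as listed above (0.005 is the "1/2 of a percent" for @work).
category_share = {
    "Mayors": 0.04,
    "Badges": 0.05,
    "@home": 0.02,
    "Sports": 0.02,
    "@work": 0.005,
    "Restaurant/food/dining": 0.05,
}

# Approximate number of check-ins per category.
approx_counts = {
    name: round(TOTAL_CHECKINS * share)
    for name, share in category_share.items()
}

# Combined share of the six categories listed here.
listed_share = sum(category_share.values())

for name, count in approx_counts.items():
    print(f"{name}: ~{count:,}")
print(f"Listed categories together: {listed_share:.1%}")
```

These six categories add up to about 18.5% of all check-ins, so they are only part of the roughly 30% I managed to categorize.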
I could go on, but my point is that there’s opportunity to data-mine these breakdowns (they are more actionable because they have context); aside from word-clouds, there are Influencers that can clearly be picked out based on Twitter Following. I do admit that a crude metric such as Twitter Followers, for someone who checked into a Museum, is probably not an indication of real interest in Museums, or that the person really does influence others in that sector. All it shows is potential reach (I talked about the problems of using crude metrics, and suggested alternatives, in my Technorati interview a few weeks ago).
But once you have done the segmentation (along the lines of what Gary Angel suggests is called a “Two-Tiered segmentation”), the Radian6 data on Foursquare check-ins becomes much more interesting than it might ordinarily be. It’s much the same with most Social Data: without that work in intelligently categorizing it, it’s not good for much, but for the most part, people don’t seem to understand that or be willing to pay for it, yet.
Worse yet, they pick the entirely wrong technologies to work with and totally miss the boat, hiring the wrong people and looking for results based on very shaky assumptions and work. But what is new?!
With the categorization on Travel (above), which covers about 5% of the total travel segment (21,653/469,160), done via Foursquare and using simple pattern matching, I now can use the Radian6 demographics and actually get something out of it, because now I have context, something that is usually missing out of the box with most of the social data.
The most popular age group (25-34) has 2436 records, which is about 11% of the 21,653 identified, and perhaps, less than 1% of the total mentions for travel (but it’s a sample, and 1% might be enough for our purposes).
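For anyone who wants to check the arithmetic, here is the same calculation as a short Python sketch, using only the three counts given above (the variable names are mine).

```python
TRAVEL_SEGMENT_TOTAL = 469_160   # all travel mentions
TRAVEL_CATEGORIZED = 21_653      # identified via Foursquare pattern matching
AGE_25_34_RECORDS = 2_436        # records in the most popular age group

# Share of the travel segment covered by the categorization.
travel_coverage = TRAVEL_CATEGORIZED / TRAVEL_SEGMENT_TOTAL

# Share of the categorized sample that falls in the 25-34 group.
age_share_of_sample = AGE_25_34_RECORDS / TRAVEL_CATEGORIZED

# Share of the whole travel segment that the 25-34 records represent.
age_share_of_segment = AGE_25_34_RECORDS / TRAVEL_SEGMENT_TOTAL

print(f"coverage of travel segment: {travel_coverage:.1%}")
print(f"25-34 share of sample:      {age_share_of_sample:.1%}")
print(f"25-34 share of segment:     {age_share_of_segment:.2%}")
```

The coverage works out to a bit under 5%, the age group to a bit over 11% of the sample, and to roughly half a percent of the whole travel segment.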
I was able to get some data from a word cloud by first homing in on “domains mentioned”, which just showed Foursquare (as expected).
But, when we look at people in NYC and continue to use Radian6 Insights data, we pick up very little information. The platform is still maturing and can’t seem to scale well beyond 2 dimensions in most cases; it will all depend on the data sources.
I was more interested in going back to what I achieved by segmenting the check-in data, in the first place.
I looked for a Cable TV/Network segment, perhaps to go with the TRENDRR.TV post I did yesterday (another is coming, tonight). Focused on MTV and did another word-cloud.
You can keep on going down, further and further, in what I admit is a manual process of digging and finding nuggets of gold.
The real issue with Radian6, and probably something that can’t be entirely fixed, is the interface design, itself.
For measurement, one would have wanted clustering (automated, to the extent that AI can kick in) but due to the fundamental design of the platform, one has to dig, and dig, and dig, which is time consuming and not particularly scalable (you have to do it over and over again: every time you do a new project or a new report, you have to dig as if you have never done it before).
So, in this sense, Radian6 was an excellent source for me to visually collect data (what it was designed to do) but is not built to automate sub-segmentation - the very thing that would finally make the data “actionable”.
Still, thanks to Radian6, I have collected a wealth of Geo-located Foursquare data that can be extremely useful, if you know what to do with it.
Remember, check-in data is much easier to segment than other types of social data – we already have context – where they checked in; plus we usually have a short string that is semi automated and easy to run text-analytics on.
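As a rough illustration of why the semi-automated strings are easy to work with, here is a small Python sketch of rule-based categorization with regular expressions. The patterns, category names, and sample check-ins are my own assumptions for the example; they are not the queries I used in Radian6, and a real set of rules would need far more linguistic variants.

```python
import re
from collections import Counter

# Hypothetical rules, checked in order; the first match wins.
CATEGORY_PATTERNS = [
    ("Mayors", re.compile(r"\b(became the mayor|ousted)\b", re.I)),
    ("Badges", re.compile(r"\bunlocked\b.*\bbadge\b", re.I)),
    ("Home", re.compile(r"\bhome\b", re.I)),
    ("Work", re.compile(r"\b(work|office)\b", re.I)),
    ("Restaurant/food", re.compile(r"\b(restaurant|pizza|cafe|diner|grill)\b", re.I)),
]

def categorize(text: str) -> str:
    """Return the first category whose pattern matches, else 'Uncategorized'."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.search(text):
            return name
    return "Uncategorized"

# Invented sample check-ins, loosely modeled on auto-generated check-in text.
sample = [
    "I just became the mayor of Joe's Pizza on @foursquare! http://4sq.com/a1",
    'I just unlocked the "Adventurer" badge on @foursquare! http://4sq.com/b2',
    "Finally home and relaxing http://4sq.com/c3",
    "I'm at Luigi's Restaurant (New York, NY) http://4sq.com/d4",
    "I'm at Central Park (New York, NY) http://4sq.com/e5",
]

counts = Counter(categorize(text) for text in sample)
categorized_share = 1 - counts["Uncategorized"] / len(sample)
print(counts, f"{categorized_share:.0%} categorized")
```

Because the first matching rule wins, the order of the rules matters: the mayorship check-in at a pizza place lands under Mayors, not Restaurant/food. Whatever falls through to "Uncategorized" is the pile you would keep whittling away at with more rules.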
There’s so much data in my segmentation that I’ll probably write some more about it in a few days or weeks. And when you start daisy chaining Salesforce data into Radian6 (esp. my example of Wealthbuilder, the other day), you could potentially get something that is so hyper-targeted as to be almost scary.
But we’ll leave that for another post, another time.