Text Mining / Text Analytics and Web Journal – November 10th – 14th, 2011

I was quoted in an AllAnalytics.com post today, “Text Mining Is on the Rise” (go to AllAnalytics.com to read the post); so far, I haven’t seen much in text analytics worth writing home about.  Maybe it’s about more than mining text for patterns; perhaps it’s really about how that information is layered and organized.

Plans are shaping up for another London book signing in late January 2012; I’ll know more shortly, once the plans are finalized.  Another piece of news is that I have been asked to teach a class starting in mid/late January – more details in a few weeks.

On with the journal part of this post.  I saw a post titled For Dunkin’ Donuts the World of Social Media Runs on ROI and noticed the information came largely from NetBase, a social analytics platform I’m familiar with but haven’t actually had a chance to use (though I have a friend who works for NetBase … hmmm, and another friend who works at Dunkin’ Donuts … hmm).  As for the article itself, I have a hard time focusing on what it actually says, even though I know what it’s trying to say.

Also noted the “negative themes” chart (which goes along with my comments on Text Mining/Text Analytics at AllAnalytics.com) and the claim that NetBase is supposed to be the best engine at taking “unstructured” social data (or any data) and mining it for meaning.


I suppose this chart around negative themes is more useful to me than a word cloud or map that contains only single words.  But I want to draw attention to the summary at the bottom of the post:

” …. Confidence level index—Our social analytics were calculated at a 95% Confidence Level with a +/-6% Confidence Interval.”

Well, that might represent “your” or “their” confidence in the data, but is there any real way to check whether this data is “95% correct”?   I doubt it – and even if NetBase provided the data sample it used, others would or could debunk it.   Too bad there isn’t a certifying standards group to look at all these claims and test them, independently, for compliance and accuracy.  Right now, any vendor can say pretty much anything they want – there are no fact checkers in the house.
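For what it’s worth, the arithmetic behind a claim like “95% confidence, +/-6%” is easy to check – assuming a simple random sample of a proportion, which is my assumption, since NetBase doesn’t publish its methodology. A minimal sketch:

```python
import math

# Back-of-envelope check of a "95% confidence, +/-6%" claim, assuming a
# simple random sample of a proportion (worst case p = 0.5). These
# assumptions are mine; the vendor's actual method is unknown.
def implied_sample_size(margin, z=1.96, p=0.5):
    """Smallest sample size that yields the stated margin of error."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a sample of n at the given confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

print(implied_sample_size(0.06))       # 267 -- a few hundred mentions suffice
print(round(margin_of_error(267), 3))  # 0.06
```

The point: a ±6% interval can be claimed from only a few hundred mentions, and it says nothing about whether the sentiment labels on those mentions are correct – which is the part I’d want a fact checker for.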

Also read a post on IBM, MIT Study: Analytics Success Depends on Corporate Culture at CMSWire, which says 44% of organizations report that the primary barriers to enterprise-wide analytics adoption are cultural.  Amen – how true!  The study also says that the companies most sophisticated in using analytics are moving ahead faster than those who are less adept.

“… Year-to-year comparisons reveal that the more sophisticated users are expanding their deployment of analytics and widening the performance gap over their peers.”

The post went on to say..

… six areas distinguished Transformed (adept) organizations the most:

  • Ability to analyze data — 78%
  • Ability to capture and aggregate data — 77%
  • Culture open to new ideas — 77%
  • Analytics as a core part of business strategy and operations — 72%
  • Embed predictive analytics into process — 66%
  • Insights available to those who need them — 65%

I think this comes down to watching what companies actually do, not what they say they do.  I’ve seen companies that consider themselves “adept” and a “data-driven culture” but start and end with hype.  What needs to happen, I think, is to assess organizations by what they can actually do with their data, not what they claim to be doing.   If we did that, I’m sure we’d find that most companies are at the very beginning in their ability to use and learn from the data they are collecting (and I’m working on a new offering to address that need).

And I suppose one bright spot in finding organizations adept at using their data (shall I mention the word “big data”?) is city governments like our own New York City.  Take this example of how New York City uses its data, analyzes it, aggregates it, puts analytics at the core of its business strategy, is open to new ideas (cite Mayor Bloomberg), and is predictive in its use of the data while making the information useful to those who need it …

…. In the past year, Mayor Bloomberg’s Policy and Strategic Planning Analytics Team has launched successful analytics programs in three areas: fire risk, prescription drug abuse and mortgage fraud.

To identify properties with a higher risk of fire death, the Analytics Team combined FDNY data with data on illegal conversion complaints, foreclosures, tax liens, and neighborhood demographics. They found that certain factors strongly correlate with fire risk, including multiple illegal conversion complaints, the owner’s financial condition, the year of construction, and socioeconomic factors of the neighborhood. The Team then used this analysis to create a risk assessment model, which it now uses to give enforcement agencies a weekly list of the highest-risk properties with illegal conversion complaints; those properties are then inspected jointly by the Department of Buildings and FDNY.
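The risk-ranking described above boils down to scoring each property on a handful of correlated factors and sorting. Here is a minimal sketch of that idea – the feature names, weights, and sample properties are my hypothetical illustrations, not the city’s actual model:

```python
# Hypothetical weighted risk score over the kinds of factors the article
# names (illegal conversion complaints, foreclosures, tax liens, building
# age). Real models would be fit to data, not hand-weighted like this.
WEIGHTS = {
    "illegal_conversion_complaints": 3.0,
    "foreclosure": 2.0,
    "tax_lien": 1.5,
    "pre_1940_construction": 1.0,
}

def risk_score(prop, weights):
    """Sum of weight * factor value for every factor in the model."""
    return sum(weights[k] * prop.get(k, 0) for k in weights)

# Made-up properties for illustration only.
properties = [
    {"id": "A", "illegal_conversion_complaints": 2, "tax_lien": 1},
    {"id": "B", "foreclosure": 1},
    {"id": "C", "illegal_conversion_complaints": 1, "pre_1940_construction": 1},
]

# The "weekly list": highest-scoring properties first, for inspection.
ranked = sorted(properties, key=lambda p: risk_score(p, WEIGHTS), reverse=True)
print([p["id"] for p in ranked])  # ['A', 'C', 'B']
```

Even a toy like this shows why the approach works: the scarce resource (inspectors) gets pointed at the properties where the correlated factors pile up.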

Sure, the data analysis is made available to all who need it.    I think about the IBM/MIT study: the words sound big and general until you apply them to a specific case – then it all starts making sense.    So the mature analytics organization of the future could actually be your city government, today. Wow!  I guess the same might be said for state and federal governmental bodies – but each has to be looked at case by case.

So when you hear companies and various organizations “hyping” their Analytics capabilities – observe what they actually DO with the data, not what they’re saying about themselves, which, most of the time, is more hype than reality.

Seth Godin wrote a post on 6 questions/metrics for analyzing a website that I thought was quite good – though, to be honest, it would be a lot of work to implement if you actually wanted to use these metrics to evaluate your business website, or any website.  He says …

  1. What’s the revenue per visit? (RPM). For every thousand visitors, how much money does the site make (in ads or sales)?
  2. What’s the cost of getting a visit? Does the site use PR or online ads or affiliate deals to get traffic? If so, what’s the yield?
  3. Is there a viral co-efficient? Existing visitors can lead to new visitors as a result of word of mouth or the network effect. How many new visitors does each existing user bring in? (Hint: it’s less than 1. If it were more than 1, then every person on the planet would be a user soon.) This number rarely stays steady. For example, at the beginning, Twitter’s co-efficient was tiny. Then it scaled to be one of the largest ever (Oprah!) and now has started to come back down to Earth.
  4. What’s the cost of a visitor? Does the site need to add customer service or servers or other expenses as it scales?
  5. Are there members/users? There’s a big difference between drive-by visits and registered users. Do these members pay a fee, show up more often, have something to lose by switching?
  6. What’s the permission base and how is it changing? The only asset that can be reliably built and measured online is still permission. Attention is scarce, and permission is the privilege to deliver anticipated, personal and relevant messages to people who want to get them. Permission is easy to measure and hard to grow.

So, who knows the “revenue per visit” of their websites?  I don’t – I don’t sell anything on my site – so how would I figure that out?  Ok, say I did sell something; that would be one thing – but then the fun begins as I figure out how I’m going to calculate revenue after expenses. Since I don’t do anything special to get traffic to my site, how do I know what the cost of a visit is going to be?  The viral component of my site messaging – that’s got to be fun to try to figure out. My best bet there is to sign up for BuzzFeed and hope for the best – ha!  Same with the other metrics such as cost of a visitor and permission base – these are largely qualitative measures that must be assessed; they can’t usually be measured directly, via automation, in any reasonable manner.  So, I have no issue with Seth Godin writing a post like this – but just for the record, remember it’s extremely hard to apply the information Seth has provided.  Nuff said.
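To be fair, two of Godin’s metrics are trivial once you have the raw numbers – the hard part, as I said, is getting the numbers. A quick sketch, with made-up figures:

```python
# Godin's RPM and viral coefficient, assuming you already know revenue,
# visit counts, and how many new visitors came from referrals. All the
# numbers below are invented for illustration.
def revenue_per_thousand(revenue, visits):
    """RPM: revenue earned per thousand visits."""
    return revenue / visits * 1000

def viral_coefficient(new_referred_visitors, existing_users):
    """Average number of new visitors each existing user brings in."""
    return new_referred_visitors / existing_users

print(revenue_per_thousand(500.0, 20_000))  # 25.0 dollars per thousand visits
print(viral_coefficient(300, 1_000))        # 0.3 -- below 1, so no viral takeoff
```

Note how the viral coefficient matches Godin’s hint: anything under 1 means word of mouth supplements growth but can’t drive it on its own.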

Meanwhile, SAS has an excellent video on its SAS Social Media Analytics – definitely worth watching this clip.  And I like the Social Media Today post on Hospital Analytics and Engagement because it explains what each tool is for and puts it in a context that a hospital social media person can relate to (see example, below) …

Map My Followers – Geographic data of your Twitter followers.

  • Purpose: Shows you the location and geographic concentration of the users following you.
  • Specific To: Your Twitter followers.
  • Cost: Free
  • Ask Yourself: Is there a visible concentration of Twitter followers in my region? If not, what can I do to target a more local audience (hashtags, mentions of local events, etc)?
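The question the tool answers – is there a visible concentration of followers in my region? – is simple to compute yourself if you have each follower’s location. A sketch, with made-up follower data:

```python
from collections import Counter

# Given each follower's self-reported location, find the dominant region
# and its share of the audience. The follower list is invented for
# illustration; a real list would come from the Twitter API.
followers = ["Boston, MA", "Boston, MA", "New York, NY",
             "Boston, MA", "London, UK"]

counts = Counter(followers)
top_region, top_count = counts.most_common(1)[0]
share = top_count / len(followers)
print(top_region, round(share, 2))  # Boston, MA 0.6
```

If the top region’s share is low, that’s the cue for the tool’s “Ask Yourself” step: more local hashtags, mentions of local events, and so on.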


