I’m quoted in the post below (my own post, that is), and I’ve bolded that part.
Discussions on automated sentiment analysis “accuracy” are starting to border on the bizarre. In the past couple of weeks, I’ve read claims that SAS’s new tool can identify sentiment “better than most humans”. Just a few days later, I read a post this week claiming that “sentiment analysis [is] best done by humans”.
At the heart of this ongoing debate (and confusion) surrounding automated sentiment analysis is the issue of “accuracy” – the degree to which software can correctly extract positive, negative, or neutral tone from text. Using “accuracy” as a criterion for useful sentiment analysis demonstrates a fundamental misunderstanding of what sentiment really is and what “accuracy” really means. Unfortunately, this misunderstanding has led media researchers and software programmers to search for “100% sentiment analysis accuracy”, and distracted our industry from what its real focus should be – understanding how the media influences human behavior.
Automated sentiment analysis will never be accurate. Not 1% accurate, 50% accurate, or 100% accurate. To say that an algorithm or statistical model has “accurately” identified a piece of text as positive, negative or neutral requires that sentiment is a real thing in the text that can be correctly identified, like a person’s name or a product. The problem is that positive and negative don’t really exist on paper or on a computer monitor. The scientists and philosophers who study sentiment all agree that it only exists as a property of the animal nervous system. “Positive” and “negative” are neurological states that evolved to help organisms avoid stuff that can harm them or to promote behavior that’s likely to nourish and help them propagate. Sentiment is absolutely not something that exists “out there” in the world; it only exists in our perceptions of the world.
I have noted this several times in an entirely different context – Art – where so much is “subjective” anyway, and wrote a comment on the post at Content Analytics (which I assume will be published soon) – here it is, anyway.
Great points and thanks for quoting my post on Sentiment Analysis best done by Humans – which I agree is the best way. What you observed applies in many spheres, including Art.
For example, I often note my “mental” and “emotional” state affects my ability to enjoy a museum or art gallery opening.
Sometimes, I’m in the mood (or, as you put it - I saw some stuff or felt some things before entering that were supportive and put me in a good state) and I get something tangible out of the experience.
Other times, I simply can’t focus – my mind is distracted by whatever else is going on in my life or people around me, and I can’t contain what I’m looking at – and I don’t enjoy the experience.
At such a time I might be tempted to write a negative account or say something “negative” about what I’m looking at, when under other circumstances, in another mood, I might say something entirely different. [note: if we were to randomly measure response - it would be a "toss" where anyone is in their emotional and mental state at the particular time a comment or blog post was written].
So I think we come to the difference between sentiment and opinions that can be “swayed” by momentary considerations – and those that are more ingrained – core beliefs that are unlikely to change regardless of what my mood is or what I just read.
I came to an awareness that our “core beliefs” are less likely to be swayed by what we just saw or felt – and perhaps, what we need to add, if we’re doing Sentiment Analysis at all, is some measure (which we probably can’t get to without panel based data) of what the core disposition or beliefs are of those stating an opinion online, and then how closely what they are saying aligns with those beliefs.
Posted by Marshall Sponder on April 18, 2010
I’m working on a reputation analysis of an international training organization that has expressed concerns about its online reputation. After pulling the data from Sysomos MAP and comparing the sentiment score against human scoring – I’ve decided that if you care about Sentiment Accuracy – it’s best to have humans evaluate sentiment.
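For anyone who wants to try the same kind of comparison, here is a minimal sketch in Python of checking a tool's sentiment labels against human scoring. It is not how Sysomos MAP works or exports data: the function name `agreement_report` and the sample labels are made up for illustration, and it assumes you already have one machine label and one human label per document.

```python
from collections import Counter


def agreement_report(machine_labels, human_labels):
    """Compare automated sentiment labels against human labels.

    Both arguments are equal-length lists of strings such as
    "positive", "negative" or "neutral" (an assumed label set).
    """
    if not human_labels or len(machine_labels) != len(human_labels):
        raise ValueError("need two non-empty label lists of the same length")

    pairs = list(zip(human_labels, machine_labels))
    matches = sum(1 for human, machine in pairs if human == machine)

    return {
        # Share of documents where the tool and the human agree.
        "agreement": matches / len(pairs),
        # Counts keyed by (human label, machine label), so you can see
        # which kinds of documents the tool gets "wrong" most often.
        "confusion": Counter(pairs),
    }


# Hypothetical labels for five documents, not real client data.
human = ["negative", "neutral", "positive", "neutral", "negative"]
machine = ["neutral", "neutral", "positive", "positive", "negative"]

report = agreement_report(machine, human)
print(f"agreement with human scoring: {report['agreement']:.0%}")
for (human_label, machine_label), count in sorted(report["confusion"].items()):
    print(f"human={human_label:<8} machine={machine_label:<8} {count}")
```

The confusion counts matter more than the single agreement number, because they show where the tool and the human scorer part ways.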
I also noted that human scoring of documents has an additional advantage that never seems to be spoken about – the more “we” get involved in the output of our monitoring (i.e.: by devising scoring metrics and applying them to the data at hand) the more “engaged” and satisfied, I feel, we will be with the monitoring programs we devise.
I’ve been seeing this more and more, and have taken that to mean that the more fully “invested” community managers and analysts are in the data they collect (they touch it, in other words), the more satisfied they are with what they are doing.
And, as a result of doing all this sentiment “scoring” we have more confidence in the results than if we let a machine program do it.
Which reminds me – not only did I present at the Sentiment Analysis Symposium last week but I also presented on Sentiment Analysis in London last month – and did a deck for it.
In the case of my client – I found about 22% of the sentiment around them was negative – but that was after I looked at everything – and that’s actually a fairly large amount of negative sentiment.
By the way, I’m a fan of using Automated Sentiment Analysis – problem is – there are no standards around this and the current implementations and technologies are still too immature to handle many of the tasks Sentiment Analysis is used for.
Posted by Marshall Sponder on April 15, 2010
Here’s the entire set of pitches from the Sentiment Analysis Symposium this Tuesday – the speaker roster is directly below and then videos are set up in one playlist (instead of 15 videos one on top of the next – which didn’t make sense to me to place that way – you can listen to all the videos or just advance to the ones you want to hear).
There wasn’t any attempt to film the rest of the sessions (a real shame) and my iPhone isn’t really up to the task of doing anything more than what you see above – and I was sitting in the first row, to the right!
The rest of the afternoon had more to do with specific examples (industry segments) of using Sentiment Analysis and the issues around that for Pharma, for example – or Financial Services – where it’s quite challenging to read stories in Newspapers and figure out the real sentiment of them.
I think one thing I saw consistently is the feed quality problem – content mixed with ads, or content drift – stories all mixed up together that then get fed into a Sentiment Analysis Engine – that makes Sentiment Analysis next to impossible to do well, without significant effort to isolate the quality content in the feeds. I’m flashing back to Giles Palmer and me talking in London last week and how he spent 2 million pounds to build BrandWatch into what it is today – much of that money was spent on crawling the web and ensuring the quality of the data BrandWatch picks up – after all …. garbage in ….. garbage out.
The other thing, besides my talk in the afternoon – was the difficulty of doing Sentiment Analysis using the Semantic Web – that parsing and figuring out what is the topic, who is the recipient, what is the action and what is the sentiment of all of that …. very challenging and there are even several levels to this.
I think it’s necessary to set people’s expectations on what is realistically possible and the time it takes to set up sentiment systems, plus the humans that need to make sense of all of this. After all, if you don’t have a large volume of comments to read through – a human, or set of humans is best (but even humans will disagree on the same content 15% of the time).
And … as one speaker said – it’s not so much the positive and negative that humans or sentiment systems disagree with – it’s the so called “neutral” content that is where so much is open to interpretation.
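To put a number on how much two human scorers disagree, one common approach (my suggestion here, not something presented at the symposium) is to report the raw disagreement rate alongside Cohen's kappa, which discounts the agreement you would expect by chance. The sketch below assumes two scorers labelled the same documents; the function names and the 20 sample labels are hypothetical and were chosen only so the disagreement comes out at 15%.

```python
from collections import Counter


def disagreement_rate(rater_a, rater_b):
    """Share of documents where two scorers chose different labels."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need two non-empty label lists of the same length")
    differing = sum(1 for a, b in zip(rater_a, rater_b) if a != b)
    return differing / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two scorers, corrected for chance."""
    observed = 1.0 - disagreement_rate(rater_a, rater_b)
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both scorers pick the same label if
    # each labelled at random according to their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for 20 documents; the scorers differ on three of
# them, all of which the first scorer had marked "neutral".
rater_a = ["positive"] * 6 + ["neutral"] * 8 + ["negative"] * 6
rater_b = list(rater_a)
rater_b[6] = "positive"
rater_b[7] = "positive"
rater_b[8] = "negative"

print(f"disagreement: {disagreement_rate(rater_a, rater_b):.0%}")
print(f"Cohen's kappa: {cohen_kappa(rater_a, rater_b):.2f}")
```

A measure like this gives you a human baseline, so you know what level of agreement is realistic before holding a sentiment system to it.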
From my point of view – if you were scanning a document into Optical Character Recognition and you only got 75% of the characters right (the level of what we can expect with Sentiment Analysis Systems today) would you be able to figure out what the document says or means? Sometimes you could – but would it be worth the effort?
Well, it clearly is worth the effort if you have a large volume of data to go through – but the work involved in setting up custom lexicons and segments is daunting.
A good conference and I hope I get the opportunity to speak at the next one – whenever they hold it.
Note: Next time I’d like Giles Palmer or someone similar from Radian6 or Sysomos to come and spend a session explaining how their system takes raw data, puts sentiment overlays on top of it and presents the information to the customer. I think this is a step most people don’t understand, including me – as some things don’t make sense the way the vendors explain it – or they leave the details vague.