
Sep 9, 2008 | 2 minute read

Pluribo: Natural Language Data Mining

written by Linda Bustos

I'm really impressed with Pluribo, a Firefox plug-in that summarizes Amazon reviews (currently only for select categories).

How Pluribo Works

In a nutshell, Pluribo collects millions of reviews from Amazon and other review sources, then uses natural language data mining to scan the text and pull out phrases that express consumer opinions, like "easy to install" or "I was disappointed by the battery life." Pluribo calls this "sentiment analysis," and it even assigns a numerical score to individual product features, so long as there are enough reviews mentioning each feature to be statistically significant. Not only that, the algorithm favors reviews that other customers have voted more "helpful" and filters out poorly voted and redundant reviews.
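To make that a little more concrete, here's a rough Python sketch of helpfulness-weighted feature scoring. This is my own illustration, not Pluribo's actual algorithm (which is patent-pending and not public). The phrase lexicon, the `MIN_MENTIONS` and `MIN_WEIGHT` cutoffs, and the smoothed helpfulness weight are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: opinion phrase -> (feature, polarity).
# A real system would mine these phrases from text rather than hard-code them.
OPINION_PHRASES = {
    "easy to install": ("installation", 1.0),
    "disappointed by the battery life": ("battery life", -1.0),
    "great battery life": ("battery life", 1.0),
}

MIN_MENTIONS = 2   # assumed stand-in for "enough reviews to be significant"
MIN_WEIGHT = 0.4   # assumed cutoff for "poorly voted" reviews


def feature_scores(reviews):
    """Score features from (text, helpful_votes, total_votes) tuples.

    Returns {feature: score in [-1, 1]} for features with enough mentions.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    mentions = defaultdict(int)
    seen = set()

    for text, helpful, total in reviews:
        normalized = re.sub(r"\s+", " ", text.lower()).strip()
        if normalized in seen:
            continue  # skip redundant (duplicate) reviews
        seen.add(normalized)

        # Smoothed helpfulness ratio: a review with no votes gets 0.5.
        weight = (helpful + 1) / (total + 2)
        if weight < MIN_WEIGHT:
            continue  # filter out poorly voted reviews

        for phrase, (feature, polarity) in OPINION_PHRASES.items():
            if phrase in normalized:
                weighted_sum[feature] += weight * polarity
                weight_total[feature] += weight
                mentions[feature] += 1

    return {
        feature: weighted_sum[feature] / weight_total[feature]
        for feature in mentions
        if mentions[feature] >= MIN_MENTIONS
    }


reviews = [
    ("I was disappointed by the battery life.", 8, 10),
    ("Great battery life!", 0, 10),   # poorly voted, gets filtered out
    ("Disappointed by the battery life, but easy to install.", 3, 4),
    ("I was disappointed by the battery life.", 8, 10),  # duplicate
    ("Great battery life", 1, 1),
]
scores = feature_scores(reviews)
```

In this toy data set, "battery life" ends up with a negative score because the two helpful complaints outweigh the one helpful compliment, while "installation" is left out entirely because only one review mentions it.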

Here's an example of a Pluribo summary on Amazon:

Despite reservations with the low battery life and size, reviewers enjoy the video mode, sharp zoom, and large display. If you don't care about the battery life and size, it's a decent option.
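A summary in that style could, in principle, be assembled from per-feature scores with simple sentence templates. The Python sketch below is only my guess at the general idea, not how Pluribo's summary engine works. The `summarize` function, its `threshold` parameter, and the sample scores are all hypothetical.

```python
def join_features(features):
    """Join feature names into an English list with a serial comma."""
    if len(features) == 1:
        return features[0]
    if len(features) == 2:
        return f"{features[0]} and {features[1]}"
    return ", ".join(features[:-1]) + f", and {features[-1]}"


def summarize(scores, threshold=0.2):
    """Turn {feature: score in [-1, 1]} into a one-sentence summary.

    Features scoring at or above `threshold` count as liked, those at or
    below `-threshold` count as disliked, and the rest are ignored.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    disliked = [f for f, s in ranked if s <= -threshold]            # worst first
    liked = [f for f, s in reversed(ranked) if s >= threshold]      # best first

    if liked and disliked:
        return (f"Despite reservations with the {join_features(disliked)}, "
                f"reviewers enjoy the {join_features(liked)}.")
    if liked:
        return f"Reviewers enjoy the {join_features(liked)}."
    if disliked:
        return f"Reviewers have reservations with the {join_features(disliked)}."
    return "Reviewers have no strong opinions on individual features."


# Made-up scores for illustration only.
sample_scores = {
    "battery life": -0.6,
    "size": -0.3,
    "video mode": 0.7,
    "zoom": 0.5,
    "display": 0.4,
    "menu": 0.05,   # too close to neutral to mention
}
summary = summarize(sample_scores)
```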

Pluribo compares extracted feature scores for a product against others in its category and presents them in a visual way:

And hovering over the underlined features in the summary produces something like this:

Here's a bit more about Pluribo, the company. If you have the budget, you might want to contact them about developing similar technology for your site's reviews or product descriptions. Hopefully they will release an API. I'd love to see what kind of tools can be built from natural language data mining and summarization technologies!

Patent-pending summary technology

Pluribo faces the challenge of continuously aggregating a massive quantity of opinions, mining their textual content, extracting key statistical trends, and summarizing the results in natural language. What's more, we are committed to doing this in a fully automated way, which reduces human bias and allows for greater speed and scale.

To meet this end-to-end challenge, we developed a novel suite of natural language data mining and summarization techniques. These techniques are encapsulated in our summary engine, and are covered under a U.S. patent application. Given the right data, the summary engine can rapidly summarize opinions on nearly any topic. At present, we are applying the summary engine mainly to product review data. If there is demand, we may soon open our API to developers working in other topic areas.

Make sure to check in at Get Elastic on Monday, when I'll be posting more about customer reviews and how to optimize them for usability and conversion. Don't miss it!