Text Analysis: Word Clouds

This is part 4 of our series on Text Analysis; previous segments are available in our archives.

Once you have calculated a metric to rank the importance of the words in your survey responses, how do you visualize them? For the most accurate representation of the ordering of the terms, I suggest a bar chart or a Cleveland bar chart [1] – that way you can see exactly how much each term differs in relative importance.

Word Clouds

However, if you aren’t too concerned with the exact ordering of terms and just want to get a quick glance at the most popular terms, a word cloud is a great alternative. A word cloud allows you to show the higher-ranked terms in larger text and a different color than lower-ranked terms.
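
To make the "larger text for higher-ranked terms" idea concrete, here is a minimal Python sketch of one way to turn term weights into font sizes. It is an illustration only, not the code behind the word clouds in this issue (those were made in R): the function name `font_sizes`, the linear scaling, the size range, and the example weights are all assumptions chosen for the example.

```python
def font_sizes(weights, min_size=10, max_size=48):
    """Linearly scale each term's weight to a font size in points.

    The lowest-weighted term gets min_size and the highest gets max_size.
    (Assumed scheme; real word cloud tools may scale differently.)
    """
    lo = min(weights.values())
    hi = max(weights.values())
    span = hi - lo
    sizes = {}
    for term, weight in weights.items():
        # If every weight is identical, draw everything at max_size.
        frac = (weight - lo) / span if span else 1.0
        sizes[term] = round(min_size + frac * (max_size - min_size))
    return sizes


# Hypothetical weights, invented for illustration (not the survey results).
example_weights = {"revenue": 12, "rate": 9, "churn": 5, "etc": 2}
sizes = font_sizes(example_weights)

for term, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{term}: {size}pt")
```

The same mapping could drive color as well, for instance by picking a darker shade as the scaled fraction increases.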

I’ve created two word clouds based on the Data Drive Survey question “What Key Performance Indicators (KPIs) do you use to run your business?” The first word cloud is created by weighting the terms by term frequency only.


The most frequently used term was “rate”, with “revenue” coming in second. Note, though, that “etc” and “and” are very highly ranked (fourth and fifth, respectively). These words are not very helpful, and you probably don’t want them ranked as highly as they are. Looking at the data in this way, “revenue”, “conversion”, and “churn” seem to be the most important KPIs once the unhelpful words are filtered out.
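
If you would like to see how term-frequency weighting works mechanically, here is a minimal Python sketch. It is not the R code I used, and the responses, the `tokenize` helper, and the small `STOP_WORDS` set are all made up for illustration, so the counts will not match the survey.

```python
import re
from collections import Counter

# Hypothetical responses, invented for illustration (not the survey data).
responses = [
    "revenue, churn rate, and conversion rate",
    "conversion rate and revenue",
    "monthly revenue, etc",
]

# Assumed list of unhelpful words; a real analysis would use a fuller list.
STOP_WORDS = {"and", "etc"}


def tokenize(text):
    """Lowercase a response and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each term appears across all responses."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


tf = term_frequencies(responses)

# Raw ranking: filler words such as "and" sit alongside real KPIs.
print(tf.most_common())

# Ranking after dropping the unhelpful words by hand.
filtered = [(term, n) for term, n in tf.most_common() if term not in STOP_WORDS]
print(filtered)
```

The weakness of this approach is that someone has to decide which words count as unhelpful and maintain that list.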

Now let’s see what happens if we weight the terms by tf-idf.


With tf-idf, “revenue” becomes the highest-ranked term, while “rate” has dropped to third. Also, “etc” and “and” have dropped significantly in importance (to 19th and 18th place, respectively), which is exactly what you’d hope would happen. A great demonstration of how tf-idf works is “none”, which has become the second most important term. As it turns out, a single respondent wrote “none” as his or her response, and that was the only time “none” appeared in any response. Because the term is so rare across responses, tf-idf gives it a high weight. Perhaps “none” is not the most helpful of responses, but it does very clearly demonstrate the impact of the tf-idf algorithm. After that, though, this view gives us a better sense of KPIs, with terms like “billable”, “monthly”, and “MRR” ranked near the top.
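
The “none” effect is easy to reproduce on a toy example. The Python sketch below is an assumption-laden illustration rather than my actual R code: it uses one common tf-idf variant (term count divided by response length, times log(N / number of responses containing the term)) and sums each term's score across responses to get a single ranking. The responses are invented, and the exact variant and aggregation behind the word cloud above may differ.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical responses, invented for illustration (not the survey data).
responses = [
    "revenue and churn rate",
    "conversion rate and revenue",
    "monthly revenue and billable hours etc",
    "none",
]


def tokenize(text):
    """Lowercase a response and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return one tf-idf score per term, summed over all responses."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of responses containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = defaultdict(float)
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)                 # share of this response
            idf = math.log(n_docs / doc_freq[term])  # rarity across responses
            scores[term] += tf * idf
    return dict(scores)


scores = tfidf_scores(responses)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

In this toy data, “none” makes up an entire response and appears nowhere else, so it gets the top score, while “and” appears in three of the four responses and is pushed toward the bottom without any hand-made list of unhelpful words.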

If you are wondering, I did this analysis and created the word clouds using R, though you can create word clouds using any number of free online tools. You can download the code I wrote from our GitHub account. Let me know if you have any questions!

Questions? Send any questions on data analytics or pricing strategy to doug@outlier.ai and I’ll answer them in future issues!


[1] I discussed both of these plots during my series on data visualization. If you missed it, let me know and I can send it to you.