Last week, during Sean’s series on Survey Design, he alluded to the difficulties that can come up when analyzing text-based, open-ended questions. The main issue with these types of questions is that the responses can vary wildly in length and content, and might include spelling and grammatical errors. Furthermore, even when the spelling and grammar are correct, submissions might contain jokes, sarcasm, or other subtleties where a larger context is needed to understand the true meaning. Combined, these issues make it very difficult to analyze text-based, open-ended questions and glean insights from them at scale. Humans are pretty good at interpreting qualitative data, but it is usually prohibitively expensive to hire enough people to correct and classify large amounts of text.
Computational Text Analysis
Understanding written and spoken text is such a common task that a very rich and deep field of study called natural language processing (NLP) has developed around it. This week I’ll cover a few techniques that are used to overcome some of the issues I’ve outlined above.
We will just scratch the surface of text analysis this week. NLP has been an active area of research for decades, with some particularly exciting new developments using “deep learning” techniques, some of which you might interact with every day, like the automated assistants on your phone or the smart devices in your house.