Using Semantic Analysis for White Space Discovery

September 1, 2011

All we know is still infinitely less than all that remains unknown.
William Harvey

Analyzing social or text data for relevant insights is much easier when you have an inkling of what you are looking for. But what happens if, because of the enormous volume of data, you simply don’t know what you don’t know? How do you begin to analyze data to surface important business intelligence when you’re not even certain you have anything of value? And how can this be done quickly and efficiently?

It’s Easier to Find What You Are Looking For, If You Know What to Call It

Using keyword or Boolean technology can be an effective strategy for analyzing data if you know what you are looking for, especially if you know the exact terms to use in your search. This type of analysis can be set up quickly and is relatively inexpensive to maintain. But the approach is quickly overwhelmed as the volume of data expands and introduces variations on key terms.

So why is Boolean logic ill-equipped for advanced analysis of unstructured data, especially if you’re not entirely certain about the focus of your research? There are a couple of reasons.

1) Keyword and Boolean technologies presume you know all the terms consumers might be using to talk about your brand or product. For example, analyzing social media conversations about the company Target with a Boolean monitoring tool would require you to build an advanced query using Target and Store. You are then relying on consumers to use both words together. As you add more and more expressions to extract on-topic text, the Boolean approach becomes more brittle. It’s very difficult to include and exclude every permutation based on keywords alone.

2) Keyword matching fails to disambiguate the meaning of terms.  There may be instances when you want to isolate conversations around the term “Comcast” but only extract themes related to customer service. Keyword analysis is unable to associate conversations based on meaning.
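
Both limitations can be seen in a few lines of code. The following is a minimal sketch, not the query syntax of any particular monitoring tool: the posts are invented for illustration, and `matches_all` is a hypothetical helper that treats a query as a plain AND over lowercase words.

```python
import re

# Hypothetical posts, invented for illustration only.
posts = [
    "Picked up school supplies at the Target store on Main St",
    "Love the new home decor line at Target",
    "Our sales team missed its quarterly target again",
    "The store hit its revenue target this quarter",
]


def tokens(text):
    """Lowercase a post and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches_all(text, required):
    """Boolean AND: every required term must appear somewhere in the post."""
    words = tokens(text)
    return all(term in words for term in required)


# The query from the example above: Target AND Store.
query = {"target", "store"}
hits = [post for post in posts if matches_all(post, query)]

for post in hits:
    print(post)
```

In this sketch the query misses the second post, which is about the retailer but never uses the word “store”, and it returns the last post, which contains both words in an unrelated sense. Adding more terms to fix one case tends to break another, which is the brittleness described above.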

Surfacing What You Don’t Know

If you want to conduct open-ended or white space analysis to surface emerging trends, unexpected insights, or associations, you need a technology that can pick up weak signals and quickly deconstruct data. Latent Semantic Analysis (LSA) recognizes language features within large data sets and groups conversations based on their meaning. It’s able to accurately disambiguate a term that is used in multiple contexts. In other words, LSA can separate and accurately categorize conversations around “Crocs” the shoes and “crocs” the reptile. This type of precise filtering of conversations gives you the ability to identify emerging issues or unexpected uses and applications of your product or service.
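
To make the idea concrete, here is a minimal sketch of the general technique behind LSA, written in plain Python. It is not the implementation used in the analysis described in this post. The four posts are invented, every function name is hypothetical, and the decomposition is done with a simple power iteration on a tiny matrix; a real system would weight terms and use a linear algebra library’s SVD routine on far more data.

```python
import math
from collections import Counter

# Hypothetical posts, invented for illustration: all four mention "crocs".
posts = [
    "crocs shoes comfortable clogs",
    "crocs clogs sandals shoes",
    "crocs river reptile swamp",
    "crocs swamp reptile teeth",
]


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw word counts."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    return [[c[term] for c in counts] for term in vocab]


def top_eigenpairs(sym, k, iters=200):
    """Top-k eigenpairs of a small symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _component in range(k):
        vec = [float(i + 1) for i in range(n)]  # fixed start keeps runs repeatable
        for _step in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Deflate: subtract the component just found so the next one can emerge.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Project each document onto the k strongest latent dimensions."""
    a = term_document_matrix(docs)
    n = len(docs)
    # Gram matrix A^T A: its eigenvectors are the right singular vectors of A,
    # and its eigenvalues are the squared singular values.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs] for i in range(n)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


vectors = lsa_document_vectors(posts, k=2)
same_sense = cosine(vectors[0], vectors[1])   # footwear post vs. footwear post
cross_sense = cosine(vectors[0], vectors[2])  # footwear post vs. reptile post
print(f"same sense: {same_sense:.2f}, different sense: {cross_sense:.2f}")
```

Although every post contains the keyword “crocs”, the two footwear posts land close together in the reduced space and far from the reptile posts, because the grouping is driven by the words that surround the keyword. This toy example only illustrates the mechanism; results on real social data depend heavily on corpus size, term weighting, and the number of dimensions kept.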

For example, we worked with a Consumer Packaged Goods (CPG) client to research social media conversations around their breakfast food products. The purpose of the analysis was to better understand and manage their online audience, but it also uncovered an unexpected insight into how their products were being used. CI found that consumers were combining the breakfast food product with Greek yogurt to improve the taste of the yogurt. Interestingly for our client, some of their consumers were touting the health benefits of both products and sharing that information with others on popular message boards, such as Weight Watchers and Babycenter. If you’re interested, please read the full case study.

Semantic analysis is uniquely suited to this type of open-ended analysis because it is able to isolate important attributes from groups of authors and reveal unique considerations and preferences. This ability to identify unknown and unsolicited associations occurring in natural online conversations is critically important when you are analyzing social media. Social media conversations represent consumer preferences, opinions, and considerations expressed in real time, in consumers’ own voice and language. Having the ability to accurately filter and categorize consumer language based on meaning and context will give you a much broader and deeper understanding of how consumers are talking about your products and services.