Building Blocks for Your Social Data Integration Solution

In the previous two posts, I presented a basic approach for enterprises embarking on the social media integration journey and took a stab at viewing the wide array of use cases through the lens of data domains and departmental functions. Ideally, a long-term solution (what we might lengthily call an Enterprise Social Data Integration Platform) should support all four categories of use cases and as many departments in the enterprise as possible.

As we venture into system-building territory, a basic functional flow helps in placing the building blocks of such a solution. In essence, such a flow comprises:

Setting up listening and publishing posts on various social media: This is a continuous process of identifying hot spots of relevant activity in the social world and setting up a process to listen and participate in the most effective manner. It is interesting to note the trend of such connectivity needing to go beyond the top-3 or even top-10 social networks to very niche, highly-specialized discussion forums and industry blogs. Do you have a well-defined and maintained list of the top social media hot-spots for your company or even industry?
Collecting and optionally staging raw data: As the raw data comes in from various sources, it needs to be collected and fed into further analysis steps to start extracting value from it. Each source comes with its own standards and formats for the data as well as myriad other considerations like security credentials, API rate limits, automated agent limitations, etc. There are emerging standards like Activity Streams and OpenSocial but there is no broad mainstream convergence on any of these yet. So for the foreseeable future, the solution needs to work with all the low-level complexities of social data.
Cleanse and prepare the raw data for analysis: Once the raw data is collected, it is imperative to improve the signal-to-noise ratio before doing heavy-duty analysis on it. Traditional Data Quality methods will have to be adapted to compute relevance scores based on Entity Extraction and to remove data sets that fall below a relevance threshold.
Generate Insights out of the raw data: Depending on the use case, specific text analytics/algorithms and business analytics routines will have to be applied to the cleansed data. Basic routines would include Semantic Analysis, Entity Extraction, Sentiment Analysis and Influence Analysis which would apply to individual “records” (an “activity” in the social world). These routines add annotations, if you will, to these activity records. These records and their annotations can then be summarized and sliced/diced depending on the end business use case. For some use cases, complex event processing will be required to quickly correlate events in discrete systems to find emerging patterns.
Most importantly, Evoke Actions: All the analysis in the world is useless if you do not act on it (and acting does include reaching the conclusion that no action is required!). In the social world, action can manifest in traditional enterprise systems and channels in various ways, such as sharing content, designing more refined customer experiences, building communities and collaborating both internally and externally.
Enterprise Data Integration: Enterprise data will need to be integrated either as an input to the analysis or as a target for insights and enriched data. As you might recall from the data domains discussion, the most valuable use cases are in the intersection of enterprise and social data. As an example, for people-related information, identity matching between CRM records and social handles provides a very powerful way to understand those people better. As this example shows, seamless integration with both enterprise systems and social media is required for such mashup analyses.
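To make the identity matching example a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `CrmContact` and `SocialProfile` record shapes, the `match_profiles` helper and the matching rule itself (exact e-mail first, then a normalized name). A real solution would need fuzzier matching and a confidence score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CrmContact:
    """Hypothetical CRM record; real schemas will differ."""
    contact_id: str
    name: str
    email: str


@dataclass
class SocialProfile:
    """Hypothetical social profile; many networks do not expose an e-mail."""
    handle: str
    display_name: str
    email: Optional[str] = None


def normalize(text: str) -> str:
    """Lower-case the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_profiles(contacts: List[CrmContact],
                   profiles: List[SocialProfile]) -> Dict[str, str]:
    """Map social handles to CRM contact ids.

    Illustrative rule only: try an exact e-mail match first,
    then fall back to a normalized name match.
    """
    by_email = {c.email.lower(): c for c in contacts}
    by_name = {normalize(c.name): c for c in contacts}

    matches: Dict[str, str] = {}
    for profile in profiles:
        contact = None
        if profile.email:
            contact = by_email.get(profile.email.lower())
        if contact is None:
            contact = by_name.get(normalize(profile.display_name))
        if contact is not None:
            matches[profile.handle] = contact.contact_id
    return matches
```

Name-only matches are weak evidence (two different people can share a name), so in practice they would probably be flagged for review rather than merged automatically.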

Much of this flow applies to all use cases. Of course, you should close the loop eventually by feeding back inputs from each downstream stage to the listening stage!
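As a rough illustration of the cleanse and insight steps in the flow, the sketch below scores each activity record for relevance, drops records under a threshold, and then adds annotations to the survivors. The `Activity` shape, the watched entity list, the word lists and the scoring rule are all hypothetical stand-ins; real Entity Extraction and Sentiment Analysis would come from a proper text analytics engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical entities this company cares about (assumption for the sketch).
WATCHED_ENTITIES = {"acme", "acmephone"}
# Toy sentiment word lists, standing in for a real sentiment model.
POSITIVE_WORDS = {"love", "great"}
NEGATIVE_WORDS = {"broken", "hate"}


@dataclass
class Activity:
    """One social 'activity' record plus the annotations added to it."""
    source: str
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


def tokens(text: str) -> List[str]:
    """Split on whitespace, trim common punctuation, lower-case."""
    return [word.strip(".,!?#@").lower() for word in text.split()]


def relevance(activity: Activity) -> float:
    """Share of tokens that name a watched entity (toy relevance score)."""
    words = tokens(activity.text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in WATCHED_ENTITIES)
    return hits / len(words)


def cleanse(activities: List[Activity], threshold: float) -> List[Activity]:
    """Keep only activities whose relevance meets the threshold."""
    return [a for a in activities if relevance(a) >= threshold]


def annotate(activity: Activity) -> Activity:
    """Attach entity and sentiment annotations to an activity record."""
    words = tokens(activity.text)
    activity.annotations["entities"] = sorted(set(words) & WATCHED_ENTITIES)
    score = (sum(1 for w in words if w in POSITIVE_WORDS)
             - sum(1 for w in words if w in NEGATIVE_WORDS))
    if score > 0:
        activity.annotations["sentiment"] = "positive"
    elif score < 0:
        activity.annotations["sentiment"] = "negative"
    else:
        activity.annotations["sentiment"] = "neutral"
    return activity
```

The annotated records can then be summarized or sliced by source, entity or sentiment, depending on the business use case.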

The basic building blocks of such a solution can then be derived from this functional flow:

There is an interesting component, Information Lifecycle Management (ILM), which is often an afterthought in such solutions but shouldn’t be! ILM becomes vitally important in Big Data scenarios (of which social data is definitely one). ILM helps with policy-based, automated management of the data lifecycle: creation, validation, integration, archival and deletion.
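To give a feel for what "policy-based" could mean here, the following is a small sketch that decides whether a data set should be retained, archived or deleted based on its age. The data classes (`raw`, `insight`), the retention periods and the `lifecycle_action` helper are invented for illustration and are not taken from any particular ILM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetentionPolicy:
    """How long data stays online before archival, then deletion."""
    archive_after: timedelta
    delete_after: timedelta


# Hypothetical policies: raw social data ages out much faster than insights.
POLICIES = {
    "raw": RetentionPolicy(timedelta(days=30), timedelta(days=180)),
    "insight": RetentionPolicy(timedelta(days=365), timedelta(days=1825)),
}


def lifecycle_action(data_class: str, created_at: datetime,
                     now: datetime) -> str:
    """Return 'retain', 'archive' or 'delete' for a data set of a given age."""
    policy = POLICIES[data_class]
    age = now - created_at
    if age >= policy.delete_after:
        return "delete"
    if age >= policy.archive_after:
        return "archive"
    return "retain"
```

Keeping the policies in a table like `POLICIES`, rather than scattered through the processing code, is what allows the lifecycle to be managed automatically and changed without touching the pipeline.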

Do you see any important pieces missing in this list? What are the key issues / success factors that you see with these components?

In subsequent posts, we will look at some interesting issues in some of these components.