Keeping Big Data Secure: Should You Consider Data Masking?

Big data is a boon to every industry. And as data volumes continue their exponential rise, the need to protect sensitive information from being compromised is greater than ever before.

Big data is a boon to every industry. And as data volumes continue their exponential rise, the need to protect sensitive information from being compromised is greater than ever before. The recent data breach of Sony Pictures, and new national threats from foreign factions serve as a cautionary tale for government and private enterprise to be constantly on guard and on the lookout for new and better solutions to keep sensitive information secure.

One security solution, “data masking”, is the subject of a November 2014 article on Nextgov.com.

In the article, Ted Girard, a vice president at Delphix Federal, defines what data masking is—along with its applications in the government sector. Being that data masking also has non-government applications, organizations wondering if this solution is something they should consider for original production data should find the following takeaways from the Nextgov article helpful.

The information explosion
Girard begins by stating the plain and simple truth that in this day and age of exploding volumes of information, “data is central to everything we do.” That being said he warns that, “While the big data revolution presents immense opportunities, there are also profound implications and new challenges associated with it.” Among these challenges, according to Girard, are protecting privacy, enhancing security and improving data quality. “For many agencies just getting started with their big data efforts”, he adds, “these challenges can prove overwhelming.”

The role of data masking
Speaking specifically of governmental needs to protect sensitive health, education, and financial information, Girard explains that data masking is, “a technique used to ensure sensitive data does not enter nonproduction systems.” Furthermore, he explains that data masking is, “designed to protect the original production data from individuals or project teams that do not need real data to perform their tasks.” With data masking, so-called “dummy data”— a similar but obscured version of the real data—is substituted for tasks that do not depend on real data being present.

The need for “agile” data masking solutions
As Girard points out, one of the problems associated with traditional data masking is that, “every request by users for new or refreshed data sets must go through the manual masking process each time.” This, he explains, “is a cumbersome and time-consuming process that promotes ‘cutting corners’– skipping the process altogether and using old, previously masked data sets or delivering teams unmasked versions.” As a result, new agile data masking solutions have been developed to meet the new demands associated with protecting larger volumes of information.

According to Girard, the advantage of agile data masking is that it, “combines the processes of masking and provisioning, allowing organizations to quickly and securely deliver protected data sets in minutes.”

The need for security and privacy
As a result of collecting, storing and processing sensitive information of all kinds,
government agencies need to keep that information protected. Still, as Girard points out, “Information security and privacy considerations are daunting challenges for federal agencies and may be hindering their efforts to pursue big data programs.” The good news with “advance agile masking technology”, according to Girard, is that it helps agencies, “raise the level of security and privacy assurance and meet regulatory compliance requirements.” Thanks to this solution, Girard says that, “sensitive data is protected at each step in the life cycle automatically.”

Preserving data quality
Big data does not necessarily mean better data. According to Girard, a major cause of many big data project failures is poor data. In dealing with big data, Girard says that IT is faced with two major challenges:

1. “Creating better, faster and more robust means of accessing and analyzing large data sets…to keep pace.”
2. “Preserving value and maintaining integrity while protecting data privacy….”

Both of these challenges are formidable, especially with large volumes of data migrating across systems. As Girard explains, “…controls need to be in place to ensure no data is lost, corrupted or duplicated in the process.” He goes on to say that, “The key to effective data masking is making the process seamless to the user so that new data sets are complete and protected while remaining in sync across systems.”

The future of agile data masking
Like many experts, Girard predicts that big data projects will become a greater priority for government agencies over time. Although not mentioned in the article, the NSA’s recent installation of a massive 1.5 billion-dollar data center in Utah serves as a clear example of the government’s growing commitment to big data initiatives. In order to successfully analyze vast amounts of data securely and in real time going forward, Girard says that agencies will need to, “create an agile data management environment to process, analyze and manage data and information.”

In light of growing security threats, organizations looking to protect sensitive production data from being compromised in less-secure environments should consider data masking as an effective security tool for both on-premise and cloud-based big data platforms.