Data Masking Overview

Protecting Sensitive Data

Sensitive data is everywhere in our information systems. Today, organizations are required to protect it; the requirements come both from regulations the organization must meet and from the organization's own need to protect its data and its customers.

Sensitive data is a target for many threats, ranging from outsiders attempting to penetrate the organization's internal networks to insiders attempting to access restricted information.

Protecting the production environment is a well-known challenge and a complex, resource-consuming task. Over time, various methods have been introduced to control access to sensitive data. Data encryption, database access control and organizational procedures allow us to meet the requirements for protecting sensitive data in the production environment.

A Potential Breach

There is, however, a significant security gap that is usually neglected when protecting sensitive data: fully replicated databases (or substantial subsets) that are routinely copied from the production environment for development and QA purposes. These environments are not as well protected as the production environment, but since they contain the same sensitive data, they are exposed to the same threats.

These environments are exposed on a daily basis to developers, database administrators (DBAs) and QA personnel. These are usually employees of the organization, but occasionally employees of other companies, such as software vendors and outsourcing companies. Moreover, even if the organization trusts all of these people, the goal of protecting sensitive data is to keep the number of people who can see it freely to a minimum. Developing and testing software does not require access to the real sensitive information.

Developers and testers do, however, need unrestricted access to the data in the non-production environments; therefore solutions designed for production environments, such as encryption and access control, are of little use there.

It should also be noted that for these reasons, there is a world-wide increase in regulation dealing specifically with sensitive data in non-production environments.

The Solution - Masking

The solution for sensitive data in non-production environments is simple - it should not exist in them. But how do we eliminate the sensitive data while preserving the functionality of the system?

The answer is data masking, a process which results in data that is not real. Common examples are a masked pattern (like 'X' signs instead of parts of a credit card) or valid data which is not correct (such as a white credit card number, or a credit card number that doesn't match the expiration date and owner's ID).
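
The two kinds of masked values mentioned above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular masking product: the function names are hypothetical, and the Luhn checksum is used here as one possible meaning of "valid" for a card number.

```python
import random

def mask_pattern(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with 'X', keeping separators."""
    digits_total = sum(ch.isdigit() for ch in card_number)
    to_hide = max(digits_total - visible, 0)
    out = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            out.append("X")
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)

def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def is_luhn_valid(number: str) -> bool:
    """True if the last digit is the correct Luhn check digit for the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]

def fake_card_number(rng: random.Random, length: int = 16) -> str:
    """Random digits plus a correct check digit: well-formed, but not a real card."""
    body = "".join(str(rng.randrange(10)) for _ in range(length - 1))
    return body + luhn_check_digit(body)

print(mask_pattern("4111-1111-1111-1111"))   # XXXX-XXXX-XXXX-1111
print(fake_card_number(random.Random(0)))    # passes is_luhn_valid()
```

The first function produces a masked pattern that is obviously not real; the second produces a value that still passes format validation in the application, which matters when the system under test rejects malformed numbers.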

Each data masking project requires detailed planning, a deep understanding of the organization's needs and database structure and an exact definition of the organization's sensitive data.

There are two main approaches in data masking solutions:

  • Online Masking Tools - these tools mask the traffic between the database and the end users. When users query data, the tool masks the data on the fly, before it is sent back to the user. With these solutions, the user sees only masked data, while the sensitive data still exists in the database.
  • Physical Masking Tools - these tools physically replace the sensitive data in the database with the masked data before the environment is available to the users. With these tools, the non-production environment does not contain sensitive data at all.
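
The difference between the two approaches can be illustrated with a small sketch. It uses Python's built-in sqlite3 module, with a database view standing in for an online masking tool; real online tools usually sit on the network path rather than inside the database, and the table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, card TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', '4111111111111111')")

# Online-style masking: values are masked on the way out,
# but the stored value is left untouched.
conn.execute("""
    CREATE VIEW customers_masked AS
    SELECT id, name, 'XXXXXXXXXXXX' || substr(card, -4) AS card
    FROM customers
""")
online_view = conn.execute("SELECT card FROM customers_masked").fetchone()[0]
stored_before = conn.execute("SELECT card FROM customers").fetchone()[0]

# Physical masking: the stored value itself is overwritten.
conn.execute("UPDATE customers SET card = 'XXXXXXXXXXXX' || substr(card, -4)")
stored_after = conn.execute("SELECT card FROM customers").fetchone()[0]

print(online_view)    # XXXXXXXXXXXX1111
print(stored_before)  # 4111111111111111 (still in the database)
print(stored_after)   # XXXXXXXXXXXX1111 (original value is gone)
```

With the view, anyone who can read the underlying table still sees the real card number; after the update, there is nothing left to see.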

Online Masking

Online masking is easy to implement; however, it has some significant disadvantages compared with physical masking:

1. Usually, online masking solutions protect a specific interface to the database. Queries that go through that application or network database protocol return masked data, but direct access to the server (such as ssh or terminal services) allows the user to reach the original sensitive data.

2. The data itself is physically unchanged. Non-production servers are usually less secure than production ones, and a physical copy of the disk will contain the sensitive data.

3. Since non-production servers are usually less secure, there is a chance that a user will have an administrative password and will be able to turn off the online masking solution.

4. Most online masking solutions mask the data sent from the database to the user, but not the data sent from the user to the database. This makes the masking easy to bypass. Assume a user would like to know the salary of a colleague, which is masked. Since the query itself is not masked, the user can query the HR data for the row that contains the colleague's name and a specific salary (which is only a guess). If a row comes back, it will contain the masked salary, but the fact that a row was returned at all tells the user that the guess was accurate.
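
The bypass described in item 4 can be reproduced with a short sketch. The `masked_query` function below is a hypothetical stand-in for an online masking layer that masks result rows but passes the query through unchanged; the `hr` table and its contents are invented for the example.

```python
import sqlite3

def masked_query(conn, sql, params=()):
    """Hypothetical online-masking layer: runs the query as written,
    then masks the salary column in the rows it returns."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    rows = []
    for row in cur.fetchall():
        rows.append(tuple("*****" if c == "salary" else v
                          for c, v in zip(cols, row)))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr (name TEXT, salary INTEGER)")
conn.execute("INSERT INTO hr VALUES ('Bob', 8500)")

def guess_is_right(guess):
    """The predicate is evaluated against the real, unmasked salary."""
    rows = masked_query(
        conn,
        "SELECT name, salary FROM hr WHERE name = ? AND salary = ?",
        ("Bob", guess),
    )
    return len(rows) > 0

print(masked_query(conn, "SELECT name, salary FROM hr"))  # [('Bob', '*****')]
print(guess_is_right(8000))  # False
print(guess_is_right(8500))  # True: the masked row still confirms the guess
```

The returned value is always masked, yet the presence or absence of a row leaks the answer. Replacing the equality test with a range condition would let the user narrow the value down in a handful of queries instead of guessing one number at a time.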

For these reasons, many IT security specialists consider online masking an ineffective approach to protecting sensitive data in non-production environments.

Physical Masking

Physical masking means a real data change in the database. Changing the data can be done in two ways:

1. Internally - the data is changed in the database itself

2. Externally - the data is extracted and is masked during the reload back into the database

External masking is usually done by ETL and archiving solutions that have been adapted to enable scrambling of data. In this process, the data must be extracted from the database, transferred to the scrambling software, scrambled and then returned to the database. This can be done from production to non-production, or from the non-production environment back to itself.

The main disadvantages of external masking are:

1. The masking is usually a slow, complicated process, from implementation and customization of the scrambling software to allocation of processing power and extensive storage.

2. The process requires extracting data (usually from the production environment) and loading it into the non-production database. This rules out much more efficient ways of replicating the data, such as database backup and recovery or storage replication.

Internal database scrambling is much more efficient. Copying the database can be done using any method available, while the masking process runs in the database after the copy is completed. The masking process can make the most of the database's processing power, save storage and achieve the desired results much faster!
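
As a rough sketch of the internal approach, the example below copies a database with a native copy mechanism and then masks it in place with a single set-based statement. SQLite's backup API stands in for backup and recovery or storage replication, and the table, the replacement names and the salary formula are all assumptions made for illustration, not the behavior of any specific product.

```python
import sqlite3

# A stand-in "production" database with sensitive values.
prod = sqlite3.connect(":memory:")
prod.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
prod.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 9000), (2, "Bob", 8500)],
)
prod.commit()

# Step 1: copy the database using whatever method is available.
nonprod = sqlite3.connect(":memory:")
prod.backup(nonprod)

# Step 2: mask inside the copy, using the database engine itself.
# The replacement values are arbitrary and derived only from the row id.
nonprod.execute(
    "UPDATE employees "
    "SET name = 'Employee ' || id, "
    "    salary = 5000 + (id * 37) % 1000"
)
nonprod.commit()

masked_rows = nonprod.execute(
    "SELECT id, name, salary FROM employees ORDER BY id"
).fetchall()
print(masked_rows)  # [(1, 'Employee 1', 5037), (2, 'Employee 2', 5074)]
```

No data leaves the database during the masking step, so there is no extract file to store or protect, and the production database is not touched.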

JumbleDB is a complete data masking solution for the enterprise, but is still simple to use and manage. Read all about JumbleDB.