Data masking is a strategy used to protect sensitive data in a dataset by transforming it into different data that preserve the coherence and consistency of the original set. Good data masking not only maintains data consistency and the relations between tables, but also replicates the statistical distribution of the original source.
Also known as data "anonymization", "obfuscation" or "tokenization", data masking seeks to generate realistic, anonymized datasets based on real production data, which can be used for alternative purposes such as analytics, test generation or AI training, all without compromising the security of the real data. To keep the real data secure, the masking process is irreversible: the user cannot recover the real data from the masked version.
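As a minimal sketch of these two properties, consistency and irreversibility, consider a keyed-hash masking function. The function name, key, and email format below are illustrative assumptions, not part of any specific tool:

```python
import hashlib

# Assumed placeholder key; in practice it would be stored securely,
# outside the masked dataset.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def mask_email(email: str) -> str:
    """Replace an email with a deterministic, irreversible token.

    The same input always yields the same token, so joins between
    tables remain consistent, but the original value cannot be
    recovered from the output.
    """
    digest = hashlib.blake2b(email.encode(), key=SECRET_KEY, digest_size=8)
    return f"user_{digest.hexdigest()}@example.com"

masked = mask_email("alice@corp.com")
# Consistency: masking the same value twice yields the same token.
assert mask_email("alice@corp.com") == masked
```

Because the token is produced by a one-way keyed hash, even someone who obtains the masked dataset cannot invert it back to the real addresses.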
Over time, and thanks to advances in technology, two masking strategies have emerged: Static Data Masking (SDM) and Dynamic Data Masking (DDM). Let's take a look at the differences between them.
Static Data Masking (SDM)
It is used for permanent data transformation: from your data warehouse, you select all the sensitive data you want to transform and start the process.
The resulting data source is the same (it is overwritten), but the original data is replaced with the masked data, so the previous information is lost. This strategy cannot be applied to production databases, as it permanently alters the records.
On the other hand, it is a secure approach, since the masked source no longer stores sensitive data: in the event of a cyber-attack, attackers will not gain access to the confidential information.
At the execution level, the transformations in this strategy are carried out in advance, so they do not affect the performance of transactions.
In addition, the permanent replacement of data simplifies security tasks: there is no need for a very detailed security plan at the object level, since the most sensitive data have already been replaced.
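To make the "overwrite in place" idea concrete, here is a small sketch of SDM against an in-memory SQLite table. The table, column names, and masking rule (keeping only the last four digits of a social security number) are illustrative assumptions:

```python
import sqlite3

# Stand-in for a copy of the warehouse; in SDM the sensitive column
# is overwritten in place, so the original values are permanently lost.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers (name, ssn) VALUES (?, ?)",
    [("Alice", "123-45-6789"), ("Bob", "987-65-4321")],
)

# Overwrite the sensitive column; only the last four digits survive.
conn.execute("UPDATE customers SET ssn = 'XXX-XX-' || substr(ssn, -4)")
conn.commit()

rows = conn.execute("SELECT name, ssn FROM customers").fetchall()
```

After the UPDATE there is no second copy to protect: any later query, backup, or breach only ever sees the masked values.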
Dynamic Data Masking (DDM)
Dynamic Data Masking is used to transform sensitive data without altering the original source. This enables data traceability and, above all, allows different transformation rules to be applied to the original data source, since it is not overwritten.
This strategy can work in real time, but it is not well suited to extremely dynamic environments with heavy read and write activity, since freshly masked data can be overwritten and leave the database in a corrupted, inconsistent state. Because of this, it is essential to implement masking strategies that prevent newly written data from becoming corrupted.
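One common way to apply rules at query time without touching the source, sketched here with an SQLite view, is to expose only a masked projection of the table. The table, view, and masking rule below are illustrative assumptions, not a specific vendor's DDM feature:

```python
import sqlite3

# The base table keeps the real values; a view applies the masking
# rule at query time, so different rules can be layered on the same
# source without overwriting it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO employees (email) VALUES ('carol@corp.com')")

# Consumers query the view; the underlying table is untouched.
conn.execute("""
    CREATE VIEW employees_masked AS
    SELECT id,
           substr(email, 1, 1) || '***@' ||
           substr(email, instr(email, '@') + 1) AS email
    FROM employees
""")

masked = conn.execute("SELECT email FROM employees_masked").fetchone()[0]
original = conn.execute("SELECT email FROM employees").fetchone()[0]
# masked is 'c***@corp.com' while original remains 'carol@corp.com'
```

Because the real value is still present in the base table, access to it must be restricted separately, which is the extra security planning that SDM avoids.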