Data masking is a strategy used to protect sensitive data in a dataset by transforming it into different data that preserve the coherence and consistency of the original set. Good data masking not only maintains data consistency and the relations between tables, but also replicates the statistical distribution of the original source.
Also known as data "anonymization", "obfuscation" or "tokenization", data masking seeks to generate realistic, anonymized datasets based on real production data, which can be used for alternative purposes such as analytics, test generation or AI training, all without compromising the security of the real data. To keep the real data secure, the masking process is irreversible: the user cannot recover the real data from the masked version.
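As a minimal sketch of these two properties, consistency and irreversibility, consider a keyed-hash masking function. The function name, key, and email format below are illustrative assumptions, not part of any specific tool:

```python
import hashlib

# Assumed placeholder key; in practice it would be stored securely,
# outside the masked dataset.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def mask_email(email: str) -> str:
    """Replace an email with a deterministic, irreversible token.

    The same input always yields the same token, so joins between
    tables remain consistent, but the original value cannot be
    recovered from the output.
    """
    digest = hashlib.blake2b(email.encode(), key=SECRET_KEY, digest_size=8)
    return f"user_{digest.hexdigest()}@example.com"

masked = mask_email("alice@corp.com")
# Consistency: masking the same value twice yields the same token.
assert mask_email("alice@corp.com") == masked
```

Because the token is produced by a one-way keyed hash, even someone who obtains the masked dataset cannot invert it back to the real addresses.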
Over time, and thanks to advances in technology, two masking strategies have emerged: Static Data Masking (SDM) and Dynamic Data Masking (DDM). Let's take a look at the differences between them.
Static Data Masking (SDM)
It is used for permanent data transformation: from your data warehouse, you select all the sensitive data you want to transform and start the process.
The resulting data source is the same (it is overwritten), but the original data is replaced with the masked data, so the previous information is lost. This strategy cannot be applied to production databases, as it permanently alters the records.
On the other hand, it is a secure approach, since the masked source no longer stores sensitive data: in the event of a cyber-attack, attackers will not gain access to the confidential information.
At the execution level, the transformations in this strategy are carried out in advance, so they do not affect the performance of transactions.
In addition, the permanent replacement of data simplifies security tasks: there is no need for a very detailed security plan at the object level, since the most sensitive data have already been replaced.
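To make the "overwrite in place" idea concrete, here is a small sketch of SDM against an in-memory SQLite table. The table, column names, and masking rule (keeping only the last four digits of a social security number) are illustrative assumptions:

```python
import sqlite3

# Stand-in for a copy of the warehouse; in SDM the sensitive column
# is overwritten in place, so the original values are permanently lost.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers (name, ssn) VALUES (?, ?)",
    [("Alice", "123-45-6789"), ("Bob", "987-65-4321")],
)

# Overwrite the sensitive column; only the last four digits survive.
conn.execute("UPDATE customers SET ssn = 'XXX-XX-' || substr(ssn, -4)")
conn.commit()

rows = conn.execute("SELECT name, ssn FROM customers").fetchall()
```

After the UPDATE there is no second copy to protect: any later query, backup, or breach only ever sees the masked values.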
Dynamic Data Masking (DDM)
Dynamic Data Masking is used to transform sensitive data without altering the original source. This enables data traceability and, above all, allows different transformation rules to be applied to the original data source, since it is not overwritten.
This strategy can work in real time, but it is not well suited to extremely dynamic environments with heavy read and write activity, since freshly masked data can be overwritten and leave the database in a corrupted, inconsistent state. Because of this, it is essential to implement masking strategies that prevent newly written data from becoming corrupted.
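One common way to apply rules at query time without touching the source, sketched here with an SQLite view, is to expose only a masked projection of the table. The table, view, and masking rule below are illustrative assumptions, not a specific vendor's DDM feature:

```python
import sqlite3

# The base table keeps the real values; a view applies the masking
# rule at query time, so different rules can be layered on the same
# source without overwriting it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO employees (email) VALUES ('carol@corp.com')")

# Consumers query the view; the underlying table is untouched.
conn.execute("""
    CREATE VIEW employees_masked AS
    SELECT id,
           substr(email, 1, 1) || '***@' ||
           substr(email, instr(email, '@') + 1) AS email
    FROM employees
""")

masked = conn.execute("SELECT email FROM employees_masked").fetchone()[0]
original = conn.execute("SELECT email FROM employees").fetchone()[0]
# masked is 'c***@corp.com' while original remains 'carol@corp.com'
```

Because the real value is still present in the base table, access to it must be restricted separately, which is the extra security planning that SDM avoids.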