December 22, 2016

Data Anonymization

What is data anonymization?

Information that may be used alone or in combination with other data to identify a specific individual is known as personally identifiable information, or PII.

Data anonymization is the process of removing or altering sensitive or private information so that the individuals described by the data cannot be identified. It typically entails obfuscating, hashing, or masking personal information. Often the data is replaced with fixed-length codes or with values that have been altered.
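
As a rough illustration of hashing and masking, here is a minimal Python sketch. The function names, the sample values, and the key are all made up for this example. Note that a keyed hash on its own is often described as pseudonymization rather than full anonymization, since anyone holding the key can recompute the codes.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be stored securely.
SECRET_KEY = b"example-key-for-illustration-only"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a value with a fixed-length code (keyed SHA-256 hash)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide everything except the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


email_code = pseudonymize("jane.doe@example.com")  # 64 hex characters
masked_phone = mask("5551234567")                  # "******4567"
```

The same input always maps to the same code, so records can still be joined on the hashed column without exposing the original value.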

As a data analyst, you may be asked to anonymize data at the end of your analysis. If you are acquiring a dataset for testing purposes, the anonymization may have to take place before you start working with it.

Financial and healthcare data are among the most sensitive categories of data, and both sectors rely heavily on data anonymization methods. After all, there is a lot on the line. De-identification, a procedure that strips data of all personally identifying information, is therefore typically applied to data in the financial and healthcare industries.

If your neighbor had access to your personal information as shown on the list below, would you be fine with it?

Personal phone number and private mailing address, your email address
Your full name and date of birth, your birth location, your mother’s maiden name
Social Security or Medicare numbers
Your health records
Personal photographs or, worse, your kids’ photos
Your bank accounts or other personal accounts
Your purchases, your customer numbers

Types of data anonymization

Data anonymization makes data subjects no longer directly or indirectly identifiable. It is commonly divided into five types of operations: generalization, suppression, anatomization, permutation, and perturbation.

Data Generalization

Generalization is the intentional removal of some detail from the data to make it less identifiable. Specific values can be replaced with broader ranges or categories, for example an exact age with an age range. Other examples include omitting certain digits of a credit card number or removing the house number from a street address.
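
The generalization examples above can be sketched in a few lines of Python. The helper names, the ten-year bucket width, and the sample inputs are assumptions chosen for illustration, not a standard.

```python
import re


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_card(card_number: str, keep: int = 4) -> str:
    """Keep only the last `keep` digits of a card number."""
    digits = re.sub(r"\D", "", card_number)  # strip spaces and dashes
    return "X" * (len(digits) - keep) + digits[-keep:]


def drop_street_number(address: str) -> str:
    """Remove a leading house number from a street address."""
    return re.sub(r"^\s*\d+\s+", "", address)


age_range = generalize_age(37)                        # "30-39"
card = generalize_card("4111 1111 1111 1234")         # "XXXXXXXXXXXX1234"
street = drop_street_number("221 Baker Street")       # "Baker Street"
```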

Data Suppression

Data suppression entails removing a piece of information entirely, for example deleting a field or a record before a report or file is made publicly accessible. Confidential data that has to be retained can instead be hidden from view through data masking, or encrypted in a database or file.
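
A minimal Python sketch of suppression, assuming records are plain dictionaries. The field names and the sample record are hypothetical.

```python
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def suppress_fields(records: List[Record], fields: Set[str]) -> List[Record]:
    """Return copies of the records with the given fields removed entirely."""
    return [
        {key: value for key, value in record.items() if key not in fields}
        for record in records
    ]


customers = [
    {"name": "Jane Doe", "ssn": "123-45-6789", "city": "Boston"},
    {"name": "John Roe", "ssn": "987-65-4321", "city": "Denver"},
]

# Only the city survives; the original list is left untouched.
published = suppress_fields(customers, {"name", "ssn"})
```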

Data Anatomization

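Data anatomization separates the identifying attributes of a dataset from its sensitive attributes and publishes them in separate tables that are linked only through a group identifier. Because the original values are not altered, anatomization minimizes information loss. The related slicing algorithm partitions the data both by attribute and by record, which helps maintain correlation and utility while reducing data dimensionality.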
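
To make the idea concrete, here is a simplified Python sketch of anatomization: identifying attributes go into one table, sensitive values into another, and the two are linked only by a group ID. Grouping consecutive records is an assumption made to keep the example short. A real implementation would choose groups so that each one contains sufficiently diverse sensitive values.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def anatomize(
    records: List[Record],
    quasi_identifiers: List[str],
    sensitive: str,
    group_size: int = 2,
) -> Tuple[List[Record], List[Record]]:
    """Split records into an identifier table and a sensitive-value table."""
    qi_table: List[Record] = []
    counts: Counter = Counter()
    for index, record in enumerate(records):
        group_id = index // group_size  # naive grouping, for illustration only
        row = {name: record[name] for name in quasi_identifiers}
        row["group_id"] = group_id
        qi_table.append(row)
        counts[(group_id, record[sensitive])] += 1
    # Sensitive values are stored per group with a count, not per person.
    sensitive_table = [
        {"group_id": group_id, sensitive: value, "count": count}
        for (group_id, value), count in sorted(counts.items())
    ]
    return qi_table, sensitive_table


patients = [
    {"age": 34, "zip": "10001", "diagnosis": "flu"},
    {"age": 36, "zip": "10002", "diagnosis": "asthma"},
    {"age": 51, "zip": "20001", "diagnosis": "flu"},
    {"age": 55, "zip": "20002", "diagnosis": "diabetes"},
]
qi_table, sensitive_table = anatomize(patients, ["age", "zip"], "diagnosis")
```

Someone reading both tables can tell that a person in group 0 has either flu or asthma, but not which person has which diagnosis.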

Data Permutation

Data permutation is a method for rearranging attribute values in a dataset so that they don’t match the original records.
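
A minimal sketch of permutation in Python, assuming records are dictionaries and one column is shuffled at a time. The function name, the column names, and the `seed` parameter are made up for this example.

```python
import random
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def permute_column(
    records: List[Record], column: str, seed: Optional[int] = None
) -> List[Record]:
    """Shuffle one column's values across records, leaving other columns alone."""
    rng = random.Random(seed)
    values = [record[column] for record in records]
    rng.shuffle(values)
    # Build new records so the original data is not modified.
    return [{**record, column: value} for record, value in zip(records, values)]


employees = [
    {"name": "A", "salary": 41000},
    {"name": "B", "salary": 58000},
    {"name": "C", "salary": 73000},
]
shuffled = permute_column(employees, "salary", seed=7)
```

The set of salaries, and therefore the total and the average, stays the same. Only the link between a person and a salary is broken. A random shuffle can leave some values in their original position, so this alone is not a guarantee for small datasets.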

Data Perturbation

Data perturbation is the process of marginally modifying the original dataset by rounding values and introducing random noise. Noise in this context means small random amounts added to or subtracted from the values, so that individual records are obscured while aggregate statistics stay approximately correct.
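
Rounding and random noise can be combined as in the following Python sketch. The Gaussian noise, the parameter names, and the sample figures are assumptions for illustration. How much noise is appropriate depends on the dataset.

```python
import random
from typing import List, Optional


def perturb(
    values: List[float],
    noise_scale: float = 1.0,
    round_to: int = 10,
    seed: Optional[int] = None,
) -> List[int]:
    """Add zero-mean Gaussian noise, then round to the nearest `round_to`."""
    rng = random.Random(seed)
    result = []
    for value in values:
        noisy = value + rng.gauss(0, noise_scale)
        result.append(round(noisy / round_to) * round_to)
    return result


incomes = [52300, 48750, 61020]
perturbed = perturb(incomes, noise_scale=500, round_to=1000, seed=42)
```

Each published value is a multiple of 1000 that lies near the original, so averages over many records remain useful while exact individual figures are hidden.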
