Customers’ data is important. In recent years, the number of privacy laws has grown from 20 to 100. To name a few examples of such rules: PCI compliance in the payment industry, the European GDPR, and the Brazilian LGPD. All these new regulations attempt to close an old gap: data anonymity. How can we handle data while protecting the individuals it describes? Companies often face lawsuits seeking compensation for breaches of personal information held in their databases.
Production data is often copied onto test, QA, or staging environments, where it is exposed to testers, recipients, or unauthorized developers on machines less protected than production environments. Files are also frequently shared with external partners, who often need only a small part of the data transferred, and granting them access to users’ data might itself be a breach. If, on the one hand, sharing data is both necessary and inevitable, on the other, technologies that ensure the privacy of individuals’ details are no longer merely desirable, but essential.
In this talk, we will approach two important topics: how to manage data while securing users’ personal information, and how to do so in machine learning models. We will present different techniques of anonymization and pseudonymization (k-anonymity, differential privacy, and others) and show that solving the anonymity problem is much more complex than replacing names, last names, and social security numbers.
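
As a small illustration of why replacing names is not enough, the sketch below measures k-anonymity on a toy table. Everything in it is an assumption made for illustration: the records, the choice of `zip` and `age` as quasi-identifiers, and the helper names `k_anonymity` and `generalize`. It is not a production-grade anonymizer, only a minimal example of the idea.

```python
from collections import Counter

# Hypothetical toy records: names are already replaced by pseudonymous ids.
records = [
    {"id": "u1", "zip": "13053", "age": 28, "diagnosis": "flu"},
    {"id": "u2", "zip": "13068", "age": 29, "diagnosis": "asthma"},
    {"id": "u3", "zip": "13053", "age": 21, "diagnosis": "flu"},
    {"id": "u4", "zip": "14850", "age": 35, "diagnosis": "diabetes"},
    {"id": "u5", "zip": "14853", "age": 37, "diagnosis": "flu"},
    {"id": "u6", "zip": "14850", "age": 31, "diagnosis": "asthma"},
]

# Assumed quasi-identifiers: attributes that can be linked to outside data.
QUASI_IDENTIFIERS = ("zip", "age")


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize(row):
    """Coarsen quasi-identifiers: 3-digit ZIP prefix and a 10-year age band."""
    out = dict(row)
    out["zip"] = row["zip"][:3] + "**"
    decade = row["age"] // 10 * 10
    out["age"] = f"{decade}-{decade + 9}"
    return out


# Pseudonymized but not generalized: every (zip, age) pair is unique.
print(k_anonymity(records, QUASI_IDENTIFIERS))  # 1

# After generalization, each person hides in a group of three.
generalized = [generalize(r) for r in records]
print(k_anonymity(generalized, QUASI_IDENTIFIERS))  # 3
```

With k equal to 1, anyone who knows a person’s ZIP code and age can single out their row even though the name is gone. Generalizing the quasi-identifiers raises k at the cost of less precise data, which is the trade-off that techniques such as k-anonymity and differential privacy try to manage.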