Snappy, our cat mascot, shows how Snaplet transforms private information into mock values so that you can safely code against production-like data.

What the FAQ is PII?

We demystify the term PII and show you why "personally identifiable information" is a matter worth considering.

Here at Snaplet, life is all about PII. In fact, our whole team is obsessed with it. I'm not talking about the lobster pie that Snappy (head of naps and smiles) loves so much. What I'm referring to is a whole different recipe, one that, if not handled with care, could leave a bad taste in your mouth. So what is PII then? Let's delve in.

Definition

In today's digital age, the term "PII" is more than just an acronym; it's a crucial aspect of our online existence. PII is short for Personally Identifiable Information, and you will also see it called personal data, private data, private information, sensitive data or sensitive information. PII generally refers to any information that can be used to identify a living person. This could be as straightforward as a name, but it can also include less obvious things like IP addresses, geolocation, biometrics, behavioural data and even online identifiers such as cookies.

In datasets, common PII fields include names, addresses, emails, passwords, and various identification numbers like passport, driver's license, and credit card details. It can also encompass sensitive personal attributes such as race, age, gender, job position, workplace, and educational history.

Data has become a permanent fixture in most companies. Large amounts of personal data are stored daily, be it on a local server or somewhere in the cloud. Something that on its own may not be sensitive can turn into PII as soon as a second piece of information becomes available that ties it to a specific individual. Companies collect seemingly unimportant information that, when not handled with care, could end up in the wrong hands. Some companies even sell data for large amounts of money. Criminals can use personal information to set up fake accounts, steal proprietary information, or commit fraud. For the individuals whose information is being used, stored or sold, this is worrying.
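
To make the "secondary piece of information" point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two datasets, the column names and the `link` helper are made up for illustration. Neither list on its own connects a name to a purchase, but joining them on a few shared, harmless-looking fields does.

```python
# Hypothetical data: the order log has no names, the signup list has no orders.
orders = [
    {"zip": "8001", "birth_year": 1984, "gender": "F", "last_purchase": "inhaler"},
    {"zip": "8001", "birth_year": 1990, "gender": "M", "last_purchase": "cat food"},
    {"zip": "7700", "birth_year": 1984, "gender": "F", "last_purchase": "lobster pie"},
]
signups = [
    {"name": "Jane Doe", "zip": "8001", "birth_year": 1984, "gender": "F"},
    {"name": "John Roe", "zip": "8001", "birth_year": 1990, "gender": "M"},
]

# Fields that are not sensitive alone but can single someone out in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link(anonymous_rows, named_rows, keys=QUASI_IDENTIFIERS):
    """Join two datasets on shared fields and keep only unambiguous matches."""
    index = {}
    for row in named_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    linked = []
    for row in anonymous_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one person fits, so the row is re-identified
            linked.append({"name": names[0], "last_purchase": row["last_purchase"]})
    return linked


reidentified = link(orders, signups)
```

In this toy example two of the three "anonymous" order rows end up with a name attached, which is why fields like postal code or birth year are usually treated as PII too.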

To protect these individuals, the European Union (EU) has taken measures to define and protect personal data. The General Data Protection Regulation (GDPR) came into effect on May 25, 2018 and affects anyone who uses the personal data of EU residents. The GDPR is widely seen as the gold standard since it protects the data privacy of EU citizens and residents no matter where in the world the company using that data is located. Since then, various jurisdictions have set up similar legislation, such as the California Consumer Privacy Act (CCPA) in the United States, the Lei Geral de Proteção de Dados Pessoais (LGPD) in Brazil, and the Protection of Personal Information Act (POPIA) in South Africa. These laws impose fines for non-compliance and data breaches.

Snappy, the cute Snaplet cat mascot, dreams about pie and loses his PII (personally identifiable information).
Pie vs. PII

Why do we care?

As a web developer, app creator or product owner, you're aware that some traces left behind by users of your product could be sensitive in nature. Security and legal compliance become increasingly important if these traces can be used to identify individuals. Since production-like data (as close to production as possible) is needed to effectively build and test apps and products, privacy issues come in as early as research and development. Without Snaplet, cleaning up sensitive data is a tedious task that requires a lot of maintenance, and it only gets harder to manage as databases grow. Because it is such a schlep, some software developers simply copy their production databases to their personal laptops. As a developer, you are probably acquainted with the term "PG dump" or simply "database dump". This is a slow and painful process that involves downloading large amounts of data.

You may ask: "Why code against production-like data then?" There are several advantages to coding against an environment that closely mirrors your actual production system. For example, you get more accurate and realistic testing scenarios, and you surface real-world issues and edge cases that might not be apparent with synthetic, fake or sample data. In the end, these benefits contribute to the overall success and reliability of software applications.

How we help

We know that, like us, most developers really take data privacy issues seriously. We also know how annoying it is to waste time on maintenance tasks and work that doesn't include actual coding. You know, NOT doing what you're actually getting paid for. This is exactly why we created Snaplet!

Snaplet is a snapshot tool that lets you copy your PostgreSQL database and transform personally identifiable information (PII), so you can code against production-like data and share that data with your team. To put it simply, a Snaplet snapshot is an easier, safer and better database dump.

How? Snaplet automatically identifies and anonymizes (transforms) private information into mock values, giving you production-like data without the privacy concerns. In addition to making your data safer, we also offer subsetting, which lets you get a smaller data sample filtered according to your specific coding or debugging needs. Isolated data samples are quicker to restore and let you and your teammates reproduce specific bugs. These snapshots contain a transformed copy of your database in the cloud, which means you can test safely from your local development machine without worrying about data privacy. That is how we take the schlep out of data anonymization. Now that's easy as PII.
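
To give a feel for the idea, here is a small Python sketch of transforming and subsetting. It is not Snaplet's implementation or API. The column-name hints, the `mock_value`, `transform_row` and `subset` helpers, the salt and the sample rows are all assumptions made up for this illustration. It flags columns by name, swaps their values for deterministic mock values, and keeps only the rows that match a filter.

```python
import hashlib

# Assumption: PII columns are detected by simple name matching.
PII_HINTS = ("name", "email", "phone", "address", "password")

FIRST_NAMES = ["Alex", "Sam", "Robin", "Charlie", "Jamie"]
LAST_NAMES = ["Fisher", "Baker", "Miller", "Cooper", "Mason"]


def is_pii_column(column):
    """Return True if the column name looks like it holds PII."""
    return any(hint in column.lower() for hint in PII_HINTS)


def _stable_number(value, salt):
    """Hash the value so the same input always maps to the same mock value."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def mock_value(column, value, salt="dev-snapshot"):
    """Replace a real value with a realistic-looking fake one."""
    n = _stable_number(value, salt)
    col = column.lower()
    if "email" in col:
        return f"user{n % 1_000_000}@example.com"
    if "name" in col:
        first = FIRST_NAMES[n % len(FIRST_NAMES)]
        last = LAST_NAMES[(n // 7) % len(LAST_NAMES)]
        return f"{first} {last}"
    return f"redacted-{n % 1_000_000}"


def transform_row(row):
    """Mock every PII column in a row and leave the other columns untouched."""
    return {
        col: mock_value(col, val) if is_pii_column(col) and val is not None else val
        for col, val in row.items()
    }


def subset(rows, predicate):
    """Keep only the rows needed for the task at hand."""
    return [row for row in rows if predicate(row)]


production = [
    {"id": 1, "full_name": "Jane Doe", "email": "jane@corp.test", "plan": "pro"},
    {"id": 2, "full_name": "John Roe", "email": "john@corp.test", "plan": "free"},
    {"id": 3, "full_name": "Ann Poe", "email": "ann@corp.test", "plan": "pro"},
]

# Subset first (only "pro" customers), then transform what is left.
snapshot = [transform_row(r) for r in subset(production, lambda r: r["plan"] == "pro")]
```

Because the mock values are derived from a hash, the same input always produces the same output, so relationships between rows survive the transformation. Keep in mind that hashing with a guessable salt is closer to pseudonymization than to real anonymization, so treat this purely as a sketch of the concept.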

If you are tired of maintaining seed scripts and wary of the old "dump", then try out Snaplet and see how coding against production-like data is the best way forward.

Almarie Stander
June 22, 2021