Snappy, the Snaplet cat mascot, struggles to choose between copying a production database to their local machine and writing a seed file. They choose Snaplet instead.

The blue pill or the red? Choose Snaplet to generate synthetic data to seed your database.

We explain the pros of coding against de-identified production-like data and provide six reasons why you should be using Snaplet to do it.

Here at Snaplet, we like to work smarter, not harder. Not only does it make us more productive, it also gives us more free time to watch funny cat videos. Snappy loves a good laugh. There are many tools for developers that make it easier to code, but until now, relatively few tools have made working with data easier. The following memory of Snappy's will illustrate what I mean.

Snappy was building a new feature for CATCH, a "one-of-a-species" dating site for cats. They had a good idea of the code they wanted to add, but they didn't have very good test data. They created a few "asdf123 qwerty" entries in the database, but what they really wanted was representative data. Before Snaplet seed and Snaplet snapshots, they were given the option to choose between two pills. The red pill would require them to download their entire production database to their local development machine. I am sure you have heard of a "database dump" somewhere. The other treatment (the blue pill) required them to write a seed script.

Both remedies were inadequate and had serious side effects. Dumping production data to a development environment (for example, with PostgreSQL's pg_dump) was quicker than writing seed scripts, but creating a snapshot pipeline was difficult: setting one up required a senior engineer with infrastructure and broad product knowledge. Because the snapshot was a direct copy of the production database, it was slow to download and expensive to store. What's more, the production database contained PII (personally identifiable information), and without sophisticated data anonymization they had to de-identify sensitive data values manually. This was very time-consuming.

The blue pill was considered the longer, "safer" route: seed scripts were hand-written files that made assumptions about the shape of the data. To this day, seed scripts are hard to create and, what is worse, they need to be constantly maintained over the whole life cycle of a product. (It takes about two to four hours per week to create and maintain seed scripts for a medium-sized team.) Needless to say, on several occasions, the assumptions were inaccurate. A script worked on one machine but threw an error on another. Also, the seed data was a tiny representation of the real size of the data, which meant BUGS!

Usually, by this time, Snappy would be finding refuge in their Kattens No. 1 Heubii Cat Cave. Not only was writing seed scripts a schlep to do, it totally destroyed Snappy's productivity and ruined their usually easy-going temperament - another bitter pill to swallow.

Even today, most software developers code against unrealistic data and/or they make assumptions about "live production data" which then lead to bugs, wasted time and inaccurate features.

That is why Snaplet was created. Snaplet is a composable data generation and database anonymization tool for software developers. This means there are many ways one can use it. You might want to start off with Snaplet seed and later need to use a snapshot. With Seed, you can populate your database with deterministic mock data without having to write any scripts. This is great when you want to generate seed data quickly to get a new project off the ground. Snaplet snapshot copies a sample of your production database and automatically obfuscates sensitive information, making it safe and easy to code against production-like data and share this data with others.
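To make "deterministic mock data" concrete, here is a minimal sketch of the general idea: the same seed always produces the same rows. This is not the Snaplet API. The `CatProfile` shape, the `NAMES` list, and the `mulberry32` and `generateProfiles` helpers are all hypothetical names invented for this illustration.

```typescript
// Hypothetical illustration only: this is NOT how Snaplet is implemented.
// It shows why seeded generation yields reproducible test data.

interface CatProfile {
  id: number;
  name: string;
  ageYears: number;
}

// mulberry32: a small seeded pseudo-random number generator.
// The same seed always yields the same sequence of numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Made-up sample values for the illustration.
const NAMES = ["Whiskers", "Mittens", "Tiger", "Shadow", "Luna"];

// Generates `count` profiles; identical (count, seed) inputs give identical output.
function generateProfiles(count: number, seed: number): CatProfile[] {
  const rand = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    name: NAMES[Math.floor(rand() * NAMES.length)],
    ageYears: 1 + Math.floor(rand() * 15), // ages 1 through 15
  }));
}
```

Because the output depends only on the seed, every machine on the team gets the same rows, which avoids the "works on one machine, fails on another" problem of hand-written seed files.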

So what makes Snaplet outperform seed scripts or data dumps?

1. Safer working environments

Snaplet detects personally identifiable information (PII), and then allows you to easily transform the data so you don't have any sensitive information on your dev environment.

2. More time to do what you're actually getting paid for

Snaplet does most of the work. You don’t have to write seed scripts and/or waste time on maintaining them. Subsetting reduces time spent uploading and downloading snapshots.

3. Fewer bugs

Working with production-like data creates a better development environment. Subsetting allows you to recreate specific bugs with filtered snapshot samples.

4. Collaboration

You can share your database snapshots with your team members.

5. Branching

You can work on different features and bug fixes at the same time without losing context because Snaplet creates isolated databases for each feature or bug-fix branch.

6. Self-service workflows

More independence as developers don’t need to wait for infrastructure support. This also reduces time spent on onboarding new team members.
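As a rough illustration of the first reason (detecting PII and transforming it), the sketch below flags columns by name and replaces their values with a stable placeholder. It is an assumption-laden toy, not Snaplet's detection or transformation logic: the `SENSITIVE_COLUMN` pattern, the `fnv1a` helper, and the `deidentify` function are hypothetical, and a real tool would use far more robust detection and a stronger transformation.

```typescript
// Hypothetical illustration only: NOT Snaplet's detection or transform logic.

type Row = Record<string, string | number | null>;

// Assumption: a column is treated as sensitive if its name contains one of these words.
const SENSITIVE_COLUMN = /(email|phone|name|address)/i;

// FNV-1a: a simple, non-cryptographic hash, used here only so the same input
// always maps to the same placeholder. It is not suitable for real anonymization.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Returns a copy of the row with string values in sensitive columns replaced.
function deidentify(row: Row): Row {
  const out: Row = {};
  for (const [column, value] of Object.entries(row)) {
    out[column] =
      typeof value === "string" && SENSITIVE_COLUMN.test(column)
        ? `redacted_${fnv1a(value)}`
        : value;
  }
  return out;
}
```

Mapping equal inputs to equal placeholders keeps relationships between rows intact, for example two records that shared an email address still match after the transformation.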

It took some loving coaxing and dancing around with a catnip branch to get Snappy to hesitantly emerge from the cat cave. After demonstrating the Snaplet solution and showing them how Snaplet saves time, reduces bugs, and allows for collaboration and branching, we had Snappy back on the keyboard, ready to get into coding flow. It was finally time for Snappy to ditch the pills and take their first Snapshot. A few simple steps for mankind ... but a major leap for Snappy.

P.S. Let me tell you why you’re here. You are here because you want data. What you want, you can’t explain. But you feel it. We don't know the future and we didn't come here to tell you how this is going to end. But we came here to tell you how it's going to begin... With a '@snaplet/seed'. Go ahead, create your own Snapshot.

Almarie Stander
June 22, 2021