What happened on Prisma Day? Scott Chacon talks about Snaplet's data masking tool

Scott Chacon demos how to de-identify your data and get a subset of your production database for local development with Snaplet

Here at Snaplet, we like to say, what happens on Prisma Day does not stay on Prisma Day... even in these uncertain times, sharing is caring, albeit only by "word of virtual mouth". So here you go.

Last week, Snaplet cofounder Scott Chacon gave a talk at Prisma Day to demo Snaplet and show how easy it is to use our database snapshot tool. Check out the talk below, or skip to the transcript to read what he had to say:

Transcript of the talk

Working with databases in development is hard

"Thanks for having me. I did want to talk a little bit today about databases. I find databases very interesting in the software development life cycle because they’re something that we don’t have a lot of developer tools for, I think. I was one of the co-founders of GitHub. I spent a lot of time thinking about the development life cycle and developer tools. And one thing that I’ve seen at every company that I’ve worked at, including GitHub, is how difficult it is to work with databases in development. Most of the time, they go down one of a couple of different routes. Either we do seed data, which I can demo real quickly:

Keeping it close to production data is harder

So, if I was working on a Prisma project, for example, in Redwood, I would reset the database or I’d start up a new one, and I’d run a seed on it and I would say, “Okay, great.” And then the problem with this is that it’s very difficult to keep this close to production, to keep it close to the data you really want. It’s impossible to debug problems. It’s usually a minimal set of data, so you’re not dealing with a lot of performance issues, and it’s very hard to keep up-to-date. I don’t think I’ve ever spent any significant amount of time on a team that could keep that up to date in a realistic manner and have data that is really valuable for people to work with.

Copying a production database to your laptop is hardest (and not a good idea)

On the other hand, you can download a copy of a production database, which has happened in every company I’ve worked at, including GitHub, and run it on a local developer laptop. And that’s bad for lots of reasons too, right? It’s really big, it’s really slow. In a lot of cases you can debug data, but you have personally identifiable information on everybody’s laptop and that’s really not ideal.

Our solution: make working with development data easier

One of the things that we’re trying to solve, me and Peter Pistorius, he’s one of the core developers on Redwood, is to help make working with development data a little bit easier. And so we’re doing that with a project that we’re calling Snaplet, and so what I want to show you here is sort of an example of what that might look like. So I did a quick demo of just kind of something that normal people would run into [when] working on something.

Snaplet makes working with databases as easy as working with code

So, if I have seed data in my database and I run my Redwood project here, you can see I have a couple of customers in this sort of video rental thing. It’s really, it’s really beautiful, I think. Everybody can appreciate the beautiful design that went into this, mostly on my part here, but we can see we have this development snapshot, everything looks fine. The video’s returned. If we actually go to production (so here’s our production), somebody told us about this problem, that this person has both returned and not returned. How does that work, right? And it’s really difficult to figure that out because I can’t... You can shell in somewhere and run SQL commands, and try to figure out what the data looks like. You can actually look directly at the database, but to really debug this is difficult. And I think everybody’s really run into a problem like this.

So what I am going to do is I’m going to demo what Snaplet looks like. Essentially, what Snaplet does is, you connect to your database. We will pull it down, we’ll clean any information that we need and then you can get a copy of that. We make a snapshot of that every day and you can pull down a cleaned sort of version of that database and load it into your development environment, and use that in a way that is easy, and fast, and not dangerous. And so, if I type 'Snaplet restore' here, what it will do is pull down a snapshot of the development (of the cleaned version of the production database) and restore it locally so that I can work on it in a safe manner. And so I'm going to back up the database that I was working on, so that I have a copy of it that I can go back to. And it says that it restored the snapshot.

So now, if we go to production, before we didn’t really have, we only had a handful of customers, now we have essentially all of the customers that we have in production, except that they’ve all been anonymised. And so I can actually take the same customer ID that we have in production here and now I can pull it up here and you can see that instead of Scott Chacon with my email address (don’t email me, I’m just kidding, you can email me), we have Betsy Hayne with a completely made up faker email address. And we can go through in Snaplet and we can decide for any of these what tables we do not want to include. If they’re really big log tables, we don’t have to pull them down, saving us some time and space. We can say what are PII sort of suggestions, and we can say, replace the city with sort of fake cities, and things like that. In the next snapshot that we make, it will clean that data out.

So I hope that you’ve enjoyed this presentation. I thank you for your time and enjoy the rest of Prisma Day."

Quick recap:

Snaplet is a software tool that helps developers create de-identified snapshots of their production database. Our tool helps you to anonymize or de-identify sensitive data (PII) in your database and create a snapshot of the anonymized or "mock" database before you restore it locally. This makes your local testing environment fast, accurate and above all, safe. Snappy loves safe.
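
To make the de-identification idea from the talk a bit more concrete, here is a minimal TypeScript sketch. It is not Snaplet's implementation or API: the `maskDatabase` function, the `MaskingConfig` shape, and the sample table and column names are all hypothetical, and a plain in-memory object stands in for a real database. It only illustrates the two choices Scott mentions: leaving out tables you don't need, and replacing sensitive columns with fake values.

```typescript
// Hypothetical sketch only: none of these names come from Snaplet.
type Row = Record<string, string | number | null>;
type Database = Record<string, Row[]>;

interface MaskingConfig {
  // Tables to leave out of the snapshot entirely (for example, big log tables).
  excludeTables: string[];
  // table name -> column name -> pool of fake values to pick from.
  replace: Record<string, Record<string, string[]>>;
}

// Small deterministic string hash, so the same original value always maps
// to the same fake value. Not cryptographic; for illustration only.
function hash(input: string): number {
  let h = 7;
  for (let i = 0; i < input.length; i++) {
    h = (h * 31 + input.charCodeAt(i)) % 2147483647;
  }
  return h;
}

// Returns a masked copy of the database; the input is not modified.
function maskDatabase(db: Database, config: MaskingConfig): Database {
  const snapshot: Database = {};
  for (const table of Object.keys(db)) {
    if (config.excludeTables.indexOf(table) !== -1) continue; // skip excluded tables
    const columns = config.replace[table] ?? {};
    snapshot[table] = db[table].map((row) => {
      const masked: Row = { ...row };
      for (const column of Object.keys(columns)) {
        const pool = columns[column];
        const original = row[column];
        // Leave missing values alone and ignore empty pools.
        if (original === null || original === undefined || pool.length === 0) continue;
        masked[column] = pool[hash(String(original)) % pool.length];
      }
      return masked;
    });
  }
  return snapshot;
}

// Made-up sample data standing in for a production database.
const production: Database = {
  customers: [
    { id: 1, name: "Real Person", email: "real@example.com", city: "Berlin" },
  ],
  request_logs: [{ id: 1, path: "/rentals" }],
};

const config: MaskingConfig = {
  excludeTables: ["request_logs"],
  replace: {
    customers: {
      name: ["Alex Doe", "Sam Roe"],
      email: ["alex@example.test", "sam@example.test"],
      city: ["Springfield", "Riverton"],
    },
  },
};

const snapshot = maskDatabase(production, config);
console.log(JSON.stringify(snapshot, null, 2));
```

Because the customer ID is left untouched, the same ID can be looked up in both the original and the masked copy, which is what makes the debugging flow in the demo possible. A real tool would also have to deal with things this sketch ignores, such as unique constraints on replaced columns and foreign keys between tables.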

Almarie Stander
June 22, 2021