Managing sensitive customer data in development environments
Our team in Bath have recently been working on a data anonymisation project and found some interesting insights that we wanted to share with the wider development world. We’ve always been passionate about sharing, and that’s part of the reason we set up the Magento developer conference, Mage Titans. So here are our findings, from one development team to another, written by our Technical Lead, Nick Jones.
For a developer to work efficiently on a project, their local development environment needs to be as close as reasonably possible to the production environment. A key part of this environment is the database, an export of which can contain a myriad of really sensitive information. Data like a customer’s date of birth or home address should be treated with great care. Importing a database containing this information locally is at best a bad idea and at worst a serious breach of all kinds of best practice and regulations.
As a potential data processor (as far as the ICO is concerned) we have certain rules that we need to follow when handling any kind of sensitive data from our direct clients. The ultimate way to handle the data appropriately is never to have to handle it at all! With all of this in mind, our developers decided to take matters into their own hands and create something to fill the gap, help development teams, and ultimately protect our clients and their data.
MageDBM (Magento Database Manager) is a Magento extension built by our development team. It was originally developed for Magento 1.x, and has been rewritten from the ground up to support Magento 2. Its purpose is to take scheduled backups of the production database so that developers can easily pull a database into their local environment. The tool runs in two places: on the production database server and in a developer’s local environment. On the production database server, when creating the backup, we added the ability to specify a list of tables that should be removed from the export. This means that a developer can easily get a database containing (arguably) the most important data: catalogue and configuration. This will, hopefully, allow them to identify, fix and test bugs locally with confidence, without needing to push a release up to a UAT or staging environment.
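MageDBM’s own export code isn’t shown in this post, so the following is only a minimal sketch of the table-stripping idea, assuming the export is produced with mysqldump and its `--ignore-table` option. The function name, the database name and the table list are all hypothetical and chosen purely for illustration.

```python
from typing import List


def build_dump_command(database: str, excluded_tables: List[str]) -> List[str]:
    """Build a mysqldump argument list that leaves the given tables out.

    Illustrative only: this is not MageDBM's implementation.
    """
    command = ["mysqldump", "--single-transaction"]
    for table in excluded_tables:
        # mysqldump expects the fully qualified db_name.tbl_name form here.
        command.append(f"--ignore-table={database}.{table}")
    # The database to export goes last, after all of the options.
    command.append(database)
    return command


# Hypothetical example: keep catalogue and configuration, drop customer data.
command = build_dump_command("magento", ["customer_entity", "sales_order"])
print(" ".join(command))
```

Building the command as a list rather than a single string means it can be handed straight to `subprocess.run` without any shell quoting concerns.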
The database backups themselves are stored in an Amazon S3 bucket. The permissions system in the AWS ecosystem lets us grant fine-grained access to the files, which protects the data from third parties.
Our tool provides commands for displaying the backups available, pulling a backup down or pushing a backup up to the bucket. Storing this on S3 means that we don’t have to grant production SSH access to team members or third parties, certainly increasing the security of the production environment.
If the aim of a development environment is to be as close to production as possible, MageDBM was lacking in one aspect: it couldn’t help a developer reproduce issues relating to the volume of data in the database. A side effect of stripping out customers is that we lost 200k rows from our database, and fixing bugs that depended on that volume became difficult. This quickly led to the classic developer “it worked on my machine” scenario, where an inefficient database query doesn’t look quite so inefficient running against only 10 database records.
In our latest release of MageDBM, we’ve added the ability not only to strip database tables from the database export, but also to anonymise specific columns in those tables. This lets us pull down 200k customer accounts, and their orders, without knowing the names, addresses, or genders of those customers, or where their orders were shipped. It means we can test those scalability issues locally without compromising the data of our customers, and without exposing us to the larger responsibility of being a processor of sensitive data.
Every Magento store is different and every store is likely to have some project-specific database tables holding mildly sensitive data. We’ve supported developers of these stores by allowing the configuration of MageDBM to be committed to the project repository alongside the Magento codebase itself. The following example will replace all entries in the api_key column on the api_users table with a random SHA-256 value:
anonymizer:
  - name: api_users
    columns:
      api_key: Faker\Provider\Miscellaneous::sha256
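To make the effect of a rule like this concrete, here is a small sketch of what it does to the data. It is not MageDBM’s code: the function names and the sample rows are invented for illustration, and the random value is produced with Python’s standard library rather than with Faker.

```python
import hashlib
import secrets
from typing import Dict, List

Row = Dict[str, object]


def random_sha256() -> str:
    """Return the SHA-256 hex digest of 32 random bytes."""
    return hashlib.sha256(secrets.token_bytes(32)).hexdigest()


def anonymise_rows(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return copies of the rows with the named columns replaced.

    Every other column is left untouched, so the row count and the shape
    of the data stay the same as in production.
    """
    anonymised = []
    for row in rows:
        clean = dict(row)  # copy, so the caller's data is not modified
        for column in columns:
            if column in clean:
                clean[column] = random_sha256()
        anonymised.append(clean)
    return anonymised


# Hypothetical contents of an api_users table.
rows = [
    {"id": 1, "username": "erp", "api_key": "live-secret-1"},
    {"id": 2, "username": "crm", "api_key": "live-secret-2"},
]
safe = anonymise_rows(rows, ["api_key"])
print(safe)
```

The point to notice is that the number of rows is preserved, which is what keeps volume-related bugs reproducible, while the sensitive values themselves never reach a developer’s machine.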
This means that developers can start being more aware of data protection by committing changes to the MageDBM anonymisation configuration alongside the installation of new Magento extensions, for example. All of the sensitive data in a typical installation is already covered by the in-built rules that ship directly with MageDBM.
The introduction of MageDBM into our Magento 2.x project development toolset is a win-win for developers and our customers. What steps are you taking to protect your customer data? Is it time to get MageDBM baked into your pipeline too?
If you want to talk development and share knowledge and best practices, you should definitely be attending or speaking at the Mage Titans event! See all the details here.