You’ve seen it before: large, unwieldy spreadsheets with an arbitrary user ID in the leftmost column and a test scenario attached to each one. Will the data behind those scenarios still be intact across the 15 interrelated in-house systems and third-party stubs next week? Tomorrow, even? Maybe.

A typical test data spreadsheet

Test data ‘managed’ in this fashion quickly becomes outdated. Perhaps multiple teams share the same testing environments. What if someone in Team A doesn’t know that someone in Team B is using USER_ID 00000097 and changes it? What if someone accidentally wipes the database? All manual testing, and every automated test that relies on that data, comes to a standstill.

There is surely a better way.

Automated test data conditioning

Hear me out:

  • Test data is procedurally generated. No more ‘what makes this happy path scenario a happy path scenario?’. It’s right there.
  • It is version-controllable. Did data change that shouldn’t have? Want to see what the system was sending back 6 months ago? It’s all there.
  • No massive databases full of trash (unless you want there to be!). Test data generation can be done on the fly at any time. Want to test a very specific scenario? No more looking in spreadsheets for a specific user ID and praying that the data is still there. Someone messed with your data? Quickly recondition it to a known state.
  • Agnostic of your mocking mechanism. This just spits out data. How you send that data back is up to you now.
  • You can hook this up to your automated tests. Your Given step now generates and conditions the test data for your specific scenario.
  • In addition, this strategy allows for multiple user traits to be conditioned with ease. I want a user with traits X, Y and Z. Find that in your spreadsheet!
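
To make the points above concrete, here is a minimal sketch in Python of what a trait-based generator could look like. Everything in it is an assumption made for illustration: the field names, the three traits (`overdrawn`, `suspended`, `unverified_email`), and the function names `generate_user` and `given_a_user_with` are hypothetical, not taken from any particular system or tool.

```python
import json
import random
from typing import Any, Callable, Dict

User = Dict[str, Any]


def _base_user(rng: random.Random) -> User:
    """Build a plain happy-path user (hypothetical fields)."""
    return {
        "user_id": f"{rng.randrange(10**8):08d}",  # 8-digit, zero-padded
        "status": "active",
        "balance": 100.0,
        "email_verified": True,
    }


# Each trait is a small function that mutates the user in place.
def overdrawn(user: User) -> None:
    user["balance"] = -50.0


def suspended(user: User) -> None:
    user["status"] = "suspended"


def unverified_email(user: User) -> None:
    user["email_verified"] = False


# Hypothetical trait registry; a real suite would define its own.
TRAITS: Dict[str, Callable[[User], None]] = {
    "overdrawn": overdrawn,
    "suspended": suspended,
    "unverified_email": unverified_email,
}


def generate_user(*traits: str, seed: int = 0) -> User:
    """Generate a user with the requested traits.

    The same seed and traits produce the same user, so a scenario
    can be reconditioned to a known state at any time.
    """
    rng = random.Random(seed)
    user = _base_user(rng)
    for name in traits:
        TRAITS[name](user)  # raises KeyError for an unknown trait
    return user


def given_a_user_with(*traits: str, seed: int = 0) -> str:
    """What a Given step might call: returns JSON for a mock to serve."""
    return json.dumps(generate_user(*traits, seed=seed), sort_keys=True)


if __name__ == "__main__":
    print(given_a_user_with("overdrawn", "suspended", seed=97))
```

The traits live in code, so the definition of a scenario sits in version control next to the tests. The generator only returns data, so how it reaches your stubs or databases stays up to you.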

People want to provision virtual machines. They want to provision architecture. But they don’t seem to want to provision test data. I have done it, and it works.