Posts in Test Data
Using production data
Data selection is the first step down the road of data-driven testing. You'll need to select the data that either drives the navigation of your application, represents the data that gets entered into your application, or both. One way to select data for testing is to use production data.

Although you shouldn't rely solely on this type of data, it can be one of the richest sources of scenarios for automated testing, both because the data is representative of real scenarios the application will face, and because it will most likely provide a high number of different scenarios. You can load the data straight into the test environment, read it into data files for processing later, or read it in real time and convert it as you use it.

Production data is also an excellent source for parallel testing. If you use production data in the system you're developing, you'll quickly know if that system works like the system in production. This technique can especially help in finding problems with floating-point values, conversion ratios, and lengths associated with data types.
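
As a rough illustration of parallel testing, the sketch below feeds the same records to two functions and reports where the results differ. The functions `old_total` and `new_total` and the sample records are hypothetical stand-ins for the production system and the system under development; the comparison uses a tolerance for numeric values so that only meaningful floating-point differences are flagged.

```python
import math

def find_mismatches(records, old_system, new_system, rel_tol=1e-9):
    """Run both systems on every record and collect the differences."""
    mismatches = []
    for record in records:
        expected = old_system(record)
        actual = new_system(record)
        both_numeric = isinstance(expected, (int, float)) and isinstance(actual, (int, float))
        if both_numeric:
            # Compare numbers with a tolerance instead of exact equality.
            same = math.isclose(expected, actual, rel_tol=rel_tol)
        else:
            same = expected == actual
        if not same:
            mismatches.append((record, expected, actual))
    return mismatches

# Hypothetical stand-in for the system in production.
def old_total(record):
    return record["price"] * record["qty"]

# Hypothetical stand-in for the new system; it rounds the price too early.
def new_total(record):
    return round(record["price"], 1) * record["qty"]

# Made-up records standing in for rows read from production.
records = [
    {"price": 19.99, "qty": 3},
    {"price": 5.5, "qty": 2},
]

mismatches = find_mismatches(records, old_total, new_total)
for record, expected, actual in mismatches:
    print(f"Mismatch for {record}: production={expected}, new={actual}")
```

Only the first record is reported here, because rounding 19.99 to one decimal place changes the total while 5.5 is unaffected.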

There are some caveats about using production data, however. Production data will most likely not contain many of the special cases you'll want to test for, and it's not a replacement for well-thought-out test scenarios. There may also be legal issues surrounding the use of production data. Especially if you outsource some of your testing, be sure to check your company's policies on the use of production data; if no formal policy exists, consult someone in your legal department. Even if you can't use production data directly, odds are you'll be able to change some values (names, Social Security numbers, and such) and use the rest of the data.
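
Changing sensitive values before reuse might look something like the sketch below. The field names (`name`, `ssn`), the salt, and the sample record are assumptions made for illustration, and replacing values with a short hash is only one possible approach. It is not a guarantee of anonymity, so your company's policy still decides what is acceptable.

```python
import hashlib

# Assumed names of the fields that must not reach the test environment.
SENSITIVE_FIELDS = ("name", "ssn")

def mask_record(record, sensitive_fields=SENSITIVE_FIELDS, salt="test-env"):
    """Return a copy of the record with sensitive values replaced by placeholders."""
    masked = dict(record)  # copy, so the original record is left untouched
    for field in sensitive_fields:
        if masked.get(field) is not None:
            raw = (salt + str(masked[field])).encode("utf-8")
            digest = hashlib.sha256(raw).hexdigest()
            # The same input always yields the same placeholder, so records
            # that shared a value before masking still share one afterward.
            masked[field] = f"{field}-{digest[:8]}"
    return masked

# Made-up record standing in for a row of production data.
original = {"name": "Jane Doe", "ssn": "000-00-0000", "balance": 1520.75}
masked = mask_record(original)
print(masked)
```

The non-sensitive `balance` value passes through unchanged, which is the part of the data you usually want to keep for testing.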
Selecting data based on availability
One way to select data for testing is to select based on availability.

Readily available data could include:

  • Production data that's in an easy-to-access format

  • Data from past iterations

  • Spreadsheets used by manual testers for your project

  • Data from other projects or teams in your company

  • Data from some data generation source



The idea here is that if the data is easily accessible, as well as usable and meaningful, including this data in your testing can save time and money. I emphasize usability and meaningfulness because it's important that you don't select data just because it's there and ready to be used -- it may be bad data.
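
One way to guard against bad data is to screen it before it reaches your tests. The sketch below assumes the available data is a CSV export (for example, from a manual tester's spreadsheet) and that "usable" simply means every required column has a value. The column names and the sample rows are made up, and real usability checks would likely be stricter.

```python
import csv
import io

def usable_rows(csv_text, required_fields):
    """Split CSV rows into usable ones and ones missing a required value."""
    good, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing column comes back as None, an empty cell as "".
        if all((row.get(field) or "").strip() for field in required_fields):
            good.append(row)
        else:
            rejected.append(row)
    return good, rejected

# Made-up export with two incomplete rows.
sample = "username,amount\nalice,10\n,5\nbob,\n"

good, rejected = usable_rows(sample, ["username", "amount"])
print(f"{len(good)} usable row(s), {len(rejected)} rejected")
```

Keeping the rejected rows, rather than silently dropping them, lets you see how much of the "free" data was actually worth using.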
Selecting data based on requirements
One way to select data for testing is to select based on requirements.

When you select data that will allow you to test a requirement, or a set of requirements, look for data that will allow you to exercise feature sets, capabilities, and security features in your application:

  • If your application has different roles, what data would you need to exercise each role?

  • What features do you want to include in your test coverage and what data do you require to use them?
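
The role question above can be turned into a small data table, as in the following sketch. The roles, actions, and the `is_allowed` function are hypothetical placeholders for whatever your application actually exposes. The point is only that each row pairs a role with the data needed to exercise it and the result you expect.

```python
# Each row: (role, action, expected result). All values are illustrative.
ROLE_TEST_DATA = [
    ("admin", "delete_account", True),
    ("clerk", "delete_account", False),
    ("clerk", "view_account", True),
    ("guest", "view_account", False),
]

# Hypothetical stand-in for the application's permission check.
PERMISSIONS = {
    "admin": {"view_account", "delete_account"},
    "clerk": {"view_account"},
    "guest": set(),
}

def is_allowed(role, action):
    return action in PERMISSIONS.get(role, set())

def run_role_checks(rows, check):
    """Return the rows where the check disagrees with the expected result."""
    return [
        (role, action, expected)
        for role, action, expected in rows
        if check(role, action) != expected
    ]

failures = run_role_checks(ROLE_TEST_DATA, is_allowed)
print(f"{len(failures)} role check(s) failed")
```

Adding coverage for a new role or feature then means adding rows to the table rather than writing new test logic.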



In addition, consider what impact the various target deployment environments will have on the elements to be tested. Your list of selected data should include data for both the application under test and the target environment(s). What data will you need to test the following?


  • Hardware devices

  • Device drivers

  • Operating systems

  • Network and communications software

  • Third-party base software components (for example, e-mail software, Internet browsers)

  • Various configurations and settings related to the possible combinations of all these elements

  • Internationalization
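
To get a feel for how quickly the combinations of these elements grow, you could enumerate them, as in the sketch below. The option names and values are placeholders rather than a recommended set, and in practice you would probably prune the full matrix to the combinations you actually support.

```python
from itertools import product

# Placeholder environment options; substitute your real deployment targets.
ENVIRONMENT_OPTIONS = {
    "os": ["Windows", "Linux"],
    "browser": ["Firefox", "Chrome"],
    "locale": ["en_US", "de_DE", "ja_JP"],
}

def environment_matrix(options):
    """Return every combination of the given options as a list of dicts."""
    keys = list(options)
    value_lists = [options[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

matrix = environment_matrix(ENVIRONMENT_OPTIONS)
print(f"{len(matrix)} environment combinations")  # 2 * 2 * 3 = 12
print(matrix[0])
```

Each dictionary in `matrix` describes one target environment that may need its own test data.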



Selecting data based on risk
One way to select data for testing is to select based on risk.

When you identify risks, you consider what can go wrong. You're looking for events that would make it less likely that you'll deliver the project with the right features and the requisite level of quality, on time and within budget. There are three ways to categorize risks:

  • By the impact of the risk -- the deviation in schedule, effort, or cost from the plan if the risk materializes

  • By likelihood of occurrence -- the probability that the risk will materialize (usually expressed as a percentage)

  • By risk exposure -- the impact multiplied by the likelihood of occurrence
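
The three categories above fit together in a simple calculation: exposure is impact multiplied by likelihood. The sketch below ranks a few risks that way. The risk names and numbers are invented, impact is assumed to be measured in days of schedule slip, and likelihood is written as a fraction between 0 and 1 rather than a percentage.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    impact_days: float   # assumed unit: days of schedule deviation
    likelihood: float    # probability between 0 and 1

    @property
    def exposure(self) -> float:
        # Risk exposure = impact multiplied by likelihood of occurrence.
        return self.impact_days * self.likelihood

def rank_by_exposure(risks):
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)

# Invented example risks.
risks = [
    Risk("currency conversion errors", impact_days=10, likelihood=0.2),
    Risk("truncated customer names", impact_days=4, likelihood=0.75),
    Risk("unsupported locale", impact_days=20, likelihood=0.05),
]

ranked = rank_by_exposure(risks)
for risk in ranked:
    print(f"{risk.name}: exposure {risk.exposure:.2f} days")
```

In this made-up set, the risk with the smallest impact ranks first because it is by far the most likely, which is the kind of ordering that can guide which test data to select first.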