Transaction data presents a challenge to businesses seeking to tap its full potential: The very quantity of data promises greater insight into customer behavior, while making it harder to identify the predictive elements. The clear benefits of transaction data in both risk and marketing applications make this a compelling problem for analysts.
Data Spiders is the direct result of intensive research geared to meet this challenge. Incorporated into service offerings available through Fair, Isaac consultants, Data Spiders is already helping a number of companies leverage their transaction data in a variety of applications, such as:
- Improving targeting for cross-selling and conversion
- Improving fraud detection
- Predicting online purchase decisions
The technology can potentially help companies in almost any application where better understanding of ongoing consumer actions can improve business operations. A recent study for a direct mail retailer, detailed later in this article and shown in Figure 1, shows the enormous potential of this analytic capability.

By adding characteristics generated by Data Spiders to the conversion model, such as the velocity of offers, the retailer was able to target repeat purchasers much more effectively. For example, at 50% of pieces mailed, the traditional model captured 50% of the non-responders and 80% of the responders. The model with Data Spiders characteristics, by contrast, captured 50% of the non-responders but more than 93% of the responders.
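The capture percentages quoted above are points on a gains chart: customers are ranked by model score, the top share is mailed, and the analyst counts what fraction of all responders and all non-responders fall inside that mailed group. The minimal sketch below shows that calculation. The function name and the ten-customer toy data are hypothetical and are not taken from the Data Spiders project.

```python
from __future__ import annotations

from collections.abc import Sequence


def capture_rates(
    scores: Sequence[float], responded: Sequence[bool], depth: float
) -> tuple[float, float]:
    """Share of responders and share of non-responders reached when mailing
    the top `depth` fraction of customers ranked by model score."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    # Rank customers from highest to lowest score.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_mailed = round(depth * len(ranked))
    mailed = ranked[:n_mailed]
    total_resp = sum(1 for _, r in ranked if r)
    total_non = len(ranked) - total_resp
    mailed_resp = sum(1 for _, r in mailed if r)
    mailed_non = n_mailed - mailed_resp
    # Assumes the data contains at least one responder and one non-responder.
    return mailed_resp / total_resp, mailed_non / total_non


# Hypothetical toy data: ten customers, four of whom responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [True, True, False, True, False, False, False, True, False, False]
print(capture_rates(scores, responded, 0.5))  # (0.75, 0.3333333333333333)
```

A better model pushes the first number toward 1.0 at the same mailing depth while the second stays near the depth itself, which is the pattern the retailer saw when moving from 80% to more than 93% of responders.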
"Transaction data provides a very rich resource for extracting information to predict consumer behavior in many performance areas," says Gary Sullivan, director, Analytic R&D for Fair, Isaac. "Data Spiders can help us analyze and model transaction data in a new application faster. We're just beginning to scratch the surface of what we can do with it."
Genetic algorithms crawl through huge, complex databases
Data Spiders works by harnessing the power of genetic algorithms to search extensively for the predictive information in raw transaction data.
Consider a typical problem of interpreting raw transaction data that a marketer might face. A simple transaction database contains purchase histories of up to 10 products. What combinations of past purchases are predictive of future revenue? Once the order of purchases is taken into account, the number of unique product combinations runs into the millions.
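The article does not say exactly how the count in the millions is reached. As a rough illustration, the sketch below counts candidate purchase patterns for 10 products under two assumptions: ignoring order, there are only 1,023 distinct non-empty product sets; if the order of purchases matters and repeat purchases are allowed (an illustrative assumption), sequences of up to six purchases already exceed one million.

```python
from math import comb

N_PRODUCTS = 10


def unordered_sets(n_products: int) -> int:
    """Distinct non-empty sets of products, ignoring order and timing."""
    return sum(comb(n_products, k) for k in range(1, n_products + 1))


def ordered_histories(n_products: int, max_len: int) -> int:
    """Distinct ordered purchase sequences of length 1..max_len (repeats allowed)."""
    return sum(n_products ** k for k in range(1, max_len + 1))


print(unordered_sets(N_PRODUCTS))        # 1023
print(ordered_histories(N_PRODUCTS, 6))  # 1111110
```

Adding timing information (how long between purchases) multiplies the space further, which is why an automated search is attractive here.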
"Working with transaction data is challenging because it is often time series in nature, comprised of voluminous databases requiring characteristics to be rolled up across multiple records," says Chris Ralph, Analytic R&D project manager at Fair, Isaac.
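Ralph's point about rolling characteristics up across multiple records can be made concrete with a small aggregation: many time-stamped rows per customer are collapsed into a single row of characteristics as of an observation date. The minimal sketch below assumes a hypothetical record layout (`Txn` with customer, date and amount) and four illustrative summary characteristics; it is not the Data Spiders data model.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Txn:
    customer_id: str
    when: date
    amount: float


def roll_up(txns: list[Txn], as_of: date) -> dict[str, dict[str, float]]:
    """Collapse many transaction records into one row of characteristics per customer."""
    grouped: dict[str, list[Txn]] = defaultdict(list)
    for t in txns:
        if t.when <= as_of:  # ignore records after the observation date
            grouped[t.customer_id].append(t)
    rows: dict[str, dict[str, float]] = {}
    for cid, items in grouped.items():
        dates = [t.when for t in items]
        rows[cid] = {
            "n_txns": len(items),
            "total_amount": sum(t.amount for t in items),
            "days_since_last": (as_of - max(dates)).days,
            "tenure_days": (as_of - min(dates)).days,
        }
    return rows


txns = [
    Txn("A", date(2024, 1, 1), 10.0),
    Txn("A", date(2024, 1, 20), 5.0),
    Txn("B", date(2024, 1, 15), 7.0),
    Txn("A", date(2024, 2, 10), 100.0),  # after the observation date, so excluded
]
rows = roll_up(txns, date(2024, 1, 31))
```

Filtering on the observation date matters: letting later records leak into the roll-up would make the characteristics look more predictive in development than they can be in production.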
With its extremely thorough data evaluation, Data Spiders makes it easier to sift through this data and create the most predictive characteristics for a modeling project. Any source of transaction data can be used as raw data, including clickstream data for fraud or cross-selling; retail sales records for offer strategies; call detail records (CDRs) for churn prevention; and monthly master file snapshots for policy decisions.
"The Data Spiders technology handles transactions recorded in seconds, months and everything in between," notes Ralph.
Data Spiders' genetic algorithms mimic evolution via a "survival of the fittest" process, crawling through huge databases of transaction data, generating vast numbers of characteristics and then testing to find the most powerful predictors.
"Powerful," in this case, means highly predictive, uncorrelated and easily interpretable. "I would add 'subtle' to that," says Ralph. "By subtle we mean characteristics that could be overlooked by brainstorming experts. Yet they provide additional information and are also interpretable and intuitive."
For example, a recent modeling project used a "traditional" characteristic of "Total # of product catalog page requests" to predict purchase propensity of customers visiting an online retailer. Data Spiders improved the performance of the model by finding a more powerful variation of this characteristic, "Ratio of the # of product catalog page requests in the past 10 minutes to the last 60 minutes."
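The difference between the two characteristics is easy to see in code. The "traditional" version is simply a count of catalog page requests; the Data Spiders variation compares short-term to longer-term activity. The minimal sketch below assumes hypothetical function names, time-stamped page requests as input, and half-open windows ending at the scoring time; none of these details come from the original project.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def window_count(times: list[datetime], now: datetime, minutes: int) -> int:
    """Number of events in the half-open window (now - minutes, now]."""
    start = now - timedelta(minutes=minutes)
    return sum(1 for t in times if start < t <= now)


def catalog_ratio(
    times: list[datetime], now: datetime, short: int = 10, long: int = 60
) -> float | None:
    """Catalog page requests in the last `short` minutes divided by those in the last `long` minutes."""
    long_count = window_count(times, now, long)
    if long_count == 0:
        return None  # no recent activity: the ratio is undefined
    return window_count(times, now, short) / long_count


now = datetime(2024, 5, 1, 12, 0)
times = [datetime(2024, 5, 1, h, m) for h, m in [(11, 55), (11, 52), (11, 30), (11, 5), (10, 30)]]
print(len(times))                # 5   (traditional total count)
print(catalog_ratio(times, now))  # 0.5 (2 requests in 10 minutes vs. 4 in 60)
```

A high ratio indicates that browsing is accelerating right now, which a total count cannot distinguish from the same number of requests spread over a longer visit.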
Data Spiders automatically combines newly generated characteristics into groups and compares them against each other. The process of combining and recombining to test for best groups continues until "fitness" plateaus.
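The article does not publish the Data Spiders algorithm itself. The sketch below is a generic genetic algorithm that illustrates the "combine, recombine and test until fitness plateaus" loop described above: each individual is a bit mask selecting a group of candidate characteristics, and groups are bred through tournament selection, uniform crossover and mutation. The fitness function is a hypothetical stand-in that rewards predictive value and penalizes correlated pairs and group size; all names and numbers are illustrative assumptions.

```python
from __future__ import annotations

import random

# Hypothetical inputs: stand-alone predictive value of each candidate
# characteristic, and pairs that are strongly correlated with each other.
VALUES = [0.30, 0.25, 0.22, 0.20, 0.15, 0.10, 0.08, 0.05]
CORRELATED = {(0, 1), (2, 3), (4, 5)}
PENALTY = 0.30    # cost of keeping both members of a correlated pair
SIZE_COST = 0.04  # cost per characteristic, favoring compact groups


def fitness(mask: tuple[int, ...]) -> float:
    score = sum(v for v, bit in zip(VALUES, mask) if bit)
    score -= PENALTY * sum(1 for i, j in CORRELATED if mask[i] and mask[j])
    score -= SIZE_COST * sum(mask)
    return score


def crossover(a: tuple[int, ...], b: tuple[int, ...], rng: random.Random) -> tuple[int, ...]:
    # Uniform crossover: each gene comes from either parent with equal odds.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(mask: tuple[int, ...], rng: random.Random, rate: float = 0.1) -> tuple[int, ...]:
    # Flip each bit with probability `rate`.
    return tuple(1 - bit if rng.random() < rate else bit for bit in mask)


def tournament(pop: list[tuple[int, ...]], rng: random.Random, k: int = 3) -> tuple[int, ...]:
    # Pick the fittest of k randomly chosen individuals.
    return max(rng.sample(pop, k), key=fitness)


def evolve(pop_size: int = 20, patience: int = 5, max_gens: int = 200, seed: int = 0):
    rng = random.Random(seed)
    n = len(VALUES)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    stale = 0
    for _ in range(max_gens):
        children = [best]  # elitism: carry the best group forward unchanged
        while len(children) < pop_size:
            parent_a, parent_b = tournament(pop, rng), tournament(pop, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best, stale = gen_best, 0
        else:
            stale += 1
        if stale >= patience:  # fitness has plateaued
            break
    return best, fitness(best)


best, score = evolve()
print(best, round(score, 2))
```

Because of the correlation penalty, good groups tend to keep only one member of each correlated pair, mirroring the article's preference for predictive but uncorrelated characteristics. A production system would replace the toy fitness with a measure computed from model performance on real data.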
At the end of the Data Spiders project, the most successful characteristics are identified and included in predictive or descriptive models to increase model effectiveness. The final result is better performance against business goals: improved targeting, reduced attrition or more completed online purchases.
Case study: Which customers are most likely to purchase again?
In a Data Spiders project for a large direct mail retailer, the goal was to identify which of the company's initial purchasers were more likely to respond to a direct mail campaign with additional purchases.
To accomplish this, Fair, Isaac worked with the company to build a multi-buyer conversion model. The development data used to build the model included a transaction offer history database. The challenges of searching the transaction database included the fact that the raw data reflected different numbers of offers to different customers, varying offer frequencies and many different offer types.
As Ralph notes, "This is the type of messy transaction data that Data Spiders is designed to work with."
In this project, Data Spiders created a library of new characteristics that captured an array of powerful and uncorrelated predictive patterns.
"Early offer velocity" is one interesting example of a subtle but intuitive characteristic discovered by Data Spiders. The original model included a characteristic that captured the fact that, as the number of offers increased past a certain point, the likelihood of a response decreased. However, Data Spiders found that if the first four offers following the initial purchase are sent at the right frequency, the conversion rate for the second purchase increases dramatically.
This concept was coded into a new characteristic that, along with several others discovered by Data Spiders, was used in the development of the final scorecard. Capturing this pattern enabled the company to leverage the trade-off between offer volume and frequency over the initial phase of new customer relationships, leading to both lower mail costs and higher conversion rates. (See Figure 1.)
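The article does not give the exact definition of the early offer velocity characteristic. One plausible encoding, shown below as a minimal sketch, is the number of offers per 30 days across the first four offers sent after the initial purchase. The function name, the 30-day scaling and the choice to return `None` until four offers have been sent are all hypothetical.

```python
from __future__ import annotations

from datetime import date


def early_offer_velocity(
    first_purchase: date, offer_dates: list[date], n_offers: int = 4
) -> float | None:
    """Offers per 30 days across the first `n_offers` offers after the initial purchase.

    Returns None until `n_offers` offers have been sent after the purchase.
    """
    early = sorted(d for d in offer_dates if d > first_purchase)[:n_offers]
    if len(early) < n_offers:
        return None
    span_days = (early[-1] - first_purchase).days  # > 0: offers are strictly after purchase
    return 30 * n_offers / span_days


first = date(2024, 1, 1)
offers = [
    date(2023, 12, 20),  # before the initial purchase, so ignored
    date(2024, 1, 16), date(2024, 1, 31), date(2024, 2, 15), date(2024, 3, 1),
    date(2024, 4, 1),    # fifth offer, outside the early window
]
print(early_offer_velocity(first, offers))  # 2.0 (four offers in 60 days)
```

Since the finding is that the "right" frequency matters, the model would need to capture a non-linear relationship, for example by treating ranges of this value separately rather than assuming that more offers per month is always better.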
"We were surprised at how strong the predictive capacity latent in this type of time series data turned out to be," sums up Sullivan. "The lesson here is that there is much to be gained from using advanced technology to look a little deeper into how your customers behave."