By Dan Ross
ORIGINALLY PUBLISHED AUGUST, SEPTEMBER, NOVEMBER, 2005, BEYE NETWORK (REVISED 2011)
Retail transaction information contains an incredible wealth of data about many aspects of a retail enterprise. From this data, a retailer can derive customer behavior, employee and store-level performance, promotional effectiveness, profit drivers, and many other important factors. However, only 15-30% of retailers are taking full advantage of what can be learned from this data.
The reasons for this can vary, but in general, there is some commonality in the obstacles to performing transaction analysis effectively:
- Size of the data: transaction detail information is usually a large amount of data. There is at least one table in some database that needs to capture a row for every song you buy from your online music store, every bottle of milk you buy from the local supermarket chain, and every shirt you buy from your favorite clothier. While storage costs have continued to decline and processing power has continued to multiply, building a history of transaction data can easily run into terabytes of data rather quickly.
- Poor structure for analysis by business intelligence tools: as we discuss below, the structure of the data captured by point-of-sale (POS) systems is not ideal for reporting. Most business intelligence tools struggle with the highly normalized nature of the information, and data warehouses are often populated with transaction data exactly as it comes from the POS system with minimal or no optimization for reporting purposes.
- Calculations are complex: to accomplish market basket analysis effectively, multiple passes are often required against transaction data, or SQL self-joins may be needed to determine which products sell together. In addition, in some cases the SQL required to accomplish an analysis may need to join multiple transaction-related tables together, which requires sophistication in the business intelligence tool and some serious horsepower in the database engine.
Those retailers that can overcome these obstacles through the creative use of data warehousing technology stand to gain tremendous benefits and competitive advantage, given the relatively small number of their competitors who have managed to tackle this challenge.
The operational data model for transaction information is relatively complex and rich. By its nature, it is well suited for optimizing the performance of a POS system. That means that the retail transaction operational data (like most other operational data sets) has the following characteristics:
- Data is highly normalized (typically 3rd normal form);
- Single transaction information is spread across several tables (typically 3 to 4 tables);
- Information has a high degree (nearing 100 percent) of referential integrity and foreign key relationships; and
- Structure is optimized for insert speed, not reporting.
While there are a number of purveyors of POS systems, the data model that is employed to capture POS data is fairly common across the different products. It typically consists of three main entities (tables):
Transaction Header —this table typically contains one row per transaction. A complicating factor in transaction data modeling is the notion that a transaction is not assigned an enterprise-wide unique key at point of sale. Most POS systems assign unique keys only at the register level, which creates a requirement for a multi-valued primary key for a transaction at the enterprise level. This multi-valued key often consists of store, register, transaction, and (sometimes) date values. The header record for a transaction usually contains:
- Transaction time of day;
- Register number within a store;
- Employee who completed the transaction;
- Indicator field for a return or regular sale;
- Indicator field for voided transactions; and
- Customer identifier or loyalty program number.
To make the transaction data model friendlier for reporting, experienced data warehouse teams will solve the transaction primary key problem upon data warehouse load, substituting the multi-valued key with a single value unique identifier for a transaction. Other optimizations at the header level might include adding pre-calculated fields like the number of items in the transaction and several fields related to the total retail price, taxes and promotional amounts (e.g. coupons) that are contained in the detail records. This would allow a simple report like average sales price per item or average items per basket to be calculated from header information without accessing the detail table, which can be several times larger.
Transaction Detail —this table typically contains one row per item in a transaction basket. The transaction detail table is often very wide, containing dozens of attributes about each item in a transaction. Captured fields typically include the following:
- Pricing information such as retail price, discounted price, and markdown amounts;
- Manually keyed price changes at the point of sale;
- UPC codes, which can tie to vendors;
- Package sizes and weights;
- Timestamps for each item; and
- Tax amounts (sometimes) by item.
The detail table is the place where you would go to analyze product affinities, promotional effectiveness, customer service determined by register wait time, and many other important measures. Again, here is a place where a unique transaction identifier is helpful in simplifying the SQL needed to aggregate and analyze the detail. Conversely, there may be fields that need to be de-normalized into the detail table for reporting such that extra joins are not required at reporting time.
Transaction Tenders —this table typically contains on average slightly more than one row per transaction, depending on how customers pay for their purchases. A tender can be associated with cash, check, charge, gift cards, and coupons. If a customer has three coupons and pays by credit card, this table would contain four rows for that particular transaction. Typical information captured at the tender level includes:
- Tender type and amount—cash, credit, charge, coupon, etc.;
- Specific information about each tender (e.g. credit card number, coupon identifier, check number); and
- Timestamp information.
The Transaction Tender table is a hidden backdoor into customer segmentation analysis, even in absence of a customer loyalty program. Unique customers can be identified by their credit card numbers or bank account numbers and their purchase histories tracked and aggregated. The sophisticated retailer substitutes its own replacement key value for these very sensitive pieces of data.
There can be other entities tracked at the transaction level in some circumstances that are specific to a particular type of retailer or individual firm. Sometimes, taxes can, and should be, tracked separately for a number of regulatory reasons and at least one common industry data model will include a reference to customer survey data, which can be linked to transaction information.
All of the common aggregations of retail data warehouse data should be derived directly from transaction information, but many or most of the high-value analyses should be performed at this level. The successful data warehouse team puts in place data model optimizations for transaction information that make it a little easier for common business intelligence tools to access this information. So how should retailers overcome the obstacles listed above to generate real business value?
Use Less Data
Using less data seems obvious, right? If too much data is causing you problems, find a way to use less and still derive value. This is actually easier than it sounds. Many meaningful analyses can be performed at the transaction summary level. This eliminates the need for constantly accessing tables that include a row for every item in a basket. While I will soon consider what analysis can be performed at this level, for now just remember to limit access to the detail unless absolutely required. Transferring analysis from detail level to transaction summary level can be accomplished through an effective ETL process. This process does some of the heavy lifting prior to running reports with your business intelligence tool. One must remember that your goal is deriving value, not making your business intelligence tool perform tricks against a large, poorly structured database. This is especially important in grocery stores, where baskets of “12 items or less” are the exception.
Another way to minimize data size is by carefully examining your usage of history. When trying to gain analytical insight, you do not necessarily need to analyze every transaction in your data warehouse over the last 10 years. For example, grocers have many transactions and repeat customers over short amounts of time. Because of this, they might be able to start with six months of transaction detail (while the ideal might be one to two years). The same thing could probably be said for big box retail. Here, they have enough repeat customers, which allow them to use a shorter history. Specialty retailers have comparably fewer transactions, which naturally limits data set size. But these businesses can probably start with two years of data (although five is ideal) for a meaningful sample set from customers that visit one or two times a year.
Tremendous data set reductions can occur by examining the applicability of the transactions to valuable analysis. This basically means that all transactions are different and should not be compared with all other transactions. Within a national retail chain, you might see dramatically different customer behavior and purchasing by region or season. In a Northeast home improvement store, for example, you probably would not see lawn care products being sold in January. For this same chain, however, lawn care products could be sold year-round in the south and west. Grocery, beer, and wine sales can often substantially increase the overall basket price, but not every store can carry beer and wine because of state alcohol regulations. You should construct a set of “relevant” transactions for each analysis resulting in a smaller data set for analysis. You should also realize that even though analysis must be conducted at the basket level, the valuable correlation and affinity analyses likely involve product information at a higher level than SKU.
An example of this is if you are interested in how often people buy paintbrushes and painter’s tape with paint. If you are searching for this information, you probably would not care what color of paint people were buying. You probably would not care about the paint or the paintbrushes package size either. In terms of fashion retailing, it could be very profitable to incent sales of hosiery with every shoe purchase. But you might not care about the detailed characteristics, assuming there are similar margins on similar SKU’s. Before conducting affinity analysis, you should introduce a step in your process where you replace a SKU with the most relevant department or product class.
In short, there are numerous ways of reducing the amount of data necessary to access or conduct a meaningful analysis of your transactions. Whenever possible, do not access the detail. When you do, make sure you have created a relevant subset of transactions for analysis.
Simplify the Structure
Too many companies pull data directly off their POS systems, create an identical structure in their data warehouse and place a business intelligence tool on it. Transactional data is complex and hard to expose in an unvarnished fashion to business people, while business intelligence tools struggle with handling the entity relationships correctly. In addition, the inherent level of normalization and pre-calculation make analysis difficult and slow-performing. This technique is easy for IT because it simplifies the amount of data integration and ETL that must be built. It also speeds time to deploying a transactional data warehouse. Unfortunately, this technique doesn’t make analysis very easy.
There are three strategies that should be done to make the data warehouse structure easier to analyze
by a typical business intelligence tool and business user:
- Learn about master data management— The most chaotic operational data is the item master and store master information. Remember to use robust master data management processes that work well with your merchandising, inventory, and ordering systems.
- Don’t copy the POS data model into the data warehouse— Take the time to create ETL processes that actually transform the data into more usable models. De-normalize and pre-calculate information without leaving out necessary information. For example, adding a field for the number of items in the Transaction Header table saves you from accessing Transaction Detail. This also eliminates the need for counting rows at analysis time. You should also solve the transaction primary key problem to avoid complex, multi-column joins for the reports that need to access multiple transaction-related tables. Create good aggregate tables (you will need these even if your vendor says otherwise) for item movement analyses that don’t need to be recreated upon a product re-class.
- Realize that transaction data breaks the “dimension” structure—the typical dimensional data structure in retail involves three main dimensions (products, stores, and retail calendar) that carry attributes. These attributes relate to each other independently from sales and inventory amounts. Only when placed with sales and inventory details can the different dimensions relate to one another. There are relatively few exceptions, at least for business intelligence tools. A transaction is a dimension with its own attributes that are different from products, stores, or time. In fact, products, items, and stores are really just attributes of a transaction. It can even be said that a data warehouse containing transaction data is really a 1-dimension data warehouse. This is not easy for business tools to deal with and might require special treatment.
Use Your Horsepower Wisely
Unfortunately, transaction analysis requires significant disk space and CPU processing power. You cannot avoid this. However, you should prudently consider your horsepower. There are many strategies for this:
- Use your batch window to the max— As much as possible, push heavy calculation and repeatable processing into ETL and the batch process. You should also optimize the mundane data movement and loading process as much as possible. By doing this, you will have cycles for real transformation through pre-calculation and de-normalization. Ideally, this should be completed during off hours, instead of real-time when it impacts other users’ queries.
- Target your analysis— Especially when doing product affinity and customer analysis, use the techniques above to avoid analyzing the entire database at once. Common sense is important as well. In general, there is enough lift from a manageable set of affinity analyses to produce a substantial ROI. Avoid open-ended analyses that attempt to determine affinity by analyzing every transaction (in every store) for two years. If an open-ended analysis is required, use a transaction sample using statistical principles. Even on a general analysis, you do not need to examine every transaction to generate actionable information.
- Proactively tune your data warehouse— You will need as many processing cycles as you can recapture. Instead of asking your CFO for millions of dollars for hardware, it is much better to spend continuous time tuning your data warehouse and ensuring optimal performance.
Effectively analyzing transaction information is not easy, which is why many retailers still have not mastered it. Using these techniques, though, you can put yourself in a position to begin deriving value from transaction information. You can also learn more about your customers’ behavior, among many other things.
Most retailers are driven by a common fear of being pushed out of the market by larger competitors. They subsequently recognize that they are likely to meet this fate unless they can connect with the needs and behaviors of their customers. Clearly, some of the largest industry players have achieved their status through exactly these behaviors. This means understanding the needs of their customers and doing whatever they could to meet those needs.
While customer surveys and focus groups can be helpful, customers truly vote with both their feet and their dollars. The transaction detail in your data warehouse is a (nearly) perfect record of the decisions your customers make, and how those decisions affect your business. Coupled with other data in your data warehouse, you have much of the information you need to understand the behavior of your customers. You can view the analysis of this behavior from two perspectives: the “customer-centric” and the “visit-centric” perspective.
The holy grail of retail customer analysis would be to know everyone who visits your stores (or other channel), why they came, whether they found what they wanted, what they chose to buy and how they felt about the whole process. That information is difficult to find, but it constitutes a “customer-centric” analysis of customer behavior. Through loyalty programs and other means of tying the summary information about a transaction to a particular customer, many retailers can determine the following:
- Number of visits/purchases by each customer;
- Which of your stores or channels they bought through;
- Basket metrics for each transaction, such as item count, sales amount, etc.;
- Responsiveness to promotions—whether they use a coupon or special promotion code;
- How demographics (about customers and locations) affects behavior; and
- Characteristics that describe profitable (or high-volume) and unprofitable (or low-volume) customers over the lifetime of the relationship.
With this information, much of which is contained in the Transaction Header, Transaction Detail, and Transaction Tender components of a retail data warehouse, a retailer can adopt customer-centric retailing practices. These practices drive everything from how promotions are constructed to how stores are stocked, decorated, and sited, to actions, which can be used to improve relationships with desirable customers.
For example, retailers that can segment stores based on the shopping habits of various customer groups, integrate multiple channels to allow people who shop the retailer’s website to complete purchases at the store, honor lower prices found on the website at the in-store checkout for customers with documentation, as well as segment customers and align the business behind “profitable” relationships, are largely driven by data warehousing technology and its wise usage.
The customer-centric strategy, of course, does not work for all retailers. In fact, a “visit-centric” approach to customer analysis might make more sense for many retailers. Examples of this are stores who service less discretionary purchases or retailers without loyalty programs, or an ability to tie customers to transactions. In these situations, the retailer’s effectiveness can come down to three simple things: how conveniently the store is located, whether it has products in stock at reasonable prices, and how comfortable the shopping and purchasing experience is. Another reason to consider a visit-centric approach is an inability to identify individual customers and match them with transactions in the store.
The analysis of transaction-level detail, and the relentless optimization of operations are essential to the visit-centric strategy. While a store cannot easily change its location once it opens for business (though the location should be planned carefully), it can certainly be aggressive about ensuring item availability, pricing, and the checkout process. These stores should also use promotions wisely, which can influence things like basket contents and repeat visits.
One might wonder how in-stock positions could be improved through examining POS data. Clearly, some queries to the inventory systems, or the inventory portions of the data warehouse, are necessary to become proactive in addressing stockouts. However, out-of-stock situations are often compounded when one of a highly sticky pair of products is not available. This can be a visit killer, where the shopper decides that because a particular item is out of stock he or she must visit another store anyway. Subsequently, the retailer loses more than just the sale of the individual item that is out of stock.
Through product affinity analysis, a retailer can better determine which products sell together during a particular visit. In-stock positions can also be viewed in groups. Thus, if one of a group of related products is out of stock, then all products in the group could be considered out of stock as well, or at least they will be impacted by the absence of the other product. In addition, retailers can use the stickiness of products to each other to plan better promotions and modify store layouts. Affinity analysis is complicated, and only a few business intelligence tools can actually perform it. Getting into detailed affinity analysis requires a tool with data mining and statistical capability, but the analysis should be done directly against transaction detail information.
In terms of pricing, the transaction log (obviously) thoroughly details how individual products are priced and sold at the store level. It also shows how those characteristics change over time. While this is moderately interesting at the chain level, what really influences customer behavior is how item prices compare to other local stores. A best practice is to complement POS data in the warehouse with competitive shopping information.
Many retailers regularly shop their competitors’ stores and obtain pricing for a common basket of items. Not only can the data warehouse tell a retailer which items should be in that competitive basket, it should also be where the competitor pricing data is stored and analyzed. With this approach, a retailer can create a competitive index by store to see how its pricing policies position itself against competition and, naturally, facilitate market-based price changes in response to competition.
Finally, the transaction log can provide a great deal of information about the checkout process. This occurs because the data model typically captures exact times for scanning each item, as well as all of the other transaction’s summary information. The fact that the chronological sequence of a transaction can be captured allows retailers to identify:
- How long customers waited -- length of transaction in seconds
- How busy the store was – time between end of prior transaction and start of next transaction
- How cashier errors and mis-keys affected transaction -- delays between item scans
- How tender types (check, cash, credit/debit) influence checkout time
From these and other metrics, retailers can spot training or performance deficiencies in stores or employees, plan optimal labor forecasting for their most costly employee category (cashiers), and adopt better checkout policies regarding tenders. With the use of self-checkout in some stores, comparing performance of regular versus self-checkout lanes might suggest changes in the mix of the two. Combined with affinity-driven in-stock and pricing optimization, retailers can best execute on their visit-centric retailing strategy.
In summary, there are clearly many benefits that can be derived from analyzing transaction information in a retail data warehouse. There are a number of technical challenges to address, such as the volume of the information and the difficulty of analyzing it with today’s business intelligence tools. However, the truly best-in-class retailers have been doing it for years. Success in the retail industry depends upon making effective use of this information to enhance your relationships with customers.
About the author
Dan is the Managing Partner at Claraview, a strategy and technology consultancy that helps leading companies and government agencies use business intelligence to achieve competitive advantage and operational excellence. Claraview clients realize measurable results: faster time to decision, improved information quality and greater strategic insight. Dan is a frequent contributor to business intelligence literature, writing on topics spanning technical approaches and business impact, and the Claraview Retail Practice serves some of the world's most advanced users of retail data warehouses.