Don’t Fear the Statistics – Using OBI for Statistical Analysis Part 2
Nearly every client Edgewater Ranzal partners with uses statistical averages in their analytic and reporting solutions. As far as statistical functions go, the average is probably the easiest to understand; however, its limitation is that it can be difficult to determine how to rate the individual performance of contributors to that average. Consider the following examples:
- The average cost of a gallon of milk is $3.20 and the corner convenience store is selling it for $3.44. Is that a significant deviation from the average?
- If the average NFL player’s base salary is $1.86 million and the Tennessee Titans’ Marcus Mariota made $5.5 million, is this an exceptional payout? Is the salary significant when his role as the team’s starting quarterback is considered?
- Suppose the average gross margin percent for a company’s business units is 58% and one particular business unit’s actual gross margin is 46%. Is that business unit truly underperforming?
It turns out that judging a measurement against the average alone is very subjective. In this post, we explore how the standard deviation around the average can be used to mitigate that subjectivity and how it can be incorporated into data visualizations to identify true outliers.
The NASDAQ-100 comprises the largest domestic and international non-financial companies (based on market capitalization) listed on the Nasdaq Stock Exchange. It includes technology giants such as Apple and Alphabet (parent company of Google) along with consumer services companies such as Bed Bath & Beyond. The quarterly gross margin percent from 2007 to Q3 2016 was downloaded and loaded into a data mart leveraged by Oracle Business Intelligence Enterprise Edition (OBIEE) 12c (Q4 2016 data was not available for all companies). With the exception of Figure 1, the following visualizations were created in OBIEE 12c.
The standard deviation can be thought of as defining ranges around the average that can be used to classify individual contributors to that average. For instance, the average gross margin percent for the NASDAQ-100 in Q4 2014 was calculated to be 59.9% with a standard deviation of 22.7%. This can be visualized on a number line as follows:
Figure 1 NASDAQ-100 Q4 2014 Gross Margin % Performance Ranges
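The ranges in Figure 1 can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not the OBIEE implementation behind the visualizations. The function names are hypothetical, and the six band labels are an assumption pieced together from the names used later in this post, with each band taken to be one standard deviation wide.

```python
# Q4 2014 NASDAQ-100 gross margin % figures quoted in this post.
MEAN = 59.9
STD_DEV = 22.7


def band_edges(mean, std_dev, k=2):
    """Return the boundaries from k standard deviations below to k above the mean."""
    return [round(mean + i * std_dev, 1) for i in range(-k, k + 1)]


def classify(value, mean, std_dev):
    """Map a value to a band label (labels are assumed, not taken from OBIEE)."""
    z = (value - mean) / std_dev  # distance from the mean in standard deviations
    if z < -2:
        return "Extremely Negative"
    if z < -1:
        return "Negative"
    if z < 0:
        return "Moderately Negative"
    if z < 1:
        return "Moderately Positive"
    if z < 2:
        return "Positive"
    return "Extremely Positive"


print(band_edges(MEAN, STD_DEV))      # [14.5, 37.2, 59.9, 82.6, 105.3]
print(classify(70.0, MEAN, STD_DEV))  # Moderately Positive
```

A company with a 70% gross margin, for example, sits less than one standard deviation above the 59.9% average and so lands in the first band above it.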
Many real-world events that have variability follow a predictable distribution pattern known as the normal distribution. For instance, approximately 34.1% of the contributors are expected to fall between the average and one standard deviation below it, and another 34.1% between the average and one standard deviation above it. From the figure above, approximately 34 of the NASDAQ-100 companies would therefore be expected to have a gross margin percent between 37.2% and 59.9%. The actual distribution can be visualized as such:
Figure 2 Distribution of NASDAQ-100 Gross Margin %
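The 34.1% figure comes from the normal distribution. As a rough sketch, assuming the data are normally distributed (which holds only approximately for the NASDAQ-100), the expected share of contributors in each one-standard-deviation band can be computed with Python's standard library. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable being <= z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_share(lower_z, upper_z):
    """Expected share of contributors between two z-scores."""
    return normal_cdf(upper_z) - normal_cdf(lower_z)


# Bands expressed in standard deviations from the average.
bands = [(-math.inf, -2), (-2, -1), (-1, 0), (0, 1), (1, 2), (2, math.inf)]
for lower, upper in bands:
    share = band_share(lower, upper)
    # With 100 companies, the percentage doubles as the expected company count.
    print(f"{lower:>5} to {upper:>4}: {share:.1%}")
```

Under this assumption the shares come out to roughly 2.3%, 13.6%, 34.1%, 34.1%, 13.6%, and 2.3%, which is the yardstick the actual counts in Figure 2 can be compared against.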
The NASDAQ-100 companies do not perfectly follow the distribution; there is a fatter spread into the Negative and Positive buckets (Two Standard Deviations down and up). Other, more advanced statistical methods can be used to redefine ranges, but are beyond the scope of this post.
Of course, this visualization simply confirms statistical theories that were proven over a hundred years ago. The true value of analytics is to take statistical theories and turn them into informative visuals. One method of visualizing the ranking of companies using the standard deviation in OBIEE 12c is through a Treemap:
Figure 3 NASDAQ-100 Distribution Treemap Visualization
The size of the box represents the Gross Margin % while the color aligns with the distribution ranking from Figures 1 and 2. This visualization allows the viewer to understand both the rankings and relative performance at a glance. It is easy to discern the delineation between above and below average (border between yellow and light green) as well as which companies are herding together.
One of the most powerful and essential aspects of business analytics is the ability to dimensionalize data so it can be sliced and diced. One of the many reasons this is done is to ensure an “apples to apples” comparison. For instance, comparing the gross margin percent of Qualcomm (QCOM), a semiconductor and telecommunications company, with that of Ross Stores (ROST), a discount department store, can produce misleading distributions. Filtering the visualization in Figure 3 by the NASDAQ industry classification for Technology companies results in the following Treemap:
Figure 4 NASDAQ-100 Technology Companies Treemap
Notice that Qualcomm has slipped from “Moderately Positive” to “Moderately Negative.” Averages and standard deviations can change dramatically when looking at the components of the whole. To demonstrate this, consider the following visualization comparing the average and deviation spread of the three largest categories (by number of companies) of the NASDAQ-100:
Figure 5 Average and Standard Deviation by Categories
The border between yellow and light green represents the average while each band represents one standard deviation. Notice that the average gross margin % and the standard deviation are both higher for Healthcare than for Technology. When the categories are combined, Healthcare companies are going to skew the performance perspective of Technology companies. This skew worsens when comparing against companies classified as Consumer Services.
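To make the effect of category mix concrete, the sketch below scores one company first against all companies and then against its own category only. The company names and gross margin values are invented purely for illustration and are not NASDAQ-100 data. Using the population standard deviation (`statistics.pstdev`) is also an assumption, since this post does not say which form was used.

```python
from statistics import mean, pstdev

# Hypothetical (company, category, gross margin %) rows, NOT real NASDAQ-100 data.
rows = [
    ("TechA", "Technology", 60.0),
    ("TechB", "Technology", 70.0),
    ("TechC", "Technology", 80.0),
    ("ShopA", "Consumer Services", 25.0),
    ("ShopB", "Consumer Services", 30.0),
    ("ShopC", "Consumer Services", 35.0),
]


def z_score(value, population):
    """Number of standard deviations that value sits from the population mean."""
    return (value - mean(population)) / pstdev(population)


all_margins = [gm for _, _, gm in rows]
tech_margins = [gm for _, cat, gm in rows if cat == "Technology"]

z_overall = z_score(60.0, all_margins)   # TechA against every company
z_in_tech = z_score(60.0, tech_margins)  # TechA against Technology only

print(f"overall: {z_overall:+.2f}, within Technology: {z_in_tech:+.2f}")
```

In this made-up data set, TechA sits above the overall average but below the Technology average, a shift in the same direction as the one described for Qualcomm.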
As a general rule, a single data point is not the best indicator of long-term performance. Although the average and standard deviation for a single quarter were calculated through the aggregation of one hundred companies, they should be considered a single data point. Consider the following visualizations, which show a comparative trend for four different companies over the entire date range downloaded:
Figure 6 Gross Margin % Trend for Adobe, Amazon, Electronic Arts, and Priceline
At a glance, viewers can see that Adobe (upper left) consistently beats the average performance while consumer goods and technology giant Amazon (upper right) has been performing below average until recently. Electronic Arts (lower left), a video game developer, seems to have erratic gross margin % returns; however, looking past the noise, the company is nearly always between moderately positive and moderately negative when compared against other NASDAQ-100 companies. Finally, Priceline (lower right) has been increasing gross margin % consistently and steadily pulling ahead of other NASDAQ-100 companies. If Priceline’s gross margin % trend continues and the performance of the other companies remains constant, Priceline will move into the “Extremely Positive” gross margin % ranking in Q4 2016 or Q1 2017.
Returning to the questions posed at the beginning of this post:
- The average cost of a gallon of milk is $3.20 with a standard deviation of $0.08. The corner convenience store selling milk for $3.44 is three standard deviations above the average!
- The average NFL base salary is $1.86 million with a standard deviation of $2.80 million. By comparison, Marcus Mariota’s $5.50 million salary is about 1.3 standard deviations above average. However, with the average quarterback base salary being $5.69 million with a standard deviation of $7.17 million, he is actually slightly undercompensated relative to other quarterbacks.
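These answers all apply the same calculation: subtract the average from the observed value and divide by the standard deviation. A minimal Python sketch using the figures quoted above (the function name is hypothetical):

```python
def z_score(value, average, std_dev):
    """Return how many standard deviations value lies from average."""
    return (value - average) / std_dev


milk = z_score(3.44, 3.20, 0.08)          # corner store vs. average milk price
league = z_score(5.50, 1.86, 2.80)        # Mariota vs. all NFL players ($ millions)
quarterbacks = z_score(5.50, 5.69, 7.17)  # Mariota vs. quarterbacks ($ millions)

print(f"{milk:.2f} {league:.2f} {quarterbacks:.2f}")  # 3.00 1.30 -0.03
```

A result near zero, such as the quarterback comparison, means the value is essentially at the average for that group, while a result of three or more marks a clear outlier.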
For the final question, we ask readers to evaluate their own enterprise:
- Calculate the average gross margin percent for your company’s business units for the quarter and find the business unit that is approximately 10% less than that average. Is it truly underperforming? Are you able to properly classify your business units to gain the greatest insight into relative performance?
Average and standard deviation can be applied to any metric by which a company wishes to evaluate itself. They can be used in combination with external data to create industry benchmarks. For instance, if you were to plot your company’s gross margin % performance against the trends above, how would it look?
We want to close this post with the same idea that closed Part 1 of “Don’t Fear the Statistics”: statistical analytics is part science/technology and part art. Reducing statistical calculations to consumable visualizations is the key. In the visualizations above, references to “standard deviation” were deliberately omitted in favor of familiar terms such as “Moderately Negative.” Approaches such as this help with change management, adoption, and the acceleration from simple reporting to true analytical insight into business process improvement based on data.
Jason L. Hodson is a Principal Architect with Edgewater Ranzal. He focuses on the Oracle Business Intelligence platform, with particular emphasis on the federation of EPM and relational data sources, Business Intelligence Cloud Service (BICS), as well as data governance with Hyperion DRM. He has experience with clients in the insurance, public utilities, manufacturing distribution, and healthcare industries. A former U.S. Marine, Jason has an undergraduate degree in mathematics/physics from Ball State University, an MBA and MS-Information Systems from the University of Cincinnati, and an MS-Information and Knowledge Strategy from Columbia University. He currently resides in Denver, CO and enjoys hiking, snowshoeing, and the local craft beer industry.