Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    image fx (67)
    Improving LinkedIn Ad Strategies with Data Analytics
    9 Min Read
    big data and remote work
    Data Helps Speech-Language Pathologists Deliver Better Results
    6 Min Read
    data driven insights
    How Data-Driven Insights Are Addressing Gaps in Patient Communication and Equity
    8 Min Read
    pexels pavel danilyuk 8112119
    Data Analytics Is Revolutionizing Medical Credentialing
    8 Min Read
    data and seo
    Maximize SEO Success with Powerful Data Analytics Insights
    8 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Fantasy League Data Quality
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Warehousing > Fantasy League Data Quality
Business IntelligenceData Warehousing

Fantasy League Data Quality

JimHarris
JimHarris
15 Min Read
SHARE

For over 25 years, I have been playing fantasy league baseball and football. For those readers who are not familiar with fantasy sports, they simulate ownership of a professional sports team. Participants “draft” individual real-world professional athletes to “play” for their fantasy team, which competes with other teams using a scoring system based on real-world game statistics.

Contents
Master Data ManagementMaster Data ManagementData WarehousingBusiness IntelligenceData QualityConclusion

What does any of this have to do with data quality?

Master Data Management

In Worthy Data Quality Whitepapers (Part 1), Peter Benson of the ECCMA explained that “data is intrinsically simple and can be divided into data that identifies and describes things, master data, and data that describes events, transaction data.”

In fantasy sports, this distinction is very easy to make:

More Read

Are You Enchanting? A Book Review
Want to Fully Leverage Data? Change the Way You Operate
Why Old Media Can’t Deny New Media
Fog2.0
CRTC Fails to Grasp Simple Data Protection Concepts
  • Master Data – data describing the real-world players on the roster of each fantasy team.
  • Transaction Data – data describing the statistical events of the real-world games played.

In his magnificent book Master Data Management, David Loshin explained that “master data objects are those core business objects used in the different applications across the organization, along with their associated metadata, attributes, …

For over 25 years, I have been playing fantasy league baseball and football. For those readers who are not familiar with fantasy sports, they simulate ownership of a professional sports team. Participants “draft” individual real-world professional athletes to “play” for their fantasy team, which competes with other teams using a scoring system based on real-world game statistics.

What does any of this have to do with data quality?

Master Data Management

In Worthy Data Quality Whitepapers (Part 1), Peter Benson of the ECCMA explained that “data is intrinsically simple and can be divided into data that identifies and describes things, master data, and data that describes events, transaction data.”

In fantasy sports, this distinction is very easy to make:

  • Master Data – data describing the real-world players on the roster of each fantasy team.
  • Transaction Data – data describing the statistical events of the real-world games played.

In his magnificent book Master Data Management, David Loshin explained that “master data objects are those core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies.”

In fantasy sports, Players and Teams are the master data objects with many characteristics including the following:

  • Attributes – Player attributes include first name, last name, birth date, professional experience in years, and their uniform number. Team attributes include name, owner, home city, and the name and seating capacity of their stadium.
  • Definitions – Player and Team have both Professional and Fantasy definitions.  Professional teams and players are real-world objects managed independent of fantasy sports. Fundamentally, Professional Team and Professional Player are reference data objects from external content providers (Major League Baseball and the National Football League).  Therefore, Fantasy Team and Fantasy Player are the true master data objects.  The distinction between professional and fantasy teams is simpler than between professional and fantasy players.  Not every professional player will be used in fantasy sports (e.g. offensive linemen in football) and the same professional player can simultaneously play for multiple fantasy teams in different fantasy leagues (or sometimes even within the same league – e.g. fantasy tournament formats).
  • Roles – In baseball, the player roles are Batter, Pitcher, and Fielder. In football, the player roles are Offense, Defense and Special Teams. In both sports, the same player can have multiple or changing roles (e.g. in National League baseball, a pitcher is also a batter as well as a fielder).
  • Connections – Fantasy Players are connected to Fantasy Teams via a roster. On the fantasy team roster, fantasy players are connected to real-world statistical events via a lineup, which indicates the players active for a given scoring period (typically a week in fantasy football and either a week or a day in fantasy baseball). These connections change throughout the season. Lineups change as players can go from active to inactive (i.e. on the bench) and rosters change as players can be traded, released, and signed (i.e. free agents added to the roster after the draft).
  • Taxonomies – Positions played are defined individually and organized into taxonomies. In baseball, first base and third base are individual positions, but both are infield positions and more specifically corner infield. Second base and short stop are also infield positions, and more specifically middle infield. And not all baseball positions are associated with fielding (e.g. a pinch runner can accrue statistics such as stolen bases and runs scored without either fielding or batting).

Data Warehousing

Combining a personal hobby with professional development, I built a fantasy baseball data warehouse. I downloaded master, reference, and transaction data from my fantasy league’s website. I prepared these sources in a flat file staging area, from which I applied inserts and updates to the relational database tables in my data warehouse, where I used dimensional modeling.

My dimension tables were Date, Professional Team, Player, Position, Fantasy League, and Fantasy Team. All of these tables (except for Date) were Type 2 slowly changing dimensions to support full history and rollbacks.

For simplicity, the Date dimension was calendar days with supporting attributes for all aggregate levels (e.g. monthly aggregate fact tables used the last day of the month as opposed to a separate Month dimension).

Professional and fantasy team rosters, as well as fantasy team lineups and fantasy league team membership, were all tracked using factless fact tables.  For example, the Professional Team Roster factless fact table used the Date, Professional Team, and Player dimensions, and the Fantasy Team Lineup factless fact table used the Date, Fantasy League, Fantasy Team, Player, and Position dimensions. 

The factless fact tables also allowed Player to be used as a conformed dimension for both professional and fantasy players since a Fantasy Player dimension would redundantly store multiple instances of the same professional player for each fantasy team he played for, as well as using Fantasy League and Fantasy Team as snowflaked dimensions.

My base fact tables were daily transactions for Batting Statistics and Pitching Statistics. These base fact tables used only the Date, Professional Team, Player, and Position dimensions to provide the lowest level of granularity for daily real-world statistical performances independent of fantasy baseball. 

The Fantasy League and Fantasy Team dimensions replaced the Professional Team dimension in a separate family of base fact tables for daily fantasy transactions for Batting Statistics and Pitching Statistics. This was necessary to accommodate for the same professional player simultaneously playing for multiple fantasy teams in different fantasy leagues. Alternatively, I could have stored each fantasy league in a separate data mart.

Aggregate fact tables accumulated month-to-date and year-to-date batting and pitching statistical totals for fantasy players and teams. Additional aggregate fact tables incremented current rolling snapshots of batting and pitching statistical totals for the previous 7, 14 and 21 days for players only. Since the aggregate fact tables were created to optimize fantasy league query performance, only the base tables with daily fantasy transactions were aggregated.

Conformed facts were used in both the base and aggregate fact tables. In baseball, this is relatively easy to achieve since most statistics have been consistently defined and used for decades (and some for more than a century). 

For example, batting average is defined as the ratio of hits to at bats and has been used consistently since the late 19th century. However, there are still statistics with multiple meanings.  For example, walks and strikeouts are recorded for both batters and pitchers, with very different connotations for each.

Additionally, in the late 20th century, new baseball statistics such as secondary average and runs created have been defined with widely varying formulas. Metadata tables with definitions (including formulas where applicable) were included in the baseball data warehouse to avoid confusion.

For remarkable reference material containing clear-cut guidelines and real-world case studies for both dimensional modeling and data warehousing, I highly recommend all three books in the collection: Ralph Kimball’s Data Warehouse Toolkit Classics.

Business Intelligence

In his Information Management special report BI: Only as Good as its Data Quality, William Giovinazzo explained that “the chief promise of business intelligence is the delivery to decision-makers the information necessary to make informed choices.”

As a reminder for the uninitiated, fantasy sports simulate the ownership of a professional sports team. Business intelligence techniques are used for pre-draft preparation and for tracking your fantasy team’s statistical performance during the season in order to make management decisions regarding your roster and lineup.

The aggregate fact tables that I created in my baseball data warehouse delivered the same information available as standard reports from my fantasy league’s website. This allowed me to use the website as an external data source to validate my results, which is commonly referred to as using a “surrogate source of the truth.” However, since I also used the website as the original source of my master, reference, and transaction data, I double-checked my results using other websites. 

This is a significant advantage for fantasy sports – there are numerous external data sources that can be used for validation freely available online. Of course, this wasn’t always the case. 

Over 25 years ago when I first started playing fantasy sports, my friends and I had to manually tabulate statistics from newspapers. We migrated to customized computer spreadsheet programs (this was in the days before everyone had PCs with Microsoft Excel – which we eventually used) before the Internet revolution and cloud computing brought the wonderful world of fantasy sports websites that we enjoy today.

Now with just a few mouse clicks, I can run regression analysis to determine whether my next draft pick should be a first baseman predicted to hit 30 home runs or a second baseman predicted to have a .300 batting average and score 100 runs. 

I can check my roster for weaknesses in statistics difficult to predict, such as stolen bases and saves. I can track the performances of players I didn’t draft to decide if I want to make a trade, as well as accurately evaluate a potential trade from another owner who claims to be offering players who are having a great year and could help my team be competitive.

Data Quality

In her fantastic book Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Danette McGilvray comprehensively defines all of the data quality dimensions, which include the following most applicable to fantasy sports:

  • Accuracy – A measure of the correctness of the content of the data, which requires an authoritative source of reference to be identified and accessible.
  • Timeliness and Availability – A measure of the degree to which data are current and available for use as specified and in the time frame in which they are expected.
  • Data Coverage – A measure of the availability and comprehensiveness of data compared to the total data universe or population of interest.
  • Presentation Quality – A measure of how information is presented to and collected from those who utilize it. Format and appearance support appropriate use of the information.
  • Perception, Relevance, and Trust – A measure of the perception of and confidence in the data quality; the importance, value, and relevance of the data to business needs.

Conclusion

I highly doubt that you will see Fantasy League Data Quality coming soon to a fantasy sports website near you. It is just as unlikely that my future blog posts will conclude with “The Mountain Dew Post Game Show” or that I will rename my blog to “OCDQ – The Worldwide Leader in Data Quality” (duh-nuh-nuh, duh-nuh-nuh).

However, fantasy sports are more than just a hobby. They’re a thriving real-world business providing many excellent examples of best practices in action for master data management, data warehousing, and business intelligence – all implemented upon a solid data quality foundation.

So who knows, maybe some Monday night this winter we’ll hear Hank Williams Jr. sing:

“Are you ready for some data quality?”

Link to original post

TAGGED:data quality
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

image fx (2)
Monitoring Data Without Turning into Big Brother
Big Data Exclusive
image fx (71)
The Power of AI for Personalization in Email
Artificial Intelligence Exclusive Marketing
image fx (67)
Improving LinkedIn Ad Strategies with Data Analytics
Analytics Big Data Exclusive Software
big data and remote work
Data Helps Speech-Language Pathologists Deliver Better Results
Analytics Big Data Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

#24: Here’s a thought…

6 Min Read

Big Data

6 Min Read

Better Data Quality From Your Web Form

5 Min Read

The Very True Fear of False Positives

9 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?