Data Collection 101: Where Is Your Data Coming From?


Data verbiage is confusing at best and nonsensical at worst. “Big data” implies little other than volume. “Identity data” implies little more than behavior. And “data scientist” implies little more than data mining, which in turn implies an abundance of code and algorithms, to say the least.

The data industry is awash in terms and acronyms that many use and few fully understand – mostly because the definitions are still quite fluid. But at the center of all the data talk is one pillar: data rights and security.

The White House has chimed in, the courts are getting stricter, and the world is shifting toward a new digital reality that requires the use of secure, trustworthy data for the benefit of all. And to ensure secure and trustworthy data, you need to know where your data comes from.

For any data-driven business, there are three types of data you can use to increase revenue: first-, second- and third-party. Each comes with benefits and points of hesitation, but with the proper collection practices in place, all three combine to provide a holistic online and offline view of your customer, informing business strategy and decisions at every level in every department.

First-Party Data

First-party data is the golden child of the data world. It is the data you collect on your users on your own site, data that your users are generally willing to give you because you are a trusted brand. You can be sure of it because it comes from a direct connection between you and your users.

Registration data is first-party data: emails, usernames, geographical information and any other data points you ask your users to share. If you are using social media authentication as user login, though, that data is not strictly first-party (see the second-party data definition below). On-site behavior analytics are also first-party data (think data management platforms – DMPs – and tag management systems) and are a good indicator of which aspects of your site are pulling people in, making them stay and encouraging them to share.

For most companies, first-party data is an understood investment. It leverages the customer relationship to create a two-way line: the customer can offer up preferences via an on-site survey or simply through on-site behavior, and the company can respond through A/B testing, email campaigns and more. Not using your first-party data is generally considered leaving money on the table. Unfortunately, while first-party data is the most distinctive and trustworthy data you have, and it is free, it is often the smallest data set or, if it is scaled, the most anonymous.


Second-Party Data

Second-party data is a bit more complex, given its relative novelty in the data industry. For the most part, second-party data is data that a user has given to another platform and is now giving you access to. Social authentication is one form of second-party data collection: the user originally gave the login data to Facebook or another social media site, but is allowing your site to access the same information, often for convenience.

Second-party data is additive to your first-party data. It is just as trustworthy as the data you collect on your own site, given that users have entered the information themselves and then opted in to giving it to you, but it adds a level of identity that first-party data typically anonymizes. For example, Google Analytics provides behavioral data for your site based on visits, not on individuals. Second-party data, like that collected from Facebook, allows you to connect specific site behaviors with an individual, providing a full-story view of users who allow you access to their data. Depending on the social media site, you can also gain brand affinity information (which brands on Facebook your audience likes, which companies on LinkedIn your audience follows and so on), allowing you to combine niche on-site behaviors with that niche audience’s brand affinities to pull in relevant advertising or sponsorships at premium prices.

Since users are offering you second-party data when they interact with your brand, whether on social media, on your site or in-app, second-party data doesn’t supplant first-party data. Rather, it offers a deeper view into first-party data without on-site surveys or focus groups. Of course, in order to access second-party data, you typically need a tech partner (like Umbel!) to implement and optimize it.

Second-party data often scales better than first-party data, is less anonymized (if it is anonymized at all), and is safer than third-party data, given that it is typically collected and secured in accordance with privacy and rights best practices.


Third-Party Data

Third-party data is most often data collected and then sold without explicit user consent. In addition, those selling third-party data do not have a direct relationship with the user. In essence, a company may pay a publisher or a credit card company to collect data on their users and then sell that data to other companies to better inform their marketing, advertising or overall business strategy. For the most part, third-party data is fed into data management platforms (DMPs) to target ad buys.

While third-party data can give you insight into your users at scale, allowing you to segment and target user groups, it often comes at a hefty cost and can be unreliable, given that the third-party “broker” doesn’t have a relationship with the users whose data it is selling. Much of the concern in the big data industry about the origins of data relates specifically to third-party data. Data rights issues crop up here as well, given that user data is usually collected and sold without user consent. There is very little transparency for the user when it comes to third-party data collection and use.

Overall, second-party data is the sweet spot for scalable, reliable user data that does not breach the privacy or rights of the users themselves. Many second-party data platforms, like Facebook, Google and Apple, are beginning to enforce stricter rules for on-platform data collection, requiring transparency with users about collection and use. Some, like Facebook, are even requiring that companies prove they are indeed using the data they collect – imposing an “only collect it if you will actively use it” policy.
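
For teams that want to track where each data set comes from, the three definitions in this article can be roughly sketched in Python. This is only an illustration: the names `DataParty`, `DataSource` and `classify` are hypothetical, and reducing the definitions to two yes/no questions is a simplifying assumption, not an industry standard.

```python
from dataclasses import dataclass
from enum import Enum


class DataParty(Enum):
    FIRST = "first-party"
    SECOND = "second-party"
    THIRD = "third-party"


@dataclass(frozen=True)
class DataSource:
    # Hypothetical fields, chosen only to mirror the article's definitions.
    name: str
    collected_on_own_property: bool  # gathered directly on your own site or app
    user_granted_access: bool        # user opted in to share it from another platform


def classify(source: DataSource) -> DataParty:
    """Label a source using the article's three definitions (simplified)."""
    if source.collected_on_own_property:
        return DataParty.FIRST
    if source.user_granted_access:
        return DataParty.SECOND
    # Neither collected directly nor shared by the user: treat as third-party.
    return DataParty.THIRD


sources = [
    DataSource("registration form", True, False),
    DataSource("social login profile", False, True),
    DataSource("purchased broker segment", False, False),
]

# Map each source name to its party label.
labels = {s.name: classify(s).value for s in sources}
print(labels)
```

Real sources are rarely this clear-cut, so a label like this is best treated as a starting point for a data inventory, not a legal or compliance determination.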