The Internet of Things is the source of the category of big data known as sensor data. It is often the new data type you come across while defining big data, the one that prompts you to start getting to know NoSQL, and it differentiates machine-generated data from the data generated directly by humans typing, taking pictures, recording videos, scanning bar codes, and so on.
One often-alleged advantage of machine-generated data is its inherently higher data quality, since humans are removed from the equation. In this blog post, let’s briefly explore that concept.
Is it hot in here or is it just me?
Let’s use the example of accurately recording the temperature in a room, something that a thing, namely a thermostat, has been helping us do since long before the Internet of Anything ever existed.
(Please note: to keep this discussion simple, let’s ignore the fact that the only place where the room temperature matches the thermostat reading is at the thermostat itself, as well as how that can be compensated for with a network of sensors placed throughout the room.)
A thermostat only displays the current temperature, so if I want to keep a log of temperature changes over time, I could do this by writing my manual readings down in a notebook. Here is where the fallible human, in this case me, enters the equation and could cause a data quality issue.
For example, the temperature in the room is 71 degrees Fahrenheit, but I incorrectly write it down as 77 degrees (or perhaps my sloppy handwriting causes even me to later misread the 1 as a 7). Let’s also say that I want to log the room temperature every hour. This would require me to be physically present to read the thermostat every hour on the hour. Obviously, I might not be quite that punctual, and I could easily forget to take a reading (or several), so those readings would never be recorded.
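To make those two human errors concrete, here is a minimal Python sketch of how a transcribed notebook log might be audited. Everything in it is an assumption for illustration: the function name `audit_manual_log`, the 5-degree `max_jump` threshold, and the sample date and readings are all hypothetical, not part of any real tool.

```python
from datetime import datetime, timedelta


def audit_manual_log(log, start, end, max_jump=5.0):
    """Return (missing, suspicious) for an hourly handwritten log.

    log maps a datetime (on the hour) to a temperature in degrees Fahrenheit.
    max_jump is an assumed threshold for an implausible hour-to-hour change.
    """
    # Build every hour the log should contain, from start to end inclusive.
    expected = []
    t = start
    while t <= end:
        expected.append(t)
        t += timedelta(hours=1)

    # Hours where I forgot to take a reading.
    missing = [t for t in expected if t not in log]

    # Readings that differ from the previous recorded one by more than max_jump.
    suspicious = []
    previous = None
    for t in expected:
        if t not in log:
            continue
        if previous is not None and abs(log[t] - log[previous]) > max_jump:
            suspicious.append(t)
        previous = t
    return missing, suspicious


# Hypothetical sample data: one miswritten value and one forgotten reading.
day = datetime(2024, 1, 1)
log = {
    day.replace(hour=9): 71.0,
    day.replace(hour=10): 77.0,  # 71 miswritten as 77
    day.replace(hour=11): 71.0,
    # 12:00 reading forgotten
    day.replace(hour=13): 72.0,
}
missing, suspicious = audit_manual_log(
    log, day.replace(hour=9), day.replace(hour=13)
)
```

With this sample, `missing` holds the 12:00 slot, and `suspicious` holds both 10:00 and 11:00, because the check only compares neighboring readings and cannot tell which side of a jump is wrong. A human still has to decide that.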
The Quality of Things
Alternatively, using an innovative new thermostat that wirelessly transmits the current temperature, at precisely defined time intervals, to an Internet-based application certainly eliminates the possibility of the two human errors in my example.
However, there are other ways a data quality issue could occur. For example, the thermostat may report an inaccurate temperature because it is not calibrated correctly. In addition, a power loss, mechanical failure, lost Internet connection, or a crash of the Internet-based application could prevent temperature readings from being recorded.
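Both kinds of machine-side issues can be checked for, at least roughly. The Python sketch below assumes a trusted reference thermometer is available for a few side-by-side readings and that the thermostat is supposed to transmit every 10 minutes. The function names, the 30-second tolerance, and all sample values are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def estimate_offset(sensor_temps, reference_temps):
    """Average amount by which the sensor reads above the reference."""
    return mean(s - r for s, r in zip(sensor_temps, reference_temps))


def find_gaps(timestamps, interval, tolerance=timedelta(seconds=30)):
    """Return (before, after) pairs spaced further apart than interval + tolerance."""
    ordered = sorted(timestamps)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if b - a > interval + tolerance
    ]


# Hypothetical side-by-side readings: the sensor runs consistently warm.
sensor = [72.5, 73.0, 71.5]
reference = [71.0, 71.5, 70.0]
offset = estimate_offset(sensor, reference)

# Hypothetical transmission times: nothing arrived between 9:20 and 9:50.
start = datetime(2024, 1, 1, 9, 0)
stamps = [start + timedelta(minutes=m) for m in (0, 10, 20, 50, 60)]
gaps = find_gaps(stamps, timedelta(minutes=10))
```

Here `offset` comes out to 1.5 degrees and `gaps` contains the single 9:20 to 9:50 stretch. Note what the sketch cannot do: it reveals that readings are missing, not why, and it can only estimate a calibration error if something more trustworthy exists to compare against.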
The point I am trying to make is that although the quality of things entered by things certainly has some advantages over the quality of things entered by humans, it is simply not true that machine-generated data is immune to data quality issues.