100 Petabytes of Data in Poop?
University of California computer scientist Dr. Larry Smarr is a man on a mission: to measure everything his body consumes, performs, and yes, discharges. For Dr. Smarr, this data collection has a goal: to fine-tune his ecosystem in order to beat a potentially incurable disease. Is this kind of rigorous information collection and analysis the future of healthcare?
Talk to a few friends and you’ll probably find some who count calories or steps, or even chart their exercise and eating regimens. But it’s not very likely that your friends are quantifying their personal lives like Larry Smarr.
Atlantic Magazine’s June/July 2012 issue describes Dr. Larry Smarr’s efforts to capture his personal data, though not data about his finances or internet viewing habits. Dr. Smarr is capturing health data, and lots of it. He uses armbands to record skin temperature and headbands to monitor sleep patterns, has blood drawn eight times a year, gets MRIs and ultrasounds when needed, and undergoes regular colonoscopies. And of course, he writes down every bite of food, collects his own stool samples, and ships them to a laboratory.
Monitoring calories makes sense, but stools are also “information rich,” says Smarr. “There are about 100 billion bacteria per gram. Each bacterium has DNA whose length is typically one to ten megabases; call it one million bytes of information,” Smarr exclaims. “This means human stool has a data capacity of 100,000 terabytes of information (~97 petabytes) stored per gram.” And all kinds of interesting information on the digestive tract, liver and pancreas can be culled from feces, including signs of infection, nutrient absorption problems and even cancer.
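The arithmetic behind that figure is easy to check. The sketch below is only an illustration and uses the round numbers from the quote; the variable and function names are my own. It assumes that the “~97 petabytes” in the quote comes from dividing 100,000 terabytes by 1,024, since with decimal units throughout the same quantity is exactly 100 petabytes, which matches the headline.

```python
# Round numbers taken from the quote above; names are illustrative only.
BACTERIA_PER_GRAM = 100 * 10**9   # "about 100 billion bacteria per gram"
BYTES_PER_GENOME = 10**6          # "call it one million bytes of information"


def bytes_per_gram() -> int:
    """Total bytes of bacterial DNA 'data' in one gram of stool."""
    return BACTERIA_PER_GRAM * BYTES_PER_GENOME


def to_terabytes(num_bytes: int) -> float:
    """Decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 10**12


def to_petabytes_decimal(num_bytes: int) -> float:
    """Decimal petabytes (1 PB = 10**15 bytes)."""
    return num_bytes / 10**15


def to_petabytes_mixed(num_bytes: int) -> float:
    """Assumed origin of '~97': decimal terabytes divided by 1,024."""
    return to_terabytes(num_bytes) / 1024


total = bytes_per_gram()
print(f"{to_terabytes(total):,.0f} TB per gram")
print(f"{to_petabytes_decimal(total):.0f} PB per gram (decimal units)")
print(f"{to_petabytes_mixed(total):.1f} PB per gram (TB divided by 1,024)")
```

Either way, the order of magnitude is the point: roughly 10^17 bytes per gram under Smarr's back-of-the-envelope assumptions.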
Armed with all this health data, Dr. Smarr is attempting to “model” his ecosystem. This means producing a working model that, when fed inputs, can help report, analyze and eventually predict potential health issues. Just as auto manufacturers use sensor and diagnostic data for warranty and quality analysis, Dr. Smarr is collecting and analyzing data to fine-tune how his body performs its functions.
But there’s more to the story. In his charting process, Dr. Smarr noticed that his level of C-reactive protein (CRP), which rises in response to inflammation, was high. “Troubled, I showed my graphs to my doctors and suggested that something bad was about to happen,” he says. Smarr believed his elevated CRP level was acting as an early warning system, but his doctors dismissed him as too caught up in finding a problem where there was none.
Two weeks later, Dr. Smarr felt a severe pain in the side of his abdomen. This time, the doctors diagnosed him with an acute bout of diverticulitis (bowel inflammation) and told him to take antibiotics. But Dr. Smarr wasn’t convinced. He tested his stools and came up with additional alarming numbers that suggested his diverticulitis was perhaps something more: early Crohn’s disease, an incurable and uncomfortable GI tract condition. The diagnosis of Crohn’s was subsequently confirmed by doctors.
Critics of “measuring everything” in healthcare suggest that by focusing on massive personal data collection and analysis we’ll all turn into hypochondriacs, looking for ghosts in the machine when there are none. Or, as Nassim Taleb argues: the more variables we test, the disproportionately higher the number of spurious results that appear (to be) “statistically significant”. There is also the argument that predictive analytics may do more harm than good by suggesting potential for illness where a patient may never end up developing a given disease. In other words, correlation is not causation.
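To see why the number of spurious results can grow disproportionately, consider a simplified illustration (not Taleb's own derivation). It assumes that every pair of measured variables is tested for a relationship at a conventional 5% significance level, that none of the variables are truly related, and that the tests are independent. The function names are hypothetical. Under those assumptions the number of tests grows roughly with the square of the number of variables, and so does the expected count of false alarms.

```python
def pairwise_tests(num_variables: int) -> int:
    """Number of distinct variable pairs that could be tested."""
    return num_variables * (num_variables - 1) // 2


def expected_false_positives(num_variables: int, alpha: float = 0.05) -> float:
    """Expected 'significant' results when no real relationship exists."""
    return alpha * pairwise_tests(num_variables)


def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false alarms, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


for k in (10, 100, 1000):
    tests = pairwise_tests(k)
    print(f"{k:>5} variables -> {tests:>7,} pairwise tests, "
          f"about {expected_false_positives(k):,.1f} expected false positives")

print(f"P(at least one false positive in 20 tests) = "
      f"{prob_at_least_one_false_positive(20):.2f}")
```

Going from 10 to 100 measured variables multiplies the variables by ten but the expected false alarms by more than a hundred, which is the critics' concern in miniature.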
That said, you’d have a hard time convincing Dr. Smarr that patients, healthcare providers and even society at large couldn’t benefit from quantifying and analyzing inputs and outputs, thus gaining a better understanding of our own “system health”. And fortunately, thanks to Moore’s Law and today’s software applications, applying brute force computation to our data-rich problems is not only possible, it’s available now.
However, what makes sense conceptually is often much harder to implement in the real world. A sluggish healthcare system, data privacy issues, and a lack of data scientists to perform big data analysis are potential roadblocks to seeing the “quantified life” become a reality for everyone any time soon.
- Do data collection and analysis methods as described in this article portend a revolution in healthcare?
- If everyone rigorously collects and analyzes their personal health data, could this end up raising or reducing overall healthcare costs?
Paul Barsch directs marketing programs for Think Big, a Teradata company. Think Big offers roadmap, architecture, engineering and ongoing support services for data lake and analytic solutions. Paul has also worked in senior marketing roles for global consultancies EDS (now an HP company) and BearingPoint (formerly KPMG Consulting). The opinions expressed here represent those of Paul Barsch, ...