Personal Data Mining

October 9, 2012

I believe that I have an overactive immune system. I get recurring bouts of perennial conjunctivitis, and at times I also experience pain in my neck lymph nodes and my right maxillary sinus. I always believed that all of these symptoms were somehow related. Because of my conjunctivitis I was not able to wear contact lenses. My ophthalmologist confirmed that my eye problems were “allergy-related”.
 
On September 22nd, 2011 I began a personal experiment. I decided to keep a detailed record of various elements of my everyday life: whether I had a good night's sleep and spent time outdoors, what I ate and how much stress I felt.
 
I almost always carry my smartphone with me, so I used a text-editing application to record as much detailed information as I could every day. Here is an example of two consecutive dates as they appear in my daily log:
 
10/11/12, slept/bad, vitamin_c/0, coffee/1, self/ok, stress/low, sausages,cholesterol_food, sugar/5, pasta, tomatoes, mushrooms,  next_sleep/ok 
10/12/12, slept/ok, vitamin_c/500, coffee/2, self/ok, stress/high, bread, honey, milk, sugar/10, meat, garlic, yoghurt, conjunctivitis,  icecream,  next_sleep/ok  
 
So on the first example date, I had not slept well the previous night. I had one coffee and roughly 5 teaspoons of sugar over the whole day. I was feeling ok with myself. I had sausages and eggs (tagged as cholesterol_food) for breakfast and pasta with mushroom (red) sauce for lunch, but no dinner. I managed to sleep well that night. I did not have any signs of an overactive immune system. However, the next day I had conjunctivitis.
 
I then had to transform the entries into a suitable format, a .csv file, which could then be used by data mining software (such as R and WEKA) for analysis. To do that, I used a simple Java program that transforms all log entries to .csv format using the following rules:
 
 
1) Each line represents a day.
2) Entries are separated by commas (“,”).
3) If an entry does not contain a forward slash character (“/”), it is treated as a Boolean feature.
4) If an entry contains a forward slash, it is treated as a numerical or categorical feature.
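
The four rules above can be sketched roughly as follows. The source of the original Java program is not shown in this post, so everything here is an assumption made for illustration: the class and method names (LogToCsv, parseLine, toCsv), the decision to keep the first entry as a date column even though it contains slashes, and the choice to write 0 for an absent Boolean feature and NA for any other absent value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical converter; names and missing-value handling are illustrative only.
final class LogToCsv {

    // Parses one day's log line (rules 1 and 2) into column -> value pairs.
    // Names of slash-free entries are also recorded in booleanColumns.
    static Map<String, String> parseLine(String line, Set<String> booleanColumns) {
        Map<String, String> row = new LinkedHashMap<>();
        String[] entries = line.split(",");
        // Assumption: the first entry is the date, kept verbatim despite its slashes.
        row.put("date", entries[0].trim());
        for (int i = 1; i < entries.length; i++) {
            String entry = entries[i].trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            int slash = entry.indexOf('/');
            if (slash < 0) {
                // Rule 3: no slash means a Boolean feature that is present on this day.
                row.put(entry, "1");
                booleanColumns.add(entry);
            } else {
                // Rule 4: "name/value" is a numerical or categorical feature.
                row.put(entry.substring(0, slash), entry.substring(slash + 1));
            }
        }
        return row;
    }

    // Converts all log lines to CSV text: one header line, then one line per day.
    static String toCsv(List<String> logLines) {
        List<Map<String, String>> rows = new ArrayList<>();
        Set<String> header = new LinkedHashSet<>();
        Set<String> booleanColumns = new LinkedHashSet<>();
        for (String line : logLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            Map<String, String> row = parseLine(line, booleanColumns);
            rows.add(row);
            header.addAll(row.keySet());
        }
        StringBuilder csv = new StringBuilder(String.join(",", header)).append('\n');
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String column : header) {
                // Assumption: an absent Boolean feature is 0, any other absent value is NA.
                String missing = booleanColumns.contains(column) ? "0" : "NA";
                cells.add(row.getOrDefault(column, missing));
            }
            csv.append(String.join(",", cells)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // The two example days from the daily log.
        List<String> log = List.of(
            "10/11/12, slept/bad, vitamin_c/0, coffee/1, self/ok, stress/low, sausages,cholesterol_food, sugar/5, pasta, tomatoes, mushrooms,  next_sleep/ok ",
            "10/12/12, slept/ok, vitamin_c/500, coffee/2, self/ok, stress/high, bread, honey, milk, sugar/10, meat, garlic, yoghurt, conjunctivitis,  icecream,  next_sleep/ok  ");
        System.out.print(toCsv(log));
    }
}
```

The columns appear in the order in which features are first seen in the log, which keeps the sketch short; a real converter might prefer a fixed or sorted column order.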
 
So our two example dates are transformed like this (not all features are shown):
 
 
 
R was used to perform several pre-processing steps, such as coding a function called addfeature, which I use to derive new variables from existing ones:
 
data.df <- addfeature("fiber", c("beans", "stringbeans", "oats", "okra", "lentils"), data.df)
data.df <- addfeature("cholesterol_food", c("eggs", "mayo", "octopus", "squid"), data.df)
data.df <- addfeature("nuts", c("hazelnuts", "walnuts", "peanuts", "cashews", "almonds"), data.df)
data.df <- addfeature("immunity", c("itchyeyes", "lymphpain", "sinuspain", "conjunctivitis"), data.df)
  

So if on any day I had eggs, mayo(nnaise), octopus, squid or any combination of these foods, a single cholesterol_food entry replaces these entries.
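
The body of the R function addfeature is not shown in this post. Purely as an illustration of the replacement described above, here is a minimal Java sketch of the same idea applied to one day's row. The class name DerivedFeature, the method addFeature and the representation of a day as a map of 0/1 flags are all assumptions, not the original implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper mirroring the described behaviour of the R function addfeature.
final class DerivedFeature {

    // Sets row[name] to 1 if it was already 1 or if any source flag is 1,
    // and removes the source flags so that only the derived feature remains.
    static void addFeature(String name, List<String> sources, Map<String, Integer> row) {
        boolean present = row.getOrDefault(name, 0) == 1;
        for (String source : sources) {
            Integer value = row.remove(source);
            if (value != null && value == 1) {
                present = true;
            }
        }
        row.put(name, present ? 1 : 0);
    }

    public static void main(String[] args) {
        // One illustrative day: eggs and pasta were eaten, squid was not.
        Map<String, Integer> day = new LinkedHashMap<>();
        day.put("eggs", 1);
        day.put("squid", 0);
        day.put("pasta", 1);
        addFeature("cholesterol_food", List.of("eggs", "mayo", "octopus", "squid"), day);
        System.out.println(day);
    }
}
```

Keeping an existing value of the derived feature matters here because the log sometimes records the tag cholesterol_food directly, as in the first example day.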

With the log transformed to the format shown above, I was ready to analyze one year's worth of data (in this case IMMUNITY is the target) and to extract patterns and several hypotheses, for example that “there appears to be a connection between high stress and over-activity of my immune system.”
 
However, we must be aware of the dangers that might lead us to incorrect findings. For instance, we must take into account that conjunctivitis usually lasts more than one day. Some features, like Vitamin C intake, are also special in the sense that the representation shown above does not capture the compounding effect of vitamin intake. In other words, I might have to take an amount x of Vitamin C for n consecutive days to see any effect. Furthermore, this analysis does not take into account the sequence of events. Should we remove foods/ingredients that normally co-occur or not? How would that affect the results? The list of questions and considerations goes on (and when we finally have some results from the analysis, the first thing to do is to question them).
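
Two of these concerns can be turned into extra features. The following Java sketch is only one possible approach and was not necessarily part of the original analysis: rollingSum adds up the Vitamin C intake over the last n days, and episodeOnsets keeps only the first day of a multi-day symptom episode. The class name, method names and sample values are hypothetical.

```java
import java.util.Arrays;

// Hypothetical time-aware features; names and sample values are illustrative only.
final class TemporalFeatures {

    // For each day, sums the value of that day and of up to (window - 1) preceding days.
    static int[] rollingSum(int[] daily, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        int[] sums = new int[daily.length];
        int running = 0;
        for (int i = 0; i < daily.length; i++) {
            running += daily[i];
            if (i >= window) {
                running -= daily[i - window]; // drop the day that left the window
            }
            sums[i] = running;
        }
        return sums;
    }

    // Marks only the first day of each run of consecutive symptomatic days.
    static boolean[] episodeOnsets(boolean[] symptomatic) {
        boolean[] onsets = new boolean[symptomatic.length];
        for (int i = 0; i < symptomatic.length; i++) {
            boolean yesterday = i > 0 && symptomatic[i - 1];
            onsets[i] = symptomatic[i] && !yesterday;
        }
        return onsets;
    }

    public static void main(String[] args) {
        int[] vitaminC = {0, 500, 500, 0, 500}; // made-up daily doses
        System.out.println(Arrays.toString(rollingSum(vitaminC, 3)));
        boolean[] conjunctivitis = {false, true, true, true, false, true}; // made-up days
        System.out.println(Arrays.toString(episodeOnsets(conjunctivitis)));
    }
}
```
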
Using data science I was able to almost stop getting sinus and lymph pain and to wear my contact lenses again (I still get symptoms, but very rarely). Two foods appeared to be moderately correlated with signs of an overactive immune system, one of them being garlic. One could argue that I could have found that using a simple food diary; I doubt it, since things were not so evident. Once this information was found, these two foods were eliminated from my diet to see the outcome.
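
The post does not say which correlation measure was used. For two Boolean columns, such as a food and the IMMUNITY target, one common choice is the phi coefficient, which is the Pearson correlation computed on 0/1 values. The Java sketch below shows it under that assumption; the class name and the sample data are made up for illustration and are not results from the experiment.

```java
// Hypothetical example of measuring association between two Boolean features.
final class PhiCoefficient {

    // Phi coefficient of two equally long Boolean series; NaN if either series is constant.
    static double phi(boolean[] x, boolean[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("series must have the same length");
        }
        long n11 = 0, n10 = 0, n01 = 0, n00 = 0; // cells of the 2x2 contingency table
        for (int i = 0; i < x.length; i++) {
            if (x[i] && y[i]) {
                n11++;
            } else if (x[i]) {
                n10++;
            } else if (y[i]) {
                n01++;
            } else {
                n00++;
            }
        }
        double denominator = Math.sqrt(
            (double) (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00));
        if (denominator == 0) {
            return Double.NaN;
        }
        return (n11 * n00 - n10 * n01) / denominator;
    }

    public static void main(String[] args) {
        // Made-up data for eight days, not taken from the actual log.
        boolean[] garlic = {true, false, true, true, false, false, true, false};
        boolean[] immunity = {true, false, true, false, false, false, true, true};
        System.out.println(phi(garlic, immunity));
    }
}
```

A value near 0 means no linear association and values near 1 or -1 mean a strong one. As with any correlation, it says nothing about cause, which is why eliminating the foods and observing the outcome was the real test.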

The analysis also identified a particular vitamin that was able, in my opinion, to regulate my immune system response so that I could have no food restrictions. Several other patterns (or hypotheses) emerged that could be used for further evaluation by specialized personnel. Whatever I tried, I tried under the close supervision and with the consent of specialist doctors.

 
A logical next step is to imagine the potential knowledge and hypotheses that could be extracted by running the same experiment on a wider scale (for example by using Kaggle).

In the next post: more thoughts, results and warnings.