While I was writing the last post, I wondered how long it would take my followers to notice the mistakes I had introduced into the experiments.
Let’s start the treasure hunt!
1. Don’t always trust your data: often they are not homogeneous.
In the post I related the quakes recorded between ~1800 and 1999 to the corresponding sunspot distribution.
A good data miner must always check the dataset! You should always ask yourself whether the data have been produced in a consistent way.
Consider our example. The right question to ask before any further analysis is: “Has quake magnitude been measured with the same kind of technology over time?”
I would assume the answer is clearly no, but how can we check whether our data have been produced in different ways over time?
In this case I guessed that in the past the technology wasn’t accurate enough to measure feeble quakes, so I grouped the quakes by year and took the smallest magnitude for each year. As you can see, it is crystal clear that the data collected before 1965 were recorded differently from the data of the following period.
![]()
The picture highlights that only major quakes (with magnitude > 6.5) were registered before 1965. This is the reason for the apparent increase in the number of quakes!
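The check described above (group the quakes by year, then look at the smallest magnitude in each year) can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the experiment: the function names `min_magnitude_by_year` and `first_year_below` are hypothetical, and the records are toy values invented for the example, not the real catalog.

```python
def min_magnitude_by_year(records):
    """Return {year: smallest magnitude recorded in that year}."""
    smallest = {}
    for year, magnitude in records:
        if year not in smallest or magnitude < smallest[year]:
            smallest[year] = magnitude
    # Sort by year so the result reads as a time series.
    return dict(sorted(smallest.items()))


def first_year_below(min_by_year, threshold):
    """Return the first year whose smallest magnitude is below threshold, or None."""
    for year in sorted(min_by_year):
        if min_by_year[year] < threshold:
            return year
    return None


# Toy (year, magnitude) records, invented for illustration only.
toy_records = [
    (1950, 7.1), (1950, 6.8),
    (1960, 6.9),
    (1970, 4.2), (1970, 6.6),
    (1980, 3.5),
]

floor = min_magnitude_by_year(toy_records)
print(floor)                          # yearly detection floor
print(first_year_below(floor, 6.5))   # first year with a quake below 6.5
```

If the yearly minimum stays high for a long stretch and then drops suddenly, the records from the two periods were probably not collected in the same way, and comparing raw counts across them is misleading.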
… In the previous post I left a clue in the caption of the “quakes distribution” graph 🙂
![]()
The size of the bubble represents the magnitude of the quakes
![]()
The size of the bubble represents the number of quakes
![]()
Magnitude forecasting (on the left, the training set; on the right, the behavior of the forecasting model on the test set). The mean error is around +/-1.5 degrees.
![]()
Moving median filtering applied to the magnitude regressor.