The Art of Pickling Data

August 24, 2009

I have never pickled before, and probably won’t, but I do enjoy eating them. The best-tasting pickles one can imagine were pulled out of our 69-year-old backpacking mountaineer/pickling savant companion’s backpack last week. Yes, he brought a jar of pickles into the mountains, which we all enjoyed and devoured. So to the man known as ‘Uncle Dave’, I salute you, and here’s a little analogy of pickling data. Besides, who doesn’t like a good, crunchy pickle?

— 1) In pickling, we need to sterilize the equipment. Otherwise you may get contaminants that can ruin your pickles.

In data warehousing, we need a computer or server to store the data electronically. You want to start with a clean server to maximize the amount of data you can store and to ensure no ‘cross-contamination from old tables’. I haven’t heard of this happening other than in mainframe environments, where the back-end data from ‘shadow tables’ can come back and repopulate the ‘main’ front-end tables. If the bad data is not removed from both the back-end and front-end tables simultaneously, contamination will happen.
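The fix for that shadow-table problem is to clean both copies in one atomic step. Here is a minimal sketch of the idea using SQLite; the table names, columns, and rows are all hypothetical, invented just for illustration:

```python
import sqlite3

# Hypothetical schema: a front-end table and its back-end 'shadow' copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE customers_shadow (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann', 'ok'), (2, 'Bob', 'bad');
    INSERT INTO customers_shadow VALUES (1, 'Ann', 'ok'), (2, 'Bob', 'bad');
""")

# Delete the bad rows from BOTH tables inside a single transaction,
# so the shadow copy can never repopulate the front-end table later.
with conn:
    conn.execute("DELETE FROM customers WHERE status = 'bad'")
    conn.execute("DELETE FROM customers_shadow WHERE status = 'bad'")

shadow_left = conn.execute("SELECT COUNT(*) FROM customers_shadow").fetchone()[0]
front_left = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

The point of the `with conn:` block is that either both deletes commit together or neither does, so the two tables cannot drift apart.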


— 2) Prepare the brine with salt, vinegar, garlic and other spices/ingredients to create your pickling solution, and bring it to a boil.

Prepare your scripts, data loading jobs, data models, tables, attributes, your data quality routines and more. I included data quality routines because you want to study the trends to determine when they break from the norm. Data quality is the spice that will make it all better.
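A data quality routine like that can be very small. Here is one hedged sketch: flag a metric (say, daily row counts) when it drifts too many standard deviations from its history. The function name, threshold, and numbers are my own invention, not a standard recipe:

```python
from statistics import mean, stdev

def breaks_from_norm(history, latest, threshold=3.0):
    """Flag a value that sits more than `threshold` standard
    deviations away from the historical mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > threshold * sigma

# Daily row counts from past loads (hypothetical numbers).
daily_rows = [10120, 10043, 9987, 10211, 10098]

breaks_from_norm(daily_rows, 10150)   # False: within the usual range
breaks_from_norm(daily_rows, 2500)    # True: this load looks broken
```

Run it after every load and you will notice a broken feed long before a user does.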

— 3) Boil vegetables place in jar with pickling solution, and seal.

Prepare your files and run the jobs to load the data into your repository.
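A load job can be sketched in a few lines. This is a toy version under assumed names: a `sales` table and a CSV extract I made up, with a simple quality gate that rejects malformed rows instead of letting them into the jar:

```python
import csv
import io
import sqlite3

def load_job(conn, csv_text):
    """Hypothetical load job: stage rows from a CSV extract into
    the warehouse table, skipping rows that fail a basic check."""
    loaded = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["id"].isdigit():       # simple data quality gate
            continue                      # reject the contaminated row
        conn.execute("INSERT INTO sales (id, amount) VALUES (?, ?)",
                     (int(row["id"]), float(row["amount"])))
        loaded += 1
    return loaded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
extract = "id,amount\n1,19.95\n2,5.00\nx,9.99\n"
count = load_job(conn, extract)   # 2 good rows loaded, 1 rejected
```

In a real shop this would be an ETL tool or scheduled job rather than a script, but the shape is the same: read the extract, check it, seal it in the repository.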

— 4) After a few weeks, enjoy the pickles of your labour, the crunchier the better.

Unlike pickling, you can begin to enjoy the crunchy bits of your data, and what they are telling you, immediately after the data is stored. It might not be tasty, but it may very well be interesting. After all, the interpretation of data is information, and information is power.