Comments by Dean Abbott

On 10 Amazing Data Analytics Platforms Everyone Should Know About

Part of the Amazon suite is Redshift, a Postgres-based database that is an offshoot of ParAccel. Very fast, very efficient, very scalable. I've enjoyed using it very much.

February 26, 2015

On It's a Bird! It's a Plane! No, It's Just a Data Scientist.

As usual, right on the mark, and thanks for retweeting the link to this. I especially echo Keith's remark that he wasn't half the modeler 5 years ago that he is today; I feel the same way. Of course, Keith, you, and I have a built-in advantage: we often teach the subject, so we have a built-in excuse to stay current, and we all have the psychological makeup to want to learn more.

I'll never forget talking to a colleague in 1988 about how he stays so current, to which he replied that you have to read and read and read. All of us in the field will never learn solely by working on projects. We have to constantly "troll" for ideas that will augment how we approach analytics, new algorithms that are out there, and new software that will help us some day. I think we all have our short lists of skills we need to acquire in the next year or two. 

Thanks again for the post!

June 27, 2013

On Will Dwinnell: 6 Reasons You Hired the Wrong Data Miner

Please set me up. I'm not sure how to do it so I may lean on you a bit to help me through the process. Thanks!


February 8, 2013

On Will Dwinnell: 6 Reasons You Hired the Wrong Data Miner

Correct, Will. I haven't figured out how to separate the posts here on SmartDataCollective, and the author at the bottom of the post isn't picked up here.

December 23, 2012

On To Sample Or Not To Sample… Does It Even Matter?

I was sure we were talking about the same thing, but just as you write, I was trying to clarify. 

To me, one interesting aspect of all of this is that it feels like we've walked into a time warp back to the mid-90s. I remember when "data mining" conferences were just starting up, and one, I think called the "data mining summit", was co-located with a VLDB conference. Part of the buzz there (maybe 1995?) was about sampling, with some arguing that we shouldn't sample these "big" (1M record) datasets (i.e., tables) because we may lose key patterns from the training data. As Jerry Friedman says (quoting, I think, Brad Efron), "those who ignore statistics are condemned to reinvent it".

April 9, 2012

On To Sample Or Not To Sample… Does It Even Matter?

Nice post, Bill!

Setting aside your caveats about when sampling is not necessary (e.g., for queries of top customers), there are two primary reasons for sampling:

1) computational efficiency (larger data takes too long to process)

2) statistical consistency (train/test/validate subsets to avoid overfitting)

#1 is optional. I fully agree with you that typically one doesn't need all the data to build effective models. Eventually there is a point of diminishing returns with bigger data, because the same patterns are being reinforced again and again and nothing new will come into the models.

#2 is not optional. Without it, we can never be sure our predictive models are behaving consistently, and therefore we have no assurance that the models will behave well on new data.  
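
To make reason #2 concrete, here is a minimal sketch of a random train/test/validate partition using only the Python standard library. The function name, the 60/20/20 proportions, and the fixed seed are illustrative assumptions, not anything prescribed in the post; real projects may need stratified or time-based splits instead.

```python
import random


def split_train_test_validate(records, train=0.6, test=0.2, seed=42):
    """Shuffle records and partition them into three disjoint subsets.

    The proportions and the seed are illustrative assumptions; whatever
    is left after the train and test portions becomes the validate subset.
    """
    if not (0 < train < 1 and 0 < test < 1 and train + test < 1):
        raise ValueError("train and test must be positive and sum to less than 1")

    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split

    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)

    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    validate_set = shuffled[n_train + n_test:]
    return train_set, test_set, validate_set


# Hypothetical data: 1,000 record IDs standing in for rows of a table.
rows = list(range(1000))
train_set, test_set, validate_set = split_train_test_validate(rows)
```

A model would be fit on `train_set` only, with the other two subsets held back to check that it behaves consistently on data it has not seen.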

April 9, 2012

On The Big Data Debate – Scientist vs. Analyst


Thanks for posting the summary of the GartnerChat. This is an extremely important topic that I have spoken on but apparently haven't blogged on yet (surprisingly). I'll just comment on one item here for now.

"While the jury is still out on whether you need highly trained scientists to empower your data no matter how big it is"...

I'll state here that the jury is back in, and the answer is this: "no, you don't need highly trained scientists" to get value from your data (which is what I assumed was meant by "empower your data"). I've trained more than a thousand analysts and have heard back from and continue to meet with dozens of them--they can do it with only my minimal training. I've worked with several organizations helping completely green in-house analysts move from having no idea how to gain value from their data to being productive analysts saving $$ and improving ROI.

This doesn't mean there isn't a need and role for highly trained scientists to tackle the hardest of problems. But this is just like any other field: there are orders of magnitude more relatively easy problems to solve than very hard problems to solve. With no course training and by only reading a manual, I was able to change alternators and water pumps in my car. That's ROI. But when the transmission failed, there's no way I was going to tackle that, so I needed a highly trained professional. 

By the way, the bios you have for the top 5 data scientists are fantastic: the diversity of backgrounds is interesting. I think my favorite, though, is DJ Patil, the "math dunce". See my SmartDataCollective post Target, Pregnancy, and Predictive Analytics - Part I for a description of why predictive analytics is not math. :)


February 23, 2012

On Target, Pregnancy, and Predictive Analytics - Part I

We're in complete agreement, especially about who does data mining and how they do it. You articulated it so well!

February 22, 2012

On What Is a Data Scientist (and What Isn't)?

I agree that some kind of certification program, one that is rigorous though not theoretical, is necessary. I've batted this around with some peers in the past and will continue to do so. I think you are right that this isn't the role of the University; Universities will focus on the theory and mathematics of predictive analytics, and rightfully so (the Type I data scientists you define above). Industry needs to spearhead a certification program based on what is necessary in industry to demonstrate competence and excellence in predictive analytics and "data science".


February 19, 2012

On Doing Data Mining Out of Order


Great to hear from you! SEMMA is an excellent methodology as well, and overlaps significantly with CRISP-DM. SEMMA, as you know, is more geared for the analytics part of data mining, whereas CRISP-DM is more project oriented; CRISP-DM starts with Business Understanding where SEMMA starts with the data. 

I will definitely read through the link you provided.




January 24, 2011

On The three legged stool - business, analytics, IT

Thanks for pushing this collaboration, James. I've also been arguing for 10 years that a collaboration between business, IT, and analytics is a necessary condition for success (even invoking that 3-legged stool image!), though you describe it far more eloquently than I. I've seen far too many projects fail when one of these is missing, and in particular when the data miners don't have clear guidance and buy-in from the decision-makers.

October 25, 2010

On Analytics and the myth of the aha moment

You know what they say, "the key to happiness is low expectations". Maybe I suffer from a low threshold for the "aha" moments, but I find that on most projects where I build predictive models there is an "aha" moment. I don't mean a revolutionary idea, but an unexpected very good result that was previously unknown. Sometimes this happens early in the project when we are looking at the data, probably the first time anyone had looked closely at the data. 

On a project a few years ago with a public sector customer, our "aha" moment was realizing that the approach we were taking to risk modeling would never work, which led to redefining risk and a second 'go' at it this year. We are seeing patterns in the data now that are so striking that the domain experts we have brought in are very excited about what we are finding.

I don't dispute that there is almost never one revolutionary, game-changing, disruptive "aha" moment from predictive analytics, but if we don't see something that makes us go "ooohhhhh, so that's why ...", then our models are going to be only marginally better.

How's that for provocative back! :)


October 8, 2010