Back to the Future: Moving forward by revisiting the past

March 1, 2012

As I am about to embark on the bi-annual Teradata User Group tour (and I hope to see some of you there!) to discuss our latest 14.0 release, I wanted to get on my soapbox about a topic that frequently arises: the lament about not having a capability when, in fact, the capability already exists (or is in the “next release”).


I am constantly asked how to resolve certain performance issues or query challenges.  I typically find that the code in question is old and, although it has been meeting the need, is starting to strain a bit or to inhibit new workloads.  Let’s look at an example, with apologies upfront for the Teradata-specific details, though I know that all databases (and all application environments) have similar issues.

There are some incompatibilities between certain Teradata indexes and the load utilities, most famously a unique secondary index on a table you want to load with the MultiLoad utility.  The work-around was to forgo the index, drop and recreate it as needed, or use some other loading technique that lacked the limitation but might be less effective.  As this was a recognized limitation, it was on the list to be addressed.  Teradata release 12.0 introduced the MERGE INTO statement, and now you can easily keep your indexes on a table while updating its rows using the new process.
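As a rough sketch of what that looks like, the old delete-index/load/rebuild-index dance can be replaced by a single upsert statement; the table and column names below are hypothetical, and on Teradata the ON clause should match on the target table’s primary index:

```sql
-- Hypothetical upsert of staged rows into a target table.
-- The target's unique secondary index can stay in place; no drop/recreate step.
MERGE INTO sales_fact AS tgt
USING staged_sales AS src
   ON tgt.txn_id = src.txn_id          -- assumes txn_id is the primary index
WHEN MATCHED THEN UPDATE
  SET amount     = src.amount,
      updated_ts = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN INSERT
  (txn_id, amount, updated_ts)
  VALUES (src.txn_id, src.amount, CURRENT_TIMESTAMP);
```

The staging table would typically be populated by a fast bulk load, with the MERGE applying the changes to the indexed target.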

So what is the problem?  You have to go back and re-optimize your load strategies, and of course there is no time or money to accomplish that task.  This is a big mistake, and a common reason why data warehouses do not move forward as quickly or as successfully as they otherwise could.  Keep in mind that this tuning and optimization applies not just to the database but to the entire stack, from the network all the way up to the applications (and even training!).

Part of the IT budget needs to be reserved for the introduction and retro-fitting of new capabilities into your existing processes.  All vendors are constantly trying to improve their products, oftentimes at the request of their customers.  If you are not willing to move along with the product, then you end up in crisis management, i.e. you only manage a situation when it is a crisis!  You can certainly try to overcome the inefficiencies with more hardware, but even that has limits (and will be much more expensive than optimizing your current processes).

The simple analogy is to your car or your house.  These are large purchases that get used every day.  You would not forgo maintenance on your car or upgrades to your house that make them more efficient and effective.  You do have to plan for major implementations, but at least get them on the roadmap and know they need to be addressed, hopefully sooner rather than later.

What are some of the most common areas that need attention?  The following is by no means an exhaustive list but should give you a good place to start to identify quick wins in capacity and performance.

Network:

I’ll start with the easiest component and one that often shows up as an issue.  This affects just about every aspect of the environment, as it is the communication layer.  If your network is out of date or under capacity, then everything runs poorly and no amount of optimization in the other areas will be effective.

Load strategies:

This is a pretty easy one: if you are not already loading data intra-day, then you need to revisit that decision.  Intra-day loading shortens the lag between when data is available and when action can be taken.  Years ago overnight was OK, but today companies want to be able to respond to challenges within hours or minutes of the event occurring in the business.

Also, if you are not already allowing user queries against tables that are being loaded, then you need to revisit that decision as well.  Obviously there are transactional integrity and timing needs across tables that need to be considered, but all of that is easily controlled by using temporal tables or other forms of views and consistency processes.
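One common consistency pattern (sketched here with hypothetical database, table, and column names) is to have users query a view that filters on a batch-control table, so queries only ever see batches that have finished loading while new batches load underneath:

```sql
-- Hypothetical consistency view: users query rpt.sales, never the base table.
-- Only rows whose batch is flagged COMPLETE in the control table are visible.
REPLACE VIEW rpt.sales AS
SELECT s.*
FROM   dw.sales        AS s
JOIN   etl.batch_control AS b
  ON   s.batch_id = b.batch_id
WHERE  b.status = 'COMPLETE';
```

Flipping a batch’s status to COMPLETE in a single small update then makes the whole batch visible at once, without blocking user queries during the load itself.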

In addition to the frequency and query accessibility during the loads, you should take the time to see what optimizations have been made to the loading utilities, or to their integration with your loading tool of choice, since the initial implementation.

For example, Teradata has the Parallel Transporter environment, which provides greater coding flexibility and control than the standalone utilities.  We have also included much better workload management control via the Viewpoint portal and removed limits that once created the need for creative work-arounds.

BI tools and query performance:

A lot has changed over the past 5 to 10 years in the field of query tools and analytics.  There has also been incredible advancement in in-database processing, eliminating the need to constantly extract and process data off your data warehouse platform.

Changing BI tools is not always an attractive option, but periodically you need to revisit whether the tools are giving you value for the money.  Are you on the latest and greatest releases?  Have you optimized the tools to take advantage of push-down optimizations where possible?  Have you kept the user community abreast of, and trained on, new best practices and capabilities that enable actionable analytics instead of just “report and regret” applications?

Now is a good time to look back at the real response time of your front-end tools.  As you start to load data more frequently, it may be better to go at the detail data directly rather than relying on daily summaries, cube extracts, or heavy indexing.  The point is that it is better to get a one-minute answer from data that is 15 minutes old than a two-second answer from data that is a day old.

Two major areas worth investigating are “service-oriented” and “self-service” capabilities.  Using SOA processes, one can drive the architecture to be more flexible and adaptive to new technologies.  Creating a separation between the “data layer” and the “presentation layer” also allows for an increased rate of change in your business.  The SOA environment also allows for greater reusability of applications and lowers the cost of future development.  All of this should be looked at if you are not already employing such an environment.

In a similar vein, “self-service” and data visualization tools are now table stakes for a well-developed data warehouse.  Application portlets or tools such as Tableau allow users to customize analytics and open new avenues of investigation.  Take a look at your current state of usability and take the time to update your roadmaps and development plans.

New data types:

The breadth of analytics has exploded, and correspondingly the arena of data that is now “analytically interesting” has grown as well.  Companies have been using detailed transactional data for many years.  Today they are rising above that space and moving from transactions to interactions.  This gives a better understanding of not only what is happening but where and why it is happening as well.

Companies have integrated geospatial, temporal, detailed web log, and even social media data into their data warehouses.  This data can be analyzed for its own insights or combined with the traditional corporate data that has been so useful for so many years.

By incorporating the new data and leveraging the new BI tools, you can increase your usage of in-database analytics for a deeper understanding of your customers or corporate processes.

The inclusion of these new data types is also a good occasion to take a look at your data model.  Is it agnostic to your applications for maximum flexibility?  Are you employing the right level of views and indexing rather than creating summaries, denormalizations, and extracts?

The bottom line:

Data warehousing and business intelligence is all about understanding the business so you can change the business.  That constant state of business change then reverberates back to IT as new demands and requirements. 

Technology partners are constantly enhancing their products to help you meet those ever-changing needs.  It is important to remember that some simple and proactive maintenance of your existing environment can provide huge pay-offs.  Complaining about a flat tire when the new tire is in the trunk does you no good.  If you are not leveraging the latest capabilities of your technology now, then what are you waiting for?

Let’s see how this goes.  The theme was more on the technical side, and if responses indicate interest, I can explore the business change drivers in an upcoming blog.