Edith on GT: A BI Solution for Advanced Data Mining

January 1, 2009

    About the Author: Edith Ohri heads a pioneering data-mining company in Israel dedicated to the application of GT – a new DM solution for unsupervised and complex data. Her background is in Industrial & Management Engineering (MSc). She started researching data mining in the early 1980s and has continued ever since. She created a new model (GT) that enables the analysis of larger and more complex data. In 2002 she began developing the GT software at SMU, Singapore. She is involved in several areas of implementation, such as BI, Quality Control, Bio-med and Research. She manages a DM forum with the Israel Engineering Association and a DM forum with the Data Warehouse site (Israel). She is a member of and active participant in a number of DM forums, gives presentations, and writes articles.

 

December 29, 2008

GT data mining of NYSE companies – example

This is an example of data mining with GT, based on freely available web data from http://www.ics.uci.edu.

The purpose is to demonstrate the ability to create a coherent explanation of complex, partial, incomplete, non-representative and unsupervised data. In this case the data is also restricted to a single point in time and excludes information regarding shares, which makes it particularly difficult to analyze.

Given: two sets of 1,000 records each, describing companies on the New York Stock Exchange in 2000 (just before the dot-com bubble burst). Each record has 22 attributes describing the company's field, its state of investments, assets, liabilities, expenses, R&D, sales, profits, dividends and other major elements from the Public Report Statement, except information regarding shares.

The method:
  1. Define clusters based on just half of the data, find their characteristics and drivers, and draw conclusions about the phenomena they may represent.

  2. Validate the results by projecting them onto the other half of the data. Once the stability of the conclusions is reaffirmed, the last part of the analysis can take place.

  3. Interpret the results. This is usually done in collaboration with the client; in this example it is shown only in outline, to give a sense of it.
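
The split-half procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: GT's own algorithm is not described in this article, so plain k-means stands in as the clusterer, and the two numeric features and the synthetic records are hypothetical, not the NYSE data. The sketch fits clusters on one half, projects them onto the other half, and compares the share of records that falls into each cluster.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: a stand-in clusterer, NOT the GT algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its group; keep the old one if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def cluster_shares(points, centroids):
    """Fraction of the points assigned to each centroid."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]

def make_records(n, seed):
    """Hypothetical two-feature records drawn from two synthetic groups."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        if i % 4 == 0:
            records.append([rng.gauss(8.0, 1.0), rng.gauss(1.0, 1.0)])
        else:
            records.append([rng.gauss(2.0, 1.0), rng.gauss(5.0, 1.0)])
    rng.shuffle(records)
    return records

half_a = make_records(1000, seed=1)   # step 1: the half used to define clusters
half_b = make_records(1000, seed=2)   # step 2: the half used for validation

centroids = kmeans(half_a, k=2)
shares_a = cluster_shares(half_a, centroids)
shares_b = cluster_shares(half_b, centroids)   # projection onto the other half

# A small shift in cluster shares suggests the structure is stable across halves.
max_shift = max(abs(a - b) for a, b in zip(shares_a, shares_b))
print("shares, half A:", shares_a)
print("shares, half B:", shares_b)
print("largest shift:", max_shift)
```

Comparing cluster shares is only one possible stability check; comparing the cluster characteristics (for example, per-cluster attribute means) across the two halves would be a natural extension.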

General observation

The pattern splits into two parts:

  1. The financially intense industries, such as Banking, Financial Services, Energy and Real Estate;

  2. The rest of the industries – Business Services, Transportation, Communication, Technology at large, Raw Materials, and Health Care. See the cluster map in Fig. 1.

Fig. 1 Cluster map: strong relations among record clusters are marked in Red Purple; no relation is marked in Light Green. The map shows a polarized pattern: the financial cluster (in the low top) and the rest. Next to the Financial pattern there is a small exceptional sub-group, titled in Red. Note that the Technology pattern is much more diverse than the others.

Conclusions and explanations

  1. False profitability – a warning sign

Some of the Technology companies “behave” like Financial companies rather than like their own industry. One explanation is that they channeled the large sums of money raised in a “heated” Stock Exchange into financial activities. In such a case, the reported high profits are not a sign of sound companies but a symptom of a dangerously inflated market. While the graphs show profitability and encourage investors to continue the practice, the phenomenon is racing toward the crisis later known as the fall of the DotCom.

  2. Investments in losing companies – high risk

There is a sub-group of Technology companies that have substantial losses yet manage to attract massive investments.

The sub-group's characteristics are very low levels of long-term liabilities and long-term assets, and a high level of preferred stock. In this sub-group an unexpected negative relation is found between a company's Total Assets and its Net Income. See Fig. 2.
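
To make the idea of a negative relation inside a sub-group concrete, here is a minimal sketch that computes the Pearson correlation between total assets and net income, first over a whole population and then inside a sub-group. All figures are hypothetical, invented for illustration; they are not taken from the NYSE data set. The sketch shows how a relation that is negative inside a sub-group can vanish in the aggregate, which is the situation Fig. 3 refers to.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical (total_assets, net_income) pairs, arbitrary units.
general = [(10, 1.0), (20, 2.1), (30, 2.9), (40, 4.2), (50, 5.0), (60, 6.1)]
subgroup = [(12, -0.5), (18, -1.1), (25, -1.4), (33, -2.2)]  # losses grow with assets

everyone = general + subgroup

r_all = pearson([a for a, _ in everyone], [i for _, i in everyone])
r_sub = pearson([a for a, _ in subgroup], [i for _, i in subgroup])

print("correlation, all companies:", round(r_all, 3))
print("correlation, sub-group only:", round(r_sub, 3))
```

In this toy data the overall correlation is positive while the sub-group correlation is negative, so the sub-group only becomes visible once it has been isolated.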

  3. Conglomerates with “Banks” traits – need to be looked into

At the margins of the Financial and Technology clusters there are a number of conglomerates, all of which show exceptional “behavior”. Although they are mainly industrial companies, their patterns resemble those of the Energy and Financial industries.

Remark: knowing the characteristics of the exceptional behavior enables the analyst to “comb” the entire company database and find, by a simple query, additional companies that might show a similar irregularity, for further study.
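
Such a query might look like the sketch below. It is an illustration only: the field names, the company records and the two thresholds (`max_long_term_ratio`, `min_preferred_ratio`) are hypothetical, chosen to mirror the sub-group traits described above (losses, very low long-term liabilities and assets, high preferred stock).

```python
# Hypothetical company records; values are in arbitrary units.
companies = [
    {"name": "Alpha Tech", "net_income": -120.0, "total_assets": 400.0,
     "long_term_liabilities": 5.0, "long_term_assets": 8.0, "preferred_stock": 300.0},
    {"name": "Beta Bank", "net_income": 90.0, "total_assets": 2000.0,
     "long_term_liabilities": 900.0, "long_term_assets": 700.0, "preferred_stock": 50.0},
    {"name": "Gamma Soft", "net_income": -30.0, "total_assets": 500.0,
     "long_term_liabilities": 200.0, "long_term_assets": 150.0, "preferred_stock": 10.0},
    {"name": "Delta Net", "net_income": -80.0, "total_assets": 250.0,
     "long_term_liabilities": 2.0, "long_term_assets": 10.0, "preferred_stock": 100.0},
]

def matches_exception_profile(company, max_long_term_ratio=0.05, min_preferred_ratio=0.3):
    """True if a company shows the traits of the exceptional sub-group.

    Both thresholds are illustrative assumptions, expressed as a share of total assets.
    """
    total = company["total_assets"]
    if total <= 0:
        return False
    return (
        company["net_income"] < 0
        and company["long_term_liabilities"] / total <= max_long_term_ratio
        and company["long_term_assets"] / total <= max_long_term_ratio
        and company["preferred_stock"] / total >= min_preferred_ratio
    )

flagged = [c["name"] for c in companies if matches_exception_profile(c)]
print(flagged)
```

In practice the thresholds would come from the cluster characteristics found in the analysis, and the same filter could be expressed as a query against the company database.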

Fig. 2 Technology companies: special behavior. In the red part there is an irrational phenomenon, where losing companies still manage to attract investments

 

 


Fig. 3 In a conventional statistical view, the special Technology pattern does not show at all and is indistinguishable from the general pattern of behavior.

 

 

GT Second edition note

Running the data in the upgraded version of GT sheds more light on the 2000 phenomenon and reveals, among other things, an interesting exceptional behavior of a few financial organizations, which apparently found a different way to create money. Their profit as a share of sales appears much larger than that of other companies. See the chart below. The organizations are HSBC Holdings PLC, Chase Manhattan Corp., and Societe Generale Group. This finding from 2000 may reflect the kind of practices that led, eight years later, to the current “credit crunch”.

Fig. 4 An exceptionally high ratio of profit to sales is observed in one sub-group of financial organizations – HSBC Holdings, Chase Manhattan, and Societe Generale.

Final words

GT produces a fresh view of complex unsupervised data. It can track down even minute phenomena, giving financial managers and analysts early signals about things to come: their patterns, their spread, their drivers and their key indicators. This example belongs to a series of applications in which GT has consistently turned ordinary data into new revelations about “what makes it tick”.

 

© Edith Ohri Procedureware Ltd. POB 16558 Tel-Aviv 61165

Tel: 972-3-5232164 edit@actcom.co.il

A copy of the article can be downloaded directly from  here -GT Article