Edith on GT: A BI Solution for Advanced Data Mining
About the Author: Edith Ohri heads a pioneering data-mining company in Israel dedicated to the application of GT, a new DM solution for unsupervised and complex data. Her background is in Industrial & Management Engineering (MSc). She started researching data mining in the early 1980s and has continued ever since. She created a new model (GT) that enables the analysis of larger and more complex data. In 2002 she began developing the GT software at SMU Singapore. She is involved in several areas of implementation, such as BI, Quality Control, Bio-med and Research. She manages a DM forum with the Israel Engineering Association and a DM forum with the Data Warehouse site (Israel). She is a member of and active participant in a number of DM forums, gives presentations, and writes articles.
December 29, 2008
GT data mining of NYSE companies – example
This is an example of data mining with GT, based on data freely available on the web from http://www.ics.uci.edu.
The purpose is to demonstrate the ability to create a coherent explanation of complex, partial, incomplete, non-representative and unsupervised data. In this case the data is also restricted to a single point in time and excludes information regarding shares, and is therefore particularly difficult for analytics.
Given: two sets of 1000 records each about companies on the New York Stock Exchange in the year 2000 (just before the dotcom bubble burst). The records include 22 attributes describing the company's field, its state of investments, assets, liabilities, expenses, R&D, sales, profits, dividends and other major elements from the Public Report Statement, except information regarding shares.
Define clusters based on just half of the data, find their characteristics and drivers, and draw conclusions about the phenomena they may represent.
Validate the results by projecting them onto the other half of the data. Once the stability of the conclusions is re-affirmed, the last part of the analysis, interpretation, takes place.
Interpretation is usually done in collaboration with the client; in this example it is shown only in outline, to give a sense of it.
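The split-half procedure described above can be sketched roughly as follows. The article does not describe how GT works internally, so this sketch substitutes plain k-means as a stand-in clustering step. The function names, the two-attribute toy records and the choice of k = 2 are all assumptions made for illustration and are not taken from the NYSE data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given record."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def mean_point(points):
    """Attribute-wise mean of a non-empty list of records."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means, used here only as a stand-in for GT (assumption)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        new = [mean_point(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids

def cluster_shares(points, centroids):
    """Fraction of records that fall into each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]

def make_toy_records(n, seed):
    """Invented toy data: two well-separated groups of records."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        centre = (0.0, 0.0) if i % 2 == 0 else (10.0, 10.0)
        records.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
    return records

half_a = make_toy_records(200, seed=1)   # half used to define the clusters
half_b = make_toy_records(200, seed=2)   # held-out half used for validation

centroids = kmeans(half_a, k=2)
shares_a = cluster_shares(half_a, centroids)
shares_b = cluster_shares(half_b, centroids)  # "projection" onto the other half

# A small gap suggests the cluster structure is stable across the two halves.
stability_gap = max(abs(a - b) for a, b in zip(shares_a, shares_b))
print(shares_a, shares_b, stability_gap)
```

Comparing cluster shares is only one simple stability check; a fuller validation would also compare the characteristics of each cluster across the two halves.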
The pattern splits into two parts:
The financially intense industries, such as Banking, Financial Services, Energy and Real Estate;
The rest of the industries: Business Services, Transportation, Communication, Technology at large, Raw Materials, and Health Care. See the cluster map in Fig. 1.
Fig. 1 Cluster map: strong relations among record clusters are marked in Red-Purple, and the absence of relations is marked in Light Green. The map shows a polarized pattern: the financial industries (in the low top) and the rest. Next to the Financial pattern there is a small exceptional sub-group, labeled in Red. Note that the Technology pattern is much more diverse than the others.
Conclusions and explanations
False profitability – a warning sign
Some of the Technology companies “behave” like Financial companies rather than like their own industry. One explanation is that they channeled the large sums of money raised in a “heated” Stock Exchange into financial activities. In such a case, the reported high profits are not a sign of sound companies but rather a symptom of a dangerously inflated market. While the graphs present profitability and encourage investors to continue in that practice, the phenomenon is racing toward a crisis, later known as the fall of the DotCom.
Investments in losing companies – high risk
There is a sub-group of Technology companies that have substantial losses yet manage to attract massive investments.
The sub-group's characteristics are very low levels of long-term liabilities and long-term assets, and a high level of preferred stock. In this sub-group an unlikely negative relation is found between a company's Total Assets and its Net Income. See Fig. 2.
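To illustrate what a negative relation inside one sub-group looks like numerically, here is a minimal sketch that computes a Pearson correlation per group. The figures, the group labels and the field names `total_assets` and `net_income` are invented for illustration; they are not values from the article's data set, and Pearson correlation is an assumption about how such a relation might be measured, not a description of GT.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Invented toy figures (hypothetical, for illustration only).
companies = [
    {"group": "general", "total_assets": 100, "net_income": 8},
    {"group": "general", "total_assets": 200, "net_income": 15},
    {"group": "general", "total_assets": 300, "net_income": 26},
    {"group": "general", "total_assets": 400, "net_income": 35},
    {"group": "special", "total_assets": 50, "net_income": -5},
    {"group": "special", "total_assets": 80, "net_income": -9},
    {"group": "special", "total_assets": 120, "net_income": -14},
    {"group": "special", "total_assets": 150, "net_income": -20},
]

def correlation_for(rows):
    """Correlation between Total Assets and Net Income for the given rows."""
    return pearson([r["total_assets"] for r in rows],
                   [r["net_income"] for r in rows])

r_general = correlation_for([c for c in companies if c["group"] == "general"])
r_special = correlation_for([c for c in companies if c["group"] == "special"])
r_pooled = correlation_for(companies)  # all companies taken together

print(r_general, r_special, r_pooled)
```

In this toy data the special sub-group has a negative correlation while the pooled figure stays positive, which shows how a sub-group's behavior can be masked when all companies are analyzed together.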
Conglomerates with “Banks” traits – need to be looked into
At the margins of the Financial and Technology clusters there are a number of conglomerates, all of which show exceptional “behavior”. Although they are mainly industrial companies, their patterns resemble those of the Energy and Financial industries.
Remark: knowing the characteristics of the exceptional behavior enables the analyst to “comb” the entire company database and find, by a simple query, additional companies that might demonstrate a similar irregularity, for further study.
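The kind of simple query mentioned in the remark might look like the sketch below. It uses the sub-group characteristics named earlier in the article (losses, very low long-term liabilities and long-term assets, a high level of preferred stock). The field names, the threshold values and the three toy companies are hypothetical; the article does not give actual cut-off values.

```python
def matches_exception_profile(company,
                              max_lt_liab_ratio=0.05,
                              max_lt_asset_ratio=0.10,
                              min_preferred_ratio=0.30):
    """Return True if a company fits the (assumed) exception profile.

    All thresholds are illustrative placeholders, expressed as
    fractions of total assets.
    """
    total = company["total_assets"]
    if total <= 0:
        return False
    return (
        company["net_income"] < 0
        and company["long_term_liabilities"] / total <= max_lt_liab_ratio
        and company["long_term_assets"] / total <= max_lt_asset_ratio
        and company["preferred_stock"] / total >= min_preferred_ratio
    )

# Invented toy database (names and figures are not from the article).
database = [
    {"name": "Alpha", "total_assets": 1000, "net_income": 50,
     "long_term_liabilities": 400, "long_term_assets": 600, "preferred_stock": 10},
    {"name": "Beta", "total_assets": 200, "net_income": -30,
     "long_term_liabilities": 5, "long_term_assets": 15, "preferred_stock": 80},
    {"name": "Gamma", "total_assets": 500, "net_income": -20,
     "long_term_liabilities": 200, "long_term_assets": 250, "preferred_stock": 0},
]

flagged = [c["name"] for c in database if matches_exception_profile(c)]
print(flagged)
```

The point of such a query is only to produce a shortlist for further study; whether a flagged company really shares the irregularity still has to be checked by the analyst.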
Fig. 2 Technology companies: special behavior. In the red part there is an irrational phenomenon, where losing companies still manage to attract investments
Fig. 3 In conventional statistics, the special Technology pattern does not show at all, and is indistinguishable from the general pattern of behavior
GT Second edition note
Running the data in the upgraded version of GT sheds more light on the 2000 phenomenon and reveals, among other things, an interesting exceptional behavior of a few financial organizations, which apparently found a different way to create money… Their profits appear to be much larger, relative to sales, than those of other companies. See the chart below. The organizations are HSBC Holdings PLC, Chase Manhattan Corp., and Societe Generale Group. This finding from 2000 may reflect the kind of practices that led, eight years later, to the current “credit crunch”.
Fig. 4 An exceptionally high ratio of profits to sales is observed in one sub-group of financial organizations: HSBC Holdings, Chase Manhattan, and Societe Generale
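As a rough illustration of how an unusually high profit-to-sales ratio could be flagged, the sketch below uses a median-based rule. This is not GT's method, which the article does not describe; the rule, the `factor` parameter, the organization labels and the figures are all assumptions made for the example.

```python
from statistics import median

def flag_high_ratio(records, factor=3.0):
    """Flag names whose profit/sales ratio is far above the group median.

    records: iterable of (name, profit, sales) tuples.
    A ratio is flagged when it exceeds the median by more than
    `factor` times the median absolute deviation (MAD).
    Note: if the MAD is zero, every ratio above the median is flagged.
    """
    ratios = {name: profit / sales for name, profit, sales in records}
    med = median(ratios.values())
    mad = median(abs(r - med) for r in ratios.values())
    return sorted(name for name, r in ratios.items() if r - med > factor * mad)

# Invented toy figures (hypothetical organizations, not real data).
toy_records = [
    ("Org A", 10, 100),
    ("Org B", 12, 100),
    ("Org C", 9, 100),
    ("Org D", 11, 100),
    ("Org E", 40, 100),
]

flagged_orgs = flag_high_ratio(toy_records)
print(flagged_orgs)
```

A median-based rule is used here because a single extreme value would distort a mean-based threshold; other outlier rules would serve equally well for this illustration.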
GT produces a fresh view of complex unsupervised data. It can track down even minute phenomena, giving financial managers and analysts early signals about things to come: their patterns, their spread, their drivers and their key indicators. The study in this example belongs to a series of applications in which GT has consistently turned ordinary data into new revelations about “what makes it tick”.
© Edith Ohri Procedureware Ltd. POB 16558 Tel-Aviv 61165
Tel: 972-3-5232164 firstname.lastname@example.org
A copy of the article can be downloaded directly from here -GT Article