Data models should work across regions, nature and data sets. This is what ‘data universality’ means. If natural phenomena exhibit universal patterns such as geometry, outliers, the 80-20 principle, mean reversion and fat tails, then the data these phenomena generate should express similar behaviour; and they do. But we still treat data sets like religions: stock market data are for financial analysts, sub-atomic data are for physicists and social network data are for marketers.
We see similar trends in all data sets, but we don’t mix and match sub-atomic, social and stock market data. Why? Stock market data can’t be reconciled with sub-atomic data because their respective elements vibrate at different frequencies. Social data, however, have a workable frequency with stock markets. This is why Twitter forecasting is in vogue, and why companies now rely on sentiment to understand consumers and market trends.
What’s the problem? The problem is, as management author Charles Handy put it, ‘paraphernalia’. Calling a seamstress a designer does not change her real role. In the process of finding and worshipping big data tools, we claim to have moved to the next stage without acknowledging the elephant in the room. It’s the same elephant, but we call it something else. We choose to ignore that the answer to tomorrow’s problems lies not in a discipline but between disciplines.
About 36 months ago, we started a small non-capital-market exercise. We took Google search data for Fortune 500 companies and for various emotions (fear, greed, happiness, etc) and ran our data algorithms on them. Just as with gold, oil and the dollar, we could create cycles of growth and decay for simple web data. We could predict which Fortune 500 companies would be searched more and which would see a decay in search. Around mid-September, we went a step further and benchmarked the results against Google search data. Treating Fortune 500 search data as a set of time series like those of stocks, we applied the ORMI (Orpheus Risk Management Index) active methodology to select which of the top Fortune 500 companies would be searched more, on the assumption that we made money if our portfolio of 10 selected companies was searched more. This is what companies actually need to know: are they going to be searched more or searched less (assuming positive search is positive bandwidth)?
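The idea of treating search series like stock series can be sketched in a few lines. This is not the ORMI methodology, which is not described here; a plain trailing-growth ranking stands in for it. The company labels and numbers are hypothetical, and the function names `select_top_n` and `equal_weighted_return` are invented for the illustration.

```python
# A minimal sketch: pick the companies whose search interest grew most,
# then check whether that basket beats the equal-weighted universe afterwards.
# The ranking rule is a stand-in assumption, NOT the ORMI methodology.

def select_top_n(series_by_company, n, lookback):
    """Rank companies by growth over the last `lookback` steps; return the top n."""
    growth = {}
    for name, values in series_by_company.items():
        if len(values) <= lookback or values[-1 - lookback] <= 0:
            continue  # not enough history, or no usable base value
        growth[name] = values[-1] / values[-1 - lookback] - 1
    return sorted(growth, key=growth.get, reverse=True)[:n]


def equal_weighted_return(series_by_company, names, start, end):
    """Average change in search interest between two time indices."""
    changes = [series_by_company[c][end] / series_by_company[c][start] - 1
               for c in names]
    return sum(changes) / len(changes)


# Hypothetical search-interest readings for four made-up companies.
search = {
    "A": [50, 55, 60, 66],
    "B": [80, 76, 70, 65],
    "C": [40, 40, 42, 41],
    "D": [30, 33, 39, 45],
}

# Select using only the first three readings, then measure the fourth.
history = {name: values[:3] for name, values in search.items()}
picks = select_top_n(history, n=2, lookback=2)
portfolio = equal_weighted_return(search, picks, start=2, end=3)
universe = equal_weighted_return(search, list(search), start=2, end=3)
print(picks, round(portfolio, 4), round(universe, 4))
```

The selection sees only data available at the decision point, and the comparison is made on what happens next. That separation is what makes a claim of predictiveness testable.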
In a short period of 24 months (search data has limited history), our ORMI Fortune Index moved up from 100 points to 120, while equal-weighted Google search data for the Fortune 500 fell by 10 per cent. The ORMI Fortune Index thus outperformed its universe by 30 percentage points over 24 months. This outperformance was proof of predictiveness, which is where data mining should head, rather than towards subjective extrapolation, which can’t be quantified. How much your data mining adds to your bottom line is quantifiable. The search for the top 10 potential search-growth companies on Google led to six selections: Amgen, Dow Chemical, Halliburton, McKesson, Danaher and Chevron; the remaining four components in the model were cash. How we integrate this model with stock market forecasting is another step. But the work so far shows how lacking the current data tools are.
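The outperformance figure can be checked with simple arithmetic. The sketch below assumes the benchmark is rebased to 100 at the start, so a 10 per cent fall takes it to 90; that rebasing is an assumption made for the illustration, and the function name `outperformance` is hypothetical.

```python
# A minimal sketch of the outperformance arithmetic quoted in the text.
# Assumption: the equal-weighted benchmark starts at 100 and ends at 90.

def outperformance(portfolio_start, portfolio_end, benchmark_start, benchmark_end):
    """Return portfolio and benchmark returns plus two common excess measures."""
    portfolio_return = portfolio_end / portfolio_start - 1
    benchmark_return = benchmark_end / benchmark_start - 1
    return {
        "portfolio_return": portfolio_return,
        "benchmark_return": benchmark_return,
        # Simple difference in returns, expressed in percentage points.
        "arithmetic_excess": portfolio_return - benchmark_return,
        # Growth of the portfolio measured relative to the benchmark.
        "relative_excess": (1 + portfolio_return) / (1 + benchmark_return) - 1,
    }


result = outperformance(100.0, 120.0, 100.0, 90.0)
for key, value in result.items():
    print(f"{key}: {value:.2%}")
```

The 30-point figure is the arithmetic difference (a 20 per cent gain against a 10 per cent fall). Measured relative to the benchmark, 120 against 90, the gap is about a third.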
The problem with search is a lack of smart, catalogued databases that can understand each other. Only when databases are able to understand each other can data come alive and make search smarter. The age of big data comes with numerous data types such as web data, social data and consumer data. Hence it has become essential to lay down a framework for data universality. This means commonality of behaviour, of patterns and of data character. Such guidelines could make data visualisation, transformation and interpretation easier. The natural universalities leading to data universality can harmonise big data classification and improve predictive models.
What does real data seasonality tell us? Variable growth and decay for multiple periods (for example, intra-day, weekly, quarterly, or for the decade ahead). Why is it important? Because nature is not just about growth, it’s also about decay. Extrapolation of a trend is incomplete science; understanding when that extrapolation will peak is as important. If all data sets exhibit a universal behaviour, data manipulation should be based on these universalities and seek, identify and enhance the respective natural patterns.
The author is CMT, and Founder, Orpheus CAPITALS, a global alternative research firm