Lessons from Honeybees

Falling Into Big Data Without Knowing It

Josh Schein

Big Data is often perceived as a vast sea of digital information out of which profound insights magically emerge. What that picture can miss, however, is that the best insights arise not from processing large volumes of data, but from testing hypotheses grounded in observable phenomena. Careers can thrive on statistical studies of Amazon rainfall levels or of breakfast cereal preferences, for example, but machine learning will not extract additional value merely because a larger dataset combines the two fields.

Our era of exponentially exploding datasets creates opportunities for a small pool of professionals skilled in programming, statistics, and machine learning. A subtle bottleneck arises, however, when the time spent acquiring these skills and the narrow career paths that follow preclude exposure to higher-impact problems. Progress comes from envisioning and exploring connections, whereas Big Data struggles, perhaps more than any other field, to bridge gaps between theory and practice.

Although I did not fully appreciate it at the time, my grounding in predictive analytics began on the data collection side. While I was pursuing a B.A. in International Studies at West Virginia University, a unique opportunity arose to spend a year as a research assistant with Yale University in a Thai rainforest studying Asian honeybees. By chance, the work had made headlines over allegations of chemical warfare when the United States charged the Soviet Union with promoting the use of “yellow rain” poison gas in Southeast Asia. Witnesses proved unreliable, whereas ground samples taken by the scientists on my project contained anomalies such as pollen that effectively disproved the accusations. The “yellow rain” was bee feces.

Honeybees defecate in mass flights around the hive lasting several minutes, and the average bee retains 20% of its body weight in ejecta for this moment. Although we had thought that group flights limited exposure to predators, a later study linked the behavior to nest temperature regulation. In either case, yellow rain and bee feces look alike, and these scientists were able to apply their unique experiences to envision and test a nonobvious hypothesis.

My work involved evolutionary biology with emphasis on foraging energetics, a study of calories spent seeking food. Asian honeybees contrast with those of Europe and the Americas starting with body size differences that made our work akin to comparing the fuel economies of a jetliner to a Cessna. Daily tasks might involve marking bees we needed to track, timing waggle dances to calibrate communication variances between the Asian species, observing later dances to plot foraging patterns in a rainforest, or pushing through dense foliage over the course of a week to track an uncategorized new honeybee species to its nest. One unusual assignment in the name of acquiring data was to build a sturdy box that would protect an observation hive from monkeys.

During the year, I came to appreciate how seemingly mundane observations could yield profound discoveries, as well as the extraordinary care required for collecting viable data. If a step required randomly observing bees on a hive, for example, we had to devise procedures to ensure that each bee was chosen randomly and not because it caught the eye. In retrospect, I was being immersed in data science best practices with exceptional scientists, and it was a life-changing opportunity.

Seven years later, I entered a jungle of another sort: Wall Street. Information again was key, although the mindsets around pursuing it were different. Bear Stearns’ high-intensity sales environment was a jolting experience. Prior years abroad enabled me to take it on as yet another foreign culture – enjoying it, in fact, but keeping my eyes open. I became increasingly aware that many otherwise savvy people around me did not always think rationally. I regularly saw high-stakes decisions turn on deeply embedded illogic that readily accessible information could have resolved.

Stop for a moment to consider how humans make investment choices. The process usually involves reading or talking to others we trust in specific areas or for the broader picture. Seeking advice across all aspects of our lives is natural. What’s different about Wall Street, however, is that many of the world’s most talented individuals are paid highly to outthink the person advising you. Yet the complexity of markets essentially requires trusting others. Even a passive index fund buyer has relied upon someone else’s performance comparisons against active managers. And like you, Wall Street pros cannot evaluate all things at all times and must delegate trust, which can mean media, research analysts, colleagues, software programs, or other sources. None work particularly well.

During my early career at three leading firms, I was struck by an entrenched mindset at each that “we” hired better analysts than competitors. I finally reached an invaluable insight: it was not logical that each employer could have simultaneously cornered the market for talent. Surprisingly, extensive data on analyst performance existed in the public domain. Fortunately, this trove went unnoticed until well after I had launched a career on it. The data bore similarities to my experiences with honeybees: messy with outliers but usable.

Lacking the pedigree and pecuniary ambition typical of Wall Street, I grasped faster than my peers that I had no reasonable expectation of outperforming an index. Importantly, I saw that the same was true of colleagues, acquaintances, or others I might be tempted to follow. This awareness framed an actionable question: does anyone consistently outperform in ways that can be tracked for constructing superior portfolios?

The analyst performance data proved to be a good starting point. After working with it for several thousand hours, I formed a team that built a process for running 50,000 Monte Carlo simulations on each analyst. We found that only 2% had a 75% chance or better of being “above average” while only 1% had 90% odds. This elite cohort worked throughout Wall Street, mostly migrated toward smaller firms over time, and tended not to receive top recognition or pay.
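For readers curious what a per-analyst simulation might look like, here is a minimal sketch. It is not a description of our actual process: it assumes, purely for illustration, that each analyst's track record has been reduced to a list of excess returns (each recommendation's return minus the peer average), and it swaps in a simple bootstrap, resampling that record with replacement, as the Monte Carlo step. The function name and parameters are hypothetical.

```python
import random
from statistics import fmean


def prob_above_average(excess_returns, n_sims=50_000, seed=0):
    """Estimate the chance that an analyst is "above average".

    excess_returns: hypothetical input, one number per recommendation,
        measured relative to the peer average (so 0 means "average").
    Each simulation resamples the track record with replacement and
    checks whether the resampled mean is positive.
    """
    if not excess_returns:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n = len(excess_returns)
    hits = 0
    for _ in range(n_sims):
        sample = rng.choices(excess_returns, k=n)  # bootstrap draw
        if fmean(sample) > 0:
            hits += 1
    return hits / n_sims
```

Under these assumptions, an analyst would fall into the cohorts described above when the returned value is at least 0.75 or 0.90. A short or noisy track record pulls the estimate toward the middle, which is one reason so few analysts clear either bar.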

I folded these insights into a broader strategy when developing and running the InsideTrack® portfolios for Citi Smith Barney (later merged into Morgan Stanley), which sought observable “smart money” that could be emulated. Tracking insiders’ transaction filings became a second major component of the approach, and we accepted as clients primarily insiders, diversified by industry. As I had hoped, these unique individuals became a brain trust inspiring further ideas in niche areas.

In seeking untapped “smart money” opportunities, I came to work closely with colleagues steeped in statistics, programming, and machine learning but little exposed to the client-facing side. I prioritized learning what these people could do and brought them into areas where I thought they might add value. My role became one of identifying the right problems and weighing priorities while working through challenges as a team using all our abilities.

The term “Big Data” can connote aimlessness. It’s no secret that many organizations throw data at challenges without weighing end goals. Another common observation is that the larger players focus on advertising and marketing while leaving other opportunities on the table.

Interestingly, my personal encounters with cutting-edge Big Data peers have mostly involved individuals in non-financial fields, and their experiences have tended to mirror mine with respect to bridging gaps between quantitative and “real-world” perspectives. One executive running online sales for a large retailer found their optimal team to be three PhDs with tech support underneath and an industry veteran on top with the practical experience to realize, for instance, that an unexpected revenue spike might stem from a holiday. My own team had likewise evolved into a tiered structure combining industry knowledge, quantitative analysis, and technical support. In a similar conversation, a professor running a molecular biology lab once lamented to me that he often returned from overnight business trips to find that the programming and math specialists he relied upon had “created a Frankenstein.” A subject matter expert needs to remain closely involved.

As with any scientific process, the hypothesis is a starting point, and unexpected findings can lead research in new directions. Based on experiences with clients, as an example, we were curious as to whether insiders who traded actively would outperform those who didn’t, and our results on more than 150,000 insider transactions strongly supported this pattern. A richer and more interesting behavioral finance tableau emerged, however, when we separated purchases from sales: buy performance peaked on the first trade and thereafter declined steadily, whereas sellers initially showed poor timing but improved markedly over later transactions. Previous academic work in the area – and there had been a lot – had rested on an implicit assumption that performance remained constant throughout an insider’s trading career, because no one had thought to test otherwise.
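The split that exposed this pattern is simple to express in code. The sketch below is illustrative only and rests on assumptions not stated above: that each transaction is a tuple of insider identifier, ISO-formatted date, side ("buy" or "sell"), and some measure of subsequent performance, and that trade order is counted separately for each insider's purchases and sales. All names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def performance_by_trade_order(transactions):
    """Average subsequent performance by side and trade ordinal.

    transactions: iterable of (insider_id, date, side, forward_return),
        where date is an ISO string such as "2005-03-14" (these sort
        chronologically) and side is "buy" or "sell".
    Returns {(side, n): mean forward_return}, where n = 1 is an
    insider's first trade on that side, n = 2 the second, and so on.
    """
    seen = defaultdict(int)      # trades so far per (insider, side)
    buckets = defaultdict(list)  # forward returns per (side, ordinal)
    ordered = sorted(transactions, key=lambda t: (t[0], t[1]))
    for insider_id, _date, side, forward_return in ordered:
        seen[(insider_id, side)] += 1
        ordinal = seen[(insider_id, side)]
        buckets[(side, ordinal)].append(forward_return)
    return {key: fmean(vals) for key, vals in buckets.items()}
```

Pooling every trade into a single average is equivalent to dropping the ordinal from the key, which is the implicit assumption of constant performance that the earlier work had made.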

I hope by now to have conveyed a sense that getting into Big Data can be a gradual process. For anyone keen to apply quantitative approaches to scenarios on the ground, the critical success ingredients are capabilities in math and programming combined with hands-on subject matter knowledge. And especially if coming from the “real world,” consider academic collaborators. The endeavor will gain from the involvement of professors and students skilled in statistics, programming, and empirical methodologies, and these individuals will be appreciative and motivated partners.

Finally, have a road map, but move incrementally. Your mistakes will be smaller. Nassim Nicholas Taleb’s The Black Swan is a must-read treatise on how an unseen rock can crush anyone’s best-laid plans, as happened with the dinosaurs, although the joke on our team is that the dinosaurs perished for having failed to acquire key information.

We live in a time of rare opportunity, and despite how matters may sometimes seem, the largest players are not going to claim the entire pie. Compelling opportunities remain for those willing to act.