Leveraging Collective Intelligence: Transforming Medical Data Labeling for AI Advancements

    While pursuing his PhD at MIT's Center for Collective Intelligence, Erik Duhaime, PhD '19, made an intriguing observation. His wife, then a medical student, spent hours absorbed in study apps that offered flashcards and quizzes. At the same time, his own research was showing that groups of medical students, pooled together, could classify skin lesions more accurately than professional dermatologists. The approach behind this result involved continually measuring each student's performance, discarding the opinions of underperformers, and intelligently pooling the opinions of the high performers.

    Drawing on both his wife's study habits and his research findings, Duhaime founded Centaur Labs, which developed a mobile app called DiagnosUs. The app was designed to gather the opinions of medical experts on real-world scientific and biomedical data. Users reviewed material ranging from images of potentially cancerous skin lesions to audio clips of heart and lung sounds that could indicate underlying problems. To encourage accuracy, Centaur Labs rewarded users with small cash prizes for their opinions, which in turn were used to train and improve the algorithms of medical AI companies.

    Centaur Labs' approach connected two needs: medical experts' desire to sharpen their skills, and the demand for well-labeled medical data in AI-based biotech, pharmaceutical, and medical device development. By gamifying the process, Centaur Labs got users, particularly medical students, to label data while improving their own skills.

    Duhaime's research on the wisdom-of-crowds phenomenon also shaped Centaur Labs' approach. Asking random people for medical opinions may not be appropriate, but second opinions are already a valued part of healthcare. In his experiments, Duhaime trained groups of laypeople and medical students, whom he called "semi-experts," to classify skin conditions. When he combined the opinions of the highest-performing individuals, the pooled result was more accurate than professional dermatologists. He also found that combining AI algorithms trained to detect skin cancer with expert opinions produced even better results.

    While still a PhD student, Duhaime used MIT's entrepreneurial ecosystem to build Centaur Labs, drawing on funding from MIT's Sandbox Innovation Fund and support from the delta v startup accelerator. Centaur Labs was later accepted into the Y Combinator accelerator.

    Centaur Labs' flagship app, DiagnosUs, which Duhaime built with co-founders Zach Rausnitz and Tom Gellatly, let users test and improve their medical skills. About half of its users were medical students; the rest were doctors, nurses, and other medical professionals.

    Centaur Labs collected millions of opinions every week from tens of thousands of users worldwide. Most users earned modest amounts; the platform's top earner, a doctor from Eastern Europe, made around $10,000. Because DiagnosUs was a mobile app, users could contribute from anywhere, whether on the couch or during a commute, turning labeling into an enjoyable and engaging activity.

    Centaur Labs' approach deviated from traditional methods of data labeling and AI content moderation, which often involved outsourcing to low-resource countries. Instead, it harnessed the collective intelligence of a diverse global user base.

    Several studies supported the accuracy of Centaur Labs' labels. Working with researchers from Brigham and Women's Hospital, Massachusetts General Hospital (MGH), and Eindhoven University of Technology, Centaur Labs showed that crowdsourced opinions on lung ultrasounds were as reliable as those of experts. Another study conducted in collaboration with researchers at Memorial Sloan Kettering showed that crowdsourced labeling of dermosc