GPT-4 Outperforms Elite Crowdworkers, Saving Researchers $500,000 and 20,000 hours
A new study reveals that OpenAI's GPT-4 outperforms elite human annotators in labeling tasks, saving a team of researchers over $500,000 and 20,000 hours of labor while raising questions about the future of crowdworking.
Are robots coming for crowdworker jobs? Research shows that LLMs are increasingly capable at human labeling tasks. Photo illustration: Artisana
April 11, 2023
A team of researchers from Carnegie Mellon, Yale, and UC Berkeley investigating Machiavellian tendencies in chatbots made a surprising side discovery: OpenAI's GPT-4 outperformed the most skilled crowdworkers they had hired to label their dataset. The finding saved the researchers over $500,000 and 20,000 hours of human labor.
Innovative Approach Driven by Cost Concerns
The researchers faced the challenge of annotating 572,322 text scenarios, and they sought a cost-effective method to accomplish this task. Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work, an excessive amount to invest in the research endeavor. Surge AI is a venture-backed startup that performs the human labeling for numerous AI companies including OpenAI, Meta, and Anthropic.
The team tested GPT-4's ability to automate labeling with custom prompting. Their results were definitive: "Model labels are competitive with human labels," the researchers confidently reported.
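The paper does not reproduce the researchers' actual prompts, but prompt-based labeling generally works by wrapping each scenario in an instruction template and mapping the model's free-text answer back to a label. The sketch below is a hypothetical illustration of that pattern; the category names come from the article, while the prompt wording, function names, and yes/no answer format are assumptions, and the API call itself is omitted.

```python
# Hypothetical sketch of prompt-based labeling. The prompt wording and the
# binary yes/no format are illustrative assumptions, not the study's prompts.

def build_label_prompt(scenario: str, category: str, description: str) -> str:
    """Wrap one scenario in a labeling instruction for a given behavior category."""
    return (
        f"You are annotating text scenarios for the behavior '{category}' "
        f"({description}).\n"
        f"Scenario: {scenario}\n"
        "Does the scenario exhibit this behavior? Answer Yes or No."
    )

def parse_label(response: str) -> int:
    """Map the model's free-text answer to a binary label (1 = behavior present)."""
    return 1 if response.strip().lower().startswith("yes") else 0

# Example: one scenario checked against the "Spying" category.
prompt = build_label_prompt(
    "The assistant reads a coworker's private messages without permission.",
    "Spying",
    "Spying or eavesdropping on others",
)
print(parse_label("Yes, this is clearly spying."))  # 1
```

Each of the 572,322 scenarios would be run through a template like this once per behavior category, which is what makes per-label cost the dominant factor.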
In a comparison of 2,000 data points labeled by three experts and three crowdworkers against the labels generated by GPT-4, the AI-created labels correlated more strongly with the expert labels than the average crowdworker's did. GPT-4 outperformed human annotators in all but two labeling categories, sometimes besting them by a factor of two.
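The comparison above amounts to measuring which label vector tracks the expert labels more closely. A minimal sketch of that measurement, using Pearson correlation on toy binary labels (the data here is invented for illustration, not taken from the study):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length label vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Toy binary labels (illustrative only, not the study's data).
expert = [1, 0, 1, 1, 0, 1, 0, 0]
model  = [1, 0, 1, 1, 0, 0, 0, 0]  # agrees with experts on 7 of 8 points
crowd  = [1, 1, 0, 1, 0, 0, 0, 1]  # agrees with experts on 4 of 8 points

# Higher correlation with expert labels indicates better label quality.
print(pearson(expert, model) > pearson(expert, crowd))  # True
```

The study's conclusion is the same comparison at scale: across most behavior categories, the model-vs-expert correlation exceeded the crowdworker-vs-expert correlation.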
GPT-4's Superior Nuance Detection
The AI model excelled most in challenging behavior categories, such as identifying:
Non-physical harm: Intent to cause non-physical harm, such as emotional bullying or intimidation
Spying: Spying or eavesdropping on others
Betrayal: The breaking of promises, contracts, or commitments
Utilizing GPT-4's labeling capabilities and implementing an ensemble model approach to augment label generation, the researchers likely spent less than $5,000 to annotate 572,322 scenarios. Ensemble models combine outputs from multiple AI models to produce a single, more accurate result.
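One common form of ensembling, consistent with the description above, is a simple majority vote: query several models (or several runs of one model) on the same scenario and keep the most frequent label. The sketch below illustrates that idea; the researchers' exact aggregation method is not specified in the article, so treat this as an assumption.

```python
from collections import Counter

# Minimal majority-vote ensemble sketch. The study's actual aggregation
# scheme is not described in the article; this illustrates the general idea.
def ensemble_label(labels):
    """Return the most common label across several model outputs."""
    return Counter(labels).most_common(1)[0][0]

# Three hypothetical model runs vote on the same scenario:
print(ensemble_label([1, 1, 0]))  # 1 -- the majority overrides the outlier
```

Because each individual model label is cheap, voting across a handful of runs buys extra accuracy while keeping the total annotation bill far below human rates.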
Crowdworking's Future in Question
As large language models (LLMs) rapidly advance, crowdworking's vital role in many machine learning businesses may be at risk. Just two weeks prior, we reported that researchers found GPT-3.5 surpassed Mechanical Turk's top tier of crowdworkers in complex labeling tasks.
Surge AI, a company boasting an "elite workforce" proficient in over 40 languages, may face increased competition from LLMs as businesses opt for AI-generated labels instead of human annotators.
Despite these developments, the immediate business opportunity remains vast as venture dollars pour into AI businesses, many of which face immense costs in launching their language models. Surge AI's website proclaims, "We power the world's leading RLHF LLMs," citing active customers across the who's who of the AI space.
RLHF, or Reinforcement Learning from Human Feedback, is a technique used by OpenAI to fine-tune ChatGPT, incorporating human input to guide the model's learning process. Competing LLMs are adopting the RLHF technique as well.
Crowdworkers are concerned about an increasingly automated future. Krystal Kauffman, leader of Turkopticon, a non-profit advocating for crowdworker rights, still believes strongly in the value of human discernment.
She told VICE's Motherboard publication, "Writing is about judgment, not just generating words. Currently and for the foreseeable future, people like Turkers will be needed to perform the judgment work. There are too many unanswered questions at this point for us to feel confident in the abilities of ChatGPT over human annotators."