Meta AI Unleashes Megabyte, a Revolutionary Scalable Model Architecture
Meta's research team unveils an innovative AI model architecture, capable of generating more than 1 million tokens across multiple formats and exceeding the capabilities of the existing Transformer architecture behind models like GPT-4.
Meta's new proposed AI architecture could replace the popular Transformer models driving today's language models. Photo illustration: Artisana
- Meta AI researchers have proposed a groundbreaking architecture for AI decoder models, named Megabyte, capable of producing extensive content.
- The Megabyte model addresses scalability issues in current models and performs calculations in parallel, boosting efficiency and outperforming Transformers.
- This innovation could instigate a new era in AI development, transcending the Transformer architecture and unlocking unprecedented capabilities in content generation.
May 23, 2023
A Meta team of AI researchers has proposed an innovative architecture for AI models, capable of generating expansive content in text, image, and audio formats, stretching to over 1 million tokens. This groundbreaking proposal, if embraced, could pave the way for the next generation of proficient AI models, transcending the Transformer architecture that underpins models such as GPT-4 and Bard, and unleashing novel capacities in content generation.
The Constraints of Current Models
Contemporary high-performing generative AI models, like OpenAI's GPT-4, are grounded in the Transformer architecture. Initially introduced by Google researchers in 2017, this architecture forms the backbone of emergent AI models, facilitating an understanding of nuanced inputs and generating extensive sentences and documents.
Nonetheless, Meta's AI research team posits that the prevailing Transformer architecture might be reaching its threshold. They highlight two significant flaws inherent in the design:
As inputs and outputs grow longer, the cost of self-attention scales quadratically. Because each word a Transformer language model processes or produces must attend to every other word in the sequence, computation that is manageable for short passages becomes prohibitively expensive for documents running to thousands of words.
Feedforward networks, which help language models comprehend and process words through a sequence of mathematical operations and transformations, scale poorly on a per-position basis. Because these networks operate independently on each character group or "position," they incur substantial computational expense as sequences grow.
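The quadratic blow-up behind the first flaw can be seen in a back-of-the-envelope calculation. This is a toy sketch, not Meta's code; the FLOP formula is a rough approximation of the two sequence-length-squared matrix products in one attention layer:

```python
def self_attention_flops(seq_len: int, d_model: int = 512) -> int:
    """Rough FLOP count for one self-attention layer: the score matrix
    (Q @ K^T) and the weighted sum over values each cost ~seq_len^2 * d_model."""
    return 2 * seq_len * seq_len * d_model

# Doubling the sequence length roughly quadruples the cost:
ratio = self_attention_flops(2_000) / self_attention_flops(1_000)
print(ratio)  # → 4.0
```

At a million tokens, this quadratic term dominates everything else in the model, which is the bottleneck Megabyte's patching is designed to sidestep.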
Megabyte Model: The Game Changer
The Megabyte model, introduced by Meta AI, takes a markedly different approach: it divides a sequence of inputs and outputs into "patches" rather than individual tokens. Within each patch, a local AI model generates results, while a global model manages and harmonizes the final output across all patches.
This methodology addresses the scalability challenges prevalent in today's AI models. The Megabyte model's patch system permits a single feedforward network to operate on a patch encompassing multiple tokens. Researchers found that this patch approach effectively counters the issue of self-attention scaling.
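A rough way to see why patching counters self-attention scaling is to compare attention costs. This is a hypothetical back-of-the-envelope model, not figures from the paper: attention cost is approximated as quadratic in sequence length, and the patch size is an illustrative assumption:

```python
def transformer_attention_cost(n: int) -> int:
    # Full self-attention: every position attends to every other position.
    return n * n

def patched_attention_cost(n: int, patch_size: int) -> int:
    # Global model attends across patches; local models attend within them.
    num_patches = n // patch_size
    global_cost = num_patches * num_patches
    local_cost = num_patches * patch_size * patch_size
    return global_cost + local_cost

n = 1_000_000  # a million-token sequence
print(transformer_attention_cost(n))     # → 1000000000000
print(patched_attention_cost(n, 1_000))  # → 1001000000
```

Under these toy assumptions, splitting a million-token sequence into thousand-token patches cuts the attention work by roughly three orders of magnitude, which is the intuition behind the patch system.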
The patch system also enables Megabyte to perform calculations in parallel, in stark contrast to traditional Transformers, which compute serially. This yields significant efficiencies even when the base model has more parameters: in experiments, a 1.5B-parameter Megabyte model generated sequences 40% faster than a 350M-parameter Transformer.
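The parallelism claim can be sketched in miniature. This is a hypothetical illustration, not Meta's implementation: `local_decode` is a stand-in for a small local model, and the idea shown is only that once per-patch contexts exist, the per-patch work is independent and can run side by side rather than token by token:

```python
from concurrent.futures import ThreadPoolExecutor

def local_decode(patch_context: list[int]) -> list[int]:
    # Stand-in for a local model producing one patch of output;
    # a real local model would run transformer layers here.
    return [(b + 1) % 256 for b in patch_context]

# Given one context per patch (e.g. hidden states from the global model),
# the patches can be decoded concurrently instead of strictly in sequence.
patch_contexts = [[10, 20], [30, 40], [50, 60]]
with ThreadPoolExecutor() as pool:
    decoded = list(pool.map(local_decode, patch_contexts))
print(decoded)  # → [[11, 21], [31, 41], [51, 61]]
```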
Using several tests to determine the limits of this approach, researchers discovered that the Megabyte model's maximum capacity exceeded 1.2M tokens. For comparison, OpenAI's GPT-4 has a limit of 32,000 tokens, while Anthropic's Claude has a limit of 100,000 tokens.
Shaping the AI Future
As the AI arms race progresses, AI model enhancements largely stem from training on an ever-growing number of parameters, which are the values learned during an AI model's training phase. While GPT-3.5 was trained on 175B parameters, there's speculation that the more capable GPT-4 was trained on 1 trillion parameters.
OpenAI CEO Sam Altman recently suggested a shift in strategy, confirming that the company is thinking beyond training colossal models and is zeroing in on other optimizations. He equated the future of AI models to iPhone chips, where the majority of consumers are oblivious to the raw technical specifications. Altman envisioned a similar future for AI, emphasizing the continual increase in capability.
Meta’s researchers believe their innovative architecture arrives at an opportune time, but they also acknowledge other pathways to optimization. Promising research directions, such as more efficient encoder models that adopt patching techniques, decoder models that break sequences into smaller blocks, and preprocessing that compresses sequences into tokens, could extend the capabilities of the existing Transformer architecture for a new generation of models.
Nonetheless, Meta’s recent research has AI experts excited. Andrej Karpathy, the former Sr. Director of AI at Tesla and now a lead AI engineer at OpenAI, also weighed in on the paper. This is “promising,” he wrote on Twitter. “Everyone should hope that we can throw away tokenization in LLMs. Doing so naively creates (byte-level) sequences that are too long.”