OpenAI's former CTO just made a major move. On May 11, 2026, Mira Murati unveiled what Thinking Machines Lab calls "interaction models": a new AI architecture built for real-time conversations, not turn-based exchanges.
For businesses deploying voice chatbots or AI assistants, this represents a paradigm shift. Here's what you need to know.
What Thinking Machines Lab Announced
The flagship model, TML-Interaction-Small, delivers response latency of 0.40 seconds. For context: Google's Gemini-3.1-flash-live responds in 0.57 seconds, and OpenAI's GPT-realtime-2.0 takes 1.18 seconds.
But speed isn't the main innovation. It's the "full-duplex" architecture that changes everything.
The Problem With Current Conversational AI
Today, most voice AI assistants operate in "turn-based" mode:
- User speaks
- AI waits for user to finish
- AI processes the request
- AI responds
- Back to step 1
This model creates artificial conversations. The AI cannot interrupt to ask for clarification. It cannot react while you're speaking. And critically, it cannot detect when you hesitate and naturally step in.
Thinking Machines' Full-Duplex Solution
Thinking Machines' interaction models work differently. The AI listens, speaks, and processes simultaneously. This is called "full-duplex" communication, like a natural phone call.
Technically, it's a 276-billion-parameter model with a Mixture of Experts (MoE) architecture, in which 12 billion parameters are active at any given moment. This approach maintains speed while offering deep contextual understanding.
Why This Matters for Your Business
If you're using AI chatbots for customer service or considering deploying autonomous AI agents, this announcement has direct implications.
1. User Expectations Will Rise
Your customers will get used to more natural interactions. Voice assistants that impose artificial pauses will feel dated. It's comparable to the transition from static websites to reactive applications: once users taste fluidity, they don't go back.
2. New Use Cases Become Viable
With 0.4-second latency and continuous listening, certain use cases finally become practical:
- Real-time technical support: AI can guide a field technician while listening to their observations
- Interactive training: Sales or negotiation simulations with instant feedback
- Medical assistance: Transcription and suggestions during consultations (with appropriate safeguards)
- Simultaneous interpretation: Real-time translation during conversations
3. Infrastructure Must Keep Up
Full-duplex communication demands robust network infrastructure. Network latency adds to model latency. If your infrastructure adds 500ms of delay, the model's 0.4 seconds becomes 0.9 seconds, breaking the illusion of natural conversation.
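The latency budget is simple addition, which makes it easy to sanity-check before a deployment. A minimal sketch (the `audio_io_s` term for client-side buffering is our own assumption, not a vendor figure):

```python
# Rough end-to-end latency budget for a voice interaction.
# Only the 0.40 s model latency comes from the announcement;
# the network and audio-buffering figures are illustrative.

def end_to_end_latency(model_s: float, network_rtt_s: float, audio_io_s: float = 0.0) -> float:
    """Total delay the user perceives: model inference plus network
    round trip plus any client-side audio capture/playback buffering."""
    return model_s + network_rtt_s + audio_io_s

# The example above: a 0.40 s model behind 500 ms of network delay.
print(round(end_to_end_latency(0.40, 0.50), 2))  # 0.9
```

Anything much above half a second of perceived delay is where conversations start feeling turn-based again, so the network term deserves as much attention as the model term.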
Technical Specifications in Detail
For technical teams, here's what we know about the TML-Interaction-Small architecture:
Model Architecture
The model uses a Mixture of Experts (MoE) architecture with the following characteristics:
- Total parameters: 276 billion
- Active parameters: 12 billion per inference
- Context: Native handling of audio, video, and text
- Measured latency: 0.40 seconds (end-to-end)
The MoE approach allows for a massive model while maintaining acceptable inference speed. Only a fraction of parameters is activated for each token, reducing computational load.
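The announcement doesn't include routing details, but the general MoE idea is easy to illustrate: a router scores all experts per token, and only the top-k experts' weights are actually used. Everything in this sketch (sizes, top-2 routing) is a toy assumption, not TML's actual design:

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer: only k experts' parameters run per token.
# Sizes are illustrative and do not reflect TML's architecture.
rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2

router_w = rng.normal(size=(d_model, n_experts))              # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                                     # score every expert
    chosen = np.argsort(logits)[-top_k:]                      # keep the top-k
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax over the k
    # Only the chosen experts' weight matrices are ever touched:
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (16,)
```

The compute saving is the ratio of active to total parameters: here 2 of 8 experts run per token, mirroring how 12B of 276B parameters would be active in the announced model.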
Performance Comparison
| Model | Latency | Availability | Modalities |
|-------|---------|--------------|------------|
| TML-Interaction-Small | 0.40s | Preview 2026 | Audio, video, text |
| Gemini-3.1-flash-live | 0.57s | Available | Audio, video, text |
| GPT-realtime-2.0 | 1.18s | Available | Audio, text |
| Claude Voice | 0.85s | Limited beta | Audio, text |
Infrastructure Requirements
To deploy these models in enterprise settings, you'll need:
- Network connection: Latency under 50ms to inference servers
- WebSocket: Support for persistent bidirectional connections
- Bandwidth: Minimum 1 Mbps for audio, 5 Mbps if video included
- Backend: Ability to handle real-time streams with minimal buffering
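A pre-deployment readiness check against these thresholds can be sketched in a few lines. The numbers mirror the list above; the function itself is hypothetical:

```python
# Sketch of a pre-deployment check against the requirements listed above.
# Thresholds come from the article; the function is illustrative.

def infra_ready(rtt_ms: float, bandwidth_mbps: float, video: bool = False) -> list[str]:
    """Return the list of failed requirements (empty list = ready to deploy)."""
    problems = []
    if rtt_ms >= 50:
        problems.append(f"network RTT {rtt_ms} ms exceeds the 50 ms target")
    needed = 5.0 if video else 1.0  # 5 Mbps with video, 1 Mbps audio-only
    if bandwidth_mbps < needed:
        problems.append(f"bandwidth {bandwidth_mbps} Mbps below the {needed} Mbps minimum")
    return problems

print(infra_ready(rtt_ms=30, bandwidth_mbps=2.0))             # []
print(infra_ready(rtt_ms=80, bandwidth_mbps=2.0, video=True))
```

In practice the RTT figure should be measured against the actual inference endpoint, not a generic speed test, since routing to the provider's region dominates the result.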
The Context: A Battle of Titans
This announcement fits into an intense race between several major players.
Thinking Machines Lab raised $2 billion in seed funding in July 2025, the largest seed round in history. Post-money valuation was $12 billion. According to rumors, the startup is currently negotiating a new round at a $50 billion valuation.
Meta attempted to acquire Thinking Machines in 2025. When Murati declined, the social media giant poached seven founding members. Murati responded by recruiting Soumith Chintala, the creator of PyTorch, as CTO.
OpenAI, Murati's former employer, hasn't stood still. Their GPT-realtime-2.0 model is already available, but with roughly three times the latency. Google is pushing Gemini in the same direction, with latency falling between the two.
An Ecosystem in Turmoil
Thinking Machines' announcement has sparked immediate reactions across the industry. Microsoft has confirmed working on similar capabilities for Copilot. Amazon is preparing native integration into Alexa. And several European startups, including Mistral AI, are exploring full-duplex architectures for their upcoming models.
For businesses, this competition is good news: it will accelerate innovation and drive down prices. But it also complicates investment decisions. Which platform should you choose when the landscape is evolving so quickly?
What This Means for African and Moroccan Businesses
For companies in Morocco and Africa, this technology won't be immediately available. Thinking Machines Lab announced a "limited research preview" in the coming months, with a broader launch planned for late 2026.
But you can prepare now:
Evaluate Your Current Use Cases
Which processes would benefit from natural voice interaction? Phone-based customer service is the obvious candidate, but also consider internal training, field assistance, or new employee onboarding.
Invest in Your Infrastructure
Network latency becomes a critical factor. Ensure your infrastructure can support persistent WebSocket connections with minimal latency. Now is the time to review your cloud architecture and CDN choices.
Train Your Teams
The arrival of natural conversational AI will transform certain roles. Call center agents won't disappear, but their role will evolve toward supervision and handling complex cases. Start preparing for this transition.
Document Your Processes
Interaction models excel when they have access to a structured knowledge base. Document your procedures, FAQs, and sales scripts. This documentation will serve as "memory" for your future AI assistants.
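To make the "memory" idea concrete, here is a deliberately minimal sketch of answering from a documented FAQ by keyword overlap. The entries and scoring are illustrative; a production assistant would use embeddings and a vector store rather than word matching:

```python
# Minimal sketch: documented procedures serving as the assistant's knowledge base.
# Entries and the keyword-overlap scoring are illustrative only.

faq = {
    "How do I reset my password?": "Go to Settings > Security and choose Reset.",
    "What are your opening hours?": "We are open 9:00-18:00, Monday to Friday.",
}

def best_answer(question: str) -> str:
    words = set(question.lower().split())
    score = lambda q: len(words & set(q.lower().split()))
    match = max(faq, key=score)
    # Fall back to a human when nothing in the knowledge base matches.
    return faq[match] if score(match) > 0 else "Let me connect you to an agent."

print(best_answer("when are you open, what hours?"))
```

The point is less the retrieval technique than the dependency: an assistant can only be as good as the procedures, FAQs, and scripts you have written down.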
Limitations to Keep in Mind
Despite the excitement, several points warrant a cautious approach.
It's still a research preview. No commercial product until late 2026. Announced performance could evolve significantly between preview and commercial launch. Current benchmarks were conducted under controlled conditions that may not reflect real production environments.
Energy consumption isn't mentioned. A 276 billion parameter model running continuously to listen and respond simultaneously probably consumes significant resources. Cost per interaction will be a key factor for economic viability. For SMEs, the ROI versus existing solutions will need careful calculation.
Geographic availability remains uncertain. Real-time AI models require servers close to users to minimize latency. Coverage in Africa will likely be limited at launch, potentially creating access disparities between markets.
Privacy questions remain open. An AI that listens continuously raises privacy concerns. Businesses will need to be transparent with customers about what the AI captures and retains. GDPR and local regulations will impose constraints on voice data processing.
Vendor lock-in risk. Building critical applications on proprietary technology creates dependency. It will be important to plan exit strategies and alternatives from the start.
Preparing Your Organization for the Shift
The transition to interaction models isn't just technical. It requires organizational preparation across multiple dimensions.
Skills and Roles
Your team will need new competencies. Prompt engineering for real-time contexts differs from batch processing. Voice UX design becomes critical. And your operations team needs to monitor latency metrics they may never have tracked before.
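For real-time voice, tail latency (p95/p99) matters more than the average a batch-oriented team usually tracks: one slow turn breaks the conversational illusion even if the mean looks fine. A minimal nearest-rank percentile computation (the sample numbers are made up):

```python
# Tail-latency tracking for conversational turns. Sample data is illustrative.

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    idx = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[idx]

turn_latencies_s = [0.41, 0.39, 0.44, 0.40, 0.95, 0.42, 0.38, 0.43, 0.41, 1.30]
print(percentile(turn_latencies_s, 50))   # median looks healthy
print(percentile(turn_latencies_s, 95))   # the tail tells the real story
```

Here the median sits near the advertised 0.4 s while the p95 is over a second, which is exactly the kind of gap a mean-only dashboard hides.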
Consider hiring or training for these roles: conversational AI designers who understand natural dialogue patterns, voice UX specialists who can craft interactions that feel human, and MLOps engineers who can maintain low-latency inference pipelines.
Customer Communication
Your customers will need to be informed about AI-assisted interactions. Transparency about when they're speaking with an AI versus a human is both an ethical imperative and, in many jurisdictions, a legal requirement. Prepare clear disclosure scripts and train your team on how to handle customers who prefer human interaction.
Fallback Procedures
No AI system is perfect. You need clear escalation paths for when the interaction model fails to understand a customer or encounters a request it cannot handle. Design these fallbacks now, while you have time to iterate, rather than scrambling during a production incident.
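An escalation policy can be designed and tested long before the model is available. A hedged sketch, with thresholds and intent names that are purely illustrative:

```python
# Sketch of an escalation policy: hand the call to a human after repeated
# low-confidence turns or an explicit request. All thresholds are illustrative.

ESCALATE_AFTER_FAILURES = 2
CONFIDENCE_FLOOR = 0.6

def should_escalate(confidence: float, failed_turns: int, intent: str) -> bool:
    if intent == "request_human":          # customer explicitly asked for a person
        return True
    if confidence < CONFIDENCE_FLOOR:      # count this turn as a failure
        failed_turns += 1
    return failed_turns >= ESCALATE_AFTER_FAILURES

print(should_escalate(0.9, 0, "billing_question"))   # False
print(should_escalate(0.3, 1, "billing_question"))   # True
```

Keeping this logic outside the model, in code you control, also mitigates the vendor lock-in concern raised above: the policy survives a change of provider.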
How ClaroDigi Can Help
At ClaroDigi, we closely follow these developments. Our team helps businesses integrate AI into their business processes pragmatically.
We can help you:
- Audit your current processes to identify opportunities for conversational automation
- Design a technical architecture ready for next-generation interaction models
- Train your teams on AI deployment best practices
- Pilot projects with technologies available today
- Develop governance frameworks for responsible AI deployment
The interaction models revolution is underway. The question isn't whether it will arrive, but whether you'll be ready when it does.
FAQ
When will Thinking Machines' interaction models be available?
A limited research preview is planned for the coming months, with a broader commercial launch expected late 2026. For now, there's no public access.
What's the difference between full-duplex and current voice assistants?
Current assistants work turn-by-turn: you speak, then the AI responds. Full-duplex allows the AI to listen and speak simultaneously, like a natural phone conversation, with the ability to interrupt or react while you're speaking.
How much will this technology cost for businesses?
Pricing hasn't been announced. Given the model size (276B parameters) and real-time nature of processing, expect costs higher than standard chat APIs. The cost-benefit ratio will depend on your use cases.
Can this technology replace a human call center?
Not entirely, but it can transform the role of human agents. AI will be able to handle routine requests while agents focus on complex cases and supervision. The transition will be gradual.
How does it compare to OpenAI and Google offerings?
Thinking Machines Lab announces 0.40s latency versus 0.57s for Google and 1.18s for OpenAI. However, OpenAI and Google models are already commercially available, giving them an advantage in terms of maturity and user feedback.
