NextWave AI
Mira Murati’s Thinking Machines Unveils Real-Time AI Models for Natural Human Interaction

samour ankit
May 12, 2026

The race to build smarter and more human-like artificial intelligence is entering a new phase. Mira Murati and her startup Thinking Machines Lab have introduced a new category of AI systems designed specifically for live, real-time human interaction. The company’s newly unveiled “interaction models” aim to eliminate one of the most common frustrations with modern AI assistants: the awkward delay between asking a question and receiving a response.

Today’s AI chatbots and virtual assistants, despite being highly advanced, still operate in a rigid turn-based structure. A user speaks or types a request, the system processes the input, and only then generates an answer. While this process works well for many tasks, it often makes conversations feel mechanical and unnatural. People have gradually adapted to these limitations by speaking slowly, carefully structuring their sentences, and avoiding interruptions when interacting with AI systems.

Thinking Machines Lab believes the future of AI should feel far more fluid and human. Instead of forcing humans to adapt to machines, the company wants AI to adapt to natural human communication patterns. Its new interaction models are designed to simultaneously listen, process, observe, and respond in real time, creating a much more dynamic and natural conversational experience.

At the core of the system is what the company calls a “full-duplex interaction architecture.” Traditional AI systems treat a conversation as a sequence of separate turns: one side speaks while the other waits. Thinking Machines’ system instead divides communication into micro-turns of roughly 200 milliseconds, which lets the AI keep processing incoming audio and visual information while it generates its own response.

As a result, the AI can react almost instantly during conversations. It can handle interruptions more naturally, respond to pauses or visual cues, and maintain conversational flow in a way that resembles real human interaction. This approach could dramatically improve the usability of AI in situations where speed and responsiveness are critical.
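To make the micro-turn idea concrete, here is a toy sketch in Python. It is not Thinking Machines’ implementation: the `DuplexLoop` class and the `plan_reply` stub are hypothetical, and the “model” simply echoes the last thing it heard, one word per 200 ms tick. What it illustrates is the scheduling difference: every tick ingests whatever the user said in that window, a new user chunk cancels any reply still in progress (a barge-in), and the system starts speaking on the first silent window instead of waiting for an explicit end of turn.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MICRO_TURN_MS = 200  # micro-turn length reported by Thinking Machines


def plan_reply(heard: List[str]) -> List[str]:
    """Hypothetical stand-in for the model: echo the latest input, one word per micro-turn."""
    return f"you said {heard[-1]}".split()


@dataclass
class DuplexLoop:
    heard: List[str] = field(default_factory=list)   # everything the user has said so far
    queued: List[str] = field(default_factory=list)  # reply chunks not yet spoken
    unanswered: bool = False                         # True if new input has no reply yet

    def tick(self, user_chunk: Optional[str]) -> Optional[str]:
        """Run one micro-turn: always ingest input, then decide whether to speak."""
        if user_chunk:
            self.heard.append(user_chunk)
            self.unanswered = True
            self.queued.clear()  # barge-in: the user is talking, so abandon our reply
            return None
        if self.unanswered and not self.queued:
            self.queued = plan_reply(self.heard)
            self.unanswered = False
        return self.queued.pop(0) if self.queued else None


if __name__ == "__main__":
    loop = DuplexLoop()
    chunks = ["hello", None, None, "wait", None, None, None, None]
    for i, chunk in enumerate(chunks):
        print(f"t={i * MICRO_TURN_MS:4d} ms  user={chunk!s:6}  ai={loop.tick(chunk)}")
```

Feeding it `["hello", None, None, "wait", None, None, None, None]` yields `[None, "you", "said", None, "you", "said", "wait", None]`: the first reply is cut off when the user says “wait”, and a fresh reply follows on the next silent window.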

The centerpiece of the system is TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model built for fast conversational handling and immediate responses. Unlike traditional large language models, which focus mainly on reasoning and text generation, it prioritizes conversational presence and responsiveness.

Alongside the main interaction model is a secondary background model that quietly performs more computationally intensive tasks such as deep reasoning, web searches, and tool usage. This dual-model setup allows conversations to continue smoothly without forcing users to wait while the system processes complex tasks in the background. The AI can maintain natural communication while simultaneously working on deeper analysis behind the scenes.
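The foreground/background split can be sketched with Python’s `asyncio`. This is an illustration under assumptions, not the company’s design: `background_model` is a hypothetical coroutine that only sleeps to simulate a slow search or tool call, and the 0.1 s polling interval is arbitrary. The point is that the conversational loop never blocks on the slow job: it acknowledges the request at once, keeps talking while the task runs, and folds the result in when `asyncio.wait` reports that the task has finished.

```python
import asyncio
from typing import List


async def background_model(query: str) -> str:
    """Hypothetical slow worker standing in for reasoning, web search or tool calls."""
    await asyncio.sleep(0.3)  # simulated latency of the slow path
    return f"3 results for {query!r}"


async def converse(query: str, tick: float = 0.1) -> List[str]:
    """Foreground loop: answer at once, stay responsive, merge results when ready."""
    said = ["Sure, let me check that."]  # immediate acknowledgement
    job = asyncio.create_task(background_model(query))
    while True:
        # Wait at most one tick for the background job, then get back to the user.
        done, _pending = await asyncio.wait({job}, timeout=tick)
        if done:
            break
        said.append("Still looking. Anything else while I search?")
    said.append(f"Found it: {job.result()}")
    return said


if __name__ == "__main__":
    for line in asyncio.run(converse("flights to Lisbon")):
        print(line)
```

Waiting with a timeout, rather than awaiting the task directly, is what keeps the foreground responsive; a real system would replace the filler line with whatever the user says next.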

Another innovation from Thinking Machines Lab is what it calls “encoder-free early fusion.” Most multimodal AI systems pass audio and visual input through separate, dedicated encoders before combining the results. Thinking Machines instead feeds raw audio and visual signals directly into the transformer through lightweight embedding layers.

According to the company, this design significantly reduces latency and improves synchronization between speech and visual understanding. Lower latency is especially important for real-time interaction because even small delays can make conversations feel unnatural or disconnected.
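A rough sketch of what “encoder-free early fusion” could mean in practice, written in plain Python for one time window: raw audio samples are cut into fixed-size frames and raw pixels into small patches, each frame or patch passes through a single linear projection, and the resulting vectors are joined into one token sequence that a single transformer would consume. The sizes (`D_MODEL`, `AUDIO_FRAME`, `PATCH`) and the random weights are made up for illustration; the article does not describe the actual embedding layers.

```python
import random
from typing import List

D_MODEL = 8      # toy hidden size (assumed; real models are far larger)
AUDIO_FRAME = 4  # raw samples per audio token (assumed)
PATCH = 2        # 2x2 pixel patches, flattened to 4 values (assumed)

rng = random.Random(0)


def linear(in_dim: int, out_dim: int) -> List[List[float]]:
    """Weight matrix for one lightweight embedding layer (random, untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(w: List[List[float]], x: List[float]) -> List[float]:
    """Matrix-vector product: map a raw chunk to a D_MODEL-sized token."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


W_AUDIO = linear(AUDIO_FRAME, D_MODEL)
W_IMAGE = linear(PATCH * PATCH, D_MODEL)


def audio_tokens(samples: List[float]) -> List[List[float]]:
    """Split raw samples into frames and embed each frame directly."""
    frames = [samples[i:i + AUDIO_FRAME]
              for i in range(0, len(samples) - AUDIO_FRAME + 1, AUDIO_FRAME)]
    return [project(W_AUDIO, f) for f in frames]


def image_tokens(img: List[List[float]]) -> List[List[float]]:
    """Split a grayscale H x W image into PATCH x PATCH patches and embed each one."""
    tokens = []
    for r in range(0, len(img) - PATCH + 1, PATCH):
        for c in range(0, len(img[0]) - PATCH + 1, PATCH):
            patch = [img[r + i][c + j] for i in range(PATCH) for j in range(PATCH)]
            tokens.append(project(W_IMAGE, patch))
    return tokens


def fuse(samples: List[float], img: List[List[float]]) -> List[List[float]]:
    """Early fusion: one shared token sequence, no pretrained per-modality encoder."""
    return audio_tokens(samples) + image_tokens(img)


if __name__ == "__main__":
    window_audio = [rng.uniform(-1, 1) for _ in range(16)]             # one window of audio
    frame = [[rng.random() for _ in range(4)] for _ in range(4)]       # one 4x4 video frame
    seq = fuse(window_audio, frame)
    print(len(seq), "tokens of width", len(seq[0]))
```

The contrast with a conventional pipeline is that nothing sits in front of the transformer except one matrix multiply per chunk, which is where the claimed latency savings would come from.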

The company says its model performed strongly on FD-bench, a benchmark that measures conversational timing and interaction quality. TML-Interaction-Small reportedly achieved a response latency below 0.4 seconds. For comparison, Google’s Gemini-3.1-flash-live scored around 0.57 seconds, and OpenAI’s GPT-realtime-2.0 reportedly measured 1.18 seconds. The absolute gaps are only fractions of a second, but in live conversation even fractions of a second have a major impact on how natural an AI feels to users.
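The reported figures are easier to compare as ratios. The snippet below uses only the numbers quoted above. Because TML’s latency is reported as below 0.4 s, treating it as exactly 0.40 gives a lower bound on how much slower the other two systems are: roughly 1.4 times for Gemini-3.1-flash-live (about 170 ms more) and nearly 3 times for GPT-realtime-2.0 (about 780 ms more).

```python
from typing import Dict, Tuple

# FD-bench response latencies as quoted in the article, in seconds.
# TML's figure is "below 0.4 s", so 0.40 is used as an upper bound for it.
REPORTED_LATENCY_S: Dict[str, float] = {
    "TML-Interaction-Small": 0.40,
    "Gemini-3.1-flash-live": 0.57,
    "GPT-realtime-2.0": 1.18,
}


def compare(latencies: Dict[str, float], baseline: str) -> Dict[str, Tuple[float, float]]:
    """Return (ratio to baseline, extra delay in milliseconds) for every model."""
    base = latencies[baseline]
    return {name: (secs / base, (secs - base) * 1000) for name, secs in latencies.items()}


if __name__ == "__main__":
    for name, (ratio, extra_ms) in compare(REPORTED_LATENCY_S, "TML-Interaction-Small").items():
        print(f"{name:<24} {ratio:4.2f}x  +{extra_ms:.0f} ms")
```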

While faster response times may improve consumer AI assistants and chatbots, the larger implications lie in enterprise and industrial applications. In healthcare environments, real-time AI systems could continuously monitor patients, respond immediately to abnormalities, and assist medical staff during emergencies. In manufacturing and industrial settings, AI could monitor machinery, identify safety hazards, and alert workers the moment problems occur.

Customer service is another area that could benefit significantly from these advances. Many current AI support systems still feel slow and scripted. Real-time interaction models could make AI conversations feel more natural, responsive, and less robotic, potentially improving customer satisfaction and efficiency.

One particularly notable feature of Thinking Machines’ system is its built-in awareness of time and context. Users can, for example, ask the AI to notify them if a process takes longer than it did last time, without specifying exact timestamps or durations. This ability suggests AI systems are becoming more context-aware and capable of functioning as genuine collaborators rather than simple question-answering tools.
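An instruction like “tell me if this takes longer than last time” implies some bookkeeping: the system has to remember how long a named process took before and compare the current run against it. The hypothetical `DurationWatch` class below sketches that idea; the article does not say how Thinking Machines implements it. The clock is injectable so the example can run on simulated time.

```python
import time
from typing import Callable, Dict


class DurationWatch:
    """Hypothetical helper: remember each process's last duration and flag overruns."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.last_duration: Dict[str, float] = {}  # seconds taken by the previous run
        self.started: Dict[str, float] = {}        # start time of the current run

    def start(self, name: str) -> None:
        self.started[name] = self.clock()

    def overrun(self, name: str) -> bool:
        """True once the current run has taken longer than the previous one."""
        if name not in self.started or name not in self.last_duration:
            return False  # nothing running, or no earlier run to compare against
        return self.clock() - self.started[name] > self.last_duration[name]

    def finish(self, name: str) -> float:
        """Stop the current run, store its duration for next time, and return it."""
        elapsed = self.clock() - self.started.pop(name)
        self.last_duration[name] = elapsed
        return elapsed


if __name__ == "__main__":
    now = [0.0]  # simulated clock, in seconds
    watch = DurationWatch(clock=lambda: now[0])
    watch.start("build")
    now[0] = 30.0
    watch.finish("build")          # first run took 30 s
    watch.start("build")
    now[0] = 50.0
    print(watch.overrun("build"))  # 20 s into run two: False
    now[0] = 61.0
    print(watch.overrun("build"))  # 31 s into run two: True
```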

The unveiling of these interaction models highlights the broader transformation taking place in the AI industry. Companies are no longer focused only on making AI smarter; they are now competing to make AI feel more natural, responsive, and integrated into everyday human activity.

For now, Thinking Machines Lab is limiting access to a small group of research partners, but the company plans a broader public release later in 2026. If successful, these real-time interaction models could mark a major turning point in how humans communicate with artificial intelligence, bringing AI one step closer to functioning as a truly seamless digital companion.