How Speaker Framing Works
Last updated October 8, 2025
Contents
- Introduction
- Why this change
- What you need to do
- What you hear
- What you see at the far end
- Watch our video to see how it works
Introduction
Speaker Framing intelligently highlights and tracks the active speaker; if two people are speaking at the same time, both are framed. This lets the far end easily follow the flow of conversation, even in busy or crowded meeting spaces. It works with the main front-of-room (FoR) Neat device and with Neat Center.
Neat Center support was introduced in the Neat OS 25.3 update.
This feature gives you:
- Smarter multi-camera switching – When paired with a front-of-room camera, Neat Center works in real time to deliver the best speaker view by selecting the most appropriate angle between devices.
- Flexible framing options – Whether using Neat Center alone or in combination with other Neat cameras, you can now choose between Individual Framing (the original Neat Center mode) or the new Speaker Framing mode.
- More engaging meetings – By highlighting the active speaker more naturally—especially in group settings—remote participants feel more connected and in sync with the room.
Why this change
Before Neat OS 25.3, Neat Center offered only Individual Framing when used as a standalone camera, which limited its ability to adjust based on who was speaking. This update brings smarter, more collaborative framing by allowing Neat Center and front-of-room cameras to work together automatically in Speaker Framing mode, delivering the most relevant view at any given moment.
What you need to do
In Microsoft Teams or Zoom, when selecting Neat Center as a camera, you’ll now see options for Speaker Framing and Individual Framing. If you’re using both Neat Center and a front-of-room device, the system will automatically coordinate between the two to pick the best view of the speaker—no extra setup needed.
What you hear
Each Neat Center uses beamforming to form 6 audio beams, selects the best of them, and sends that beam to the main device at the front of the room. The main device then selects the best beam from among those supplied by the devices. All devices run their own audio processing, which includes deep noise suppression; the main device only selects the best beam.
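As a rough illustration of the two-stage selection described above, here is a minimal Python sketch. It is not Neat's implementation: the `Beam` type, the device labels, and the numeric `score` (standing in for whatever quality measure the devices actually use) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Beam:
    device: str   # hypothetical device label
    index: int    # beam number, 0-5 on each device
    score: float  # hypothetical speech-quality score; higher is better


def best_beam(beams):
    """Return the highest-scoring beam from a non-empty list."""
    return max(beams, key=lambda b: b.score)


def select_for_room(per_device_beams):
    """Stage 1: each device picks its own best beam.
    Stage 2: the main device picks among those candidates."""
    candidates = [best_beam(beams) for beams in per_device_beams.values()]
    return best_beam(candidates)


# Two hypothetical Neat Center devices with six beams each.
room = {
    "center-1": [Beam("center-1", i, s)
                 for i, s in enumerate([0.1, 0.4, 0.9, 0.3, 0.2, 0.1])],
    "center-2": [Beam("center-2", i, s)
                 for i, s in enumerate([0.2, 0.5, 0.3, 0.6, 0.1, 0.0])],
}
chosen = select_for_room(room)
print(chosen.device, chosen.index)
```

In this sketch the main device never compares all twelve beams directly; it sees only one candidate per device, which mirrors the description that it "only selects the best beam".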
What you see at the far end
Using voice tracking, the main Neat device and Neat Center find the person speaking and, if two people are speaking, the second speaker as well. Microphones detect which direction each voice is coming from, and our technology searches in directions where there are faces. This helps reduce errors and prevents selecting voice reflections. Combining information from both the cameras and the microphones makes the solution robust and accurate.
Based on this information from the microphones, the video intelligence decides which video view to show.
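The idea of accepting a voice direction only where a face is also present can be sketched as follows. This is an illustrative simplification, not the actual algorithm: the function name, the angle and energy representation, and the 10-degree tolerance are all assumptions chosen for the example.

```python
def pick_speaker_directions(voice_peaks, face_angles,
                            tolerance_deg=10.0, max_speakers=2):
    """Return the face angles of up to `max_speakers` voices.

    voice_peaks: list of (angle_deg, energy) pairs from the microphones.
    face_angles: list of angle_deg values where the camera sees a face.
    A voice peak is kept only if a face lies within `tolerance_deg` of it,
    so peaks caused by reflections off walls or screens are discarded.
    """
    chosen = []
    if not face_angles:
        return chosen
    # Consider the strongest voice peaks first.
    for angle, _energy in sorted(voice_peaks, key=lambda p: p[1], reverse=True):
        nearest = min(face_angles, key=lambda f: abs(f - angle))
        if abs(nearest - angle) <= tolerance_deg and nearest not in chosen:
            chosen.append(nearest)
        if len(chosen) == max_speakers:
            break
    return chosen


# Three voice peaks; the one at 75 degrees has no face nearby (a reflection).
peaks = [(-30.0, 0.9), (75.0, 0.7), (12.0, 0.5)]
faces = [-28.0, 10.0]
speakers = pick_speaker_directions(peaks, faces)
print(speakers)  # [-28.0, 10.0]
```

Here the second-strongest peak is ignored because no face is near it, and the weaker peak at 12 degrees is accepted as the second speaker instead.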
Speaker Framing switches views with a short delay: for a new speaker, the delay is typically at least 3 seconds. This is to avoid false changes and maximize accuracy. A change to the best view of the current speaker is immediate.
Speaker Framing highlights the active speaker in the room about 3 seconds after they start talking. It takes 3 seconds before moving to a new speaker in the room, and if no one speaks for 11 seconds, it defaults back to the main room view (whichever camera mode was originally set).
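The timing rules above (about 3 seconds before framing a new speaker, 11 seconds of silence before returning to the room view) can be modelled as a small state machine. The sketch below is only a simplified illustration under stated assumptions: the class and constant names are hypothetical, it tracks a single speaker at a time, and it assumes that silence cancels a pending switch, which the description does not specify.

```python
ROOM_VIEW = None          # sentinel meaning "show the main room view"
SWITCH_DELAY_S = 3.0      # time a new speaker must talk before being framed
SILENCE_TIMEOUT_S = 11.0  # silence after which the view returns to the room


class FramingTimer:
    def __init__(self):
        self.framed = ROOM_VIEW     # who is currently framed
        self.candidate = None       # new speaker waiting to be framed
        self.candidate_since = 0.0  # when the candidate started talking
        self.last_speech = 0.0      # last time anyone was talking

    def update(self, now, speaker):
        """Advance to time `now` (seconds); `speaker` is an id or None for silence.
        Returns the id of the framed speaker, or ROOM_VIEW."""
        if speaker is None:
            self.candidate = None  # assumption: silence cancels a pending switch
            if (self.framed is not ROOM_VIEW
                    and now - self.last_speech >= SILENCE_TIMEOUT_S):
                self.framed = ROOM_VIEW
            return self.framed
        self.last_speech = now
        if speaker == self.framed:
            self.candidate = None
        elif speaker != self.candidate:
            self.candidate, self.candidate_since = speaker, now
        elif now - self.candidate_since >= SWITCH_DELAY_S:
            self.framed, self.candidate = speaker, None
        return self.framed


timer = FramingTimer()
events = [(0, "A"), (2, "A"), (3, "A"), (5, "B"), (8, "B"), (10, None), (19, None)]
views = [timer.update(t, s) for t, s in events]
print(views)  # [None, None, 'A', 'A', 'B', 'B', None]
```

In this run, speaker A is framed at 3 seconds, speaker B takes over 3 seconds after starting to talk, and the view falls back to the room 11 seconds after the last speech.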
If there are two current speakers, both can be displayed side by side in the same stream.