Inband Media Detection (DTMF Bleed)
When a customer enters card details using their phone keypad, the digits are normally carried as a separate structured signal: either RFC 2833 named telephone events in the RTP packet stream, or SIP INFO messages in the signalling channel. Both of these are cleanly suppressed by the Subphonic Telephony proxy when a capture session is active.
However, some devices, carrier networks, or telephony configurations transmit DTMF differently, embedding the digit tones directly inside the audio itself. This is known as inband DTMF. Because the tones are part of the audio stream rather than a separate signal, standard suppression cannot remove them. The frequencies that encode each digit remain audible in the media, accessible to anyone who can analyse the audio.
Subphonic's Inband Media Detection capability addresses this at source, before any exposure can occur.
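To make the contrast with inband tones concrete, the sketch below decodes the 4-byte named telephone event payload defined by RFC 2833 (since obsoleted by RFC 4733, which keeps the same layout). It is an illustration of why out-of-band digits are easy to identify and drop, not a description of how the Subphonic Telephony proxy is implemented; the `TelephoneEvent` and `parse_telephone_event` names are hypothetical.

```python
import struct
from dataclasses import dataclass

# Event codes 0-15 map to the sixteen DTMF keys (RFC 4733 event registry).
EVENT_TO_KEY = "0123456789*#ABCD"


@dataclass
class TelephoneEvent:
    key: str          # keypad symbol
    end: bool         # E bit: set on the final packet(s) of the event
    volume_dbm0: int  # tone level, 0 to -63 dBm0
    duration: int     # duration so far, in RTP timestamp units


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode one telephone-event payload (hypothetical helper)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits) | E (1) R (1) volume (6) | duration (16), network order.
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError(f"event code {event} is not a DTMF digit")
    return TelephoneEvent(
        key=EVENT_TO_KEY[event],
        end=bool(flags & 0x80),
        volume_dbm0=-(flags & 0x3F),
        duration=duration,
    )
```

Because the digit is an explicit field in a packet with its own RTP payload type, a proxy can drop such packets without touching the audio. Inband tones offer no equivalent field to filter on.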
How Inband DTMF Arises
Inband DTMF is most commonly encountered when:
- Legacy PSTN or TDM gateways are present in the call path. Older equipment converts RFC 2833 packets back to audio tones at a gateway boundary.
- Some mobile networks re-encode DTMF as inband audio during transcoding or codec negotiation.
- Certain softphones and WebRTC clients generate DTMF as audio rather than as structured events.
- Cross-carrier interconnects strip out-of-band DTMF signalling and fall back to audio tones.
A deployment may never encounter inband DTMF in normal testing conditions and yet encounter it reliably in production, particularly when calls arrive via specific carriers or originate from certain device types. Inband Media Detection provides continuous, call-by-call protection regardless of how the DTMF signal arrives.
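One rough way to spot call legs at risk is to check whether the negotiated SDP advertises the `telephone-event` payload type at all. The sketch below does this with a regular expression; the function name `offers_telephone_event` is hypothetical, and the check is only a heuristic. A leg without `telephone-event` may still carry digits via SIP INFO, and a leg that advertises it can still be converted to audio tones by a downstream gateway.

```python
import re

# Matches SDP attribute lines such as "a=rtpmap:101 telephone-event/8000".
_TELEPHONE_EVENT = re.compile(
    r"^a=rtpmap:\d+\s+telephone-event/\d+", re.IGNORECASE | re.MULTILINE
)


def offers_telephone_event(sdp: str) -> bool:
    """Return True if the SDP body advertises RFC 2833/4733 telephone events."""
    return bool(_TELEPHONE_EVENT.search(sdp))


example_sdp = (
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0 101\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:101 telephone-event/8000\r\n"
    "a=fmtp:101 0-15\r\n"
)
print(offers_telephone_event(example_sdp))
```

Because the SDP only describes one hop, a result of `True` here does not rule out inband tones arriving from further upstream, which is why per-call audio analysis is still needed.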
How Inband Detection Works
1. Media fork
When a Transact capture session enters secure mode, the Subphonic Telephony proxy forks a copy of the live RTP audio stream to the Subphonic Media Monitor. This fork runs independently and does not affect the main call path.
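The independence of the fork from the main call path can be illustrated with a generic pattern: forward each packet first, then hand a copy to a bounded queue without ever blocking. This is a sketch of that pattern only. The `MediaFork` class, its `forward` callback, and the `max_pending` parameter are hypothetical and say nothing about how the Subphonic proxy is built.

```python
import queue
from typing import Callable


class MediaFork:
    """Forward every packet on the call path and copy it to a monitor queue."""

    def __init__(self, forward: Callable[[bytes], None], max_pending: int = 256):
        self._forward = forward
        # Bounded so a slow monitor cannot grow memory without limit.
        self.monitor_queue: "queue.Queue[bytes]" = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def on_packet(self, packet: bytes) -> None:
        # The main call path is served first and never waits on the monitor.
        self._forward(packet)
        try:
            self.monitor_queue.put_nowait(packet)
        except queue.Full:
            # Count the loss instead of blocking the call path.
            self.dropped += 1
```

In a design like this, a non-zero `dropped` count means the monitor missed audio, so a production system would likely treat it as a fault to alert on rather than something to ignore.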
2. Frequency analysis
The Media Monitor performs real-time frequency analysis on the forked audio. It looks for the characteristic paired-frequency signatures used by DTMF β each digit is encoded as two simultaneous sine waves at specific frequencies within the telephony audio band. This analysis is continuous throughout the capture window.
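The paired-frequency check described above is commonly implemented with the Goertzel algorithm, which measures signal power at a handful of chosen frequencies more cheaply than a full FFT. The sketch below is a minimal illustration of that general technique, not the Media Monitor's actual implementation. The 205-sample frame at 8 kHz and the `dominance` and `min_fraction` thresholds are assumptions chosen for the example, and checks that practical detectors add (such as row/column level balance and minimum tone duration) are omitted.

```python
import math
from typing import Optional, Sequence

LOW_FREQS = (697, 770, 852, 941)        # keypad row frequencies in Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)   # keypad column frequencies in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: Sequence[float], freq: float, sample_rate: int) -> float:
    """Squared spectral magnitude of `samples` at `freq` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_digit(samples: Sequence[float], sample_rate: int = 8000,
                 dominance: float = 5.0, min_fraction: float = 0.5) -> Optional[str]:
    """Return the DTMF key present in one audio frame, or None."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None  # silence
    low = [goertzel_power(samples, f, sample_rate) for f in LOW_FREQS]
    high = [goertzel_power(samples, f, sample_rate) for f in HIGH_FREQS]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    # Each group must have one clearly dominant frequency.
    for powers, peak in ((low, row), (high, col)):
        runner_up = max(p for i, p in enumerate(powers) if i != peak)
        if powers[peak] < dominance * runner_up:
            return None
    # The two tones together must account for most of the frame's energy,
    # which rejects speech and other broadband audio.
    tone_energy = 2.0 * (low[row] + high[col]) / len(samples)
    if tone_energy < min_fraction * energy:
        return None
    return KEYPAD[row][col]


def synth_tone(key: str, n: int = 205, sample_rate: int = 8000) -> list:
    """Generate one frame of a clean DTMF tone, for testing only."""
    row = next(i for i, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    lo, hi = LOW_FREQS[row], HIGH_FREQS[col]
    return [0.5 * math.sin(2 * math.pi * lo * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * hi * t / sample_rate)
            for t in range(n)]
```

For example, `detect_digit(synth_tone("5"))` returns `"5"`, while a frame of silence returns `None`. A frame of 205 samples at 8 kHz covers about 25.6 ms, which gives enough frequency resolution to separate adjacent DTMF rows.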
3. Detection and instruction
If DTMF frequency pairs are identified in the audio stream, the Media Monitor immediately instructs the Subphonic Telephony proxy to take protective action. The detection threshold and response are configurable.
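A configurable detection threshold often takes the form of a confirmation rule: act only when the same digit has been seen in several consecutive analysis frames. The sketch below shows one such rule. The `DetectionGate` class and its `min_frames` parameter are hypothetical, used only to illustrate the trade-off, and do not reflect Subphonic's actual configuration options.

```python
from typing import Optional


class DetectionGate:
    """Signal once when one digit persists for `min_frames` consecutive frames."""

    def __init__(self, min_frames: int = 2):
        if min_frames < 1:
            raise ValueError("min_frames must be at least 1")
        self.min_frames = min_frames
        self._last: Optional[str] = None
        self._run = 0
        self._fired = False

    def update(self, digit: Optional[str]) -> bool:
        """Feed one per-frame result; return True when action should be taken."""
        if digit is None:
            # A gap ends the current tone and re-arms the gate.
            self._last, self._run, self._fired = None, 0, False
            return False
        if digit == self._last:
            self._run += 1
        else:
            self._last, self._run, self._fired = digit, 1, False
        if self._run >= self.min_frames and not self._fired:
            self._fired = True
            return True
        return False


gate = DetectionGate(min_frames=2)
frames = [None, "5", "5", "5", None, "7", "5", "5"]
print([gate.update(d) for d in frames])
# [False, False, True, False, False, False, False, True]
```

A higher `min_frames` reduces false triggers from speech that briefly resembles a tone pair, at the cost of a longer delay before protective action is taken. In a security setting the lower end of that range is usually the safer choice.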
4. Audio muted to agent/IVR
The Telephony proxy mutes the inbound audio feed to the agent. Neither the agent's handset nor any call recording system connected to the agent's leg of the call receives the audio containing the inband DTMF tones. This includes merchant-managed on-prem recording platforms as well as cloud recording services. The same behaviour applies when the payment journey is routed through an IVR path rather than a live agent.
5. Push notification to Transact
A push notification is sent to the agent's Transact session via the same real-time channel used for ASR speech transcripts. The notification informs the agent that inband audio has been detected, that capture is paused, and presents the available next steps.
6. Agent resolves the session
The agent can choose to:
- Cancel the capture session entirely.
- Switch to Voice (ASR): the customer speaks their card details and speech recognition handles the capture.
- Offer Pay by Link: the customer receives a secure link and completes the payment on their own device, bypassing the voice channel entirely.
All three options maintain full PCI compliance and keep the merchant environment out of scope.
Why Inband DTMF Is a Risk
The audio frequencies that make up DTMF tones are within the normal telephony audio band. Unlike RFC 2833 events, which are structured packets that can be identified and removed from the stream, inband DTMF is part of the audio itself and can be extracted from a call recording by any audio analysis tool. The tones are identifiable from samples as short as a few tens of milliseconds and can be decoded by freely available software.
This means that if inband DTMF reaches a call recording system, the recording contains recoverable card data regardless of whether any suppression was attempted on the signalling layer. Inband Media Detection ensures the audio is muted before it can reach the recorder.
Scope Implications
When Inband Media Detection is active:
- Call recording platforms (including on-prem recorders) do not receive card data from inband DTMF, because the audio is muted at the proxy before reaching the recorder.
- The agent's audio feed is muted during the detection window, so agent endpoints remain out of PCI scope.
- The CCaaS platform receives muted audio during the detection window.
Without Inband Media Detection, inband DTMF in a call recording would cause the recording system to be considered in scope for PCI DSS, even if RFC 2833 / SIP INFO suppression is correctly deployed everywhere else.
Enabling Inband Media Detection
Inband Media Detection is configured at the account or deployment level. Contact your Subphonic account team to confirm whether it is enabled for your deployment and to review detection thresholds appropriate to your telephony environment.
If your telephony environment includes legacy PSTN gateways, TDM interconnects, or calls from mobile networks via carrier SIP trunks, we recommend verifying that Inband Media Detection is active. These configurations are the most common source of inband DTMF in production environments.