The RCS Agent Testing Guidelines Gap: Why Your Agent Passed QA and Failed Carrier Review

May 6, 2026 · Jenny @ RCS X

Your RCS agent passed internal QA. It failed carrier review.

This is the story we see repeated in RCS community forums, in carrier feedback documentation, and in brand resubmission timelines that add 60 to 90 days to launch dates. The engineering team tested everything they knew to test. The carrier found gaps they didn't know existed.

GSMA TS.61 version 3.0 exists. It specifies roughly 200 official carrier compliance test scenarios across message formats, capability detection, user consent flows, and fallback behavior. It's 36 pages. It's the document that carrier compliance teams use to evaluate agents.

Most enterprise RCS teams have never heard of it.

The gap between "internal QA" and "carrier compliance testing" is where launches stall, resubmissions pile up, and approval timelines extend. Closing that gap is what separates teams that launch from teams that wait.

Why Official Standards Stay on the Shelf

Let's be direct about why TS.61 doesn't get followed in practice. The answer isn't that teams don't care about compliance.

The audience mismatch is the primary issue. TS.61 is written for carrier compliance teams: the engineers at Verizon, AT&T, and T-Mobile who evaluate whether a new agent meets the carrier's requirements for network access. The document assumes the reader has device labs, established carrier relationships, months of evaluation runway, and a dedicated QA function. An enterprise product team reading TS.61 for the first time is reading a document written for a different organizational context.

The documentation burden compounds the problem. Running the full TS.61 test suite means working through roughly 200 distinct test scenarios, each with documented expected behavior, observed results, and carrier compliance evidence. An organization without a dedicated compliance testing function first has to build a QA process that doesn't yet exist, which is a significant project before a single scenario has been run.

The result is predictable: teams skip to "does it send?" and "does it render?" — the checks they can run with available tools — without ever asking "does it pass carrier review?" The first two questions are answerable without TS.61. The third is not. And it's the third that gates actual launch.

The 200 → 2,000 Test Matrix Problem

TS.61 formalizes roughly 200 official test scenarios. That's the carrier compliance baseline — what a carrier reviewer uses to evaluate an agent submission.

Real-world RCS behavior multiplies that matrix significantly. Consider the axes: Android OEM version × carrier profile × Universal Profile (UP) level × OS version. The combination space runs to 2,000 or more distinct edge cases. A message that renders correctly on a Samsung Galaxy S24 with Verizon UP 2.0 might fail on a Google Pixel 8 with AT&T UP 1.5. A suggested action that works on iOS might behave differently on Android. Device-specific quirks are the norm, not the exception.
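To make the multiplication concrete, here is a minimal Python sketch that enumerates such a matrix. The axis names follow the paragraph above, but the axis sizes and the placeholder values (oem_0, carrier_0, and so on) are assumptions chosen only to land near the 2,000 figure; they are not a real device inventory.

```python
from itertools import product

# Illustrative axis sizes only (assumption): 8 x 5 x 5 x 10 = 2,000 combinations.
AXES = {
    "oem": [f"oem_{i}" for i in range(8)],
    "carrier_profile": [f"carrier_{i}" for i in range(5)],
    "up_level": [f"up_{i}" for i in range(5)],
    "os_version": [f"os_{i}" for i in range(10)],
}

def build_matrix(axes):
    """Return every combination of axis values as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

matrix = build_matrix(AXES)
print(len(matrix))  # 2000
print(matrix[0])    # the first device profile in the matrix
```

Adding one more value to any axis adds a whole slice of combinations, which is why a 3-to-5-device manual pass covers so little of the space.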

Manual device testing cannot scale to this matrix. Teams test on 3 to 5 devices, conclude the behavior is consistent, and submit for carrier review — then encounter device-specific failure modes they never caught. Device lab access is expensive and slow. For most teams, it's unavailable altogether.

TS.61's 200 scenarios are necessary but not sufficient. They establish carrier compliance. They don't cover the extended device matrix that determines whether your agent actually behaves correctly across the real device population your users have. Both matter.

The Inverted Feedback Loop

Traditional software development has a linear feedback loop: QA, feedback, fix, ship. The iteration path is short and the cost of failure is low — a bug gets caught before it reaches users, and the fix deploys in the next release cycle.

RCS has an inverted feedback loop. Build, submit, wait 60 to 90 days for carrier review, receive carrier feedback, resubmit. By the time a team learns what broke their carrier submission, weeks or months of engineering time are sunk, and the fix requires going back through the full queue.

There's also a validation shelf-life problem. During the 60-to-90-day approval window, carrier policies can shift. UP levels can advance. Device behavior can change with OS updates. An agent submission that would have passed in month one might receive different feedback in month three — not because your agent changed, but because the carrier's evaluation standards did.

The fix isn't faster approval. This isn't a process problem, and it isn't solved by applying more pressure to carriers or hiring a regulatory affairs team. The fix is pre-approval validation against carrier standards: catching the gaps that carrier review would discover before you submit the agent for formal review. That's the only way of shortening the feedback loop that actually works.

What Carrier-Grade Testing Actually Requires

The full picture of what carrier-grade RCS testing covers:

Brand verification alignment. Google requires specific documentation for verified sender approval. Most teams don't know what the specific requirements are until they receive their first rejection. TS.61 covers this in the early test scenarios; teams that don't read it find out too late.

Capability detection validation. Before an agent sends an RCS message, it should detect whether the target device is RCS-capable and which features it supports. This is not optional — GSMA requires capability detection before send. Agents that skip this step fail at the first carrier review.
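A capability check before send might be sketched as follows. This is an illustration under assumptions, not a real platform API: Capabilities, supports, send_if_capable, the feature names, and the lookup callable are all hypothetical stand-ins for whatever capability query your messaging provider exposes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capabilities:
    """Hypothetical result of a capability lookup for one recipient."""
    rcs_enabled: bool
    features: frozenset = frozenset()

def supports(caps, required_features):
    """True only if the device is RCS-capable and has every required feature."""
    return caps.rcs_enabled and set(required_features) <= caps.features

def send_if_capable(recipient, required_features, lookup, send_rcs):
    """Run the capability lookup first; send over RCS only when it passes."""
    caps = lookup(recipient)
    if not supports(caps, required_features):
        return False
    send_rcs(recipient)
    return True

# Stubbed directory for illustration; a real lookup would query your platform.
directory = {
    "+15550100": Capabilities(True, frozenset({"rich_card", "carousel"})),
    "+15550101": Capabilities(False),
}
sent = []
print(send_if_capable("+15550100", ["rich_card"], directory.__getitem__, sent.append))
print(send_if_capable("+15550101", ["rich_card"], directory.__getitem__, sent.append))
```

Passing the lookup in as a parameter keeps the check testable: a pre-launch suite can feed it stubbed capability results and confirm that nothing is sent over RCS when the check fails.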

Render variance testing. Rich cards, carousels, suggested actions — these render differently across OEM and carrier combinations. A card that renders perfectly on Samsung can overflow or clip on Pixel. Carrier reviewers specifically look for rendering consistency across device profiles.

Fallback logic verification. When RCS isn't available, SMS fallback is required, both technically and as a consent/legal requirement. But SMS fallback doesn't happen automatically. It requires active logic. Carrier reviewers want evidence that fallback actually triggers and delivers when RCS is unavailable, not just that it exists.

Opt-out flow correctness. STOP keyword processing is a non-negotiable carrier-grade requirement. The agent must correctly process inbound STOP messages and propagate opt-out states within the required window. TS.61 has specific test scenarios for this. Teams that build custom opt-out handling without aligning to the carrier's expected implementation format fail here consistently.
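As a rough illustration of the behavior under test, the sketch below records an opt-out when an inbound message is exactly the STOP keyword and blocks later sends to that number. OptOutRegistry and its methods are hypothetical names; the keyword set contains only STOP because that is the keyword named above, and the exact matching rules and propagation window should come from the carrier's requirements rather than from this example.

```python
OPT_OUT_KEYWORDS = {"STOP"}  # assumption: extend per carrier guidance

class OptOutRegistry:
    """In-memory opt-out store; a real one would be persistent and shared."""

    def __init__(self):
        self._opted_out = set()

    def handle_inbound(self, sender, text):
        """Record an opt-out if the whole message is an opt-out keyword."""
        if text.strip().upper() in OPT_OUT_KEYWORDS:
            self._opted_out.add(sender)
            return True
        return False

    def may_send(self, recipient):
        """False once the recipient has opted out."""
        return recipient not in self._opted_out

registry = OptOutRegistry()
registry.handle_inbound("+15550100", " stop ")
print(registry.may_send("+15550100"))  # False
```

A pre-launch test would assert both halves: the keyword is recognized regardless of case and whitespace, and every send path consults may_send before delivering.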

Staged-checkpoint validation. Carrier review uses a four-checkpoint system: day 0, day 30, day 45, and day 75. Approval is staged. Testing should follow the same staged logic: not just "does it work today" but "does it continue to work across the carrier's progressive review."
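One way to mirror the day-0, day-30, day-45, day-75 staging in a test schedule is to compute the checkpoint dates from the submission date and re-run the suite as each one comes due. The offsets are the ones quoted above; the function names and the idea of driving re-runs from them are assumptions for illustration.

```python
from datetime import date, timedelta

CHECKPOINT_OFFSETS_DAYS = (0, 30, 45, 75)  # offsets as quoted in the text

def checkpoint_dates(submission_date):
    """Calendar date of each review checkpoint."""
    return [submission_date + timedelta(days=d) for d in CHECKPOINT_OFFSETS_DAYS]

def due_checkpoints(submission_date, today):
    """Checkpoints whose date has already arrived."""
    return [d for d in checkpoint_dates(submission_date) if d <= today]

for d in checkpoint_dates(date(2026, 5, 6)):
    print(d.isoformat())
# 2026-05-06, 2026-06-05, 2026-06-20, 2026-07-20
```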

The Pre-Launch Testing Stack That Actually Works

The teams that pass carrier review on first submission and maintain approval across staged reviews have built a consistent pre-launch testing stack. Six components:

Emulation-first approach. Test against carrier expectations before submission. Emulation lets you run 50 to 100 device profiles and carrier scenarios in the time it would take to manually test 3 to 5 physical devices. The output is a structured test report showing which scenarios pass, which fail, and what the device-specific failure modes are. This is the foundation of the pre-approval validation layer.

Automated device matrix coverage. Replace your 3-to-5 device manual test with automated coverage across 50-plus device profiles. Every combination of OEM, carrier profile, UP level, and OS version that matters. The matrix tells you which device-specific behaviors are consistent and which are edge cases before you submit.

Capability detection verification. Confirm RCS readiness before every send. This is not a one-time test — it's a runtime check that the agent should run before each message sequence. Pre-launch testing should validate that capability detection logic is correct, not just present.

Fallback orchestration validation. SMS fallback is a requirement. It requires logic. That logic needs to be tested in isolation and in the context of the full fallback chain. Teams that don't test fallback comprehensively find out it doesn't work correctly on the first real send.
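A minimal sketch of the fallback chain and the three cases worth testing (RCS succeeds, RCS unavailable, RCS available but the send fails) could look like this. Message, RcsSendError, deliver, and the injected sender callables are hypothetical; a real integration would use your provider's client and error types.

```python
from dataclasses import dataclass

class RcsSendError(Exception):
    """Raised by the (hypothetical) RCS sender when delivery fails."""

@dataclass
class Message:
    rich_payload: dict
    sms_text: str  # plain-text equivalent used for fallback

def deliver(recipient, message, rcs_available, send_rcs, send_sms):
    """Try RCS when available; fall back to SMS if it is unavailable or fails."""
    if rcs_available(recipient):
        try:
            send_rcs(recipient, message.rich_payload)
            return "rcs"
        except RcsSendError:
            pass  # fall through to SMS
    send_sms(recipient, message.sms_text)
    return "sms"

msg = Message({"title": "Order shipped"}, "Your order has shipped.")
log = []
channel = deliver(
    "+15550101", msg,
    rcs_available=lambda r: False,
    send_rcs=lambda r, p: log.append(("rcs", r)),
    send_sms=lambda r, t: log.append(("sms", r, t)),
)
print(channel, log)
```

Because the senders are injected, each branch can be exercised in isolation with stubs, and the same function can then be run end to end against a staging environment.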

GSMA TS.61 alignment check. Map your test suite against the official carrier requirements. This is the gap most teams skip. Take the 200 test scenarios. Evaluate which ones your current test suite covers, which ones it misses, and which ones require special test infrastructure you haven't built. The output is a coverage gap matrix — not a passing grade, but a specific inventory of what's missing before you submit.
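The coverage gap matrix can start as a simple set comparison. In this sketch the scenario IDs (S-001 and so on) are invented placeholders, not TS.61's actual numbering, and coverage_gap is a hypothetical helper; you would populate the inputs from your own reading of the spec and your test suite's metadata.

```python
def coverage_gap(required_ids, covered_ids, needs_infra_ids=()):
    """Split required scenarios into covered, missing, and blocked on infrastructure."""
    required = set(required_ids)
    covered = required & set(covered_ids)
    blocked = (required - covered) & set(needs_infra_ids)
    missing = required - covered - blocked
    return {
        "covered": sorted(covered),
        "missing": sorted(missing),
        "needs_infrastructure": sorted(blocked),
    }

# Placeholder IDs for illustration only.
required = ["S-001", "S-002", "S-003", "S-004"]
suite = ["S-001", "S-003", "X-999"]   # X-999 is not a required scenario
needs_lab = ["S-004"]                 # e.g. requires a physical device lab

report = coverage_gap(required, suite, needs_lab)
print(report)
# {'covered': ['S-001', 'S-003'], 'missing': ['S-002'], 'needs_infrastructure': ['S-004']}
```

The useful output is the two non-covered lists: one is work you can schedule now, the other is infrastructure you need to build or rent before submission.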

Pre-screening integration. Use a pre-submission review service — TexterID PreCheck, Pinnacle, or equivalent — to evaluate brand risk and agent compliance before formal submission. Catch the disqualifying issues early. Pre-check tools are the friction reducer between "agent is ready to submit" and "carrier reviewer sees the submission."

The ROI of Testing Infrastructure

The math is straightforward.

Carrier review failure costs: a 60-to-90-day resubmission timeline plus engineering rework. For an enterprise team running RCS campaigns, a single failed submission can mean 2 to 3 months of delayed launch, lost campaign timing, and engineering resources redirected to remediation instead of new feature development.

Pre-launch testing investment: a structured testing stack that covers the device matrix, TS.61 alignment, fallback logic, and capability detection before submission. For most teams, this is 1 to 2 weeks of QA engineering work to set up the initial emulation layer, then ongoing coverage as the agent evolves.

The delta is significant. Teams that built testing infrastructure before carrier submission pass first-time review at meaningfully higher rates than teams that wing the submission and fix reactively. The investment in pre-launch testing pays for itself in one avoided resubmission cycle.
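A back-of-the-envelope version of that comparison, using only the ranges quoted above (1 to 2 weeks of setup, 60 to 90 days per resubmission), is sketched below. It treats setup effort and launch delay as interchangeable calendar days, which is a simplifying assumption, and it ignores rework cost entirely; break_even_failure_rate is a hypothetical helper.

```python
def break_even_failure_rate(setup_days, resubmission_days):
    """Failure probability above which the setup time is recovered on average."""
    return setup_days / resubmission_days

# Least favorable case for testing: longest setup, shortest resubmission delay.
worst = break_even_failure_rate(setup_days=14, resubmission_days=60)
# Most favorable case: shortest setup, longest resubmission delay.
best = break_even_failure_rate(setup_days=7, resubmission_days=90)

print(f"{worst:.0%} to {best:.0%}")  # 23% to 8%
```

Under these assumptions, the setup pays for itself in expected calendar time whenever the chance of a failed first submission exceeds roughly one in four, and at much lower odds when resubmission sits at the long end of the range.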

Your agent is ready to send. The question is whether it's ready to pass carrier review. Those are different questions — and most teams only ask the first one. Pre-launch testing infrastructure asks the second one before the carrier does.

Published: May 6, 2026