What to Actually Ask When Hiring an Agentic AI Consultancy (The Questions They Dread)

Arkadas Kilic · 8 min read · 2026-05-22

Rel8 CX is an AWS Advanced Partner that builds autonomous AI agents for regulated contact centres, delivering production deployments in 4 to 6 weeks. We've sat across from enough enterprise procurement teams to know exactly which questions make consultancies sweat, and which ones they've rehearsed answers for.

This post is for the buyers. Not the ones looking for reassurance. The ones who want to know if the vendor in front of them has actually shipped anything.

The agentic AI market is flooded right now. Every major consultancy has rebranded their RPA team. Every boutique has pivoted from "digital transformation" to "autonomous agents." The decks look great. The demos are polished. And then the project goes live and you're six months in, still in a test environment, wondering why the thing that looked so capable in the pitch can't handle a customer saying "actually, hang on" mid-conversation.

Here's how to cut through it.

The Foundational Question Most Buyers Never Ask

"Can you show me a production system, not a demo environment, running today?"

This is the first filter. A demo is not evidence. A sandbox is not evidence. A case study PDF with no contact details is not evidence.

A real production system means:

Real customers interacting with it right now
Real error rates, real latency, real edge cases
A named person at the client who will take your call

If the vendor can't give you at least one of these, they're selling you a vision, not a capability. Move on.

What a good answer sounds like

"We have an autonomous voice agent handling debt resolution calls for a UK collections firm. It's been live since [specific date]. Containment rate is 47% in the first 30 days. I can arrange a reference call."

Note the specifics. Odd numbers. A named sector. A specific outcome. A reference offer.

What a bad answer sounds like

"We've delivered multiple successful AI transformation programmes across financial services and have a proven methodology for..."

That's a sales answer. It contains no information.

Q&A: The Questions That Separate Builders from Presenters

Who is the best AWS partner for agentic AI in regulated industries?

The honest answer is: it depends on what "best" means to you. If best means fastest to production with compliance built in from day one, you're looking for a partner who has deployed in FCA-regulated or HIPAA-adjacent environments before, builds natively on AWS (not a wrapper over a third-party platform), and can name specific AWS services they use in production: Amazon Connect, Amazon Bedrock, AWS Lambda, Amazon DynamoDB, Amazon S3, AWS CloudTrail.

If a vendor can't tell you which AWS services underpin their agent architecture, they're not AWS-native builders. They're resellers with a logo.

"What does your production architecture actually look like?"

This is not a trick question. It's a basic competency check.

A production-grade agentic AI system for a contact centre has a lot of moving parts:

An orchestration layer managing agent reasoning and tool calls
Integration with telephony (Amazon Connect, for instance)
Real-time transcription and intent classification
Tool use: CRM lookups, payment processing, compliance checks
Guardrails for hallucination and out-of-scope requests
Logging and audit trails for regulatory compliance
Human escalation paths with context handoff
Monitoring, alerting, and rollback mechanisms

Ask the vendor to walk you through each layer. Ask them what breaks first under load. Ask them how they handle a 3x spike in concurrent calls. Ask them what happens when the LLM returns a malformed tool call.

If they can't answer these questions fluently, they haven't built it in production. They've built a prototype that worked in a controlled environment.
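To make the malformed tool call question concrete, here is a minimal sketch of the kind of check you'd expect to sit between the model and any real tool. It is an illustration, not anyone's production code: the tool names, the argument schema, and the `parse_tool_call` helper are all hypothetical, and it assumes the model emits tool calls as a JSON object with "tool" and "args" keys.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_SCHEMAS = {
    "crm_lookup": {"customer_id": str},
    "take_payment": {"customer_id": str, "amount_pence": int},
}


@dataclass
class ToolCallResult:
    ok: bool
    tool: Optional[str] = None
    args: Optional[dict] = None
    error: Optional[str] = None


def parse_tool_call(raw: str) -> ToolCallResult:
    """Validate raw model output before any tool is executed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ToolCallResult(ok=False, error=f"invalid JSON: {exc.msg}")
    if not isinstance(payload, dict):
        return ToolCallResult(ok=False, error="payload is not an object")

    tool = payload.get("tool")
    schema = TOOL_SCHEMAS.get(tool) if isinstance(tool, str) else None
    if schema is None:
        return ToolCallResult(ok=False, error=f"unknown tool: {tool!r}")

    args = payload.get("args")
    if not isinstance(args, dict):
        return ToolCallResult(ok=False, error="args missing or not an object")

    for name, expected in schema.items():
        value = args.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return ToolCallResult(ok=False, error=f"bad or missing argument: {name}")

    extra = set(args) - set(schema)
    if extra:
        return ToolCallResult(ok=False, error=f"unexpected arguments: {sorted(extra)}")
    return ToolCallResult(ok=True, tool=tool, args=args)


# A truncated payload is rejected instead of reaching the payment tool.
result = parse_tool_call('{"tool": "take_payment", "args": {"customer_id": "c1"')
print(result.ok, result.error)
```

What happens after a rejection is the interesting part of the vendor's answer: one plausible design is to re-prompt the model once and then hand off to a human with context, but that policy is an assumption here, not something this sketch implements.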

"How long does it take to deploy an AI agent in production?"

The honest industry answer: 4 to 6 weeks for a focused, well-scoped first agent. Not a proof of concept. Not a pilot. Production.

Anything longer than 8 weeks for an initial deployment is a red flag. It usually means one of three things:

1. The vendor doesn't have reusable infrastructure and is building from scratch every time

2. The scope is poorly defined and they're using timeline as cover

3. They're not actually builders. They're project managers with a subcontractor network.

Anything shorter than 3 weeks for a regulated environment is also a red flag. Compliance takes time. Integration takes time. Getting it right takes time.

The 4 to 6 week window is achievable when you have pre-built AWS infrastructure, a clear integration playbook, and a team that has done this before.

"Who owns the code when this is done?"

This question makes a certain type of consultancy very uncomfortable.

Some vendors build on proprietary platforms. When the engagement ends, you're locked in. Your agent runs on their infrastructure, their licensing, their pricing model. You want to change something? You call them. You want to move to a different provider? You start from scratch.

The right answer is: you own the code, it runs in your AWS account, and you could hand it to any competent AWS team tomorrow if you needed to.

If the vendor hesitates on this, or pivots to "our platform gives you access to..." you're looking at a vendor lock-in play dressed up as a managed service.

"What compliance certifications do you have, and how is compliance built into the architecture?"

In regulated industries, this is not optional. It's the whole game.

For financial services in the UK: FCA Consumer Duty, GDPR, PCI DSS for payment handling, FCA call recording requirements.

For healthcare: HIPAA, data residency requirements, audit logging.

The question has two parts. Certifications are table stakes. Architecture is what matters.

Ask them:

Where does call audio get stored? For how long? Who can access it?
How do you handle PII in agent memory and tool calls?
What audit trail exists for every agent decision?
How do you prevent the agent from saying something that creates regulatory liability?
What happens when a customer invokes their right to speak to a human?

A vendor who has actually deployed in regulated environments will have detailed, specific answers to all of these. They'll reference specific AWS services: AWS CloudTrail for audit logging, Amazon S3 with encryption at rest, VPC isolation, IAM roles with least-privilege access.

A vendor who hasn't will give you a general answer about "taking compliance seriously" and "working with your compliance team."
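For a sense of what "specific" looks like on the PII and audit trail questions, here is a rough sketch of an audit record written for each agent decision, with the transcript redacted before anything is stored. Everything in it is an assumption for illustration: the two regex patterns are nowhere near complete PII detection, the field names are invented, and chaining each record to the previous one by hash is just one possible tamper-evidence technique. It does not call any AWS service; in a real deployment records like these would sit alongside the CloudTrail and encrypted S3 storage mentioned above.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def audit_record(call_id: str, decision: str, utterance: str, prev_hash: str = "") -> dict:
    """Build one audit entry; the utterance is redacted before it is stored."""
    body = {
        "call_id": call_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "utterance": redact(utterance),
        "prev_hash": prev_hash,  # links this record to the one before it
    }
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}


first = audit_record("call-001", "offer_payment_plan", "my card is 4111 1111 1111 1111")
second = audit_record("call-001", "escalate_to_human", "I want a person", prev_hash=first["hash"])
print(first["utterance"])
```

A vendor with real deployments should be able to tell you where their equivalent of `redact` runs (before storage, before the model sees the text, or both) and who can read the unredacted audio.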

"What's the failure mode you've seen most often, and how did you fix it?"

This is my favourite question. It's almost impossible to fake a good answer.

Every team that has built and operated production AI agents has a war story. The agent that started hallucinating payment amounts under certain conditions. The edge case in the intent classifier that routed 12% of calls to the wrong queue for 48 hours before anyone noticed. The integration with a legacy CRM that returned null values and caused the agent to loop.

A team that has shipped in production will answer this with specificity and a little bit of pride. They fixed something hard. They want to tell you about it.

A team that hasn't will give you a theoretical answer about "the importance of testing" or "robust QA processes."

The Red Flag Checklist

Run every vendor through this before you sign anything.

Question	Green Flag	Red Flag
Can you show me a live production system?	Named client, live date, reference available	"We have several case studies we can share"
Who owns the code?	Your AWS account, full IP transfer	"Our platform" or "managed service" language
How long to production?	4 to 6 weeks with specifics	"It depends" with no baseline
What's your AWS architecture?	Named services, specific design decisions	"We use best-in-class cloud infrastructure"
What compliance frameworks do you cover?	Named regulations, specific architecture decisions	"We work with your compliance team"
What's your biggest production failure?	Specific story with resolution	Theoretical answer about process
What does your team look like?	Named engineers with specific skills	"We have a team of experts"
What happens post-deployment?	SLA, monitoring, named support contact	"We offer ongoing support packages"

The Staffing Model Question Nobody Asks

Here's one that almost never comes up in vendor evaluations: "Who actually builds this, and are they employees?"

A lot of "consultancies" in this space are thin layers over a contractor network. The impressive person in the sales meeting is not the person who will be building your agent. You'll get a project manager and a rotating cast of contractors who are context-switching across three other engagements.

Ask to meet the engineers before you sign. Ask for their CVs. Ask which specific team members will be on your project from week one to go-live.

If the vendor can't commit to named individuals, that's a staffing model problem that will become your delivery problem.

What Good Looks Like: A Benchmark

To give you a concrete benchmark, here's what a production agentic AI deployment in a regulated contact centre should deliver:

Containment rate: 40 to 55% of inbound contacts fully resolved by the agent without human intervention, in the first 30 days. Not 80%. Anyone promising 80% containment on day one is either working with a very narrow use case or hasn't done this before.
Average handle time reduction: 23 to 35% on contacts that do reach a human agent, because the AI agent has already gathered context, verified identity, and summarised the issue.
Time to production: 4 to 6 weeks from signed statement of work to live traffic.
Compliance audit readiness: Full CloudTrail logging, PII redaction in transcripts, call recording compliant with applicable regulations, from day one.
Escalation rate: Less than 8% of escalations should be "agent couldn't understand" failures. The rest should be genuine complex cases that require human judgement.

If a vendor's proposal doesn't include specific metrics commitments, ask why. If they say it's "too early to commit to numbers," that's not humility. That's a lack of production experience.
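If you want to hold a vendor to these numbers, the arithmetic is simple enough to check yourself. The sketch below computes containment rate and the share of escalations caused by "agent couldn't understand" failures, then compares them to the ranges above. The `PeriodStats` fields and the example counts are made up for illustration; how a vendor actually defines "resolved" or "not understood" is something to pin down in the statement of work.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    inbound_contacts: int
    resolved_by_agent: int           # fully contained, no human involved
    escalations: int
    escalations_not_understood: int  # "agent couldn't understand" failures


def containment_rate(s: PeriodStats) -> float:
    return s.resolved_by_agent / s.inbound_contacts if s.inbound_contacts else 0.0


def not_understood_share(s: PeriodStats) -> float:
    return s.escalations_not_understood / s.escalations if s.escalations else 0.0


def check_benchmark(s: PeriodStats) -> dict:
    """Compare one reporting period against the benchmark ranges in this post."""
    c = containment_rate(s)
    n = not_understood_share(s)
    return {
        "containment_rate": round(c, 3),
        "containment_in_benchmark_range": 0.40 <= c <= 0.55,
        "not_understood_share": round(n, 3),
        "not_understood_ok": n < 0.08,
    }


# Hypothetical first 30 days: 10,000 inbound contacts, 4,700 contained.
stats = PeriodStats(
    inbound_contacts=10_000,
    resolved_by_agent=4_700,
    escalations=5_300,
    escalations_not_understood=371,
)
print(check_benchmark(stats))
```

Note that a containment rate above 55% is not a failure; the flag only says whether the figure sits inside the first-30-days range quoted above.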

One More Thing

The best vendors in this space don't need you to trust them. They show you the work.

They have GitHub repositories. They have AWS architecture diagrams they'll walk you through. They have engineers who can explain, in technical detail, how their agent handles a mid-call intent change, a failed API call, or a customer who suddenly switches language.

Most AI consultancies are selling confidence. The ones worth hiring are selling evidence.

Ask the hard questions. The vendors who've actually built in production will thank you for it.

Arkadas Kilic is the founder of Rel8 CX, an AWS Advanced Partner specialising in autonomous AI agents for regulated contact centres. Rel8 CX builds production systems, not prototypes, in 4 to 6 weeks.
