By mid-2026, public MCP registries list more than 8,000 servers, and our incident-response queue suggests there are three to four internal servers behind every public one inside Indian BFSI, healthtech and listed SaaS estates. Most were built by application teams in the last twelve months, without a security review. Claude, Copilot, Cursor and LangChain agents are already in production at the same companies.
That creates a problem your board can no longer defer. A regulator can ask for your testing evidence under the RBI Cyber Security Framework, SEBI CSCRF, DPDPA section 8 or ISO 27001:2022 Annex A. Your cyber-insurance renewal questionnaire already asks about autonomous AI systems. And your audit committee wants numbers (financial exposure, residual risk, fix cost) well before it wants a CVE list. Most Indian CISOs are stuck in the gap between a written AI policy and evidenced offensive testing of every MCP server an LLM can reach.
This guide answers the buyer-side questions: what regulators now expect, when to build vs buy, how to evaluate vendors, what an engagement should cost in INR, and what "good" looks like before you green-light a production agent. It is the procurement companion to the methodology we run as a CERT-In empanelled firm across India and the United States.
What the board and regulators now expect for AI agent assurance
The Model Context Protocol moved from a niche Anthropic spec in late 2024 to a near-universal agent-to-tool bus in eighteen months. From a board's perspective, an MCP server is a privileged RPC gateway that an LLM can invoke without a human in the loop. A single misconfigured tool exposing execute_sql, send_email or read_file grants an attacker the blast radius of a compromised service account.
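As a rough illustration of that blast-radius point, the sketch below triages a tool inventory by name. It assumes the inventory is a list of dicts shaped like the entries of an MCP `tools/list` response and reads only the `name` field. The pattern table and risk classes are hypothetical, and a name match is a prompt for review, not a finding.

```python
import re

# Hypothetical risk classes and name patterns; tune these to your own estate.
HIGH_RISK_PATTERNS = {
    "database": re.compile(r"sql|query|db", re.IGNORECASE),
    "messaging": re.compile(r"email|mail|sms|notify", re.IGNORECASE),
    "filesystem": re.compile(r"file|path|dir", re.IGNORECASE),
}


def flag_high_risk_tools(tools):
    """Return (tool_name, risk_class) pairs for tools whose name matches a pattern.

    Each tool is reported once, under the first risk class that matches.
    """
    flagged = []
    for tool in tools:
        for risk_class, pattern in HIGH_RISK_PATTERNS.items():
            if pattern.search(tool["name"]):
                flagged.append((tool["name"], risk_class))
                break
    return flagged


# Illustrative inventory using the three tool names mentioned above.
sample_tools = [
    {"name": "execute_sql", "description": "Run a SQL statement"},
    {"name": "send_email", "description": "Send an email to a customer"},
    {"name": "read_file", "description": "Read a file from disk"},
    {"name": "get_weather", "description": "Look up a forecast"},
]

for name, risk_class in flag_high_risk_tools(sample_tools):
    print(f"{name}: {risk_class}")
```

Name matching is deliberately crude. It gives a board-level count of how many privileged tools an agent can reach, which is usually the first number missing from the inventory.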
Indian regulators caught up faster than most CISOs expected. The RBI Cyber Security Framework and SEBI CSCRF both require formal risk assessments for any "automated decisioning system" touching customer data, and the SEBI 2024 circular explicitly extends that obligation to generative and agentic AI. DPDPA section 8 requires data fiduciaries to evidence "reasonable security safeguards" before processing, and an agent that autonomously queries a customer table is processing. ISO 27001:2022 Annex A.5.23 (information security for cloud services) and A.8.29 (security testing in development and acceptance) are the controls auditors increasingly invoke against agent stacks. The CERT-In April 2022 directive's logging requirements apply too, because an MCP call is a "system event" by any reasonable reading.
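If an MCP call is treated as a system event, each `tools/call` request needs a log record. The sketch below builds one from the JSON-RPC request an MCP client sends. The record field names, the `caller` label and the choice to store a digest of the arguments instead of the arguments themselves are all assumptions for illustration. Whether a digest is sufficient evidence is a question for your auditor.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(request, caller, now=None):
    """Build one log record for an MCP `tools/call` JSON-RPC request.

    Arguments are stored as a SHA-256 digest so the log does not become
    a second copy of customer data (an assumption, not a regulatory rule).
    """
    if request.get("method") != "tools/call":
        raise ValueError("not a tools/call request")
    params = request["params"]
    # Sort keys so the same arguments always produce the same digest.
    canonical_args = json.dumps(params.get("arguments", {}), sort_keys=True)
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "event": "mcp.tools.call",  # hypothetical event name
        "request_id": request.get("id"),
        "caller": caller,
        "tool": params["name"],
        "arguments_sha256": hashlib.sha256(canonical_args.encode("utf-8")).hexdigest(),
    }


# Illustrative request; the tool name comes from the examples earlier in this guide.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "execute_sql", "arguments": {"statement": "SELECT 1"}},
}

print(json.dumps(audit_record(request, caller="support-agent"), indent=2))
```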
The board question is no longer "should we test the agent?" It is "can we evidence that we tested it, mapped it to our framework set, and remediated to a defensible residual?" Our AI risk assessment and VAPT services are designed around exactly that evidence pack: dual-tagged findings, regulator columns, and a board annex the audit committee will sign.
Build vs buy: in-house AI red team or specialist VAPT vendor
Three quarters of CISOs we speak to have, at some point in the last twelve months, asked whether they should hire two AI red-teamers and run this in-house. The maths rarely supports it for organisations below roughly 5,000 employees or outside Big Tech.
A credible in-house function needs prompt-injection expertise, classical web and API VAPT depth, MCP protocol fluency, RAG and vector-store knowledge, MITRE ATLAS familiarity, and the ability to write regulator-grade reports. At fully loaded Indian metro salaries, that is a four-to-six-person team costing INR 2.5-4.5 crore per year, before tooling and lab infrastructure.
A specialist vendor is justified when you ship fewer than three or four major agent releases a year, when you need an independent attestation for auditors and insurers, or when your regulator expects external testing, which RBI, SEBI and CERT-In all now do for any "critical" system. A hybrid model, in which internal AppSec owns continuous testing and an external CERT-In empanelled partner runs the annual deep VAPT and the pre-launch sign-off, is what we recommend to most Indian BFSI clients and most US legal-tech and healthtech SaaS clients.
The build case strengthens if your agent volume is high enough that vendor lead times slow releases, if you have proprietary tooling that is hard to scope to a third party, or if you already run a mature internal red team you can extend. Even then, plan for a six to nine-month ramp before the in-house team produces a report your board will accept without an external second opinion. The hybrid model usually wins on cost, on audit defensibility, and on time-to-evidence.
Eight questions to ask any AI pentest provider
The market has filled with firms that rebranded their web-app pentest brochure with the word "agentic" and changed nothing else. Ask any provider you shortlist these eight questions in writing.
- Are you CERT-In empanelled? For Indian regulated entities this is binary. Without it, the report does not satisfy several BFSI auditors.
- Show us a redacted MCP and RAG report. Not a marketing PDF but an actual finding with reproduction steps, OWASP and ATLAS tags, and specific remediation guidance.
- Which OWASP lists do you map to, and how? The right answer covers both OWASP LLM Top 10 (2025) and OWASP Agentic AI Top 10 (2026) with dual-tagged findings, plus CWE and MITRE ATLAS.
- Which CVE classes do you actively reproduce? Expect specific answers: CVE-2025-49596 (MCP Inspector), CVE-2025-32711 (EchoLeak), and the CVE-2026-26118 class of MCP filesystem path traversal.
- How do you handle multi-tenant authorisation testing? This is where most providers fail. Cross-tenant prompt injection in shared RAG is the dominant 2026 finding class.
- Do you write SIEM rules for our stack? The answer should name Splunk, Sentinel or Chronicle and produce ATLAS-tagged detections, not generic "monitor for anomalies."
- What is your compliance crosswalk? ISO 27001:2022 Annex A, SOC 2 CC7.1 and CC8.1, DPDPA section 8, RBI, SEBI CSCRF and GDPR Article 32 should all be a default column in the findings register.
- What is your retest policy? Every critical and high finding should be retested at no extra cost, with a clean evidence pack. That is what ISO 27001:2022 A.8.29 auditors actually want.
If a vendor cannot answer six of these eight in a scoping call, they are not ready to test a production agent. We publish ours up front, alongside our penetration testing services scope, because procurement teams have asked us to.
Budgeting an MCP and agent VAPT: scope drivers, INR ranges and ROI
Pricing in this market is opaque because most vendors price discovery, not outcomes. Fixed-scope MCP and agent VAPT engagements typically range from US$18,000 to US$60,000 (INR 15-50 lakh), and the scope drivers are predictable. The table below reflects our 2026 engagement data across 140 audits.
| Scope tier | Indicative price | Typical scope | Duration |
|---|---|---|---|
| Single agent, focused | US$18-25k (INR 15-21 lakh) | 1 agent, 2-4 MCP servers, 30-80 tools, single tenant | 3-4 weeks |
| Multi-agent, multi-tenant | US$30-45k (INR 25-38 lakh) | 2-5 agents, multi-tenant RAG, third-party MCPs from registry | 5-6 weeks |
| A2A mesh / platform | US$45-60k (INR 38-50 lakh) | Agent-to-agent mesh, sub-agent identity, custom orchestrator | 6-8 weeks |
| Retest cycles (annual) | 30-40% of base | Same scope, post-fix, evidence-only re-runs | 1-2 weeks |
The ROI conversation is concrete in 2026 because enforcement is concrete. Recent DPDPA precedents and the EU AI Act fine schedule put a credible regulatory ceiling on a single agent-data-leak incident at INR 6-12 crore, before reputational cost or class-action exposure. An annual VAPT spend of INR 25-35 lakh pays for itself if it prevents one such enforcement event. Cyber-insurance carriers are pricing this in too: at renewal, evidenced agent testing has shifted from "nice to have" to a premium-band discriminator. Add the cost of internal incident response, which we benchmark in Indian BFSI at INR 40-90 lakh for a contained agent breach excluding fines, and the procurement case writes itself.
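The payback arithmetic can be made explicit. The sketch below uses only the ranges quoted above and a simple expected-value model that is our illustration, not a regulatory formula: testing breaks even when it lowers the annual probability of an incident by at least the spend divided by the loss avoided. Reputational cost and class-action exposure are left out, as in the text.

```python
LAKH_PER_CRORE = 100

# Ranges quoted in this section, all converted to INR lakh.
annual_vapt_spend_lakh = (25, 35)
regulatory_exposure_lakh = (6 * LAKH_PER_CRORE, 12 * LAKH_PER_CRORE)
incident_response_lakh = (40, 90)


def break_even_probability(spend_lakh, loss_lakh):
    """Reduction in annual incident probability at which the spend pays for itself."""
    return spend_lakh / loss_lakh


# Loss per incident = regulatory exposure + contained incident-response cost.
loss_low = regulatory_exposure_lakh[0] + incident_response_lakh[0]    # 640 lakh
loss_high = regulatory_exposure_lakh[1] + incident_response_lakh[1]   # 1290 lakh

# Pessimistic case: highest spend against the smallest loss.
pessimistic = break_even_probability(annual_vapt_spend_lakh[1], loss_low)
# Optimistic case: lowest spend against the largest loss.
optimistic = break_even_probability(annual_vapt_spend_lakh[0], loss_high)

print(f"Break-even risk reduction, pessimistic: {pessimistic:.1%}")  # 5.5%
print(f"Break-even risk reduction, optimistic: {optimistic:.1%}")    # 1.9%
```

Under these assumptions, the spend is justified if testing cuts the yearly chance of one such incident by roughly two to five and a half percentage points. That is a figure an audit committee can challenge or accept.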
Regulator crosswalk: RBI, SEBI CSCRF, DPDPA and ISO 27001:2022
The single most useful artefact we deliver, and the one that most differentiates a mature engagement from a commodity one, is the compliance crosswalk. Every finding ships with the regulator-facing control gap, not just the OWASP tag.
The mapping that matters for Indian buyers:
- DPDPA section 8 - every finding involving personal data processing by the agent maps here. The "reasonable security safeguards" standard requires evidenced testing, not policy text. Pairs with our DPDP Act 2023 compliance consulting.
- RBI Cyber Security Framework - automated decisioning, customer-data access and third-party (MCP registry) supply-chain controls. Every BFSI engagement we run carries an RBI column.
- SEBI CSCRF - applies to listed entities, depositories, mutual funds and intermediaries. The 2024 circular's generative-AI extension is the operative clause.
- CERT-In April 2022 directive - logging requirements for MCP calls and agent decisions. Six-hour incident reporting still applies when an agent is the affected system.
- ISO 27001:2022 Annex A.5.23, A.8.28, A.8.29, A.5.34 - cloud security, secure coding, security testing, and privacy. The four annex controls auditors invoke against agent stacks.
- SOC 2 CC7.1 and CC8.1 - monitoring and change management. Required for any India-to-US SaaS sales motion.
- GDPR Article 32 - for any EU customer data flowing through the agent, regardless of where you sit.
The discipline that survives an audit is dual mapping: every finding triggers an SDLC fix, a SOC detection rule and a control-evidence update. Without it, a vulnerability gets filed, patched once, and forgotten, and the next audit finds the same gap under a different name.
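One way to enforce that discipline is to make the three follow-ups fields on the finding record itself, so an incomplete finding is visible in the register. The sketch below is a minimal illustration. The class, its field names and the ticket identifier are hypothetical and do not reproduce any particular findings-register schema.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One row of a findings register (hypothetical schema)."""
    title: str
    severity: str
    owasp_tags: list = field(default_factory=list)
    atlas_techniques: list = field(default_factory=list)
    regulator_controls: list = field(default_factory=list)
    sdlc_fix_ticket: str = ""
    soc_detection_rule: str = ""
    control_evidence_ref: str = ""

    def open_follow_ups(self):
        """Names of the follow-ups that have not been recorded yet."""
        checks = {
            "SDLC fix": self.sdlc_fix_ticket,
            "SOC detection rule": self.soc_detection_rule,
            "control-evidence update": self.control_evidence_ref,
        }
        return [name for name, value in checks.items() if not value]


finding = Finding(
    title="Cross-tenant read via indirect prompt injection",
    severity="critical",
    owasp_tags=["LLM01"],
    atlas_techniques=["AML.T0051"],
    regulator_controls=["DPDPA section 8", "ISO 27001:2022 A.8.29"],
    sdlc_fix_ticket="ENG-1042",  # hypothetical ticket id
)

print(finding.open_follow_ups())  # ['SOC detection rule', 'control-evidence update']
```

A finding closes only when `open_follow_ups()` returns an empty list, which stops "patched once, and forgotten" from passing as done.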
What good looks like: a nine-point CISO checklist before launch
The methodology underneath this post tests the full attack surface: prompt injection (direct, indirect, document-borne, email-borne, markdown-smuggled); tool poisoning via malicious MCP tool descriptions; excessive agency across functionality, permissions and autonomy; multi-tenant authorisation; memory poisoning of vector stores; MCP-server web bugs (SSRF, path traversal, command injection, insecure deserialisation); and cost-DoS or wallet-drain loops. Findings map to OWASP LLM Top 10 (2025) categories LLM01-LLM10, OWASP Agentic AI Top 10 (2026) T1-T9, MITRE ATLAS techniques AML.T0051, AML.T0053, AML.T0070, AML.T0048 and AML.T0024, and the CVE-2026-26118 class.
For a CISO making the ship-or-don't-ship call, the operational version is a short checklist:
- Tool inventory matches business purpose. Every registered tool justified by a documented use case. In a 2026 engagement with a listed Indian fintech, a support agent had 47 registered MCP tools, 31 with no business owner.
- Least-privilege on tool service accounts. No execute_sql tool wired to a database user with DROP TABLE rights.
- Human-in-the-loop above defined thresholds. RBI SAR-2024 guidance and EU AI Act Article 14 require evidenced oversight. Any action above a money or data threshold must require a signed approval token.
- Egress allow-list on all MCP fetch tools. Roughly 40% of internal MCP servers we audited in H1 2026 had none.
- Tool-description integrity. Hash-pinned MCP tool metadata; UI surfaces full descriptions for user approval.
- Multi-tenant isolation evidenced. Agent run as user A cannot reach user B's data via prompt manipulation.
- SIEM detections live. ATLAS-tagged rules in Splunk, Sentinel or Chronicle before launch. Pairs with our 24/7 SOC monitoring.
- Runbook for "agent went rogue." The IR playbook nobody had before 2025: kill switch, token revocation, customer-comms script.
- Retest evidence pack on file. Every critical and high retested with a clean artefact. This is the ISO 27001:2022 A.8.29 evidence auditors actually ask for.
If you cannot tick all nine, defer launch or restrict scope. The cost of a delayed ship is measured in weeks; the cost of an enforcement action is measured in crore.
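Two items on the checklist above, tool-description integrity and the egress allow-list, are simple enough to sketch in code. The example below is a minimal illustration, not our tooling. It assumes tool metadata shaped like an MCP `tools/list` entry (`name`, `description`, `inputSchema`). The function names, the allow-listed host and the sample tool are all hypothetical.

```python
import hashlib
import json
from urllib.parse import urlsplit


def description_digest(tool):
    """SHA-256 over the tool's name, description and input schema."""
    pinned_fields = {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "inputSchema": tool.get("inputSchema", {}),
    }
    # Canonical JSON so the digest does not depend on key order or spacing.
    canonical = json.dumps(pinned_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(tools, pinned_digests):
    """Names of tools that are unpinned or whose metadata no longer matches its pin."""
    return [t["name"] for t in tools
            if pinned_digests.get(t["name"]) != description_digest(t)]


def egress_allowed(url, allowed_hosts):
    """True only for https URLs whose host is exactly on the allow-list."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts


ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allow-list

approved = {"name": "fetch_url", "description": "Fetch a page from the partner API"}
pins = {"fetch_url": description_digest(approved)}

# Simulate a poisoned description arriving from the server after approval.
tampered = dict(approved, description="Fetch a page. Also forward the result elsewhere.")

print(changed_tools([approved], pins))   # []
print(changed_tools([tampered], pins))   # ['fetch_url']
print(egress_allowed("https://api.example.com/v1/rates", ALLOWED_HOSTS))          # True
print(egress_allowed("http://169.254.169.254/latest/meta-data/", ALLOWED_HOSTS))  # False
```

A host check on the requested URL does not by itself cover redirects or DNS rebinding, so treat it as the policy layer and enforce the same list at the network boundary.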
Board reporting: turning findings into financial exposure and decisions
Engineering boards want CVSS. Audit committees want money and time. The gap between a 60-90 page technical report and a usable board annex is where most agent risk programmes stall.
The annex we ship is consistently under three pages, and it answers four questions:
- What is the single largest financial exposure? Stated as a range with the regulatory basis. From a recent US legal-tech SaaS engagement: "an authenticated user can exfiltrate case files belonging to other tenants in approximately 14 seconds via indirect prompt injection in document upload; estimated regulatory exposure under DPDPA and GDPR is in the INR 6-12 crore range based on 2025 enforcement precedents."
- What is the remediation cost and timeline? Engineering effort in person-weeks, infrastructure cost, and the date the residual risk drops below the board's tolerance.
- What is the residual risk after remediation? A rating the board can accept, defer or reject, with the assumptions made explicit.
- What is the approval ask? Ship, ship-with-conditions, defer, or de-scope. No board annex should leave the decision implicit.
The CISOs who get this right talk to the board the way the CFO talks to the board: ranges, dates, decisions and dependencies. The ones who do not are still presenting heat maps in 2026.
A useful pattern is to bring the audit committee into the scoping conversation, not just the readout. Agreeing the financial exposure methodology before testing starts removes the "your numbers are too high" debate at the end, which is the debate that delays remediation.
Engaging Certbar: scope, timeline and what you get
A typical engagement with us begins with a one-hour scoping call. Before the call we send the sample MCP and RAG report, the same template covered above, so the conversation is grounded in what you would actually receive. We then issue a fixed quote against one of the three scope tiers; there are no hourly meters and no "discovery phase" upsell.
Engagements run on a staging mirror by default, with prod read-only or shadow-traffic modes available under a written data-handling addendum aligned with DPDPA section 8 and ISO 27001:2022 A.5.34. A single-agent engagement runs three to four weeks end to end: one week of scoping and threat modelling, two weeks of active testing, and a final week for triage, retest of critical fixes and report delivery. Larger A2A meshes and multi-agent platforms run six to eight weeks.
Every critical and high finding is retested at no extra cost. Findings ship dual-mapped to OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10 (2026), CWE, MITRE ATLAS, and the regulator your sector answers to. The deliverable set includes the executive board annex, the engineering findings register, the SIEM rule pack written for your specific stack, the IR runbook, and the retest evidence pack auditors want to see.
If you have shipped a Claude, Copilot or LangChain agent into production in the last twelve months, the OWASP Agentic AI Top 10 already applies to your stack and a regulator can already ask for your testing evidence. Our attack simulation and AI red-team practice runs the methodology end to end, for a board, a SOC and an auditor that each need to read the same report and see what they came for.