How is your chatbot doing?

9senses
Chatbot Audit

We have all come across AI-powered chatbots that are supposed to help us - they seem to be everywhere by now. Sometimes they work surprisingly well, delivering answers within a few seconds and saving us from spending half an hour in a call center queue or from navigating countless subpages to find the information we are looking for. On the other hand, we are often stuck with an "AI assistant" that does not understand what we want, sends us into loops of irrelevant questions and answers, and leaves us feeling that we have just spent our valuable time with a solution that could not care less about our problems.

This is why 9senses created its Chatbot Audit. Our Level 1 audit is a standardized assessment tool to measure the performance of your AI-powered Chatbot and to compare it to best-in-class bots. Using a carefully developed and tested methodology within the 9senses AI Audit Framework, we review your bot and provide an assessment of its performance and where it can be improved. Delivery time for this black-box review is only 5 business days.

Please click to read a sample audit

Level 1 Audit

An independent, external review of your chatbot's observable performance (black-box). No internal access to your technical environment required. Delivered within 5 business days.

→ Scroll down to buy on this page

Level 2 Audit

A tailored deep-dive analysis, e.g. of technical architecture, retrieval systems, governance, compliance, and business value. Take the next step if critical issues are identified in Level 1.

→ Contact us to learn more


9senses Chatbot Audit (Level 1)

Image by Mohamed Nohassi on unsplash.com

The 9senses Chatbot Audit (Level 1) provides you with a first thorough review of the performance of your AI chatbot from a user perspective. The black-box review is based on an analysis of the following elements, with recommendations for improving your bot:

Response Quality
Speed
User Interface
Dialog Quality
Multilingual abilities (optional)

Additionally, we provide a first outside-in perspective on Business Value, Compliance and Ethics.

Please have a look at our FAQs for more details on our Chatbot Audit.

Select Chatbot Options

CHF599.00

Use case creation (+CHF200.00)

Select this option if you want us to develop 3-4 relevant use cases

Open search bot (+CHF150.00)

Select this option if your bot collects information from external sites or search engines

Internal Bot (+CHF150.00)

Select this option if your bot requires a login (e.g. for customers or employees)

Translation test (+CHF100.00)

Verify bot translations in one foreign language (supported: English, German, French, Italian, Spanish, Dutch)

Executive briefing (+CHF240.00)

30-minute conference call to explain and discuss findings

Base price:	CHF599.00

Total price excluding tax
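
As a rough illustration of how the order total is composed, the following sketch adds the selected option prices to the base price. The prices are taken from the list above; the option keys and the function name are hypothetical and are not part of the 9senses shop.

```python
# Prices in CHF, excluding tax, as listed on this page.
BASE_PRICE_CHF = 599
OPTION_PRICES_CHF = {
    "use_case_creation": 200,   # hypothetical key names
    "open_search_bot": 150,
    "internal_bot": 150,
    "translation_test": 100,
    "executive_briefing": 240,
}


def order_total(selected: set[str]) -> int:
    """Return the base price plus the price of every selected option."""
    unknown = selected - set(OPTION_PRICES_CHF)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    return BASE_PRICE_CHF + sum(OPTION_PRICES_CHF[name] for name in selected)


print(order_total({"use_case_creation", "executive_briefing"}))  # 1039
```

With no options selected, the function simply returns the base price of CHF 599.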

FAQs - Frequently Asked Questions

What is a Chatbot Audit?

A chatbot audit is a structured evaluation of a chatbot’s behavior. It can be performed as a black-box audit (Level 1) that evaluates only observable behavior, or as an open-box audit (Level 2) that evaluates value, structures, and behavior based on detailed insights.

Why is a Chatbot Audit important?

Chatbots are often the first point of contact with prospects and customers and directly influence customer experience, brand perception, and operational efficiency. A Chatbot Audit identifies weaknesses before they create reputational or business risk. It provides a structured performance baseline and highlights optimization potential in containment, user guidance, and governance readiness.

Who should consider a Chatbot Audit?

Organizations using AI chatbots for customer service, sales, onboarding, support or internal employee guidance should consider a Chatbot Audit.

What does the 9senses Level 1 Chatbot Audit include?

The Level 1 Chatbot Audit includes structured use case testing, hallucination stress scenarios, redirection behavior analysis, dialog flow observation, and (optionally) multilingual consistency checks.

Performance is evaluated across response quality (50% weight), speed (20%), user interface (15%), and dialog quality (15%), resulting in a weighted overall performance score.
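
To make the weighting concrete, here is a minimal sketch of a weighted overall score. The weights are the ones stated above, and the 1-5 range follows the scale described in the methodology answer. The dictionary keys, the function name, the sample scores, and the rounding to two decimals are assumptions for illustration, not the 9senses implementation.

```python
# Weights as stated in this FAQ; the key names are illustrative.
WEIGHTS = {
    "response_quality": 0.50,
    "speed": 0.20,
    "user_interface": 0.15,
    "dialog_quality": 0.15,
}


def overall_score(scores: dict[str, float]) -> float:
    """Return the weighted average of the four dimension scores (1-5 scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("Exactly the four audit dimensions must be scored")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    # Rounding to two decimals is an assumption made for readability.
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)


# Made-up sample scores, not results from a real audit.
example = {
    "response_quality": 4.0,
    "speed": 5.0,
    "user_interface": 3.0,
    "dialog_quality": 4.0,
}
print(overall_score(example))  # 4.05
```

Because response quality carries half of the weight, a one-point change there moves the overall score by 0.5, while a one-point change in user interface or dialog quality moves it by only 0.15.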

Click here to see a sample audit.

What methodology is used in the Level 1 Chatbot Audit?

The audit follows the 9senses AI Auditing Framework. It applies real-world functional testing, edge-case scenarios (e.g., ambiguous or invalid inputs), hallucination stress tests, and consistency checks.

Each dimension is scored on a standardized 1–5 scale and aggregated into an overall performance index to ensure comparability and objectivity.

Please click here to see a sample report with a methodology explanation.

How does the Chatbot Audit detect hallucinations?

Hallucinations, the creation of invalid or invented content, pose a serious reputational and compliance risk. The audit includes targeted hallucination stress testing.

We deliberately introduce invalid references, misspellings, and ambiguous prompts to assess whether the chatbot fabricates information or requests clarification. The evaluation examines entity validation, grounding behavior, and escalation logic.
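
The idea of a stress probe can be sketched in a few lines. The example below is purely illustrative and is not the 9senses test procedure, which this page does not describe in detail: the probe wording, the made-up "X9000" model name, the marker phrases, and all function names are assumptions. It builds one probe per category (invalid reference, misspelling, ambiguous prompt) and applies a crude keyword heuristic to decide whether a reply asks back instead of inventing an answer.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    kind: str    # "invalid_reference", "misspelling" or "ambiguous"
    prompt: str


def build_probes(product: str) -> list[Probe]:
    """Create one probe per stress category for a known product name."""
    # Swap the first two characters to simulate a typing error.
    typo = product[1] + product[0] + product[2:] if len(product) > 1 else product
    return [
        # "X9000" is a made-up model that should not exist.
        Probe("invalid_reference", f"What does the {product} X9000 cost?"),
        Probe("misspelling", f"Tell me about the {typo}."),
        Probe("ambiguous", "Can you change it for me?"),
    ]


# Hypothetical phrases suggesting the bot asked back or admitted a gap.
SAFE_MARKERS = ("did you mean", "could you clarify", "which one",
                "cannot find", "can't find")


def looks_safe(reply: str) -> bool:
    """Rough heuristic: True if the reply asks for clarification or declines."""
    text = reply.lower()
    return any(marker in text for marker in SAFE_MARKERS)


for probe in build_probes("Router"):
    print(probe.kind, "->", probe.prompt)
print(looks_safe("Did you mean the Router?"))        # True
print(looks_safe("The Router X9000 costs CHF 49."))  # False
```

A keyword check like this can only pre-sort replies. Whether an answer is actually fabricated still has to be judged by a human reviewer against the real product data.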

Does the Chatbot Audit assess compliance (EU AI Act, GDPR)?

Level 1 includes a preliminary external review of observable compliance and transparency indicators (AI disclosure, GDPR-relevant UI elements, accessibility).

A full regulatory and governance analysis - including documentation and architecture review - can be part of a tailored Level 2 Chatbot Audit.

Is the Level 1 Chatbot Audit a technical review?

No. The Level 1 Chatbot Audit is a behavioral (black-box) evaluation. It assesses publicly observable system behavior from a user and governance perspective without reviewing internal architecture, training data, retrieval systems, or security infrastructure.

Some technical assumptions that can be inferred from observed behavior will be shared. Technical deep dives can be included in tailored Level 2 Audits.

What information is required to conduct a Chatbot Audit?

For Level 1, we primarily require access to the live chatbot interface and a briefing on usage context (e.g., intended business objectives, target audience, supported languages). No internal system documentation or technical configuration access is required for the behavioral audit.

If you do not book the Use Case Creation option, we will provide you with access to a form to enter your own use cases. If you select the Use Case Creation option, we will define use cases based on your basic briefing and send them to you for review before executing the audit.

How long does a Chatbot Audit take?

The Level 1 Chatbot Audit is completed within five business days after we have received your briefing and (if required) access information.

Should you have booked the Use Case Creation Option, please allow for 2 additional business days for the delivery of our use case suggestions.

How do you ensure confidentiality?

All audit work and deliverables are handled under confidentiality. Reports and findings are shared only with the customer. Numeric audit results are used for best-in-class benchmarking.

What Chatbot Audit options can I book?

The 9senses Level 1 Chatbot Audit can be tailored to your chatbot’s setup and business context. In addition to the core behavioral audit, you can select the following options:

Use Case Creation
In the basic version, we will ask you to define use cases internally. If you select this option, we define 3–5 realistic, business-relevant test scenarios aligned with your customer journeys (e.g., service booking, product comparison, account support). Select this option if you do not yet have structured evaluation cases.
Open Search Bot Review
This is needed if your chatbot retrieves information from external websites or search engines. This option evaluates source consistency, grounding behavior, and increased hallucination exposure.
Internal / Login-Based Bot Review
For chatbots accessible only behind login (e.g., customer portals or employee systems). This ensures evaluation within authenticated environments and role-specific flows.
Translation & Multilingual Testing
Includes structured verification of one additional language, focusing on consistency, language switching behavior, and translation accuracy.
Executive Briefing
A structured management-level walkthrough of findings. We translate audit results into decision priorities, risk implications, and concrete next steps.

These options allow you to adapt the Chatbot Audit to your technical architecture, risk exposure, and governance requirements.

What is the difference between Level 1 and Level 2 Chatbot Audits?

Level 1 is an external behavioral audit focused on user-facing performance and experience.

Level 2 is an open-box analysis tailored to your needs or concerns. For example, it can cover your bot's technical architecture, retrieval systems (RAG), governance, compliance, risk management, and business value modeling.

When should I consider a Level 2 Chatbot Audit?

A Level 2 Audit should be considered when the chatbot scores below 3.5 (on the 1–5 scale) in response or dialog quality and is prominently visible on your website.

Equally, a Level 2 audit is warranted if the bot has an important role in a closed user group context (e.g. customers or employees) with associated business risk.
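
The two criteria above can be read as a simple decision rule. The sketch below encodes that reading; the 3.5 threshold comes from this FAQ, while the function and parameter names are hypothetical and the rule is a simplification of what would in practice be a judgment call.

```python
LEVEL2_THRESHOLD = 3.5  # threshold stated in this FAQ (1-5 scale)


def level2_recommended(
    response_quality: float,
    dialog_quality: float,
    prominently_visible: bool,
    closed_group_with_business_risk: bool,
) -> bool:
    """Return True if either of the two Level 2 criteria applies."""
    low_quality = (
        response_quality < LEVEL2_THRESHOLD or dialog_quality < LEVEL2_THRESHOLD
    )
    # Criterion 1: weak quality score on a prominently visible bot.
    # Criterion 2: important role in a closed user group with business risk.
    return (low_quality and prominently_visible) or closed_group_with_business_risk


print(level2_recommended(3.2, 4.0, True, False))   # True
print(level2_recommended(4.2, 4.0, True, False))   # False
```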

Can you audit LLM-based or ChatGPT-based chatbots?

Yes. The 9senses Chatbot Audit framework applies to rule-based bots, retrieval-augmented generation (RAG) systems, and large language model-based assistants. The methodology focuses on observable performance, containment behavior, hallucination risk, and governance readiness rather than the technical foundation.