How Garanti BBVA Validated UGI Chatbot's LLMs with Co-one
- Busra Demir
- Jul 3
- 2 min read
Updated: Jul 3

Challenge: Limitations of Traditional Testing Methods for LLMs
Garanti BBVA aimed to enhance its digital customer services and improve satisfaction by moving its chatbot system, UGI, from a natural language processing (NLP)-based architecture to one powered by large language models (LLMs). For this transformation, Garanti BBVA partnered with Co-one to ensure its LLMs were tested, validated, and optimized for real-world banking use.
This technical transformation also meant redefining the accuracy and reliability of customer communication. However, traditional testing approaches proved inadequate for the dynamic and multilayered nature of LLM-based systems. The bank needed a scalable, human-centered validation process that reflected real user behavior.
"Working with Co-one helped Garanti BBVA structure the testing and validation process of our LLM-based chatbot in a more scalable and systematic way. Their layered approach from intent testing to live data evaluation allowed us to identify and address key issues before production. The collaboration contributed meaningfully to improving the reliability of our conversational AI system, which is increasingly critical in delivering accurate and trusted digital experiences in banking." Merve Çakır – Head of Digital Marketing
Solution: Co-one's Three-Layered Testing and Evaluation Approach
Co-one deployed a custom-designed, three-layered testing framework to meet this need:
Intent Testing: Each new intent was tested by Co-one’s expert team and its crowdsourcing community. Testers representing a wide range of user profiles simulated interactions, and chatbot responses were evaluated for clarity, accuracy, and satisfaction. Issues were identified and resolved before going live, boosting overall response quality.
Model Validation: Once the chatbot was deployed, continuous monitoring ensured accuracy and identified hallucinations or biases. Co-one categorized these issues by version and delivered detailed error cluster reports to the bank for early resolution.
Customer-Centered Learning: Real-time analysis of customer queries and chatbot responses enabled the detection of model errors, missing intents, and unmet user needs. This insight extended beyond model improvement to inform Garanti BBVA’s overall service strategy.
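To make the idea of error cluster reports by version more concrete, here is a minimal sketch in Python. It is only an illustration: the `Finding` record, the issue labels, and the version names are assumptions made for this example and do not describe Co-one's actual tooling or report format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reviewed chatbot response that was flagged (hypothetical schema)."""
    model_version: str  # assumed version label, e.g. "v1"
    issue_type: str     # assumed label, e.g. "hallucination" or "bias"
    query: str          # the customer query that triggered the issue


def cluster_report(findings):
    """Count flagged issues per model version and issue type."""
    report = defaultdict(Counter)
    for finding in findings:
        report[finding.model_version][finding.issue_type] += 1
    # Convert to plain dicts so the report is easy to serialize or compare.
    return {version: dict(counts) for version, counts in report.items()}


# Made-up sample data, for illustration only.
sample_findings = [
    Finding("v1", "hallucination", "What is my card limit?"),
    Finding("v1", "hallucination", "How do I reset my password?"),
    Finding("v1", "bias", "Can I apply for a loan?"),
    Finding("v2", "missing_intent", "How do I close my account?"),
]

report = cluster_report(sample_findings)
```

Grouping by version first makes it easy to see whether a given issue type grows or shrinks from one release to the next, which is the kind of signal a per-version report is meant to surface.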
Results: Garanti BBVA LLMs with Co-one: Measurable Performance Gains and Strategic Insights
- Analysis of up to 300,000 customer interactions and an average of 20,000 test data points per month
- LLM responses are evaluated based on many criteria, including accuracy of information and explanatory power
- System responses are tested in challenging scenarios, like offensive language or spelling mistakes
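As an illustration of the spelling-mistake scenario, the sketch below generates misspelled variants of a test query by swapping adjacent letters. This is one simple way to build such test data, not a description of how Co-one produces it; the function names and the sample query are assumptions made for this example.

```python
import random


def add_typo(text, rng):
    """Swap two adjacent, different letters to imitate a typing mistake."""
    positions = [
        i for i in range(len(text) - 1)
        if text[i].isalpha() and text[i + 1].isalpha() and text[i] != text[i + 1]
    ]
    if not positions:
        return text  # nothing to swap, return the text unchanged
    i = rng.choice(positions)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def typo_variants(text, n, seed=0):
    """Return n misspelled variants of text; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [add_typo(text, rng) for _ in range(n)]


# Hypothetical banking query used only as sample input.
original = "what is my card limit"
variants = typo_variants(original, 3)
```

Each variant can then be sent to the chatbot alongside the original query, and the responses compared to check that a small typo does not change the answer.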
Testing new intents and observing their effects in the live environment continues today. At Co-one, we are proud to stand by Garanti BBVA at every stage of this transformation journey.