Overview
This tutorial demonstrates how to use Ragas to evaluate the quality of your Contextual AI Retrieval Augmented Generation (RAG) agents. The purpose of this notebook is to show the flexibility of Contextual AI's platform in supporting external evaluation approaches. The approach shown here with Ragas applies similarly to other evaluation tools.

What is Ragas?
Ragas is an open-source evaluation framework designed specifically for RAG systems. It provides several important metrics to assess the quality of both the retrieval and generation components:
- Faithfulness - Measures if the generated answer is factually consistent with the retrieved context
- Context Relevancy - Evaluates if the retrieved passages are relevant to the question
- Context Recall - Checks if all information needed to answer the question is present in the context
- Answer Relevancy - Assesses if the generated answer is relevant to the question
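To make the metrics above concrete, here is a minimal sketch of a single evaluation record in the shape RAGAS consumes (field names follow Ragas's `SingleTurnSample`; the question and answer values are purely illustrative):

```python
# One evaluation record: the inputs the four metrics above are computed from.
sample = {
    "user_input": "What is the capital of France?",             # the question
    "retrieved_contexts": ["Paris is the capital of France."],  # passages returned by the RAG agent
    "response": "The capital of France is Paris.",              # the generated answer
    "reference": "Paris",                                       # ground-truth answer
}

# Roughly: faithfulness compares `response` against `retrieved_contexts`,
# context recall compares `retrieved_contexts` against `reference`, and
# answer relevancy compares `response` with `user_input`.
```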
This tutorial assumes you already have a Contextual AI Agent set up. If you haven't, please follow the Contextual AI Platform Quickstart.
Scope
This tutorial can be completed in under 30 minutes and covers:
- Setting up the Ragas evaluation environment
- Preparing evaluation datasets
- Querying Contextual AI RAG agents
- Calculating RAGAS metrics:
  - Faithfulness: Measures factual consistency with retrieved context
  - Context Recall: Evaluates completeness of retrieved information
  - Answer Accuracy: Assesses match with reference answers
- Analyzing and exporting evaluation results
Prerequisites
- Contextual AI API Key
- OpenAI API Key (for RAGAS evaluation)
- Python 3.8+
- Required dependencies (listed in requirements.txt)
Environment Setup
Before running the notebook, install the required packages below. These libraries provide tracking, evaluation, export capabilities, and LLM access.

Required Packages
- langfuse – Tracking and observability
- ragas – Core evaluation framework
- openpyxl – Enables exporting results to Excel
- openai – Provides LLM access (used internally by RAGAS)
- langchain-openai – LangChain integration with OpenAI
- langchain-contextual – Connects LangChain to Contextual AI
You may need to restart the kernel to use updated packages.
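In a notebook, the packages above can be installed in a single cell (a typical unpinned install; pin versions in requirements.txt for reproducibility):

```shell
pip install langfuse ragas openpyxl openai langchain-openai langchain-contextual
```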
Import Dependencies
With the environment ready, we can import the necessary libraries and initialize our clients. The imports are grouped for clarity and maintainability:

Import Structure
- Standard library imports — Core Python functionality
- Third-party imports — Data processing and API interaction
- RAGAS imports — Evaluation metrics and utilities
- Client initialization — Contextual AI client and the evaluator LLM
Contextual AI uses GPT-4o as the evaluator model, as high-quality evaluation depends on the model’s ability to understand nuance, interpret context, and accurately compare textual information.
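The client initialization described above can be sketched as follows. This is a hedged outline, not the notebook's exact code: the `ContextualAI` import name is taken from the Contextual AI Python SDK, the environment-variable names are assumptions, and the evaluator is wrapped for RAGAS via `LangchainLLMWrapper`:

```python
import os

def init_clients():
    """Sketch: create the Contextual AI client and the GPT-4o evaluator LLM.

    Assumes CONTEXTUAL_API_KEY and OPENAI_API_KEY are set in the environment
    (variable names are this sketch's assumption, not the tutorial's).
    """
    # Imports are deferred so the function documents the dependencies it needs.
    from contextual import ContextualAI            # Contextual AI SDK client
    from langchain_openai import ChatOpenAI        # OpenAI chat model via LangChain
    from ragas.llms import LangchainLLMWrapper     # adapts a LangChain LLM for RAGAS

    client = ContextualAI(api_key=os.environ["CONTEXTUAL_API_KEY"])
    evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
    return client, evaluator_llm
```

With both objects in hand, `client` queries your RAG agent while `evaluator_llm` is passed to the RAGAS metrics that need an LLM judge.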