Real-Time AI and RAG with Ververica
Ververica provides a framework to integrate Retrieval-Augmented Generation (RAG) and other machine learning models directly into its stream processing engine. This capability allows stateful stream processing applications written in Flink SQL to enrich data streams by querying external knowledge sources. The system can invoke Large Language Models (LLMs) to perform tasks such as data classification, transformation, and generation based on both the streaming data and the retrieved external context.
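For example, once a model has been registered (see Model DDLs below), a single Flink SQL query can invoke it on every incoming row. The following is a minimal sketch of LLM-based classification; the table and model names (`support_tickets`, `ticket_classifier`) are hypothetical, and the exact `ML_PREDICT` syntax may vary by platform version (see the ML_PREDICT reference):

```sql
-- Classify every incoming support ticket with a registered LLM.
-- `support_tickets` and `ticket_classifier` are hypothetical names.
SELECT
  ticket_id,
  ticket_text,
  category  -- output column declared in the model's OUTPUT clause
FROM ML_PREDICT(
  TABLE support_tickets,    -- streaming input table
  MODEL ticket_classifier,  -- model registered via CREATE MODEL
  DESCRIPTOR(ticket_text)   -- column mapped to the model's INPUT
);
```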
Core Concepts
- Retrieval-Augmented Generation (RAG): RAG is an architecture that improves LLM output by grounding it in documents or data from an external knowledge source. Before the LLM generates a response, a retriever component searches the knowledge source (e.g., a vector database) for relevant information; the retrieved data is then passed to the LLM as context alongside the user's original prompt.
- Vector Embeddings: To perform semantic retrieval, unstructured text is converted into numerical vectors known as embeddings, which allow semantic similarity to be compared quantitatively (e.g., as a distance between vectors).
- Vector Database: An external database, such as Milvus, that stores vector embeddings. During retrieval, the system queries this database to find vectors similar to the incoming event data.
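To make the embedding concept concrete, the sketch below registers a hypothetical embedding model and attaches a vector to each event via `ML_PREDICT`. The provider options and the `ARRAY<FLOAT>` output type are illustrative assumptions; actual option keys depend on the provider and platform version:

```sql
-- Hypothetical embedding model: maps text to a fixed-size vector.
CREATE MODEL text_embedder
INPUT (content STRING)
OUTPUT (embedding ARRAY<FLOAT>)  -- e.g., a 1536-dimensional vector
WITH (
  'provider' = 'openai',  -- option keys are illustrative placeholders
  'task' = 'embedding'
);

-- Attach an embedding to every incoming event.
SELECT event_id, content, embedding
FROM ML_PREDICT(TABLE events, MODEL text_embedder, DESCRIPTOR(content));
```

The resulting `embedding` column can then be compared against vectors stored in the external vector database during retrieval.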
AI Capabilities and SQL Reference
Ververica supports AI integration through dedicated SQL statements and functions. For detailed technical information, see the following topics:
- Model DDLs: Statements for registering, viewing, and deleting AI models (e.g., `CREATE MODEL`); a lifecycle sketch follows this list.
- ML_PREDICT: How to use the `ML_PREDICT` function for real-time AI inference.
- VECTOR_SEARCH: How to perform semantic searches using the `VECTOR_SEARCH` function.
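As a sketch of the model lifecycle covered by the DDL statements above, the block below registers, lists, and deletes a generative model. The `WITH` option keys differ by provider and platform version, so treat them as placeholders and consult the Model DDLs reference:

```sql
-- Register a generative LLM (option keys are illustrative placeholders).
CREATE MODEL support_llm
INPUT (prompt STRING)
OUTPUT (response STRING)
WITH (
  'provider' = 'openai',
  'model'    = 'gpt-4o',
  'api-key'  = '...'  -- supply via secrets management, not a literal
);

-- View registered models.
SHOW MODELS;

-- Delete a model that is no longer needed.
DROP MODEL support_llm;
```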
Architecture
The RAG workflow in Ververica is implemented as a continuous data pipeline within a Flink job; an end-to-end Flink SQL sketch follows the list:
- Ingest: The job consumes a real-time data stream.
- Embed: Data is passed to a registered embedding model to generate a vector.
- Retrieve: The job queries an external Vector DB for relevant context.
- Augment & Generate: The original data and retrieved context are sent to a generative LLM.
- Egress: The generated output is sent to a downstream sink.
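Putting the stages together, the sketch below wires ingest, embed, retrieve, augment/generate, and egress into a single pipeline, reusing the hypothetical `text_embedder` and `support_llm` models from the earlier sketches. The `VECTOR_SEARCH` argument shape shown here is an assumption for illustration; consult the VECTOR_SEARCH reference for the exact signature:

```sql
-- 1. Ingest + 2. Embed: attach a vector to each incoming event.
CREATE TEMPORARY VIEW embedded_events AS
SELECT event_id, content, embedding
FROM ML_PREDICT(TABLE events, MODEL text_embedder, DESCRIPTOR(content));

-- 3. Retrieve: top-k similar documents from the vector store.
--    (Argument order is assumed; check the VECTOR_SEARCH reference.)
CREATE TEMPORARY VIEW retrieved_context AS
SELECT event_id, content, doc_text AS context
FROM VECTOR_SEARCH(
  TABLE embedded_events,  -- query rows carrying the embedding column
  TABLE knowledge_base,   -- external vector store (e.g., Milvus) as a table
  DESCRIPTOR(embedding),  -- vector column used for similarity
  5                       -- top-k matches per event
);

-- 4. Augment: build a prompt from the event and its retrieved context.
CREATE TEMPORARY VIEW prompts AS
SELECT event_id,
       CONCAT('Context: ', context, ' Question: ', content) AS prompt
FROM retrieved_context;

-- 4b. Generate + 5. Egress: invoke the LLM and write to the sink.
INSERT INTO answers
SELECT event_id, response
FROM ML_PREDICT(TABLE prompts, MODEL support_llm, DESCRIPTOR(prompt));
```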
Use Cases
- Customer Support Chatbots: Query order history and technical documentation to provide specific answers.
- Real-Time Recommendation Engines: Retrieve semantically similar products based on user activity.
- Financial Anomaly Detection: Enrich transactions with historical data to identify fraudulent patterns.
- Log Analysis: Retrieve relevant troubleshooting steps from a knowledge base when errors are detected.