Ecosystem: Streamhouse
Streamhouse complements Ververica's streaming-first architecture by supporting both batch and streaming workloads in one integrated system. By combining real-time data processing with lakehouse storage, Streamhouse helps organizations build scalable, efficient data pipelines for AI, ML, and analytics workloads, all within the Ververica ecosystem.
Streamhouse provides fresh data quickly while remaining cost-effective and easy for engineers to implement.
How Streamhouse Fits into Ververica's Ecosystem
Streamhouse strengthens the Ververica ecosystem by providing a single, integrated platform for managing streaming and batch data efficiently and at scale, in line with the broader goals of real-time data processing and analytics. It reduces the complexity of real-time data processing by integrating Apache Flink® with high-performance storage systems. The Ververica Unified Streaming Data Platform is the first and only platform that supports both real-time streaming and Streamhouse.
Ververica developed Streamhouse by optimizing Flink and introducing Apache Flink CDC and Apache Paimon. It is powered by these three open-source software technologies:
- Apache Flink CDC handles streaming data ingestion using Change Data Capture (CDC).
- Apache Flink serves as the computation engine for unified batch and stream processing.
- Apache Paimon provides unified Lakehouse storage for batch, OLAP, and streaming queries.
Streamhouse combines Flink, Flink CDC, and Paimon so that Flink-powered applications can ingest, process, and store data in a single pipeline, opening up new real-time data processing use cases within the Ververica ecosystem.
Key Features
The key Streamhouse features and capabilities leveraged in the Ververica ecosystem include:
CDC Data Ingestion for Lakehouse Integration
Streamhouse suits organizations with strong CDC ingestion requirements, offering a reliable way to bring database changes into data lakes.
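To make the idea of CDC ingestion concrete, the following is a minimal, in-memory sketch of how a stream of change events can be applied, in order, to a table keyed by primary key. The op codes are modeled on Flink's changelog row kinds (`+I` insert, `-U` update-before, `+U` update-after, `-D` delete); the tuple event format and the `apply_changelog` helper are hypothetical and are not Flink CDC or Paimon APIs.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event format: (op, primary_key, row). Op codes mirror Flink's
# changelog row kinds: "+I" insert, "-U" update-before, "+U" update-after, "-D" delete.
Event = Tuple[str, int, dict]


def apply_changelog(table: Dict[int, dict], events: Iterable[Event]) -> Dict[int, dict]:
    """Apply CDC events in order to an in-memory primary-key table (illustrative only)."""
    for op, key, row in events:
        if op in ("+I", "+U"):
            # Insert or update-after: upsert the latest version of the row.
            table[key] = row
        elif op == "-D":
            # Delete: drop the row if it exists.
            table.pop(key, None)
        elif op == "-U":
            # Update-before carries the old row; an upsert table can skip it
            # because the matching "+U" overwrites the same key.
            continue
        else:
            raise ValueError(f"unknown op: {op}")
    return table


# Changes captured from a hypothetical source database table.
events = [
    ("+I", 1, {"id": 1, "status": "created"}),
    ("+I", 2, {"id": 2, "status": "created"}),
    ("-U", 1, {"id": 1, "status": "created"}),
    ("+U", 1, {"id": 1, "status": "shipped"}),
    ("-D", 2, {"id": 2, "status": "created"}),
]

print(apply_changelog({}, events))
# {1: {'id': 1, 'status': 'shipped'}}
```

Because the table always holds the latest row per key, downstream readers see the current state of the source database rather than a raw log of changes.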
Cost-Effective Processing for Near-Real-Time SLAs
With data latency of about one minute, Streamhouse is a cost-effective alternative for scenarios that do not require strict real-time SLAs.
Fresh Data Availability for Up-to-Date Lakehouses
By refreshing lakehouse data at roughly 1-minute intervals, Streamhouse keeps data consistently fresh.
Efficient Incremental Updates
Streamhouse simplifies incremental updates and streaming materialized views: results of predefined queries are updated as new data arrives instead of being recalculated from scratch.
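The difference between incremental maintenance and full recalculation can be sketched in a few lines. The example below assumes a hypothetical materialized view equivalent to `SELECT region, SUM(amount) FROM orders GROUP BY region`; the class and function names are illustrative and do not reflect how Streamhouse implements views internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class IncrementalSumView:
    """Hypothetical view for SELECT region, SUM(amount) ... GROUP BY region."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def on_insert(self, region: str, amount: float) -> None:
        # Constant work per change: only the affected group is touched.
        self.totals[region] += amount
        self.counts[region] += 1

    def on_delete(self, region: str, amount: float) -> None:
        # Retraction: subtract the removed row's contribution.
        self.totals[region] -= amount
        self.counts[region] -= 1
        if self.counts[region] == 0:
            # No rows left in this group, so it disappears from the result.
            del self.totals[region]
            del self.counts[region]

    def snapshot(self) -> Dict[str, float]:
        return dict(self.totals)


def full_recompute(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    # Baseline: rescan every row, which is the cost the incremental view avoids.
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


rows = [("eu", 10.0), ("us", 5.0), ("eu", 2.5)]
view = IncrementalSumView()
for region, amount in rows:
    view.on_insert(region, amount)

# A later change arrives: one "us" order is replaced by another.
view.on_delete("us", 5.0)
view.on_insert("us", 7.0)
rows = [("eu", 10.0), ("us", 7.0), ("eu", 2.5)]

print(view.snapshot())  # {'eu': 12.5, 'us': 7.0}
```

The per-group row count lets the view drop a group once its last row is deleted, which keeps the incremental result identical to a full recomputation.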
High-Performance OLAP with Fast Writes
Streamhouse excels in scenarios requiring rapid writes and high-performance OLAP queries.
Streamlining Connections Between Upstream and Downstream Data
Streamhouse connects upstream and downstream tables, so tasks such as deduplication, multi-stream merging, lookup joins, and business KPI aggregation can run within a single system.
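Two of these tasks, deduplication and multi-stream merging, come down to how rows that share a primary key are combined. The sketch below is loosely modeled on the behavior Apache Paimon describes for its `deduplicate` and `partial-update` merge engines; the record format and both helper functions are hypothetical simplifications, not Paimon code.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record format: (primary_key, row). A None value means
# "this stream does not provide that field".
Record = Tuple[str, Dict[str, Optional[object]]]


def deduplicate(records: Iterable[Record]) -> Dict[str, dict]:
    """Keep only the latest row per key (last write wins)."""
    latest: Dict[str, dict] = {}
    for key, row in records:
        latest[key] = dict(row)
    return latest


def partial_update(records: Iterable[Record]) -> Dict[str, dict]:
    """Merge rows per key: each non-None field overwrites the stored value."""
    merged: Dict[str, dict] = {}
    for key, row in records:
        current = merged.setdefault(key, {})
        for field, value in row.items():
            if value is not None:
                current[field] = value
    return merged


# Two upstream streams describe the same order, each filling different columns.
orders_stream = [("o1", {"amount": 40, "status": None})]
shipping_stream = [("o1", {"amount": None, "status": "shipped"})]

print(deduplicate(orders_stream + shipping_stream))
# {'o1': {'amount': None, 'status': 'shipped'}}
print(partial_update(orders_stream + shipping_stream))
# {'o1': {'amount': 40, 'status': 'shipped'}}
```

Last-write-wins suits a single stream of full rows, while partial updates let several streams each contribute their own columns to one wide row.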
Testing Streaming Use Cases with Minimal Investment
Streamhouse lets businesses experiment with streaming scenarios without significant upfront investment, making it a practical option for batch-oriented infrastructures that are exploring streaming capabilities.
Related Topics
For more information on Streamhouse, see:
- Video: VERA Core Pillar: Streamhouse
- Blog: Streamhouse Unveiled
- Blog: The Streamhouse Evolution
- Blog: Streamhouse: Data Processing Patterns
- Blog: Building real-time data views with Streamhouse
- Blog: From Kappa Architecture to Streamhouse: Making the Lakehouse Real-Time
See also Apache Flink, Apache Paimon, and Flink CDC.