
Ecosystem: Streamhouse

Streamhouse complements Ververica's streaming-first architecture by supporting both batch and streaming workloads on one platform. It combines real-time data processing with lakehouse storage, giving organizations scalable, efficient data pipelines for AI, ML, and analytics workloads, all within the Ververica ecosystem.

Streamhouse delivers fast, fresh data while remaining cost-effective and easy for engineers to implement.

Streamhouse Integration

How Streamhouse Fits into Ververica's Ecosystem

Streamhouse strengthens the Ververica ecosystem with a unified platform for managing streaming and batch data efficiently and at scale. It reduces the complexity of real-time data processing by integrating Apache Flink® with high-performance storage systems. The Ververica Unified Streaming Data Platform is the first and only platform that supports both real-time streaming and Streamhouse.

Ververica developed Streamhouse by optimizing Flink and introducing Apache Flink CDC and Apache Paimon. It is powered by these three open-source technologies:

  • Apache Flink CDC handles Change Data Capture streaming data ingestion.
  • Apache Flink serves as the computation engine for unified batch and stream processing.
  • Apache Paimon provides unified Lakehouse storage for batch, OLAP, and streaming queries.

Streamhouse orchestrates the capabilities of Flink, Flink CDC, and Paimon to enhance Flink-powered applications and foster innovation in real-time data processing within the Ververica ecosystem.

Key Features

The key Streamhouse features and capabilities leveraged in the Ververica ecosystem include:

CDC Data Ingestion for Lakehouse Integration

Streamhouse supports organizations with strong requirements for CDC data ingestion, offering a reliable pathway into data lakes.
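
To make the idea of CDC ingestion concrete, the following is a minimal sketch in plain Python of how a stream of change events can be replayed onto a primary-key table. It does not use the Flink CDC or Paimon APIs; the `ChangeEvent` type, the `apply_changes` function, and the sample rows are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; op is 'insert', 'update', or 'delete'."""
    op: str
    key: int
    row: Optional[Dict[str, Any]] = None


def apply_changes(
    table: Dict[int, Dict[str, Any]], events: Iterable[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Apply change events, in order, to an in-memory primary-key table."""
    for event in events:
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} event for key {event.key} has no row")
            # Upsert: the newest row for a key replaces any earlier one.
            table[event.key] = dict(event.row)
        elif event.op == "delete":
            # Deleting a missing key is treated as a no-op.
            table.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op}")
    return table


# Illustrative events, as they might be captured from a source database.
events = [
    ChangeEvent("insert", 1, {"name": "alice", "city": "Berlin"}),
    ChangeEvent("insert", 2, {"name": "bob", "city": "Paris"}),
    ChangeEvent("update", 1, {"name": "alice", "city": "Munich"}),
    ChangeEvent("delete", 2),
]
lake_table = apply_changes({}, events)
print(lake_table)  # {1: {'name': 'alice', 'city': 'Munich'}}
```

The point of the sketch is that the target table ends up mirroring the source after inserts, updates, and deletes, without reloading the full source table.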

Cost-Effective Processing for Near-Real-Time SLAs

With 1-minute latency, Streamhouse is a cost-effective alternative for scenarios that do not require strict real-time SLAs.

Fresh Data Availability for Up-to-Date Lakehouses

By updating lakehouse data at 1-minute intervals, Streamhouse keeps data consistently fresh.

Efficient Incremental Updates

Streamhouse simplifies incremental updates and streaming materialized views, avoiding costly recalculations for predefined queries.
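
As a rough illustration of why incremental updates avoid recalculation, the sketch below maintains a per-region total by applying only the changes since the last update, rather than rescanning all rows. It is plain Python under stated assumptions: the `RevenueByRegion` class, the `(kind, region, amount)` tuple layout, and the `+I` / `-D` labels for added and retracted rows are choices made for this example, not Streamhouse's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each change is (kind, region, amount).
# '+I' marks an added row and '-D' a retracted row (labels chosen for this sketch).
Change = Tuple[str, str, int]


class RevenueByRegion:
    """Hypothetical materialized view: total amount per region."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def apply(self, changes: Iterable[Change]) -> None:
        """Fold a batch of changes into the stored totals."""
        for kind, region, amount in changes:
            if kind == "+I":
                self.totals[region] += amount
            elif kind == "-D":
                self.totals[region] -= amount
            else:
                raise ValueError(f"unknown change kind: {kind}")


view = RevenueByRegion()
view.apply([("+I", "eu", 100), ("+I", "us", 40)])
# A later batch touches only the changed rows; earlier rows are not re-read.
view.apply([("+I", "eu", 25), ("-D", "us", 40), ("+I", "us", 60)])
print(dict(view.totals))  # {'eu': 125, 'us': 60}
```

Each call to `apply` costs time proportional to the number of changes, not to the size of the underlying table, which is the property that makes predefined queries cheap to keep current.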

High-Performance OLAP with Fast Writes

Streamhouse excels in scenarios requiring rapid writes and high-performance OLAP queries.

Streamlining Connections Between Upstream and Downstream Data

Streamhouse facilitates seamless connections between upstream and downstream tables, enabling tasks like deduplication, multi-stream merging, lookup joins, and business KPI aggregation within a unified system.
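
Two of the tasks named above, deduplication and multi-stream merging, can be sketched in a few lines of plain Python. This is an illustrative model only: the `deduplicate` and `merge_streams` functions, the field names, and the sample records are hypothetical, and the merge rule (later non-empty fields overwrite earlier ones for the same key) is an assumption made for this example.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def deduplicate(records: Iterable[Record], key: str, seq: str) -> List[Record]:
    """Keep, for each key, the record with the highest sequence value."""
    latest: Dict[Any, Record] = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[seq] >= current[seq]:
            latest[rec[key]] = rec
    return list(latest.values())


def merge_streams(key: str, *streams: Iterable[Record]) -> Dict[Any, Record]:
    """Merge records from several streams into one wide row per key."""
    merged: Dict[Any, Record] = {}
    for stream in streams:
        for rec in stream:
            row = merged.setdefault(rec[key], {})
            # Only non-None fields overwrite what is already stored.
            row.update({k: v for k, v in rec.items() if v is not None})
    return merged


orders = [
    {"order_id": 1, "seq": 1, "status": "created"},
    {"order_id": 1, "seq": 2, "status": "paid"},
    {"order_id": 2, "seq": 1, "status": "created"},
]
shipments = [{"order_id": 1, "carrier": "dhl"}]

latest_orders = deduplicate(orders, key="order_id", seq="seq")
wide = merge_streams("order_id", latest_orders, shipments)
print(wide[1])  # {'order_id': 1, 'seq': 2, 'status': 'paid', 'carrier': 'dhl'}
```

Here the upstream `orders` and `shipments` records feed one downstream table keyed by `order_id`, so consumers read a single merged row instead of joining the streams themselves.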

Testing Streaming Use Cases with Minimal Investment

Streamhouse empowers businesses to experiment with streaming scenarios without significant upfront investment, making it a practical solution for batch-oriented infrastructures exploring streaming capabilities.

For more information on Streamhouse and its components, see Apache Flink, Apache Paimon, and Flink CDC.