linkedin/venice

Daily Information Dashboard · 2026-03-03
Category: Open-source project
Source: github_search
Score: 0
Published: 2026-03-03T01:51:10Z

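AI Summary

Venice, open-sourced by LinkedIn, is a derived data storage platform for very large-scale workloads. It provides high-throughput writes and low-latency reads with multi-region replication. It matters because it can serve as the critical state store for online inference, for example as the backing store of a feature store.
#GitHub #repo #OpenSource #Venice #LinkedIn #CRDT #CDC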

Content Excerpt

<html>
 <!-- We cannot use CSS anywhere in this page, because the GitHub main repo doesn't render it. CSS is fine within the other docs pages though. -->
 <div align="center">
 <img src="assets/style/venice_full_lion_logo.svg" width="50%" alt="Venice">
 <h3>
 Derived Data Platform for Planet-Scale Workloads<br/>
 </h3>
 <div>
 <!-- N.B.: We've got to leave no spaces within the <a href> tag otherwise we get blue link underlines inbetween the icons on the GitHub repo's main page (though not in the Just The Docs website). -->
 <a href="https://blog.venicedb.org/stable-releases"><img src="https://img.shields.io/docker/v/venicedb/venice-router?label=stable&color=green&logo=docker" alt="Stable Release"></a>
 <a href="https://github.com/linkedin/venice/actions?query=branch%3Amain"><img src="https://img.shields.io/github/actions/workflow/status/linkedin/venice/VeniceCI-StaticAnalysisAndUnitTests.yml" alt="CI"></a>
 <a href="https://venicedb.org/"><img src="https://img.shields.io/badge/docs-grey" alt="Docs"></a>
 </div>
 <div>
 <a href="https://github.com/linkedin/venice"><img src="https://img.shields.io/badge/github-%23121011.svg?logo=github&logoColor=white" alt="GitHub"></a>
 <a href="https://www.linkedin.com/company/venicedb/"><img src="https://img.shields.io/badge/linkedin-%230077B5.svg?logo=linkedin&logoColor=white" alt="LinkedIn"></a>
 <a href="https://twitter.com/VeniceDataBase"><img src="https://img.shields.io/badge/Twitter-%231DA1F2.svg?logo=Twitter&logoColor=white" alt="Twitter"></a>
 <a href="http://slack.venicedb.org"><img src="https://img.shields.io/badge/Slack-4A154B?logo=slack&logoColor=white" alt="Slack"></a>
 </div>
 </div>
</html>

Venice is a derived data storage platform, providing the following characteristics:
- High-throughput asynchronous ingestion from batch and streaming sources (e.g. Hadoop and Samza).
- Low-latency online reads via remote queries or in-process caching.
- Active-active replication between regions with CRDT-based conflict resolution.
- Multi-cluster support within each region with operator-driven cluster assignment.
- Multi-tenancy, horizontal scalability and elasticity within each cluster.
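
To make the idea of CRDT-based conflict resolution more concrete, here is a minimal sketch of a last-writer-wins register, one of the simplest textbook CRDTs. It is not taken from the Venice codebase: the class `LwwRegister`, its fields, and the region-ID tie-break are all assumptions made for illustration, and the sketch assumes that a given region never issues two writes with the same timestamp.

```java
/** Hypothetical last-writer-wins register; a generic sketch, not Venice code. */
final class LwwRegister {
    final String value;
    final long timestamp;   // write time attached by the originating region
    final String regionId;  // breaks timestamp ties the same way on every replica

    LwwRegister(String value, long timestamp, String regionId) {
        this.value = value;
        this.timestamp = timestamp;
        this.regionId = regionId;
    }

    /** Picks the same winner regardless of argument order, so replicas converge. */
    static LwwRegister merge(LwwRegister a, LwwRegister b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        // Equal timestamps: the lexicographically larger region ID wins everywhere.
        return a.regionId.compareTo(b.regionId) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        LwwRegister east = new LwwRegister("blue", 100L, "us-east");
        LwwRegister west = new LwwRegister("green", 105L, "us-west");
        // Both merge orders select the same write.
        System.out.println(merge(east, west).value);
        System.out.println(merge(west, east).value);
    }
}
```

Because the merge function gives the same result in either order and merging a value with itself changes nothing, two regions that exchange their writes end up with the same state without coordinating.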

The above makes Venice particularly suitable as the stateful component backing a Feature Store, such as
Feathr. AI applications feed the output of their ML training jobs into Venice and
then query the data for use during online inference workloads.

Overview

Venice is a system which straddles the offline, nearline and online worlds, as illustrated below.

[Figure: High Level Architecture Diagram]

Dependency

You can add a dependency on Venice to any Java project as specified below. Note that, currently, Venice dependencies are
not published on Maven Central and therefore require adding an extra repository definition. All published jars can be
seen here. Usually, the project is released a few
times per week.

Gradle

Add the following to your build.gradle:

Maven

Add the following to your pom.xml:

APIs

From the user's perspective, Venice provides a variety of read and write APIs. These are fully decoupled from one
another, in the sense that no matter which write APIs are used, any of the read APIs are available.

Furthermore, Venice provides a rich spectrum of options in terms of simplicity on one end, and sophistication on the
other. It is easy to get started with the simpler APIs, and later on decide to enhance the use case via more advanced
APIs, either in addition to or instead of the simpler ones. In this way, Venice can accompany users as their
requirements evolve, in terms of scale, latency and functionality.

The following diagram presents these APIs and summarizes the components coming into play to make them work.

[Figure: API Overview]

Write Path

Venice supports flexible data ingestion:
- **Batch Push**: Full dataset replacement from Hadoop or Spark
- **Incremental Push**: Bulk additions without full replacement
- **Streaming Writes**: Real-time updates via Apache Samza or the Online Producer
- **Write Compute**: Partial updates and collection merging for efficiency
- **Hybrid Stores**: Mix batch and streaming with configurable rewind time
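
The partial updates and collection merging mentioned under Write Compute can be pictured with a small sketch. This is not Venice's Write Compute API: the class `PartialUpdateSketch`, its method names, and the use of plain maps and sets as records are assumptions chosen only to show the difference between overwriting a whole record and changing part of it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical partial-update helpers; a sketch of the idea, not Venice's Write Compute API. */
final class PartialUpdateSketch {
    /** Overwrites only the listed fields and keeps every other stored field. */
    static Map<String, Object> setFields(Map<String, Object> stored, Map<String, Object> changes) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(changes);
        return merged;
    }

    /** Merges new elements into a stored collection instead of replacing it. */
    static Set<String> addToSet(Set<String> stored, Set<String> additions) {
        Set<String> merged = new HashSet<>(stored);
        merged.addAll(additions);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "Ada");
        stored.put("clicks", 3);

        Map<String, Object> changes = new HashMap<>();
        changes.put("clicks", 4); // only this field is sent by the writer

        System.out.println(setFields(stored, changes));
    }
}
```

The writer sends only the changed field or the new collection elements, which is the kind of saving the "for efficiency" wording above suggests.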

Read Path

Venice provides multiple read APIs and client options:

**Read APIs**:
- Single get, batch get
- Read compute with server-side operations (dot product, cosine similarity, field projection)
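
For readers unfamiliar with the two vector operations named above, the following sketch shows what they compute. It is plain Java written for illustration and says nothing about how Venice evaluates them on the server; the class `VectorOps` and its method names are assumptions.

```java
/** Hypothetical helper showing the math behind dot product and cosine similarity. */
final class VectorOps {
    /** Sum of element-wise products of two equal-length vectors. */
    static double dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Dot product divided by the product of the two vector norms. */
    static double cosineSimilarity(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dotProduct(a, b) / norms;
    }

    public static void main(String[] args) {
        float[] stored = {1f, 2f, 3f};
        float[] query = {4f, 5f, 6f};
        System.out.println(dotProduct(stored, query));
        System.out.println(cosineSimilarity(stored, query));
    }
}
```

Running such an operation next to the stored embedding means only a single number has to travel back to the caller rather than the whole vector.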

**Client Types**:
- **Thin Client**: Stateless, 2 network hops, < 10 ms latency
- **Fast Client**: Partition-aware, 1 network hop, < 2 ms latency
- **Da Vinci Client**: Stateful local cache, 0 network hops, < 1 ms latency

All clients share the same APIs, enabling flexible cost/performance optimization without code changes.
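
The claim that clients can be swapped without code changes follows from programming against a shared interface. The sketch below illustrates that pattern only. `StoreReader`, `RemoteReaderStub`, `LocalReaderStub` and `FeatureLookup` are hypothetical names invented here, not Venice classes, and the "remote" side is simulated with an in-memory map.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical read interface; Venice's real client classes are not shown here. */
interface StoreReader<K, V> {
    CompletableFuture<V> get(K key);
}

/** Stand-in for a remote client: the map simulates a server across the network. */
final class RemoteReaderStub<K, V> implements StoreReader<K, V> {
    private final Map<K, V> simulatedServer;

    RemoteReaderStub(Map<K, V> simulatedServer) {
        this.simulatedServer = simulatedServer;
    }

    @Override
    public CompletableFuture<V> get(K key) {
        // Completes on another thread, as a network call would.
        return CompletableFuture.supplyAsync(() -> simulatedServer.get(key));
    }
}

/** Stand-in for an in-process client: data is local, so the future is already complete. */
final class LocalReaderStub<K, V> implements StoreReader<K, V> {
    private final Map<K, V> localData = new ConcurrentHashMap<>();

    void load(K key, V value) {
        localData.put(key, value);
    }

    @Override
    public CompletableFuture<V> get(K key) {
        return CompletableFuture.completedFuture(localData.get(key));
    }
}

final class FeatureLookup {
    /** Application code depends only on the interface, so the client type can change freely. */
    static String describe(StoreReader<String, String> reader, String memberId) {
        return memberId + " -> " + reader.get(memberId).join();
    }
}
```

In this sketch, moving from the remote stub to the local one changes only the line that constructs the reader, while `FeatureLookup.describe` stays untouched.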

**Change Data Capture (CDC)**: Stream all data changes (inserts, updates, deletes) for use cases like ML feature
retrieval and client-side indexing.
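
As a rough picture of what a CDC consumer does with such a stream, the sketch below replays change events into a local view, which is the basis of client-side indexing. The `ChangeEvent` and `LocalView` classes and the convention that a null value means a delete are assumptions made for this example and do not reflect Venice's actual CDC client or event schema.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical change event; the fields are illustrative, not Venice's CDC schema. */
final class ChangeEvent {
    final String key;
    final String newValue; // null means the key was deleted (an assumption of this sketch)

    ChangeEvent(String key, String newValue) {
        this.key = key;
        this.newValue = newValue;
    }
}

/** Keeps a local view in sync by applying change events in the order they arrive. */
final class LocalView {
    private final Map<String, String> rows = new HashMap<>();

    void apply(ChangeEvent event) {
        if (event.newValue == null) {
            rows.remove(event.key);               // delete
        } else {
            rows.put(event.key, event.newValue);  // insert or update
        }
    }

    String get(String key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        LocalView view = new LocalView();
        view.apply(new ChangeEvent("member-1", "v1")); // insert
        view.apply(new ChangeEvent("member-1", "v2")); // update
        view.apply(new ChangeEvent("member-2", "v1")); // insert
        view.apply(new ChangeEvent("member-2", null)); // delete
        System.out.println(view.get("member-1") + ", rows=" + view.size());
    }
}
```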

---

For a comprehensive guide to Venice's architecture, write modes, client characteristics, and capabilities, see the
Architecture Overview.

Resources

The _Open Sourcing Venice_ blog and conference talk are good starting points for an overview of the use cases and
scale Venice can support. For more Venice posts, talks and podcasts, see our Learn More page.

Getting Started

Start with the Getting Started guide to learn Venice concepts and deploy your first
cluster. The guide covers architecture fundamentals and provides quickstart instructions for both single and
multi-datacenter deployments. We recommend sticking to our latest
stable release.

Community

Feel free to engage with the community using our:

- Slack workspace (archived and publicly searchable on Linen)
- LinkedIn group
- GitHub issues
- Contributor's guide

Follow us to hear m…