Eric Sun’s Post

Data Expert with System Insights

https://lnkd.in/gjURFX8K >> Secondary Indices and Materialized Views are the async by-products of the tables in a lakehouse. ClickHouse, Materialize, RisingWave, Lance, and StarRocks already have serious support for various types of indices to accelerate queries. >> A faster lakehouse requires an optimized storage layout plus a unique index to accelerate both Merge-On-Write and queries. >> I am not aware of any cloud database that uses in-memory (or local block storage) structures to deduplicate and Merge-On-Read against committed Iceberg/Delta files. If you know of any company or project of this kind, please leave a comment here. #lakehouse #partition #index #lsm #bucket #merge #deduplicate #paimon #starrocks #puffin #hyperspace #compact #kafka

Improve Ingest Latency and Query Efficiency of Data Lake — Partition and Index

eric-sun.medium.com

Shiyan Xu

Apache Hudi PMC member

3w

I'd like to point out that Apache Hudi has a very rich set of indexes covering both reads and writes. For example, there is a record-level index for fast point lookups https://hudi.apache.org/blog/2023/11/01/record-level-index/, writer indexes that cater to different data patterns https://blog.datumagic.com/p/apache-hudi-from-zero-to-one-410, and functional indexes for flexible indexing needs https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md

Shaeq Ahmed

Co-Founder @ Matano (YC W23) - Cloud Native SIEM

3w

I believe Apache Amoro is doing this with its Mixed Iceberg format. It uses a tiered combination of Merge-On-Read equality and positional deletes, plus a smart compaction service, to enforce a primary key constraint, which allows deduplication in both streaming and batch scenarios. https://github.com/apache/amoro

Roy Hasson

Product @ Upsolver | Data engineer | Advocate for better data

3w

Not a cloud database, but at Upsolver we ingest and dedupe streams into Iceberg. We use equality deletes instead of position deletes to speed up writes. We also contributed MoR performance improvements to Trino and Presto so they can take advantage of equality deletes, and saw massive read improvements. More detail here - https://youtu.be/j0ax6bwMYrQ?si=uV4gGJ5OSqZGlRJQ
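
For readers less familiar with the two delete styles mentioned in this thread, here is a rough toy model in Python. It is only a sketch under stated assumptions: the `Table` and `DataFile` classes and the method names are hypothetical, not the Iceberg, Upsolver, or Amoro API, and "files" are just in-memory lists. It illustrates why an equality delete is cheap to write (the writer records only the key) while a position delete forces the writer to find the old row first, and how a Merge-On-Read scan applies both.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    seq: int      # sequence number of the commit that wrote this file
    rows: list    # list of dict rows


@dataclass
class Table:
    """Toy merge-on-read table (hypothetical, not a real library API)."""
    data_files: list = field(default_factory=list)
    equality_deletes: list = field(default_factory=list)  # (seq, key value)
    position_deletes: set = field(default_factory=set)    # (path, row index)
    next_seq: int = 1

    def _append(self, row, seq):
        self.data_files.append(DataFile(f"f{seq}", seq, [row]))

    def upsert_with_equality_delete(self, row, key="id"):
        # Write path: no lookup needed, just record "delete rows with this key".
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        self.equality_deletes.append((seq, row[key]))
        self._append(row, seq)

    def upsert_with_position_delete(self, row, key="id"):
        # Write path: must locate every existing row with this key first.
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        for f in self.data_files:
            for i, r in enumerate(f.rows):
                if r[key] == row[key]:
                    self.position_deletes.add((f.path, i))
        self._append(row, seq)

    def scan(self, key="id"):
        # Read path (merge-on-read): filter out deleted rows while scanning.
        out = []
        for f in self.data_files:
            for i, r in enumerate(f.rows):
                if (f.path, i) in self.position_deletes:
                    continue
                # An equality delete only hides rows written by older commits.
                if any(r[key] == k and f.seq < s
                       for s, k in self.equality_deletes):
                    continue
                out.append(r)
        return out


def demo(upsert_name):
    t = Table()
    upsert = getattr(t, upsert_name)
    upsert({"id": 1, "v": "a"})
    upsert({"id": 1, "v": "b"})   # replaces id=1
    upsert({"id": 2, "v": "c"})
    return t.scan()
```

Both `demo("upsert_with_equality_delete")` and `demo("upsert_with_position_delete")` return the same deduplicated rows. The difference is where the work lands: the equality-delete writer does no lookup, but every scan has to compare rows against the delete keys, which is the read-side cost that the engine improvements mentioned above are meant to reduce.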
