The Problem: Managing Data Growth in a Modern World

As the world moves further into the digital era, the volume of data continues to grow exponentially. Businesses and applications must grapple with this surge, seeking innovative ways to keep critical information close to the CPU for quick access. Solutions like DRAM-powered main memory database systems offer speed but fall short in scalability and cost efficiency.

Enter Non-Volatile Memory (NVM), a revolutionary technology positioned between DRAM and SSD in the memory hierarchy. NVM promises high capacity, persistence, and latency close to that of DRAM. These unique properties are changing how data structures like B-trees and Bε-trees are designed and optimized. This blog post explores our journey to optimize Bε-trees for NVM, unlocking new possibilities in data management.


A Brief Journey Through B-trees and Bε-trees

B-trees have long been the backbone of database and file systems, offering efficient indexing for large datasets. Variants like B+-trees and Bε-trees bring improvements for specific scenarios:

  • B+-trees excel in range queries by storing all data in leaf nodes.
  • Bε-trees optimize write-heavy workloads by introducing buffered messages at internal nodes, enabling batch updates (see the sketch after this list).
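
To make the buffering idea concrete, here is a minimal, illustrative sketch (not our actual implementation) of a Bε-tree internal node whose space is split between pivots/child pointers and a bounded buffer of pending messages; once the buffer fills, its messages are pushed down toward the children in one batch. All names and sizes are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative Bε-tree internal node: part of the node holds pivots and
// child pointers, the rest is a bounded buffer of pending update messages.
struct Message {
    uint64_t key;
    uint64_t value;   // e.g., an upsert payload
};

struct InternalNode {
    static constexpr size_t kBufferCapacity = 64;   // hypothetical buffer size

    std::vector<uint64_t>      pivots;     // separators for the children's key ranges
    std::vector<InternalNode*> children;   // fan-out = pivots.size() + 1
    std::vector<Message>       buffer;     // pending messages, applied lazily

    // Instead of touching a leaf immediately, the update is parked in the
    // buffer; a full buffer is flushed to the children as one batch.
    void upsert(const Message& m) {
        buffer.push_back(m);
        if (buffer.size() >= kBufferCapacity) {
            flushToChildren();
        }
    }

    void flushToChildren() {
        // Sketch only: group buffered messages by target child and move each
        // group down one level (recursing or applying to leaves). Omitted here.
        buffer.clear();
    }
};
```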

But traditional designs fall short of exploiting the unique properties of NVM. Our goal? To create an NVM-optimized Bε-tree that embraces modern heterogeneous storage landscapes.


Why Does NVM Change the Game?

Unlike traditional storage technologies, NVM combines the best of DRAM and SSD:

  • Persistence: Data remains intact even after power loss.
  • Speed: Access times approach those of DRAM.
  • Byte Addressability: Supports direct access to data.

These characteristics demand a rethink of how data structures are designed. The challenge lies in leveraging NVM’s potential while navigating its limitations, such as higher write latencies compared to DRAM.
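
On current x86 hardware, durability on byte-addressable NVM is typically enforced by writing dirty cache lines back and fencing afterwards. As a hedged sketch (assuming a CPU and compiler with CLWB support, e.g. built with -mclwb), a persist helper can look like this:

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Make a range of bytes that was written through the CPU cache durable on NVM:
// write back every touched cache line, then fence so later stores are ordered
// after the data has reached the persistence domain.
static inline void persist_range(const void* addr, size_t len) {
    constexpr uintptr_t kCacheLine = 64;
    uintptr_t p   = reinterpret_cast<uintptr_t>(addr) & ~(kCacheLine - 1);
    uintptr_t end = reinterpret_cast<uintptr_t>(addr) + len;
    for (; p < end; p += kCacheLine) {
        _mm_clwb(reinterpret_cast<void*>(p));
    }
    _mm_sfence();
}
```

Because such flushes are not free, the design choices below aim to batch and minimize them.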


Our Solution: An Optimized Bε-tree for NVM

To harness the power of NVM, we redesigned the Bε-tree with the following key features:

1. Direct Access to Nodes in NVM

Unlike conventional approaches that move nodes to DRAM for processing, our design accesses nodes directly in NVM. This reduces transfer costs and computational overhead.
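
One common way to obtain this kind of direct access is to map an NVM-backed (DAX) file into the address space and interpret node bytes in place. The following is a generic POSIX sketch, not our engine's code; the path and NodeHeader layout are hypothetical:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct NodeHeader {        // hypothetical on-NVM node header layout
    uint32_t num_keys;
    uint32_t level;
};

int main() {
    // Map an NVM-backed (DAX) file; loads and stores then reach NVM directly,
    // without first copying whole nodes into a DRAM buffer pool.
    int fd = open("/mnt/pmem/index.pmem", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    const size_t size = 1ull << 20;   // illustrative mapping size: 1 MiB
    void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    // Read a node's fields in place -- no DRAM staging copy, no deserialization.
    auto* node = reinterpret_cast<const NodeHeader*>(base);
    std::printf("keys=%u level=%u\n", node->num_keys, node->level);

    munmap(base, size);
    close(fd);
    return 0;
}
```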

2. Centralized Buffering for Updates

We implemented a shared buffer in NVM to store messages. This approach avoids excessive I/O and reduces fragmentation by batching updates efficiently.
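
The sketch below illustrates the idea with an ordinary in-memory structure (all names and thresholds are hypothetical); in the NVM version, the message area lives in the mapped NVM region and is persisted with a flush-and-fence helper like the one shown earlier. Messages from all nodes land in one shared buffer, keyed by node id, so a node is touched only when a whole batch can be applied to it:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Message {
    uint64_t key;
    uint64_t value;
};

// Hypothetical centralized buffer: one shared message area for the whole tree.
// Updates are grouped per node id and applied in batches, which avoids many
// small node writes and keeps fragmentation low.
class SharedBuffer {
public:
    void add(uint64_t node_id, const Message& m) {
        pending_[node_id].push_back(m);
        if (++total_ >= kFlushThreshold) {
            flushLargestGroup();
        }
    }

private:
    void flushLargestGroup() {
        // Sketch only: pick the node with the most pending messages and apply
        // them to that node in a single batched write. Omitted here.
    }

    static constexpr size_t kFlushThreshold = 4096;   // illustrative value
    std::unordered_map<uint64_t, std::vector<Message>> pending_;
    size_t total_ = 0;
};
```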

3. In-place Writes

Updates are written directly into a node's buffer segment on the storage device, minimizing costly read-modify-write cycles.
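
A hedged sketch of what such an in-place write can look like (the layout and sizes are assumptions, not our exact format): the message is appended at the tail of the node's buffer segment and persisted, and only then is the tail counter advanced and persisted, so the rest of the node is never read back or rewritten:

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

struct Message { uint64_t key; uint64_t value; };

// Hypothetical buffer segment embedded in a node that resides in mapped NVM.
struct NodeBufferSegment {
    uint32_t tail;        // number of valid messages, acts as the publish point
    Message  slots[64];   // fixed-size buffer area inside the node
};

// Cache-line write-back plus fence, as in the persistence sketch above.
static inline void persist_range(const void* addr, size_t len) {
    for (uintptr_t p = reinterpret_cast<uintptr_t>(addr) & ~uintptr_t{63};
         p < reinterpret_cast<uintptr_t>(addr) + len; p += 64) {
        _mm_clwb(reinterpret_cast<void*>(p));
    }
    _mm_sfence();
}

// Append one message in place: persist the payload first, then publish it by
// bumping and persisting the tail. No whole-node read-modify-write is needed.
bool append_in_place(NodeBufferSegment* seg, const Message& m) {
    if (seg->tail >= 64) return false;   // buffer full: caller must flush first
    seg->slots[seg->tail] = m;
    persist_range(&seg->slots[seg->tail], sizeof(Message));
    seg->tail += 1;
    persist_range(&seg->tail, sizeof(seg->tail));
    return true;
}
```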

4. Read Cache for Frequent Queries

A small buffer in DRAM stores recently accessed values, improving performance for workloads with high locality.
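
Such a read cache can be as simple as a small LRU map in DRAM placed in front of the NVM-resident tree: a hit returns the value without touching NVM, and a miss is filled by the caller after reading from NVM. The following is a generic sketch, not our exact policy or sizing:

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Small DRAM-resident LRU cache in front of the NVM tree.
class ReadCache {
public:
    explicit ReadCache(size_t capacity) : capacity_(capacity) {}

    // Hit: the value is served from DRAM and the entry moves to the front.
    std::optional<uint64_t> get(uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        lru_.splice(lru_.begin(), lru_, it->second);
        return it->second->second;
    }

    // Miss path: the caller reads the value from NVM and inserts it here.
    void put(uint64_t key, uint64_t value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        if (lru_.size() == capacity_) {          // evict least recently used
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(key, value);
        index_[key] = lru_.begin();
    }

private:
    size_t capacity_;
    std::list<std::pair<uint64_t, uint64_t>> lru_;
    std::unordered_map<uint64_t,
                       std::list<std::pair<uint64_t, uint64_t>>::iterator> index_;
};
```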


Real-world Scenarios: How It Works

Our optimized Bε-tree shines in a variety of use cases:

Scenario 1: Building a Tree

Starting in DRAM, the tree grows as usual. Once nodes are flushed to persistent storage, our design ensures they are optimized for NVM, storing keys and pivots directly while buffering messages in a shared structure.

Scenario 2: Handling Random Workloads

In workloads with high randomness, centralized buffering in NVM accumulates updates, reducing the need for frequent node writes. This minimizes write amplification and improves overall efficiency.

Scenario 3: Efficient Reads and Writes

Direct NVM access eliminates the need to fetch nodes into DRAM for processing. Coupled with our shared buffer and in-place writes, this drastically reduces latency for both read and write operations.


The Numbers Speak: Early Experiments

Preliminary experiments reveal the power of granular reads and direct NVM operations. Compared to traditional methods that rely heavily on DRAM transfers, our approach shows significant improvements in read/write times and reduced computational overhead.

[Figure: Performance chart]


Why This Matters: The Future of Data Structures

The insights from this work pave the way for heterogeneity-aware storage engines. Technologies like Compute Express Link (CXL) are set to revolutionize enterprise data management, and our optimized Bε-tree aligns perfectly with this vision.

While the implementation is ongoing, the potential is clear: faster, more efficient data structures that fully leverage modern hardware capabilities. Stay tuned for updates on our progress on GitHub.


Acknowledgments

This work is supported by Deutsche Forschungsgemeinschaft (DFG) as part of SPP 2377. A heartfelt thanks to the collaborators and researchers contributing to this journey.


Author Note: We invite feedback, collaboration, and questions. Let’s innovate together!