Diff-index: Differentiated index in distributed log-structured data stores

Wei Tan, Sandeep Tata, Liana Fong, Yuzhe Tang

Research output: Chapter in Book/Entry/PoemConference contribution

19 Scopus citations

Abstract

Log-Structured-Merge (LSM) Tree gains much attention recently because of its superior performance in write-intensive workloads. LSM Tree uses an append-only structure in memory to achieve low write latency; at memory capacity, in-memory data are flushed to other storage media (e.g. disk). Consequently, read access is slower comparing to write. These specific features of LSM, including no in-place update and asymmetric read/write performance raise unique challenges in index maintenance for LSM. The structural difference between LSM and B-Tree also prevents mature B-Tree based approaches from being directly applied. To address the issues of index maintenance for LSM, we propose Diff-Index to support a spectrum of index maintenance schemes to suit different objectives in index consistency and performance. The schemes consist of sync-full, sync-insert, async-simple and async-session. Experiments on our HBase implementation quantitatively demonstrate that Diff-Index offers various performance/consistency balance and satisfactory scalability while avoiding global coordination. Syncinsert and async-simple can reduce 60%-80% of the overall index update latency when compared to the baseline syncfull; async-simple can achieve superior index update performance with an acceptable inconsistency. Diff-Index exploits LSM features such as versioning and the flush-compact process to achieve goals of concurrency control and failure recovery with low complexity and overhead. Diff-Index is included in IBM InfoSphere BigInsights, an IBM big data offering.

Original languageEnglish (US)
Title of host publicationAdvances in Database Technology - EDBT 2014
Subtitle of host publication17th International Conference on Extending Database Technology, Proceedings
EditorsVincent Leroy, Vassilis Christophides, Vassilis Christophides, Stratos Idreos, Anastasios Kementsietsidis, Minos Garofalakis, Sihem Amer-Yahia
PublisherOpenProceedings.org, University of Konstanz, University Library
Pages700-711
Number of pages12
ISBN (Electronic)9783893180653
DOIs
StatePublished - 2014
Externally publishedYes
Event17th International Conference on Extending Database Technology, EDBT 2014 - Athens, Greece
Duration: Mar 24 2014Mar 28 2014

Publication series

NameAdvances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings

Other

Other17th International Conference on Extending Database Technology, EDBT 2014
Country/TerritoryGreece
CityAthens
Period3/24/143/28/14

Keywords

  • Cap theorem
  • Distributed databases
  • Failure recovery
  • Log-Structured Merge (LSM) tree
  • Secondary index

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Diff-index: Differentiated index in distributed log-structured data stores'. Together they form a unique fingerprint.

Cite this