Apache Lucene development has always been vibrant, but the last few months have seen an especially high number of optimizations to query evaluation. There isn't one optimization that can be singled out, it's rather a combination of many improvements around mechanical sympathy and improved algorithms.
What is especially interesting here is that these optimizations do not only benefit some very specific cases, they translate into actual speedups in Lucene's nightly benchmarks, which aim at tracking the performance of queries that are representative of the real world. Just hover on annotations to see where a speedup (or slowdown sometimes!) is coming from. By the way, special thanks to Mike McCandless for maintaining Lucene's nightly benchmarks on his own time and hardware for almost 13 years now!
Key speedup benchmarks in Lucene
Here are some speedups that nightly benchmarks observed between Lucene 9.6 (May 2023) and Lucene 9.9 (December 2023):
- AndHighHigh: 35% faster
- AndHighMed: 15% faster
- OrHighHigh: 60% faster
- OrHighMed: 38% faster
- CountAndHighHigh: 15% faster
- CountAndHighMed: 11% faster
- CountOrHighHigh: 145% faster
- CountOrHighMed: 155% faster
- TermDTSort: 24% faster
- TermTitleSort: 290% faster (not a typo!)
- TermMonthSort: 7% faster
- DayOfYearSort: 25% faster
- VectorSearch: 5% faster
Lucene optimization resources
In case you are curious about these changes, here are resources that describe some of the optimizations that we applied:
- Bringing speedups to top-k queries with many and/or high-frequency terms (annotation FK)
- More skipping with block-max MAXSCORE (annotation FU)
- Accelerating vector search with SIMD instructions
- Vector similarity computations FMA-style
Lucene 9.9 was just released and is expected to be integrated into Elasticsearch 8.12, which should get released soon. Stay tuned!