This Week in Elasticsearch and Apache Lucene - 2018-02-05

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Elasticsearch SQL Plugin

The SQL plugin should be merged into master this week. It provides a full-blown (alpha) SQL engine, covering parsing, analysis and optimization, that supports read-only queries against Elasticsearch indices. Features include:

  • projections (SELECT),
  • filtering (WHERE),
  • sorting (ORDER BY),
  • grouping (GROUP BY), including filtering of groups (HAVING),
  • scalar (ABS, SIN, COS, ...) and aggregate (MAX, MIN, AVG, ...) functions, as well as arbitrary math (SELECT MAX(salary)-MIN(salary)/COUNT(*) + 4),
  • full-text search (QUERY, MATCH).

The plugin will ship with three drivers:

  • REST/HTTP, which accepts an SQL query wrapped in JSON and returns results in JSON; it could be used by Canvas or Kibana
  • CLI, which provides a command-line (text) interface
  • JDBC, for interfacing with Java applications

An ODBC driver is in the works.
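
To make the REST/HTTP driver more concrete, here is a minimal Python sketch of what "an SQL query wrapped in JSON" could look like on the wire. The endpoint path (_xpack/sql), the "query" key, the host, and the employees index with its dept and salary fields are all assumptions made for illustration; the plugin is still alpha, so check the released documentation before relying on any of them. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_sql_request(base_url, sql):
    """Build (but do not send) an HTTP request carrying an SQL query in a JSON body."""
    # Assumption: the SQL text travels under a "query" key.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_xpack/sql",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index and field names, exercising several of the listed features.
request = build_sql_request(
    "http://localhost:9200",
    "SELECT dept, AVG(salary) FROM employees "
    "WHERE salary > 0 GROUP BY dept HAVING AVG(salary) > 50000 ORDER BY dept",
)
```

Passing the object to urllib.request.urlopen(request) would send it to a running cluster; the JSON response could then be decoded with json.load.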

New option to control whether partial results are allowed in search requests #27435

When executing a search today, we return results from as many shards as we can, and we include a _shards section in the result body to indicate how many shards should have been searched and how many shards were actually searched. Reasons for a shard failing to return results include:

  • The search times out on the shard
  • An error occurs executing the search for the shard (including errors like a missing geo or nested field)
  • The shard is red and there are no allocated shard copies that the search can be performed on

The reasoning behind returning partial results is as follows: imagine you are retrieving social-media-style updates from a user's friends. If one shard is down, it may be better to display some updates than none at all. However, this logic can be the wrong choice when showing analytics, for example: a graph of total visits based on partial data is just wrong, and it relies on the user checking the _shards section in the response (which almost nobody does) to know whether they are seeing meaningful results or not.
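
For readers who do want to check, here is a small Python sketch of what inspecting the _shards section could look like on the client side. The helper name is_partial and the sample responses are made up for illustration; the sketch assumes the response has already been decoded into a dict and that it carries the usual total and successful counters plus a top-level timed_out flag.

```python
def is_partial(response):
    """Return True if a decoded search response appears to be based on partial data."""
    shards = response.get("_shards", {})
    # Fewer successful shards than expected means some data is missing.
    missing_shards = shards.get("successful", 0) < shards.get("total", 0)
    # A timed-out search may have stopped collecting hits early.
    timed_out = bool(response.get("timed_out", False))
    return missing_shards or timed_out


# Hand-written sample responses, not output from a real cluster.
partial_response = {
    "timed_out": False,
    "_shards": {"total": 5, "successful": 4, "failed": 1},
}
complete_response = {
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
}
```

An analytics dashboard might refuse to draw a graph when is_partial returns True, whereas a social feed might render whatever came back.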

We have added a new query parameter to the search API called allow_partial_search_results, which controls whether results should still be returned if the search fails for any reason on one or more shards. When set to true, partial results are allowed and results will be returned even if not all shards completed successfully. When set to false, an exception is thrown if the search fails on any shard.
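
As a rough illustration, the following Python sketch builds a search URL that opts out of partial results. It uses the flag name allow_partial_search_results as it appears in the 6.3 change list further down; the host and the visits index name are hypothetical, and the helper only builds a string without contacting a cluster.

```python
from urllib.parse import urlencode


def search_url(base_url, index, allow_partial):
    """Build a _search URL with the partial-results flag set explicitly."""
    # Query-string booleans are written as lowercase "true"/"false".
    params = urlencode({"allow_partial_search_results": str(allow_partial).lower()})
    return f"{base_url.rstrip('/')}/{index}/_search?{params}"


# Hypothetical host and index: an analytics query that must not silently drop shards.
strict_url = search_url("http://localhost:9200", "visits", allow_partial=False)
```

With the flag set to false, a client should expect an error response rather than a smaller result set when a shard fails, so the calling code needs an error path instead of a _shards check.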

The default is currently true, and there is an ongoing discussion about whether we should change the default to false in the future. This decision is contentious because it would be a big breaking change, and allowing partial results is not always a bad choice. For instance, imagine a user has a field foo which is mapped as a text field in older indices and as a keyword field in newer indices. Today, Kibana can run a terms aggregation on the newer indices and just ignore the exceptions from the older indices with the incorrect mapping. Perhaps there is another way of solving this particular issue while still benefiting from allow_partial_search_results=false.

Changes in 5.6:

  • REST high-level client: Fix parsing of script fields #28395
  • X-Pack:
    • [Security] Clear Realm Caches on role mapping health change #3782

Changes in 6.2:

  • X-Pack:
    • Watcher: Ensure state is cleaned properly in watcher life cycle service #3770

Changes in 6.3:

  • BREAKING: Add a shallow copy method to aggregation builders #28430
  • Search - new flag: allow_partial_search_results #27906
  • Add ability to index prefixes on text fields #28290
  • Move persistent tasks to core #28455
  • Allows failing shards without marking as stale #28054
  • Scripts: Fix security for deprecation warning #28485
  • Forbid trappy methods from java.time #28476
  • Synced-flush should not seal index of out of sync replicas #28464
  • Replicate writes only to fully initialized shards #28049
  • Remove Painless Type From Locals, Variables, Params, and ScriptInfo #28471
  • Remove RuntimeClass from Painless Definition in favor of Painless Struct #28486
  • Remove Painless Type From Painless Method/Field #28466
  • Remove Painless Type in favor of Java Class in FunctionRef #28429
  • Remove Painless Type from e-nodes in favor of Java Class #28364
  • Further Removal of Painless Type from Lambdas and References. #28433
  • Add lower bound for translog flush threshold #28382
  • REST high-level client: add support for split and shrink index API #28425
  • Add support for indices exists to REST high level client #27384
  • Add ranking evaluation API to High Level Rest Client #28357
  • Java high-level REST : minor code clean up #28409
  • Do not take duplicate query extractions into account for minimum_should_match attribute #28353
  • Fix AIOOB on indexed geo_shape query #28458
  • Replace Bits with new abstract class (#24088) #28334
  • Suppress assertions about rounding of times near overlapping days #28151
  • XContent: Factor deprecation handling into callback #28449
  • X-Pack:
    • Watcher: Add support for scheme in proxy configuration #3614
    • XContent: Adapt to new method on parser #3797
    • [Security] Correct DN matches in role-mapping rules #3704

Changes in 7.0:

  • BREAKING: remove deprecated percolator map_unmapped_fields_as_string setting #28060
  • Add allow_partial_search_results flag to search requests with default setting true #28440
  • BREAKING: Remove tribe node support #28443
  • X-Pack:
    • BREAKING: Remove all tribe related code, comments and documentation #3784

Apache Lucene

Multi-release JAR to take advantage of Java 9 optimizations

After several performance testing iterations to better understand potential impacts of this change, Lucene will now build a multi-release JAR in order to take advantage of some new APIs introduced in Java 9 like Objects.checkIndex and Arrays.mismatch, which can't be implemented as efficiently with Java 8.

The build still works with Java 8: this change is implemented through two new classes, FutureObjects and FutureArrays, which are functionally compatible with Java 9's Objects and Arrays. The build then creates the Java 9 classes with ASM by remapping calls to FutureObjects/FutureArrays onto calls to Objects/Arrays.
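
To give a feel for what "functionally compatible" has to mean here, the following Python sketch spells out the documented behaviour of the two Java 9 methods named above. It is a description of semantics only, not Lucene's code: the function names are made up, and Python's IndexError stands in for Java's IndexOutOfBoundsException.

```python
def check_index(index, length):
    """Mirror Objects.checkIndex: return index if 0 <= index < length, else raise."""
    if index < 0 or index >= length:
        raise IndexError(f"Index {index} out of bounds for length {length}")
    return index


def mismatch(a, b):
    """Mirror Arrays.mismatch: index of the first differing element, or -1 if equal."""
    shared = min(len(a), len(b))
    for i in range(shared):
        if a[i] != b[i]:
            return i
    # One sequence is a prefix of the other: they differ at the shorter length,
    # unless both have the same length, in which case they are equal.
    return -1 if len(a) == len(b) else shared
```

Any Java 8 fallback has to return exactly these values, so that swapping in the Java 9 intrinsics changes performance but not results.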

We now need to double down on testing with both Java 8 and Java 9 since different code might run depending on the JVM version.

Other