2. The Question
● I worked for 6 years as MySQL Technical Support
Engineer.
● A large percentage of cases from customers were related
to bad query plans/wrong index selection.
● Query Planning is a complex piece of code with many
knobs that can be tuned.
● When I started working at MongoDB I found that the
number of cases on that topic was very very (very) low. So
I asked myself:
Why?
3. Plan selection in other databases
● Traditional databases use a statistics approach to choose
the best plan:
○ The information about data distribution is not
accurate.
○ It is estimated by reading data with random dives in
the index tree (MySQL).
○ When some prerequisites are met (like number of
modified rows) statistics are automatically
recalculated.
4. Plan selection in MongoDB
● MongoDB uses a empirical method:
○ If there is no cached plan, then all viable execution plans,
based on the available indexes, are created.
○ MongoDB runs the query multiple times, one for each query
plan and benchmarks them. It chooses the one that
provides the best performance.
○ Once done, the plan is cached.
■ Future queries with the same shape will re-use this
plan rather than re-running the candidate plans.
■ For each such query the performance of the cached
plan is evaluated. If the plan's performance decreases
beyond a given threshold, it is evicted from the cache
and the candidate test phase runs again. This is known
as re-planning (SERVER-15225)
5. Benchmarking the plans
● All possible plans are executed in round-robin fashion.
● It gathers execution metrics and then provide a score to each
plan.
● Sort the plans by score and choose the best one.
7. Execution Metrics (II)
● Number of works:
■ The planner asks each plan for the next document, via a
call to work().
■ If the plan can supply a document, it responds with
'advanced'. Otherwise, the plan responds with
'needsTime'.
● If all documents have been retrieved, then isEOF = 1.
8. Early stop of query execution
● The query could be expensive, so there are limits to early
stop the execution. Execution stop if:
○ The maximum number of works has been reached.
○ The requested number of documents has been
retrieved (advanced).
○ We get isEOF (the resultSet has no more documents).
works
work()
isEOF advanced
Break
9. Number of work() calls before stopping
● internalQueryPlanEvaluationWorks = 10000
For large collections we take a fraction of the number of
documents:
● internalQueryPlanEvaluationCollFraction = 0.3
Then, get the maximum value.
internalQueryPlanEvaluationWorks
internalQueryPlanEvaluationCollFraction numRecords
works
10. Number of documents to retrieve before
stopping
● internalQueryPlanEvaluationMaxResults = 101
● query.getQueryRequest().getNToReturn()
○ Used in the old OP_QUERY protocol.
○ Drivers set 'ntoreturn' to min('batchSize', 'limit') in
order to fake the lack of 'limit' or 'batchSize'
mechanism in the protocol.
● query.getQueryRequest().getLimit()
○ Used in OP_QUERY protocol from 3.2 onwards.
getNToReturn
advanced getNToReturn internalQueryPlanEvaluationMaxResults
getLimit
advanced getLimit internalQueryPlanEvaluationMaxResults
advanced internalQueryPlanEvaluationMaxResults
advanced
11. Pick the best plan, count the scores
● baseScore = 1
● Productivity = queryResults / workUnits
● TieBreak (very small number) = min(1.0 / (10 * workUnits), 1e-4)
● noFetchBonus (covered index) = TieBreak or 0
● noSortBonus (blocking sort) = TieBreak or 0
● noIxisectBonus (avoiding index intersection) = TieBreak or 0
● tieBreakers = noFetchBonus + noSortBonus +
noIxisectBonus
● eofBonus (if during plan execution all possible documents are retrieved) = 0 | 1
12. Replanning: Automatic Plan Cache Eviction
● The stored data keep changing, it could possible that the
cached plan is not the best one anymore.
● While the cached plan is being used, MongoDB re-runs
the trial period for that plan and keeps a count of the
work() function calls.
● If the new trial period takes more than 10 times as many
works() as the original trial period, it evicts the plan from
the cache and re-tests all candidate plans to pick a new
winner.
● internalQueryCacheEvictionRatio = 10
maxWorksBeforeReplan internalQueryCacheEvictionRatio cachedWorks
currentWorks maxWorksBeforeReplan
replan()
13. Plans are not always cached
● In the following situations, the execution plan is not
cached:
○ Collection scan without sort()
○ hint()
○ min()
○ max()
○ explain()
○ Tailable cursors (they don’t use indexes)
○ snapshot()
○ A single viable plan
18. Query Planner Troubleshoot
● There are Plan Cache methods that can be used for
troubleshooting:
https://docs.mongodb.com/manual/reference/method/js-plan-cache/
● Check all query shapes:
○ db.collection.getPlanCache().listQueryShapes()
● Get the plan for a particular query:
○ db.collection.getPlanCache().getPlansByQuery(
<query>, <projection>, <sort> )
● Clean the plans for a particular query:
○ db.collection.getPlanCache().clearPlansByQuery()
● Clean all plans:
○ db.collection.getPlanCache().clear()