Miguel Angel Nieto
miguel.nieto@mongodb.com
Technical Services Engineer, MongoDB
Query Planner
The Question
● I worked for 6 years as a MySQL Technical Support
Engineer.
● A large percentage of cases from customers were related
to bad query plans/wrong index selection.
● Query Planning is a complex piece of code with many
knobs that can be tuned.
● When I started working at MongoDB I found that the
number of cases on that topic was very very (very) low. So
I asked myself:
Why?
Plan selection in other databases
● Traditional databases use a statistics-based approach to choose
the best plan:
○ The information about data distribution is not
accurate.
○ It is estimated by reading data with random dives in
the index tree (MySQL).
○ When some prerequisites are met (like number of
modified rows) statistics are automatically
recalculated.
Plan selection in MongoDB
● MongoDB uses an empirical method:
○ If there is no cached plan, then all viable execution plans,
based on the available indexes, are created.
○ MongoDB runs the query once for each candidate
plan and benchmarks them. It chooses the one that
provides the best performance.
○ Once done, the plan is cached.
■ Future queries with the same shape will re-use this
plan rather than re-running the candidate plans.
■ For each such query the performance of the cached
plan is evaluated. If the plan's performance decreases
beyond a given threshold, it is evicted from the cache
and the candidate test phase runs again. This is known
as re-planning (SERVER-15225).
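The flow above can be sketched as a small function. This is a minimal illustration, not MongoDB's implementation: `choosePlan`, the `Map`-based cache, the `runTrial` callback, and the plan names and scores are all hypothetical, and re-planning is left out.

```javascript
// Hypothetical sketch of empirical plan selection with a plan cache.
// None of these names come from the MongoDB source.
function choosePlan(cache, queryShape, candidatePlans, runTrial) {
  // A cached winner for this query shape is reused without re-running the trial.
  if (cache.has(queryShape)) {
    return { plan: cache.get(queryShape), fromCache: true };
  }
  // Otherwise benchmark every candidate and keep the best-scoring one.
  let best = null;
  let bestScore = -Infinity;
  for (const plan of candidatePlans) {
    const score = runTrial(plan);
    if (score > bestScore) {
      best = plan;
      bestScore = score;
    }
  }
  cache.set(queryShape, best);
  return { plan: best, fromCache: false };
}

// Usage with made-up plans and scores.
const cache = new Map();
const scores = { IXSCAN_a: 1.8, IXSCAN_b: 1.2, COLLSCAN: 1.01 };
const plans = Object.keys(scores);
const first = choosePlan(cache, "{a: 1, b: 1}", plans, (p) => scores[p]);
const second = choosePlan(cache, "{a: 1, b: 1}", plans, (p) => scores[p]);
console.log(first, second);
```

The second call finds the shape in the cache, so no candidate is benchmarked again.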
Benchmarking the plans
● All possible plans are executed in round-robin fashion.
● The planner gathers execution metrics and then assigns a score to each
plan.
● It sorts the plans by score and chooses the best one.
Execution Metrics (I)
● These are the metrics:
works
advanced
needTime
isEOF
Execution Metrics (II)
● Number of works:
■ The planner asks each plan for the next document, via a
call to work().
■ If the plan can supply a document, it responds with
'advanced'. Otherwise, the plan responds with
'needTime'.
● If all documents have been retrieved, then isEOF = 1.
Early stop of query execution
● The query could be expensive, so there are limits that stop
the trial execution early. Execution stops if:
○ The maximum number of works has been reached.
○ The requested number of documents has been
retrieved (advanced).
○ We get isEOF (the resultSet has no more documents).
(Flowchart: work() is called in a loop while counting works; the loop breaks on isEOF or once enough documents have advanced.)
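The round-robin trial and its three stop conditions can be simulated in a few lines. This is a sketch under stated assumptions: `makePlan` and `runTrial` are hypothetical names, each plan is faked as a scripted list of work() outcomes, and a round is allowed to finish before the trial stops.

```javascript
// Hypothetical plan: each work() call returns "advanced" (a document was produced),
// "needTime" (internal work, no document yet) or "isEOF" (nothing left).
function makePlan(name, steps) {
  let i = 0;
  return {
    name,
    stats: { works: 0, advanced: 0, needTime: 0, isEOF: false },
    work() {
      const state = i < steps.length ? steps[i++] : "isEOF";
      this.stats.works += 1;
      if (state === "advanced") this.stats.advanced += 1;
      else if (state === "needTime") this.stats.needTime += 1;
      else this.stats.isEOF = true;
      return state;
    },
  };
}

// Round-robin trial: one work() per plan per round, until a stop condition holds.
function runTrial(plans, maxWorks, numResults) {
  for (let round = 0; round < maxWorks; round++) {
    let done = false;
    for (const plan of plans) {
      plan.work();
      // Stop when a plan is exhausted or has produced enough documents.
      if (plan.stats.isEOF || plan.stats.advanced >= numResults) done = true;
    }
    if (done) break;
  }
  return plans.map((p) => ({ name: p.name, ...p.stats }));
}

const trial = runTrial(
  [
    makePlan("indexA", ["advanced", "advanced", "advanced"]),
    makePlan("indexB", ["needTime", "needTime", "advanced"]),
  ],
  10000, // maximum number of works
  2      // requested number of documents
);
console.log(trial);
```

Here the trial ends after two rounds because indexA has already advanced twice, while indexB has only reported needTime.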
Number of work() calls before stopping
● internalQueryPlanEvaluationWorks = 10000
For large collections we take a fraction of the number of
documents:
● internalQueryPlanEvaluationCollFraction = 0.3
Then, get the maximum value:
works = max(internalQueryPlanEvaluationWorks,
internalQueryPlanEvaluationCollFraction * numRecords)
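As a worked example of the work() budget, the sketch below plugs the two defaults from the slide into the maximum. The function name `maxTrialWorks` is hypothetical, and rounding the fraction down to a whole number is an assumption.

```javascript
// Defaults as listed on the slide.
const internalQueryPlanEvaluationWorks = 10000;
const internalQueryPlanEvaluationCollFraction = 0.3;

// Work budget for the trial period, given the number of documents in the collection.
// Assumption: the fractional part is rounded down.
function maxTrialWorks(numRecords) {
  return Math.max(
    internalQueryPlanEvaluationWorks,
    Math.floor(internalQueryPlanEvaluationCollFraction * numRecords)
  );
}

console.log(maxTrialWorks(1000));    // small collection: the fixed value applies
console.log(maxTrialWorks(1000000)); // large collection: the collection fraction applies
```

With these defaults the fraction only takes over once the collection holds more than about 33,000 documents.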
Number of documents to retrieve before
stopping
● internalQueryPlanEvaluationMaxResults = 101
● query.getQueryRequest().getNToReturn()
○ Used in the old OP_QUERY protocol.
○ Drivers set 'ntoreturn' to min('batchSize', 'limit') to
work around the lack of separate 'limit' and 'batchSize'
fields in the protocol.
● query.getQueryRequest().getLimit()
○ Used by the find command, which replaces OP_QUERY from 3.2 onwards.
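● The number of documents to retrieve (advanced) is then:
○ If getNToReturn is set: advanced = min(getNToReturn, internalQueryPlanEvaluationMaxResults)
○ Else if getLimit is set: advanced = min(getLimit, internalQueryPlanEvaluationMaxResults)
○ Otherwise: advanced = internalQueryPlanEvaluationMaxResults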
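The document limit for the trial can be written as a small function. This is a sketch, assuming that an unset 'ntoreturn' or 'limit' shows up as undefined or 0; the function name `trialNumResults` and the shape of its argument are hypothetical.

```javascript
// Default as listed on the slide.
const internalQueryPlanEvaluationMaxResults = 101;

// ntoreturn / limit are assumed to be undefined (or 0) when the client did not set them.
function trialNumResults({ ntoreturn, limit } = {}) {
  if (ntoreturn) return Math.min(ntoreturn, internalQueryPlanEvaluationMaxResults);
  if (limit) return Math.min(limit, internalQueryPlanEvaluationMaxResults);
  return internalQueryPlanEvaluationMaxResults;
}

console.log(trialNumResults({ limit: 10 }));      // the smaller limit wins
console.log(trialNumResults({ ntoreturn: 500 })); // capped at the maximum
console.log(trialNumResults());                   // nothing set: the maximum
```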
Pick the best plan, count the scores
● baseScore = 1
● Productivity = queryResults / workUnits
● TieBreak (very small number) = min(1.0 / (10 * workUnits), 1e-4)
● noFetchBonus (covered index, no fetch needed) = TieBreak or 0
● noSortBonus (no blocking sort) = TieBreak or 0
● noIxisectBonus (avoiding index intersection) = TieBreak or 0
● tieBreakers = noFetchBonus + noSortBonus +
noIxisectBonus
● eofBonus (if during plan execution all possible documents are retrieved) = 0 | 1
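The components above can be combined into a scoring function. This is a sketch, not the server's code: the slide lists the components but not how they combine, so adding them together is an assumption, and `scorePlan` with its boolean flags is a hypothetical interface.

```javascript
// Hypothetical scoring of one candidate plan from its trial statistics.
function scorePlan({ advanced, works, isEOF, hasFetch, hasBlockingSort, hasIndexIntersection }) {
  const baseScore = 1;
  const productivity = advanced / works; // queryResults / workUnits
  const tieBreak = Math.min(1.0 / (10 * works), 1e-4);
  const noFetchBonus = hasFetch ? 0 : tieBreak;
  const noSortBonus = hasBlockingSort ? 0 : tieBreak;
  const noIxisectBonus = hasIndexIntersection ? 0 : tieBreak;
  const tieBreakers = noFetchBonus + noSortBonus + noIxisectBonus;
  const eofBonus = isEOF ? 1 : 0;
  // Assumption: the final score is the plain sum of all components.
  return baseScore + productivity + tieBreakers + eofBonus;
}

// Two made-up candidates: one needs a blocking sort and far more work() calls.
const candidates = [
  { name: "blockingSort", advanced: 101, works: 1000, isEOF: false,
    hasFetch: true, hasBlockingSort: true, hasIndexIntersection: false },
  { name: "sortedIndex", advanced: 101, works: 101, isEOF: false,
    hasFetch: true, hasBlockingSort: false, hasIndexIntersection: false },
];

// Sort by score, best first.
const ranked = candidates
  .map((c) => ({ name: c.name, score: scorePlan(c) }))
  .sort((x, y) => y.score - x.score);
console.log(ranked);
```

Productivity dominates the result; the tie-breakers only matter when two plans are otherwise equally productive.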
Replanning: Automatic Plan Cache Eviction
● The stored data keeps changing, so the cached plan may
no longer be the best one.
● While the cached plan is being used, MongoDB re-runs
the trial period for that plan and keeps a count of the
work() function calls.
● If the new trial period takes more than 10 times as many
work() calls as the original trial period, MongoDB evicts the plan from
the cache and re-tests all candidate plans to pick a new
winner.
● internalQueryCacheEvictionRatio = 10
maxWorksBeforeReplan = internalQueryCacheEvictionRatio * cachedWorks
if currentWorks > maxWorksBeforeReplan: replan()
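The eviction check reduces to one comparison. The sketch below uses the default ratio from the slide; the function name `shouldReplan` is hypothetical, and the strict "greater than" follows the "more than 10 times" wording above.

```javascript
// Default as listed on the slide.
const internalQueryCacheEvictionRatio = 10;

// cachedWorks: work() calls the winning plan needed in its original trial.
// currentWorks: work() calls the cached plan has needed in the current run.
function shouldReplan(cachedWorks, currentWorks) {
  const maxWorksBeforeReplan = internalQueryCacheEvictionRatio * cachedWorks;
  return currentWorks > maxWorksBeforeReplan;
}

console.log(shouldReplan(50, 400)); // within 10x of the original trial
console.log(shouldReplan(50, 501)); // more than 10x: evict and re-test
```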
Plans are not always cached
● In the following situations, the execution plan is not
cached:
○ Collection scan without sort()
○ hint()
○ min()
○ max()
○ explain()
○ Tailable cursors (they don’t use indexes)
○ snapshot()
○ A single viable plan
Query Planner Troubleshoot Example (I)
● We check all query shapes:
(Screenshot: listQueryShapes output.)
Query Planner Troubleshoot Example (II)
● Get the execution plan for that query:
(Screenshot: getPlansByQuery output, highlighting the solution, score, works and isEOF fields.)
Query Planner Troubleshoot Example (III)
● Remove the query plan for a particular query:
(Screenshot: clearPlansByQuery, after which the output shows "plans": [ ].)
Query Planner Troubleshoot Example (IV)
● Remove all query plans on a particular collection:
(Screenshot: clear.)
Query Planner Troubleshoot
● There are Plan Cache methods that can be used for
troubleshooting:
https://docs.mongodb.com/manual/reference/method/js-plan-cache/
● Check all query shapes:
○ db.collection.getPlanCache().listQueryShapes()
● Get the plan for a particular query:
○ db.collection.getPlanCache().getPlansByQuery(
<query>, <projection>, <sort> )
● Clear the plans for a particular query:
○ db.collection.getPlanCache().clearPlansByQuery(
<query>, <projection>, <sort> )
● Clear all plans:
○ db.collection.getPlanCache().clear()
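The helpers above can be chained into one troubleshooting pass. The sketch below is a plain function over a collection object so it can be pasted into the shell; `inspectAndClear`, the `orders` collection and the sample query are hypothetical, and it assumes a server and shell version that still provide these plan cache helpers.

```javascript
// Hypothetical helper: list the cached shapes, fetch the cached plans for one
// query, then clear only that query's plans so the next run is re-planned.
function inspectAndClear(collection, query, projection, sort) {
  const planCache = collection.getPlanCache();
  const shapes = planCache.listQueryShapes();
  const plans = planCache.getPlansByQuery(query, projection, sort);
  planCache.clearPlansByQuery(query, projection, sort);
  return { shapes, plans };
}

// In the shell (hypothetical collection and query):
// inspectAndClear(db.orders, { status: "A" }, {}, { created: -1 });
```

Clearing a single query shape is less disruptive than clear(), which drops every cached plan for the collection.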
Thanks!
