...
We have recently upgraded from 6.0.4 to Mongo 7.0.2 and noticed a significant drop in query performance for certain filter queries using a sort. For example, this query takes 13 seconds with SBE: rs0 [direct: secondary] mydb> db.version(); 7.0.2 rs0 [direct: secondary] mydb> db.mycoll.find({"fsType": "PageRef","page.formData.pt_title.value": "a"}).sort({ _id: -1 }).explain('executionStats').executionStats.executionTimeMillis; 13246 With the classic engine it takes 1.5 seconds: rs0 [direct: secondary] mydb> db.adminCommand({ setParameter: 1, internalQueryFrameworkControl: 'forceClassicEngine'}) { was: 'forceClassicEngine', ok: 1, .. } rs0 [direct: secondary] mydb> db.mycoll.find({"fsType": "PageRef","page.formData.pt_title.value": "a"}).sort({ _id: -1 }).explain('executionStats').executionStats.executionTimeMillis; 1554 The collection contains around 600K documents and there is a compound index that contains the "fsType" field (as the first key). It looks like with SBE the query doesn't use the index because it seems to be processing all 600k documents. I've attached the explain outputs for the queries as files. Without the sort param SBE performs well (and seems to use the index): rs0 [direct: secondary] mydb> db.mycoll.find({"fsType": "PageRef","page.formData.pt_title.value": "a"}).explain('executionStats').executionStats.executionTimeMillis; 276
JIRAUSER1270969 commented on Thu, 30 Nov 2023 16:49:22 +0000: No, there is no stable API for switching to classic, as the exact query engine used to process the query is a deep implementation detail and we are working to make the transition to SBE as invisible as possible. I understand that in that case we failed to do so. In general, our advice is not to set internalQueryFrameworkControl without instructions from our customer support. We have a plan to address the issue of SBE multiplanning in the next few patch releases for 7.0. With regards to continue using classic engine right now, your options are: Set internalQueryFrameworkControl to forceClassicEngine. I would recommend removing this option after you upgrade to version >= 7.0.6. Postpone 7.0 upgrade until 7.0.6. Implement other suggestions and keep using default value of internalQueryFrameworkControl. I understand that these might require changes in your applications. JIRAUSER1275485 commented on Thu, 30 Nov 2023 15:59:01 +0000: Thanks for the info. I'll give these options a try. I do have a question regarding: If this doesn't help than switching back to Classic engine should. Is there a public/stable API for switching back to the classic engine? We've been using the internalQueryFrameworkControl parameter for that, but as edwin.zhou@mongodb.com pointed out this is an internal parameter. JIRAUSER1270969 commented on Wed, 29 Nov 2023 11:29:36 +0000: philipp.boersteken@crownpeak.com In the meantime, what can you do to in your case is: 1. Clear plan cache and run the query with a real value for "page.formData.pt_title.value" field. This will allow multi planner to correctly estimate the usefulness of index of "fsType". 2. Create an index on {fsType: 1, page.formData.pt_title.value: 1}. 3. Or even create an index on {fsType: 1, page.formData.pt_title.value: 1, _id: -1} - this way sort won't be needed at all. 4. If this doesn't help than switching back to Classic engine should. JIRAUSER1270969 commented on Tue, 28 Nov 2023 16:24:36 +0000: I have attached repro.js- js script that reproduces the problem. JIRAUSER1270969 commented on Tue, 28 Nov 2023 16:20:38 +0000: philipp.boersteken@crownpeak.com Thanks again for taking time to report this and giving us the practical example of this problem. This situation is a part of bigger issue with multi planning design for SBE that we are actively working on. Unfortunately, there is no quick fix for this problem, but more systematic solution is being created as I type this. I am going to close this ticket as a duplicate of SERVER-83196. JIRAUSER1270969 commented on Tue, 28 Nov 2023 16:15:14 +0000: I investigated some more. Here are my findings: In the provided example, non-existing "a" string is used in a predicate for "page.formData.pt_title.value" field. That confuses the multi planner, as any plan will return 0 documents. In the absence of "productivity" estimation, we are left with 2 things: heuristic bonuses and EOF bonus. EOF bonus just makes sure that the plan is selected as best if it was able to complete in trial time. There are several small heuristic bonuses, like noFetchBonus or noSortBonus that are basically tie breakers in cases when productivity is roughly the same. Now let's examine what's happening in Classic engine: When using classic engine multi planner can call all plans (even "blocking" ones with sort) in round robin fashion. This allows us to set pretty high limits on the total number of calls to work() function during trial period: the default is currently 0.3 * collection size. Because of this, (IXSCAN fsType + SORT) plan reaches EOF, as fsType predicate is selective enough to leave less than 30% of the collection to check. As this plan reaches EOF in trial time, it is selected as "correct" and cached. What's happening in SBE engine: In SBE there is no NEED_TIME result for plan execution, so blocking stages like sort or group will try to consume all of the input on the first call. There is no graceful way to interrupt this process. So multi planner in SBE has to set much smaller limits on trial period. Currently it is allowed to read around 10000 documents. We can't afford to let an inefficient plan to read 30% of a whole collection like we do in Classic. Because of this, no plan reaches EOF during trial in SBE. Because productivity of both plans are equally 0, the plan without sort wins, as it gets noSortBonus. JIRAUSER1270969 commented on Mon, 27 Nov 2023 10:21:22 +0000: I also see that there are no returned documents at all (because of "a" as a search value). So multi planner have no data to compare both plans, as they both return 0 documents, so it picks non-blocking plan (we need to re-think this choice). It should behave better if a real value is searched for (plan cache might need to be cleared). JIRAUSER1270969 commented on Mon, 27 Nov 2023 10:13:04 +0000: How does the classic multi planner handles blocking stages? JIRAUSER1257066 commented on Fri, 27 Oct 2023 13:37:40 +0000: Hi philipp.boersteken@crownpeak.com, Thank you for your detailed ticket description and for providing the explain plans. Since "internalQueryFrameworkControl" is an internal parameter, we have chosen not to document this at this time. I will pass this over to our Query team to evaluate the execution of these queries. Kind regards, Edwin JIRAUSER1275485 commented on Mon, 16 Oct 2023 14:37:19 +0000: On a side note: It took us quite some time to figure out how to force the classic query engine because we couldn't find any documentation. Are there any plans to document the internalQueryFrameworkControl parameter (or any other public API variant of that parameter)?