Tuesday 21 February 2017

Ineffective execution plan due to outdated statistics

Recently encounter a performance issue where the query plan generated is ineffective due to outdated statistics.

  1. There'a batch job periodically updating a table (recurrence job) based on the daily activities on another table
  2. External call from web - calling a stored proc and executed a query (generated the execution plan and cached it)
  3. One of the occurrence has made large number of record update but not large enough to hit the threshold to trigger statistic update
  4. External call from web - calling the same stored proc
  5. Due to the large update on #3 has cause the data changes, the query plan no longer effective
  6. The query in the stored proc is running extremely long

TableA: ~16m records
TableB: ~1m records
TableC: ~100k records
TableD: ~10k records
TableE: ~5 records

The query in the stored proc is joining a few table (Eg. TableA - E), one of the table always has very few records (between 1 to 10), on normal circumstances, the query always generated a query plan which join TableE first to other table so that it reduce the number of records required on the next join to the rest of the table.

But somehow the first run of the stored proc has generated a query plan which use TableC as first join, joining it to TableA, then to TableB, TableD, and TableE. This appear to be correct on that run itself due to the parameter value passed in does run (slightly) better with this plan. But the next time when it run, unfortunately the previously cached plan is bad for that parameter, it has cause the query run much longer.

TableE is specifically designed to reduce the join operation, but unfortunately on the bad query plan, it put it to the last join operation, hence doing a lot of unnecessary join.

The solution we put in is to make the TableE join with TableA as subquery.
All other join has where condition which correlated to TableA, so this will make the subquery being executed first, this then achieve our initial purpose, to ensure the join order always has the TableE first, then the rest of the order does not matter, they can be join as SQL optimizer feel appropriate.

No comments:

Post a Comment