S1: Csci 8701 - Week 8
- Administrative
- Mid-Sem. Exam. grades Thursday
- Project - make progress
- Goals:
- Review of Query Optimization
- Recommended Readings:
- Selinger, Access Path Selection in System R
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Problem Formulation, Contributions
- SQL Query Optimization Problem
- Given: A SQL query Q over N tables
- ... Space of semantically equivalent Execution Plans
- Find most efficient plan tree.
- Objective: Minimize time to answer query
- Constraints: simple queries (N is small)
- Major Contributions
- Define Space of execution plans
- Cost estimation models for plans
- Systematic Search strategy for cheapest plan
- ... Dynamic Programming
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: Key Concepts : Scope of Plan Tree
- Terms: SQL Query vs. Query Block
- Recall SQL Query Block
SELECT [DISTINCT] < list of columns >
FROM < list of tables >
[WHERE < list of "Boolean Factors" (predicates in CNF) > ]
[GROUP BY < list of columns >
[HAVING < list of Boolean Factors > ]]
[ORDER BY < list of columns > ];
- Meaning
- Plan Tree would
- ... Take product of tables in FROM clause
- ... Project onto columns relevant to other clauses
- ... WHERE clause = > apply all filters in it
- Postprocessing would
- ... GROUP BY clause = > form groups on the result
- ... HAVING clause = > filter groups with it
- ... ORDER BY clause = > ensure right order of output
- ... DISTINCT modifier = > remove duplicates
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: Key Concepts: Plan Space
- SQL query -- > a set of SQL blocks
- Plan space for single block query
- sequence of 1 table or 2 table operations
- only left-deep plan trees
- postpone Cartesian products
- Q? Which of the following were not considered by System R?
- Access Methods (strategies) for select, project, join
- Scans: sequential & index (clustered/unclustered)
- NL-join, sort-merge join, hash join
- sorting & hash-based grouping
- Q? Which join-methods was not considered by System R?
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: Key Concepts: Cost Estimation
- Parameters of cost models (pp. 143)
- # of tuples & pages
- # of values per indexed column
- # of RSS calls
- Cost Model for an operator (Tables 1-2, pp. 144-5)
- Components: I/O and CPU costs
- Based on size of base tables
- Size of intermediate result is based on
- ... estimate of predicate selectivities
- Q? Are there any surprises in Table 1 or 2?
- Q? Where are cost models for join strategies?
- Do they show the dominance zones of 2 methods?
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Key Concepts: Searching the Plan Space
- Algorithm structure - Dynamic Programming
- Only n! left-deep join-orders
- many share common prefixes
- Do not enumerate all join-orders
- Execution Trace
- Find all plans for accessing each base relation
- For each table, save cheapest plan, and
- Try joining pairs of 1-table plans saved so far.
- Save cheapest 2-table plans
- Try combining a 2-table plan with a 1-table plan.
- Save cheapest 3-way plans.
- ...
- Combining k-way and 1-way plans
- ... until there is a full plan trees
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Validation Methodology
- Evaluation:
- Computational complexity = n * exp(2, n-1)
- Memory complexity = choose(n, n/2)
- Not scalable beyond 10-15 joins
- Paper presents a detailed example
- Illustrates the key concepts
- Shows the usefulness of proposed approach
- Quotes previous studies on join strategies
- ... motivates the approach
- Conceptually better than competition at the time
- Other
- Group designing a prototype relational DBMS
- Methods has stood test of time
- Minor revisions over 25 years
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Assumptions
- Cost Components
- Cost of query optimization is negligible
- Note: search space size is exponential
- Cost Models are available
- Algebraic optimization has occurred already
- Push selections, projects below joins
- Cost Models - an inexact science!
- Uniform distribution of key values in a table
- No difference b/w Random and Sequential IO
- Independence across predicates
- Search space explorations
- Only left-deep trees are interesting
- Other assumptions - Sequential execution
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Rewrite today
- Reduce Assumptions
- User defined data types and cost models
- Optimizing for parallel executions
- Update literature survey
- Semantic Query Optimization
- SQL Query Rewrite (IBM)
- Processing correlated nested queries (Decorrelation)
- Cost Estimation - New approaches:
- Sampling base relations
- Histograms of value distributions
- Q? What is cost of maintaining the estimates?
- Refresh estimates periodically
- Uniform distribution assumptions
- New Search methods
- DP-variations, Randomized, Job Scheduling
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)