S1: Csci 8701 - Week 8
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Problem Formulation, Contributions
  • SQL Query Optimization Problem
    • Given: A SQL query Q over N tables
    • ... Space of semantically equivalent Execution Plans
    • Find most efficient plan tree.
    • Objective: Minimize time to answer query
    • Constraints: simple queries (N is small)
  • Major Contributions
    • Define Space of execution plans
    • Cost estimation models for plans
    • Systematic Search strategy for cheapest plan
    • ... Dynamic Programming


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: Key Concepts : Scope of Plan Tree
  • Terms: SQL Query vs. Query Block
  • Recall SQL Query Block
    SELECT [DISTINCT] < list of columns >
    FROM < list of tables >
    [WHERE < list of "Boolean Factors" (predicates in CNF) > ]
    [GROUP BY < list of columns >
    [HAVING < list of Boolean Factors > ]]
    [ORDER BY < list of columns > ];
  • Meaning
    • Plan Tree would
    • ... Take product of tables in FROM clause
    • ... Project onto columns relevant to other clauses
    • ... WHERE clause = > apply all filters in it
    • Postprocessing would
    • ... GROUP BY clause = > form groups on the result
    • ... HAVING clause = > filter groups with it
    • ... ORDER BY clause = > ensure right order of output
    • ... DISTINCT modifier = > remove duplicates


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: Key Concepts: Plan Space
  • SQL query -- > a set of SQL blocks
  • Plan space for single block query
    • sequence of 1 table or 2 table operations
    • only left-deep plan trees
    • postpone Cartesian products
  • Q? Which of the following were not considered by System R?
  • Access Methods (strategies) for select, project, join
    • Scans: sequential & index (clustered/unclustered)
    • NL-join, sort-merge join, hash join
    • sorting & hash-based grouping
  • Q? Which join-methods was not considered by System R?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: Key Concepts: Cost Estimation
  • Parameters of cost models (pp. 143)
    • # of tuples & pages
    • # of values per indexed column
    • # of RSS calls
  • Cost Model for an operator (Tables 1-2, pp. 144-5)
    • Components: I/O and CPU costs
    • Based on size of base tables
    • Size of intermediate result is based on
    • ... estimate of predicate selectivities
    • Q? Are there any surprises in Table 1 or 2?
  • Q? Where are cost models for join strategies?
    • Do they show the dominance zones of 2 methods?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Key Concepts: Searching the Plan Space
  • Algorithm structure - Dynamic Programming
    • Only n! left-deep join-orders
    • many share common prefixes
    • Do not enumerate all join-orders
  • Execution Trace
    • Find all plans for accessing each base relation
    • For each table, save cheapest plan, and
    • Try joining pairs of 1-table plans saved so far.
    • Save cheapest 2-table plans
    • Try combining a 2-table plan with a 1-table plan.
    • Save cheapest 3-way plans.
    • ...
    • Combining k-way and 1-way plans
    • ... until there is a full plan trees


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Validation Methodology
  • Evaluation:
    • Computational complexity = n * exp(2, n-1)
    • Memory complexity = choose(n, n/2)
    • Not scalable beyond 10-15 joins
  • Paper presents a detailed example
    • Illustrates the key concepts
    • Shows the usefulness of proposed approach
  • Quotes previous studies on join strategies
    • ... motivates the approach
  • Conceptually better than competition at the time
    • ... - Ingres, Oracle 6
  • Other
    • Group designing a prototype relational DBMS
    • Methods has stood test of time
    • Minor revisions over 25 years


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Assumptions
  • Cost Components
    • Cost of query optimization is negligible
    • Note: search space size is exponential
    • Cost Models are available
  • Algebraic optimization has occurred already
    • Push selections, projects below joins
  • Cost Models - an inexact science!
    • Uniform distribution of key values in a table
    • No difference b/w Random and Sequential IO
    • Independence across predicates
  • Search space explorations
    • Only left-deep trees are interesting
  • Other assumptions - Sequential execution


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Rewrite today
  • Reduce Assumptions
    • User defined data types and cost models
    • Optimizing for parallel executions
  • Update literature survey
    • Semantic Query Optimization
    • SQL Query Rewrite (IBM)
    • Processing correlated nested queries (Decorrelation)
  • Cost Estimation - New approaches:
    • Sampling base relations
    • Histograms of value distributions
  • Q? What is cost of maintaining the estimates?
    • Refresh estimates periodically
    • Uniform distribution assumptions
  • New Search methods
    • DP-variations, Randomized, Job Scheduling


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)