S1: Ch. 15: Query Processing and Optimization
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Query Processing
  • Process is similar to compilation (Fig. 15.1, pp 494)
    • SQL Query -- > [scanner, parser] -- > query tree/graph
    • ...-- > [OPTIMIZER] -- > efficient execution plan
    • ...-- > [code generator] -- > code to execute the plan
    • ...-- > [run-time support] -- > query result
  • We will FOCUS on the optimizer module
    • Output:: Execution plan = set of selected access routines
    • ...from possible access routines for each relational operator
    • Input: Query tree or query graph
    • Algorithms: Systematic Enumeration/ Dynamic Programming
    • ...Algebraic optimization, Other heuristics
    • ...Semantic Optimization, Source to Source transforms
  • Access Routines for relational operations & cost models
    • Select (Point, Range): scan, binary search, index-search
    • Join: nested loop, sort-merge, hybrid hash


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: 15.8.1: Cost Models for Access Routines
  • Cost Metrics = #block accesses (OR response time in seconds)
    • focus on I/O cost, not CPU/communication
  • I/O Cost DEPENDS ON
    • Search criteria: point/range query on ordering /other fields
    • File Structures: heap, sorted, hashed.
    • Index-type: primary, clustering, secondary, B+tree, multi-level,...
    • Other factors, e.g. buffering, disk placement, materialization,
    • ...Overflow / free-space mgmt, spanned/unspanned blocking, ...
  • Parameters of Cost Models
    • s = selectivity = fraction of tuples selected = |result| / |table|
    • S(field) = selection cardinality = r(d) / (#distinct values of field)
    • B, bd, bi(j) = block size, #blocks in datafile/ at level j of index
    • r(d) = #records in data file
    • u, bfr(d), bfr(i) = utilization, blocking factors for datafile/index
    • fo, d = fanout and depth of index
    • R, P, Pr = record size, tree pointersize , data record pointer size
    • e = Pr[a record in overflow area] * E[overflow chain length]
  • Q? Compare these parameters to those in Oracle system catalog.


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: 15.3: Access Routines for Relational Operators
  • Access routine - is an execution strategy for a relational operator
    • May use available file-structures and indexes
    • Specifies algorithms , and applicability
  • Point Query: (i) Linear, (ii) Binary, (iii) Hash, (iv) Indexed search
  • Range Query: (1) Linear search , (2) Binary search then scan
    • ...3. Find 1st record in, then scan datafile (Primary/Clustered index)
    • ...4. Find 1st record in, then scan index leafs (Secondary index/B+tree)
  • Conjunctive selection - 1. Use composite index
    • ...2. Process one condition first followed by scan of result
    • ...3. Intersect (record pointers retrieved by different conditions)
  • Q? How do we choose an appropriate plan?
    • Rule based models
    • Cost based models
  • Ex. Provide algebraic formula for access routines
    • Point Search (UNIQUE field): bd/2 ; log(bd); 1+ e; d + 1;
    • ...not unique = > + S/bfr(d) to sorted/hashed file; bd for heap
    • Range Search: bd or bd/2 + s*bd; log(bd) + s*bd; d + 1 + s*bd;
    • ...secondary index : d + s*[bi(leaf) + r(d)] worst case


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: 15.3.2 + 15.8.4: JOIN strategies, Cost models
  • Focus on two way equi-join:: join(R, S, R.A = S.B)
  • Join-selectivity= js(R.A=S.B)=|join result|/(|R|*|S|); |X|=#records in X
    • = > Join result has b(RS)= (js*|R|*|S| / bfr(RS)) blocks
  • (1)Nested loop: For each block in R, (a) Read matching record in S
    • ...using scan or indexed search, and (b) Write partial result
    • I/O Cost = bR + (bR * bS) + b(RS)
    • ...Indexed search cost= f(available index(S.B)), see Page 499
    • ...= bR + [|R|*(d(S.B) + S(S.B))] + b(RS), secondary index(S.B)
  • (2) Sort-Merge: (i) If Needed, Sort R on R.A, Sort S on S.B
    • ...(ii) Scan R and S block by block [See Fig. 15.3(a), pp 503]
    • Very efficient if R and S already sorted, Cost = bR + bS + b(RS)
    • Sorting = > Add k*{bR * log(bR) + bS * log(bS)}
  • (3) Hash: (i) hash records of R, S into a common hash file (CHF)
    • ...(ii) scan CHF to compute and write results
    • Cost= 3*(bR+bS) + b(RS), w/ buffering (hybrid hash, pp507, para 3)


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Dominance Analysis of Access Routines
  • Ex. Compare scan and secondary-index for Range queries?
    • Heap file Scan: bd; < = Secondary index: d + s*[bi(leaf) + r(d)]
    • ...OR selectivity(s) > = (bd-d) / [bi(leaf)+rd] > bd/rd = (1/bfr(d))
    • example: bfr(d) = 10; selectivity = 0.15 ;
  • Ex.Nested loop: Which relation should be in outer loop ?
    • Smaller one = > reduces cost
  • Q? Which strategy wins for join(R,S), IF (bR > bS) and
    • (1) primary index on R.A in R, S.B in S (depth = x)
    • ...Let #record/distinct value of join field = a
    • ...(bR = bS) = > sort merge if bR = bS
    • ...(bR > a*x*bS) = > nested loop with index-scan
    • (2) Heap Files , bR = bS :: hash-join (log(bR) > 10)
    • (3) Hashed Files R on R.A, S on S.B,
    • ...(bR = bS) = > hash join if bR = bS , log(bR) > 10
    • ...(bR > 4*bfr(inner loop)*bS) = > nested loop with index-scan ?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Dominance Analysis of Access Routines (Contd.)
  • Q? Which strategy wins for join(R,S), IF (bR > bS) and
    • (4) primary index on R.A, S hashed on primary key S.B,
    • ...bS = bR, bfr(R) = 3, index-depth = 1, (Hint: unique matches)
    • ...Nested loop (inner = indexed search on R) cost = 4bR + b(RS)
    • (5)bS < |memory| << bR ; cost = bR + bS + B(RS) for all 3
    • ...1. buffer + index/sort/hash S in memory;
    • ...2. For each block in R, bring the block and write result
  • Q? Which strategies can be used for non-equijoins?
    • ...for select-project-equijoin?
  • Summary comparison of join strategies
    • Nested loop is a dark horse, watch out!
    • ...e.g. (bR/bS) >> 1, or bfr >> 1
    • ...OR only one file has good search structure
    • Both files sorted, sort-merge can win
    • ...unless (bR/bS) >> 1, or bfr >> 1
    • Both files have No good search structure, Hybrid hash can win
    • ...unless (bR/bS) >> 1, or bfr >> 1


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: 15.7: Heuristic Query Transformations
  • Output: Sequence or Set of access routines
  • Input:(i) Query Tree: syntax tree for eqv. rel. algebra query
    • ...Leafs = input relations :: Non-leafs = rel. algebra operations
    • Initial canonical tree: Easy to generate from parser (Fig. 15.5(a))
    • ...cartesian product of all rel.+ 1 select + 1 project
    • ...Ex. Fig. 15.5 (b) pp 516 with Q (pp 515)
  • (ii)Query graph: constraints in tuple calculus exp. (Fig. 15.4/pp 513)
    • Nodes = relations (tuple variables) OR constants (double circle)
    • ...Edges = join conditions
    • We will not use it. Came from Ingres/QUEL optimizer.
  • ALGORITHM 1: Source to Source Transform (Ch 15.7.2, pp 520-1)
    • Semantic transformation rules, e.g. integrity constraints (IC)
    • ...Ex. Q(pp 533): Find Emps who earn more than their supervisors
    • ...IC: No employee earns more than his/her supervisor.
    • Implemented by Thm. prover = > exponential in #applicable rules
    • Alternative is hand recoding


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Query Optimization Algorithms (Contd.)
  • ALGORITHM 2: Algebraic optimization by transforming query trees
    • Start w/ Initial canonical tree
    • Use laws of relational algebra (Ex. rule 4, pp 519)
    • General steps (pp 520): apply those operations first
    • ...which reduce the size of intermediate results
  • Ex. Trace: Fig. 15.5 (pp 516-518)
    • Fig. 15.5(b) move selects down the tree (close to table)
    • Fig. 15.5(c) more restrictive select 1st (pname = 'Aquarium')
    • ...commute cartesian products, rearrange select conditions
    • Fig. 15.5(d): Combine (product, select) into joins
    • Fig. 15.5(e): Move projects down the tree, add project
    • ...to drop fields not needed for result (and join/selects)
  • ALGORITHM 3: Dynamic Programming w/ cost models
    • 1. Choose an ordering of join operation (e.g. left to right)
    • 2. Given ordering, enumerate combination of access routines
    • ...choose low-cost combination, 1 access routine per rel. op.
    • Used in commercial systems, Step 2 is cubic in #relations,
    • ...Step 1. is exponential and optimizers scrimp here!


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)