Lab 2 : Query Optimization

The goal of Lab 2 is to learn about query optimizers by examining and analyzing query execution plans of the following SQL query statments on given database tables, movies, actors and casting.

Part A: Preliminaries

  1. Database Schemas
    CREATE TABLE actor
    (id INTEGER NOT NULL, name VARCHAR(35), PRIMARY KEY (id));
    
    CREATE INDEX actor_name ON actor(name);
    
    CREATE TABLE movie
    (id INTEGER NOT NULL, title VARCHAR(70)
    ,yr DECIMAL(4), score FLOAT, votes INTEGER, director INTEGER
    ,PRIMARY KEY (id), FOREIGN KEY (director) REFERENCES actor(id)
    );
    CREATE INDEX movie_title ON movie(title);
    CREATE INDEX movie_votes ON movie(votes);
    
    CREATE TABLE casting
    (movieid INTEGER NOT NULL, actorid INTEGER NOT NULL
    ,ord INTEGER, PRIMARY KEY (movieid, actorid)
    ,OREIGN KEY (movieid) REFERENCES movie(id)
    , FOREIGN KEY (actorid) REFERENCES actor(id)
    );
    
    CREATE INDEX casting_movie ON casting(movieid);
    CREATE INDEX casting_actor ON casting(actorid);
    CREATE INDEX casting_ord   ON casting(ord);
    
    NOTE: TA account(S1G30) has these three tables with data. Since these tables are quite large, you should NOT copy those in your own accounts. You can refer to these tables either using prefixing owner account name or using synonyms. For example,
    SELECT * FROM S1G30.movie;
    or
    CREATE SYNONYM movie FOR S1G30.movie;
    SELECT * FROM movie;
    
    TA executed the following ANALYZE statements so that Oracle uses cost-based optimization.
    ANALYZE TABLE movie COMPUTE STATISTICS;
    ANALYZE TABLE actor COMPUTE STATISTICS;
    ANALYZE TABLE casting COMPUTE STATISTICS;
    
    Examine indices info related to above tables.
    (Use INDEX_NAME, TABLE_NAME, LEAF_BLOCKS,AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, NUM_ROWS, DISTINCT_KEYS) columns in ALL_INDEXES table.)

  2. Setup for using EXPLAIN PLAN.

    The EXPLAIN PLAN statement displays execution plans chosen by the Oracle optimizer for SELECT, UPDATE, INSERT, and DELETE statements. A statement's execution plan is the sequence of operations Oracle performs to run the statement.

    1. Creating the PLAN_TABLE Output Table
      ORACLE stores the execution plans of a given SQL into a table called PLAN_TABLE. Execute the following SQL script for it.
      SQL> @$ORACLE_HOME/rdbms/admin/utlxplan
      
    2. Running EXPLAIN PLAN
      The following SQL code template is used to populate PLAN_TABLE with execution plan for a SQL statement. You need to edit the code template to put specific value for <some-name> before running it.
      EXPLAIN PLAN
      SET STATEMENT_ID = '<some-name>'
      FOR
      <select statement to be analyzed>;
      
      NOTE: PLAN_TABLE allows duplicate values in statement_id column. This may result in spurious data for execution plans for queries using an existing value of statement_id. You may consider cleaning up old information using DELETE FROM PLAN_TABLE; before asking for execution plan for a new query.

    3. Displaying PLAN_TABLE Output
      Use the following SQL to view created execution plans
      SELECT LPAD(' ', 2*LEVEL)||OPERATION||' '||OPTIONS||' '||OBJECT_NAME Query_Plan
      FROM PLAN_TABLE
      CONNECT BY PRIOR ID = PARENT_ID and STATEMENT_ID = '<some-name>'
      START WITH ID=1 and STATEMENT_ID = '<some-name>'
      ORDER BY ID;
      
      or
      
      SELECT id, parent_id, level, position, operation, options, object_name, cost, bytes
      FROM PLAN_TABLE
      CONNECT BY PRIOR ID = PARENT_ID and STATEMENT_ID = '<some-name>'
      START WITH ID=1 and STATEMENT_ID = '<some-name>'
      ORDER BY ID;
      
    4. NOTES:
      1) You may save these statement via "SAVE <some-name>" command, recall it via "@<some-name>" and edit it via "EDIT <some-name>".
      2) You may use "spool <some-name>" command to start recording a session and "spool off" to stop recording.
      3) Additional Information on Explain Plan is available from
        - Pages 114-116 (2nd Ed.) in the supplemantary book
        - Range query processing stratagies are discussed in the text book ch 15.
        - Oracle tutorials from MIT (tuning, dwh, java) ... web series

Part B: Range Query and Index

  1. Recall that there is a secondary index on the VOTES column of MOVIE table.
  2. Write SQL statements for the follwoing VOTES based range queries.
  3. Run the explain plan for each Q1 and Q2 using the instructions given above.
  4. Submit the explain plan output of above two queries and compare these plans.
  5. Explain why the index on VOTES is not always used for range queries based on VOTES attribute.

  6. Derive an algebric condition on the selectivity of range query in terms of record size, block size, key-value size, pointer value size and depth of secondary index to quantify your arguement.

Part C: Selectivity

  1. Recall that there is a secondary index on the VOTES column of MOVIE table.
  2. Consider following SQL queries for the VOTES based range queries .
  3. Run the explain plan for each Q1 and Q2 and Submit the output.
  4. Compare the execution plans for the two queries. Which query is cheaper to execute? Why?
  5. Compare the number of tuples selected from each query. Which query do you expect to have lower selectivity (i.e. returns fewer rows)?
  6. Explain why the index on VOTES is not used for Q2.
    (Hint: The cost based optimizer assumes that VOTES values follow an uniform distribution.)

Part D: String Matching and Index

  1. Recall that there is a secondary index on the NAME column of ACTOR table.
  2. Consider following SQL queries.
  3. Run the explain plan for each Q1, Q2, Q3 and Q4, and submit the output.
  4. Compare the execution plans of query Q1 and Q2. What's the differnence? Explain.
  5. Explain the execution plans of query Q3 and Q4. Explain why the index on NAME is not always used for queries based on NAME attribute.

Part E: Aggregate Operations

  1. Consider following two equivalent SQL queries.
  2. Run the explain plan for each Q1 and Q2 and submit the output.
  3. Is there any difference in the execution plans for Q1 and Q2? Which query is more efficient to execute? Explain.

Part F: Join and Selection

  1. Recall that there are indices on TITLE and VOTES.
  2. Consider following two equivalent SQL queries for retrieving titles of movies with votes larger than that of Star Wars.
  3. Run the explain plan for each Q1 and Q2 and submit the output.
  4. Is there any difference in the execution plans for Q1 and Q2? Which query is more efficient to execute? Explain.

Part G: Alternate Join Strategies

  1. Recall that there are indices on CASTING.ACTORID, CASTING.ORD and ACTOR.ID.
  2. Consider following three equivalent SQL queries for retrieving leading actors, i.e. actors which have had a leading roles in at least one movie. The two of them have optimizer hint. Hints provide a mechanism to direct the optimizer to choose a certain query execution plan based on join method.
  3. Run the explain plan for each Q1, Q2 and Q2 and submit the output.
  4. Compare and contrast three exeuction plans for Q1,Q2 and Q3? Is there any difference? Why?