S1: Ch 14: Index Structures for Files
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Summary of Ch#5. Record Storage and File Organization
  • Storage Hierarchy, Magnetic Disks
  • Records and Files
    • Mapping them to disk blocks
    • File operations
  • Organizing records in a file: unordered, ordered, hashed
    • Different cost of Find(), FindNext()
    • Additional data-structures used in dynamic/extendible Hashing
  • Index = additional data-structure to speedup search
    • Idea similar to card-catalogs (library), indexes in books
    • Fast search of index via ordering index entries
    • ...or multi-level index (search trees)
    • Focus of Ch# 14


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: 14.1 Index: Basic Concepts and Classification
  • What is an index? f : Indexing Field Value -- > disk page(s)
    • Collection of < K(i), P(i) > (1NF)
    • Distinct from datafile, though shares the indexing field
    • Search procedure: (a) search index : K(i) -- > set of P(i)s
    • ...(b) search contents of the Pages P(i)s in main memory
    • Search time w/ index: log(bi) + |p(i)s| I/Os , bi = blocks in index
  • 3 Types of Single-level indexes
  • 1. Primary Index (Fig. 14.1, pp 458)
    • Datafile ordered by indexing field = primary key
    • nondense, i.e. #index-entries < #dat records
    • One index entry per disk page of datafile
    • Takes less space, faster search but slow insert
    • ...Insert needs shifting, update of anchor records


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: Single Level Indexes (Contd.)
  • 2. Clustered Index (Fig. 14.2, pp 460)
    • Datafile ordered by indexing field
    • Indexing field value NOT UNIQUE for each record
    • One entry per distinct value of indexing field
    • Q? Is it a dense index?
    • Q? Are insert() easier than primary index?
    • ...need shifting or free space per block (Fig. 14.3, pp 461)
  • 3. Secondary Index (Fig. 14.4, pp 463)
    • Datafile may not be ordered by indexing field
    • Dense index, i.e. an index entry per data record
    • Indexing field may be a key or a non-key field
    • Q?How to handle duplicate values of indexing field?
    • ...dense index, variable length index record, or
    • ...extra level of indirection (see Fig. 14.5)
  • Summary table 14.1, 14.2, pp. 465-6


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: Single Level Indexes: Exercises
  • Q? How many indices of each type can be defined on a datafile?
    • Heap, Hashed: Many secondary indexes
    • Sortedfile: One primary/clustered, Many secondary indexes
  • Fully inverted file = one secondary index per field
  • Q? Which index provides most gain in search time ?
    • Secondary: O(bd/2 - log(bi)); Primary: O(log(bd) - log(bi))
  • Cost Model Ex. 1 & 2 (pp 459, 462): r = 30,000;
    • ...B = 1 Kbyte; R = 100 byte; V = 9 byte; Pointer = 6 bytes
    • ...Compute I/O Cost Saving from Primary Index.
    • ...Compute I/O cost savings from a Secondary index (unordered datafile).
    • ...Hints: calculate bfr(d), bd, #index entries, bfr(i), bi
  • Q? What is the cost of FindNext() for different indexes?
    • O(1) for all, since index is sorted.
  • Table 14.1-2 (pp 465-6), give row/col. headers, ask for entries.


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Multilevel Indexes (Fig. 14.6, pp 467)
  • Concept: Speedup Step(a), i.e. index search
    • By adding a 2nd level index to 1st level index file
    • How many levels can a multilevel index have?
    • ...log(bi, base = fo), where fo = fanout = bfr(i)
    • Search procedure: level traversal (Alg. 14.1, pp 468)
    • Search time: log(bi, base = fo)) vs. log(bi) vs. log(bd) vs. bd/2
  • Commercial ISAM = 2 level primary index, ordered file
  • Ex. computer #levels and search time for multilevel index to
    • ...unordered datafile, dense secondary index, r = 300 million.
    • ...B= 1 Kbyte; R= 100 byte; V= 9 byte; Pointer= 6 bytes
  • Q? What kind of index is 2nd level index?
    • ...Primary, Clustered or Secondary?
  • Summary: Faster Search, But Slower Insert(), Delete()
    • To reduce shifting = > leave freespace in blocks OR
    • ...use overflow blocks


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: 14.3 Dynamic Multilevel Index: B-tree, B+tree
  • Definitions: Tree, nodes, root, leafs, internal nodes, level (node)
    • ...parent-child relationship, descendants, subtree(node)
    • ...Fig. 14.7 (pp 469)
  • Search Trees of order p (see Fig. 14.9/ pp 471)
    • variation of multilevel indexes
    • Node has (p-1) search values and p pointers
    • ...in order < P1, K1, P2, K2, ..., K(q-1), Pq), q < = p
    • ...where Pi = pointer to child / null; Ki = key value
    • ...For simplicity, assume search values to be unique
    • Properties (see Fig. 14.8, pp 470)
    • ...Given key value can occur in a unique child(node)
    • ...1. Within each node, K1 < K2 < ... < K(q-1)
    • ...2. all X in subtree(Pi); K(i-1) < X < Ki for 1 < i < q
    • ... X < Ki for i = 1; and K(i-1) < X for i = q


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: B-tree, B+tree (Fig. 14.10, 14.11 on pp 473, 475)
  • B-tree = Search tree + Balanced + free space mangement
    • Node = {P1, < K1, Pr1 > , ..., P(q-1), < K(q-1), Pr(q-1) > , Pq}
    • ...Pr(i) = data pointer, Pi = tree pointer or data pointer
    • Non-root nodes have ceil(p/2) < = q < = p (avg. 69% full)
    • Root has > 2 pointers, unless it has no children.
    • Leaf nodes have Pi = Null.
    • Key values in nodes are UNIQUE
  • B+tree is like B-tree, except for
    • Leaf and non-leafs have different formats (Higher fan-out):
    • ...Non-leaf node do not have Pr(i), Pi is always tree pointer
    • ...Leaf nodes do not have Pi (which were Null anyway!)
    • Leaf nodes are threaded (for sequential access)
    • Key values NON-LEAF nodes are unique
  • B+tree used in most commercial databases
  • Q? Is B+tree an index file distinct from datafile?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Comparison + Search() on B+trees
  • Q? Compare B-tree and B+-tree on fanout,
    • ...Avg. #nodes, #entries, #pointers @ levels root, 1, 2, 3 for
    • ...V= 9 bytes, P= 6 bytes, B= 512 bytes, 69% utilization
    • fo (17,23), Level 2 (4912,11638), Level 3 (83520,267674)
  • Search(key = k, Page), called with root-page (Alg. 14.2/pp 477)
    • ...binary search on page within main memory
    • ...Case K = Ki in the page = > Follow Pr(i)
    • ...Case K in subtree(notNull Pi) = > Search(K, page(Pi))
    • ...Case K in subtree(Null Pi) = > not in tree
    • Time = depth of B+tree = log(r, base = fo), i.e. 2 to 4
  • Ex. List strategies for point and range search using B+tree
    • ...as primary (secondary) index.
    • Primary: Find first record, scan datafile from there
    • Secondary: Find first index-leaf, scan index-leafs


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S10: Insert(), Delete() with B+trees
  • Complex recursive algorithms, Better use intuition!
  • Insert(key = k, ...), see Alg. 14.3 (pp 478)
    • Produce a legal B+tree compatible with following steps:
    • 1. Search(key = k, ...) to locate right Leaf
    • 2. Case Leaf has space = > insert in leaf
    • 2. Case Leaf is full = > insert; split leaf into 2 pages
    • 3. Parent(Leaf) is updated to add pointer to new leaf
    • ...overflow = > recursion of {insert, split}up the level
  • Ex. Fig. 14.12 (pp 480): Insert 8, 5, 1, 7, 3, 12, 9, 6
    • ...insert(1) = > splits root
    • ...insert(12) , insert(6) = > split propagates
  • Deletion from a B+tree(order 3): 5, 12, 9
  • Ex. 14.13 (pp 481)
  • Ex. How many distinct B+ tree are possible for
    • p = 3, depth < = 3
    • Draw all the trees.
    • Identify sequence of trees from inserting
    • ...a sorted list of keys
    • ...an unordered list of keys


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S11: 14.4 Index on Multiple Keys
  • Motivation
    • Star joins in data warehouses
    • Applications - Geographic Information, multimedia
    • keys are not linear
  • Idea 1: 14.4.1 Linearize
    • Order (linear) keys -Ex. (pp. 483)
  • Idea 2: Partitioned Hashing
    • hash-bucket-address = concat( h1(key1), ..., hn(keyN) )
    • Useful for point query, not useful for range query
  • Idea 3: Truly multi-dim index
    • Grid Files (Fig. 14.14, pp. 484)
    • Example of search


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)