Slides

S1: Ch 14: Index Structures for Files

S2: Summary of Ch#5. Record Storage and File Organization

Storage Hierarchy, Magnetic Disks
Records and Files
- Mapping them to disk blocks
- File operations
Organizing records in a file: unordered, ordered, hashed
- Different cost of Find(), FindNext()
- Additional data-structures used in dynamic/extendible Hashing
Index = additional data-structure to speedup search
- Idea similar to card-catalogs (library), indexes in books
- Fast search of index via ordering index entries
- ...or multi-level index (search trees)
- Focus of Ch# 14

S3: 14.1 Index: Basic Concepts and Classification

S4: Single Level Indexes (Contd.)

S5: Single Level Indexes: Exercises

Q? How many indices of each type can be defined on a datafile?
- Heap, Hashed: Many secondary indexes
- Sortedfile: One primary/clustered, Many secondary indexes
Fully inverted file = one secondary index per field
Q? Which index provides most gain in search time ?
- Secondary: O(bd/2 - log(bi)); Primary: O(log(bd) - log(bi))
Cost Model Ex. 1 & 2 (pp 459, 462): r = 30,000;
- ...B = 1 Kbyte; R = 100 byte; V = 9 byte; Pointer = 6 bytes
- ...Compute I/O Cost Saving from Primary Index.
- ...Compute I/O cost savings from a Secondary index (unordered datafile).
- ...Hints: calculate bfr(d), bd, #index entries, bfr(i), bi
Q? What is the cost of FindNext() for different indexes?
- O(1) for all, since index is sorted.
Table 14.1-2 (pp 465-6), give row/col. headers, ask for entries.

S6: Multilevel Indexes (Fig. 14.6, pp 467)

Concept: Speedup Step(a), i.e. index search
- By adding a 2nd level index to 1st level index file
- How many levels can a multilevel index have?
- ...log(bi, base = fo), where fo = fanout = bfr(i)
- Search procedure: level traversal (Alg. 14.1, pp 468)
- Search time: log(bi, base = fo)) vs. log(bi) vs. log(bd) vs. bd/2
Commercial ISAM = 2 level primary index, ordered file
Ex. computer #levels and search time for multilevel index to
- ...unordered datafile, dense secondary index, r = 300 million.
- ...B= 1 Kbyte; R= 100 byte; V= 9 byte; Pointer= 6 bytes
Q? What kind of index is 2nd level index?
- ...Primary, Clustered or Secondary?
Summary: Faster Search, But Slower Insert(), Delete()
- To reduce shifting = > leave freespace in blocks OR
- ...use overflow blocks

S7: 14.3 Dynamic Multilevel Index: B-tree, B+tree

Definitions: Tree, nodes, root, leafs, internal nodes, level (node)
- ...parent-child relationship, descendants, subtree(node)
- ...Fig. 14.7 (pp 469)
Search Trees of order p (see Fig. 14.9/ pp 471)
- variation of multilevel indexes
- Node has (p-1) search values and p pointers
- ...in order < P1, K1, P2, K2, ..., K(q-1), Pq), q < = p
- ...where Pi = pointer to child / null; Ki = key value
- ...For simplicity, assume search values to be unique
- Properties (see Fig. 14.8, pp 470)
- ...Given key value can occur in a unique child(node)
- ...1. Within each node, K1 < K2 < ... < K(q-1)
- ...2. all X in subtree(Pi); K(i-1) < X < Ki for 1 < i < q
- ... X < Ki for i = 1; and K(i-1) < X for i = q

S8: B-tree, B+tree (Fig. 14.10, 14.11 on pp 473, 475)

S9: Comparison + Search() on B+trees

Q? Compare B-tree and B+-tree on fanout,
- ...Avg. #nodes, #entries, #pointers @ levels root, 1, 2, 3 for
- ...V= 9 bytes, P= 6 bytes, B= 512 bytes, 69% utilization
- fo (17,23), Level 2 (4912,11638), Level 3 (83520,267674)
Search(key = k, Page), called with root-page (Alg. 14.2/pp 477)
- ...binary search on page within main memory
- ...Case K = Ki in the page = > Follow Pr(i)
- ...Case K in subtree(notNull Pi) = > Search(K, page(Pi))
- ...Case K in subtree(Null Pi) = > not in tree
- Time = depth of B+tree = log(r, base = fo), i.e. 2 to 4
Ex. List strategies for point and range search using B+tree
- ...as primary (secondary) index.
- Primary: Find first record, scan datafile from there
- Secondary: Find first index-leaf, scan index-leafs

S10: Insert(), Delete() with B+trees

Complex recursive algorithms, Better use intuition!
Insert(key = k, ...), see Alg. 14.3 (pp 478)
- Produce a legal B+tree compatible with following steps:
- 1. Search(key = k, ...) to locate right Leaf
- 2. Case Leaf has space = > insert in leaf
- 2. Case Leaf is full = > insert; split leaf into 2 pages
- 3. Parent(Leaf) is updated to add pointer to new leaf
- ...overflow = > recursion of {insert, split}up the level
Ex. Fig. 14.12 (pp 480): Insert 8, 5, 1, 7, 3, 12, 9, 6
- ...insert(1) = > splits root
- ...insert(12) , insert(6) = > split propagates
Deletion from a B+tree(order 3): 5, 12, 9
Ex. 14.13 (pp 481)
Ex. How many distinct B+ tree are possible for
- p = 3, depth < = 3
- Draw all the trees.
- Identify sequence of trees from inserting
- ...a sorted list of keys
- ...an unordered list of keys

S11: 14.4 Index on Multiple Keys

Motivation
- Star joins in data warehouses
- Applications - Geographic Information, multimedia
- keys are not linear
Idea 1: 14.4.1 Linearize
- Order (linear) keys -Ex. (pp. 483)
Idea 2: Partitioned Hashing
- hash-bucket-address = concat( h1(key1), ..., hn(keyN) )
- Useful for point query, not useful for range query
Idea 3: Truly multi-dim index
- Grid Files (Fig. 14.14, pp. 484)
- Example of search