S1: Welcome to CS 5708
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Ch#13. Record Storage and File Organization
  • Introduction to Storage Hierarchy
  • Secondary Storage Devices - Magnetic Disks
    • Description and Cost Model
  • Placing file record on Disk
    • Records, Blocking and Placement
    • Physical File Structure (Block-pointer diagrams)
  • Operations of Files
  • File Organization - Heap, Sorted, Hased
    • Cost Model of Operations for different files
  • Growing Hash Files - Linear, Dynamic, Extendible


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: 13.1 Memory Hierarchies, Characteristics
  • Hierarchy of Storage - Multiple levels
    • Primary Storage (e.g. main memory, cache, registers)
    • ...volatile, micro-sec. access time, limited storage, high cost/unit
    • Secondary storage (e.g. magnetic disks, CD-ROM, ...)
    • ...persistent, milli-sec access time, larger capacity, lower cost/unit
    • Tertiary Storage (e.g. magnetic tapes, off-line archives, ...)
    • ...persistent, off-line, second access time, cheapest
  • Secondary storage
    • Database use it for persistence
    • Data-structures: records, blocks, files
    • ...Block pointer diagrams
    • Logical access pattern: File operations
    • Physical Access Pattern: block at a time
  • Q? What is the price or 1 Gbyte of storage
    • ...for RAM, Magnetic disk, tapes?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: 13.2 Secondary Storage Devices
  • Magnetic Disk: Description Fig. 13.1, 13.2 (pp 416-7)
    • ...track, platter, cylinder, sector
    • ...synonyms: sector, block, pages on disk
  • Access time = s + r + btt
    • s = seek time (move head to given track)
    • ...Linear function of (current track - given track)
    • r = rotational latency (wait for sector to be under head)
    • btt = block transfer time (read + copy data to memory)
    • ...(block size / transfer rate)
  • Q? Compare the following sector-pairs for access time:
    • sectors on a cylinder, on a platter (different cylinder)
    • Table 13.1 (pp. 418-9)
  • Q? Compare the following fragment-size for backing up 1 Gb file:
    • Fragment: a collection of sectors on a cylinder
    • 1 Kb, 8Kb, 1Mb
  • 13.10 RAIDs, parallel disk access
    • Striping (Fig. 13.12, pp. 444-5)
    • Levels of RAID (Fig. 13.13, pp. 448)
    • ...Disk Arrays, Mirroring, Parity bits


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: 13.4 Placing Files & Records on Disk Blocks
  • Records (Ex. on pp 423), terminator char
    • Fixed length vs. Variable length (Fig. 13.5, pp 424)
    • ...separator (variable length / multi-values fields)
  • Blocking of records into sectors (Fig. 13.6, pp 426)
    • Unspanned: each record within a block
    • Spanned: some records are split across two blocks
    • Blocking factor = [floor](block size / record size)
    • Space vs. retrieval time trade-off
  • Allocating Disk Blocks to Files
    • Contiguous sectors on disk to a file
    • Linked list of random sectors
    • Indexed list of sectors
    • Hashed list of bucket sectors
  • Crash-recoverable file design
    • Header block - record format, data address, ...
    • Data block: records, local directory, next block


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: 13.5 Logical Operations on Files
  • File Interface: record-at-a-time, set-at-a-time
    • Find : key value - > record address
    • Read : record address - > record
    • FindNext, Delete, Insert, Modify,...
    • FindAll : key value - > set of record addresses
    • FindOrdered, Reorganize
  • File Design: Goal and Definitions
    • Given operations, find efficient support structures
    • File Organization- structure records, blocks and files
    • ...to facilitate operations (e.g. Find)
    • Access Method- programs implementing operations


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: File Organizations
  • 13.6 Heap - file of unordered records
    • fast insert O(1), slow find and findnext O(b/2)
  • 13.7 Sorted Files - records ordered by key (Fig. 13.7, pp. 432)
    • Find and Findnext O(log b), slow insert O(log b + b/2)
    • ...free-space/overflow improves insert to O(log b)
  • 13.8 Hash or Direct Files
    • Fast find and insert O(1), slow findnext O(b/2)
    • External Hashing (Fig. 13.9, 13.10 , pp 437-8)
    • Hash function, buckets, collisions
    • Manage Bucket Overflows - chaining, etc.
    • Static space allocation
    • ...What if data expands to fill the space available?
  • Smooth Space Allocation with Hashing
    • ...instead of doubling space at reorganization!
    • Linear, Extendible or Dynamic Hashing schemes


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Smooth file expansion in Hashing
  • Extendible Hashing
    • Key idea - indirection (Fig. 13.11, pp/ 440)
    • ...Directory entries can share pointers
    • ...Depth: Global(directory), Local(bucket)
    • ...Overflow = > split bucket w/ overflow, double directory
    • Execution Trace: Start with Fig 13.11
    • ...Use 4-bit representation as input to hash function
    • ...Assume bucket capacity = 1.
    • ...insert 1, 3, 7, 10, 12, 15, 2, 4, 6, 8, 9
    • Analysis
    • ...Data space - linear in size of data
    • ...Overhead space - directory size
    • ...Point search cost - 1 (directory) + 1 (bucket)
  • Linear Hashing
    • Key idea: split buckets in order
    • ...Overflow = > split next bucket, not bucket w/ overflow
    • ...No directory needed, but overflow buckets
    • Keeps 2 hash functions from a family
    • Search Algorithm 13.3, pp. 442
    • Execution Trace:
    • ...h(k, i) = mod (k, power(2, i)) for i = 1, 2, 3, ...
    • ...Assume bucket capacity = 1.
    • ...insert 3, 4, 5, 6, 7, 8, 9, 10
    • ...show h(i), h(i+1), n, buckets
    • Analysis
    • ...Data space - linear in size of data
    • ...Overhead space - overflow buckets
    • ...Point search cost - 1 (bucket) + epsilon (2nd look up)


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)