S1: Welcome to CS 5708
- Override policy and Attendance
- Introductions: Instructor, T.A. and
- ...In-class and UNITE students
- Goals - Learn concepts to be a power-user of databases
- Topics - Performance Tuning of SQL systems,
- ...OLTP, Distributed DBs (e.g. Client-Server)
- How to customize the course to your needs?
- Course Organization - Schedule (for lectures, hws, exams)
- Textbook, Papers
- Homeworks (pen-N-paper and Lab. Ex.)
- Topics for Discussion + Readings
- Chapter 13 (pp 411-449), Appendix C (pp 951)
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Ch#13. Record Storage and File Organization
- Introduction to Storage Hierarchy
- Secondary Storage Devices - Magnetic Disks
- Description and Cost Model
- Placing file record on Disk
- Records, Blocking and Placement
- Physical File Structure (Block-pointer diagrams)
- Operations of Files
- File Organization - Heap, Sorted, Hased
- Cost Model of Operations for different files
- Growing Hash Files - Linear, Dynamic, Extendible
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: 13.1 Memory Hierarchies, Characteristics
- Hierarchy of Storage - Multiple levels
- Primary Storage (e.g. main memory, cache, registers)
- ...volatile, micro-sec. access time, limited storage, high cost/unit
- Secondary storage (e.g. magnetic disks, CD-ROM, ...)
- ...persistent, milli-sec access time, larger capacity, lower cost/unit
- Tertiary Storage (e.g. magnetic tapes, off-line archives, ...)
- ...persistent, off-line, second access time, cheapest
- Secondary storage
- Database use it for persistence
- Data-structures: records, blocks, files
- ...Block pointer diagrams
- Logical access pattern: File operations
- Physical Access Pattern: block at a time
- Q? What is the price or 1 Gbyte of storage
- ...for RAM, Magnetic disk, tapes?
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: 13.2 Secondary Storage Devices
- Magnetic Disk: Description Fig. 13.1, 13.2 (pp 416-7)
- ...track, platter, cylinder, sector
- ...synonyms: sector, block, pages on disk
- Access time = s + r + btt
- s = seek time (move head to given track)
- ...Linear function of (current track - given track)
- r = rotational latency (wait for sector to be under head)
- btt = block transfer time (read + copy data to memory)
- ...(block size / transfer rate)
- Q? Compare the following sector-pairs for access time:
- sectors on a cylinder, on a platter (different cylinder)
- Table 13.1 (pp. 418-9)
- Q? Compare the following fragment-size for backing up 1 Gb file:
- Fragment: a collection of sectors on a cylinder
- 1 Kb, 8Kb, 1Mb
- 13.10 RAIDs, parallel disk access
- Striping (Fig. 13.12, pp. 444-5)
- Levels of RAID (Fig. 13.13, pp. 448)
- ...Disk Arrays, Mirroring, Parity bits
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: 13.4 Placing Files & Records on Disk Blocks
- Records (Ex. on pp 423), terminator char
- Fixed length vs. Variable length (Fig. 13.5, pp 424)
- ...separator (variable length / multi-values fields)
- Blocking of records into sectors (Fig. 13.6, pp 426)
- Unspanned: each record within a block
- Spanned: some records are split across two blocks
- Blocking factor = [floor](block size / record size)
- Space vs. retrieval time trade-off
- Allocating Disk Blocks to Files
- Contiguous sectors on disk to a file
- Linked list of random sectors
- Indexed list of sectors
- Hashed list of bucket sectors
- Crash-recoverable file design
- Header block - record format, data address, ...
- Data block: records, local directory, next block
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: 13.5 Logical Operations on Files
- File Interface: record-at-a-time, set-at-a-time
- Find : key value - > record address
- Read : record address - > record
- FindNext, Delete, Insert, Modify,...
- FindAll : key value - > set of record addresses
- FindOrdered, Reorganize
- File Design: Goal and Definitions
- Given operations, find efficient support structures
- File Organization- structure records, blocks and files
- ...to facilitate operations (e.g. Find)
- Access Method- programs implementing operations
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: File Organizations
- 13.6 Heap - file of unordered records
- fast insert O(1), slow find and findnext O(b/2)
- 13.7 Sorted Files - records ordered by key (Fig. 13.7, pp. 432)
- Find and Findnext O(log b), slow insert O(log b + b/2)
- ...free-space/overflow improves insert to O(log b)
- 13.8 Hash or Direct Files
- Fast find and insert O(1), slow findnext O(b/2)
- External Hashing (Fig. 13.9, 13.10 , pp 437-8)
- Hash function, buckets, collisions
- Manage Bucket Overflows - chaining, etc.
- Static space allocation
- ...What if data expands to fill the space available?
- Smooth Space Allocation with Hashing
- ...instead of doubling space at reorganization!
- Linear, Extendible or Dynamic Hashing schemes
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Smooth file expansion in Hashing
- Extendible Hashing
- Key idea - indirection (Fig. 13.11, pp/ 440)
- ...Directory entries can share pointers
- ...Depth: Global(directory), Local(bucket)
- ...Overflow = > split bucket w/ overflow, double directory
- Execution Trace: Start with Fig 13.11
- ...Use 4-bit representation as input to hash function
- ...Assume bucket capacity = 1.
- ...insert 1, 3, 7, 10, 12, 15, 2, 4, 6, 8, 9
- Analysis
- ...Data space - linear in size of data
- ...Overhead space - directory size
- ...Point search cost - 1 (directory) + 1 (bucket)
- Linear Hashing
- Key idea: split buckets in order
- ...Overflow = > split next bucket, not bucket w/ overflow
- ...No directory needed, but overflow buckets
- Keeps 2 hash functions from a family
- Search Algorithm 13.3, pp. 442
- Execution Trace:
- ...h(k, i) = mod (k, power(2, i)) for i = 1, 2, 3, ...
- ...Assume bucket capacity = 1.
- ...insert 3, 4, 5, 6, 7, 8, 9, 10
- ...show h(i), h(i+1), n, buckets
- Analysis
- ...Data space - linear in size of data
- ...Overhead space - overflow buckets
- ...Point search cost - 1 (bucket) + epsilon (2nd look up)
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)