S1: Data Warehouses and Data Mining
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: DataWarehouses - history
  • OLAP Databases (vs. OLTP Databases)
    • Infrequent Periodically refresh
    • Frequent read-only aggregate reports
    • Central repository of data for analysis
    • Process shown in Fig. 28.1, pp. 901
  • Marketed by consultants in late 1980s
    • Sold methodology and DWH servers
    • Methodology to gather data from many sources
    • ...data cleaning, refresh etc.
    • New vocabulory - OLAP, star schema, pivot, ...
    • Custom software products with consulting packages
    • ...Many absorbed into relational DBMS product families !
  • Academics got involved in early 1990s
    • Data Cube - an abstract model of hierarchical reports
    • ...specifies all summary reports together
    • SQL - cube operator
    • Performance enhancement - e.g. star join strategy


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: Datawarehouses - Common Features
  • Motivation: OLAP, DSS, EIS, ...
    • Few metrics of evaluations - e.g. sales, profit
    • Analysis of impact of many dimension on the metric
    • ...Via hierarchy of multiple aggregation reports
    • ...e.g. top ten stores = > drill down to find out why
  • Abstract Example
    • Multi-dim data - Figure 28.4, pp. 904
    • Rollup - Figure 28.5, pp. 905
    • Drill down (by dimension & concept hierarchy) - Figure 28.6, pp. 905
  • Ex. relation R(SI) schema: (store, item, sales)
    • Data: (s1, i1, 3), (s1, i2, 40), (s2, i1, 20), (s2, i2, 6)
    • Report R(S): Sales summary by stores: (s1, 43), (s2, 26)
    • Report R(I): Sales summary by items: (i1, 23), (i2, 46)
    • Report R on total Sales : 69
  • Aggregation Report Hierarchies
    • ...dimension lattice cube: R(SI) -- > { R(S), R(I) } -- > R
    • ...concept hierarchy, e.g. R(S) -- > R(Store groups by region)
  • Operations in a datawarehouse (Section 28.5, pp. 910)
    • Pivot: get non-relational view
    • Dice/Slice: select parts of reports
    • Navigation in hierarchy of reports
    • ...rollup: R(SI) -- > R(S) -- > R
    • ...drill-down: R -- > R(I) -- > R(IS)
  • Match items in "Data" menu in Excel to DWH operations
    • PivotTable and PivotChart Report...
    • SubTotals
    • Group and Ungroup
    • Consolidate


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: DataWarehouses - Conceptual Data Model
  • Multi-dimensional data model
    • Star Schema (Fig. 28.7, pp. 906)
    • 2 types of tables: 1 fact table, dimension tables
    • dimension table Ti = < Di key, Di attributes >
    • fact table = < D1, D2, ..., Dn, mesaurement y >
  • Comparison with Entity Relationship Model
    • Star schema is subset of ER model
    • ...It is an n-ry relationship
    • ...1 relationship table, N entities
    • Snowflake Schema (Fig. 28.8)- normalized dim. tables
  • Multi-dimensional representation of fact table
    • Function y = f(D1, D2, ..., Dn)
    • ...natural for statistical analysis
    • N dimension cube: domain(D1) X domain(D2) X ... X domain(Dn)
    • ...each cell has value of y
    • ...pivot spreadsheet view if n = 2
    • Smaller cubes to show summary reports
  • Q? Show multi-dim. cube representation of
    • ...reports R(SI), R(S), R(I) and R.
    • Arrange four cubes and sub-cubes to show
    • ...relationship of partial totals to total-sales


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: DataWarehouses - Logical Data Model
  • OLAP Analysis Queries to generate reports
    • Fact table: y = f(D1, D2, ..., Dn)
    • Reports: aggregate(y) = g(subset of dimensions)
    • ...power(2, N) distinct aggregation reports
    • Each report needs a "group by" SQL query
  • Extending SQL Query Language
    • Cube operator - generates data for all reports
    • ...Shorthand for power(2, N) group by queries
    • Cube(Fact table, subset with N dimensions) =
    • ...Union of power(2,N) "group by" queries on the fact table
    • ALL keyword - dual of NULL
    • ...fills in columns missing from summary reports
    • ...to denote that those have been aggregated away.
  • cube( R(SI) ) schema = (store, item, sales) = R(SI) schema
    • tuples: (s1, i1, 3), (s1, i2, 40), (s2, i1, 20), (s2, i2, 9)
    • ...(s1, all, 43), (s2, all, 26) from R(S)
    • ...(all, i1, 23), (all, i2, 46) from R(I)
    • ...(all, all, 69) from R


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: DataWarehouses - Logical Data Model
  • Rollup( Fact table, ordered list of N dimensions ) =
    • Union of N "group by" queries on the fact table
    • ...Aggregate away I right-most dimensions, I = 1, 2, ..., N
    • rollup(fact(store, item, sales), (store, item) )
    • tuples: (s1, i1, 3), (s1, i2, 40), (s2, i1, 20), (s2, i2, 9)
    • ...(s1, all, 43), (s2, all, 26) from R(S)
    • ...(all, all, 69) from R
    • rollup(fact(store, item, sales), (item, store) )
    • tuples: (s1, i1, 3), (s1, i2, 40), (s2, i1, 20), (s2, i2, 9)
    • ...(all, i1, 23), (all, i2, 46) from R(I)
    • ...(all, all, 69) from R
    • ...Note: result lists column in same order as in the list!
  • Ex. Write SQL expressions to get reports R(S)
    • (A) to get reports R(S) from data table R(SI)
    • (B) to get reports R(I) from cube( R(SI) )
    • (C) to get cube( R(SI), (SI) ) from data table R(SI)
    • (D) to get rollup( R(SI), (SI) ) from data table R(SI)


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: DataWarehouses - Logical Data Model
  • Oracle examples for part (C) in previous slide
    • given table fact(store, item, sales) = R(SI)
      select store, item, sum(sales)
      from fact
      group by cube(store, item)
  • Oracle examples for part (D) in previous slide
    select store, item, sum(sales)
    from fact
    group by rollup(store, item)
  • Oracle operations: supp. book - pp. 220
    • GROUP BY cluase may have ROLLUP, CUBE
    • Operations: ranking : RANK,PERCENT_RANK, ROW_NUMBER, ...
    • ...windowing/reporting : STDDEV, VARIANCE, FIRST_VALUE, ...
    • ...Statistics : covariance, correlation, linear regression
    • ...Lag/lead : compare similar time periods e.g. 2001 Q1, 2002 Q1


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: DataWarehouses - Physical Data Model
  • DWH logical data model can be implemented on
    • Custom systems using flat files
    • Relational database servers (Relational OLAP)
    • New performance needs of DWH - > new solutions
    • ...bit-map index, star-join strategy,
    • ...view maintenace, parallel processing
  • Storage Methods: Bit-map index (Fig. S9.3, pp 218)
    • index = < attribute-value, bitmap vector >
    • Example of bit-map indices
    • ...R(SI).store index tuples: (s1, 1100), (s2, 0011)
    • ...R(SI).item index tuples: (i1, 1010), (i2, 1010)
    • Storage overhead comparison(bit-map index, B+trees)
    • ...bitmaps = 1 bit per tuple in the table
    • ...index-size = (nR / 8) * size(Domain)
    • ...size of B+tree = (nR)*(key-size + pointer size)
    • Useful for Attributes with a small domain of values
    • ...e.g. state - domain has 50 values for USA
  • Join Strategies: Star Join Strategy
    • For N-ry join (fact table, all dimension tables)
    • Scans fact table once only!
    • Better than sequence of N binary joins
    • ...when size(fact table) >> size(dimension tables)
    • Suppl. book, pp. 217 under Query Optimization
  • Q. Review the relational table in Figure 6.1 (pp. 158)
    • Which column may be indexed using bit-map index?
    • How many entries in index? How many bit per bit-vectors?
    • List first few bits for each bit-vectors in index.
  • Q. Compare I/O cost for bit-map index and B+tree secondary index for
    • assume index : table.A1, table.A2
    • select(table, A1 = value1)
    • select(table, A1 = value1 and A2 = value2)
    • insert a new tuple
  • Q. How will Oracle cluster data-structure store a cluster
    • ...consisting of a fact table, and its dimension tables?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)