S1: Ch. 26: XML and Databases
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Generic Motivation
  • Many web-based app. servers use a dedicated DBMS as a XML store
    • Store XML objects,e.g. documents, messages
    • ...Ex. Fig. 26.3 XML element < project > , pp. 847
    • ...Exercise: Contrast XML and HTML (Fig. 26.2, pp. 845)
    • Retreive a couple of XML object as per client request
    • During execution, they manipulate XML data internally
    • ...and send those to web browsers for display and manipulation!
  • Example Application Domains
    • Content management, e.g. Legal Documents
    • Work-flow management within/across organizations
    • ...Web-based EDI - XML based communication across organization
  • Workload assumptions
    • Database is dedicated to web-based applications
    • Limited set of queries - get/put/xpath query XML documents
    • Database is shared among only web-apps, not exposed to users
  • Comparison with relational model assumptions
    • Applications: what-if analysis, data mining, ...
    • Human may interact with DBMS directly or via DSS tools
    • Unlimited set of ad-hoc queries (e.g. 3NF)
    • Database may be shared across multiple applications
  • Optimizing DBMS dedicated to one application
    • Denormalize schema to favor workload
    • Reduce semantic gap with the application domain
    • ...Add or map ontologies of databases and application software
    • ...If application is web-centric, then SQL should add XML concepts


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: An Example - EDI
  • Business Goal: Electronic Data Interchange
    • Reduces delays, costs, and errors
    • Ability to react quick to market conditions
    • Reduce inventory costs via just-in-time management
  • EDI (Electronic Data Interchange)
    • Purpose: allows businesses to exchange data
    • ...via a electronic messaging services
    • Provides
    • ...a collection of standard message formats
    • ...a standard element dictionary
    • EDI Elements
    • ...Business Language
    • ...Business Practices
    • ...Trading Partner Profile
    • ...Logging and Archiving
    • ...Acknowledgements
    • ...Application APIs
    • ...Transaction Expertise
    • ...Message Standards
  • Problem: Data communication among DBMS across a supplier chain
    • Entities: Merchandise, Purchase Orders, Shipping Order,
    • ...Delivery Receipts, Invoices, Payments
    • Organizations: Retailers, Vendors, Shippers
    • Each organizations has its own DBMS
    • ...diverse relational schemas
    • ...diverse brand-names DBMS software
    • ...diverse data-type associated with common data-item, e.g. date
  • Issues - Domain Specific
    • Federated Schemas, e.g. element dictionary
    • Standard message formats for data exchange
    • Communication Protocols
    • Translators between SDEF and local DBMS
  • Approaches - domain specific standards
    • pre-internet: standard file formats, communication protocols
    • post-internet: XML data format, web-style http protocol
  • XML is a language to define standard data-structures
    • Pros: web-compliant, getting popular
    • Cons: text format, usually large


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: XML and XML/EDI
  • XML (eXtensible Markup Language)
    • a file format for representing data
    • a schema for describing data-structures
    • Mechanism to extend and annotate HTML with semantic info.
  • XML Terminology
    • Tagging Standard
    • Script Attachments
    • Transaction Validation
    • Search Techniques
    • Linking and Reference
    • Multimedia
    • WWW
    • Authoring Tools
  • %XML/EDI
    • A Standard to exchange EDI data
    • ...e.g. invoices, project status
    • Search, decode, manipulate, display data from
    • ...Database, catalogs, workflow document, message
    • ...messages, transaction (exchanged via an API)
    • Implement business language, rules and objects
    • ...EDI dictionaries


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: Conceptual Data Models
  • Choices
    • DTD - older, popular, weak in data modeling
    • ...Figure 26.4, pp. 848, DTD for projects
    • XML Schema - new, richer data modeling like ERDs
    • ...Ex. Figure 26.5, pp. 850-3, XML Schema for company
  • XML DTD
    • Context free grammar to describe XML documents
    • Non-terminals: Elements, Attributes
    • Terminals: PCDATA, i.e. strings
    • Cons - hard to model relationships, keys, etc.
  • XML Schema :
    • grammar based syntax with following concepts
    • schema, element, attributes
    • Type System: Types - simpleType, Complex
    • ...constructors - sequence, compositor
    • ...group (of elements), attributeGroup
    • Constraints - unique, key, keyref
    • ...Null and default values
    • Modulatization: abstract data types, namespaces
    • Documentation and other annotations
    • Ref.: www.xml.com/pub/a/2000/11/29/schemas/part1.html
  • Example XML Schema (Fig. 26.5, pp. 850-3)
    • Note: 1. Annotations, documentation - pp. 850
    • ...2. Hierarchical document structure - pp. 850
    • ...3. Constructors, e.g. Sequence - pp. 850
    • ...4. Database constraints - domain, referential - pp. 851
    • ...captures relationship name, min/max cardinality constraints
    • ...5. Composite types - similar to SQL3 ADTs - pp. 852
  • Compare EER diagram and XML Schema for company database
    • Relational model (Fig 5.5 pp. 136) and ERD (Fig 4.1 pp. 87)
    • XML Schema (Fig. 26.5, pp. 850-3)
    • Q? Is XML schema closer to relational model or ERD ?
    • Q? Is XML schema a conceptual DM or a logical DM ?
  • Converting EER to XML - Issues
    • XML schema is a logical model like relational model
    • Given ERD can be translated to alternative XML schemas.
    • Issues 1. XML looks for a hierarchy of entities
    • ...Many hierarchies may be possible
    • ...Ex. choices for roots - Fig. 26.8, 26.10, 26.12
    • ...Choices for XML schemas: Fig. 26.9, 11, 13
    • EER can have Cycles, e.g. Figure 26.6 (pp. 857)
    • ...Breaking the cycle, Fig. 26.13 (pp. 862)


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Conceptual Data Model: Comparison with EER
  • XML Schema vs. EER - High level comparison
    • XML Schema Pros:
    • ...web-based comm. among applications
    • ...text based, nested tree like syntax
    • ...Likely to be processed by diverse computer programs
    • ...Data transfer among multiple applications
    • ...modularization primitives, e.g. namespace
    • ...though ER models in practice have "subject areas"
    • Cons:
    • ...Readable by programmers but not by business users
    • ...Can't replace EER for comm. b/w analyst and business users
    • ...No graphic notation like EER
    • ...Too detailed like SQL3, e.g. ADTs type constructors
    • ...Not suitable for shared database across non-web applications
  • Should one compare XML Schema with EER ?
    • Probably not! XML schema looks similar to logical model!
  • Q. Study latest draft of XML schema standard from web source,
    • e.g. http://www.xml.com/pub/a/2000/11/29/schemas/part1.html
    • Does it model inheritance ?
    • Does it provide diagramming conventions to support analysis ?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Logical Data Model
  • XML vs. Relational Model - High level comparison
    • XML - web-based comm. among applications
    • ...Goal: structure and exchange documents via web
    • ...More of a data format than a data model
    • ...Schema - explicit specification is optional
    • ...Schema may be implicit, partial, incomplete, and may change
    • ...Schema tags are duplicated across document
    • Relational Model-
    • ...Goals Store large volume of data, enable efficient access
    • ...Ensure data quality via integrity constraints, e.g. domain, ref.
    • ...Schema is mandatory, stored in a DBMS
    • ...Tuples are checked for consistency with schema before insertion
    • ... Schema evolution requires revision of legacy data
    • ...Schema stored once in a DBMS
  • Compare Relational model and XML Schema for company database
    • EER diagram: Fig 5.5 pp. 136 (for ERD in Fig 4.1 pp. 87)
    • XML Schema (Fig. 26.5, pp. 850-3)
    • Cons:
  • XML Schema vs. Relational Model - detailed comparison
    • Pros:
    • ...Web-based data transfer, may become popular!
    • ...Specifies a denormalization, i.e. hierarchical document
    • ...modularization primitives, e.g. namespace
    • ...though relational models in practice have "subject areas"
    • Cons:
    • ...But has few data types, i.e. strings
    • ...Does not model a few situations well
    • ...identifying relationships, weak entities
    • ...disjoint/overlapping sub-classes
    • ...Inheritance: no distinction b/w part-of and sub-type
    • ...Relationships are represented by predicates
    • ...Hierarchical - hard to model cycles
    • ...e.g. two relationships between a entity pair
    • ...Normal Forms - not popular yet with XML Schema!


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Query Languages
  • XML Standards
    • XPath - tree transformation language
  • Basic Elements of XPath
    • Path Expressions
    • Example - Figure 26.14, pp. 863
    • FLWR expressions
    • Example - Figure 26.15, pp. 864
  • Comparison with relational algebra
    • Pros: Tree traversal for pre-materialized join
    • Cons: denormalization, ad-hoc queries, collections of documents
    • ...Relational algebra - declarative query language
    • ...Normal Forms - allow ad-hoc querying for what if analysis
  • SQL Standard: extensions for XML
    • See Local Guide to SQL
    • ...http://www-users.cs.umn.edu/~lpeng/sql\%20guide.htm#SQL/XML
    • Mapping SQL tables to XML schema types
    • XML publishing functions (SQL data expressed as XML)
    • New built-in type for SQL:XML
    • ...Inserting XML into an SQL database
    • ...Validating XML in an SQL context
    • ...Retrieving XML from an SQL database


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Physical data storage
  • Source: Chapter 19, Chaudhuri et al, XML Data Management,
    • ...Addison Wesley, ISBN 0-201-84452-4
  • Benchmark Queries - pp. 536
    • Store/extract/delete XML documents
    • Extract parts of a XML document, identified by
    • ...position of elements in the document
    • Replace parts of a XML document
  • Representation Choices
    • ...Non-typed - Fig. 19.1 pp. 521
    • ...Typed - Fig. 19.2 pp. 523
  • Storage Manager Choices
    • Relational Storage Manager
    • ...Exploring XML documents to tables
    • ...Extracting XML documents from tables
    • Object Storage Manager
    • Native Storage Data Manager
  • Storage Manager Comparison
    • Tables 19.7-11 , pp. 536-537
    • No-typed relational storage is preferred,
    • ...except for reconstruction of complete XML documents!
  • Indexing - Index types
    • Standard value based, e.g. search nodes based on attribute values
    • Text indexes on keywords
    • XML structure indexes
    • ...Path index - All paths occurring in an document
    • ...Encode Tree into a linear address space
    • ...Collection index - All nodes with a given element type
  • Query Processing


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)