S1: Ch. 26: XML and Databases
- Learning Goals:
- Learn about DBMS relevant XML standards
- Compare XML with Database Concepts
- Topics:
- Motivation
- Conceptual Data Model - XML Schema
- Logical Data Model - XML, XPath
- Physical Data Model - storage models
- Source: Textbook Chapter 26, Articles from Web
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Generic Motivation
- Many web-based app. servers use a dedicated DBMS as a XML store
- Store XML objects,e.g. documents, messages
- ...Ex. Fig. 26.3 XML element < project > , pp. 847
- ...Exercise: Contrast XML and HTML (Fig. 26.2, pp. 845)
- Retreive a couple of XML object as per client request
- During execution, they manipulate XML data internally
- ...and send those to web browsers for display and manipulation!
- Example Application Domains
- Content management, e.g. Legal Documents
- Work-flow management within/across organizations
- ...Web-based EDI - XML based communication across organization
- Workload assumptions
- Database is dedicated to web-based applications
- Limited set of queries - get/put/xpath query XML documents
- Database is shared among only web-apps, not exposed to users
- Comparison with relational model assumptions
- Applications: what-if analysis, data mining, ...
- Human may interact with DBMS directly or via DSS tools
- Unlimited set of ad-hoc queries (e.g. 3NF)
- Database may be shared across multiple applications
- Optimizing DBMS dedicated to one application
- Denormalize schema to favor workload
- Reduce semantic gap with the application domain
- ...Add or map ontologies of databases and application software
- ...If application is web-centric, then SQL should add XML concepts
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: An Example - EDI
- Business Goal: Electronic Data Interchange
- Reduces delays, costs, and errors
- Ability to react quick to market conditions
- Reduce inventory costs via just-in-time management
- EDI (Electronic Data Interchange)
- Purpose: allows businesses to exchange data
- ...via a electronic messaging services
- Provides
- ...a collection of standard message formats
- ...a standard element dictionary
- EDI Elements
- ...Business Language
- ...Business Practices
- ...Trading Partner Profile
- ...Logging and Archiving
- ...Acknowledgements
- ...Application APIs
- ...Transaction Expertise
- ...Message Standards
- Problem: Data communication among DBMS across a supplier chain
- Entities: Merchandise, Purchase Orders, Shipping Order,
- ...Delivery Receipts, Invoices, Payments
- Organizations: Retailers, Vendors, Shippers
- Each organizations has its own DBMS
- ...diverse relational schemas
- ...diverse brand-names DBMS software
- ...diverse data-type associated with common data-item, e.g. date
- Issues - Domain Specific
- Federated Schemas, e.g. element dictionary
- Standard message formats for data exchange
- Communication Protocols
- Translators between SDEF and local DBMS
- Approaches - domain specific standards
- pre-internet: standard file formats, communication protocols
- post-internet: XML data format, web-style http protocol
- XML is a language to define standard data-structures
- Pros: web-compliant, getting popular
- Cons: text format, usually large
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: XML and XML/EDI
- XML (eXtensible Markup Language)
- a file format for representing data
- a schema for describing data-structures
- Mechanism to extend and annotate HTML with semantic info.
- XML Terminology
- Tagging Standard
- Script Attachments
- Transaction Validation
- Search Techniques
- Linking and Reference
- Multimedia
- WWW
- Authoring Tools
- %XML/EDI
- A Standard to exchange EDI data
- ...e.g. invoices, project status
- Search, decode, manipulate, display data from
- ...Database, catalogs, workflow document, message
- ...messages, transaction (exchanged via an API)
- Implement business language, rules and objects
- ...EDI dictionaries
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: Conceptual Data Models
- Choices
- DTD - older, popular, weak in data modeling
- ...Figure 26.4, pp. 848, DTD for projects
- XML Schema - new, richer data modeling like ERDs
- ...Ex. Figure 26.5, pp. 850-3, XML Schema for company
- XML DTD
- Context free grammar to describe XML documents
- Non-terminals: Elements, Attributes
- Terminals: PCDATA, i.e. strings
- Cons - hard to model relationships, keys, etc.
- XML Schema :
- grammar based syntax with following concepts
- schema, element, attributes
- Type System: Types - simpleType, Complex
- ...constructors - sequence, compositor
- ...group (of elements), attributeGroup
- Constraints - unique, key, keyref
- ...Null and default values
- Modulatization: abstract data types, namespaces
- Documentation and other annotations
- Ref.: www.xml.com/pub/a/2000/11/29/schemas/part1.html
- Example XML Schema (Fig. 26.5, pp. 850-3)
- Note: 1. Annotations, documentation - pp. 850
- ...2. Hierarchical document structure - pp. 850
- ...3. Constructors, e.g. Sequence - pp. 850
- ...4. Database constraints - domain, referential - pp. 851
- ...captures relationship name, min/max cardinality constraints
- ...5. Composite types - similar to SQL3 ADTs - pp. 852
- Compare EER diagram and XML Schema for company database
- Relational model (Fig 5.5 pp. 136) and ERD (Fig 4.1 pp. 87)
- XML Schema (Fig. 26.5, pp. 850-3)
- Q? Is XML schema closer to relational model or ERD ?
- Q? Is XML schema a conceptual DM or a logical DM ?
- Converting EER to XML - Issues
- XML schema is a logical model like relational model
- Given ERD can be translated to alternative XML schemas.
- Issues 1. XML looks for a hierarchy of entities
- ...Many hierarchies may be possible
- ...Ex. choices for roots - Fig. 26.8, 26.10, 26.12
- ...Choices for XML schemas: Fig. 26.9, 11, 13
- EER can have Cycles, e.g. Figure 26.6 (pp. 857)
- ...Breaking the cycle, Fig. 26.13 (pp. 862)
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Conceptual Data Model: Comparison with EER
- XML Schema vs. EER - High level comparison
- XML Schema Pros:
- ...web-based comm. among applications
- ...text based, nested tree like syntax
- ...Likely to be processed by diverse computer programs
- ...Data transfer among multiple applications
- ...modularization primitives, e.g. namespace
- ...though ER models in practice have "subject areas"
- Cons:
- ...Readable by programmers but not by business users
- ...Can't replace EER for comm. b/w analyst and business users
- ...No graphic notation like EER
- ...Too detailed like SQL3, e.g. ADTs type constructors
- ...Not suitable for shared database across non-web applications
- Should one compare XML Schema with EER ?
- Probably not! XML schema looks similar to logical model!
- Q. Study latest draft of XML schema standard from web source,
- e.g. http://www.xml.com/pub/a/2000/11/29/schemas/part1.html
- Does it model inheritance ?
- Does it provide diagramming conventions to support analysis ?
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Logical Data Model
- XML vs. Relational Model - High level comparison
- XML - web-based comm. among applications
- ...Goal: structure and exchange documents via web
- ...More of a data format than a data model
- ...Schema - explicit specification is optional
- ...Schema may be implicit, partial, incomplete, and may change
- ...Schema tags are duplicated across document
- Relational Model-
- ...Goals Store large volume of data, enable efficient access
- ...Ensure data quality via integrity constraints, e.g. domain, ref.
- ...Schema is mandatory, stored in a DBMS
- ...Tuples are checked for consistency with schema before insertion
- ... Schema evolution requires revision of legacy data
- ...Schema stored once in a DBMS
- Compare Relational model and XML Schema for company database
- EER diagram: Fig 5.5 pp. 136 (for ERD in Fig 4.1 pp. 87)
- XML Schema (Fig. 26.5, pp. 850-3)
- Cons:
- XML Schema vs. Relational Model - detailed comparison
- Pros:
- ...Web-based data transfer, may become popular!
- ...Specifies a denormalization, i.e. hierarchical document
- ...modularization primitives, e.g. namespace
- ...though relational models in practice have "subject areas"
- Cons:
- ...But has few data types, i.e. strings
- ...Does not model a few situations well
- ...identifying relationships, weak entities
- ...disjoint/overlapping sub-classes
- ...Inheritance: no distinction b/w part-of and sub-type
- ...Relationships are represented by predicates
- ...Hierarchical - hard to model cycles
- ...e.g. two relationships between a entity pair
- ...Normal Forms - not popular yet with XML Schema!
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Query Languages
- XML Standards
- XPath - tree transformation language
- Basic Elements of XPath
- Path Expressions
- Example - Figure 26.14, pp. 863
- FLWR expressions
- Example - Figure 26.15, pp. 864
- Comparison with relational algebra
- Pros: Tree traversal for pre-materialized join
- Cons: denormalization, ad-hoc queries, collections of documents
- ...Relational algebra - declarative query language
- ...Normal Forms - allow ad-hoc querying for what if analysis
- SQL Standard: extensions for XML
- See Local Guide to SQL
- ...http://www-users.cs.umn.edu/~lpeng/sql\%20guide.htm#SQL/XML
- Mapping SQL tables to XML schema types
- XML publishing functions (SQL data expressed as XML)
- New built-in type for SQL:XML
- ...Inserting XML into an SQL database
- ...Validating XML in an SQL context
- ...Retrieving XML from an SQL database
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Physical data storage
- Source: Chapter 19, Chaudhuri et al, XML Data Management,
- ...Addison Wesley, ISBN 0-201-84452-4
- Benchmark Queries - pp. 536
- Store/extract/delete XML documents
- Extract parts of a XML document, identified by
- ...position of elements in the document
- Replace parts of a XML document
- Representation Choices
- ...Non-typed - Fig. 19.1 pp. 521
- ...Typed - Fig. 19.2 pp. 523
- Storage Manager Choices
- Relational Storage Manager
- ...Exploring XML documents to tables
- ...Extracting XML documents from tables
- Object Storage Manager
- Native Storage Data Manager
- Storage Manager Comparison
- Tables 19.7-11 , pp. 536-537
- No-typed relational storage is preferred,
- ...except for reconstruction of complete XML documents!
- Indexing - Index types
- Standard value based, e.g. search nodes based on attribute values
- Text indexes on keywords
- XML structure indexes
- ...Path index - All paths occurring in an document
- ...Encode Tree into a linear address space
- ...Collection index - All nodes with a given element type
- Query Processing
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)