Sample Questions for Csci 8705, Fall 2001
Enclosed are a few sample questions for the final examination.
Validation Methodologies
-
A validation methodologies allows researchers to back up their claims in
objective manner. It allows the reader of a paper to reproduce the results and
conclusions of the author. This is a fundamental requirement for science and
scientific research. List a few papers in your reading list and identify their
validation methodologies. Do the paper describe their methodology in
sufficient detail for ensuring reproducibility of their claims?
- Which validation methodologies are commonly used in database
literature to back up following kinds of claims:
- correctness,
- completeness
- worst case computational complexity
- Pictograms can simplify conceptual models (e.g. ERD, UML) for
spatial application domains.
- Topological predicates proposed in OGIS model can represent all
topological relationships among spatial objects.
- Concurrency control for multi-diemnsional data: Predicate locks
are not effective as multiple granularity for locks.
- Buffer management for R-trees: Least recently used is a
reasonable buffer replacement policy.
- Simulation based experimental evaluation is a popular validation
methodology in Database Systems Research. What are the strengths
and weaknesses of this methodology?
- What is the statistical "confidence level" of performance
measures (e.g. mean response time) based on simulation?
How would one design simulation experiments to provide appropriate
confidence levels?
- Compare and Contrast the following pairs of concepts within
simulation methodology. Hint: Define each term, identify the
strengths and weakness of each.
- (a) Workload simulation: trace driven vs. Statistical Distribution driven,
- (b) Performance measure: response time vs. throughput,
- (c) Candidate comparison: statistical mean based vs. 95-th percentile based.
- Examine the performance studies reported in various papers in
the reading list. Identify the style of workload simulation, choice
of performance measures, and candidate comparison. Whenever possible,
use the alternatives identified in previous question for these dimensions.
- Repeat previous question for the dimension of estimation of performance of
alternative candidates. The performance of candidates can be estimated in two
different ways: (a) algebraic cost model or (b) measurements on execution
of a software prototype. For example, query optimizers use algebraic cost
models to
compare performance of alternative query processing strategies (recall Csci
5708). In contrast, some research papers (e.g. Efficient Cost Models for
Spatial Queries) presented analytical model and measured the cost in term of
node and disk acccess.
- Consider
the claims made in research papers comparing data models, e.g. object data
model, spatial data model, and relational data model. Which methodologies can
be used to compare them? What are evaluation measures for this task?
Weekly Topics
Past, Present and Future of Spatial Database Research
- Describe the need for SDB research.
- List 2 key research accomplishments of spatial database research in the
following subareas: Data Model, Query Processing, Storage Management,
Data Mining, spatial network. You may not identify accomplishments in the
topics not covered so far in the lectures.
- List 5 key research challenges for spatial database research
according to the readings.
- List a few new spatial database research challenges which were
not covered by the papers in the reading list. Answering following questions
may help you in identifying new research needs :
- Name major pull info. technologies (e.g. cell phones, PDAs,
global positioning systems, satellites, internet, multimedia,
data mining) of last decade.
- Name
major killer applications of last decade. Examples include geographic
information services (e.g. mapquest, vindigo.com), in-car navigation
devices, public health (e.g. identifying geographic clusters of cancer),
public safety (e.g. identifying geographic patterns in crime), etc.
- List
a few challenges to presented to traditional databases due to the new
technologies and applications?
- Which were not anticipated by authors of papers in the reading list?
Spatial Data Model
- List
two unique issues related to the data models needed by Spatial Databases.
Explain briefly.
- Compare and contrast the following pairs:
- Spatial data model and relational data model
- Spatial data model and object data model
- Rules and Integrity Constraints
- Absolute direction and object-orientation-based direction
- Consider
the support for spatial datatypes and operator (e.g. those in Open Geodata
Interchange Standard) inside a query language such SQL. Which data models
facilitate it? Explain.
- Answer
the following questions related to spatial pictorgrgam, UML papers:
- What are benefits of pictograms in conceptual model?
- Translate "Forest-stand is within the Forest" rule and
provide possible tables.
- What do pictogram translate to logical (SQL DDL) model?
- Compare and contract UML and ERD for modelling spatial application.
- What are the various types of metadata that should be included in a
geospatial-related repository?Give specific examples.
- Plug-in for Visual Languages (PVL) is a graphical notation
which can be used as part
of the conceptual data modeling process.List PVL's three basic geometric
constructs and their dimensionalities.List the four ways in which these
constructs can be presented to depict geometric properties of objects and
attributes, and give an example of each.
Spatial Query Languages
- List two unique issues related to querying of spatial databases.
Explain briefly.
- Explain
briefly how SQL/SDA supports the spatial data analysis and its web-based
implementation.
- What's the Pros and Cons of SQL/SDA relative of SQL3/OGIS.
- Why does SQL/SDA restricts new spatial extensions to FROM clause
of SELECT statement?
- Recall
"Reclarify" GIS operator corresponding to aggregate queries in SQL requiring
GROUPBY clause and aggregate function (e.g. SUM). Does SQL/SDA provide a
feature to express reclarify operator in a SQL/SDA query?
- Answer the following questions for direction predicates model:
- List the new spatial data types and operation proposed in model
direction relationships among point objects.
- Are
these predicate adequate to model direction relationship among extended
objects? Justify your answer.
- Consider the following tables:
Location(ID: INTEGER, Type:STRING, Location: POLYGON), and
Soil((ID: INTEGER, Type:STRING, Location: POLYGON),
and the query Q to display the land parcels and their area
if landuse type is 'Brushland' and soil type is 'A'.
Write a SQL/SDA expression and an SQL3/OGIS expression for query Q.
Storage and Indexing
- List
two unique issues related to the storage and indexing needs of Spatial
Databases. Explain briefly.
- Compare
and contrast design choices made by a relatinal DBMS and those made by a
spatial DBMS in the following areas:
- file-structures
- index-structures
- buffer
pool management (prefetching, replacement policy).
- Compare and contrast: R trees, R+ tree, GiST.
- State
the basic cost model for range query processing using R-tree. Answer the
following questions to understand the implication of the cost models:
- Consider
a range query of same size (area) but of different shapes, one is a squre
window and the other is a rectangular window. Which query is cheaper to
answer according to the cost model?
- Consider
twe sequences S1=(Q1, Q2) and S2=(Q1,Q3) of range queries. Queries Q1 and Q2
overlap significantly while Q1 and Q3 are disjoint as well as distant from
each other. What will be impact of buffering on total I/O cost of processing
sequences S1 and S2? Explain briefly using the cost model proposed by the
authors?
- Why
does pinning not improve the query performance if the buffer is very large as
compared to the number of nodes to be pinned.
Concurrency Control in Multi-dimensional data
- Define
the following term of concurrency control in multidimensional access: Phantom
problem, GiST, Granular locking.
- Consider
the following schedule, determine if it has phantom problem
T1 search (cond C); T2 insert x; T1 search (cond C);
where cond C is common across two search queries.
- Consider
a tree structure with each node having the entry E=<p, ptr>, where E.p
is a predicate that describe the subtree ponted by E.ptr.
What is the bounding predicate of a node.
- What
are the problems with using two phase read/write locking for concurrency
control on operations to Generalized Search Tree (GiST)?
- Compare
and contrast: 2 phase read/write locking,
GL/R-tree protocol, GL/GiST protocol (Kaushik, et al).
- Answer the following about GL/GiST protocol:
Does it follows two phase locking?
Is it is free of deadlock ? starvation?
What is order of visiting nodes for search operation? insert operation?
- Simulation
is a popular validation methodology in Database Systems Research. How can a
simulation study help in study of GL/GiST protocol.
Query Processing
- Why traditional algebraic transformations (heuristic rules)
are not suitable for Spatial query optimization?
- List differences between R-Trees and B+-tree?
- Many efficient spatial join algorithms involve two steps
in order to join two spatial relations. They are the
Filter step and the Refinement step. Briefly describe what each step
is and how they are different from each other.
- Scalable Sweeping Based Spatial Join algorithm is designed to
be used in the case where both the spatial relations to be joined
are non-indexed. Can you state an instance (example) where such
a case (i.e, where both the relations to be joined are non-indexed) arises.
- Comapre and contrast absolute direction and
object-orientation-based direction?
- Consider the classical puzzle related to mirror reflections.
The left and right sides of an object are swapped in its mirror image
while the top and bottom sides are not.
Explain this using the concepts of
relative and absolute directions.
- Compare
and contrast design choices made by a relatinal DBMS and those made by a
spatial DBMS in the following areas:
- query processing strategies
- query optimization algorithms
- List
different strategies for processing following spatial queries: spatial
selection using topological predicate, spatial selection using direction
predicate, spatial join, nearest neighbor.
- List
a situation when Open Shape Strategy (OSS) outperforms Range Query Strategy
(RQS). List another situation when OSS does not outperform RQS.
- List
a situation when Sweeping based spatial join (SBSJ) outperforms nested loop
strategy (NLS). List another situation when SBSJ does not outperform NLS
Spatial Networks
- Consider the data model for spatial networks.
Add a new data type to repsent routes in a graph.
Provide a few attributes and operations for route data-type.
Discuss the pros and cons of adding route data-type.
- Compare the storage and indexing needs of spatial networks
relative to those for other (e.g. OGIS) spatial data types.
- Compare and contrast CCAM and R-tree for indexing spatial networks.
- Consider the problem of efficient path computation in GIS.
Why is hierarchical path view model (HPV) better than Flat path view (FPV)?
- What are the four major tasks in the HEPV model? In these tasks,
which tasks should meet near real system requirement?
Spatial Data Mining
-
Spatial data can be mined either via custom methods (e.g. co-location rules,
spatial outlier detection, SAR) or via classical methods (e.g.
association rules, global outliers, regression) after selecting
relevant spatial features. Compare and contrast these two
approaches.
-
List differences between the spatial co-location rule discovery problem
and the classical association rule discovery problem.
- There are several approaches to defining the semantics
of co-location rules. Compare the event centric model with
the reference feature centric model.
- Find the table instances and the participation index of co-location
of size 2 with the for a given spatial dataset.
- Explain (fast) clustering. Give an example of how it may
be applied. Use a specific algorithm with a real-world data set.
- Explain why an efficient SQL implementation of the
EM (Expectation Maximization) clustering Algorithm would be useful.
Rastor Databases, Spatial Data Warehouses
- Define following categories of aggreagte functions: distributive,
algebraic, holistic. Provide examples of spatial aggregate functions in
each category.
- In "Microsoft TerraServer: A Spatial Data Warehouse" the
authors created the TerraScale program to create lower resolution
tiles and store them in the database.
What were the authors reasons for creating this program?
- What is Universal Transverse Mercator (UTM) ?
How it is used in rastor spatial databases?
- List issues in storage and indexing of rastor data
relative to those in storage of vector data.
- Define following categories of operations on
fields: local, focal, zonal.
Provide examples of field functions in each category.
- Define "content based retrieval". Compare it with querying
using OGIS operations.