Feedback on HW 4 (Project Draft)
- General Comments :
A few comments apply to multiple papers. These comments are summarized
below:
- A. Review the format used by the Computing Surveys paper on recovery
methods in our readings. Follow a similar format for your papers.
Consider showing page numbers, section numbers, figure
numbers, table numbers in your paper.
Provide a table of contents and a list of figures and tables
to help readers navigate the paper. Consider using LaTeX to compose your
papers. It is easy to learn and we can get a short tutorial from fellow
graduate students.
It will take care of many of the stylistic problems and help you
focus on technical writing.
- B. Ensure there is a list of references at the end of each paper.
Every reference should identify authors, title, publication forum
(publisher, conference/journal/book, volume, number, url, ...), date of
publication etc. so that readers can locate them easily.
- C.
Cite the sources (entries in your list of references) near the text
referring to relevant concepts/contributions.
Clearly identify text quotations, figures or tables from sources.
- D. Use figures and tables to highlight key messages in your paper.
- E.
Avoid using white-paper-ish material from commercial websites and marketing
literature. Focus on technical material from database research conferences,
journals, books, etc.
- G1: Middleware, TP Monitor :
It is a survey paper describing commercial products in SQL middleware and TP
monitors. The bulk of the material in sections 1 and 2 is superficial (from
whitepapers?). Note that the material in section 1 is taught in required
undergraduate courses (e.g. Csci 4061) and does not deserve a place
in a research paper in a graduate course without any serious analysis.
Why did you choose to focus on whitepaper-ish material?
The paper does not seem to overlap much with your project proposal, which
focused on Distributed Databases. There are no references to any technical
literature even though your project proposal did have them. If you are
interested in TP monitors, look at the book by Jim Gray on this subject.
We should definitely talk in Tuesday office hours to find a strategy
to salvage this paper. The project proposal was reasonably promising, with
a reasonable number of technical sources (e.g. Distributed DB book by Ceri).
- G2: XML/GML Map Rendering :
It is a paper describing an implementation project.
Group has already received detailed comments. The main area of concern is
the lack of analysis. Consider comparing DOM and SAX for single pass and
multi-pass spatial computations (e.g. counting nodes in a polygon, computing
area of a polygon, ...).
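A minimal sketch of such a comparison in Python follows. It assumes a
simplified, hypothetical point/polygon markup (not real GML) and made-up
element and attribute names. The SAX version computes each polygon area in a
single streaming pass, while the DOM version builds the whole tree first so
that later passes can revisit any node.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical, simplified markup used only for illustration (not real GML).
SAMPLE = b"""<map>
  <polygon id="p1"><point x="0" y="0"/><point x="4" y="0"/><point x="4" y="3"/><point x="0" y="3"/></polygon>
  <polygon id="p2"><point x="0" y="0"/><point x="2" y="0"/><point x="0" y="2"/></polygon>
</map>"""


def shoelace_area(points):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


class AreaHandler(xml.sax.ContentHandler):
    """SAX: one streaming pass; only the current polygon's points are kept."""

    def __init__(self):
        super().__init__()
        self.areas = {}
        self._id = None
        self._points = []

    def startElement(self, name, attrs):
        if name == "polygon":
            self._id, self._points = attrs["id"], []
        elif name == "point":
            self._points.append((float(attrs["x"]), float(attrs["y"])))

    def endElement(self, name):
        if name == "polygon":
            self.areas[self._id] = shoelace_area(self._points)


def areas_sax(data):
    handler = AreaHandler()
    xml.sax.parseString(data, handler)
    return handler.areas


def areas_dom(data):
    """DOM: the whole tree is in memory, so multi-pass computations are easy."""
    doc = minidom.parseString(data)
    result = {}
    for poly in doc.getElementsByTagName("polygon"):
        pts = [(float(p.getAttribute("x")), float(p.getAttribute("y")))
               for p in poly.getElementsByTagName("point")]
        result[poly.getAttribute("id")] = shoelace_area(pts)
    return result
```

Timing both functions on maps of increasing size, and adding a second pass
(e.g. counting points inside a given polygon), would be one way to turn this
into the analysis the paper is missing.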
- G3: Trends in DB Research :
It is a short paper with interesting results. It provides an interesting tool
for tracking the popularity of topics in a research area.
The paper does have a list of references. It omits a summary of readings of
key sources (i.e. related work and our contributions).
It would be useful to expand
the paper from the current 4 pages (not counting the code in the appendix) and two
figures to approximately 15 pages (5 to 7 figures). Every section deserves
expansion. The analysis section may be expanded to discuss how the automated
analysis tools were validated (e.g. show a comparison against hand-collected
data for a subset of the data). It may be useful to show a breakdown of the
applications topic (spatial, web, biological, banking, ...) since it seems
to dominate the counts for every year. Think of other results to show the
sensitivity of your results to decisions (e.g. keyword to topic matching,
use of abstract vs. title vs. conference sessions)
made in implementing the automated tools for data analysis.
The description of the experiment design deserves special attention for
reproducibility of results. Provide a diagram to show the major steps in
data collection and analysis. Describe the details of each step and provide
the key algorithms (pseudo-code) and design decisions.
Discuss alternatives for resolving design decisions in developing tools for
automatic analysis of trends in DBMS.
Think of writing a short user manual for your tool to help you write a good
paper.
Can your tool be used to analyze trends in literature on other research
topics? What files and parameter values would need to be provided?
What decisions would need to be made by the user?
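The validation and sensitivity checks suggested above could be sketched
roughly as follows in Python. The keyword table, the sample papers and the
hand labels are all hypothetical placeholders, not the group's data, and
plain substring matching is only one of several possible matching rules.

```python
from collections import Counter

# Hypothetical keyword-to-topic table; the group's real table may differ.
TOPIC_KEYWORDS = {
    "spatial": ["spatial", "gis"],
    "web": ["web", "xml"],
    "mining": ["mining", "association rule"],
}

# Hypothetical sample records and hand-assigned topics for validation.
PAPERS = [
    {"id": 1, "year": 1999, "title": "Spatial join processing",
     "abstract": "We study spatial joins on the web."},
    {"id": 2, "year": 1999, "title": "Fast algorithms",
     "abstract": "Mining association rules in large databases."},
]
HAND_LABELS = {1: {"spatial"}, 2: {"mining"}}


def topics_of(text):
    """Return the set of topics with at least one keyword in the text."""
    lowered = text.lower()
    return {topic for topic, keywords in TOPIC_KEYWORDS.items()
            if any(keyword in lowered for keyword in keywords)}


def count_by_year(papers, field):
    """Count hits per (year, topic) using one field: 'title' or 'abstract'."""
    counts = Counter()
    for paper in papers:
        for topic in topics_of(paper[field]):
            counts[(paper["year"], topic)] += 1
    return counts


def agreement(papers, hand_labels, field):
    """Fraction of papers whose automatic topics equal the hand labels."""
    hits = sum(topics_of(p[field]) == hand_labels[p["id"]] for p in papers)
    return hits / len(papers)
```

Reporting agreement(PAPERS, HAND_LABELS, "title") next to the same figure for
"abstract" would show how sensitive the counts are to the choice of field.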
- G4: Data warehouses :
A paper providing an organization of research papers in data warehousing.
Group received comments last Friday during an oral presentation.
The main strength is the organization (grouping of papers into coherent
topics). Minor improvements are possible in a few areas. It will be useful
to reduce the white-paper-ish material in sections 1, 2 and 3. Expand the
technical material in section 4 by adding a discussion of the star-join
algorithm. Also visit the web site of Dr. Widom (Stanford) to check out the
reading list for her graduate seminar on data warehousing. Ensure that you
are covering the major papers in her reading list.
- G5: Mapcube implementation w/ Javacc:
All I received was a proposal. This group should complete a draft of the
final report and bring it in by Tuesday class for timely feedback.
Have you implemented mapcube using javacc yet?
- G6: Mobile DBMS :
A survey / tutorial paper on mobile databases. The content seems quite
readable and complete. Formatting leaves a lot to be desired. Look at the
formatting comments in the general section and address all of those.
Carefully separate the knowledge in D. Barbara's survey paper and new
knowledge relative to that. Add Navathe/Savasere's paper on intermittently
connected databases to the list of sources. Use this paper to improve the
discussion of commercial systems by providing a critique of their strengths
and weaknesses. Also useful would be an elaboration of the challenges by
identifying the hard issues in each challenge.
- G7: Spatial Data Mining :
Nice survey of papers in spatial data mining.
Group has received comments in email and office hours.
It would be useful to reduce emphasis on formal definition for clustering
and other patterns learned using unsupervised learning. Consider adding
examples to help readers who may not be familiar with either
data mining or spatial databases.
It may be useful to add a basic concepts section with two
subsections. First subsection could briefly summarize classical
data mining in terms of patterns and process. You already have text
on patterns. It may suffice to put in a diagram to show the process steps.
This diagram may be based on a diagram in the Remote Sensing book. I had
placed a note-it marker on the page.
The second subsection may describe spatial data and their interesting properties
(e.g. density, depth, ..., topological, ...). We can reuse text from the
Spatial Database survey paper (IEEE TKDE 1/99) if needed.
Also consider adding a short section at the end to classify the
papers in spatial data mining using three different dimensions, namely
data mining process, patterns, spatial properties. For simplicity we could
use a table or a picture for this information. This section should briefly
discuss a few areas of research opportunities, i.e. categories not covered
by any paper so far.
- G8: Benchmark for Spatial Database :
A nice paper on designing a dataset and queries to help one understand the
OGIS spatial data model. A few minor revisions would be useful. Consider
adding a short description of OGIS data types and operators either in an
appendix or in a basic concepts subsection in section 1 or 2.
Place your dataset on a website in a readable format. Think about the
issues a user may face in trying to use the dataset. Try to provide a short
user manual with illustrative steps for using the data in a commercial
spatial database supporting SQL3/OGIS or SQL2/OGIS.
Review the queries carefully since you had no opportunity to run those on a
commercial system. For example, should Q2 include C2.name in the result?
Consider providing a description of each query in both SQL3/1999
and SQL2.
Consider expanding section 5 on validation in a few different ways.
Expand the set of data types and operations to improve coverage of OGIS data
types (point collection, polygon collection) and operators.
Identify the queries covering each data type and operation. Also identify
the data types and operations covered by each query.
Include Sequoia in the comparison to show its poor coverage of
OGIS data types and operators. Expand measures of evaluation beyond
coverage to include portability, scalability, ... (see lecture notes for
the topic of benchmarks).
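The two-way coverage listing suggested above (queries per feature, features
per query) could be generated from one table, as in this rough Python sketch.
The query names and the feature assignments are hypothetical placeholders,
not the group's benchmark.

```python
# Hypothetical table: which data types / operations each query exercises.
QUERY_FEATURES = {
    "Q1": {"Point", "Distance"},
    "Q2": {"Polygon", "Touches"},
    "Q3": {"Polygon", "Area"},
}

# Hypothetical target list of data types and operations to be covered.
ALL_FEATURES = {"Point", "LineString", "Polygon", "Distance", "Touches", "Area"}


def queries_per_feature(query_features):
    """Invert the table: for each feature, list the queries that cover it."""
    inverted = {}
    for query, features in query_features.items():
        for feature in features:
            inverted.setdefault(feature, set()).add(query)
    return inverted


def coverage(query_features, all_features):
    """Return (fraction of features covered, set of uncovered features)."""
    covered = set().union(*query_features.values()) & all_features
    return len(covered) / len(all_features), all_features - covered
```

Running the same two functions on a feature table for Sequoia would give the
side-by-side coverage comparison asked for in section 5.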
Feedback on HW 1 (Paper Analysis)
- General Comments :
- S0: Slide Format : Put page numbers on slides. Provide an outline slide
listing titles and page numbers of major groups of slides.
Use the same typesetting style across all slides (e.g. font size
and type for slide titles, first level bullets, second level bullets).
For example, slide titles may be in bold font, fontsize 20.
First level bullets may be in fontsize 18.
Second level bullets may be in fontsize 16.
- S1:
Slide Format : Slides should not use paragraphs and long sentences. Text
should be decomposed into a set of 6 to 8 phrases. Each phrase should be
condensed into 6 to 7 words.
- S2:
Slide Format : Use of diagrams is encouraged. Refer to
figures and tables from the research paper in the textbook by listing figure
numbers and textbook page numbers. You do not have to reproduce the figures
in your slides.
- S3:
Slide Content : Note that your presentation will help the audience
understand a research paper in the reading list. Thus it is important to
EXPLAIN, not only list, the key concepts in the paper in detail using examples, figures
etc. Ideally half (i.e. 5 out of 10) of your slides should focus on
key concepts.
- S4:
Focus : Choose 2 to 3 key ideas to
focus on in your talk.
This is about how much an audience can remember after a typical talk.
Use examples and explanations to ensure that the selected key ideas
are well communicated. Explanations can rely on the common background
(e.g. undergraduate coursework in databases).
- S5:
Explain 1 assumption : Choose 1 assumption to focus on.
Explain the significance of the assumption clearly.
- S6:
The problem statement slide should list the following things - what
information is given, what is to be found, what are the objective
functions and constraints. It should be posed in a way that makes the
proposed solution a viable solution while allowing other solutions
to be discovered by other researchers.
- G1 : Recovery
Overall well presented slides. Areas for improvement are in general
comments S0, S1 and S4. Writeup is quite good and can benefit from
some examples (see comment S4).
S1: Condense long phrases (e.g. first one on page 2, last 2 on page 3 ...)
S4: Use examples and diagrams in paper to explain 2 to 3 selected
concepts.
- G2 : Association Rules Need a hard copy of slides.
Writeup is very well-written following the structure of a short paper.
Other groups are encouraged to review this writeup to see a sample of
rigour.
Comments for improvements follow:
S3: Add a slide to compare concepts of "Association Rules", "confidence",
"support" with statistical notions like conditional probability,
correlation, etc.
S3: It would be useful to add examples that trace the
algorithms (sections 5.1-3) to explain the working of each
algorithm for comparison purposes.
Consider describing the algorithms in SQL syntax.
S5: Add a slide on "Assumptions" towards the end. Explain one assumption.
Add a slide on "Rewrite today" towards the end.
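For the comparison slide in S3, a small worked example may help. The Python
sketch below uses four made-up baskets (not from the paper) to show that
support is a joint relative frequency and confidence is the corresponding
conditional relative frequency, i.e. an estimate of P(rhs | lhs).

```python
# Hypothetical market-basket data for illustration only.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """support(lhs and rhs) / support(lhs): an estimate of P(rhs | lhs)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """confidence / support(rhs); 1.0 would mean lhs and rhs look independent."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)
```

Here the rule bread -> milk has support 2/4 and confidence 2/3, while milk
alone has support 3/4, so buying bread does not raise the chance of milk.
This is the kind of contrast with correlation that the slide could make.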
- G3 : OODB Got comments in office hours.
- G4 : Data Warehouses
Well-designed slides following format comments S0, S1.
A few areas for improvement are listed below.
Order of slides: Group slides on key concepts together
by moving slides on "Assumptions", "Validation" towards the end
just before last slide on "Rewrite".
S3, S4, S5: Need to expand the slides on key concepts to explain the notions of
cube, rollup, drill-down, slice, dice etc. Free up space
by referring to figures in
the textbook instead of reproducing those on your slides (Figures 2, 3).
Instead compare cube, rollup, slice, dice with relational algebra / SQL
to help audience understand these better.
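One possible way to make that comparison concrete is sketched below with
Python's built-in sqlite3 module. The sales table and its rows are
hypothetical. A rollup is shown as the same aggregate grouped by fewer
columns, and a slice as a selection (WHERE) on one dimension.

```python
import sqlite3

# Hypothetical fact table with three dimensions and one measure.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "pen", 1998, 10),
        ("East", "pen", 1999, 20),
        ("East", "ink", 1999, 5),
        ("West", "pen", 1999, 7),
    ],
)


def query(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Finer level: totals per (region, product).
by_region_product = query(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
)

# Rollup: drop the product dimension and aggregate per region only.
rollup_region = query(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)

# Slice: fix one dimension value (year = 1999) before aggregating.
slice_1999 = query(
    "SELECT region, product, SUM(amount) FROM sales WHERE year = 1999 "
    "GROUP BY region, product ORDER BY region, product"
)
```

Drill-down is the reverse step, going from rollup_region back to
by_region_product, and dice would add conditions on more than one dimension.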
- G5 : Distributed Databases
Comments on the slides follow.
S1: Break up paragraphs and long sentences (see slide 1, bullet 1, 3, 5;
slide 3 bullet 2, ...).
S2: Refer to diagrams in the book to illustrate key concepts.
S3: There is a long list of key concepts with little explanation.
Choose a couple of key concepts and add examples and illustrations to explain
those. Preferably compare with familiar concepts from Csci 5708.
Bring "Contributions" slide before "Key Concepts".
Add "Strengths and Weaknesses" slide after "Key Concepts".
S5: Omit slide on "System R* status and future plans". Instead add a
slide on assumptions.
Revise the "Paper rewrite" slide to include a comparison of distributed
systems in the 1980s with the internet environment of today. Which new internet
issues should the System R* design address?
- G6 : Parallel Databases
The writeup is better organized. Use the writeup to improve the slides.
Consider using the comments from editors of textbook to strengthen
the last few slides on "Assumptions" and "Rewrite today".
S2: It is good to see the slides use diagrams from the textbook. The diagrams
need not be reproduced; instead, appropriate slides should refer to the
textbook (see S2).
S3: The number of slides without diagrams is too small,
particularly in the area of key concepts (see S3). Expand the slides on
key concepts to explain at least 2 to 3 concepts in detail.
S1: A few phrases (slide 1 bullet 3, slide 3 bullet 1, slide 5 bullet 4 ...)
are too long and need to be condensed.
S0: Finally the slides format should be made consistent (e.g. slidetitle on
page 3) as detailed in comment S0.
S5: Some explanation is needed with assumptions. You may want to choose one
assumption and give an example to explain it.
- G7 : Bit-map index
Writeup is much better than the slides. Use writeup to improve slides in
areas of problem statement, assumptions etc.
Please see me after the lecture or in office hours today to discuss your
slides. Please review general comments on top of this page.
Also consider reading the explanation of bit-map index in the textbook
by Ullman to supplement the paper.
Here are a few specific comments on your slides:
S0: Slide format needs improvement. Do not use bullet for Slidetitle (see
page 2, 3, ...).
S2: Refer to diagrams from the textbook to explain key concepts like
bit-map index.
S4: Compare bit-map index with something (e.g. B-tree index ) everyone
is familiar with. Use a common set of keys and create two indices to help
everyone understand the differences. Key concept slides are listing
concepts and need additional explanation via examples, figures etc.
S6: Problem statement slide (page 2) is actually describing the key
contributions. Bit-map index is a solution to the general problem we
expect to see in this slide.
S5: Add a slide on assumptions and explain its significance.
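For the comparison suggested in S4, a rough Python sketch on one common set
of keys is given below. The column values are made up, Python integers stand
in for bit vectors, and a sorted list searched with bisect is used only as a
simple stand-in for a B-tree, not as a real B-tree implementation.

```python
from bisect import bisect_left, bisect_right

# Hypothetical low-cardinality column; list position is the row id.
ROWS = ["red", "blue", "red", "green", "blue", "red"]


def build_bitmap_index(values):
    """One bit vector per distinct value; bit i is set if row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def bitmap_rows(bits):
    """Decode a bit vector back into a sorted list of row ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def build_sorted_index(values):
    """Sorted (key, row id) pairs, standing in for the leaf level of a B-tree."""
    return sorted((value, row_id) for row_id, value in enumerate(values))


def sorted_lookup(index, key):
    """Binary search for the run of entries with the given key."""
    lo = bisect_left(index, (key, -1))
    hi = bisect_right(index, (key, len(index)))
    return [row_id for _, row_id in index[lo:hi]]


bitmap = build_bitmap_index(ROWS)
tree_like = build_sorted_index(ROWS)

# "red OR blue" is a single bitwise OR on the bitmaps.
red_or_blue = bitmap_rows(bitmap["red"] | bitmap["blue"])
```

With the bitmap index a multi-value predicate is one bitwise operation, while
the sorted index needs one lookup per key followed by a merge of row-id
lists. That difference is what the slide could illustrate.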
- G8 : ORDB Group got comments in office hours.