Feedback on HW 4 (Project Draft)

General Comments : A few comments apply to multiple papers. These comments are summarized below:
A. Review the format used by the Computing Surveys paper on recovery methods in our readings. Follow a similar format for your papers. Consider showing page numbers, section numbers, figure numbers, and table numbers in your paper. Provide a table of contents and a list of figures and tables to help readers navigate the paper. Consider using LaTeX to compose your papers. It is easy to learn, and we can get a short tutorial from fellow graduate students. It handles many of the stylistic problems and helps you focus on technical writing.
B. Ensure there is a list of references at the end of each paper. Every reference should identify authors, title, publication forum (publisher, conference/journal/book, volume, number, url, ...), date of publication, etc. so that readers can locate it easily.
C. Cite the sources (entries in your list of references) near the text referring to the relevant concepts/contributions. Clearly identify text quotations, figures, or tables taken from sources.
D. Use figures and tables to highlight key messages in your paper.
E. Avoid using white-paper-ish material from commercial websites and marketing literature. Focus on technical material from database research conferences, journals, books, etc.
G1: Middleware, TP Monitor : It is a survey paper describing commercial products in SQL middleware and TP monitors. The bulk of the material in sections 1 and 2 is superficial (from whitepapers?). Note that the material in section 1 is taught in required undergraduate courses (e.g. Csci 4061) and does not deserve a place in a research paper in a graduate course without serious analysis. Why did you choose to focus on whitepaper-ish material? The paper does not seem to overlap much with your project proposal, which focused on Distributed Databases. There are no references to any technical literature even though your project proposal did have them. If you are interested in TP monitors, look at the book by Jim Gray on this subject. We should definitely talk in Tuesday office hours to find a strategy to salvage this paper. The project proposal was reasonably promising, with a reasonable number of technical sources (e.g. the Distributed DB book by Ceri).
G2: XML/GML Map Rendering : It is a paper describing an implementation project. The group has already received detailed comments. The main area of concern is the lack of analysis. Consider comparing DOM and SAX for single-pass and multi-pass spatial computations (e.g. counting nodes in a polygon, computing the area of a polygon, ...).
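As a starting point for the DOM vs. SAX comparison suggested for G2, here is a minimal sketch of a single-pass computation (counting nodes inside a polygon) done both ways. The document layout (`<map>` with `<node x= y=>` children) is a simplified, hypothetical stand-in for GML, and the names `inside`, `NodeCounter`, `count_with_sax` and `count_with_dom` are assumptions made for illustration only.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical, simplified map document (not real GML): each <node> is a point.
DOC = """<map>
  <node x="1" y="1"/>
  <node x="2" y="3"/>
  <node x="5" y="5"/>
  <node x="3" y="1"/>
</map>"""

# Query polygon as a list of (x, y) vertices: a 4 x 4 square.
SQUARE = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]


def inside(x, y, polygon):
    """Ray-casting point-in-polygon test (boundary cases not handled specially)."""
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


class NodeCounter(xml.sax.ContentHandler):
    """SAX: one streaming pass that keeps only a running counter in memory."""

    def __init__(self, polygon):
        super().__init__()
        self.polygon = polygon
        self.count = 0

    def startElement(self, name, attrs):
        if name == "node" and inside(float(attrs["x"]), float(attrs["y"]), self.polygon):
            self.count += 1


def count_with_sax(doc, polygon):
    handler = NodeCounter(polygon)
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.count


def count_with_dom(doc, polygon):
    """DOM: builds the whole tree first, which allows repeated passes later."""
    tree = minidom.parseString(doc)
    return sum(
        inside(float(n.getAttribute("x")), float(n.getAttribute("y")), polygon)
        for n in tree.getElementsByTagName("node")
    )
```

Both functions return the same count. The analysis could then report memory use and running time of each approach as the document grows, and repeat the measurement for a multi-pass computation where DOM can reuse the tree it has already built.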
G3: Trends in DB Research : It is a short paper with interesting results. It provides an interesting tool for tracking the popularity of topics in a research area. The paper does have a list of references. It omits a summary of readings of key sources (i.e. related work and our contributions). It would be useful to expand the paper from the current 4 pages (not counting the code in the appendix) and two figures to approximately 15 pages (5 to 7 figures). Every section deserves expansion. The analysis section may be expanded to discuss how the automated analysis tools were validated (e.g. show a comparison against hand-collected data for a subset of the data). It may be useful to show a breakdown of the applications topic (spatial, web, biological, banking, ...) since it seems to dominate the counts for every year. Think of other results to show the sensitivity of your results to decisions (e.g. keyword-to-topic matching, use of abstract vs. title vs. conference sessions) made in implementing the automated tools for data analysis. The description of the experiment design deserves special attention for reproducibility of results. Provide a diagram to show the major steps in data collection and analysis. Describe the details of each step and provide the key algorithms (pseudo-code) and design decisions. Discuss alternatives for resolving design decisions in developing tools for automatic analysis of trends in DBMS. Think of writing a short user manual for your tool to help you write a good paper. Can your tool be used to analyze trends in the literature on other research topics? What files and parameter values would need to be provided? What decisions would need to be made by the user?
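To make the requests for G3 concrete (key algorithm, design decisions, validation against hand-collected data), here is a minimal sketch of what such a description could cover. It is not the group's tool: the keyword table, the sample titles, and the function names `topics_of`, `count_by_year` and `agreement` are all hypothetical, and matching on whole words in titles is just one of the design decisions the paper should discuss.

```python
from collections import Counter, defaultdict

# Hypothetical keyword-to-topic table; the real tool's mapping may differ.
KEYWORD_TO_TOPIC = {
    "spatial": "applications",
    "web": "applications",
    "index": "access methods",
    "recovery": "transactions",
}


def topics_of(title):
    """Return the set of topics whose keywords occur as whole words in a title."""
    words = set(title.lower().split())
    return {topic for kw, topic in KEYWORD_TO_TOPIC.items() if kw in words}


def count_by_year(papers):
    """papers: iterable of (year, title). Returns {year: Counter(topic -> papers)}."""
    counts = defaultdict(Counter)
    for year, title in papers:
        for topic in topics_of(title):
            counts[year][topic] += 1
    return counts


def agreement(papers, hand_labels):
    """Fraction of papers whose automated topic set equals the hand-assigned set."""
    matches = sum(topics_of(title) == hand_labels[title] for _, title in papers)
    return matches / len(papers)


# Made-up sample data, for illustration only.
SAMPLE = [
    (1998, "A spatial index for web maps"),
    (1998, "Recovery in main memory systems"),
    (1999, "Web caching"),
]
```

Rerunning `count_by_year` with a different keyword table, or with abstracts in place of titles, gives the sensitivity results asked for above, and `agreement` on a hand-labeled subset gives one simple validation measure.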
G4: Data warehouses : A paper providing an organization of research papers in data warehousing. The group received comments last Friday during an oral presentation. The main strength is the organization (grouping of papers into coherent topics). Minor improvements are possible in a few areas. It will be useful to reduce the white-paper-ish material in sections 1, 2 and 3. Expand the technical material in section 4 by adding a discussion of the star-join algorithm. Also visit the web site of Dr. Widom (Stanford) to check out the reading list for her graduate seminar on data warehousing. Ensure that you are covering the major papers in her reading list.
G5: Mapcube implementation w/ Javacc: All I received was a proposal. This group should complete a draft of the final report and bring it to Tuesday's class for timely feedback. Have you implemented mapcube using javacc yet?
G6: Mobile DBMS : A survey / tutorial paper on mobile databases. The content seems quite readable and complete. Formatting leaves a lot to be desired. Look at the formatting comments in the general section and address all of those. Carefully separate the knowledge in D. Barbara's survey paper from the new knowledge relative to that. Add Navathe/Savasare's paper on intermittently connected databases to the list of sources. Use this paper to improve the discussion of commercial systems by providing a critique of their strengths and weaknesses. Also useful would be an elaboration of the challenges by identifying the hard issues in each challenge.
G7: Spatial Data Mining : Nice survey of papers in spatial data mining. The group has received comments in email and office hours. It would be useful to reduce the emphasis on formal definitions for clustering and other patterns learned using unsupervised learning. Consider adding examples to help readers who may not be familiar with either data mining or spatial databases. It may be useful to add a basic concepts section with two subsections. The first subsection could briefly summarize classical data mining in terms of patterns and process. You already have text on patterns. It may suffice to put in a diagram to show the process steps. This diagram may be based on a diagram in the Remote Sensing book. I had placed a sticky-note marker on the page. The second subsection may describe spatial data and their interesting properties (e.g. density, depth, ..., topological, ...). We can reuse text from the Spatial Database survey paper (IEEE TKDE 1/99) if needed. Also consider adding a short section at the end to classify the papers in spatial data mining using three different dimensions, namely data mining process, patterns, and spatial properties. For simplicity we could use a table or a picture for this information. This section should briefly discuss a few areas of research opportunities, i.e. categories not covered by any paper so far.
G8: Benchmark for Spatial Database : A nice paper on designing a dataset and queries to help one understand the OGIS spatial data model. A few minor revisions would be useful. Consider adding a short description of OGIS data types and operators either in an appendix or in a basic concepts subsection in section 1 or 2. Place your dataset on a website in a readable format. Think about the issues a user may face in trying to use the dataset. Try to provide a short user manual with illustrative steps for using the data in a commercial spatial database supporting SQL3/OGIS or SQL2/OGIS. Review the queries carefully since you had no opportunity to run them on a commercial system. For example, should Q2 include C2.name in the result? Consider providing a description of each query in both SQL3/1999 and SQL2. Consider expanding section 5 on validation in a few different ways. Expand the set of data types and operations to improve coverage of OGIS data types (point collection, polygon collection) and operators. Identify the queries covering each data type and operation. Also identify the data types and operations covered by each query. Include Sequoia in the comparison to show its poor coverage of OGIS data types and operators. Expand the measures of evaluation beyond coverage to include portability, scalability, ... (see lecture notes for the topic of benchmarks).

Feedback on HW 1 (Paper Analysis)

General Comments :
- S0: Slide Format : Put page numbers on slides. Provide an outline slide listing the titles and page numbers of the major groups of slides. Use the same typesetting style across all slides (e.g. font size and type for slide titles, first-level bullets, second-level bullets). For example, the slide title may be in bold at font size 20, first-level bullets at font size 18, and second-level bullets at font size 16.
- S1: Slide Format : Slides should not use paragraphs or long sentences. Text should be decomposed into a set of 6 to 8 phrases. Each phrase should be condensed into 6 to 7 words.
- S2: Slide Format : Use of diagrams is encouraged. Refer to figures and tables from the research paper in the textbook by listing figure numbers and textbook page numbers. You do not have to reproduce the figures in your slides.
- S3: Slide Content : Note that your presentation will help the audience understand a research paper in the reading list. Thus it is important to EXPLAIN, not only list, the key concepts in the paper in detail using examples, figures, etc. Ideally half (i.e. 5 out of 10) of your slides should focus on key concepts.
- S4: Focus : Choose 2 to 3 key ideas to focus on in your talk. This is about how much an audience can remember after a typical talk. Use examples and explanations to ensure that the selected key ideas are well communicated. Explanations can rely on the common background (e.g. undergraduate coursework in databases).
- S5: Explain 1 assumption : Choose 1 assumption to focus on. Explain the significance of the assumption clearly.
- S6: Problem statement slide should list the following things: what information is given, what is to be found, and what the objective functions and constraints are. It should be posed in a way that makes the proposed solution a viable solution while allowing other solutions to be discovered by other researchers.
G1 : Recovery : Overall, well-presented slides. Areas for improvement are in general comments S0, S1 and S4. The writeup is quite good and can benefit from some examples (see comment S4).
S1: Condense long phrases (e.g. first one on page 2, last 2 on page 3 ...)
S4: Use examples and diagrams in paper to explain 2 to 3 selected concepts.
G2 : Association Rules : Need a hard copy of the slides. The writeup is very well written and follows the structure of a short paper. Other groups are encouraged to review this writeup as a sample of rigour.
Comments for improvements follow:
S3: Add a slide to compare concepts of "Association Rules", "confidence", "support" with statistical notions like conditional probability, correlation, etc.
S3: It would be useful to add examples that trace the algorithms (sections 5.1-3), to explain the working of each algorithm for comparison purposes.
Consider describing the algorithms in SQL syntax.
S5: Add a slide on "Assumptions" towards the end. Explain one assumption.
Add a slide on "Rewrite today" towards the end.
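For the comparison slide requested under S3 for G2, a small worked example may help. The sketch below computes support and confidence on made-up transactions and relates them to probabilities: support of an itemset is the fraction of transactions containing it (an estimate of P(X)), and confidence of a rule X -> Y is support(X and Y) / support(X) (an estimate of the conditional probability P(Y | X)). The `lift` function is a common measure brought in here as an assumption for comparison against independence; it is not taken from the paper.

```python
# Made-up market-basket transactions, for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset: estimates P(X)."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs and rhs) / support(lhs): estimates P(rhs | lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """confidence / P(rhs); a value of 1.0 means lhs and rhs look independent."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)
```

On this data the rule bread -> milk has support 0.5 and confidence 2/3, while milk alone appears in 3/4 of the transactions, so a high-looking confidence does not by itself indicate a positive correlation.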
G3 : OODB : Group got comments in office hours.
G4 : Data Warehouses : Well-designed slides following format comments S0, S1. A few areas for improvement are listed below.
Order of slides: Group the slides on key concepts together by moving the slides on "Assumptions" and "Validation" towards the end, just before the last slide on "Rewrite".
S3, S4, S5: Need to expand the slides on key concepts to explain the notions of cube, rollup, drill-down, slice, dice, etc. Free up space by referring to figures in the textbook instead of reproducing them on your slides (Figures 2, 3). Instead, compare cube, rollup, slice, and dice with relational algebra / SQL to help the audience understand these better.
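One way to make the suggested comparison with SQL concrete is a tiny example that expresses each operation as an ordinary query. The sketch below uses Python's built-in `sqlite3` module with a made-up `sales` table; the table, its columns, and its rows are assumptions for illustration. It uses plain GROUP BY and WHERE only, since SQLite has no CUBE or ROLLUP operator.

```python
import sqlite3

# Made-up fact table with three dimensions (region, product, year) and one measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "pen", 1998, 10),
        ("East", "ink", 1998, 5),
        ("West", "pen", 1998, 7),
        ("West", "pen", 1999, 3),
    ],
)

# One level of the cube: aggregate over every (region, product) pair.
by_region_product = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

# Roll-up: group by fewer dimensions (drill-down is the reverse step).
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Slice: fix one dimension to a single value, i.e. a selection (WHERE).
slice_1998 = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales WHERE year = 1998 "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

# Dice: restrict several dimensions at once to a sub-cube.
dice = conn.execute(
    "SELECT year, SUM(amount) FROM sales "
    "WHERE region = 'West' AND product = 'pen' GROUP BY year ORDER BY year"
).fetchall()
```

In relational algebra terms, slice and dice are selections on dimension attributes, while roll-up is an aggregation over a smaller grouping list; a full cube is the union of the GROUP BY results over every subset of the dimensions.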
G5 : Distributed Databases : Comments on the slides follow.
S1: Break up paragraphs and long sentences (see slide 1, bullet 1, 3, 5; slide 3 bullet 2, ...).
S2: Refer to diagrams in the book to illustrate key concepts.
S3: There is a long list of key concepts with little explanation. Choose a couple of key concepts and add examples and illustrations to explain those. Preferably compare with familiar concepts from Csci 5708.
Bring "Contributions" slide before "Key Concepts".
Add "Strengths and Weaknesses" slide after "Key Concepts".
S5: Omit slide on "System R* status and future plans". Instead add a slide on assumptions.
Revise the paper rewrite slide to include a comparison between distributed systems in the 1980s and the internet environment of today. Which new internet issues should the System R* design address?
G6 : Parallel Databases : The writeup is better organized. Use the writeup to improve the slides. Consider using the comments from the editors of the textbook to strengthen the last few slides on "Assumptions" and "Rewrite today".
S2: It is good to see the slides use diagrams from the textbook. The diagrams need not be reproduced; instead, the appropriate slides should refer to the textbook (see S2).
S3: The number of slides without diagrams is too small, particularly in the area of key concepts (see S3). Expand the slides on key concepts to explain at least 2 to 3 concepts in detail.
S1: A few phrases (slide 1 bullet 3, slide 3 bullet 1, slide 5 bullet 4 ...) are too long and need to be condensed.
S0: Finally, the slide format should be made consistent (e.g. slide title on page 3) as detailed in comment S0.
S5: Some explanation is needed with the assumptions. You may want to choose one assumption and give an example to explain it.
G7 : Bit-map index : The writeup is much better than the slides. Use the writeup to improve the slides in the areas of problem statement, assumptions, etc. Please see me after the lecture or in office hours today to discuss your slides. Please review the general comments at the top of this page. Also consider reading the explanation of bit-map indexes in the textbook by Ullman to supplement the paper. Here are a few specific comments on your slides:
S0: Slide format needs improvement. Do not use a bullet for the slide title (see page 2, 3, ...).
S2: Refer to diagrams from the textbook to explain key concepts like bit-map index.
S4: Compare the bit-map index with something (e.g. B-tree index) everyone is familiar with. Use a common set of keys and create two indices to help everyone understand the differences. The key concept slides only list concepts and need additional explanation via examples, figures, etc.
S6: Problem statement slide (page 2) is actually describing the key contributions. Bit-map index is a solution to the general problem we expect to see in this slide.
S5: Add a slide on assumptions and explain the significance of one of them.
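The comparison suggested under S4 for G7 (one common set of keys, two indices) could look like the following minimal sketch. The column values are made up, the function names are hypothetical, and the sorted list of (key, row id) pairs is only a flat stand-in for the leaf level of a B-tree, not a real B-tree implementation.

```python
from bisect import bisect_left, bisect_right

# Made-up column of low-cardinality keys; the list position is the row id.
COLORS = ["red", "blue", "red", "green", "blue", "red"]


def build_bitmap_index(column):
    """One bit vector per distinct key; bit i is set when row i holds that key."""
    index = {}
    for row_id, key in enumerate(column):
        index[key] = index.get(key, 0) | (1 << row_id)
    return index


def rows_of(bits):
    """Decode a bit vector (stored as a Python int) into a sorted list of row ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def build_sorted_index(column):
    """Sorted (key, row id) pairs: a flat stand-in for B-tree leaf entries."""
    return sorted((key, row_id) for row_id, key in enumerate(column))


def lookup_sorted(index, key):
    """Binary search for all row ids stored under the given key."""
    lo = bisect_left(index, (key,))
    hi = bisect_right(index, (key, float("inf")))
    return [row_id for _, row_id in index[lo:hi]]
```

With the bitmap index, a predicate such as color = red OR color = blue becomes a single bitwise OR of two bit vectors (`bitmap["red"] | bitmap["blue"]`), whereas the sorted index needs two lookups followed by a merge of row-id lists. Showing both on the same six rows is the kind of side-by-side example the slides could use.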
G8 : ORDB : Group got comments in office hours.