CSE134A LECTURE NOTES

October 14, 2002
 
 

ANNOUNCEMENTS

The deadline for the second project is Friday October 25, at the start of section.  Start immediately so you can get the bonus for early submission!

As I said before, some issues are very important in all 134A projects: good user interface design, good software design, teamwork, and a well-written report.  I'm handing out the team self-evaluation form now.  With the report for each project, each team must hand in one copy of this form.  The team must meet and fill in the form together.  All the numerical scores on the form must be realistic and truthful.  No one is perfect at every aspect of a project.

Teamwork skills are important for graduate school, for software jobs with top companies, and for project leader and management jobs in all organizations.  Recently I heard an engineer saying that the biggest difference between employment and school is that internal politics is so important when you have a job.  You may have to get along with and work with a boss or colleagues that you don't like for several years.  Working in teams for this class lets you learn from your successes and failures now rather than on the job.

About the project itself, here are some design principles to follow:

For this project, no single feature is mandatory, except those described under security and spell-checking in the project handout.  A very common mistake is to provide too much functionality, not implemented well enough.  It's better to implement a few features well than many badly.
 
 

MORE COMPLEX INDEXES

An index can be on two or more columns, for example
alter table messages add index dax (date,author)
This would make order by date, author efficient but not order by author, date

An index on multiple columns, e.g. (state, city, zip), is also an index on its prefixes, i.e. (state, city) and (state).

The keyword unique before the keyword index adds the constraint that every tuple must have a different value for the column(s) named.  You might make the index on (date,author) be unique if dates include times down to the second, but you wouldn't want the index on just date to be unique.

A special alternative to unique is primary key.  Each table can have many different unique indexes, but only one primary key.  Often the table is actually stored sorted by its primary key.
 
 

GUIDELINES FOR INDEXING

Indexes are only useful for <, <=, =, >=, > comparisons on columns of the same type in where clauses, such that the indexed column is not part of any larger expression.  For example, an index on c2 is not useful for the query
select c1, c2 from t1, t2 where c1 = c2*2
When indexing a text column, make the index be on a string prefix only.  For example to make an index on the first ten characters only of an author's name, use the command alter table messages add index ax (author(10))

Indexes are less useful for values that are repeated, most useful for unique values.

Always do performance experiments when you are not 100% sure about the value of an index.  Note that the same query can be much more than ten times slower when the database is ten times larger.  This can be true even if the smaller database is actually the larger one narrowed done with a where clause.

The same query can be ten times faster when you run it a second time, because the tables have been loaded into the server's in-memory cache.

It is possible to use data from the index directly, instead of using the index to guide a read from the table.  Consider the  query

select author, zip from messages, address where messages.author = address.name
Now we are taking the  zip value from the table address, for each author in messages.   Suppose we have an index on (name,zip) in the address table.  Then the additional values can be taken straight from the index, so the table itself is never accessed at all.  This can be a make-or-break optimization.
 
 



Copyright (c) by Charles Elkan, 2002.