CSE134A LECTURE NOTES

October 9, 2002
 
 

ANNOUNCEMENTS

The deadline for the second project is Friday October 25, at the start of section.  I'm handing out the second project description now.

Some issues are very important in all 134A projects: good user interface design, good software design, teamwork, and a well-written report.  I'll be talking more later about the others.  About the report:

You should start early, and you are welcome to bring your outline or your draft to a TA office hour to discuss it and get ideas for improving it. The report should be detailed enough to be interesting, but it should also be six pages single-spaced at most. You should use the report-writing skills that you learned in your general education and lab science classes.  These skills are important for graduate school, for software jobs with top companies, and for project leader and management jobs in all organizations.
 
 

SUMMARIES IN "SELECT" QUERIES

This example reports how many messages are in the database from each of the seven days of the week:
select dayofweek(date) as d, count(*) from messages group by d;
Note that dayofweek() is a builtin function.  Other builtin group-based functions include min, max, sum, and avg.
 
 

JOIN QUERIES

Suppose that we have information about messages in two different tables.  We need to use both to decide which messages to display.  For example:
select messages.*, address.zip from messages, address
   where messages.author = address.author
This type of query is called a "join" query.

Notice that the join is specified in the where clause, just like a logical condition on values in a single record.
 
 

INDEXES

Indexes are additional data structures that the db server can use to execute queries faster.  As a db user you need to know which indexes exist, but not how they are implemented exactly.

A simple index is on exactly one column of a table.  Fundamentally, it allows the table to be scanned efficiently with records sorted according to the value in that column.

To see why indexes are necessary, suppose that you want to scan messages sometimes by author in alphabetical order, and sometimes by date in time order.  You could do either of these if the table was stored in the right sequence, but not both.

The general way to create an index is with the alter table command, whose syntax is rather verbose.  The column you want the index on is specified inside parentheses.  For example:

alter table messages add index dx (date)
Now queries with order by date or group by date can be efficient.
 
 

INDEXES AND JOIN QUERIES

Consider the query
select c1, c2 from t1, t2 where c1=c2
With an index on tablet2 this query can be executed by one scan of table t1.

Doing direct accesses to each row of a table is still much slower than scanning the table sequentially.  Reason: for each row, one block must be read from disk.  When scanning, many rows are processed per read of a block.
 
 

EFFICIENCY WITH AND WITHOUT INDEXES

We discussed the big-O time efficiency (linear, n log n, quadratic) of select queries with and without indexes in detail.  Be sure to read carefully the relevant sections of the MySQL book, especially in the chapter entitled Query Optimization.

For most queries, the index is used to guide a read from the table.  For some queries, it is possible to use data from the index directly, so accessing the table is not necessary at all.

We also discussed the definition of scalability from the point of view of a user: constant or near-constant response time despite increases in amount of usage, e.g. database size, number of users, number of simultaneous users.

Every index causes additional slowdown for inserts and updates, and uses space proportional to the number of rows in the table.  So, only create an index if it is necessary.
 
 



Copyright (c) by Charles Elkan, 2002.