CSE134A LECTURE NOTES

October 16, 2002
 
 

ANNOUNCEMENTS

The deadline for the second project is Friday October 25, at the start of section.  Start immediately so you can get the bonus for early submission!
 
 

SEEING HOW A SELECT QUERY WILL BE EXECUTED

Use the explain SQL command to see how a query will be executed.  For example, here's how an index is used for a query involving a range restriction on the value of the date field:
mysql> explain select distinct ticker from messages where date>'2001-09-23';
+----------+-------+---------------+------+---------+------+-------+-------+
| table    | type  | possible_keys | key  | key_len | ref  | rows  | Extra |
+----------+-------+---------------+------+---------+------+-------+-------+
| messages | range | date          | date |    NULL | NULL | 11321 |       |
+----------+-------+---------------+------+---------+------+-------+-------+
Which index is used is shown by the key column.  Which value from another table will be looked up in this index is shown by the ref column.

Here's how an index is used for an equijoin:

mysql> explain select distinct ticker from messages, sorted where messages.id=sorted.k;
+----------+--------+---------------+---------+---------+----------+--------+-------------+
| table    | type   | possible_keys | key     | key_len | ref      | rows   | Extra       |
+----------+--------+---------------+---------+---------+----------+--------+-------------+
| sorted   | index  | k             | k       |       4 | NULL     | 124113 | Using index |
| messages | eq_ref | PRIMARY       | PRIMARY |       4 | sorted.k |      1 |             |
+----------+--------+---------------+---------+---------+----------+--------+-------------+
The product of rows from each table is an estimate of the number of combinations to be processed.  How the index is used is shown by the type column.  The least efficient value for type is ALL, then index, then range, ref, eq_ref.

If only index is shown under Extra, then the the actual table does not need to be accessed at all, which is the major efficiency bonus mentioned on Monday.
 
 

GOOD AND BAD DATABASE DESIGN

Typically you know which columns you want to store data in, e.g. first and last name, SSN, zip code, date of message, date of userid creation, etc.  But how should you group these into tables?

The most important guideline is to avoid redundancy.  Consider this schema:

messages: first name, last name, address, zip, title, body
To reduce redundancy, we should have two tables.  We still have redundancy in the names, so we should introduce unique ids for authors:
messages: aid, title, body
  person: aid, first name, last name, address, zip
Note that we will often want unique ids for many different entities, e.g. for authors and separately for messages.  That's why the new field is named aid and not just id.

Why is redundancy bad?  There are at least four reasons:

How can we prevent these problems?  Answer: By making constraints explicit, then enforcing them.
 
 

CONSTRAINTS

A constraint is a property that the data in a database should always satisfy.  For any database, we usually know many constraints.  For example: Ideally, we would be able to state all these constraints (also called meta-knowledge) in some formal language.  The database server would give an error message whenever an insert or delete or update operation would cause any constraint to be violated.

Different database servers can enforce more or fewer types of constraint.  For example, MySQL can enforce the first constraint by saying that the id field is not null in the messages table.  The fourth constraint can be enforced with a primary key declaration on the pair of fields (name,address).

The second constraint is accommodated by not declaring id to be unique  for the messages table.  The third constraint is accommodated by making the messages and person tables separate.  This constraint would be violated if we used one big table.

The last constraint is an example of a functional dependency. Unlike more sophisticated database systems, MySQL has no features to enforce functional dependencies.

Guideline: Before choosing a database design, write down what constraints you know should be true.  Then select a design that enforces these constraints.
 
 



Copyright (c) by Charles Elkan, 2002.