CSE 134A
Midterm rubric and comments
11/23/2002
Overall, the midterm scores averaged 36.7 out of a maximum of 50 points, with
a standard deviation of 5.72 points. Graded midterms are ready to be returned;
please see a TA to get yours back. The grades are also available
on Gradesource.
Problem 1: regular expressions
Problem 1 was graded by Greg Hamerly. Please contact him with any questions
about this problem.
When writing regular expressions, it is very easy to make subtle mistakes,
so it is especially important to be careful. It is also important
to make your regular expression "as simple as possible, but no simpler" (a maxim often attributed to Einstein).
Some students lost points for writing overly complex regular expressions when
a much simpler one would suffice.
Part A: a correct answer is "^(\([0-9]{3}\) |[0-9]{3}( |-))?[0-9]{3}-[0-9]{4}$"
Common mistakes and point values:
- -1 point: using brackets "[]" where parentheses "()" were needed.
They are not equivalent in regular expressions! Brackets define a character class that matches a single character; parentheses group a subpattern.
- -1/2 point: not escaping the parentheses for the area code
- -1 point: matching the area code with a pattern like "\(?[0-9]{3}\)?".
This won't work, because it allows no parentheses, both parentheses, or just
one of them (which is wrong).
- -1 point: allowing phone numbers of the form "(858)-111-2222".
According to the examples given and common practice in the USA, a hyphen
is not allowed between the area code and the prefix when using parentheses.
- More points were deducted for other errors.
Nitpicks:
- Don't write "{1}". It is a no-op that only clutters your regular expression.
- Rather than "{0,1}", use the terser "?".
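Both nitpicks rest on an equivalence: "x{1}" matches the same strings as "x", and "x{0,1}" matches the same strings as "x?". A quick sketch in Python's re module, whose syntax for these constructs agrees with the patterns in this rubric, checks this on a handful of sample strings. The helper name same_language and the samples are hypothetical, and sampling illustrates the equivalence rather than proving it.

```python
import re

def same_language(p1, p2, samples):
    """Return True if p1 and p2 accept exactly the same strings among samples."""
    return all(bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s)) for s in samples)

samples = ["", "5", "55", "-5", "--5", "x"]

# "{1}" adds nothing: one digit either way.
print(same_language(r"[0-9]{1}", r"[0-9]", samples))      # True
# "{0,1}" and "?" both mean "optional".
print(same_language(r"-{0,1}[0-9]", r"-?[0-9]", samples))  # True
```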
Other comments:
- Some people went beyond the call of duty to limit the numbers that
were matched, such as not accepting a leading zero when matching the area
code or prefix: "[1-9][0-9]{2}". This was good, but not necessary
for this simple example.
- The leading "^" and trailing "$" are not necessary, but may be a good
idea depending on the context of the pattern match.
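The Part A answer can be exercised directly. The sketch below runs it through Python's re module against a few phone numbers. The test strings are illustrative and are not the exam's own list, and the SLOPPY pattern is a hypothetical reconstruction of the "\(?[0-9]{3}\)?" mistake described above.

```python
import re

# Part A answer from the rubric, written as a Python raw string.
PHONE = re.compile(r"^(\([0-9]{3}\) |[0-9]{3}( |-))?[0-9]{3}-[0-9]{4}$")
# Hypothetical version of the common mistake: each parenthesis is optional on its own.
SLOPPY = re.compile(r"^(\(?[0-9]{3}\)? )?[0-9]{3}-[0-9]{4}$")

accepted = ["111-2222", "(858) 111-2222", "858 111-2222", "858-111-2222"]
rejected = [
    "(858)-111-2222",  # hyphen after a parenthesized area code
    "(858 111-2222",   # only an opening parenthesis
    "858) 111-2222",   # only a closing parenthesis
    "1112222",         # missing hyphen in the local number
]

for s in accepted:
    assert PHONE.match(s), s
for s in rejected:
    assert not PHONE.match(s), s

# The sloppy pattern lets a lone parenthesis through.
assert SLOPPY.match("(858 111-2222")
print("all phone-number checks passed")
```

Note that in Python "$" also matches just before a trailing newline; re.fullmatch is the stricter choice when that matters.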
Part B: a correct answer is "^\$[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})?$"
Common mistakes and point values:
- -1/2 point: not escaping the leading dollar sign or the decimal point
- -1 point: not including the leading dollar sign
- -1 point: not matching commas properly
- -1 point: allowing a mismatch of digits and commas, such as "$1,234,5,34,234"
- -1 point: accepting amounts without commas, as a pattern like "\$[0-9]+" does. This is
not correct, since the problem statement says the numbers should have commas
separating the digits appropriately, that is, into groups of three. Many students lost a point
here.
- -1 point: not matching the trailing decimal point and digits properly
- More points were deducted for other errors.
Other comments:
- Some students matched the period that appeared at the end of the
example sentences. This is actually incorrect (the example "$13" did
not have a period), but no points were taken off.
- No points were awarded or deducted for allowing negative numbers.
- Again, we were not concerned with rejecting numbers that have leading
zeros, so no points were awarded or deducted for
having a construct like "[1-9][0-9]{0,2}" at the beginning of the
number match.
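As with Part A, the Part B answer can be checked against sample amounts. This sketch uses Python's re module; the amounts are illustrative rather than the exam's examples, and LOOSE is a hypothetical stand-in for the "\$[0-9]+" answer that lost a point.

```python
import re

# Part B answer from the rubric, written as a Python raw string.
MONEY = re.compile(r"^\$[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})?$")
# Hypothetical loose pattern that ignores commas entirely.
LOOSE = re.compile(r"^\$[0-9]+$")

accepted = ["$13", "$1,234", "$1,234,567.89", "$0.50"]
rejected = [
    "$1234",            # no comma before the last three digits
    "$1,234,5,34,234",  # digit groups of the wrong size
    "$1,234.5",         # only one digit after the decimal point
    "1,234",            # missing dollar sign
]

for s in accepted:
    assert MONEY.match(s), s
for s in rejected:
    assert not MONEY.match(s), s

# The loose pattern wrongly accepts an amount with no commas.
assert LOOSE.match("$1234")
print("all dollar-amount checks passed")
```

The "{1,3}" on the first group is what lets "$13" through without a comma, while "(,[0-9]{3})*" forces every later group to have exactly three digits.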
Problem 2: database queries
Problem 2 was graded by Greg Chun. Please contact him with any questions
about this problem.
Part A:
- 1 point: Printing HTML tags
- 1 point: Connecting to DB
- 1 point: Selecting DB
- 2 points: Getting SQL query right with correct ORDER BY
- 3 points: Code for fetching and displaying results
- 1 point: Reserved for small mistakes; enough of them cost this point.
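The rubric's steps (HTML output, connecting to the database, a query with ORDER BY, fetching and displaying rows) can be sketched as follows. This is an illustration only, assuming Python's built-in sqlite3 module in place of the course's database server, so there is no separate "select database" step. The People table, its columns, and the sort order are hypothetical; the exam's actual schema and required ORDER BY column are not reproduced in this rubric.

```python
import html
import sqlite3

def render_people(conn):
    """Return an HTML page listing rows from a hypothetical People table."""
    # An explicit ORDER BY makes the output order well defined.
    rows = conn.execute(
        "SELECT FirstName, LastName FROM People ORDER BY LastName, FirstName"
    ).fetchall()
    parts = ["<html><body>", "<table>"]
    for first, last in rows:
        # Escape values so database content cannot inject markup.
        parts.append(f"<tr><td>{html.escape(first)}</td><td>{html.escape(last)}</td></tr>")
    parts += ["</table>", "</body></html>"]
    return "\n".join(parts)

# Connect; an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")],
)
print(render_people(conn))
```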
Part B:
- 1 point: Getting the correct column to be indexed.
- 2 points: Explain why this column should be indexed.
Comments: Part (b) was graded relative to the response in part (a), so some
answers that were wrong in isolation still got full credit. For example, if someone ordered
by FirstName, they lost points in part (a), but if they indicated that they
should index FirstName in (b), they could still get full credit.
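One way to see why the ORDER BY column is the one to index: without an index the database must sort the rows for every query, while an index on the sort key lets it read rows already in order. The sketch below shows this with SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 module. The table, columns, and index name are hypothetical, and it uses a composite index on both sort columns; the plan text is meant for humans and its exact wording varies between SQLite versions.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query-plan descriptions for sql as one string."""
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (FirstName TEXT, LastName TEXT)")
query = "SELECT FirstName, LastName FROM People ORDER BY LastName, FirstName"

before = plan(conn, query)  # a table scan plus a temporary sort
conn.execute("CREATE INDEX idx_people_name ON People (LastName, FirstName)")
after = plan(conn, query)   # a scan of the index, already in sorted order

print("before:", before)
print("after: ", after)
```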
Part C:
The answer we were looking for was that one should make sure that the server
is "warmed up" before taking measurements, so that what gets evaluated is
performance with the data already cached. Partial credit was given for other
reasons, such as server load and network anomalies.
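A measurement harness can build the warm-up into the procedure. The sketch below discards a few warm-up calls before timing and reports the median, which is less sensitive than the mean to occasional spikes from server load or the network. The fake_request function is a hypothetical stand-in for fetching a page: it is slow until its result is cached.

```python
import statistics
import time

def benchmark(request, warmup=5, trials=20):
    """Time request(), discarding warm-up runs so caches are already populated."""
    for _ in range(warmup):
        request()  # results thrown away: these calls just fill caches
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        request()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

_cache = {}

def fake_request():
    # Hypothetical page fetch: about 50 ms the first time, then served from cache.
    if "page" not in _cache:
        time.sleep(0.05)
        _cache["page"] = "<html>...</html>"
    return _cache["page"]

_cache.clear()
start = time.perf_counter()
fake_request()
cold = time.perf_counter() - start
warm = benchmark(fake_request)
print(f"cold: {cold:.3f} s, warm median: {warm:.6f} s")
```

Timing only the first (cold) call would report the cache-miss cost, not the steady-state performance the question asked about.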
Problem 3
Dana graded Problem 3; contact him with any questions.
Each of the five parts was scored roughly like this:
- 3/3 for a good, understandable response
- 2/3 for a response that seemed on the right track but muddled, or mostly correct except for a mildly spurious statement
- 1/3 for serious misunderstandings, or for minimal relevance to the issue at hand (perhaps due to misinterpreting the question)
- 0/3 for something totally wrong or irrelevant, or nothing at all
Here is some discussion of each part:
Part A:
As usage volume increases, Design A will scale better with respect to
network and CPU resources because it doesn't have to retrieve and parse
documents from other servers on every request. But as the amount of data
increases, Design B will scale better with respect to storage requirements.
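The scaling trade-off can be made concrete by counting remote fetches. The sketch below is a simplified reading of the two designs based only on the discussion here (Design A retrieves and extracts ahead of time into a local store; Design B retrieves and extracts on every request); the class names, the FakeRemote helper, and the extract function are all hypothetical.

```python
class FakeRemote:
    """Hypothetical remote Web site that counts how often it is contacted."""
    def __init__(self, pages):
        self.pages = pages
        self.fetches = 0

    def fetch(self, url):
        self.fetches += 1
        return self.pages[url]

def extract(doc):
    # Stand-in for parsing a retrieved document into structured data.
    return doc.upper()

class DesignA:
    """Retrieve and extract ahead of time; answer requests from a local store."""
    def __init__(self, remote, urls):
        self.store = {u: extract(remote.fetch(u)) for u in urls}

    def handle(self, url):
        return self.store[url]

class DesignB:
    """Retrieve and extract from the source on every request."""
    def __init__(self, remote):
        self.remote = remote

    def handle(self, url):
        return extract(self.remote.fetch(url))

pages = {"a": "weather", "b": "traffic"}
ra, rb = FakeRemote(pages), FakeRemote(pages)
a, b = DesignA(ra, pages), DesignB(rb)
for i in range(1000):
    url = "ab"[i % 2]
    assert a.handle(url) == b.handle(url)
print(ra.fetches, rb.fetches)  # 2 1000
```

Design A's remote traffic and parsing cost stay fixed as requests grow, but its store grows with the amount of data; Design B keeps no copy but pays on every request.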
Part B:
With Design A a new presentation medium need only cull information from the
database where it already resides in a structured form; the existing retrieval
engine can continue to operate as usual. With Design B, reusing the retrieval
and extraction code requires making it modular and interfacing it to multiple
front-ends for different presentation media. This modularity is certainly
possible, but it would need to be considered at the outset in Design B, whereas
in Design A it comes with the territory.
Part C:
Design B relays users' requests to other servers, making them subject to
analysis by third parties at those servers or on the network in between. Design
A avoids this theoretical vulnerability, but is still subject to analysis of
incoming requests. Both designs are vulnerable to compromise in the security of
the Web server itself. Database server compromise is unlikely to yield users'
private information, since the problem mentions storing in the database only
content retrieved from other Web sites.
Parts D and E:
Design A will give faster responses since local database queries generally
have less latency than remote requests, and since data extraction is done ahead
of time. Also, since the response will not depend upon remote servers, Design A
will be more consistent and reliable when those servers are down or under heavy
load. Furthermore, Design A could keep archived data even if it disappears from
the sources, and it could draw statistics from the database.
A response from Design B will have more up-to-date information since it is
retrieved from the source on the spot. It could also allow users, at request
time, to specify sources the designers didn't think of.
Problem 4
Professor Elkan graded Problem 4; contact him with any questions.
Following are reasonable answers:
Part A:
Each tier in a dynamic Web site is a component that communicates with the adjacent tiers, usually over a network. Typically, the three tiers are:
- Web browser
- Web server
- Database server
Part B:
Yes, this system has a three-tier architecture:
- Wireless gadget
- Web server (sending VoiceXML)
- Database (containing traffic information)
Part C:
Session IDs can be embedded in URLs.
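Embedding a session ID in URLs means every link or form target the server emits carries the ID, and the server reads it back from each request. This is useful when the client cannot be relied on to return cookies. The sketch below uses Python's urllib.parse; the parameter name "sid" and the "/traffic" path are hypothetical.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def new_session_id():
    # Unguessable random token: 16 random bytes -> 32 hex characters.
    return secrets.token_hex(16)

def add_session_id(url, sid, param="sid"):
    """Return url with the session ID set as a query parameter."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any existing ID, then append the current one.
    query = [(k, v) for k, v in query if k != param] + [(param, sid)]
    return urlunsplit(parts._replace(query=urlencode(query)))

def session_id_from(url, param="sid"):
    """Recover the session ID from a request URL, or None if absent."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)

url = add_session_id("/traffic?route=5", "abc123")
print(url)                   # /traffic?route=5&sid=abc123
print(session_id_from(url))  # abc123
```

Because the ID travels in the URL, it can leak through logs or shared links, which is one reason to make it long and random.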
Part D:
Yes, sessions make sense for VoiceXML applications. Sessions maintain state
on the server across multiple sequential requests by the same user. A VoiceXML
application could make multiple requests during a single conversation, and it
might be useful to keep state such as the user's identity, preferences, and
recent activity on the server instead of passing it back and forth to the
client.
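A server-side session for a spoken conversation can be sketched as a store keyed by session ID, where each turn reads and updates the state left by earlier turns. This minimal in-memory sketch assumes a hypothetical two-step dialog (say your name, then ask about a route); a real store would also expire idle sessions and survive server restarts.

```python
import secrets

class SessionStore:
    """Minimal in-memory server-side session store keyed by session ID."""
    def __init__(self):
        self._sessions = {}

    def start(self):
        sid = secrets.token_hex(16)
        self._sessions[sid] = {}
        return sid

    def get(self, sid):
        # Unknown IDs yield None; this sketch never expires sessions.
        return self._sessions.get(sid)

def handle_turn(store, sid, utterance):
    """Handle one request in a hypothetical traffic-report conversation."""
    state = store.get(sid)
    if state is None:
        return "Sorry, your session has expired."
    if "user" not in state:
        state["user"] = utterance  # first turn: remember who is calling
        return f"Hello {utterance}. Which route?"
    state.setdefault("recent", []).append(utterance)  # later turns: log activity
    return f"Traffic for {utterance}, {state['user']}."

store = SessionStore()
sid = store.start()
print(handle_turn(store, sid, "alice"))  # Hello alice. Which route?
print(handle_turn(store, sid, "I-5"))    # Traffic for I-5, alice.
```

Only the session ID crosses the network on each turn; the identity and recent activity stay on the server.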