CSE 134A
Midterm rubric and comments
11/23/2002

Overall, the midterm scores averaged 36.7 out of a maximum of 50 points, with a standard deviation of 5.72 points. The midterms are available for return; please see a TA about getting yours back. The grades are also available on Gradesource.

Problem 1: regular expressions

Problem 1 was graded by Greg Hamerly. Please contact him with any questions about this problem.

When writing regular expressions, it is very easy to make subtle mistakes, so it is especially important to be careful. It is also important to make your regular expression "as simple as possible, but no simpler" (Einstein). Some students lost points for writing overly complex regular expressions when a much simpler one would suffice.

Part A: a correct answer is "^(\([0-9]{3}\) |[0-9]{3}( |-))?[0-9]{3}-[0-9]{4}$"
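
As a quick sanity check, the answer above can be tried in Python's re module. This is only a sketch: the helper name is_phone_number and the sample inputs are illustrative assumptions inferred from the pattern, not taken from the exam question.

```python
import re

# The Part A answer, copied verbatim into a raw string.
PHONE_RE = re.compile(r"^(\([0-9]{3}\) |[0-9]{3}( |-))?[0-9]{3}-[0-9]{4}$")

def is_phone_number(text):
    """Return True if text matches the Part A pattern (hypothetical helper)."""
    return PHONE_RE.search(text) is not None

if __name__ == "__main__":
    # Sample inputs are assumptions chosen to exercise each branch.
    for sample in ["(858) 555-1234", "858-555-1234", "858 555-1234",
                   "555-1234", "(858)555-1234", "8585551234"]:
        print(sample, is_phone_number(sample))
```

The optional group accepts an area code either in parentheses followed by a space, or as three digits followed by a space or a hyphen; the seven-digit part always requires the hyphen.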

Common mistakes and point values:
Nitpicks:
Other comments:

Part B: a correct answer is "^\$[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})?$"
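
The same kind of check works for this answer. Again a sketch: is_dollar_amount and the sample strings are illustrative assumptions inferred from the pattern, not part of the exam.

```python
import re

# The Part B answer, copied verbatim into a raw string.
DOLLAR_RE = re.compile(r"^\$[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})?$")

def is_dollar_amount(text):
    """Return True if text matches the Part B pattern (hypothetical helper)."""
    return DOLLAR_RE.search(text) is not None

if __name__ == "__main__":
    # Sample inputs are assumptions chosen to exercise each part of the pattern.
    for sample in ["$1", "$1,234", "$12,345,678.00", "$1234", "$1,23", "$1.5"]:
        print(sample, is_dollar_amount(sample))
```

Note that the pattern requires a leading dollar sign, comma-separated groups of exactly three digits after the first group, and, if cents are present, exactly two digits.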

Common mistakes and point values:
Other comments:

Problem 2: database queries 

Problem 2 was graded by Greg Chun. Please contact him with any questions about this problem.

Part A:
Part B:
Comments: Some answers to part (b) were wrong in absolute terms but consistent with the student's response in part (a); these still received full credit. For example, someone who ordered by FirstName lost points in part (a), but could still get full credit in part (b) by indicating that FirstName should be indexed.

Part C:
The answer we were looking for was that one should make sure the server is "warmed up", so that the measurements reflect performance with cached data. Partial credit was given for other reasons, such as server load and network anomalies.
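
One way to apply the warm-up idea when timing a query is sketched below. The function name time_query, its parameters, and the default counts are assumptions for illustration; run_query stands for whatever callable issues the query under test.

```python
import time

def time_query(run_query, warmup=3, runs=10):
    """Return per-run latencies in seconds, measured after warm-up runs.

    run_query is any zero-argument callable that issues the query
    (hypothetical; supply your own).
    """
    for _ in range(warmup):
        run_query()  # results discarded: these runs only populate caches
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings
```

Only the runs after the warm-up phase are timed, so the first, cache-cold executions do not skew the measurements.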

Problem 3

Dana graded Problem 3; contact him with any questions.

Each of the five parts was scored roughly like this:

Here is some discussion of each part:

Part A:

As usage volume increases, Design A will scale better with respect to network and CPU resources because it doesn't have to retrieve and parse documents from other servers on every request. But as the amount of data increases, Design B will scale better with respect to storage requirements.

Part B:

With Design A, a new presentation medium need only cull information from the database, where it already resides in a structured form; the existing retrieval engine can continue to operate as usual. With Design B, reusing the retrieval and extraction code requires making it modular and interfacing it to multiple front ends for different presentation media. This modularity is certainly possible, but it would need to be considered at the outset in Design B, whereas in Design A it comes with the territory.

Part C:

Design B relays users' requests to other servers, making them subject to analysis by third parties at those servers or on the network in between. Design A avoids this theoretical vulnerability, but is still subject to analysis of incoming requests. Both designs are vulnerable to a security compromise of the Web server itself. A compromise of the database server is unlikely to yield users' private information, since the problem mentions storing only content retrieved from other Web sites in the database.

Parts D and E:

Design A will give faster responses since local database queries generally have less latency than remote requests, and since data extraction is done ahead of time. Also, since the response will not depend upon remote servers, Design A will be more consistent and reliable when those servers are down or under heavy load. Furthermore, Design A could keep archived data even if it disappears from the sources, and it could draw statistics from the database.

A response from Design B will have more up-to-date information since it is retrieved from the source on the spot. It could also allow users, at request time, to specify sources the designers didn't think of.
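
The trade-off between the two designs can be illustrated with a toy sketch. The class names, the fetch callable, and the in-memory dict standing in for the database are all assumptions made for illustration; the exam problem does not specify an implementation.

```python
class DesignA:
    """Prefetch: copy source data into a local store, then answer from it."""

    def __init__(self, fetch, sources):
        self.fetch = fetch      # callable: source name -> content
        self.sources = sources
        self.store = {}         # stands in for the local database

    def refresh(self):
        # Runs ahead of time (for example on a schedule), not per request.
        for source in self.sources:
            self.store[source] = self.fetch(source)

    def handle_request(self, source):
        # No remote traffic here: fast and independent of source uptime,
        # but the answer may be stale.
        return self.store.get(source)


class DesignB:
    """On demand: fetch from the remote source on every request."""

    def __init__(self, fetch):
        self.fetch = fetch

    def handle_request(self, source):
        # Always current, but pays remote latency and fails if the
        # source is down.
        return self.fetch(source)
```

If the source changes after the last refresh, DesignA keeps returning the stored copy until the next refresh, while DesignB returns the new content immediately.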

Problem 4

Professor Elkan graded Problem 4; contact him with any questions.

The following are reasonable answers:

Part A:

Each tier in a dynamic Web site is a component that communicates with the adjacent tiers, usually over a network. Typically, the three tiers are:

  1. Web browser
  2. Web server
  3. Database server

Part B:

Yes, this system has a three-tier architecture:

  1. Wireless gadget
  2. Web server (sending VoiceXML)
  3. Database (containing traffic information)

Part C:

Session IDs can be embedded in URLs.
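
A minimal sketch of embedding a session ID in a URL, using Python's standard urllib.parse module. The parameter name "sid", the helper names, and the example URL are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_session_id(url, session_id, param="sid"):
    """Return url with the session ID set as a query parameter."""
    parts = urlsplit(url)
    # Drop any existing session parameter so it is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

def get_session_id(url, param="sid"):
    """Return the session ID carried in url, or None if absent."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)
```

The server would rewrite every link it sends to the client with add_session_id, and read the ID back with get_session_id on each incoming request.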

Part D:

Yes, sessions make sense for VoiceXML applications. Sessions maintain state on the server across multiple sequential requests by the same user. A VoiceXML application could make multiple requests during a single conversation, and it might be useful to keep state such as the user's identity, preferences, and recent activity on the server instead of passing it back and forth to the client.
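
Server-side session state of this kind can be sketched as a dictionary keyed by session ID. The class SessionStore and the stored keys are hypothetical; a real application would also need expiry and persistent storage.

```python
import secrets

class SessionStore:
    """In-memory session table: session ID -> per-user state dict."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        """Start a new session and return its unguessable ID."""
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id):
        """Return the state dict for session_id, or None if unknown."""
        return self._sessions.get(session_id)
```

For example, a first request could call create() and store the caller's preferred route in the returned session's dict; later requests in the same conversation send only the session ID, and the server looks the state up with get().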