In the current implementation of SARA, the user selects the desired data server based on intuition about which site is ``closest'', i.e. which site will transfer remote data the most quickly. Traffic on shared networks can substantially affect performance, so that the server that is actually ``closest'' at a given time may be a non-intuitive choice. Our goal for the Simple SARA AppLeS was to develop a strategy for determining the ``closest'' data server based on dynamic information and predictions provided by the Network Weather Service, and to test our approach in ordinary shared, distributed environments.
To do so, we distributed files representative in size and structure of SARA data files to a variety of geographically distributed sites connected by a diverse set of networks. The machine used as the processing node was alicatado.ucsd.edu, a Pentium class x86 machine running Linux 2.0.34. The data servers we used included
We implemented two prototype modules: a data server and a processing server. Our experimental data server is virtually unchanged from the data server used in the actual SARA application.6 Our processing server is based on the actual SARA processing server module, but is modified to skip the data filtering and image encoding phases. The AppLeS agent is coded into the processing server.
The experiments consisted of a series of trials. In each trial, the AppLeS agent contacted the NWS to obtain bandwidth forecasts between each of the potential data servers and the processing node. The server with the highest bandwidth forecast was designated the selected server. We assessed the effectiveness of the AppLeS selection of a data server by performing the data transfer from all available data servers and comparing the resulting transfer times.7 We considered the AppLeS scheduler successful for a given trial if the selected server yielded the lowest transfer time for that trial.
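The selection rule and success criterion described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names, the server names, and all numeric values are hypothetical, and the bandwidth forecasts are assumed to have already been obtained from the NWS and passed in as a plain mapping.

```python
from typing import Dict, List, Tuple


def select_server(forecasts: Dict[str, float]) -> str:
    """Return the server with the highest forecast bandwidth to the processing node."""
    if not forecasts:
        raise ValueError("no candidate data servers")
    return max(forecasts, key=forecasts.get)


def trial_successful(selected: str, transfer_times: Dict[str, float]) -> bool:
    """A trial succeeds if the selected server had the lowest measured transfer time."""
    return transfer_times[selected] == min(transfer_times.values())


def success_rate(trials: List[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Fraction of trials in which the forecast-based choice was the fastest server."""
    wins = sum(trial_successful(select_server(f), t) for f, t in trials)
    return wins / len(trials)


# Hypothetical trials: (bandwidth forecasts, measured transfer times in seconds).
trials = [
    ({"server_a": 1.2, "server_b": 0.4}, {"server_a": 10.0, "server_b": 31.0}),
    ({"server_a": 0.3, "server_b": 0.9}, {"server_a": 25.0, "server_b": 12.0}),
    ({"server_a": 0.8, "server_b": 0.7}, {"server_a": 20.0, "server_b": 15.0}),
]
rate = success_rate(trials)
```

In the third hypothetical trial the forecast favors `server_a` while `server_b` is actually faster, so that trial counts as a miss for the scheduler.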
Figure 3 shows a representative Simple SARA experiment, performed during a normal workday. Each trace represents a series of trials using a particular server, and the server selected by the AppLeS agent is indicated (with a symbol) for each trial. During this experiment, the AppLeS scheduler selected the fastest server in 80% of the trials. Furthermore, the AppLeS agent never did worse than an ``intuitive'' static scheduler that selects the server based on geographical distance between the client PC and the remote data server.
In the next subsection, we discuss the idea of ``network closeness'' and present results suggesting that it can be used to construct efficient application schedules.