New Directions in Adaptive Disk Reorganization
by
Dana Dahlstrom and Eric Wiewiora
The Goal: Minimize file access latency
Caches prevent many requests from accessing the disk.
Reorganization may help with the requests that get through.
Reorganize Data Based on What?
Make frequently used (hot) data easy to access.
Place data that is often accessed together so that it can be read together efficiently.
Disk Latency Decomposed
Seek time: time to move the disk arm to the correct cylinder.
Rotational Delay: time it takes the relevant data to spin underneath the head.
Transfer Time: time it takes to read requested data when data is available.
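The three components above add up to the cost of one request. A minimal sketch of that sum follows. The function name and parameters are hypothetical, and the rotational term assumes the average case of half a revolution.

```python
def access_latency_ms(seek_ms, rpm, bytes_requested, transfer_mb_per_s):
    """Estimate one request's latency as seek + rotational delay + transfer."""
    # Assumption: on average the data is half a revolution away from the head.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Time to read the requested bytes at the given media rate (MB = 10**6 bytes).
    transfer_ms = bytes_requested / (transfer_mb_per_s * 1_000_000.0) * 1000.0
    return seek_ms + rotational_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical drive: 8 ms seek, 7200 RPM, 20 MB/s, one 4 KB block.
    print(access_latency_ms(8.0, 7200, 4096, 20.0))
```

For a small request the seek and rotational terms dominate, which is why layout matters more than raw bandwidth for such requests.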
Simple Organization Schemes
Organ Pipe
Zone Placement
FFS Layout
Organ Pipe
Place hot data in the middle of the disk.
Reduces average and worst case seek distances to hot data.
The optimal seek-minimizing strategy if disk accesses are independent.
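One way to build an organ pipe arrangement is to rank blocks by access count and place them alternately to the right and left of the hottest block. The sketch below assumes a hypothetical `heat` mapping from block id to access count and returns the blocks in cylinder order.

```python
from collections import deque


def organ_pipe_layout(heat):
    """Return block ids in cylinder order, hottest in the middle."""
    # Rank hottest first; break ties by block id so the result is deterministic.
    ranked = sorted(heat, key=lambda block: (-heat[block], block))
    layout = deque()
    for i, block in enumerate(ranked):
        # Alternate sides so heat falls off symmetrically from the center.
        if i % 2 == 0:
            layout.append(block)
        else:
            layout.appendleft(block)
    return list(layout)


if __name__ == "__main__":
    print(organ_pipe_layout({"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}))
```

The coldest blocks end up at the two edges of the disk, the hottest in the center.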
Zone Placement
Modern disks hold more sectors per track on the outer portion of the disk.
Outer tracks therefore have higher bandwidth:
more data spins under the read head per unit of time.
Place hot blocks on outer tracks in order to minimize transfer time.
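The bandwidth difference between zones follows directly from the sector count per track, since the rotation speed is constant. A small sketch, with a hypothetical function name and made-up zone sizes chosen only for illustration:

```python
def track_bandwidth_bytes_per_s(sectors_per_track, sector_bytes, rpm):
    """Bytes passing under the head per second on one track."""
    revolutions_per_s = rpm / 60.0
    return sectors_per_track * sector_bytes * revolutions_per_s


if __name__ == "__main__":
    # Hypothetical zones: 200 sectors per track outside, 100 inside.
    outer = track_bandwidth_bytes_per_s(200, 512, 6000)
    inner = track_bandwidth_bytes_per_s(100, 512, 6000)
    print(outer / inner)
```

With these assumed numbers the outer zone transfers data twice as fast as the inner zone.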
FFS
Minimize seeks by paying attention to dependency.
Keep inodes near their data blocks.
Attempt to minimize file fragmentation.
Keep all data for a directory together.
New Trends in Disks
Capacity is growing much faster than latency is improving.
Prefetch cache: a small cache that reads ahead into the sectors following the last request.
Zones
New Request Scheduling Policies
Question: Where is the best place on disk?
Organ pipe has good worst-case seek times.
But new scheduling strategies visit the middle of the disk less.
The outer cylinders have good transfer rates.
But the outer cylinders are remote.
Are there better uses for the outer cylinders, such as large files?
Question: Has dependency become as important as frequency?
Large file caches already improve access based on frequency.
The prefetch cache provides more incentive to group dependencies.
But keeping dependency data is expensive.
Arranging for dependency comes at the expense of arranging for frequency.
Using Dependence Data in Reorganization
Good layouts will work with the prefetch cache.
Keep the distance between dependent data small, so the cache has time to read it.
If A references B, make sure A comes before B on disk.
An NP-hard problem, but we propose a heuristic.
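The ordering constraint alone (A before B whenever A references B) can be met with a topological sort. The sketch below is not the heuristic proposed in this talk, only a simple greedy baseline under stated assumptions. `refs` and `heat` are hypothetical inputs, ties go to the hotter file, and files caught in reference cycles are appended at the end. It does not try to minimize the distance between dependent files.

```python
import heapq


def dependency_order(refs, heat):
    """Order files so that a referencing file precedes the files it references.

    refs maps a file to the files it references; heat maps a file to an
    access count that is used only to choose among files that are ready.
    """
    nodes = set(refs) | {b for targets in refs.values() for b in targets}
    indegree = {n: 0 for n in nodes}
    for a, targets in refs.items():
        for b in set(targets):
            if b != a:  # ignore self-references
                indegree[b] += 1
    # A file is ready once every file that references it has been placed.
    ready = [(-heat.get(n, 0), n) for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, a = heapq.heappop(ready)  # hottest ready file first
        order.append(a)
        for b in set(refs.get(a, ())):
            if b == a:
                continue
            indegree[b] -= 1
            if indegree[b] == 0:
                heapq.heappush(ready, (-heat.get(b, 0), b))
    # Files on reference cycles never become ready; place them hottest first.
    leftovers = sorted(nodes - set(order), key=lambda n: (-heat.get(n, 0), n))
    return order + leftovers


if __name__ == "__main__":
    refs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    heat = {"a": 1, "b": 5, "c": 9, "d": 2}
    print(dependency_order(refs, heat))
```

A layout built this way satisfies the ordering rule for acyclic references, but a real heuristic would also have to keep dependent files within reach of the prefetch cache.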
Data Replication to Enhance Dependence Locality
Some highly referenced files may be impossible to place within every referrer's prefetch area.
Could introducing replicas of the file help?
Trades space for performance, which sounds like a good idea.
Added write overhead if multiple copies must be updated.
Restrict to stable, read-only files?