Temperature Aware Task Scheduling in MPSoCs:
Project Website

In deep submicron circuits, thermal hot spots and temperature variations have brought new challenges in reliability, performance, cooling costs and leakage power. Conventional thermal management controls thermal behavior at the expense of performance, slowing down or stalling the processors when a critical temperature threshold is exceeded. Moreover, such techniques do not attempt to minimize the temporal and spatial variations in temperature, which adversely affect system reliability.

In our work, we explore temperature-aware task scheduling for multiprocessor systems-on-a-chip (MPSoCs). We design and evaluate OS-level dynamic scheduling policies with negligible performance overhead. We show that simple-to-implement scheduling policies that make decisions based on temperature measurements can dramatically decrease the frequency of high-magnitude thermal cycles and spatial gradients in comparison to state-of-the-art schedulers. OS-level temperature-aware scheduling can also be combined with reactive methods such as dynamic thread migration to further reduce hot spots and temperature variations at low performance cost.
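As a rough illustration of what a policy that makes decisions based on temperature measurements might look like, the sketch below assigns an incoming task to the coolest idle core and reports the spread between the hottest and coolest cores. It is a minimal sketch, not the policies evaluated in our work: the function names, the per-core temperature list and the idle flags are all hypothetical, and a real OS-level scheduler would read on-chip thermal sensors and account for load and migration cost.

```python
from typing import Optional, Sequence


def pick_coolest_core(temps_c: Sequence[float], idle: Sequence[bool]) -> Optional[int]:
    """Return the index of the coolest idle core, or None if every core is busy.

    temps_c and idle are hypothetical inputs: one temperature reading (in
    degrees Celsius) and one idle flag per core.
    """
    candidates = [i for i, free in enumerate(idle) if free]
    if not candidates:
        return None
    # Ties go to the lowest core index, since min() keeps the first minimum.
    return min(candidates, key=lambda i: temps_c[i])


def spatial_spread(temps_c: Sequence[float]) -> float:
    """Difference between the hottest and coolest core, a crude proxy for spatial gradients."""
    return max(temps_c) - min(temps_c)


if __name__ == "__main__":
    temps = [70.0, 55.0, 62.0, 48.0]    # assumed sensor readings, one per core
    idle = [True, True, False, False]   # cores 2 and 3 are busy
    print(pick_coolest_core(temps, idle))
    print(spatial_spread(temps))
```

Choosing the coolest idle core steers new work away from hot spots without throttling any processor, which is the intuition behind keeping performance overhead low.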

Analysis of Temperature Induced Reliability Problems in MPSoCs:

The combination of increasing integration levels and rising power consumption leads to higher power densities in deep submicron and nanoscale SoCs. This trend results in large temperature offsets and hot spots on the chip, constituting a significant design challenge for system reliability. Conventional power management policies reduce system-level power consumption and the overall on-chip temperature, and are thus expected to contribute to improved reliability. However, high temperature differentials caused by power management may adversely affect system reliability and create conflicting demands for system design.

The goal of my research is to introduce a simulation methodology for analyzing the reliability of SoCs, in order to accurately evaluate the effects of power management policies, workload scheduling, system topology and thermal packaging on multi-core SoC failure rates.

Fault Tolerant Architectures:

I am also interested in fault-tolerant computer architectures. I have worked on developing an architecture for superscalar processor pipelines that provides high transient fault coverage while incurring minimal performance and hardware overhead.

E-mail: acoskun (at) cs.ucsd.edu
Fax: (858) 534-7029
Address: 9500 Gilman Drive
CSE Department
La Jolla, CA 92093-0404