I recently received my Ph.D. in computer science from UC San Diego, where I worked in high performance computing (HPC) under Dr. Allan Snavely.
My area of specialization was performance modeling of large-scale systems and, in particular, the memory subsystem. Have a look through the projects and publications on this site to learn more.
Chameleon is an automated framework that addresses three of the classic problems in memory behavior analysis:
(1) Characterizing memory reference locality in applications
(2) Generating accurate synthetic address traces
(3) Creating benchmark proxies for applications
|
Publications
|
The Chameleon Framework: Practical Solutions for Memory Behavior Analysis
J. Weinberg. Ph.D. Dissertation, University of California, San Diego, 2008.
|
Accurate Memory Signatures and Synthetic Address Traces for HPC Applications
J. Weinberg, A. Snavely. In The 22nd ACM International Conference on Supercomputing (ICS08), Island of Kos, Greece, June 7-12, 2008.
|
Chameleon: A Framework for Observing, Understanding, and Imitating Memory Behavior
J. Weinberg, A. Snavely. In PARA'08: Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 13-16, 2008.
|
|
The Chameleon Framework: Practical Solutions for Memory Behavior Analysis
[PDF]
|
This dissertation presents the Chameleon framework, an integrated solution to three classic problems in the field of memory performance analysis: reference locality modeling, accurate synthetic address trace generation, and the creation of synthetic benchmark proxies for applications. The framework includes software tools to capture a concise, machine-independent memory signature from any application and produce synthetic memory address traces that mimic that signature. It also includes the Chameleon benchmark, a fully tunable synthetic executable whose memory behavior can be dictated by these signatures.
By simultaneously modeling both spatial and temporal locality, Chameleon produces uniquely accurate, general-purpose synthetic traces. Results demonstrate that the cache hit rates generated by each synthetic trace are nearly identical to those of the application it targets on dozens of memory hierarchies representing many of today's commercial offerings.
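To make the idea of a signature-driven synthetic trace concrete, here is a minimal sketch of one generic approach: drawing references from an LRU stack so that the output follows a given reuse (stack) distance histogram. This is not Chameleon's actual signature or generator; the function name, the histogram format (key `None` meaning a cold, never-before-seen block), and the 64-byte block size are assumptions made for illustration. It also models temporal locality only, whereas the dissertation models spatial and temporal locality together.

```python
import random
from typing import Dict, List, Optional


def synthesize_trace(
    distance_hist: Dict[Optional[int], float],
    length: int,
    block_size: int = 64,
    seed: int = 0,
) -> List[int]:
    """Hypothetical generator: emit `length` byte addresses whose LRU stack
    distances are drawn from `distance_hist` (key None = cold reference)."""
    rng = random.Random(seed)
    keys = list(distance_hist)
    weights = [distance_hist[k] for k in keys]
    stack: List[int] = []  # block numbers; index 0 = most recently used
    next_block = 0         # next block that has never been touched
    trace: List[int] = []
    for _ in range(length):
        d = rng.choices(keys, weights=weights)[0]
        if d is None or d >= len(stack):
            # Cold reference. Distances deeper than the current stack cannot be
            # honoured yet, so they also become cold references (warm-up skew).
            block = next_block
            next_block += 1
        else:
            block = stack.pop(d)  # reuse the block with d distinct blocks above it
        stack.insert(0, block)    # the referenced block becomes most recently used
        trace.append(block * block_size)
    return trace
```

For instance, a histogram concentrated entirely at distance 0 produces repeated touches of a single block, while `{None: 1.0}` produces a pure streaming trace of consecutive blocks.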
|
Accurate Memory Signatures and Synthetic Address Traces for HPC Applications
[PDF]
|
The Chameleon framework is a software suite that includes
tools to capture a concise, machine-independent memory
signature from any application and produce synthetic memory
address traces that mimic that signature.
In this work, we apply the framework to high-performance computing
(HPC) by leveraging sampling techniques to capture the
memory signatures of full-scale, parallel applications with
only a 5x slowdown. The overall result is therefore a concise,
observable, and machine-independent representation of
the memory requirements of full-scale applications that can
be tractably captured and accurately mimicked.
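The abstract attributes the low overhead to sampling. As a minimal sketch of what periodic sampling can look like, assuming a simple on/off scheme (the paper's actual sampling policy is not described here), the filter below keeps only references that fall in the "on" part of each window of `on + off` references, so roughly `on / (on + off)` of the stream is analyzed. The function name and parameters are hypothetical, and a real tracer would instrument the binary rather than iterate over a Python list.

```python
from typing import Iterable, Iterator, Tuple


def sample_references(
    refs: Iterable[int], on: int, off: int
) -> Iterator[Tuple[int, int]]:
    """Hypothetical periodic sampler: yield (index, address) for references in
    the first `on` positions of every window of `on + off` references."""
    period = on + off
    for i, addr in enumerate(refs):
        if i % period < on:  # inside the sampled part of this window
            yield i, addr
```

Keeping the original reference index lets later analysis tell which sampled references were contiguous in the real execution.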
@INPROCEEDINGS{weinberg08accurate,
author = {J. Weinberg and A. Snavely},
title = {Accurate Memory Signatures and Synthetic Address Traces for HPC Applications},
booktitle = {The 22nd ACM International Conference on Supercomputing (ICS08)},
year = {2008},
address = {Kos, Greece},
month = {June}
}
|
Chameleon: A Framework for Observing, Understanding, and Imitating Memory Behavior
[PDF]
|
In this work, we present an integrated solution to three classic
problems in the field of performance analysis: memory modeling,
synthetic address trace generation, and the creation of synthetic
benchmark proxies for applications.
First, we describe an intuitive characterization of memory access
locality that can accurately predict an application's hit rates
on arbitrary cache configurations, even when block sizes and cache
depths change. We then describe the implementation of a memory
tracer that can extract this characterization from applications
and a software tool that can generate synthetic address traces to
match. Lastly, we describe Chameleon, a fully tunable synthetic
benchmark whose memory behavior can be dictated by the traces
described above.
We show that applications and their Chameleon counterparts display
highly similar memory behavior as measured by simulated and observed
cache hit rates. Hit-rate errors are typically within 2%.
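The claim that one characterization can predict hit rates across cache sizes rests on a classic property of LRU stack distances: in a fully associative LRU cache that holds C blocks, a reference hits exactly when its stack distance is less than C. The sketch below uses that textbook analysis, not Chameleon's own signature format; the function names are hypothetical. A different block size is handled by recomputing distances on `address // block_size`, and set-associative caches are only approximated by this model.

```python
from typing import List, Optional


def stack_distances(addresses: List[int], block_size: int) -> List[Optional[int]]:
    """LRU stack distance of each reference at the given block size
    (None marks the first touch of a block). O(N * M) for clarity."""
    stack: List[int] = []  # block numbers; index 0 = most recently used
    out: List[Optional[int]] = []
    for addr in addresses:
        block = addr // block_size
        if block in stack:
            d = stack.index(block)  # number of distinct blocks used since last touch
            stack.pop(d)
            out.append(d)
        else:
            out.append(None)
        stack.insert(0, block)
    return out


def predicted_hit_rate(distances: List[Optional[int]], cache_blocks: int) -> float:
    """Hit rate of a fully associative LRU cache holding `cache_blocks` blocks."""
    if not distances:
        return 0.0
    hits = sum(1 for d in distances if d is not None and d < cache_blocks)
    return hits / len(distances)
```

Because one pass yields the distances for every capacity at once, a single trace at a given block size answers the hit-rate question for all cache sizes with that block size.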
@INPROCEEDINGS{weinberg08chameleon,
author = {J. Weinberg and A. Snavely},
title = {Chameleon: A framework for observing, understanding, and imitating memory behavior},
booktitle = {PARA08: Workshop on State-of-the-Art in Scientific and Parallel Computing},
year = {2008},
address = {Trondheim, Norway},
month = {May}
}
Symbiotic space-sharing is a scheduling technique that improves throughput on SMP systems by executing parallel applications in combinations and configurations that alleviate pressure on shared resources.
|
Publications
|
User-Guided Symbiotic Space-Sharing of Real Workloads
J. Weinberg, A. Snavely. In The 20th ACM International Conference on Supercomputing (ICS06), Cairns, Australia, June 28-July 1, 2006.
|
Symbiotic Space-Sharing on SDSC's DataStar System
J. Weinberg, A. Snavely. In The 12th Workshop on Job Scheduling Strategies for Parallel Processing, Saint-Malo, France, June 27, 2006 (LNCS 4376, pp.192-209, 2007).
|
When Jobs Play Nice: The Case For Symbiotic Space-Sharing
J. Weinberg, A. Snavely. In Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing (HPDC 15), Paris, France, June 19-23, 2006.
|
|
User-Guided Symbiotic Space-Sharing of Real Workloads
[PDF]
|
Symbiotic space-sharing is a technique that can improve system
throughput by executing parallel applications in combinations and
configurations that alleviate pressure on shared resources. We
have shown prototype schedulers that leverage such techniques to
improve throughput by 20% over conventional space-sharing
schedulers when resource bottlenecks are known. Such evaluations
have utilized benchmark workloads and proposed that schedulers be
informed of resource bottlenecks by users at job submission time;
in this work, we investigate the accuracy with which users can
actually identify resource bottlenecks in real applications and
the implications of these predictions for symbiotic space-sharing
of production workloads. Using a large HPC platform, a
representative application workload, and a sampling of expert
users, we show that user inputs are of value and that for our
chosen workload, user-guided symbiotic scheduling can improve
throughput over conventional space-sharing by 15-22%.
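As a rough illustration of how user-declared bottlenecks could steer co-scheduling, here is a hypothetical greedy policy: each job in queue order is paired on a shared node with the earliest later job whose declared bottleneck differs, so that two memory-bound or two I/O-bound jobs are not placed together. This is a sketch under simplifying assumptions (two jobs per node, one declared bottleneck per job, no job sizes or runtimes), not the scheduler evaluated in the paper; the `Job` type and the bottleneck labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    bottleneck: str  # hypothetical user-declared label, e.g. "memory", "io", "cpu"


def pair_jobs(queue: List[Job]) -> List[Tuple[Job, Optional[Job]]]:
    """Greedily pair each job with the earliest later job whose declared
    bottleneck differs; jobs with no compatible partner run alone."""
    pending = list(queue)
    pairs: List[Tuple[Job, Optional[Job]]] = []
    while pending:
        head = pending.pop(0)
        idx = next(
            (i for i, j in enumerate(pending) if j.bottleneck != head.bottleneck),
            None,
        )
        partner = pending.pop(idx) if idx is not None else None
        pairs.append((head, partner))
    return pairs
```

Greedy pairing keeps queue order mostly intact, which limits unfairness to early jobs, at the cost of sometimes missing a globally better pairing.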
@INPROCEEDINGS{weinberg06user-guided,
author = {J. Weinberg and A. Snavely},
title = {User-Guided Symbiotic Space-Sharing of Real Workloads},
booktitle = {The 20th {ACM} International Conference on Supercomputing (ICS'06)},
year = {2006},
address = {Cairns, Australia},
month = {June}
}
|
Symbiotic Space-Sharing on SDSC's DataStar System
[PDF]
|
Using a large HPC platform, we investigate the effectiveness of
"symbiotic space-sharing", a technique that improves system
throughput by executing parallel applications in combinations and
configurations that alleviate pressure on shared resources. We
demonstrate that relevant benchmarks commonly suffer a 10-60%
penalty in runtime efficiency due to memory resource bottlenecks
and up to several orders of magnitude for I/O. We show that this
penalty can often be mitigated, and sometimes virtually
eliminated, by symbiotic space-sharing techniques and deploy a
prototype scheduler that leverages these findings to improve
system throughput by 20%.
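The two percentages in this abstract measure different things: a per-job runtime penalty when sharing a node, and a workload-level throughput gain. One plausible way to express both is sketched below, assuming the penalty is the shared runtime relative to a dedicated run and throughput is jobs completed per unit time for a fixed workload. These definitions are assumptions, not necessarily the paper's exact metrics, and the numbers in the usage note are purely illustrative.

```python
def slowdown_penalty(dedicated_s: float, shared_s: float) -> float:
    """Fractional runtime penalty of a run that shares a node with other jobs:
    0.25 means the job ran 25% longer than it did with the node to itself."""
    return shared_s / dedicated_s - 1.0


def throughput_gain(baseline_makespan_s: float, new_makespan_s: float) -> float:
    """Relative throughput improvement for the same fixed workload. Throughput is
    taken as jobs / makespan, so the gain is baseline / new - 1."""
    return baseline_makespan_s / new_makespan_s - 1.0
```

For example, with illustrative numbers, a job that takes 160 s when sharing a node but 100 s alone has a penalty of 0.6 (60%), and a workload that finishes in 100 s instead of 120 s has a throughput gain of 0.2 (20%).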
@INPROCEEDINGS{weinberg06symbiotic,
author = {J. Weinberg and A. Snavely},
title = {Symbiotic Space-Sharing on SDSC's {DataStar} System},
booktitle = {The 12th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP '06)},
year = {2006},
address = {St. Malo, France},
month = {June}
}
|
When Jobs Play Nice: The Case For Symbiotic Space-Sharing
[PDF]
|
Using a large HPC platform, we investigate the effectiveness of
"symbiotic space-sharing", a technique that improves system
throughput by executing parallel applications in combinations and
configurations that alleviate pressure on shared resources. We
demonstrate that relevant benchmarks commonly suffer a 10-60%
penalty in runtime efficiency due to memory resource bottlenecks
and up to several orders of magnitude for I/O. We show that this
penalty can often be mitigated, and sometimes virtually
eliminated, by symbiotic space-sharing techniques and deploy a
prototype scheduler that leverages these findings to improve
system throughput by 20%.
@INPROCEEDINGS{weinberg06symbiosisHPDC,
author = {J. Weinberg and A. Snavely},
title = {When Jobs Play Nice: The Case For Symbiotic Space-Sharing},
booktitle = {Proceedings of the 15th {IEEE} {I}nternational {S}ymposium on {H}igh
{P}erformance {D}istributed {C}omputing ({HPDC}-15 '06)},
year = {2006},
address = {Paris, France},
month = {June}
}
|
2006
|
Job Scheduling on Parallel Systems
J. Weinberg. Ph.D. Research Examination, University of California, San Diego, June 2006.
|
|
2005
|
Quantifying Locality In The Memory Access Patterns of HPC Applications
J. Weinberg, M. O. McCracken, E. Strohmaier, A. Snavely. In Supercomputing 2005, Seattle, WA, November 12-16, 2005.
|
Datagridflows: Managing Long-Run Processes on Datagrids
A. Jagatheesan, J. Weinberg, et al. In VLDB Workshop on Data Management in Grids (DMG), Trondheim, Norway, September 2-3, 2005.
Lecture Notes in Computer Science 3836, Springer, 2005, ISBN 3-540-31212-9.
|
Quantifying Locality In The Memory Access Patterns of HPC Applications
J. Weinberg. Master's Thesis, University of California, San Diego, August 26, 2005.
|
|
2004
|
Gridflow Description, Query, and Execution at SCEC using the SDSC Matrix
J. Weinberg, A. Jagatheesan, A. Ding, M. Faerman, Y. Hu. Proceedings of the 13th IEEE International Symposium on High-Performance Distributed Computing (HPDC 13), Honolulu, Hawaii, June 4-6, 2004.
|
|
Job Scheduling on Parallel Systems
[PDF]
|
Parallel systems such as supercomputers are valuable resources
that are typically shared among a community of users. The
problem of job scheduling is to determine how that sharing should
be done in order to maximize the system's utility. This problem
has been extensively studied for well over a decade, yielding a
great breadth of knowledge and techniques. In this work, we survey
the ideas and approaches that have proven most influential to how
jobs are scheduled on today's large-scale parallel systems. With
this background in mind, we discuss how deployed scheduling
policies can be improved to meet existing requirements and how
trends in parallel processing are currently altering those
requirements.
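One well-known example of the kind of deployed policy such a survey covers is EASY backfilling: jobs start in first-come, first-served order, the blocked job at the head of the queue receives a reservation, and later jobs may jump ahead only if they cannot delay that reservation. The sketch below implements that standard rule for a single scheduling instant. The data layout (`QueuedJob`, `(procs, estimated_end_time)` tuples for running jobs) is an assumption for illustration, and it relies on user runtime estimates, as EASY does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QueuedJob:
    name: str
    procs: int          # processors requested
    est_runtime: float  # user-supplied runtime estimate (seconds)


def easy_backfill(total_procs: int,
                  running: List[Tuple[int, float]],
                  queue: List[QueuedJob],
                  now: float = 0.0) -> List[QueuedJob]:
    """Return the queued jobs to start at `now` under EASY backfilling.
    `running` lists (procs, estimated_end_time) for executing jobs.
    Assumes every job requests at most `total_procs` processors."""
    free = total_procs - sum(p for p, _ in running)
    running = list(running)
    waiting = list(queue)
    started: List[QueuedJob] = []

    # 1. FCFS: start jobs from the head of the queue while they fit.
    while waiting and waiting[0].procs <= free:
        job = waiting.pop(0)
        started.append(job)
        free -= job.procs
        running.append((job.procs, now + job.est_runtime))
    if not waiting:
        return started

    # 2. Reserve for the blocked head job: the "shadow time" is when enough
    #    processors will have been released; "extra" is what is left over then.
    head = waiting[0]
    avail, shadow = free, now
    for procs, end in sorted(running, key=lambda r: r[1]):
        if avail >= head.procs:
            break
        avail += procs
        shadow = end
    extra = avail - head.procs

    # 3. Backfill later jobs that fit now and cannot delay the reservation:
    #    they must finish by the shadow time or use only the extra processors.
    for job in waiting[1:]:
        if job.procs > free:
            continue
        finishes_in_time = now + job.est_runtime <= shadow
        if finishes_in_time or job.procs <= extra:
            started.append(job)
            free -= job.procs
            if not finishes_in_time:
                extra -= job.procs
    return started
```

Only the head job is protected by a reservation; later jobs can still be delayed by backfilled work, which is the trade-off EASY makes for higher utilization.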
|
Quantifying Locality In The Memory Access Patterns of HPC Applications
[PDF]
|
Several benchmarks for measuring the memory performance of HPC
systems along dimensions of spatial and temporal memory locality
have recently been proposed. However, little is understood about the
relationships of these benchmarks to real applications and to each
other. We propose a methodology for producing architecture-neutral
characterizations of the spatial and temporal locality exhibited by
the memory access patterns of applications. We demonstrate that the
results track intuitive notions of locality on several synthetic and
application benchmarks. We employ the methodology to analyze the
memory performance components of the HPC Challenge Benchmarks, the
Apex-MAP benchmark, and their relationships to each other and other
benchmarks and applications.
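To show what "architecture-neutral" locality characterization means in practice, the sketch below scores a trace using only its addresses, with no cache parameters. These scoring rules are hypothetical stand-ins chosen for clarity, not the metrics defined in the paper: the spatial score rewards references whose nearest neighbour among the last `window` references is a small stride away (weighted by 1/stride in words), and the temporal score is the fraction of references whose word was touched within the last `window` references. The word size, window, and stride limit are assumptions.

```python
from typing import List


def spatial_score(addresses: List[int], word: int = 8, window: int = 32,
                  max_stride: int = 8) -> float:
    """Hypothetical spatial-locality score in [0, 1]: a reference whose nearest
    recent neighbour lies s words away (1 <= s <= max_stride) contributes 1/s."""
    if not addresses:
        return 0.0
    total = 0.0
    for i, addr in enumerate(addresses):
        recent = addresses[max(0, i - window):i]
        strides = [abs(addr - r) // word for r in recent if r != addr]
        if strides:
            s = min(strides)
            if 1 <= s <= max_stride:
                total += 1.0 / s
    return total / len(addresses)


def temporal_score(addresses: List[int], word: int = 8, window: int = 32) -> float:
    """Hypothetical temporal-locality score in [0, 1]: fraction of references
    whose word also appears among the previous `window` references."""
    if not addresses:
        return 0.0
    hits = 0
    for i, addr in enumerate(addresses):
        w = addr // word
        if any(r // word == w for r in addresses[max(0, i - window):i]):
            hits += 1
    return hits / len(addresses)
```

As intuition would suggest, a unit-stride stream scores high on the spatial axis and zero on the temporal axis, while repeatedly touching one word scores the opposite.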
@inproceedings{weinberg05quantifying,
author = {Jonathan Weinberg and Michael O. McCracken and Erich Strohmaier and Allan Snavely},
title = {Quantifying Locality In The Memory Access Patterns of HPC Applications},
booktitle = {SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing},
year = {2005},
isbn = {1-59593-061-2},
doi = {10.1109/SC.2005.59},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA}}
|
Datagridflows: Managing Long-Run Processes on Datagrids
[PDF]
|
Datagrids have become important for managing large, unstructured data and
storage resources distributed over autonomous administrative domains. The
datagrids that are operating in production give us an idea of the new
requirements and challenges that will be faced in future datagrid
environments. One such requirement is the coordinated execution of
long-run data management processes in datagrids. This paper introduces
the challenges of datagrid environments to other researchers, including
those new to grid computing. We provide motivation through a discussion of
datagridflow requirements and real production scenarios. We introduce
current work on datagridflow technologies, including the Datagrid Language
(DGL) for describing datagridflows in datagrids.
@inproceedings{Jagatheesan05Datagridflows,
author = {Arun Jagatheesan and Jonathan Weinberg and Reena Mathew
and Allen Ding and Erik Vandekieft and Daniel Moore and Reagan W. Moore
and Lucas Gilbert and Mark Tran and Jeffrey Kuramoto},
booktitle = {DMG},
ee = {http://dx.doi.org/10.1007/11611950_10},
key = {conf/dmg/JagatheesanWMDVMMGTK05},
pages = {113--128},
title = {Datagridflows: Managing Long-Run Processes on Datagrids},
year = {2005}}
|
Quantifying Locality In The Memory Access Patterns of HPC Applications
[PDF]
|
Several benchmarks for measuring the memory performance of HPC
systems along dimensions of spatial and temporal memory locality
have recently been proposed. However, little is understood about the
relationships of these benchmarks to real applications and to each
other. We propose a methodology for producing architecture-neutral
characterizations of the spatial and temporal locality exhibited by
the memory access patterns of applications. We demonstrate that our
results track intuitive notions of locality on several synthetic and
application benchmarks. We employ the methodology to analyze the
memory performance components of the HPC Challenge Benchmarks, the
Apex-MAP benchmark, and their relationships to each other and other
benchmarks and applications. We show that our methodology can be
applied to scoring real large-scale parallel applications and that
this analysis can be used to both increase understanding of the
benchmarks and enhance their usefulness by mapping them, along with
applications, to a 2-D space along axes of spatial and temporal
locality.
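Once benchmarks and applications each have a (spatial, temporal) score, mapping them to a 2-D space makes it natural to ask which benchmark an application most resembles. A minimal sketch, assuming plain Euclidean distance with both axes weighted equally (an assumption, not necessarily how the thesis compares points); the benchmark names and coordinates in the usage note are placeholders, not measured scores.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (spatial score, temporal score)


def nearest_benchmark(app: Point, benchmarks: Dict[str, Point]) -> str:
    """Return the name of the benchmark closest to `app` in the 2-D
    locality plane, using Euclidean distance."""
    return min(benchmarks, key=lambda name: math.dist(app, benchmarks[name]))
```

For example, with placeholder points `{"bench_a": (0.9, 0.1), "bench_b": (0.1, 0.1)}`, an application at `(0.8, 0.2)` maps to `"bench_a"`. Weighting one axis more heavily would be a simple extension if one kind of locality matters more for the machines of interest.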
|
Gridflow Description, Query, and Execution at SCEC using the SDSC Matrix
[PDF]
|
While conventional workflow systems have been
around for many years, the deployment of analogous
systems onto a grid infrastructure introduces a number
of unique questions and challenges. Innovative
approaches to grid workflow (gridflow) are needed to
leverage the heterogeneity, autonomy, dynamic
behavior, and wide-area distribution that characterize
grid resources. The Matrix Project carries out
research and development to deliver the language
descriptions and protocols necessary to build
collaborative gridflow management systems for the
emerging grid infrastructures. We describe here our
activities to date including development of the Data
Grid Language (DGL) and the usage of the Matrix
gridflow management system by the Southern
California Earthquake Center (SCEC) to manage its
gridflows.
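At its core, a gridflow manager has to run each step only after the steps it depends on have finished. The sketch below shows just that dependency-ordered execution using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It does not model DGL syntax, the Matrix protocols, or grid concerns such as distribution and failure handling; the step names in the test and the `run` callback are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_gridflow(deps: Dict[str, Iterable[str]],
                 run: Callable[[str], None]) -> List[str]:
    """Run every step after all of its prerequisite steps, sequentially.
    `deps` maps each step to the steps it depends on; returns the run order."""
    order: List[str] = []
    for step in TopologicalSorter(deps).static_order():
        run(step)  # hypothetical hook that would dispatch the step to a resource
        order.append(step)
    return order
```

A production system would instead use `TopologicalSorter.get_ready()` and `done()` to dispatch independent steps in parallel as their prerequisites complete.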