Parallelizing LZW Compression

Proposal

The purpose of this project is to parallelize compression algorithms and to investigate the factors that contribute to successful parallel implementations. Through modifying the parameters to an LZW-like implementation, we will determine how to perform compression and decompression in a parallel environment. In particular, we will look at different methods of building and sharing the encoding tree, at adjustments to the algorithms to provide backwards compatibility with the current implementations, and at ways to optimize communications between processors working on compressing or decompressing data.

This project will be successful if can implement a version of the gzip algorithm which achieves speedup over a serial version. It should also be the case that we are able to determine the best way to share common data structures, and that we can illustrate this with data collected from test runs. Alternatively, if there is no such parallel implementation, we should be able to convincingly argue why such an algorithm exists or is computationally infeasible.

The proposal can be downloaded as a PDF: project-proposal.pdf

Milestone 1 has been reached. See the updated design and status report as a PDF: milestone1.pdf

Milestone 2 has been reached. See the updated design and status report as a PDF: milestone2.pdf

The project paper has been finished. Download it as a PDF: final-paper.pdf

Resources

The gzip home page: http://www.gzip.org/

The gzip sources: gzip-1.2.4.tar

The ZLIB Specification, RFC #1950: RFC #1950

The DEFLATE Specification, RFC #1951: RFC #1951

The GZIP Specification, RFC #1952: RFC #1952

The GZIP file format: http://www.gzip.org/format.txt

The GZIP algorithm: http://www.gzip.org/deflate.html

The comp.compression newsgroup FAQ: comp.compression FAQ

The comp.compression newsgroup via Google: comp.compression

The SGI Origin Hardware information: NCSA SGI Origin2000

P. Franaszek, J. Robinson, J.Thomas. Parallel compression with cooperative dictionary construction. Data Compression Conference (DCC '96).: PDF

J Lee, M. Winslett, X. Ma, and S.Yu. Enhancing Data Migration Performance via Parallel Data Compression. Proceedings of the International Parallel and Distributed Processing Symposium, 2002.: Postscript

Lynn M. Stauffer and Daniel S. Hirschberg. Parallel Text Compression. GZipped Postscript

Announcements

Thursday, October 24, 2002

The project proposal has been posted.

Tuesday, October 29, 2002

The resources section has been updated.

Wednesday, October 30, 2002

Expanded resources section to include the comp.compression and Origin2000 information.

Tuesday, November 5, 2002

Milestone 1 has been posted. It is available as a PDF: milestone1.pdf

Monday, November 18, 2002

Milestone 2 has been posted. It is available as a PDF: milestone2.pdf

Friday, December 13, 2002

The final paper has been posted. It is available as a PDF: final-paper.pdf

Christopher L Tuttle
Last modified: Fri Dec 13 15:22:34 EST 2002