HPCA Test of Time Award 2020: Eligible Papers

HPCA 1998
HPCA 1999
HPCA 2000
HPCA 2001
HPCA 2002


HPCA 1998

Article Title Supporting highly speculative execution via adaptive branch trees
Authors Tien-Fu Chen
Article Title Virtual-physical registers
Authors A. Gonzalez; J. Gonzalez; M. Valero
Article Title Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors
Authors Ye Zhang; L. Rauchwerger; J. Torrellas
Article Title Enhancing memory use in Simple COMA: Multiplexed Simple COMA
Authors S. Basu; J. Torrellas
Article Title PRISM: an integrated architecture for scalable shared memory
Authors K. Ekanadham; Beng-Hong Lim; P. Pattnaik; M. Snir
Article Title Architectural implications of a family of irregular applications
Authors D. O’Hallaron; J. R. Shewchuk; T. Gross
Article Title The emergence of workstation clusters: Should we continue to build MPPs? [panel session]
Authors D. K. Panda
Article Title Challenging applications on fast networks
Authors K. Langendoen; R. Hofman; H. Bal
Article Title Fine-grain software distributed shared memory on SMP clusters
Authors D. J. Scales; K. Gharachorloo; A. Aggarwal
Article Title A very efficient distributed deadlock detection mechanism for wormhole networks
Authors P. Lopez; J. M. Martinez; J. Duato
Article Title Home-based SVM protocols for SMP clusters: Design and performance
Authors R. Samanta; A. Bilas; L. Iftode; J. P. Singh
Article Title Credit-flow-controlled ATM for MP interconnection: The ATLAS I single-chip ATM switch
Authors M. Katevenis; D. Serpanos; E. Spyridakis
Article Title The impact of data transfer and buffering alternatives on network interface design
Authors S. S. Mukherjee; M. D. Hill
Article Title The effectiveness of SRAM network caches in clustered DSMs
Authors A. Moga; M. Dubois
Article Title Control speculation in multithreaded processors through dynamic loop detection
Authors J. Tubella; A. Gonzalez
Article Title The sensitivity of communication mechanisms to bandwidth and latency
Authors F. T. Chong; R. Barua; F. Dahlgren; J. D. Kubiatowicz; A. Agarwal
Article Title Partial sampling with reverse state reconstruction: A new technique for branch predictor performance estimation
Authors D. E. Vengroff; G. R. Gao
Article Title Using multicast and multithreading to reduce communication in software DSM systems
Authors E. Speight; J. K. Bennett
Article Title Speculative versioning cache
Authors S. Gopal; T. N. Vijaykumar; J. E. Smith; G. S. Sohi
Article Title The architectural costs of streaming I/O: A comparison of workstations, clusters, and SMPs
Authors R. H. Arpaci-Dusseau; A. C. Arpaci-Dusseau; D. E. Culler; J. M. Hellerstein; D. A. Patterson
Article Title Address translation mechanisms in network interfaces
Authors I. Schoinas; M. D. Hill
Article Title The potential for using thread-level data speculation to facilitate automatic parallelization
Authors J. G. Steffan; T. C. Mowry
Article Title Performance study of a concurrent multithreaded processor
Authors Jenn-Yuan Tsai; Zhenzhen Jiang; E. Ness; Pen-Chung Yew
Article Title FPGA based custom computing machines for irregular problems
Authors D. Abramson; P. Logothetis; A. Postula; M. Randall
Article Title Exploiting two-case delivery for fast protected messaging
Authors K. Mackenzie; J. Kubiatowicz; M. Frank; W. Lee; V. Lee; A. Agarwal; M. F. Kaashoek
Article Title Non-stalling counterflow architecture
Authors M. F. Miller; K. J. Janik; Shih-Lien Lu
Article Title Temporal-based procedure reordering for improved instruction cache performance
Authors J. Kalamatianos; D. R. Kaeli
Article Title Performance evaluation of tiling for the register level
Authors M. Jimenez; J. M. Llaberia; A. Fernandez
Article Title Treegion scheduling for wide issue processors
Authors W. A. Havanki; S. Banerjia; T. M. Conte
Article Title Communication across fault-containment firewalls on the SGI Origin
Authors K. Ghosh; A. J. Christie
Article Title Efficiently adapting to sharing patterns in software DSMs
Authors L. R. Monnerat; R. Bianchini
Article Title Comparative evaluation of latency tolerance techniques for software distributed shared memory
Authors T. C. Mowry; C. Q. C. Chan; A. K. W. Lo

HPCA 1999

Article Title A study of control independence in superscalar processors
Authors E. Rotenberg; Q. Jacobson; J. Smith
Article Title Impulse: building a smarter memory controller
Authors J. Carter; W. Hsieh; L. Stoller; M. Swanson; Lixin Zhang; E. Brunvand; A. Davis; Chen-Chi Kuo; R. Kuramkote; M. Parker; L. Schaelicke; T. Tateyama
Article Title Sensitivity of parallel applications to large differences in bandwidth and latency in two-layer interconnects
Authors A. Plaat; H. E. Bal; R. F. H. Hofman
Article Title Improving CC-NUMA performance using Instruction-based Prediction
Authors S. Kaxiras; J. R. Goodman
Article Title Distributed modulo scheduling
Authors M. M. Fernandes; J. Llosa; N. Topham
Article Title A performance comparison of homeless and home-based lazy release consistency protocols in software shared memory
Authors A. L. Cox; E. de Lara; C. Hu; W. Zwaenepoel
Article Title The synergy of multithreading and access/execute decoupling
Authors J.-M. Parcerisa; A. Gonzalez
Article Title Supporting fine-grained synchronization on a simultaneous multithreading processor
Authors D. M. Tullsen; J. L. Lo; S. J. Eggers; H. M. Levy
Article Title LAPSES: a recipe for high performance adaptive router design
Authors A. S. Vaidya; A. Sivasubramaniam; C. R. Das
Article Title Using Lamport clocks to reason about relaxed memory models
Authors A. E. Condon; M. D. Hill; M. Plakal; D. J. Sorin
Article Title Memory hierarchy considerations for fast transpose and bit-reversals
Authors K. S. Gatlin; L. Carter
Article Title The impact of link arbitration on switch performance
Authors M. Pirvu; L. Bhuyan; N. Ni
Article Title Limits to the performance of software shared memory: a layered approach
Authors A. Bilas; Dongming Jiang; Yuanyuan Zhou; J. P. Singh
Article Title Dynamically variable line-size cache exploiting high on-chip memory bandwidth of merged DRAM/logic LSIs
Authors K. Inoue; K. Kai; K. Murakami
Article Title Impact of buffer size on the efficiency of deadlock detection
Authors J. M. Martinez; P. Lopez; J. Duato
Article Title Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors
Authors R. Iyer; L. N. Bhuyan
Article Title Exploiting basic block value locality with block reuse
Authors Jian Huang; D. J. Lilja
Article Title Instruction pre-processing in trace processors
Authors Q. Jacobson; J. E. Smith
Article Title Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading
Authors S. Hily; A. Seznec
Article Title Comparative evaluation of fine- and coarse-grain approaches for software distributed shared memory
Authors S. Dwarkadas; K. Gharachorloo; L. Kontothanassis; D. J. Scales; M. L. Scott; R. Stets
Article Title WildFire: a scalable path for SMPs
Authors E. Hagersten; M. Koster
Article Title MP-LOCKs: replacing H/W synchronization primitives with message passing
Authors Chen-Chi Kuo; J. Carter; R. Kuramkote
Article Title Dynamically exploiting narrow width operands to improve processor power and performance
Authors D. Brooks; M. Martonosi
Article Title Access order and effective bandwidth for streams on a Direct Rambus memory
Authors S. I. Hong; S. A. McKee; M. H. Salinas; R. H. Klenke; J. H. Aylor; W. A. Wulf
Article Title RAPID-Cache: a reliable and inexpensive write cache for disk I/O systems
Authors Yiming Hu; Qing Yang; T. Nightingale
Article Title Improving the accuracy vs. speed tradeoff for simulating shared-memory multiprocessors with ILP processors
Authors M. Durbhakula; V. S. Pai; S. Adve
Article Title A scalable cache coherent scheme exploiting wormhole routing networks
Authors Yunseok Rhee; Joonwon Lee
Article Title Global context-based value prediction
Authors T. Nakra; R. Gupta; M. L. Soffa
Article Title Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols
Authors B. Falsafi; D. A. Wood
Article Title Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors
Authors Ye Zhang; L. Rauchwerger; J. Torrellas
Article Title Efficient all-to-all broadcast in all-port mesh and torus networks
Authors Yuanyuan Yang; Jianchao Wang
Article Title Instruction recycling on a multiple-path processor
Authors S. Wallace; D. M. Tullsen; B. Calder
Article Title Permutation development data layout (PDDL)
Authors T. J. E. Schwarz; J. Steinberg; W. A. Burkhard
Article Title Design and performance of directory caches for scalable shared memory multiprocessors
Authors M. M. Michael; A. K. Nanda
Article Title MMR: a high-performance MultiMedia Router - architecture and design trade-offs
Authors J. Duato; S. Yalamanchili; M. B. Caminero; D. Love; F. J. Quiles
Article Title Lightweight hardware distributed shared memory supported by generalized combining
Authors K. Tanaka; T. Matsumoto; K. Hiraki
Article Title Communication studies of single-threaded and multithreaded distributed-memory machines
Authors A. Sohn; Yunheung Paek; Jui-Yuan Ku; Y. Kodama; Y. Yamaguchi

HPCA 2000

Article Title A prefetching technique for irregular accesses to linked data structures
Authors M. Karlsson; F. Dahlgren; P. Stenstrom
Article Title Cache-efficient matrix transposition
Authors S. Chatterjee; S. Sen
Article Title On the performance of hand vs. automatically optimized numerical codes
Authors M. Jimenez; J. M. Llaberia; A. Fernandez
Article Title Evaluation of active disks for decision support databases
Authors M. Uysal; A. Acharya; J. Saltz
Article Title Improving the throughput of synchronization by insertion of delays
Authors R. Rajwar; A. Kagi; J. R. Goodman
Article Title The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing
Authors R. Stets; S. Dwarkadas; L. Kontothanassis; U. Rencuzogullari; M. L. Scott
Article Title Coherence communication prediction in shared-memory multiprocessors
Authors S. Kaxiras; C. Young
Article Title Register organization for media processing
Authors S. Rixner; W. J. Dally; B. Khailany; P. Mattson; U. J. Kapasi; J. D. Owens
Article Title Cache memory design for network processors
Authors Tzi-Cker Chiueh; P. Pradhan
Article Title Flit-reservation flow control
Authors Li-Shiuan Peh; W. J. Dally
Article Title Combining static and dynamic branch prediction to reduce destructive aliasing
Authors H. Patil; J. Emer
Article Title Trace cache redundancy: red and blue traces
Authors A. Ramirez; J. Ll. Larriba-Pey; M. Valero
Article Title Design of a parallel vector access unit for SDRAM memory systems
Authors B. K. Mathew; S. A. McKee; J. B. Carter; A. Davis
Article Title High-throughput coherence controllers
Authors A. K. Nanda; A.-T. Nguyen; M. M. Michael; D. J. Joseph
Article Title Reducing code size with run-time decompression
Authors C. Lefurgy; E. Piccininni; T. Mudge
Article Title Investigating the performance of two programming models for clusters of SMP PCs
Authors F. Cappello; O. Richard; D. Etiemble
Article Title PowerMANNA: a parallel architecture based on the PowerPC MPC620
Authors P. M. Behr; S. Pletner; A. C. Sodan
Article Title Architectural issues in Java runtime systems
Authors R. Radhakrishnan; N. Vijaykrishnan; L. K. John; A. Sivasubramaniam
Article Title Performance evaluation of dynamic reconfiguration in high-speed local area networks
Authors R. Casado; A. Bermudez; F. J. Quiles; J. L. Sanchez; J. Duato
Article Title Modified LRU policies for improving second-level cache behavior
Authors W. A. Wong; J.-L. Baer
Article Title Decoupled value prediction on trace processors
Authors Sang-Jeong Lee; Yuan Wang; Pen-Chung Yew
Article Title Performance analysis and visualization of parallel systems using SimOS and Rivet: a case study
Authors R. Bosch; C. Stolte; G. Stoll; M. Rosenblum; P. Hanrahan
Article Title A DSM architecture for a parallel computer Cenju-4
Authors T. Hosomi; Y. Kanoh; M. Nakamura; T. Hirose
Article Title The best distribution for a parallel OpenGL 3D engine with texture caches
Authors A. Vartanian; J.-L. Bechennec; N. Drach-Temam
Article Title Investigating QoS support for traffic mixes with the MediaWorm router
Authors Ki Hwan Yum; A. Vaidya; C. R. Das; A. Sivasubramaniam
Article Title eXtended block cache
Authors S. Jourdan; L. Rappoport; Y. Almog; M. Erez; A. Yoaz; R. Ronen
Article Title Branch transition rate: a new metric for improved branch classification analysis
Authors M. Haungs; P. Sallee; M. Farrens
Article Title Impact of chip-level integration on performance of OLTP workloads
Authors L. A. Barroso; K. Gharachorloo; A. Nowatzyk; B. Verghese
Article Title Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors
Authors A. Moshovos; G. S. Sohi
Article Title Quantifying the SMT layout overhead: does SMT pull its weight?
Authors J. Burns; J.-L. Gaudiot
Article Title Toward a cost-effective DSM organization that exploits processor-memory integration
Authors J. Torrellas; Liuxi Yang; A.-T. Nguyen
Article Title A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks
Authors H. Neefs; H. Vandierendonck; K. De Bosschere
Article Title Software-controlled multithreading using informing memory operations
Authors T. C. Mowry; S. R. Ramkissoon
Article Title Impact of heterogeneity on DSM performance
Authors R. J. O. Figueiredo; J. A. B. Fortes
Article Title Dynamic cluster assignment mechanisms
Authors R. Canal; J. M. Parcerisa; A. Gonzalez

HPCA 2001

Article Title Stack Value File: Custom Microarchitecture for the Stack
Authors H.-H. S. Lee; M. Smelyanskiy; C. J. Newburn; G. S. Tyson
Article Title Register Renaming and Scheduling for Dynamic Execution of Predicated Code
Authors P. H. Wang; H. Wang; R. M. Kling; K. Ramakrishnan; J. P. Shen
Article Title Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
Authors P. Michaud; A. Seznec
Article Title Speculative Data-Driven Multithreading
Authors A. Roth; G. S. Sohi
Article Title Towards Virtually-Addressed Memory Hierarchies
Authors X. Qiu; M. Dubois
Article Title Reevaluating Online Superpage Promotion with Hardware Support
Authors Z. Fang; L. Zhang; J. B. Carter; W. C. Hsieh; S. A. McKee
Article Title Performance of Hardware Compressed Main Memory
Authors B. Abali; H. Franke; X. Shen; D. E. Poff; T. B. Smith
Article Title JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
Authors A. Moshovos; G. Memik; B. Falsafi; A. Choudhary
Article Title A New Scalable Directory Architecture for Large-scale Multiprocessors
Authors M. E. Acacio; J. Gonzalez; J. M. Garcia; J. Duato
Article Title Self-Tuned Congestion Control for Multiprocessor Networks
Authors M. Thottethodi; A. R. Lebeck; S. S. Mukherjee
Article Title Automatically Mapping Code on an Intelligent Memory Architecture
Authors J. Lee; Y. Solihin; J. Torrellas
Article Title An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches
Authors S.-H. Yang; M. D. Powell; B. Falsafi; K. Roy; T. N. Vijaykumar
Article Title DRAM Energy Management Using Software and Hardware Directed Power Mode Control
Authors V. Delaluz; M. Kandemir; N. Vijaykrishnan; A. Sivasubramaniam; M. J. Irwin
Article Title Dynamic Thermal Management for High-Performance Microprocessors
Authors D. Brooks; M. Martonosi
Article Title Dynamic Prediction of Critical Path Instructions
Authors E. Tune; D. Liang; D. M. Tullsen; B. Calder
Article Title Dynamic Branch Prediction with Perceptrons
Authors D. A. Jimenez; C. Lin
Article Title Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency
Authors B. Goeman; H. Vandierendonck; K. De Bosschere
Article Title DLP+TLP Processors for the Next Generation of Media Workloads
Authors J. Corbal; R. Espasa; M. Valero
Article Title An Architectural Evaluation of Java TPC-W
Authors H. W. Cain; R. Rajwar; M. Marden; M. H. Lipasti
Article Title A Programmable Co-Processor for Profiling
Authors C. B. Zilles; G. S. Sohi
Article Title A Delay Model and Speculative Architecture for Pipelined Routers
Authors L. Peh; W. J. Dally
Article Title Quantifying the Impact of Architectural Scaling on Communication
Authors T. Heath; S. Kaw; R. P. Martin; T. D. Nguyen
Article Title Call Graph Prefetching for Database Applications
Authors M. Annavaram; J. M. Patel; E. S. Davidson
Article Title Branch History Guided Instruction Prefetching
Authors V. Srinivasan; E. S. Davidson; G. S. Tyson; M. J. Charney; T. R. Puzak
Article Title Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
Authors W. Lin; S. K. Reinhardt; D. Burger

HPCA 2002

Article Title Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management
Authors Kevin Skadron; Tarek Abdelzaher; Mircea R. Stan
Article Title Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling
Authors Greg Semeraro; Grigorios Magklis; Rajeev Balasubramonian; David H. Albonesi; Sandhya Dwarkadas; Michael L. Scott
Article Title A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
Authors G. Edward Suh; Srinivas Devadas; Larry Rudolph
Article Title Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach
Authors Sudhanva Gurumurthi; Anand Sivasubramaniam; Mary Jane Irwin; N. Vijaykrishnan; Mahmut Kandemir; Tao Li; Lizy Kurian John
Article Title Loose Loops Sink Chips
Authors Eric Borch; Srilatha Manne; Joel Emer; Eric Tune
Article Title Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay
Authors Se-Hyun Yang; Babak Falsafi; Michael D. Powell; T. N. Vijaykumar
Article Title Power Issues Related to Branch Prediction
Authors Dharmesh Parikh; Kevin Skadron; Yan Zhang; Marco Barcella; Mircea R. Stan
Article Title Improving Value Communication for Thread-Level Speculation
Authors J. Gregory Steffan; Christopher B. Colohan; Antonia Zhai; Todd C. Mowry
Article Title Thread-Spawning Schemes for Speculative Multithreading
Authors Pedro Marcuello; Antonio González
Article Title Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors
Authors Marcelo Cintra; Josep Torrellas
Article Title Microarchitectural Simulation and Control of di/dt-induced Power Supply Voltage Variation
Authors Ed Grochowski; Dave Ayers; Vivek Tiwari
Article Title Let’s Study Whole-Program Cache Behaviour Analytically
Authors Xavier Vera; Jingling Xue
Article Title Bandwidth Adaptive Snooping
Authors Milo M. K. Martin; Daniel J. Sorin; Mark D. Hill; David A. Wood
Article Title Tuning Garbage Collection in an Embedded Java Environment
Authors G. Chen; R. Shetty; M. Kandemir; N. Vijaykrishnan; M. J. Irwin; M. Wolczko
Article Title Evaluation of a Multithreaded Architecture for Cellular Computing
Authors Calin Cascaval; Jose G. Castanos; Luis Ceze; Monty Denneau; Manish Gupta; Derek Lieber; Jose E. Moreira; Karin Strauss; Henry S. Warren Jr
Article Title Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs. Speculative Precomputation
Authors Perry H. Wang; Hong Wang; Jamison D. Collins; Ed Grochowski; Ralph M. Kling; John P. Shen
Article Title User-Level Communication in Cluster-Based Servers
Authors Enrique V. Carrera; Srinath Rao; Liviu Iftode; Ricardo Bianchini
Article Title The Minimax Cache: An Energy-Efficient Framework for Media Processors
Authors Osman S. Unsal; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
Article Title Fine-grain Priority Scheduling on Multi-channel Memory Systems
Authors Zhichun Zhu; Zhao Zhang; Xiaodong Zhang
Article Title Non-vital Loads
Authors Ryan Rakvic; Bryan Black; Deepak Limaye; John P. Shen
Article Title Modeling Value Speculation
Authors Yiannakis Sazeides
Article Title Quantifying Load Stream Behavior
Authors Suleyman Sair; Timothy Sherwood; Brad Calder
Article Title Using Internal Redundant Representations and Limited Bypass to Support Pipelined Adders and Register Files
Authors Mary D. Brown; Yale N. Patt
Article Title The FAB Predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches
Authors Martin Kampe; Per Stenstrom; Michel Dubois
Article Title Reverse Tracer: A Software Tool for Generating Realistic Performance Test Programs
Authors Larry Brisson; Mariko Sakamoto; Akira Katsuno; Aiichiro Inoue; Yasunori Kimura
Article Title CableS: Thread Control and Memory Management Extensions for Shared Virtual Memory Clusters
Authors Peter Jamieson; Angelos Bilas
Article Title CARS: A New Code Generation Framework for Clustered ILP Processors
Authors K. Kailas; K. Ebcioglu; A. Agrawala