HPCA 2000
HPCA 2001
HPCA 2002
HPCA 2003
HPCA 2004


HPCA 2000

Article Title A prefetching technique for irregular accesses to linked data structures
Authors M. Karlsson; F. Dahlgren; P. Stenstrom
Article Title Cache-efficient matrix transposition
Authors S. Chatterjee; S. Sen
Article Title On the performance of hand vs. automatically optimized numerical codes
Authors M. Jimenez; J. M. Llaberia; A. Fernandez
Article Title Evaluation of active disks for decision support databases
Authors M. Uysal; A. Acharya; J. Saltz
Article Title Improving the throughput of synchronization by insertion of delays
Authors R. Rajwar; A. Kagi; J. R. Goodman
Article Title The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing
Authors R. Stets; S. Dwarkadas; L. Kontothanassis; U. Rencuzogullari; M. L. Scott
Article Title Coherence communication prediction in shared-memory multiprocessors
Authors S. Kaxiras; C. Young
Article Title Register organization for media processing
Authors S. Rixner; W. J. Dally; B. Khailany; P. Mattson; U. J. Kapasi; J. D. Owens
Article Title Cache memory design for network processors
Authors Tzi-Cker Chiueh; P. Pradhan
Article Title Flit-reservation flow control
Authors Li-Shiuan Peh; W. J. Dally
Article Title Combining static and dynamic branch prediction to reduce destructive aliasing
Authors H. Patil; J. Emer
Article Title Trace cache redundancy: red and blue traces
Authors A. Ramirez; J. Ll. Larriba-Pey; M. Valero
Article Title Design of a parallel vector access unit for SDRAM memory systems
Authors B. K. Mathew; S. A. McKee; J. B. Carter; A. Davis
Article Title High-throughput coherence controllers
Authors A. K. Nanda; A.-T. Nguyen; M. M. Michael; D. J. Joseph
Article Title Reducing code size with run-time decompression
Authors C. Lefurgy; E. Piccininni; T. Mudge
Article Title Investigating the performance of two programming models for clusters of SMP PCs
Authors F. Cappello; O. Richard; D. Etiemble
Article Title PowerMANNA: a parallel architecture based on the PowerPC MPC620
Authors P. M. Behr; S. Pletner; A. C. Sodan
Article Title Architectural issues in Java runtime systems
Authors R. Radhakrishnan; N. Vijaykrishnan; L. K. John; A. Sivasubramaniam
Article Title Performance evaluation of dynamic reconfiguration in high-speed local area networks
Authors R. Casado; A. Bermudez; F. J. Quiles; J. L. Sanchez; J. Duato
Article Title Modified LRU policies for improving second-level cache behavior
Authors W. A. Wong; J.-L. Baer
Article Title Decoupled value prediction on trace processors
Authors Sang-Jeong Lee; Yuan Wang; Pen-Chung Yew
Article Title Performance analysis and visualization of parallel systems using SimOS and Rivet: a case study
Authors R. Bosch; C. Stolte; G. Stoll; M. Rosenblum; P. Hanrahan
Article Title A DSM architecture for a parallel computer Cenju-4
Authors T. Hosomi; Y. Kanoh; M. Nakamura; T. Hirose
Article Title The best distribution for a parallel OpenGL 3D engine with texture caches
Authors A. Vartanian; J.-L. Bechennec; N. Drach-Temam
Article Title Investigating QoS support for traffic mixes with the MediaWorm router
Authors Ki Hwan Yum; A. Vaidya; C. R. Das; A. Sivasubramaniam
Article Title eXtended block cache
Authors S. Jourdan; L. Rappoport; Y. Almog; M. Erez; A. Yoaz; R. Ronen
Article Title Branch transition rate: a new metric for improved branch classification analysis
Authors M. Haungs; P. Sallee; M. Farrens
Article Title Impact of chip-level integration on performance of OLTP workloads
Authors L. A. Barroso; K. Gharachorloo; A. Nowatzyk; B. Verghese
Article Title Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors
Authors A. Moshovos; G. S. Sohi
Article Title Quantifying the SMT layout overhead – does SMT pull its weight?
Authors J. Burns; J.-L. Gaudiot
Article Title Toward a cost-effective DSM organization that exploits processor-memory integration
Authors J. Torrellas; Liuxi Yang; A.-T. Nguyen
Article Title A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks
Authors H. Neefs; H. Vandierendonck; K. De Bosschere
Article Title Software-controlled multithreading using informing memory operations
Authors T. C. Mowry; S. R. Ramkissoon
Article Title Impact of heterogeneity on DSM performance
Authors R. J. O. Figueiredo; J. A. B. Fortes
Article Title Dynamic cluster assignment mechanisms
Authors R. Canal; J. M. Parcerisa; A. Gonzalez
Article Title Power Issues Related to Branch Prediction
Authors Dharmesh Parikh; Kevin Skadron; Yan Zhang; Marco Barcella; Mircea R. Stan

HPCA 2001

Article Title Stack Value File: Custom Microarchitecture for the Stack
Authors H.-H. S. Lee; M. Smelyanskiy; C. J. Newburn; G. S. Tyson
Article Title Register Renaming and Scheduling for Dynamic Execution of Predicated Code
Authors P. H. Wang; H. Wang; R. M. Kling; K. Ramakrishnan; J. P. Shen
Article Title Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
Authors P. Michaud; A. Seznec
Article Title Speculative Data-Driven Multithreading
Authors A. Roth; G. S. Sohi
Article Title Towards Virtually-Addressed Memory Hierarchies
Authors X. Qiu; M. Dubois
Article Title Reevaluating Online Superpage Promotion with Hardware Support
Authors Z. Fang; L. Zhang; J. B. Carter; W. C. Hsieh; S. A. McKee
Article Title Performance of Hardware Compressed Main Memory
Authors B. Abali; H. Franke; X. Shen; D. E. Poff; T. B. Smith
Article Title JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
Authors A. Moshovos; G. Memik; B. Falsafi; A. Choudhary
Article Title A New Scalable Directory Architecture for Large-scale Multiprocessors
Authors M. E. Acacio; J. Gonzalez; J. M. Garcia; J. Duato
Article Title Self-Tuned Congestion Control for Multiprocessor Networks
Authors M. Thottethodi; A. R. Lebeck; S. S. Mukherjee
Article Title Automatically Mapping Code on an Intelligent Memory Architecture
Authors J. Lee; Y. Solihin; J. Torrellas
Article Title An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches
Authors S.-H. Yang; M. D. Powell; B. Falsafi; K. Roy; T. N. Vijaykumar
Article Title DRAM Energy Management Using Software and Hardware Directed Power Mode Control
Authors V. Delaluz; M. Kandemir; N. Vijaykrishnan; A. Sivasubramaniam; M. J. Irwin
Article Title Dynamic Thermal Management for High-Performance Microprocessors
Authors D. Brooks; M. Martonosi
Article Title Dynamic Prediction of Critical Path Instructions
Authors E. Tune; D. Liang; D. M. Tullsen; B. Calder
Article Title Dynamic Branch Prediction with Perceptrons
Authors D. A. Jimenez; C. Lin
Article Title Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency
Authors B. Goeman; H. Vandierendonck; K. De Bosschere
Article Title DLP + TLP Processors for the Next Generation of Media Workloads
Authors J. Corbal; R. Espasa; M. Valero
Article Title An Architectural Evaluation of Java TPC-W
Authors H. W. Cain; R. Rajwar; M. Marden; M. H. Lipasti
Article Title A Programmable Co-Processor for Profiling
Authors C. B. Zilles; G. S. Sohi
Article Title A Delay Model and Speculative Architecture for Pipelined Routers
Authors L. Peh; W. J. Dally
Article Title Quantifying the Impact of Architectural Scaling on Communication
Authors T. Heath; S. Kaw; R. P. Martin; T. D. Nguyen
Article Title Call Graph Prefetching for Database Applications
Authors M. Annavaram; J. M. Patel; E. S. Davidson
Article Title Branch History Guided Instruction Prefetching
Authors V. Srinivasan; E. S. Davidson; G. S. Tyson; M. J. Charney; T. R. Puzak
Article Title Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
Authors W. Lin; S. K. Reinhardt; D. Burger

HPCA 2002

Article Title Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management
Authors Kevin Skadron; Tarek Abdelzaher; Mircea R. Stan
Article Title Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling
Authors Greg Semeraro; Grigorios Magklis; Rajeev Balasubramonian; David H. Albonesi; Sandhya Dwarkadas; Michael L. Scott
Article Title A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
Authors G. Edward Suh; Srinivas Devadas; Larry Rudolph
Article Title Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach
Authors Sudhanva Gurumurthi; Anand Sivasubramaniam; Mary Jane Irwin; N. Vijaykrishnan; Mahmut Kandemir; Tao Li; Lizy Kurian John
Article Title Loose Loops Sink Chips
Authors Eric Borch; Srilatha Manne; Joel Emer; Eric Tune
Article Title Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay
Authors Se-Hyun Yang; Babak Falsafi; Michael D. Powell; T. N. Vijaykumar
Article Title Improving Value Communication for Thread-Level Speculation
Authors J. Gregory Steffan; Christopher B. Colohan; Antonia Zhai; Todd C. Mowry
Article Title Thread-Spawning Schemes for Speculative Multithreading
Authors Pedro Marcuello; Antonio González
Article Title Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors
Authors Marcelo Cintra; Josep Torrellas
Article Title Microarchitectural Simulation and Control of di/dt-induced Power Supply Voltage Variation
Authors Ed Grochowski; Dave Ayers; Vivek Tiwari
Article Title Let’s Study Whole-Program Cache Behaviour Analytically
Authors Xavier Vera; Jingling Xue
Article Title Bandwidth Adaptive Snooping
Authors Milo M. K. Martin; Daniel J. Sorin; Mark D. Hill; David A. Wood
Article Title Tuning Garbage Collection in an Embedded Java Environment
Authors G. Chen; R. Shetty; M. Kandemir; N. Vijaykrishnan; M. J. Irwin; M. Wolczko
Article Title Evaluation of a Multithreaded Architecture for Cellular Computing
Authors Calin Cascaval; Jose G. Castanos; Luis Ceze; Monty Denneau; Manish Gupta; Derek Lieber; Jose E. Moreira; Karin Strauss; Henry S. Warren Jr
Article Title Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs. Speculative Precomputation
Authors Perry H. Wang; Hong Wang; Jamison D. Collins; Ed Grochowski; Ralph M. Kling; John P. Shen
Article Title User-Level Communication in Cluster-Based Servers
Authors Enrique V. Carrera; Srinath Rao; Liviu Iftode; Ricardo Bianchini
Article Title The Minimax Cache: An Energy-Efficient Framework for Media Processors
Authors Osman S. Unsal; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
Article Title Fine-grain Priority Scheduling on Multi-channel Memory Systems
Authors Zhichun Zhu; Zhao Zhang; Xiaodong Zhang
Article Title Non-vital Loads
Authors Ryan Rakvic; Bryan Black; Deepak Limaye; John P. Shen
Article Title Modeling Value Speculation
Authors Yiannakis Sazeides
Article Title Quantifying Load Stream Behavior
Authors Suleyman Sair; Timothy Sherwood; Brad Calder
Article Title Using Internal Redundant Representations and Limited Bypass to Support Pipelined Adders and Register Files
Authors Mary D. Brown; Yale N. Patt
Article Title The FAB Predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches
Authors Martin Kampe; Per Stenstrom; Michel Dubois
Article Title Reverse Tracer: A Software Tool for Generating Realistic Performance Test Programs
Authors Larry Brisson; Mariko Sakamoto; Akira Katsuno; Aiichiro Inoue; Yasunori Kimura
Article Title CableS: Thread Control and Memory Management Extensions for Shared Virtual Memory Clusters
Authors Peter Jamieson; Angelos Bilas
Article Title CARS: A New Code Generation Framework for Clustered ILP Processors
Authors K. Kailas; K. Ebcioglu; A. Agrawala

HPCA 2003

Article Title Variability in Architectural Simulations of Multi-Threaded Workloads
Authors Alaa R. Alameldeen; David A. Wood
Article Title Front-End Policies for Improved Issue Efficiency in SMT Processors
Authors Ali El-Moursy; David H. Albonesi
Article Title Control Techniques to Eliminate Voltage Emergencies in High Performance Processors
Authors Russ Joseph; David M. Brooks; Margaret Martonosi
Article Title Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks
Authors Li Shang; Li-Shiuan Peh; Niraj K. Jha
Article Title Deterministic Clock Gating for Microprocessor Power Reduction
Authors Hai Li; Swarup Bhunia; Yiran Chen; T. N. Vijaykumar; Kaushik Roy
Article Title Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
Authors Onur Mutlu; Jared Stark; Chris Wilkerson; Yale N. Patt
Article Title A Statistically Rigorous Approach for Improving Simulation Methodology
Authors Joshua J. Yi; David J. Lilja; Douglas M. Hawkins
Article Title Caches and Hash Trees for Efficient Memory Integrity Verification
Authors Blaise Gassend; G. Edward Suh; Dwaine E. Clarke; Marten van Dijk; Srinivas Devadas
Article Title TCP: Tag Correlating Prefetchers
Authors Zhigang Hu; Margaret Martonosi; Stefanos Kaxiras
Article Title Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architecture
Authors Michael Bedford Taylor; Walter Lee; Saman P. Amarasinghe; Anant Agarwal
Article Title Mini-Threads: Increasing TLP on Small-Scale SMT Processors
Authors Joshua Redstone; Susan Eggers; Henry Levy
Article Title Reconsidering Complex Branch Predictors
Authors Daniel A. Jiménez
Article Title Incorporating Predicate Information into Branch Predictors
Authors Beth Simon; Brad Calder; Jeanne Ferrante
Article Title Dynamic Data Dependence Tracking and its Application to Branch Prediction
Authors Lei Chen; Steve Dropsho; David H. Albonesi
Article Title Power-Aware Control Speculation through Selective Throttling
Authors Juan L. Aragón; José González; Antonio González
Article Title Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems
Authors Mariko Sakamoto; Akira Katsuno; Aiichiro Inoue; Takeo Asakawa; Haruhiko Ueno; Kuniki Morita; Yasunori Kimura
Article Title Exploring the VLSI Scalability of Stream Processors
Authors Brucek Khailany; William J. Dally; Scott Rixner; Ujval J. Kapasi; John D. Owens; Brian Towles
Article Title Dynamic Optimization of Micro-Operations
Authors Brian Slechta; David Crowe; Brian Fahs; Michael Fertig; Gregory Muthler; Justin Quek; Francesco Spadini; Sanjay J. Patel; Steven S. Lumetta
Article Title Slipstream Execution Mode for CMP-Based Multiprocessors
Authors Khaled Z. Ibrahim; Gregory T. Byrd; Eric Rotenberg
Article Title Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
Authors María Jesús Garzarán; Milos Prvulovic; José María Llabería; Víctor Viñals; Lawrence Rauchwerger; Josep Torrellas
Article Title Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters
Authors Rosalia Christodoulopoulou; Reza Azimi; Angelos Bilas
Article Title Memory System Behavior of Java-Based Middleware
Authors Martin Karlsson; Kevin E. Moore; Erik Hagersten; David A. Wood
Article Title Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services
Authors Kiran Nagaraja; Neeraj Krishnan; Ricardo Bianchini; Richard P. Martin; Thu D. Nguyen
Article Title Hierarchical Backoff Locks for Nonuniform Communication Architectures
Authors Zoran Radović; Erik Hagersten
Article Title Performance Enhancement Techniques for InfiniBand™ Architecture
Authors Eun Jung Kim; Ki Hwan Yum; Chita R. Das; Mazin Yousif; José Duato
Article Title Catching Accurate Profiles in Hardware
Authors Satish Narayanasamy; Timothy Sherwood; Suleyman Sair; Brad Calder; George Varghese
Article Title Just Say No: Benefits of Early Cache Miss Determination
Authors Gokhan Memik; Glenn Reinman; William H. Mangione-Smith
Article Title Cost-Sensitive Cache Replacement Algorithms
Authors Jaeheon Jeong; Michel Dubois
Article Title Inter-Cluster Communication Models for Clustered VLIW Processors
Authors Andrei Terechko; Erwan Le Thenaff; Manish Garg; Jos van Eijndhoven; Henk Corporaal
Article Title Active I/O Switches in System Area Networks
Authors Ming Hao; Mark Heinrich

HPCA 2004

Article Title Hardware Support for Prescient Instruction Prefetch
Authors Tor M. Aamodt; Paul Chow; Per Hammarlund; Hong Wang; John Paul Shen
Article Title Low-Complexity Distributed Issue Queue
Authors Jaume Abella; Antonio González
Article Title Perceptron-Based Branch Confidence Estimation
Authors Haitham Akkary; Srikanth T. Srinivasan; Rajendar Koltur; Yogesh Patil; Wael Refaai
Article Title Improving Disk Throughput in Data-Intensive Servers
Authors Enrique V. Carrera; Ricardo Bianchini
Article Title Accurate and Complexity-Effective Spatial Pattern Prediction
Authors Chi F. Chen; Se-Hyun Yang; Babak Falsafi; Andreas Moshovos
Article Title Out-of-Order Commit Processors
Authors Adrián Cristal; Daniel Ortega; Josep Llosa; Mateo Valero
Article Title Reducing the Scheduling Critical Cycle Using Wakeup Prediction
Authors Todd E. Ehrhart; Sanjay J. Patel
Article Title A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors
Authors Ayose Falcón; Alex Ramírez; Mateo Valero
Article Title Link-Time Path-Sensitive Memory Redundancy Elimination
Authors Manel Fernández; Roger Espasa
Article Title Reducing Branch Misprediction Penalty via Selective Branch Recovery
Authors Amit Gandhi; Haitham Akkary; Srikanth T. Srinivasan
Article Title Program Counter Based Techniques for Dynamic Power Management
Authors Chris Gniady; Y. Charlie Hu; Yung-Hsiang Lu
Article Title Exploring Wakeup-Free Instruction Scheduling
Authors Jie S. Hu; Narayanan Vijaykrishnan; Mary Jane Irwin
Article Title Stream Register Files with Indexed Access
Authors Nuwan Jayasena; Mattan Erez; Jung Ho Ahn; William J. Dally
Article Title Wavelet Analysis for Microprocessor Design – Experiences with Wavelet-Based dI/dt Characterization
Authors Russ Joseph; Zhigang Hu; Margaret Martonosi
Article Title Processor Aware Anticipatory Prefetching in Loops
Authors Spiros Kalogeropulos; Mahadevan Rajagopalan; Vikram Rao; Yonghong Song; Partha Tirumalai
Article Title Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses
Authors Mazen Kharbutli; Keith Irwin; Yan Solihin; Jaejin Lee
Article Title Understanding Scheduling Replay Schemes
Authors Ilhyun Kim; Mikko H. Lipasti
Article Title The Thrifty Barrier – Energy-Aware Synchronization in Shared-Memory Multiprocessors
Authors Jian Li; José F. Martínez; Michael C. Huang
Article Title Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs
Authors Chun Liu; Anand Sivasubramaniam; Mahmut T. Kandemir
Article Title Architectural Characterization of TCP/IP Packet Processing on the Pentium M Microprocessor
Authors Srihari Makineni; Ravi R. Iyer
Article Title Exploiting the Cache Capacity of a Single-Chip Multi-Core Processor with Execution Migration
Authors Pierre Michaud
Article Title Creating Converged Trace Schedules Using String Matching
Authors Satish Narayanasamy; Yuanfang Hu; Suleyman Sair; Brad Calder
Article Title Data Cache Prefetching Using a Global History Buffer
Authors Kyle J. Nesbit; James E. Smith
Article Title Signature Buffer – Bridging Performance Gap between Registers and Caches
Authors Lu Peng; Jih-Kwon Peir; Konrad Lai
Article Title Exploiting Prediction to Reduce Power on Buses
Authors Victor Wen; Mark Whitney; Yatish Patel; John Kubiatowicz
Article Title Synthesizing Representative I/O Workloads for TPC-H
Authors Jianyong Zhang; Anand Sivasubramaniam; Hubertus Franke; Natarajan Gautam; Yanyong Zhang; Shailabh Nagar
Article Title Reducing Energy Consumption of Disk Storage Using Power-Aware Cache Management
Authors Qingbo Zhu; Francis M. David; Christo Frank Devaraj; Zhenmin Li; Yuanyuan Zhou; Pei Cao