HPCA Test of Time Award 2018: Eligible Papers
HPCA 1995
Article Title |
Non-consistent dual register files to reduce register pressure |
Authors |
J. Llosa; M. Valero; E. Ayguade |
Article Title |
How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors? |
Authors |
K. I. Farkas; N. P. Jouppi; P. Chow |
Article Title |
Reducing communication latency with path multiplexing in optically interconnected multiprocessor systems |
Authors |
Chunming Qiao; R. Melhem |
Article Title |
Creating a wider bus using caching techniques |
Authors |
D. Citron; L. Rudolph |
Article Title |
Massively parallel array processor for logic, fault, and design error simulation |
Authors |
Y. Hur; S. A. Szygenda; E. Scott Fehr; G. E. Ott; Sungho Kang |
Article Title |
Toward high communication performance through compiled communications on a circuit switched interconnection network |
Authors |
F. Cappello; C. Germain |
Article Title |
Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors |
Authors |
S. Fiske; W. J. Dally |
Article Title |
Two techniques for improving performance on bus-based multiprocessors |
Authors |
C. Anderson; J. -L. Baer |
Article Title |
Origin-based fault-tolerant routing in the mesh |
Authors |
R. Libeskind-Hadas; E. Brandt |
Article Title |
Architectural support for inter-stream communication in a MSIMD system |
Authors |
V. Garg; D. E. Schimmel |
Article Title |
The Named-State Register File: implementation and performance |
Authors |
P. R. Nuth; W. J. Dally |
Article Title |
Abstracting network characteristics and locality properties of parallel systems |
Authors |
A. Sivasubramaniam; M. Singla; U. Ramachandran; H. Venkateswaran |
Article Title |
The effects of STEF in finely parallel multithreaded processors |
Authors |
Yamin Li; Wanming Chu |
Article Title |
Simulation study of cached RAID5 designs |
Authors |
R. Treiber; J. Menon |
Article Title |
U-cache: a cost-effective solution to synonym problem |
Authors |
Jesung Kim; Sang Lyul Min; Sanghoon Jeon; Byoungchu Ahn; Deog-Kyoon Jeong; Chong Sang Kim |
Article Title |
Design and performance evaluation of a multithreaded architecture |
Authors |
R. Govindarajan; S. S. Nemawarkar; P. LeNir |
Article Title |
Modeling virtual channel flow control in hypercubes |
Authors |
Y. M. Boura; C. R. Das |
Article Title |
Implementation of atomic primitives on distributed shared memory multiprocessors |
Authors |
M. M. Michael; M. L. Scott |
Article Title |
An argument for simple COMA |
Authors |
A. Saulsbury; T. Wilkinson; J. Carter; A. Landin |
Article Title |
Efficient and balanced adaptive routing in two-dimensional meshes |
Authors |
J. H. Upadhyay; V. Varavithya; P. Mohapatra |
Article Title |
DASC cache |
Authors |
A. Seznec |
Article Title |
Optimizing instruction cache performance for operating system intensive workloads |
Authors |
J. Torrellas; Chun Xia; R. Daigle |
Article Title |
Implementing register interlocks in parallel-pipeline, multiple instruction queue, superscalar processors |
Authors |
S. Weiss |
Article Title |
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors |
Authors |
F. Dahlgren; P. Stenstrom |
Article Title |
A VLSI architecture for computing the tree-to-tree distance |
Authors |
R. Sastry; N. Ranganathan |
Article Title |
Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms |
Authors |
D. K. Panda |
Article Title |
Access ordering and memory-conscious cache utilization |
Authors |
S. A. McKee; W. A. Wulf |
Article Title |
Fine-grain multi-thread processor architecture for massively parallel processing |
Authors |
T. Kawano; S. Kusakabe; R. -I. Taniguchi; M. Amamiya |
Article Title |
An initial evaluation of the Convex SPP-1000 for earth and space science applications |
Authors |
T. L. Sterling; D. F. Savarese; P. R. Merkey; J. P. Gardner |
Article Title |
Improving performance by cache driven memory management |
Authors |
K. Westerholz; S. Honal; J. Plankl; C. Hafer |
Article Title |
Software cache coherence for large scale multiprocessors |
Authors |
L. I. Kontothanassis; M. L. Scott |
Article Title |
Software assistance for data caches |
Authors |
O. Temam; N. Drach |
Article Title |
Fault-tolerant adaptive routing for two-dimensional meshes |
Authors |
C. M. Cunningham; D. R. Avresky |
Article Title |
Memory access reordering in vector processors |
Authors |
De-Lei Lee |
Article Title |
A design framework for hybrid-access caches |
Authors |
K. B. Theobald; H. H. J. Hum; G. R. Gao |
Article Title |
Program balance and its impact on high performance RISC architectures |
Authors |
L. K. John; V. Reddy; P. T. Hulina; L. D. Coraor |
HPCA 1996
Article Title |
Shuffle-Ring: overcoming the increasing degree of hypercube |
Authors |
Guihai Chen; F. C. M. Lau |
Article Title |
On the multiplexing degree required to embed permutations in a class of networks with direct interconnects |
Authors |
Chunming Qiao; Yousong Mei |
Article Title |
RMB - a reconfigurable multiple bus network |
Authors |
H. ElGindy; H. Schroder; A. Spray; A. K. Somani; H. Schmeck |
Article Title |
Bus-based COMA - reducing traffic in shared-bus multiprocessors |
Authors |
A. Landin; F. Dahlgren |
Article Title |
Improving the data cache performance of multiprocessor operating systems |
Authors |
Chun Xia; J. Torrellas |
Article Title |
Parallel intersecting compressed bit vectors in a high speed query server for processing postal addresses |
Authors |
Wen-Jann Yang; R. Sridhar; V. Demjanenko |
Article Title |
Using memory-mapped network interfaces to improve the performance of distributed shared memory |
Authors |
L. I. Kontothanassis; M. L. Scott |
Article Title |
Distance-adaptive update protocols for scalable shared-memory multiprocessors |
Authors |
A. Raynaud; Zheng Zhang; J. Torrellas |
Article Title |
Performance study of a multithreaded superscalar microprocessor |
Authors |
M. Gulati; N. Bagherzadeh |
Article Title |
The impact of shared-cache clustering in small-scale shared-memory multiprocessors |
Authors |
B. A. Nayfeh; K. Olukotun; J. P. Singh |
Article Title |
A cache coherency protocol for optically connected parallel computer systems |
Authors |
J. A. Reisner; T. S. Wailes |
Article Title |
Protected, user-level DMA for the SHRIMP network interface |
Authors |
M. A. Blumrich; C. Dubnicki; E. W. Felten; Kai Li |
Article Title |
Distributed prefetch-buffer/cache design for high performance memory systems |
Authors |
T. Alexander; G. Kedem |
Article Title |
A shared-bus control mechanism and a cache coherence protocol for a high-performance on-chip multiprocessor |
Authors |
M. Takahashi; H. Takano; E. Kaneko; S. Suzuki |
Article Title |
Decoupled vector architectures |
Authors |
R. Espasa; M. Valero |
Article Title |
Representative traces for processor models with infinite cache |
Authors |
V. S. Iyengar; L. H. Trevillyan; P. Bose |
Article Title |
Multitasking and multithreading on a multiprocessor with virtual shared memory |
Authors |
H. L. Muller; P. W. A. Stallard; D. H. D. Warren |
Article Title |
A comparison of entry consistency and lazy release consistency implementations |
Authors |
S. V. Adve; A. L. Cox; S. Dwarkadas; R. Rajamony; W. Zwaenepoel |
Article Title |
Telegraphos: high-performance networking for parallel processing on workstation clusters |
Authors |
E. P. Markatos; M. G. H. Katevenis |
Article Title |
Predictive sequential associative cache |
Authors |
B. Calder; D. Grunwald; J. Emer |
Article Title |
Fault-tolerant multicast routing in the mesh with no virtual channels |
Authors |
R. Libeskind-Hadas; K. Watkins; T. Hehre |
Article Title |
Two adaptive hybrid cache coherency protocols |
Authors |
C. Anderson; A. R. Karlin |
Article Title |
Performance characterization of the Alpha 21164 microprocessor using TP and SPEC workloads |
Authors |
Z. Cvetanovic; D. Bhandarkar |
Article Title |
Register file design considerations in dynamically scheduled processors |
Authors |
K. I. Farkas; N. P. Jouppi; P. Chow |
Article Title |
A topology-independent generic methodology for deadlock-free wormhole routing |
Authors |
H. Park; D. P. Agrawal |
Article Title |
Co-scheduling hardware and software pipelines |
Authors |
R. Govindarajan; E. R. Altman; G. R. Gao |
Article Title |
Fault-tolerance with multimodule routers |
Authors |
S. Chalasani; R. V. Boppana |
Article Title |
Performance evaluation of a cluster-based multiprocessor built from ATM switches and bus-based multiprocessor servers |
Authors |
M. Karlsson; P. Stenstrom |
Article Title |
Improving release-consistent shared virtual memory using automatic update |
Authors |
L. Iftode; C. Dubnicki; E. W. Felten; Kai Li |
HPCA 1997
Article Title |
Software-managed address translation |
Authors |
B. Jacob; T. Mudge |
Article Title |
Towards a communication characterization methodology for parallel applications |
Authors |
S. Chodnekar; V. Srinivasan; A. S. Vaidya; A. Sivasubramaniam; C. R. Das |
Article Title |
Distributed path reservation algorithms for multiplexed all-optical interconnection networks |
Authors |
X. Yuan; R. Melhem; R. Gupta |
Article Title |
Design issues and tradeoffs for write buffers |
Authors |
K. Skadron; D. W. Clark |
Article Title |
An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors |
Authors |
H. Abdel-Shafi; J. Hall; S. V. Adve; V. S. Adve |
Article Title |
On the use and performance of explicit communication primitives in cache-coherent multiprocessor systems |
Authors |
X. Qin; J. -L. Baer |
Article Title |
Software DSM protocols that adapt between single writer and multiple writer |
Authors |
C. Amza; A. L. Cox; S. Dwarkadas; W. Zwaenepoel |
Article Title |
ATM and fast Ethernet network interfaces for user-level communication |
Authors |
M. Welsh; A. Basu; T. von Eicken |
Article Title |
Message proxies for efficient, protected communication on SMP clusters |
Authors |
B. -H. Lim; P. Heidelberger; P. Pattnaik; M. Snir |
Article Title |
A framework for statistical modeling of superscalar processor performance |
Authors |
D. B. Noonburg; J. P. Shen |
Article Title |
Speeding up the memory hierarchy in Flat COMA multiprocessors |
Authors |
L. Yang; J. Torrellas |
Article Title |
Architectural support for compiler-synthesized dynamic branch prediction strategies: Rationale and initial results |
Authors |
D. I. August; D. A. Connors; J. C. Gyllenhaal; W. -M. W. Hwu |
Article Title |
Multithreaded vector architectures |
Authors |
R. Espasa; M. Valero |
Article Title |
Reducing the replacement overhead in bus-based COMA multiprocessors |
Authors |
F. Dahlgren; A. Landin |
Article Title |
Scheduling communication on an SMP node parallel machine |
Authors |
B. Falsafi; D. A. Wood |
Article Title |
Multicast on irregular switch-based networks with wormhole routing |
Authors |
R. Kesavan; K. Bondalapati; D. K. Panda |
Article Title |
Reducing remote conflict misses: NUMA with remote cache versus COMA |
Authors |
Z. Zhang; J. Torrellas |
Article Title |
A performance comparison of hierarchical ring- and mesh-connected multiprocessor networks |
Authors |
G. Ravindran; M. Stumm |
Article Title |
Reducing the communication overhead of dynamic applications on shared memory multiprocessors |
Authors |
A. Sivasubramaniam |
Article Title |
Global address space, non-uniform bandwidth: a memory system performance characterization of parallel systems |
Authors |
T. Stricker; T. Cross |
Article Title |
The impact of instruction-level parallelism on multiprocessor performance and simulation methodology |
Authors |
V. S. Pai; P. Ranganathan; S. V. Adve |
Article Title |
Multiple branch and block prediction |
Authors |
S. Wallace; N. Bagherzadeh |
Article Title |
The memory performance of DSS commercial workloads in shared-memory multiprocessors |
Authors |
P. Trancoso; J. -L. Larriba-Pey; Z. Zhang; J. Torrellas |
Article Title |
Performance characterization of the Pentium Pro processor |
Authors |
D. Bhandarkar; J. Ding |
Article Title |
Architectural support for reducing communication overhead in multiprocessor interconnection networks |
Authors |
B. Vien Dao; S. Yalamanchili; J. Duato |
Article Title |
User-level DMA without operating system kernel modification |
Authors |
E. P. Markatos; M. G. H. Katevenis |
Article Title |
Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers |
Authors |
Kai Hwang; Choming Wang; Cho-Li Wang |
Article Title |
Datapath design for a VLIW video signal processor |
Authors |
A. Wolfe; J. Fritts; S. Dutta; E. S. T. Fernandes |
Article Title |
Control flow speculation in multiscalar processors |
Authors |
Q. Jacobson; S. Bennett; N. Sharma; J. E. Smith |
Article Title |
Advances of the counterflow pipeline microarchitecture |
Authors |
K. J. Janik; S. -L. Lu; M. F. Miller |
HPCA 1998
Article Title |
Supporting highly speculative execution via adaptive branch trees |
Authors |
Tien-Fu Chen |
Article Title |
Virtual-physical registers |
Authors |
A. Gonzalez; J. Gonzalez; M. Valero |
Article Title |
Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors |
Authors |
Ye Zhang; L. Rauchwerger; J. Torrellas |
Article Title |
Enhancing memory use in Simple COMA: Multiplexed Simple COMA |
Authors |
S. Basu; J. Torrellas |
Article Title |
PRISM: an integrated architecture for scalable shared memory |
Authors |
K. Ekanadham; Beng-Hong Lim; P. Pattnaik; M. Snir |
Article Title |
Architectural implications of a family of irregular applications |
Authors |
D. O’Hallaron; J. R. Shewchuk; T. Gross |
Article Title |
The emergence of workstation clusters: Should we continue to build MPPs? [panel session] |
Authors |
D. K. Panda |
Article Title |
Challenging applications on fast networks |
Authors |
K. Langendoen; R. Hofman; H. Bal |
Article Title |
Fine-grain software distributed shared memory on SMP clusters |
Authors |
D. J. Scales; K. Gharachorloo; A. Aggarwal |
Article Title |
A very efficient distributed deadlock detection mechanism for wormhole networks |
Authors |
P. Lopez; J. M. Martinez; J. Duato |
Article Title |
Home-based SVM protocols for SMP clusters: Design and performance |
Authors |
R. Samanta; A. Bilas; L. Iftode; J. P. Singh |
Article Title |
Credit-flow-controlled ATM for MP interconnection: The ATLAS I single-chip ATM switch |
Authors |
M. Katevenis; D. Serpanos; E. Spyridakis |
Article Title |
The impact of data transfer and buffering alternatives on network interface design |
Authors |
S. S. Mukherjee; M. D. Hill |
Article Title |
The effectiveness of SRAM network caches in clustered DSMs |
Authors |
A. Moga; M. Dubois |
Article Title |
Control speculation in multithreaded processors through dynamic loop detection |
Authors |
J. Tubella; A. Gonzalez |
Article Title |
The sensitivity of communication mechanisms to bandwidth and latency |
Authors |
F. T. Chong; R. Barua; F. Dahlgren; J. D. Kubiatowicz; A. Agarwal |
Article Title |
Partial sampling with reverse state reconstruction: A new technique for branch predictor performance estimation |
Authors |
D. E. Vengroff; G. R. Gao |
Article Title |
Using multicast and multithreading to reduce communication in software DSM systems |
Authors |
E. Speight; J. K. Bennett |
Article Title |
Speculative versioning cache |
Authors |
S. Gopal; T. N. Vijaykumar; J. E. Smith; G. S. Sohi |
Article Title |
The architectural costs of streaming I/O: A comparison of workstations, clusters, and SMPs |
Authors |
R. H. Arpaci-Dusseau; A. C. Arpaci-Dusseau; D. E. Culler; J. M. Hellerstein; D. A. Patterson |
Article Title |
Address translation mechanisms in network interfaces |
Authors |
I. Schoinas; M. D. Hill |
Article Title |
The potential for using thread-level data speculation to facilitate automatic parallelization |
Authors |
J. G. Steffan; T. C. Mowry |
Article Title |
Performance study of a concurrent multithreaded processor |
Authors |
Jenn-Yuan Tsai; Zhenzhen Jiang; E. Ness; Pen-Chung Yew |
Article Title |
FPGA based custom computing machines for irregular problems |
Authors |
D. Abramson; P. Logothetis; A. Postula; M. Randall |
Article Title |
Exploiting two-case delivery for fast protected messaging |
Authors |
K. Mackenzie; J. Kubiatowicz; M. Frank; W. Lee; V. Lee; A. Agarwal; M. F. Kaashoek |
Article Title |
Non-stalling counterflow architecture |
Authors |
M. F. Miller; K. J. Janik; Shih-Lien Lu |
Article Title |
Temporal-based procedure reordering for improved instruction cache performance |
Authors |
J. Kalamatianos; D. R. Kaeli |
Article Title |
Performance evaluation of tiling for the register level |
Authors |
M. Jimenez; J. M. Llaberia; A. Fernandez |
Article Title |
Treegion scheduling for wide issue processors |
Authors |
W. A. Havanki; S. Banerjia; T. M. Conte |
Article Title |
Communication across fault-containment firewalls on the SGI Origin |
Authors |
K. Ghosh; A. J. Christie |
Article Title |
Efficiently adapting to sharing patterns in software DSMs |
Authors |
L. R. Monnerat; R. Bianchini |
Article Title |
Comparative evaluation of latency tolerance techniques for software distributed shared memory |
Authors |
T. C. Mowry; C. Q. C. Chan; A. K. W. Lo |
HPCA 1999
Article Title |
A study of control independence in superscalar processors |
Authors |
E. Rotenberg; Q. Jacobson; J. Smith |
Article Title |
Impulse: building a smarter memory controller |
Authors |
J. Carter; W. Hsieh; L. Stoller; M. Swanson; Lixin Zhang; E. Brunvand; A. Davis; Chen-Chi Kuo; R. Kuramkote; M. Parker; L. Schaelicke; T. Tateyama |
Article Title |
Sensitivity of parallel applications to large differences in bandwidth and latency in two-layer interconnects |
Authors |
A. Plaat; H. E. Bal; R. F. H. Hofman |
Article Title |
Improving CC-NUMA performance using Instruction-based Prediction |
Authors |
S. Kaxiras; J. R. Goodman |
Article Title |
Distributed modulo scheduling |
Authors |
M. M. Fernandes; J. Llosa; N. Topham |
Article Title |
A performance comparison of homeless and home-based lazy release consistency protocols in software shared memory |
Authors |
A. L. Cox; E. de Lara; C. Hu; W. Zwaenepoel |
Article Title |
The synergy of multithreading and access/execute decoupling |
Authors |
J. -M. Parcerisa; A. Gonzalez |
Article Title |
Supporting fine-grained synchronization on a simultaneous multithreading processor |
Authors |
D. M. Tullsen; J. L. Lo; S. J. Eggers; H. M. Levy |
Article Title |
LAPSES: a recipe for high performance adaptive router design |
Authors |
A. S. Vaidya; A. Sivasubramaniam; C. R. Das |
Article Title |
Using Lamport clocks to reason about relaxed memory models |
Authors |
A. E. Condon; M. D. Hill; M. Plakal; D. J. Sorin |
Article Title |
Memory hierarchy considerations for fast transpose and bit-reversals |
Authors |
K. S. Gatlin; L. Carter |
Article Title |
The impact of link arbitration on switch performance |
Authors |
M. Pirvu; L. Bhuyan; N. Ni |
Article Title |
Limits to the performance of software shared memory: a layered approach |
Authors |
A. Bilas; Dongming Jiang; Yuanyuan Zhou; J. P. Singh |
Article Title |
Dynamically variable line-size cache exploiting high on-chip memory bandwidth of merged DRAM/logic LSIs |
Authors |
K. Inoue; K. Kai; K. Murakami |
Article Title |
Impact of buffer size on the efficiency of deadlock detection |
Authors |
J. M. Martinez; P. Lopez; J. Duato |
Article Title |
Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors |
Authors |
R. Iyer; L. N. Bhuyan |
Article Title |
Exploiting basic block value locality with block reuse |
Authors |
Jian Huang; D. J. Lilja |
Article Title |
Instruction pre-processing in trace processors |
Authors |
Q. Jacobson; J. E. Smith |
Article Title |
Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading |
Authors |
S. Hily; A. Seznec |
Article Title |
Comparative evaluation of fine- and coarse-grain approaches for software distributed shared memory |
Authors |
S. Dwarkadas; K. Gharachorloo; L. Kontothanassis; D. J. Scales; M. L. Scott; R. Stets |
Article Title |
WildFire: a scalable path for SMPs |
Authors |
E. Hagersten; M. Koster |
Article Title |
MP-LOCKs: replacing H/W synchronization primitives with message passing |
Authors |
Chen-Chi Kuo; J. Carter; R. Kuramkote |
Article Title |
Dynamically exploiting narrow width operands to improve processor power and performance |
Authors |
D. Brooks; M. Martonosi |
Article Title |
Access order and effective bandwidth for streams on a Direct Rambus memory |
Authors |
S. I. Hong; S. A. McKee; M. H. Salinas; R. H. Klenke; J. H. Aylor; W. A. Wulf |
Article Title |
RAPID-Cache - a reliable and inexpensive write cache for disk I/O systems |
Authors |
Yiming Hu; Qing Yang; T. Nightingale |
Article Title |
Improving the accuracy vs. speed tradeoff for simulating shared-memory multiprocessors with ILP processors |
Authors |
M. Durbhakula; V. S. Pai; S. Adve |
Article Title |
A scalable cache coherent scheme exploiting wormhole routing networks |
Authors |
Yunseok Rhee; Joonwon Lee |
Article Title |
Global context-based value prediction |
Authors |
T. Nakra; R. Gupta; M. L. Soffa |
Article Title |
Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols |
Authors |
B. Falsafi; D. A. Wood |
Article Title |
Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors |
Authors |
Ye Zhang; L. Rauchwerger; J. Torrellas |
Article Title |
Efficient all-to-all broadcast in all-port mesh and torus networks |
Authors |
Yuanyuan Yang; Jianchao Wang |
Article Title |
Instruction recycling on a multiple-path processor |
Authors |
S. Wallace; D. M. Tullsen; B. Calder |
Article Title |
Permutation development data layout (PDDL) |
Authors |
T. J. E. Schwarz; J. Steinberg; W. A. Burkhard |
Article Title |
Design and performance of directory caches for scalable shared memory multiprocessors |
Authors |
M. M. Michael; A. K. Nanda |
Article Title |
MMR: a high-performance MultiMedia Router - architecture and design trade-offs |
Authors |
J. Duato; S. Yalamanchili; M. B. Caminero; D. Love; F. J. Quiles |
Article Title |
Lightweight hardware distributed shared memory supported by generalized combining |
Authors |
K. Tanaka; T. Matsumoto; K. Hiraki |
Article Title |
Communication studies of single-threaded and multithreaded distributed-memory machines |
Authors |
A. Sohn; Yunheung Paek; Jui-Yuan Ku; Y. Kodama; Y. Yamaguchi |
HPCA 2000
Article Title |
A prefetching technique for irregular accesses to linked data structures |
Authors |
M. Karlsson; F. Dahlgren; P. Stenstrom |
Article Title |
Cache-efficient matrix transposition |
Authors |
S. Chatterjee; S. Sen |
Article Title |
On the performance of hand vs. automatically optimized numerical codes |
Authors |
M. Jimenez; J. M. Llaberia; A. Fernandez |
Article Title |
Evaluation of active disks for decision support databases |
Authors |
M. Uysal; A. Acharya; J. Saltz |
Article Title |
Improving the throughput of synchronization by insertion of delays |
Authors |
R. Rajwar; A. Kagi; J. R. Goodman |
Article Title |
The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing |
Authors |
R. Stets; S. Dwarkadas; L. Kontothanassis; U. Rencuzogullari; M. L. Scott |
Article Title |
Coherence communication prediction in shared-memory multiprocessors |
Authors |
S. Kaxiras; C. Young |
Article Title |
Register organization for media processing |
Authors |
S. Rixner; W. J. Dally; B. Khailany; P. Mattson; U. J. Kapasi; J. D. Owens |
Article Title |
Cache memory design for network processors |
Authors |
Tzi-Cker Chiueh; P. Pradhan |
Article Title |
Flit-reservation flow control |
Authors |
Li-Shiuan Peh; W. J. Dally |
Article Title |
Combining static and dynamic branch prediction to reduce destructive aliasing |
Authors |
H. Patil; J. Emer |
Article Title |
Trace cache redundancy: red and blue traces |
Authors |
A. Ramirez; J. Ll. Larriba-Pey; M. Valero |
Article Title |
Design of a parallel vector access unit for SDRAM memory systems |
Authors |
B. K. Mathew; S. A. McKee; J. B. Carter; A. Davis |
Article Title |
High-throughput coherence controllers |
Authors |
A. K. Nanda; A. -T. Nguyen; M. M. Michael; D. J. Joseph |
Article Title |
Reducing code size with run-time decompression |
Authors |
C. Lefurgy; E. Piccininni; T. Mudge |
Article Title |
Investigating the performance of two programming models for clusters of SMP PCs |
Authors |
F. Cappello; O. Richard; D. Etiemble |
Article Title |
PowerMANNA: a parallel architecture based on the PowerPC MPC620 |
Authors |
P. M. Behr; S. Pletner; A. C. Sodan |
Article Title |
Architectural issues in Java runtime systems |
Authors |
R. Radhakrishnan; N. Vijaykrishnan; L. K. John; A. Sivasubramaniam |
Article Title |
Performance evaluation of dynamic reconfiguration in high-speed local area networks |
Authors |
R. Casado; A. Bermudez; F. J. Quiles; J. L. Sanchez; J. Duato |
Article Title |
Modified LRU policies for improving second-level cache behavior |
Authors |
W. A. Wong; J. -L. Baer |
Article Title |
Decoupled value prediction on trace processors |
Authors |
Sang-Jeong Lee; Yuan Wang; Pen-Chung Yew |
Article Title |
Performance analysis and visualization of parallel systems using SimOS and Rivet: a case study |
Authors |
R. Bosch; C. Stolte; G. Stoll; M. Rosenblum; P. Hanrahan |
Article Title |
A DSM architecture for a parallel computer Cenju-4 |
Authors |
T. Hosomi; Y. Kanoh; M. Nakamura; T. Hirose |
Article Title |
The best distribution for a parallel OpenGL 3D engine with texture caches |
Authors |
A. Vartanian; J. -L. Bechennec; N. Drach-Temam |
Article Title |
Investigating QoS support for traffic mixes with the MediaWorm router |
Authors |
Ki Hwan Yum; A. Vaidya; C. R. Das; A. Sivasubramaniam |
Article Title |
eXtended block cache |
Authors |
S. Jourdan; L. Rappoport; Y. Almog; M. Erez; A. Yoaz; R. Ronen |
Article Title |
Branch transition rate: a new metric for improved branch classification analysis |
Authors |
M. Haungs; P. Sallee; M. Farrens |
Article Title |
Impact of chip-level integration on performance of OLTP workloads |
Authors |
L. A. Barroso; K. Gharachorloo; A. Nowatzyk; B. Verghese |
Article Title |
Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors |
Authors |
A. Moshovos; G. S. Sohi |
Article Title |
Quantifying the SMT layout overhead - does SMT pull its weight? |
Authors |
J. Burns; J. -L. Gaudiot |
Article Title |
Toward a cost-effective DSM organization that exploits processor-memory integration |
Authors |
J. Torrellas; Liuxi Yang; A. -T. Nguyen |
Article Title |
A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks |
Authors |
H. Neefs; H. Vandierendonck; K. De Bosschere |
Article Title |
Software-controlled multithreading using informing memory operations |
Authors |
T. C. Mowry; S. R. Ramkissoon |
Article Title |
Impact of heterogeneity on DSM performance |
Authors |
R. J. O. Figueiredo; J. A. B. Fortes |
Article Title |
Dynamic cluster assignment mechanisms |
Authors |
R. Canal; J. M. Parcerisa; A. Gonzalez |