==================================================================

2nd Workshop on Accelerated Machine Learning (AccML)

 

Co-located with the ISCA 2020 Conference

(https://iscaconf.org/isca2020/)

 

May 31, 2020

Valencia, Spain

==================================================================

 

UPDATE: DEADLINE EXTENSION TO MAY 8, 2020

 

————————————————————————-

CALL FOR CONTRIBUTIONS

————————————————————————-

Over the last five years, the remarkable performance achieved by machine learning in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment, model complexity and diversity have pushed for higher productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory and computationally intensive, for both training and inference. Accelerating these operations has obvious advantages: first, it reduces energy consumption (e.g. in data centers), and second, it makes these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the applications of machine learning to the construction of such systems.

 

————————————————————————-

Links to the Workshop pages

————————————————————————-

Organizers: http://workshops.inf.ed.ac.uk/accml/

 

ISCA: https://www.iscaconf.org/isca2020/program/workshops.html

 

————————————————————————-

Invited Speakers

————————————————————————-

 

– Antonio González (Universitat Politècnica de Catalunya)

 

Title: “Removing Ineffectual Computations in Neural Networks”

 

Abstract: There is a growing interest in extending computing devices with the ability to analyze and understand signals and data coming from a large variety of activities in our daily life, and to provide real-time responses in complex situations, with the goal of emulating human perception and problem solving. Examples include personal assistants, self-driving cars, domestic robots and health-care devices, just to name a few. Neural networks have proven to be an effective approach to support many of these functionalities.

 

Most of these systems have very limited energy budgets so the effectiveness of this approach is strongly dependent on the energy-efficiency of the adopted solution. In this talk we present several alternative directions for improving the energy-efficiency of neural networks based on identifying and removing ineffectual computations.

 

Bio: Antonio González (Ph.D. 1989) is a Full Professor at the Computer Architecture Department of the Universitat Politècnica de Catalunya, Barcelona (Spain), and the director of the Architecture and Compiler research group. He was the founding director of the Intel Barcelona Research Center from 2002 to 2014. His research has focused on computer architecture. In this area, Antonio holds 52 patents, has published over 370 research papers and has given over 120 invited talks. He has also made multiple contributions to the design of the architecture of several commercial microprocessors.

 

Antonio has been program chair for ICS, ISPASS, MICRO, HPCA and ISCA, and general chair for MICRO and HPCA among other symposia. He has served on the program committee for over 130 international symposia in the field of computer architecture, and has been Associate Editor of the IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, IEEE Computer Architecture Letters, ACM Transactions on Architecture and Code Optimization, ACM Transactions on Parallel Computing, and Journal of Embedded Computing.

 

Antonio’s awards include the award to the best student in computer engineering in Spain, the Rosina Ribalta award as the advisor of the best PhD project in Information Technology and Communications, the Duran Farrell award for research in technology, the Aritmel National Award of Informatics to the Computer Engineer of the Year, the King James I award for his contributions in research on new technologies, and the ICREA Academia Award. He is an IEEE Fellow.

 

 

– David Kaeli (Northeastern University)

 

Title: “Scaling Machine Learning Workloads on Today’s GPUs”

 

Abstract: Machine learning applications place large computational demands on hardware resources when performing classification, regression, clustering and training. What is common in many of these applications is that the quality of the outcome or model improves as we process more data. GPUs have been shown to be an effective platform for accelerating machine learning workloads, though there are limits on how much data a single GPU can process. This talk will look at ongoing work in hardware compaction and multi-GPU acceleration, enabling further scaling of machine learning workloads.

 

Bio: David Kaeli received his BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is presently a COE Distinguished Full Professor on the ECE faculty at Northeastern University, Boston, MA. Dr. Kaeli has published over 350 critically reviewed publications, 7 books, and 13 patents. He serves as the Editor in Chief of ACM Transactions on Architecture and Code Optimization, and as an Associate Editor of the IEEE Transactions on Parallel and Distributed Systems and the Journal of Parallel and Distributed Computing. Dr. Kaeli is an IEEE Fellow and an ACM Distinguished Scientist.

 

 

– Tushar Krishna (Georgia Tech)

 

Title: “A Communication-Centric Approach for Designing Flexible DNN Accelerators”

 

Abstract: Deep Neural Networks (DNNs) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The computational complexity of these algorithms and the need for high energy efficiency have led to a surge in research on hardware accelerators. To reduce the latency and energy costs of accessing DRAM, most DNN accelerators are spatial in nature, with hundreds of processing elements (PEs) operating in parallel and communicating with each other directly.

 

DNNs are evolving at a rapid rate – leading to myriad layer types (convolution, attention, LSTM, MLP) of varying shape (regular and irregular). Given a DNN there can be myriad computationally efficient implementations (e.g., via pruning) – leading to structured and unstructured sparsity. Finally, a given DNN can be tiled and partitioned in myriad ways to exploit data reuse. All of the above can lead to irregular dataflow patterns within the accelerator substrate. Getting high mapping efficiency for all these cases is highly challenging in accelerators today that are often tightly coupled 2D grids with rigid near-neighbor connectivity.

 

First, given a target DNN, we will demonstrate a systematic methodology for understanding data reuse opportunities within the algorithm and determining the cost vs. benefit of efficiently exploiting them in hardware using our dataflow + microarchitectural model called MAESTRO (MICRO 2019 + IEEE Micro Top Picks). Next, we present a systematic communication-centric methodology for accelerator design that can provide ~100% efficiency for arbitrary DNN shapes, sparsity ratios and mappings. We demonstrate instances of this approach with two accelerators, MAERI (ASPLOS 2018 + IEEE Micro Top Picks Honorable Mention) and SIGMA (HPCA 2020 + Best Paper Award), that show orders of magnitude better utilization over state-of-the-art baselines like NVIDIA’s NVDLA and Google’s TPU.

 

Bio: Tushar Krishna is an Assistant Professor in the School of Electrical and Computer Engineering at Georgia Tech. He also holds the ON Semiconductor Junior Professorship. He has a Ph.D. in Electrical Engineering and Computer Science from MIT (2014), an M.S.E. in Electrical Engineering from Princeton University (2009), and a B.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Delhi (2007). Before joining Georgia Tech in 2015, Dr. Krishna spent a year as a post-doctoral researcher at Intel, Massachusetts.

 

Dr. Krishna’s research spans computer architecture, interconnection networks, networks-on-chip (NoC) and deep learning accelerators with a focus on optimizing data movement in modern computing systems. Three of his papers have been selected for IEEE Micro’s Top Picks from Computer Architecture, one more received an honorable mention, and three have won best paper awards. He received the National Science Foundation (NSF) CRII award in 2018, and both a Google Faculty Award and a Facebook Faculty Award in 2019.

 

 

– Cliff Young (Google)

 

Title: “Reflections on TPUs, Current Problems in Acceleration, and What’s Next”

 

Abstract: Google’s first TPU has been a remarkably successful accelerator, spawning a sequence of successors and inspiring a wave of new chips from established companies and startups. I’ll start with some retrospection about what we got right and the ways in which we were lucky in building that first TPU. Then I’ll pivot to the problems I think are currently hard and possibly underserved by our NN accelerator systems (to spoil: programmability, memory, and multi-tenancy). Lastly I’ll speculate about where ML might take us: how much might the algorithms and computations change, the implications of the Accelerator Wall, and the virtuous feedback between algorithms and architecture that might be the basis of a true Golden Age for our field.

 

Bio: Cliff Young is a software engineer in the Google Brain team, where he works on codesign for deep learning accelerators. He is one of the designers of Google’s Tensor Processing Unit (TPU), which is used in production applications including Search, Maps, Photos, and Translate. TPUs also powered AlphaGo’s historic 4-1 victory over Go champion Lee Sedol. Previously, Cliff built special-purpose supercomputers for molecular dynamics at D. E. Shaw Research and worked at Bell Labs. Cliff holds AB, MS, and PhD degrees in computer science from Harvard University.

 

————————————————————————-

Topics

————————————————————————-

Topics of interest include (but are not limited to):

 

– Novel ML systems: heterogeneous multi/many-core systems, GPUs, FPGAs;

– Novel ML hardware accelerators and associated software;

– Emerging semiconductor technologies with applications to ML hardware acceleration;

– ML for the construction and tuning of systems;

– Cloud and edge ML computing: hardware and software to accelerate training and inference;

– Computing systems research addressing the privacy and security of ML-dominated systems.

 

————————————————————————-

Submission

————————————————————————-

Papers will be reviewed by the workshop’s technical program committee according to criteria regarding a submission’s quality, relevance to the workshop’s topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.

 

The workshop does not have formal proceedings, so acceptance of a paper does not preclude publishing it at future conferences and/or in journals.

 

————————————————————————-

Important Dates

————————————————————————-

Submission deadline: May 8, 2020

Notification of decision: May 20, 2020

 

————————————————————————-

Organizers

————————————————————————-

José Cano (University of Glasgow)

José L. Abellán (Catholic University of Murcia)

Albert Cohen (Google)

Alex Ramirez (Google)
