From 1986 to 1997 he taught at MIT, where he and his group built the J–Machine and the M–Machine, parallel machines emphasizing low-overhead synchronization and communication.
He became the Willard R. and Inez Kerr Bell Professor in the Stanford University School of Engineering and chairman of the computer science department at Stanford.
He developed a number of techniques used in modern interconnection networks, including routing-based deadlock avoidance, wormhole routing, link-level retry, virtual channels, global adaptive routing, and high-radix routers. He also developed efficient mechanisms for communication, synchronization, and naming in parallel computers, including message-driven computing and fast capability-based addressing. Starting in 1995, he developed a number of stream processors, including Imagine, for graphics, signal, and image processing, and Merrimac, for scientific computing.
He published over 200 papers and the textbooks "Digital Systems Engineering" with John Poulton and "Principles and Practices of Interconnection Networks" with Brian Towles. He was an inventor or co-inventor on over 70 granted patents.
Dally's corporate involvements include various collaborations at Cray Research since 1989.
He did Internet router work at Avici Systems starting in 1997, was chief technical officer at Velio Communications from 1999 until its 2003 acquisition by LSI Logic, and was founder and chairman of Stream Processors, Inc. until it folded.[2]
In January 2009 he was appointed chief scientist of Nvidia.[4]
He worked full-time at Nvidia while supervising about 12 of his graduate students at Stanford.[5]
An author quoted him as saying: "Locality is efficiency, Efficiency is power, Power is performance, Performance is king".[6]
Books
Dally and Poulton, Digital Systems Engineering, 1998, ISBN 0-521-59292-5.
Dally and Towles, Principles and Practices of Interconnection Networks, 2004, ISBN 0-12-200751-4.
Dally and Harting, Digital Design: A Systems Approach, 2012, ISBN 978-0-521-19950-6.
Johnson, Matt (2011). An Analysis of Linux Scalability to Many Cores. p. 4. "Locality is efficiency, Efficiency is power, Power is performance, Performance is king".