Speaker: Hiroyuki Ootomo, Tokyo Institute of Technology
Title: DGEMM on Integer Tensor Cores
Date and time: Tuesday, September 5, 2 p.m. – 3 p.m.
Abstract:
To meet the increasing demand for dense matrix-matrix multiplication from the deep learning community, numerous vendors are developing processors with specialized matrix-multiplication units, such as NVIDIA Tensor Cores and Google TPUs. This hardware is designed to perform matrix multiplication efficiently at low precision, taking advantage of the fact that deep learning workloads tolerate low-precision arithmetic and rely heavily on matrix multiplication. For machine learning inference, fixed-point computation is commonplace: the input and output values and the model parameters are quantized. Many processors are therefore now equipped with fast integer matrix multiplication units. This talk introduces a double-precision-equivalent matrix multiplication on Int8 Tensor Cores based on the Ozaki scheme, a high-precision matrix multiplication method that uses lower-precision computing units.
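The idea behind the Ozaki scheme, as summarized in the abstract, is to split each FP64 matrix into a sum of low-precision slices whose pairwise products can be computed exactly by an integer unit, and then to accumulate the rescaled partial products in FP64. A minimal NumPy sketch under stated assumptions (plain integer matmuls stand in for the Int8 Tensor Core GEMMs, the function names and the 7-bit digit width are illustrative choices, and this is not the speaker's implementation):

```python
import numpy as np

def split_int(A, num_splits, bits, axis):
    """Split A into `num_splits` integer digit slices of `bits` bits each,
    after scaling each row (axis=1) or column (axis=0) by a power of two."""
    amax = np.max(np.abs(A), axis=axis, keepdims=True)
    amax[amax == 0] = 1.0
    scale = 2.0 ** np.ceil(np.log2(amax))   # per-row/column power-of-two scale
    R = A / scale                            # now |R| <= 1
    slices = []
    for _ in range(num_splits):
        R = R * 2.0 ** bits
        d = np.round(R)                      # integer digit, |d| <= 2**bits
        R = R - d                            # remainder carried to next slice
        slices.append(d.astype(np.int64))    # int8 on real Tensor Cores
    return slices, scale

def ozaki_int_gemm(A, B, num_splits=8, bits=7):
    """Approximate FP64 GEMM from exact integer partial products."""
    As, sa = split_int(A, num_splits, bits, axis=1)  # row scales of A
    Bs, sb = split_int(B, num_splits, bits, axis=0)  # column scales of B
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, Ai in enumerate(As):
        for j, Bj in enumerate(Bs):
            # The integer matmul is error-free (int32 accumulation on the
            # hardware); only the FP64 rescale-and-add below rounds.
            C += (Ai @ Bj).astype(np.float64) * 2.0 ** (-bits * (i + j + 2))
    return C * sa * sb
```

With enough slices the result approaches the accuracy of a native FP64 GEMM; the digit width must be chosen so that each integer product accumulates without overflow in the hardware's int32 accumulator, which is why only a few bits per slice are used.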
Short bio:
Hiroyuki Ootomo is a Ph.D. candidate at the Tokyo Institute of Technology, studying under Dr. Rio Yokota. His research interests lie in high-performance computing, especially mixed-precision computing on specialized hardware, randomized numerical linear algebra, and quantum circuit simulation. His current work is on fast, high-accuracy GEMM on NVIDIA Tensor Cores and its applications.
For a list of past and upcoming NHR PerfLab seminar events, see: https://hpc.fau.de/research/nhr-perflab-seminar-series/