FPGA-accelerated machine learning inference as a service for particle physics computing

Duarte, Javier; Harris, Philip; Hauck, Scott; Holzman, Burt; Hsu, Shih-Chieh; Jindariani, Sergo; Khan, Suffian; Kreis, Benjamin; Lee, Brian; Liu, Mia; Lončar, Vladimir; Ngadiuba, Jennifer; Pedro, Kevin; Perez, Brandon; Pierini, Maurizio; Rankin, Dylan; Tran, Nhan; Trahms, Matthew; Tsaris, Aristeidis; Versteeg, Colin; Way, Ted W.; Werran, Dustin; Wu, Zhenbin

oai:cds.cern.ch:2695229

Publication date: 18 April 2019

Abstract

Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) ms with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
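
The latency and speedup figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function and dictionary names are hypothetical, and because the speedup factors are stated as approximate, the implied CPU latencies are rough estimates rather than reported measurements.

```python
# Figures quoted in the abstract: (average service latency in ms,
# approximate speedup factor over CPU inference).
# The dictionary keys are hypothetical labels chosen for this sketch.
QUOTED = {
    "cloud": (60.0, 30.0),
    "edge_or_on_premises": (10.0, 175.0),
}


def implied_cpu_latency_ms(service_latency_ms, speedup):
    """CPU latency implied by a service latency and a speedup factor."""
    return service_latency_ms * speedup


def serial_rate_per_second(latency_ms):
    """Inferences per second for ONE client sending requests back to back."""
    return 1000.0 / latency_ms


def summarize(quoted=QUOTED):
    """Return {mode: (implied CPU latency in ms, single-client rate in 1/s)}."""
    return {
        mode: (
            implied_cpu_latency_ms(latency, speedup),
            serial_rate_per_second(latency),
        )
        for mode, (latency, speedup) in quoted.items()
    }


if __name__ == "__main__":
    for mode, (cpu_ms, rate) in summarize().items():
        print(f"{mode}: implied CPU latency ~{cpu_ms:.0f} ms, "
              f"single-client rate ~{rate:.1f} inferences/s")
```

Both modes imply a CPU inference latency of roughly 1.75 to 1.8 seconds. A single client waiting on each 10 ms response could issue at most about 100 requests per second, so the quoted 600–700 inferences per second is consistent with the abstract's description of one FPGA service being accessed by many CPUs at once.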

Last time updated on 06/11/2019

This paper was published in CERN Document Server.
