Session

Technical Poster Session 5

Location

Utah State University, Logan, UT

Abstract

Parameter-scaling techniques change the number of parameters in a machine-learning model to make the network better suited to different device types or accuracy requirements. This research compares the performance of two such techniques. NeuralScale is a neural architecture search method that is claimed to generate deep neural networks for resource-constrained devices. It shrinks a network to a target number of parameters by adjusting the width of each layer independently, with the aim of achieving higher accuracy than previous methods. NeuralScale is compared against the baseline of uniform scaling used by MobileNet-style models, in which the width of every layer is scaled by the same factor. Inference latency and runtime memory were measured on the NVIDIA Jetson TX2 and Jetson AGX Xavier embedded GPUs using NVIDIA TensorRT, and on the Raspberry Pi 4 embedded CPU (ARM Cortex-A72 cores) using ONNX Runtime. VGG-11, MobileNetV2, Pre-Activation ResNet-18, and ResNet-50 were each scaled to 0.25×, 0.50×, 0.75×, and 1.00× the original number of parameters. On embedded GPUs, this research finds that NeuralScale models do offer higher accuracy, but they run slower and consume much more runtime memory during inference than their equivalent uniform-scaling models. On average, NeuralScale is 40% as efficient as uniform scaling in terms of accuracy per megabyte of runtime memory, and it uses 2.7× as much runtime memory per parameter. On the embedded CPU, NeuralScale is slightly more efficient than uniform scaling in terms of accuracy per megabyte of memory and uses essentially the same amount of memory per parameter. However, its inference latency is more than 2.5× higher on average. Importantly, an equal parameter count does not guarantee equal runtime-memory usage between the two scaling methods on embedded GPUs, and NeuralScale's latency grows significantly on embedded CPUs.
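
To make the uniform-scaling baseline concrete, the following is a minimal sketch, not the authors' code. It assumes a hypothetical plain stack of 3×3 convolutions with made-up layer widths, and all function names are illustrative. It searches for the single width multiplier that brings the stack down to a target fraction of its original parameter count. Because convolution weights grow with the product of input and output channels, reaching 0.25× the parameters takes a width multiplier of roughly 0.5 rather than 0.25.

```python
from typing import List


def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of one k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def count_params(widths: List[int], in_channels: int = 3) -> int:
    """Total parameters of a plain stack of 3x3 convolutions (hypothetical model)."""
    total = 0
    c_in = in_channels
    for c_out in widths:
        total += conv_params(c_in, c_out)
        c_in = c_out
    return total


def scale_uniform(widths: List[int], multiplier: float) -> List[int]:
    """Uniform scaling: every layer width is multiplied by the same factor."""
    return [max(1, round(w * multiplier)) for w in widths]


def multiplier_for_target(widths: List[int], target_fraction: float,
                          tol: float = 1e-4) -> float:
    """Bisect for a width multiplier whose parameter count meets the target.

    The returned multiplier gives at least target_fraction of the original
    parameter count; rounding of layer widths makes an exact match unlikely.
    """
    target = target_fraction * count_params(widths)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count_params(scale_uniform(widths, mid)) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    base_widths = [64, 128, 256, 512]  # illustrative widths, not from the study
    base = count_params(base_widths)
    for fraction in (0.25, 0.50, 0.75, 1.00):
        m = multiplier_for_target(base_widths, fraction)
        scaled = scale_uniform(base_widths, m)
        print(fraction, round(m, 3), scaled, count_params(scaled) / base)
```

NeuralScale differs in that it picks a separate width for each layer rather than one shared multiplier, so two models with the same parameter count can have quite different layer shapes. That is consistent with the finding that parameter count alone does not determine runtime memory.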

Aug 11th, 9:45 AM

Evaluation of Parameter-Scaling for Efficient Deep Learning on Small Satellites

