NN-512 appeared on HN in late 2020.
No benchmarks were provided, which may be one reason it didn't get much attention.
I decided to try NN-512 with ResNet50. It ships with this network graph as an example, and the comments in the generated ResNet50.h file include code snippets showing how to use it.
NN-512 doesn't come with any weights / params / floats, or any examples of how to generate them.
My first attempt at saving weights was with PyTorch, but I eventually found that torchvision uses a modified ResNet:
# This variant is also known as ResNet V1.5 and improves accuracy according to
# https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.
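To see what that comment means in practice, here's a minimal sketch (assuming torchvision >= 0.13 for the `weights=` API) that inspects where the stride-2 convolution sits and dumps the raw float32 weights. The file naming is just illustrative, not the parameter layout NN-512 expects:

```python
import torchvision

# Load torchvision's pretrained ResNet50 (the V1.5 variant quoted above).
model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# The V1.5 difference: in each downsampling bottleneck the stride 2 is on
# the 3x3 conv (conv2), not the 1x1 conv (conv1) as in the original ResNet.
block = model.layer2[0]
print(block.conv1.stride, block.conv2.stride)  # (1, 1) (2, 2)

# Dump every tensor as raw float32, one file per parameter.
# (Naming is illustrative; the generated NN-512 code documents the
# parameter layout it actually wants.)
for name, tensor in model.state_dict().items():
    tensor.detach().numpy().astype('float32').tofile(name + '.bin')
```

The stride placement is why these weights don't line up with a graph written for the original (Caffe-style) ResNet50, which puts the stride 2 on the 1x1 conv.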
I asked the NN-512 author, 37ef, what I was doing wrong, and got some useful pointers.
Once I had saved the Caffe weights and confirmed that they worked, I moved on to generating a graph from TensorFlow/Keras and saving the weights at the same time.
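A sketch of the weight-saving half of that, assuming the stock Keras ResNet50 (the graph text can be emitted by walking the same layers, and the generated ResNet50.h documents the exact parameter format NN-512 wants):

```python
import numpy as np
import tensorflow as tf

# Load the stock Keras ResNet50 with ImageNet weights.
model = tf.keras.applications.ResNet50(weights='imagenet')

# Dump each variable as raw float32. The variable names double as a record
# of the graph structure; mapping them onto NN-512's layout is manual.
for var in model.weights:
    fname = var.name.replace('/', '_').replace(':', '_') + '.bin'
    var.numpy().astype(np.float32).tofile(fname)
```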
I compared the speed of NN-512 with TensorFlow, Neural Magic DeepSparse, and ONNX Runtime on AWS c5.large and c5.xlarge instances running Ubuntu Server 20.04 LTS.
View the HTML for the full results; for each row I picked a rounded, average-looking value. Not scientific, but quick.
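The per-inference numbers come from something like the following loop (shown for the TF/Keras case; batch size and repeat count are illustrative). Note that wall time is divided by batch size, so the table reports time per single inference:

```python
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights='imagenet')

batch_size = 4   # varied per row of the table
runs = 20        # illustrative

x = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)
model.predict(x)  # warm-up, so one-off graph tracing isn't timed

start = time.perf_counter()
for _ in range(runs):
    model.predict(x)
elapsed = time.perf_counter() - start
print(f'{elapsed / (runs * batch_size):.3f} s per inference')
```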
Machine | Type | Batch size | Time per inference (s) |
---|---|---|---|
c5.large | TF/Keras | 1 | 0.13 |
c5.large | TF/Keras | 2 | 0.105 |
c5.large | TF/Keras | 4 | 0.09 |
c5.large | TF/Keras | 64 | 0.10 |
c5.large | DeepSparse | 1 | 0.070 |
c5.large | DeepSparse | 2 | 0.075 |
c5.large | DeepSparse | 4 | 0.068 |
c5.large | DeepSparse | 64 | 0.068 |
c5.large | NN-512 | 1 | 0.069 |
c5.large | ONNX | 1 | 0.058 |
c5.large | ONNX | 2 | 0.058 |
c5.large | ONNX | 4 | 0.058 |
c5.large | ONNX | 64 | 0.058 |
c5.xlarge | TF/Keras | 1 | 0.088 |
c5.xlarge | TF/Keras | 2 | 0.065 |
c5.xlarge | TF/Keras | 4 | 0.05 |
c5.xlarge | TF/Keras | 64 | 0.049 |
c5.xlarge | DeepSparse | 1 | 0.033 |
c5.xlarge | DeepSparse | 2 | 0.035 |
c5.xlarge | DeepSparse | 4 | 0.032 |
c5.xlarge | DeepSparse | 64 | 0.031 |
c5.xlarge | NN-512 | 1 | 0.035 |
c5.xlarge | ONNX | 1 | 0.035 |
c5.xlarge | ONNX | 2 | 0.031 |
c5.xlarge | ONNX | 4 | 0.03 |
c5.xlarge | ONNX | 64 | 0.03 |
My interpretation of the results: NN-512 is significantly faster than TensorFlow (without any optimisation effort on the TensorFlow side) and very similar in speed to DeepSparse. ONNX Runtime appears to be the fastest on c5.large, but similar to DeepSparse and NN-512 on c5.xlarge.
DeepSparse is closed source, but apparently free to use. It was also designed to be used with pruning and quantisation, which NN-512 doesn't attempt.
In short, if you want to run ConvNet inference on a CPU using open-source code, NN-512 looks fast.