Proceedings of the 84th National Convention of IPSJ (Information Processing Society of Japan)

4R-02

A high performance implementation of a factorized-prior image compression model

○Fangzheng Lin, Heming Sun, Jiro Katto (Waseda University)

We introduce a high-performance implementation of a factorized-prior image compression model, based on pipelining and NVIDIA TensorRT. While the original implementation is single-threaded and uses system resources inefficiently, our implementation features multi-threaded pipelining for CPU-intensive tasks such as entropy coding, and a TensorRT-optimized GPU model backend running in FP16 precision. The implementation enables high-frame-rate, high-throughput, low-latency image compression that better utilizes the available compute capabilities, with only negligible accuracy loss compared to the original implementation. We also report the performance observed with our implementation deployed on Jetson Xavier NX, an embedded platform from NVIDIA: a throughput of 75 fps for encoding and 35 fps for decoding of 768x512-pixel images under the 20 W power profile.
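The multi-threaded pipelining idea can be illustrated with a minimal sketch: each stage runs in its own thread and passes work to the next through a bounded queue, so that one frame can be entropy coded while the next is still in the model stage. This is not the authors' code. The names `run_pipeline`, `stage`, `fake_model`, and `fake_entropy_encode` are hypothetical, the model stage is a trivial CPU stand-in for the TensorRT backend, and `zlib` merely stands in for the actual entropy coder.

```python
import queue
import threading
import zlib

_STOP = object()  # sentinel that marks the end of the frame stream


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def fake_model(frame: bytes) -> bytes:
    # Hypothetical stand-in for the GPU analysis transform and quantization.
    return bytes(b // 16 for b in frame)


def fake_entropy_encode(latent: bytes) -> bytes:
    # Hypothetical stand-in for the entropy coder (zlib is an assumption).
    return zlib.compress(latent)


def run_pipeline(frames, stages, depth=4):
    """Run frames through the stages, one thread per stage, preserving order."""
    # One bounded queue before each stage plus one for the final output.
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]

    def feed():
        # Feed from a separate thread so bounded queues cannot deadlock
        # while the caller is draining the output queue.
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_STOP)

    feeder = threading.Thread(target=feed, daemon=True)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)

    for t in workers + [feeder]:
        t.join()
    return results


if __name__ == "__main__":
    frames = [bytes([i]) * 4096 for i in range(8)]
    bitstreams = run_pipeline(frames, [fake_model, fake_entropy_encode])
    print(len(bitstreams), "frames encoded")
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order. The bounded queue depth caps the number of frames in flight, which limits latency and memory use. In CPython, stages overlap only where the underlying call releases the global interpreter lock, as native GPU and compression calls typically do.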