| Time | Implementation | Compiler | Benchmark date | SUPERCOP version |
|---|---|---|---|---|
| 220536 | optimized_nonSSE | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161216 | 20161026 |
| 223750 | optimized_nonSSE | gcc -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161216 | 20161026 |
| 229671 | optimized_nonSSE | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161216 | 20161026 |
| 298853 | ref | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161216 | 20161026 |
| 307962 | ref | gcc -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161216 | 20161026 |
| 327340 | optimized_nonSSE | gcc -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161216 | 20161026 |
| 334577 | ref | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161216 | 20161026 |
| 397267 | optimized_nonSSE | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -Os | 20161216 | 20161026 |
| 412125 | optimized_nonSSE | gcc -mcpu=native -mfpu=neon-vfpv4 -Os | 20161216 | 20161026 |
| 433614 | ref | gcc -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161216 | 20161026 |
| 497081 | ref | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -Os | 20161216 | 20161026 |
| 501509 | ref | gcc -mcpu=native -mfpu=neon-vfpv4 -Os | 20161216 | 20161026 |