| Time | Implementation | Compiler | Benchmark date | SUPERCOP version |
| 2017 | ref | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161210 | 20161026 |
| 2032 | ref | gcc -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161210 | 20161026 |
| 2043 | ref | gcc -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161210 | 20161026 |
| 2048 | ref | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161210 | 20161026 |
| 2049 | armneon | gcc -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161210 | 20161026 |
| 2053 | armneon | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161210 | 20161026 |
| 2054 | armneon | gcc -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161210 | 20161026 |
| 2054 | armneon2 | gcc -mcpu=native -mfpu=neon-vfpv4 -Os | 20161210 | 20161026 |
| 2058 | armneon2 | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161210 | 20161026 |
| 2059 | armneon2 | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -Os | 20161210 | 20161026 |
| 2059 | armneon2 | gcc -mcpu=native -mfpu=neon-vfpv4 -O3 | 20161210 | 20161026 |
| 2061 | armneon2 | gcc -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161210 | 20161026 |
| 2063 | armneon2 | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161210 | 20161026 |
| 2067 | armneon | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -O2 | 20161210 | 20161026 |
| 2084 | armneon | gcc -mcpu=native -mfpu=neon-vfpv4 -Os | 20161210 | 20161026 |
| 2094 | armneon | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -Os | 20161210 | 20161026 |
| 2207 | ref | gcc -funroll-loops -mcpu=native -mfpu=neon-vfpv4 -Os | 20161210 | 20161026 |
| 2349 | ref | gcc -mcpu=native -mfpu=neon-vfpv4 -Os | 20161210 | 20161026 |