Very nice work! Have you tried this on a many-core CPU, say with 32 or 64 cores? If so, does the performance scale well?