Out of curiosity, I rewrote their prime counter example to use a sieve instead o...

vishal0123 · on March 10, 2023

I tried implementing the same and I am getting 500ms vs 20ms with wrong answer in the first call in taichi but correct in subsequent calls. I guess I found some bug in taichi: https://imgur.com/a/lpK2iVF

Could you share your code as well.

    N = 1000000

    isnotprime = [0] * N

    def count_primes(n: int) -> int:
        count = 0
        for k in range(2, n):
            if isnotprime[k] == 0:
                count += 1
                for l in range(2, n // k):
                    isnotprime[l * k] = 1

        return count

    import taichi as ti
    ti.init(arch=ti.cpu)

    isnotprime = ti.field(ti.i8, shape=(N, ))

    @ti.kernel
    def count_primes(n: ti.i32) -> int:
        count = 0
        for k in range(2, n):
            if isnotprime[k] == 0:
                count += 1
                for l in range(2, n // k):
                    isnotprime[l * k] = 1

        return count

dgacmu · on March 11, 2023

Python:

    import time
    import math
    import numpy as np
    
    N = 1000000
    SN = math.floor(math.sqrt(N))
    sieve = [False] * N
    
    def init_sieve():
        for i in range(2, SN):
            if not sieve[i]:
                k = i*2
                while k < N:
                    sieve[k] = True
                    k += i
            
    
    def count_primes(n: int) -> int:
        return (N-2) - sum(sieve)
    
    start = time.perf_counter()
    init_sieve()
    print(f"Number of primes: {count_primes(N)}")
    print(f"time elapsed: {time.perf_counter() - start}/s")

Taichi:

    import taichi as ti
    import time
    import math
    ti.init(arch=ti.cpu)
    
    N = 1000000
    SN = math.floor(math.sqrt(N))
    sieve = ti.field(ti.i8, shape=N)
    
    @ti.kernel
    def init_sieve():
        for i in range(2, SN):
            if sieve[i] == 0:
                k = i*2
                while k < N:
                    sieve[k] = 1
                    k += i
            
    @ti.kernel
    def count_primes(n: int) -> int:
        count = 0
        for i in range(2, N):
            if (sieve[i] == 0):
                count += 1
        return count
    
    start = time.perf_counter()
    init_sieve()
    print(f"Number of primes: {count_primes(N)}")
    print(f"time elapsed: {time.perf_counter() - start}/s")

(The difference of using 0 vs False is tiny; I had just been poking at the python code to think about how I'd make it more pythonic and see if that made it worse to do taichi)

bombolo · on March 10, 2023

But how fast does it go with pypy and no changes to the code?

v3ss0n · on March 10, 2023

Interested as well

anst · on March 10, 2023

On my old Mac: Python=133.5 s, Numba=2.61 s (parallel prange in count_primes), Taichi=1.8 s. (on ti.cpu, but fails with metal).

garyrob · on March 10, 2023

Might be worthwhile to run the same code with an appropriate numba decorator. My guess is that you'd get at least as much speed up but without having to change the sieve declaration, but I'm not sure.