Accelerate Python with Taichi

Python has become the most popular language in many rapidly evolving fields, such as deep learning and data science. Yet its readability comes at the cost of performance. Slow programs have many causes, and Python should certainly not take all the blame. Still, it’s fair to say that Python’s nature as an interpreted language does not help, especially in computation-intensive scenarios (e.g., multiple nested for loops).

This notebook is adapted from a blog post by Yuanming Hu, the creator of Taichi. One of the most notable advantages Taichi delivers is speeding up Python code.

To install Taichi, activate your environment and run the following command:

pip install taichi

To use Taichi, import the package with the following command:

import taichi as ti

Count the number of primes

Large-scale or nested for loops in Python often lead to poor runtime performance. The following demo counts the primes below a given bound n and involves nested for loops. By importing Taichi and decorating the two functions, and optionally switching to one of Taichi’s GPU backends, you will see a significant boost in overall performance.

"""Count the prime numbers in the range [2, n)."""
import time

# Checks if a positive integer is a prime number
def is_prime(n: int) -> bool:
    result = True

    # Traverses the range between 2 and sqrt(n)
    # - Returns False if n can be divided by one of them;
    # - otherwise, returns True
    for k in range(2, int(n ** 0.5) + 1):
        if n % k == 0:
            result = False
            break

    return result

# Traverses the range between 2 and n
# Counts the primes according to the return of is_prime()
def count_primes(n: int) -> int:
    count = 0
    for k in range(2, n):
        if is_prime(k):
            count += 1

    return count

t_start = time.perf_counter()
print(count_primes(1000000))
t_end = time.perf_counter()
print(f"Execution time: {t_end - t_start:.4f} seconds")
78498
Execution time: 2.1988 seconds
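
As an independent check on the result above, the same count can be reproduced with a different algorithm. The sketch below swaps trial division for a sieve of Eratosthenes; the function name `count_primes_sieve` is introduced here for illustration and is not part of the original demo. It counts over the same range as `count_primes()`, i.e. 2 up to but not including n.

```python
def count_primes_sieve(n: int) -> int:
    """Count the primes p with 2 <= p < n using a sieve of Eratosthenes."""
    if n < 3:
        return 0

    # is_composite[m] == 1 once m is known to be a multiple of a smaller prime
    is_composite = bytearray(n)
    for p in range(2, int(n ** 0.5) + 1):
        if not is_composite[p]:
            # Mark p*p, p*p + p, ... as composite in one slice assignment
            is_composite[p * p :: p] = b"\x01" * len(range(p * p, n, p))

    # n - 2 candidates (2 .. n-1) minus those marked composite
    return n - 2 - sum(is_composite[2:])


print(count_primes_sieve(1_000_000))  # 78498, matching the loop-based count
```

The sieve is a different algorithm rather than a faster way of running the same loops, so it is useful for validating the count, not for comparing against the Taichi timing.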

Now, let’s change the code a bit: import Taichi into your Python code and initialize it with the CPU backend.

import taichi as ti
ti.init(arch=ti.cpu)
[Taichi] version 1.7.3, llvm 15.0.1, commit 5ec301be, win, python 3.12.9
[Taichi] Starting on arch=x64

Decorate is_prime() with @ti.func and count_primes() with @ti.kernel.

Note

Taichi’s compiler compiles the Python code decorated with @ti.kernel and @ti.func for different devices, such as CPUs and GPUs, for high-performance computation.

@ti.func
def is_prime(n: int):
    result = True
    for k in range(2, int(n ** 0.5) + 1):
        if n % k == 0:
            result = False
            break
        
    return result

@ti.kernel
def count_primes(n: int) -> int:
    count = 0
    for k in range(2, n):
        if is_prime(k):
            count += 1

    return count

t_start = time.perf_counter()
print(count_primes(1000000))
t_end = time.perf_counter()
print(f"Execution time: {t_end - t_start:.4f} seconds")
78498
Execution time: 0.1199 seconds
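
One caveat when reading this timing: a Taichi kernel is compiled just in time on its first call, so a single measurement mixes compilation and execution. The sketch below times the first call separately from later calls. The helper `time_calls` and the stand-in workload `demo_work` are hypothetical names introduced for illustration; the workload is plain Python so the snippet runs without Taichi, and in the notebook you would pass `count_primes` instead.

```python
import time


def time_calls(fn, *args, repeats=3):
    """Return (first_call_seconds, best_later_seconds) for fn(*args)."""
    # First call: for a Taichi kernel this includes JIT compilation
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0

    # Later calls: keep the fastest of `repeats` runs
    later = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        later.append(time.perf_counter() - t0)

    return first, min(later)


def demo_work(n):
    """Stand-in workload (assumption): sum of squares below n."""
    return sum(i * i for i in range(n))


first, best = time_calls(demo_work, 100_000)
print(f"first call: {first:.4f} s, best of later calls: {best:.4f} s")
```

For a plain Python function the two numbers should be close; for a Taichi kernel the first call is expected to be noticeably slower than the later ones.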

Exercise

  1. Increase \(n\) tenfold to 10,000,000 and rerun both versions of the code. What is the speed-up?

  2. Change Taichi’s backend from CPU to GPU (ti.init(arch=ti.gpu)) and rerun. What is the speed-up?
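
For reference when answering these questions, speed-up is taken here as the ratio of the baseline time to the accelerated time. The snippet below applies that definition to the two timings reported above for n = 1,000,000; the variable names are chosen for illustration, and the numbers will differ on your machine.

```python
t_python = 2.1988  # seconds, pure-Python run reported above
t_taichi = 0.1199  # seconds, Taichi CPU run reported above

speedup = t_python / t_taichi
print(f"Speed-up: {speedup:.1f}x")  # Speed-up: 18.3x
```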

2D Diffusion

Import the required libraries and initialize Taichi on the CPU backend with 64-bit floats as the default floating-point type.

# Import the required libraries
import time
import taichi as ti
ti.init(arch=ti.cpu,
        default_fp=ti.f64)
[Taichi] Starting on arch=x64

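Isolate the code responsible for heavy computation (loops) and enclose it in a Taichi kernel. The kernel reads the global parameters (nx, ny, k, h, ...) and the fields T and Told that are defined in the main routine further down; because Taichi compiles a kernel on its first call, they only need to exist by then.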
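
The update formulas in the kernel can be read as a steady-state flux balance over each cell. The following is a sketch reconstructed from the code, not taken from the original derivation, and it assumes a uniform grid with \(\Delta x = \Delta y\) (true for the parameters used here). The symbols \(a\), \(b\), \(T_P\) (cell value) and \(T_W, T_E, T_S, T_N\) (neighbours at i-1, i+1, j-1, j+1) are introduced for illustration; \(A\) stands for `area`, \(T_\infty\) for `Tinf` and \(T_n\) for `Tn`.

```latex
\begin{align}
& a = \frac{k A}{\Delta x}, \qquad b = \frac{A}{1/h + \Delta x/(2k)} \\
\text{interior:}\quad & a\,(T_W + T_E + T_S + T_N - 4\,T_P) = 0
  \;\Rightarrow\; T_P = \tfrac{1}{4}\,(T_W + T_E + T_S + T_N) \\
\text{left (heat flux } q\text{):}\quad & T_P = \frac{a\,(T_E + T_S + T_N) + q A}{3a} \\
\text{bottom (convection):}\quad & T_P = \frac{a\,(T_W + T_E + T_N) + b\,T_\infty}{3a + b} \\
\text{top (fixed } T_n\text{):}\quad & T_P = \frac{a\,(T_W + T_E + T_S) + 2a\,T_n}{5a}
\end{align}
```

The factor \(2a\) on the top edge reflects that the fixed-temperature boundary lies half a cell from the cell centre, and \(b\) combines the convective resistance \(1/h\) with the half-cell conduction resistance \(\Delta x/(2k)\) in series. The corner formulas combine two of these boundary terms in the same way.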

@ti.kernel
def fvm_iteration() -> ti.f64:
    # copy the current temperature field to the placeholder field
    for i, j in ti.ndrange(nx, ny):
        Told[i, j] = T[i, j]

    # loop over the grid points
    for i, j in ti.ndrange(nx, ny):
        # left-bottom corner
        if i == 0 and j == 0:
            T[i, j] = ((k*area/dx)*Told[i+1, j] + (k*area/dx)*Told[i, j+1] + q*area + area*Tinf/(1/h + dx/(2*k))) / (2*k*area/dx + area/(1/h + dx/(2*k)))
        # right-bottom corner
        elif i == nx-1 and j == 0:
            T[i, j] = ((k*area/dx)*Told[i-1, j] + (k*area/dx)*Told[i, j+1] + area/(1/h + dx/(2*k))*Tinf) / (2*k*area/dx + area/(1/h + dx/(2*k)))
        # left-top corner
        elif i == 0 and j == ny-1:
            T[i, j] = ((k*area/dx)*Told[i+1, j] + (k*area/dx)*Told[i, j-1] + (q*area + 2*k*area/dx*Tn)) / (4*k*area/dx)
        # right-top corner
        elif i == nx-1 and j == ny-1:
            T[i, j] = ((k*area/dx)*Told[i-1, j] + (k*area/dx)*Told[i, j-1] + (2*k*area/dx*Tn)) / (4*k*area/dx)
        # left boundary
        elif i == 0:
            T[i, j] = ((k*area/dx)*Told[i+1, j] + (k*area/dx)*Told[i, j-1] + (k*area/dx)*Told[i, j+1] + q*area) / (3*k*area/dx)
        # right boundary
        elif i == nx-1:
            T[i, j] = ((k*area/dx)*Told[i-1, j] + (k*area/dx)*Told[i, j-1] + (k*area/dx)*Told[i, j+1]) / (3*k*area/dx)
        # bottom boundary
        elif j == 0:
            T[i, j] = ((k*area/dx)*Told[i-1, j] + (k*area/dx)*Told[i+1, j] + (k*area/dx)*Told[i, j+1] + (area*Tinf/(1/h + dx/(2*k)))) / (3*k*area/dx + area/(1/h + dx/(2*k)))
        # top boundary
        elif j == ny-1:
            T[i, j] = ((k*area/dx)*Told[i-1, j] + (k*area/dx)*Told[i+1, j] + (k*area/dx)*Told[i, j-1] + (2*k*area/dx*Tn)) / (5*k*area/dx)
        # internal nodes
        else:
            T[i, j] = 0.25 * (Told[i-1, j] + Told[i+1, j] + Told[i, j-1] + Told[i, j+1])
    
    # calculate the temperature difference
    Tdiff = 0.0
    for i, j in ti.ndrange(nx, ny):
        Tdiff += ti.abs(T[i, j] - Told[i, j])

    return Tdiff
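
To make the structure of the kernel easier to see, here is a simplified pure-Python sketch of the same three steps: copy the field, update every cell from the copy, and sum the absolute change. It is not the notebook's solver. It keeps only the interior-node formula, holds the boundary cells at a fixed value, and uses nested lists instead of Taichi fields; the function name `jacobi_sweep` and the test grid are assumptions made for illustration.

```python
def jacobi_sweep(T):
    """Run one sweep over the interior cells; return the summed absolute change."""
    nx, ny = len(T), len(T[0])

    # Step 1: snapshot the current field (plays the role of Told)
    T_old = [row[:] for row in T]

    # Steps 2 and 3: update from the snapshot and accumulate the change
    diff = 0.0
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            T[i][j] = 0.25 * (T_old[i - 1][j] + T_old[i + 1][j]
                              + T_old[i][j - 1] + T_old[i][j + 1])
            diff += abs(T[i][j] - T_old[i][j])
    return diff


# Hypothetical test grid: boundary cells fixed at 100, interior starts at 0
nx, ny = 5, 6
T = [[100.0 if i in (0, nx - 1) or j in (0, ny - 1) else 0.0
      for j in range(ny)] for i in range(nx)]

diff, sweeps = 1.0, 0
while diff > 1e-3:
    diff = jacobi_sweep(T)
    sweeps += 1

print(f"sweeps: {sweeps}, center value: {T[nx // 2][ny // 2]:.3f}")
```

Reading from a snapshot rather than updating in place is what allows the Taichi version to run the grid loop in parallel: no cell's new value depends on another cell's new value.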

Main routine: declare the parameters, allocate the Taichi fields, and call the kernel until the solution converges.

# Parameter declarations
lx = 0.3                                # length of the plate
ly = 0.4                                # height of the plate
nx = 3                                  # number of grid points in x-direction
ny = round(ly/lx*nx)                    # number of grid points in y-direction
dx = lx/nx                              # grid spacing in x-direction
dy = ly/ny                              # grid spacing in y-direction
thickness = 0.01                        # plate thickness (kept separate from h, the convective coefficient below)
area = thickness*dx                     # flux area

k = 1000                                # coefficient for heat conduction
q = 500000                              # heat flux at the west boundary
Tinf = 200                              # ambient temperature in the south
h = 253.165                             # convective heat transfer coefficient at the southern edge
Tn = 100                                # constant temperature at the northern edge

# Set initial condition (Note the order of nx and ny)
T = ti.field(dtype=ti.f64, shape=(nx, ny))    # a taichi field with all elements equal to zero

# Finite volume calculations
Told = ti.field(dtype=ti.f64, shape=(nx, ny)) # placeholder field to advance the solution
Tdiff = 1                               # temperature difference for convergence
cnt = 0                                 # counter for the number of iterations
t_start = time.perf_counter()           # start time for the simulation

while Tdiff > 1e-3:                     # loop until the difference is less than 1e-3
    cnt += 1                            # increment the counter
    Tdiff = fvm_iteration()             # calculate the temperature difference

    if cnt % 10 == 0:                   # print every 10 iterations
        print('Iteration {}: Tdiff = {:.4f}'.format(cnt, Tdiff))

# Stop the timer and print the iteration results
t_end = time.perf_counter()
print('******************************************')
print('Final temperature difference: {:.4f}'.format(Tdiff))
print('Number of iterations: {}'.format(cnt))
print('Elapsed time: {:.3f} seconds'.format(t_end - t_start))
print('The temperature at the plate center is {:.4f} degree Celsius.'.format(0.5*(T[nx//2, ny//2] + T[nx//2, ny//2-1])))
Iteration 10: Tdiff = 71.3782
Iteration 20: Tdiff = 39.7494
Iteration 30: Tdiff = 22.2378
Iteration 40: Tdiff = 12.4416
Iteration 50: Tdiff = 6.9609
Iteration 60: Tdiff = 3.8945
Iteration 70: Tdiff = 2.1789
Iteration 80: Tdiff = 1.2190
Iteration 90: Tdiff = 0.6820
Iteration 100: Tdiff = 0.3816
Iteration 110: Tdiff = 0.2135
Iteration 120: Tdiff = 0.1194
Iteration 130: Tdiff = 0.0668
Iteration 140: Tdiff = 0.0374
Iteration 150: Tdiff = 0.0209
Iteration 160: Tdiff = 0.0117
Iteration 170: Tdiff = 0.0065
Iteration 180: Tdiff = 0.0037
Iteration 190: Tdiff = 0.0020
Iteration 200: Tdiff = 0.0011
******************************************
Final temperature difference: 0.0010
Number of iterations: 203
Elapsed time: 0.107 seconds
The temperature at the plate center is 193.1574 degree Celsius.
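
The log above suggests that Tdiff shrinks by a nearly constant factor between printed lines, which is typical of a fixed-point iteration like this one. The snippet below is a small check of that observation using the first five logged values; the variable names are chosen for illustration. It estimates the reduction factor per 10 iterations and the implied factor per single iteration.

```python
# Tdiff values printed at iterations 10, 20, 30, 40, 50 in the log above
tdiff_log = [71.3782, 39.7494, 22.2378, 12.4416, 6.9609]

# Ratio between consecutive printed values (10 iterations apart)
ratios = [later / earlier for earlier, later in zip(tdiff_log, tdiff_log[1:])]

# Implied per-iteration factor, assuming geometric decay within each interval
per_iteration = [r ** 0.1 for r in ratios]

for r, p in zip(ratios, per_iteration):
    print(f"per 10 iterations: {r:.4f}, per iteration: {p:.4f}")
```

A per-iteration factor close to 1 means slow convergence, and the factor generally moves closer to 1 as the grid is refined, which is worth keeping in mind if you increase nx.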

Exercise

  1. Explore the Taichi website.