usage of all cores in numpy einsum

I have code that relies heavily on np.einsum calls. I have installed intel-numpy in a Python virtual environment using:

$ pip install mkl
$ pip install intel-numpy

np.show_config() prints:

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL

I have an Intel chip with 16 logical processors (8 cores) in 1 socket. I want to make sure that the python code's np calls use all the cores on my machine. What I have tried:

Inside the WSL shell I use to run the Python scripts, I ran export MKL_NUM_THREADS=8 and then ran my script normally with python script.py.
Inside script.py, before importing numpy, I wrote:

import os
os.environ["MKL_NUM_THREADS"] = '8'  # or 16

Neither attempt resulted in visible usage of all the cores when I monitor them in the terminal using htop.

There is always one core at 100% utilization. From time to time, during an np.einsum call, two or three other cores (seemingly picked at random from those other than the one pinned at 100%) rise to about 3% utilization for a few seconds and then drop back to 0%.
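
One way to separate "MKL threading is not active at all" from "np.einsum is not reaching the threaded code path" might be to time a plain, large matrix product while watching htop. The following is only a minimal sketch: the helper name time_matmul, the matrix size, and the repeat count are arbitrary choices for illustration, and it assumes the environment variable has to be set before numpy is first imported.

```python
import os

# Assumption: the variable must be set before numpy (and hence MKL) is imported.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("MKL_NUM_THREADS", "8")

import time
import numpy as np


def time_matmul(n=1000, repeats=3):
    """Return the best wall-clock time in seconds for an n x n matrix product."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # a large 2-D product is normally handed to the BLAS library
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_matmul()
print(f"MKL_NUM_THREADS={os.environ['MKL_NUM_THREADS']}, best matmul time: {elapsed:.3f} s")
```

If several cores become busy during this product but not during the einsum calls, the thread setting itself is probably being honored, and the question narrows to how those particular contractions are evaluated.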

Does anyone know whether this is normal behavior, or am I doing something wrong?

Thank you!

EDIT TO INCLUDE THE ACTUAL np.einsum CALLS (MWE)

import numpy as np

N_thetas = 20
N_rs = 600
N_phis = 18

Legendre_matr = np.random.rand(N_thetas,  2*N_thetas-1,  N_thetas)
after_first_int = np.random.rand(N_rs - 1, 2*N_thetas - 1, N_thetas)
eigenvectors_memory = np.random.rand(N_rs - 1, N_rs - 1, N_thetas)
normaliz = np.random.rand(N_rs - 1, )
Cilmoft = np.random.rand(2*N_thetas-1, N_rs-1, N_thetas)
Psi_m = np.random.rand(2*N_thetas - 1, N_rs - 1, N_thetas)
arange_for_m_values = np.arange(-N_thetas+1, N_thetas)
phis = np.linspace(0.0, 2*np.pi, N_phis)

after_second_int = np.einsum('ijk,rjk->rij', Legendre_matr, after_first_int, optimize=True)

Psi_lm = np.einsum('jkl,j,mkl->jml', eigenvectors_memory, normaliz, Cilmoft, optimize=True)

Psi_on_grid_from_Cilmoft = np.einsum('xjk,xn->jkn', Psi_m, np.exp(1j * np.outer(arange_for_m_values, phis)), optimize=True)
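
For comparison, the first contraction above can also be written as a batched matrix product, treating j as the batch axis and k as the contracted axis. This is a sketch rather than a recommended fix: the names lhs, rhs, reference, and via_matmul are introduced here for illustration. np.matmul can hand the per-batch products to the BLAS library when the memory layout allows it, but whether that happens here, and whether matrices this small benefit from several threads, is something to measure rather than assume.

```python
import numpy as np

N_thetas, N_rs = 20, 600
rng = np.random.default_rng(0)
Legendre_matr = rng.random((N_thetas, 2*N_thetas - 1, N_thetas))    # axes (i, j, k)
after_first_int = rng.random((N_rs - 1, 2*N_thetas - 1, N_thetas))  # axes (r, j, k)

# Reference result from the original einsum call, axes (r, i, j).
reference = np.einsum('ijk,rjk->rij', Legendre_matr, after_first_int, optimize=True)

# Same contraction as a stack of 2-D products: one (r, k) @ (k, i) product per j.
lhs = after_first_int.transpose(1, 0, 2)  # (j, r, k)
rhs = Legendre_matr.transpose(1, 2, 0)    # (j, k, i)
via_matmul = np.matmul(lhs, rhs).transpose(1, 2, 0)  # (j, r, i) -> (r, i, j)

# Check that both formulations agree numerically.
print(np.allclose(reference, via_matmul))
```

Timing both versions with the same MKL_NUM_THREADS setting would show whether the formulation makes any difference on a given machine.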

source https://stackoverflow.com/questions/73168829/usage-of-all-cores-in-numpy-einsum
