I have code which heavily uses np.einsum calls. I have installed intel-numpy in a Python virtual environment using:
$ pip install mkl
$ pip install intel-numpy
np.show_config() prints:
blas_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
blas_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
lapack_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
lapack_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/sat_bot/base/conda-bld/numpy_and_dev_1643279478844/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
I have an Intel chip with 16 logical processors (8 cores) in 1 socket. I want to make sure that the python code's np calls use all the cores on my machine. What I have tried:
- Inside the WSL shell I use to run the Python scripts, I set
export MKL_NUM_THREADS=8
and then ran my script normally as python script.py.
- Inside script.py, before importing numpy, I set (a programmatic alternative is sketched right after this list):
import os
os.environ["MKL_NUM_THREADS"] = '8'  # or 16
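Another option would presumably be to set the thread count programmatically rather than through the environment. This is only a sketch and assumes the mkl-service package is installed (pip install mkl-service); as far as I know, plain pip install mkl ships the MKL libraries but not an importable Python module:
import mkl                    # provided by the mkl-service package (assumption: it is installed)
mkl.set_num_threads(8)        # request 8 MKL threads for this process
print(mkl.get_max_threads())  # confirm how many threads MKL reports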
Neither attempt resulted in visible usage of all the cores when I monitor them in the terminal with htop. There is always one core at 100% utilization, and from time to time (when inside an np.einsum call) a few other cores (usually two or three, seemingly picked at random from the remaining ones) rise to about 3% utilization for a few seconds, then drop back to 0%.
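For reference, a quick way to check which BLAS NumPy actually loaded and how many threads it is allowed to use is threadpoolctl (this assumes pip install threadpoolctl):
from pprint import pprint
from threadpoolctl import threadpool_info
import numpy as np            # import NumPy first so its BLAS (MKL) gets loaded
pprint(threadpool_info())     # each entry shows the library (e.g. mkl_rt) and its num_threads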
Does anyone have any idea whether this is normal behavior, or whether I am doing something wrong?
Thank you!
EDIT TO INCLUDE THE ACTUAL np.einsum CALLS (MWE)
import numpy as np

N_thetas = 20
N_rs = 600
N_phis = 18

Legendre_matr = np.random.rand(N_thetas, 2*N_thetas - 1, N_thetas)
after_first_int = np.random.rand(N_rs - 1, 2*N_thetas - 1, N_thetas)
eigenvectors_memory = np.random.rand(N_rs - 1, N_rs - 1, N_thetas)
normaliz = np.random.rand(N_rs - 1)
Cilmoft = np.random.rand(2*N_thetas - 1, N_rs - 1, N_thetas)
Psi_m = np.random.rand(2*N_thetas - 1, N_rs - 1, N_thetas)  # was gl.N_thetas, undefined in this MWE
arange_for_m_values = np.arange(-N_thetas + 1, N_thetas)
phis = np.linspace(0.0, 2*np.pi, N_phis)

after_second_int = np.einsum('ijk,rjk->rij', Legendre_matr, after_first_int, optimize=True)
Psi_lm = np.einsum('jkl,j,mkl->jml', eigenvectors_memory, normaliz, Cilmoft, optimize=True)
Psi_on_grid_from_Cilmoft = np.einsum('xjk,xn->jkn', Psi_m, np.exp(1j * np.outer(arange_for_m_values, phis)), optimize=True)
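My (possibly wrong) understanding is that np.einsum only hands a contraction step to MKL's multithreaded matrix-multiplication routines when that step can be rewritten as a plain matrix product; steps that keep a batch index (like j in the first call above) may instead run in NumPy's own single-threaded einsum loop, which would match the single busy core I see. np.einsum_path can at least show how each call is decomposed, e.g. for the first contraction:
path, desc = np.einsum_path('ijk,rjk->rij', Legendre_matr, after_first_int, optimize='optimal')
print(desc)  # prints the chosen contraction order and estimated FLOP counts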
source https://stackoverflow.com/questions/73168829/usage-of-all-cores-in-numpy-einsum