I'm getting started with OpenAI Gymnasium. My task is to speed up trajectory generation: to create N trajectories, I want to use a single parallel multi-env instead of running the same env N times with resets. I found Gymnasium's vector module and tried it, but it doesn't speed up generation for me; sometimes it's even slower. Here's sample code:
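Vectorized

```python
import gymnasium as gym
import numpy as np

n_envs = 100
vec_envs = gym.vector.make("CartPole-v0", render_mode="rgb_array", num_envs=n_envs)
vec_envs.reset()
n_actions = vec_envs.single_action_space.n  # number of actions of one sub-env
```

```python
%%time
t_steps = 1000
for iter_idx in range(t_steps):
    # One random action per sub-env; sub-envs whose episode ends are reset automatically
    actions_batch = np.random.randint(low=0, high=n_actions, size=(n_envs,))
    new_s, r, terminated, truncated, info = vec_envs.step(actions_batch)
```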
Here %%time gives me:

CPU times: total: 5.8 s
Wall time: 6.84 s
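Non-vectorized

```python
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v0", render_mode="rgb_array")
env.reset()
n_actions = env.action_space.n
```

```python
%%time
t_steps = 1000

def gen_single_session(env, t_steps):
    for iter_idx in range(t_steps):
        action = np.random.randint(low=0, high=n_actions)
        new_s, r, terminated, truncated, info = env.step(action)
        # Mimic the vector env's auto-reset when an episode ends
        if terminated or truncated:
            env.reset()

# 100 sequential sessions of t_steps each: the same workload as the 100-env vector run
for session_idx in range(100):
    gen_single_session(env, t_steps)
    env.reset()
```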
And here %%time gives me:

CPU times: total: 1.33 s
Wall time: 1.3 s
In these lines I mimic the vector env's behavior (when a sub-env's episode ends, it is reset automatically):

```python
if terminated or truncated:
    env.reset()
```
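The auto-reset behavior being mimicked can be sketched without Gymnasium at all. `ToyEnv` and `ToySyncVectorEnv` below are hypothetical stand-ins, not Gymnasium classes, and the convention used here (reset within the same step, keep the pre-reset observation under `info["final_observation"]`) is an assumption modeled on older Gymnasium releases; the exact convention varies between versions. What the sketch does show is that a *synchronous* vector step is still a plain Python loop over the sub-envs in one process, so batching by itself adds no parallelism. An asynchronous vector env instead runs each sub-env in a worker process and pays inter-process communication on every step.

```python
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for a Gymnasium env: the episode terminates after `horizon` steps."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Tuple[int, dict]:
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        # The action is ignored; the observation is just the step counter.
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 1.0, terminated, False, {}


class ToySyncVectorEnv:
    """Hypothetical sketch of a synchronous vector env with auto-reset.

    All sub-envs live in the calling process and are stepped one after another,
    so one batched step costs roughly the sum of the individual steps.
    """

    def __init__(self, env_fns: List[Callable[[], ToyEnv]]):
        self.envs = [fn() for fn in env_fns]

    def reset(self) -> List[int]:
        return [env.reset()[0] for env in self.envs]

    def step(self, actions: List[int]):
        obs, rewards, terminateds, truncateds, infos = [], [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                # Auto-reset: keep the last observation in `info` and return
                # the first observation of the next episode instead.
                info = {**info, "final_observation": o}
                o, _ = env.reset()
            obs.append(o)
            rewards.append(r)
            terminateds.append(terminated)
            truncateds.append(truncated)
            infos.append(info)
        return obs, rewards, terminateds, truncateds, infos


# Usage: two sub-envs whose episodes last 3 steps
vec_env = ToySyncVectorEnv([lambda: ToyEnv(horizon=3) for _ in range(2)])
vec_env.reset()
for _ in range(3):
    obs, rewards, terminateds, truncateds, infos = vec_env.step([0, 0])
# After the third step both episodes ended and were reset
print(obs, terminateds, infos[0])  # [0, 0] [True, True] {'final_observation': 3}
```

The per-env loop is the same work as the sequential `gen_single_session` loop, which is why a synchronous vector env is mainly a batching convenience rather than a speed-up.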
If I switch the env to LunarLander-v2, the runtimes get closer: 13 s vs. 15 s (vectorized vs. non-vectorized, respectively).
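To compare the runs on equal terms, the quoted wall times can be turned into throughput. Both setups execute the same 100 × 1000 = 100,000 environment steps. The helper below is a hypothetical illustration that only uses the numbers reported above; it assumes the LunarLander-v2 run used the same `n_envs` and `t_steps`, and that "respectively" follows the order in which the two setups were presented (vectorized first).

```python
# Hypothetical helper: convert the wall times quoted above into throughput.
# Both runs perform the same workload: 100 envs/sessions x 1000 steps each.
N_ENVS = 100
T_STEPS = 1000
TOTAL_STEPS = N_ENVS * T_STEPS  # 100,000 env steps per run

# Wall times in seconds as (vectorized, non_vectorized). The LunarLander pair
# assumes the same n_envs/t_steps and that "respectively" means vectorized first.
WALL_TIMES = {
    "CartPole-v0": (6.84, 1.3),
    "LunarLander-v2": (13.0, 15.0),
}


def throughput(wall_time_s: float) -> float:
    """Environment steps per second of wall time."""
    return TOTAL_STEPS / wall_time_s


for env_id, (vec_s, seq_s) in WALL_TIMES.items():
    ratio = vec_s / seq_s  # > 1 means the vectorized run was slower
    print(f"{env_id}: vectorized {throughput(vec_s):,.0f} steps/s, "
          f"non-vectorized {throughput(seq_s):,.0f} steps/s, "
          f"time ratio {ratio:.2f}")
```

For CartPole this prints about 14,620 vs. 76,923 steps/s (the vectorized run takes roughly 5.26× as long); for LunarLander it prints about 7,692 vs. 6,667 steps/s (ratio 0.87), so the gap reverses once each step does more work.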
Am I doing something wrong? What's the best way to make it parallel?
source https://stackoverflow.com/questions/77084199/openai-gymnasium-vectorized-approach-works-slower-than-non-vectorized