Business Client needs Software Development
Contact person: Business Client
Location: Prizren, Serbia
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"I have a custom Gym-style environment, Stable-Baselines3 wrappers, and a separate PyTorch implementation, yet the agents still refuse to converge. I’m mainly wrestling with algorithm selection and tuning: right now I rotate between PPO, A3C/A2C, and an in-house DDPG variant, but none deliver reliable learning curves.
What I need is an experienced reinforcement learning practitioner to dive into the codebase, inspect the training loop, replay buffers, loss computations, and any subtle details that could be sabotaging performance. A sharp eye for hyper-parameter schedules, network architectures, and normalization tricks will be invaluable. The goal is to emerge with a clear recommendation on which algorithm fits this environment and a set of tuned parameters that consistently learn.
By the end of the engagement I’d like:
• A concise report (or annotated notebook) explaining what was changed and why, backed by training curves that show stable convergence.
• Updated code snippets or pull-request ready edits that incorporate those changes.
If you’ve wrestled convergence issues in Stable-Baselines3 or custom PyTorch RL pipelines before, I’d love your perspective." (client-provided description)
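For readers unfamiliar with the "hyper-parameter schedules" the description mentions, here is a minimal sketch of a linear learning-rate schedule. It is an illustration only, not the client's code: the helper name `linear_schedule` and the values 3e-4 and 1e-5 are assumptions. It follows the Stable-Baselines3 convention that a schedule is a callable taking `progress_remaining`, which runs from 1.0 at the start of training to 0.0 at the end.

```python
from typing import Callable


def linear_schedule(initial_value: float, final_value: float = 0.0) -> Callable[[float], float]:
    """Build a schedule that maps training progress to a hyper-parameter value.

    The returned callable expects `progress_remaining`, which goes from
    1.0 (start of training) down to 0.0 (end of training).
    """

    def schedule(progress_remaining: float) -> float:
        # Linear interpolation: initial_value at progress 1.0, final_value at 0.0.
        return final_value + progress_remaining * (initial_value - final_value)

    return schedule


# Illustrative values only (assumed, not taken from the project).
lr_schedule = linear_schedule(3e-4, 1e-5)
```

In Stable-Baselines3 a callable like this can typically be passed as the `learning_rate` argument of an algorithm such as PPO in place of a constant float; whether a decaying rate actually helps convergence in this environment would have to be checked against the training curves.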
Matched companies (3)

B2Bcert ISO consultants in Bangalore

El Codamics
