Kamil Lee needs IT Projects
Contact person: Kamil Lee
Location: Remote Cooperation
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"I need help building realistic, terminal-based STEM research tasks used to evaluate frontier AI models (GPT, Gemini, etc.).

What you'll build:
A self-contained coding task that looks like real research work (analyzing datasets, running simulations, validating hypotheses, comparing methods). Not a textbook problem.

Each submission must include:
- instruction.md (workflow, inputs, outputs, success criteria)
- Reproducible Docker environment with data
- Oracle solution (solve.sh) that fully solves the task
- Deterministic tests for verification
- task.toml metadata
All packaged into one zip.

Quality bar:
- Multi-step, research-grade workflow
- Hard enough that frontier models fail more than 80% of the time
- Oracle passes local tests 3 out of 3 times
- Objectively verifiable outputs
- No LLM-generated content allowed

Who's a fit:
STEM background (biology, chemistry, physics, ML, data science, etc.) with strong Python and Docker skills.

Payout: $100 per accepted submission." (client-provided description)
Additional information: "No description" (admin-provided information)
Matched companies (2)

Crystal Infoway
