Business Client needs Web Development
Contact person: Business Client
Location: New Delhi, India
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"I’m leading a team that already fine-tunes Hugging Face models but we’re stalled on the last mile: turning those checkpoints into WebLLM artefacts that run smoothly inside the browser through WebGPU/WebAssembly. I need a short-term partner who has actually walked this path before and can sit virtually with us, show exactly how to compile a model into the WebLLM format, debug any hiccups, and prove the result works in-browser with stable latency.
What I expect from you
• A step-by-step script or notebook that converts a standard HF model (think Llama-2, GPT-J, BLOOM or similar) into WebLLM format.
• Clear explanation of the conversion tools, flags and weight slicing decisions you use, so my engineers can repeat the process later without you.
• A minimal demo web page (TypeScript or vanilla JS is fine) that loads the converted model, allocates buffers correctly, and serves a prompt via the WebGPU back end.
• Performance metrics (tokens/s, memory footprint) captured on at least one consumer-grade GPU so we can compare.
Acceptance criteria
1. The model loads in an evergreen Chromium-based browser with no console errors.
2. First-token latency ≤ 3 s and sustained generation speed comparable to your benchmark notes.
3. Full reproducibility on our hardware following your instructions.
We already have the fine-tuned weights and a dev environment in place; I simply need your expertise to unblock compilation and browser inference. If you have prior commits or public demos with WebLLM, please share a link when you respond so we can hit the ground running." (client-provided description)
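For respondents, one way the latency and throughput figures requested above could be captured is sketched here in TypeScript. This is only an illustration under stated assumptions: the names `GenerationMetrics`, `measureGeneration` and `meetsFirstTokenBudget` are hypothetical and are not part of WebLLM or of the client's codebase, the token stream is modelled as a plain `AsyncIterable<string>`, and memory footprint is not measured.

```typescript
// Hypothetical helper: these names are illustrative, not a WebLLM API.
export interface GenerationMetrics {
  firstTokenMs: number;    // time from request start to the first token
  tokensPerSecond: number; // sustained rate measured after the first token
  tokenCount: number;      // total tokens observed in the stream
}

// Consumes any async stream of tokens and records timing with the given clock.
// The clock is injectable so the function can be exercised without a browser.
export async function measureGeneration(
  tokens: AsyncIterable<string>,
  now: () => number = () => performance.now(),
): Promise<GenerationMetrics> {
  const start = now();
  let firstTokenAt: number | null = null;
  let lastTokenAt = start;
  let tokenCount = 0;

  for await (const _token of tokens) {
    lastTokenAt = now();
    if (firstTokenAt === null) firstTokenAt = lastTokenAt;
    tokenCount += 1;
  }

  if (firstTokenAt === null) {
    throw new Error("stream produced no tokens");
  }

  // Exclude the first token so prefill time does not distort the decode rate.
  const decodeSeconds = (lastTokenAt - firstTokenAt) / 1000;
  const tokensPerSecond =
    decodeSeconds > 0 ? (tokenCount - 1) / decodeSeconds : 0;

  return { firstTokenMs: firstTokenAt - start, tokensPerSecond, tokenCount };
}

// Checks the listing's first-token criterion (3 s by default).
export function meetsFirstTokenBudget(
  metrics: GenerationMetrics,
  budgetMs = 3000,
): boolean {
  return metrics.firstTokenMs <= budgetMs;
}
```

In a real demo page the stream would presumably come from the in-browser engine's streaming output, adapted so that each yielded item corresponds to one generated token; that adapter is left out here because its shape depends on the library version in use.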
Matched companies (4)

Versasia Infosoft

FlowLabs

Appeonix Creative Lab
