Open-Source AI Voice Assistant Development: Web Development needed
Contact person: Open-Source AI Voice Assistant Development
Location: Cape Town, South Africa
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"Real-Time Open-Source AI Assistant with Voice (mobile & web) - URGENT (6 days)
I want to ship a visually polished, lightning-fast personal voice assistant with text and voice that relies only on open-source or freely accessible services and runs entirely in real time. The core tech stack is already chosen (I am open to better open-source alternatives):
• LiveKit will handle the bi-directional audio stream.
• Speech-to-text must use Groq Whisper Large for immediate transcriptions.
• Chat logic will sit on Groq’s Llama 3.3-70B Versatile endpoint, with my fine-tuned KimiR2 weights ready to drop in.
• Text-to-speech is Kokoro (fully OpenAI TTS-compatible) and needs to surface several selectable voices on the fly.
Your task is to wire these pieces into a single, production-ready application that I can demo end-to-end: hot-word detection, low-latency streaming, live transcription, contextual Llama replies, and instant playback. Everything must feel seamless and snappy.
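Conceptually, one user turn is a three-stage pipeline: speech-to-text, chat completion, text-to-speech. A minimal Python sketch of that flow, with the three network services stubbed out as injected callables (the stage wiring follows the brief; the function names are illustrative, not the real repo's API):

```python
# Conceptual round trip: speech -> text -> reply -> speech.
# In the real app each stage is a network call:
#   stt: Groq's hosted whisper-large-v3 transcription endpoint
#   llm: Groq's llama-3.3-70b-versatile chat completion endpoint
#   tts: a Kokoro server speaking the OpenAI-compatible TTS API
from typing import Callable

def assistant_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one user utterance through the three-stage pipeline."""
    text = stt(audio)    # live transcription
    reply = llm(text)    # contextual Llama reply
    return tts(reply)    # audio for instant playback
```

In production the callables would wrap the respective HTTP clients and the audio would stream through LiveKit rather than arrive as a complete buffer; the composition itself is what keeps round-trip latency down to the sum of the three stages.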
I need this ASAP, so I’m leaning on people who have shipped similar voice or streaming AI projects before. When you reply, link me directly to past work that proves you can integrate LiveKit or comparable low-latency pipelines and large-model inference.
Deliverables (all required for sign-off)
– A runnable repo with clear setup instructions (Docker or simple script).
– Real-time assistant demo showing a <200 ms round-trip from end of speech to voice reply.
– Configurable voice list sourced from Kokoro.
– Readme covering model/API keys, environment variables, and how to swap voices or models without code changes.
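The last deliverable reduces to reading model and voice choices from the environment instead of hard-coding them. A minimal sketch, assuming variable names like STT_MODEL and TTS_VOICE (the names and the default voice are placeholders, not an agreed contract):

```python
import os
from dataclasses import dataclass

@dataclass
class AssistantConfig:
    stt_model: str
    llm_model: str
    tts_voice: str

def load_config(env=os.environ) -> AssistantConfig:
    # Defaults match the stack named in the brief; override via env vars
    # to swap voices or models without code changes.
    return AssistantConfig(
        stt_model=env.get("STT_MODEL", "whisper-large-v3"),
        llm_model=env.get("LLM_MODEL", "llama-3.3-70b-versatile"),
        tts_voice=env.get("TTS_VOICE", "af_bella"),
    )
```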
If you have a faster approach that still keeps everything open source, feel free to propose it—speed is the only hard constraint.
Core Functionality & How It Works
Here’s how the app operates at a conceptual and system level:
1. Client-Server Architecture
The client app handles the UI, input (text, voice, images), and rendering of responses.
When you send a prompt (text, voice, image), the app forwards it to the provider's servers.
The server runs the language model, possibly combining it with tool calls (search, plugins, vision modules, etc.), then returns a response.
The app displays the response (text, voice, image) to you.
The heavy computation (model inference, tool orchestration) happens in the cloud, not on your device.
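That split can be sketched as two tiny functions, one per side of the wire (names and transport are illustrative; a real app would use HTTP or WebSockets):

```python
# Minimal client/server split: the client only formats and renders;
# the server owns model inference and tool orchestration.

def server_handle(prompt: str, infer) -> dict:
    # Heavy computation (model inference, tool calls) lives here, in the cloud.
    reply = infer(prompt)
    return {"role": "assistant", "content": reply}

def client_send(prompt: str, transport) -> str:
    # The app forwards the prompt and renders whatever comes back.
    response = transport(prompt)
    return response["content"]
```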
2. Prompting + Tooling
Your “prompt” is any input you send (question, command, image, etc.). The model interprets it, generates a response, and possibly invokes “tools” (e.g. web search, document analysis, plugin APIs) as needed.
The system uses safety and moderation filters to prevent disallowed content.
In some cases, the app or backend uses “memory” (i.e. context saved across sessions) to remember details you ask it to, so future interactions are more personalized.
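The tool loop described here can be sketched as a dispatch step between two model calls (the decision format and the tool registry below are illustrative, not any vendor's real API):

```python
# The model's reply either answers directly or requests a tool call;
# the backend executes the tool and feeds the result back to the model.

TOOLS = {
    "search": lambda q: f"top result for {q!r}",
}

def run_turn(prompt: str, decide):
    decision = decide(prompt)  # model output: a direct answer or a tool request
    if decision.get("tool"):
        result = TOOLS[decision["tool"]](decision["args"])
        # Second model call sees the tool result appended to the prompt.
        return decide(f"{prompt}\n[tool result] {result}")["answer"]
    return decision["answer"]
```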
3. Multimodal Input and Output
Text: You type a prompt and get a text response (classic chat).
Voice / Speech: You can speak your question and have ChatGPT speak back (i.e. the app uses speech recognition and text-to-speech).
Image Input: You can upload or take photos; ChatGPT can analyze them (e.g. “what’s in this image?”, “read text from this image”, “identify objects”).
Image Output / Generation: The app supports generating images (from text prompts or modifying existing images). In 2025, ChatGPT uses GPT-4o for image generation / transformation.
4. History, Memory, & Context
Your past conversations (history) are saved so you can scroll back and refer to them later across devices.
The “memory” feature lets the system remember facts you tell it (preferences, personal details) and use them in future chats to make the experience more personalized. You generally have control to disable memory.
You can also run “incognito” or “temporary” chats in which history is not saved. (This mode is useful when you don’t want certain conversations to persist.)
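Those three behaviours — persistent history, optional memory, and temporary chats — can be modelled in a few lines (a toy in-memory sketch, not how any production backend stores data):

```python
# History persists unless the chat is temporary; "memory" facts are
# stored separately and can be disabled by the user.

class ChatSession:
    def __init__(self, temporary=False, memory_enabled=True):
        self.temporary = temporary
        self.memory_enabled = memory_enabled
        self.history = []   # past turns, scrollable later
        self.memory = {}    # facts carried into future chats

    def add_turn(self, user: str, assistant: str):
        if not self.temporary:      # temporary/incognito chats save nothing
            self.history.append((user, assistant))

    def remember(self, key: str, value):
        if self.memory_enabled:     # user can disable memory entirely
            self.memory[key] = value
```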
5. Plugins, Tools & Extensions
The ChatGPT app supports “tools” or “plugins” that let it extend its capabilities—for instance, to access the web (real-time search), call external APIs, fetch data, or interact with external systems.
In 2024, “ChatGPT Search” was introduced, allowing real-time web lookups to improve the freshness of responses.
There are specialized agent modes (e.g. “Deep Research”) that allow ChatGPT to autonomously browse and compile reports using web sources.
6. Subscription Tiers & Limits
There is a free tier and paid subscriptions (Plus, Pro, Enterprise, etc.). Paid tiers unlock more features (higher usage limits, more advanced models, priority access, extended tool usage).
Some features may be limited or gated depending on your subscription; for instance, image-generation quotas may be tighter on free accounts, and advanced voice or video features may be premium-only.
The system may throttle or prioritize paid users during peak load times.
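A toy in-memory sketch of tier quotas and peak-load prioritization (the limits are placeholders; real quotas are not published):

```python
# Per-tier request limits; placeholder numbers, not real quotas.
LIMITS = {"free": 10, "plus": 100, "pro": 1000}

class QuotaTracker:
    def __init__(self):
        self.used = {}  # user id -> requests consumed

    def allow(self, user: str, tier: str, peak_load: bool = False) -> bool:
        if peak_load and tier == "free":
            return False                # paid users get priority at peak
        used = self.used.get(user, 0)
        if used >= LIMITS[tier]:
            return False                # over the tier's quota
        self.used[user] = used + 1
        return True
```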
grio-ai-app/
├── backend/
│ ├── src/
│ │ ├── [login to view URL] # Firebase auth
│ │ ├── [login to view URL] # Redis quota management
│ │ ├── [login to view URL] # Groq + Kokoro processing
│ │ ├── [login to view URL] # Stable Diffusion gen
│ │ └── [login to view URL] # Express server
│ ├── [login to view URL]
│ └── .[login to view URL]
├── frontend/
│ ├── [login to view URL] # Main app component
│ ├── src/
│ │ ├── components/
│ │ │ ├── [login to view URL] # LiveKit voice
│ │ │ ├── [login to view URL] # AI chat UI
│ │ │ └── [login to view URL] # Image gen UI
│ │ ├── hooks/
│ │ │ └── [login to view URL] # Connection hook
│ │ └── utils/
│ │ └── [login to view URL] # Backend calls
│ ├── [login to view URL]
│ └── [login to view URL]
└── [login to view URL]" (client-provided description)
Matched companies (6)
Codetreasure Co
SJ Solutions & Infotech
HJP Media
Appsdiary Technologies
El Codamics