Open-Source AI Voice Assistant Development needs Web Development

Contact person: Open-Source AI Voice Assistant Development

Location: Cape Town, South Africa

Budget: Recommended by industry experts

Time to start: As soon as possible

Project description:
"Real-Time Open-Source AI Assistant with Voice (mobile & web) - URGENT (6 days)

I want to ship a visually lightning-fast personal voice assistant, supporting both text and voice, that relies only on open-source or freely accessible services and runs entirely in real time. The core tech stack is already chosen (though I am open to better open-source options):

• LiveKit will handle the bi-directional audio stream.
• Speech-to-text must use Groq Whisper Large for immediate transcriptions.
• Chat logic will sit on Groq’s Llama 3.3-70B Versatile endpoint, with my fine-tuned KimiR2 weights ready to drop in.
• Text-to-speech is Kokoro (fully OpenAI TTS-compatible) and needs to surface several selectable voices on the fly.
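Because Kokoro is described as OpenAI TTS-compatible, a spoken reply can be requested with the same JSON body shape as OpenAI's `/v1/audio/speech` route (`model`, `input`, `voice`, `response_format`, `speed`). The sketch below only builds and validates that body, which keeps it runnable offline; sending it would be a plain HTTP POST. It assumes a self-hosted Kokoro server that accepts `model: "kokoro"`, and the voice IDs are placeholders to be replaced with whatever the server actually offers.

```typescript
// Minimal sketch: build an OpenAI-style TTS request body for a Kokoro server.
// ASSUMPTIONS: the server accepts model "kokoro"; the voice IDs below are placeholders.

interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  response_format: "mp3" | "opus" | "wav" | "pcm";
  speed: number;
}

// Hypothetical default list; in practice this would come from configuration.
const DEFAULT_VOICES: readonly string[] = ["af_heart", "af_bella", "am_adam"];

function buildSpeechRequest(
  text: string,
  voice: string,
  availableVoices: readonly string[] = DEFAULT_VOICES,
  speed = 1.0,
): SpeechRequest {
  if (text.trim().length === 0) {
    throw new Error("Cannot synthesize empty text");
  }
  // Reject voices the server is not known to offer, so a typo fails fast
  // instead of surfacing as a TTS error in the middle of a conversation.
  if (!availableVoices.includes(voice)) {
    throw new Error(`Unknown voice "${voice}"; expected one of: ${availableVoices.join(", ")}`);
  }
  return {
    model: "kokoro",
    input: text,
    voice,
    // Opus keeps payloads small for streamed playback (an assumption, not a requirement).
    response_format: "opus",
    speed,
  };
}
```

Switching voices on the fly then amounts to passing a different `voice` per request; nothing else in the body changes.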

Your task is to wire these pieces into a single, production-ready application that I can demo end-to-end: hot-word detection, low-latency streaming, live transcription, contextual Llama replies, and instant playback. Everything must feel seamless and snappy.
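Checking the 200 ms round-trip target from the deliverables is easier if each stage of the pipeline above is timestamped, so a slow turn shows where the time went. This is a minimal sketch: the stage names are hypothetical, and in a real app the timestamps would come from `performance.now()` or from LiveKit/Groq callbacks. Here they are passed in explicitly so the logic can be tested.

```typescript
// Minimal sketch of a per-turn latency tracker for the voice pipeline.
// Stage names are hypothetical; timestamps are milliseconds from any monotonic clock.

type Stage = "speechEnd" | "transcriptReady" | "firstLlmToken" | "firstTtsAudio" | "playbackStart";

const STAGE_ORDER: Stage[] = ["speechEnd", "transcriptReady", "firstLlmToken", "firstTtsAudio", "playbackStart"];

class TurnLatency {
  private marks = new Map<Stage, number>();

  mark(stage: Stage, timestampMs: number): void {
    this.marks.set(stage, timestampMs);
  }

  // Duration between each pair of consecutive stages that were both recorded.
  breakdown(): Record<string, number> {
    const out: Record<string, number> = {};
    for (let i = 1; i < STAGE_ORDER.length; i++) {
      const prev = this.marks.get(STAGE_ORDER[i - 1]);
      const curr = this.marks.get(STAGE_ORDER[i]);
      if (prev !== undefined && curr !== undefined) {
        out[`${STAGE_ORDER[i - 1]}->${STAGE_ORDER[i]}`] = curr - prev;
      }
    }
    return out;
  }

  // Round trip as defined in the brief: end of user speech to start of the voice reply.
  roundTripMs(): number | undefined {
    const start = this.marks.get("speechEnd");
    const end = this.marks.get("playbackStart");
    return start !== undefined && end !== undefined ? end - start : undefined;
  }

  // True only if both endpoints were recorded and the round trip is under the budget.
  withinBudget(budgetMs = 200): boolean {
    const rt = this.roundTripMs();
    return rt !== undefined && rt < budgetMs;
  }
}
```

Logging `breakdown()` per turn makes it obvious whether transcription, first-token latency, or TTS start-up is eating the budget.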

I need this ASAP, so I’m leaning on people who have shipped similar voice or streaming AI projects before. When you reply, link me directly to past work that proves you can integrate LiveKit or comparable low-latency pipelines and large-model inference.

Deliverables (all required for sign-off)
– A runnable repo with clear setup instructions (Docker or simple script).
– Real-time assistant demo showing under 200 ms round-trip from end of speech to voice reply.
– Configurable voice list sourced from Kokoro.
– A README covering model/API keys, environment variables, and how to swap voices or models without code changes.
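Swapping voices or models without code changes suggests reading every model ID and the voice list from environment variables at startup. The variable names below (`STT_MODEL`, `LLM_MODEL`, `TTS_VOICES`, `TTS_DEFAULT_VOICE`) are hypothetical, and so is the fallback voice; the model fallbacks are Groq's IDs for the models named in the brief. The loader takes a plain record, so it can be called with `process.env` in Node or with a test object.

```typescript
// Minimal sketch: load model and voice settings from environment-style variables.
// Variable names and the fallback voice are assumptions; adjust to the deployment.

type Env = Record<string, string | undefined>;

interface AssistantConfig {
  sttModel: string;
  llmModel: string;
  voices: string[];
  defaultVoice: string;
}

function loadConfig(env: Env): AssistantConfig {
  // Comma-separated list, e.g. TTS_VOICES="af_heart,am_adam"; blanks are ignored.
  const voices = (env.TTS_VOICES ?? "af_heart")
    .split(",")
    .map((v) => v.trim())
    .filter((v) => v.length > 0);
  if (voices.length === 0) {
    throw new Error("TTS_VOICES must list at least one voice");
  }
  const defaultVoice = env.TTS_DEFAULT_VOICE ?? voices[0];
  if (!voices.includes(defaultVoice)) {
    throw new Error(`TTS_DEFAULT_VOICE "${defaultVoice}" is not in TTS_VOICES`);
  }
  return {
    sttModel: env.STT_MODEL ?? "whisper-large-v3",
    llmModel: env.LLM_MODEL ?? "llama-3.3-70b-versatile",
    voices,
    defaultVoice,
  };
}
```

In Node this would be called once as `loadConfig(process.env)`; dropping in different model weights then only means changing `LLM_MODEL`.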

If you have a faster approach that still keeps everything open source, feel free to propose it—speed is the only hard constraint.

Core Functionality & How It Works

Here’s how the app operates at a conceptual and system level:

1. Client-Server Architecture

The user interface (app) handles the UI, input (text, voice, images), and rendering of responses.

When you send a prompt (text, voice, image), the app forwards it to our servers.

The server runs the language model, possibly combining it with tool calls (search, plugins, vision modules, etc.), then returns a response.

The app displays the response (text, voice, image) to you.

The heavy computation (model inference, tool orchestration) happens in the cloud, not on your device.
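The server half of this flow can be sketched as a handler that keeps per-session context, sends recent turns to the model, and returns the reply for the app to render. The model sits behind an injected `ModelFn` so the sketch is provider-agnostic and runs without network access; with the stack named earlier, that function would wrap the Groq chat call. All type and class names here are hypothetical.

```typescript
// Minimal sketch of the server side of the client-server flow.
// Names are hypothetical; ModelFn would wrap the real LLM call in production.

type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string; }

type ModelFn = (messages: ChatMessage[]) => Promise<string>;

interface PromptRequest { sessionId: string; text: string; }
interface PromptResponse { sessionId: string; reply: string; }

class PromptServer {
  private sessions = new Map<string, ChatMessage[]>();
  private model: ModelFn;
  private systemPrompt: string;
  private maxHistory: number;

  constructor(model: ModelFn, systemPrompt: string, maxHistory = 20) {
    this.model = model;
    this.systemPrompt = systemPrompt;
    // Cap the context sent per turn; shorter prompts mean faster first tokens.
    this.maxHistory = maxHistory;
  }

  async handle(req: PromptRequest): Promise<PromptResponse> {
    const history = this.sessions.get(req.sessionId) ?? [];
    history.push({ role: "user", content: req.text });
    // Only the most recent turns are sent, prefixed by the system prompt.
    const recent = history.slice(-this.maxHistory);
    const reply = await this.model([{ role: "system", content: this.systemPrompt }, ...recent]);
    history.push({ role: "assistant", content: reply });
    this.sessions.set(req.sessionId, history);
    return { sessionId: req.sessionId, reply };
  }
}
```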

2. Prompting + Tooling

Your “prompt” is any input you send (question, command, image, etc.). The model interprets it, generates a response, and possibly invokes “tools” (e.g. web search, document analysis, plugin APIs) as needed.

The system uses safety and moderation filters to prevent disallowed content (Wikipedia).

In some cases, the app or backend uses “memory” (i.e. context saved across sessions) to remember details you ask it to, so future interactions are more personalized.
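The prompting-and-tooling step can be sketched as a moderation gate in front of a registry that maps tool names to handlers. This assumes the model emits tool requests as simplified `{ name, args }` objects (loosely modelled on how OpenAI-compatible chat APIs describe tool calls); the keyword blocklist is only a placeholder for a real moderation model, and the `web_search` tool used in the example is hypothetical.

```typescript
// Minimal sketch: moderation gate + tool registry.
// The blocklist is a placeholder for a real moderation model; tool names are hypothetical.

interface ToolCall { name: string; args: Record<string, unknown>; }
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

const BLOCKED_TERMS = ["example-banned-term"]; // placeholder only

// Case-insensitive substring check; a real filter would call a moderation model.
function passesModeration(text: string): boolean {
  const lower = text.toLowerCase();
  return !BLOCKED_TERMS.some((term) => lower.includes(term));
}

class ToolRegistry {
  private handlers = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  async run(call: ToolCall): Promise<string> {
    const handler = this.handlers.get(call.name);
    if (!handler) {
      // Returning an error string lets the model recover instead of failing the turn.
      return `Error: unknown tool "${call.name}"`;
    }
    return handler(call.args);
  }
}
```

The tool result would be appended to the conversation and the model called again to phrase the final answer.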


3. Multimodal Input and Output

Text: You type a prompt and get a text response (classic chat).

Voice / Speech: You can speak your question and have ChatGPT speak back (i.e. the app uses speech recognition and text-to-speech).


Image Input: You can upload or take photos; ChatGPT can analyze them (e.g. “what’s in this image?”, “read text from this image”, “identify objects”).

Image Output / Generation: The app supports generating images (from text prompts or by modifying existing images). In 2025, ChatGPT uses GPT-4o for image generation and transformation (TechRadar).
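One way to handle these input types is a tagged union that the server switches on, so each modality reaches the right service before the language model sees it. The services mentioned in the comments follow the stack in this brief, but the routing function and its labels are hypothetical; note that the image route needs a vision-capable model.

```typescript
// Minimal sketch: route multimodal input by type.
// Route labels are hypothetical; each would map to a real service call.

type UserInput =
  | { kind: "text"; text: string }
  | { kind: "voice"; audio: Uint8Array; mimeType: string }
  | { kind: "image"; image: Uint8Array; caption?: string };

type Route = "llm" | "stt-then-llm" | "vision";

function routeInput(input: UserInput): Route {
  switch (input.kind) {
    case "text":
      return "llm"; // send straight to the chat model
    case "voice":
      return "stt-then-llm"; // transcribe first (e.g. Whisper), then chat
    case "image":
      return "vision"; // requires a vision-capable model
    default: {
      // Exhaustiveness check: adding a new kind without a case is a compile error.
      const unreachable: never = input;
      throw new Error(`Unhandled input: ${JSON.stringify(unreachable)}`);
    }
  }
}
```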


4. History, Memory, & Context

Your past conversations (history) are saved so you can scroll back and refer to them later across devices.


The “memory” feature lets the system remember facts you tell it (preferences, personal details) and use them in future chats to make the experience more personalized. You generally have control to disable memory.


You can also run “incognito” or “temporary” chats in which history is not saved. (This mode is useful when you don’t want certain conversations to persist.)
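History, memory, and temporary chats can be modelled as one store with two switches: a per-user memory toggle and a per-chat `temporary` flag that skips persistence. This in-memory class is a sketch under that assumption; the planned backend would back it with a real database (the file tree mentions Firebase and Redis, but nothing here uses their APIs).

```typescript
// Minimal sketch: conversation history, opt-out memory, and temporary chats.
// Storage is in-memory Maps; a real deployment would persist to a database.

interface Chat { id: string; userId: string; temporary: boolean; messages: string[]; }

class ConversationStore {
  private chats = new Map<string, Chat>();
  private memories = new Map<string, string[]>();
  private memoryDisabled = new Set<string>();

  saveChat(chat: Chat): void {
    // Temporary ("incognito") chats are never written to history.
    if (!chat.temporary) {
      this.chats.set(chat.id, chat);
    }
  }

  history(userId: string): Chat[] {
    return Array.from(this.chats.values()).filter((c) => c.userId === userId);
  }

  setMemoryEnabled(userId: string, enabled: boolean): void {
    if (enabled) this.memoryDisabled.delete(userId);
    else this.memoryDisabled.add(userId);
  }

  remember(userId: string, fact: string): void {
    if (this.memoryDisabled.has(userId)) return; // user opted out of memory
    const facts = this.memories.get(userId) ?? [];
    facts.push(fact);
    this.memories.set(userId, facts);
  }

  // Facts to prepend to the system prompt so replies can be personalised.
  recall(userId: string): string[] {
    if (this.memoryDisabled.has(userId)) return [];
    return (this.memories.get(userId) ?? []).slice();
  }
}
```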


5. Plugins, Tools & Extensions

The ChatGPT app supports “tools” or “plugins” that let it extend its capabilities—for instance, to access the web (real-time search), call external APIs, fetch data, or interact with external systems.


In 2024, “ChatGPT Search” was introduced, allowing real-time web lookups to improve the freshness of responses.


There are specialized agent modes (e.g. “Deep Research”) that allow ChatGPT to autonomously browse and compile reports using web sources.
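A research-style agent mode is, at its core, a loop in which the model chooses the next search, reads the results, and decides when it has enough to write a report. The sketch below bounds that loop with a step limit so cost and latency stay predictable. `Planner` and `Search` are hypothetical stand-ins for an LLM call and a web-search tool; this is an illustration, not a description of how ChatGPT's feature works internally.

```typescript
// Minimal sketch of a bounded research agent loop.
// Planner and Search are hypothetical stand-ins for an LLM call and a search tool.

type PlannerStep = { action: "search"; query: string } | { action: "finish"; report: string };
type Planner = (notes: string[]) => Promise<PlannerStep>;
type Search = (query: string) => Promise<string[]>;

async function research(planner: Planner, search: Search, maxSteps = 5): Promise<string> {
  const notes: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = await planner(notes);
    if (next.action === "finish") {
      return next.report;
    }
    const query = next.query;
    // Results become notes that the planner sees on the next iteration.
    const results = await search(query);
    notes.push(...results.map((r) => `[${query}] ${r}`));
  }
  // Step budget exhausted: return what was gathered rather than looping forever.
  return `Partial report (step limit reached):\n${notes.join("\n")}`;
}
```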


6. Subscription Tiers & Limits

There is a free tier and paid subscriptions (Plus, Pro, Enterprise, etc.). Paid tiers unlock more features (higher usage limits, more advanced models, priority access, extended tool usage).

Some features may be limited or gated depending on your subscription. For instance, image-generation quotas and advanced voice or video features may be reserved for paid tiers.

The system may throttle or prioritize paid users during peak load times.
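Per-tier limits are commonly enforced with a counter per user per time window, and the Redis quota module in the file tree below suggests that approach. This sketch uses a fixed one-hour window and an in-memory `Map` in place of Redis; with Redis, the same logic is typically an `INCR` on a key like `quota:<user>:<window>` followed by an `EXPIRE`. The tier names come from the text above, but the limits are made-up placeholders.

```typescript
// Minimal sketch: fixed-window usage quota per subscription tier.
// Limits are placeholders; the Map stands in for Redis INCR + EXPIRE.

type Tier = "free" | "plus" | "pro";

const HOURLY_LIMITS: Record<Tier, number> = { free: 10, plus: 100, pro: 1000 }; // hypothetical
const WINDOW_MS = 60 * 60 * 1000;

class QuotaTracker {
  // Old window keys are never evicted here; Redis EXPIRE would handle that.
  private counts = new Map<string, number>();

  // Returns true and records the request if allowed; false if the user is over quota.
  tryConsume(userId: string, tier: Tier, nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / WINDOW_MS);
    const key = `quota:${userId}:${windowIndex}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= HOURLY_LIMITS[tier]) {
      return false;
    }
    this.counts.set(key, used + 1);
    return true;
  }
}
```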


grio-ai-app/
├── backend/
│ ├── src/
│ │ ├── … # Firebase auth
│ │ ├── … # Redis quota management
│ │ ├── … # Groq + Kokoro processing
│ │ ├── … # Stable Diffusion gen
│ │ └── … # Express server
│ ├── …
│ └── .…
├── frontend/
│ ├── … # Main app component
│ ├── src/
│ │ ├── components/
│ │ │ ├── … # LiveKit voice
│ │ │ ├── … # AI chat UI
│ │ │ └── … # Image gen UI
│ │ ├── hooks/
│ │ │ └── … # Connection hook
│ │ └── utils/
│ │   └── … # Backend calls
│ ├── …
│ └── …
└── …" (client-provided description)


Matched companies (6)

Codetreasure Co

🚀 Your Expert Partner for Mobile & Web App Development Unlock the full potential of your business with Codetreasure —a leading provider of tailored …

SJ Solutions & Infotech

SJ Solutions & Infotech is a team of highly experienced and dynamic professionals who have an enormous passion for technology. In this fast changing …

HJP Media

I am founder and CEO of HJP Media. The fastest growing AI digital solutions company in the world, offering innovative, AI powered digital marketing a…

Appsdiary Technologies

AppsDiary is a software house that designs and develops mobile applications, websites, and custom software solutions. They work with businesses to c…

El Codamics

El Codamics – Company Preview About Us El Codamics is a Coimbatore-based software development firm helping startups, enterprises, and global clie…

Kiantechwise Pvt. Ltd.

Kiantechwise is a creative tech company delivering innovative web design, software solutions, branding, and digital marketing. With expertise and vis…