- Backend: FastAPI, MariaDB, SQL Model, OpenAI completions API, Tavily Search API
- Frontend: Jinja2, JS, HTMX, Bootstrap
- Security: Oauth2 password grant (ROPC)
- Infrastructure: AWS ECS/Fargate and DigitalOcean
This is a work in progress and it's planned to have multiple updates on a weekly basis.
- Added multiuser capability
- Added Tavily Extract API
- Switched from gpt-3.5-turbo to gpt-4o
- Added text to speech using gpt-4o-mini-tts for completion.choices[0].message.content (that means the agent now has a voice)
- Addied speech to text and text to speech using and Whisper and gpt-4o-mini-transcribe
- Added UTX date tool and time tools so the model can be time aware
- Added database persistence, for conversation history, in combination with in-memory python dict
- Employed a hybrid HTMX/JS solution to play TTS in browser from text and voice requests
- Added some error handling
- Added HTMX to avoid full page refreshes
- Cleaned up UI
- Switched to gpt-4o-mini
- Containerized with Podman
- Deployed to AWS ECS/Fargate: SENTyENT.com
Will work on issues here an there, but, for the most part, this PoC is finished. One issue to be solved is that, in chromium based browsers, voice resquests don't receive a voice response because of the stricter user gesture requirements for autoplay.
Modified the web app to run local:
- using llama-cpp-python server for LLMs, faster-whisper (SST) and piper (TTS)
- SQLite for memory persistence (no multi-user concurrency required offline)
- SQLite for user identity
- Modifications to chat_history - due to differences in chat templates between LLM/LMMs.
note: this runs surprisingly well on a nine year old I5 with 32GB of RAM.
Move the voice agent to edge hardware with some added tools for handling sensor data.