Skip to content

yttcs/function-calling-agent

Repository files navigation

Barebones function calling agent using the following technology:

  1. Backend: FastAPI, MariaDB, SQL Model, OpenAI completions API, Tavily Search API
  2. Frontend: Jinja2, JS, HTMX, Bootstrap
  3. Security: Oauth2 password grant (ROPC)
  4. Infrastructure: AWS ECS/Fargate and DigitalOcean

This is a work in progress and it's planned to have multiple updates on a weekly basis.

Update for week of Jun. 30, 2025:

  1. Added multiuser capability
  2. Added Tavily Extract API
  3. Switched from gpt-3.5-turbo to gpt-4o

Update for week of Jul. 21, 2025:

  1. Added text to speech using gpt-4o-mini-tts for completion.choices[0].message.content (that means the agent now has a voice)

Note: Update for week of Aug. 11, 2025:

  1. Addied speech to text and text to speech using and Whisper and gpt-4o-mini-transcribe
  2. Added UTX date tool and time tools so the model can be time aware

Note: Update for week of Oct. 20, 2025:

  1. Added database persistence, for conversation history, in combination with in-memory python dict
  2. Employed a hybrid HTMX/JS solution to play TTS in browser from text and voice requests
  3. Added some error handling
  4. Added HTMX to avoid full page refreshes
  5. Cleaned up UI
  6. Switched to gpt-4o-mini
  7. Containerized with Podman
  8. Deployed to AWS ECS/Fargate: SENTyENT.com

Will work on issues here an there, but, for the most part, this PoC is finished. One issue to be solved is that, in chromium based browsers, voice resquests don't receive a voice response because of the stricter user gesture requirements for autoplay.

Note: Update for week of Nov. 3, 2025:

Modified the web app to run local:

  1. using llama-cpp-python server for LLMs, faster-whisper (SST) and piper (TTS)
  2. SQLite for memory persistence (no multi-user concurrency required offline)
  3. SQLite for user identity
  4. Modifications to chat_history - due to differences in chat templates between LLM/LMMs.

note: this runs surprisingly well on a nine year old I5 with 32GB of RAM.

Next up:

Move the voice agent to edge hardware with some added tools for handling sensor data.

About

Multiuser OpenAI Function Calling Agent with Secure Resource Endpoints and User Interface

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published