
Commit c125cbb

Add total_tokens to AIMessage
Add total_tokens to AIMessage in the Cohere provider and the Generic provider
1 parent 5ce1280 commit c125cbb

File tree: 2 files changed, +58 -0 lines changed


libs/oci/PR_DESCRIPTION.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Add Token Usage to AIMessage Response

## Summary
Adds `total_tokens` to `AIMessage.additional_kwargs` for non-streaming chat responses, enabling users to track token consumption when using `ChatOCIGenAI`.

## Problem
When using `ChatOCIGenAI.invoke()`, token usage information (`prompt_tokens`, `completion_tokens`, `total_tokens`) from the OCI Generative AI API was not accessible in the `AIMessage` response, even though the raw OCI API returns this data.

## Solution
Extract token usage from the OCI API response and add `total_tokens` to `additional_kwargs` in non-streaming mode.

### Changes Made
**File:** `langchain_oci/chat_models/oci_generative_ai.py`

1. **CohereProvider.chat_generation_info()** (lines 246-248)
   - Extract `usage.total_tokens` from `response.data.chat_response.usage`
   - Add it to `generation_info["total_tokens"]`
2. **GenericProvider.chat_generation_info()** (lines 611-613)
   - The same extraction, for Meta/Llama models

## Usage

### Before
```python
response = chat.invoke("What is the capital of France?")
# No way to access token usage
```

### After
```python
response = chat.invoke("What is the capital of France?")
print(response.additional_kwargs.get("total_tokens"))  # 26
```
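
With `total_tokens` exposed, per-call usage can be accumulated across a session, e.g. for rough cost tracking. A minimal sketch, assuming `chat` is an already-configured `ChatOCIGenAI` instance running with `is_stream=False`; the helper name is hypothetical:

```python
# Hypothetical helper: sum total_tokens over several non-streaming calls.
def total_usage(chat, prompts):
    total = 0
    for prompt in prompts:
        response = chat.invoke(prompt)
        # The key is absent in streaming mode, so default to 0.
        total += response.additional_kwargs.get("total_tokens", 0)
    return total

print(total_usage(chat, ["What is the capital of France?", "And of Germany?"]))
```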

## Limitations
- **Streaming mode:** Token usage is NOT available when `is_stream=True` because the OCI Generative AI streaming API does not include usage statistics in stream events.
- **Non-streaming only:** Use `is_stream=False` to get token usage information (see the defensive sketch below).
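
Because the key is only populated in non-streaming mode, code that may run with either setting should treat it as optional. A minimal defensive sketch, again assuming a configured `chat` instance:

```python
response = chat.invoke("What is the capital of France?")

# total_tokens is only set when is_stream=False; treat it as optional.
tokens = response.additional_kwargs.get("total_tokens")
if tokens is not None:
    print(f"Used {tokens} tokens")
else:
    print("Token usage unavailable (e.g. streaming mode)")
```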

## Testing
Tested with (see the sketch after this list):
- ✅ Cohere Command-R models (`cohere.command-r-plus-08-2024`)
- ✅ Meta Llama models (`meta.llama-3.3-70b-instruct`)
- ✅ Non-streaming mode (`is_stream=False`)
- ❌ Streaming mode (not supported by the OCI API)
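
For reference, a sketch of the kind of invocation used to exercise the change; the import path, service endpoint, and compartment OCID are assumptions/placeholders, not values taken from this commit:

```python
from langchain_oci.chat_models import ChatOCIGenAI

# Placeholder endpoint and compartment OCID; substitute real OCI values.
chat = ChatOCIGenAI(
    model_id="cohere.command-r-plus-08-2024",  # or "meta.llama-3.3-70b-instruct"
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..<placeholder>",
    is_stream=False,  # token usage is only reported in non-streaming mode
)

response = chat.invoke("What is the capital of France?")
assert "total_tokens" in response.additional_kwargs
```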

## Backward Compatibility
✅ Fully backward compatible: existing code continues to work unchanged.

libs/oci/langchain_oci/chat_models/oci_generative_ai.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -242,6 +242,11 @@ def chat_generation_info(self, response: Any) -> Dict[str, Any]:
             "is_search_required": response.data.chat_response.is_search_required,
             "finish_reason": response.data.chat_response.finish_reason,
         }
+
+        # Include token usage if available
+        if hasattr(response.data.chat_response, "usage") and response.data.chat_response.usage:
+            generation_info["total_tokens"] = response.data.chat_response.usage.total_tokens
+
         # Include tool calls if available
         if self.chat_tool_calls(response):
             generation_info["tool_calls"] = self.format_response_tool_calls(
@@ -602,6 +607,11 @@ def chat_generation_info(self, response: Any) -> Dict[str, Any]:
             "finish_reason": response.data.chat_response.choices[0].finish_reason,
             "time_created": str(response.data.chat_response.time_created),
         }
+
+        # Include token usage if available
+        if hasattr(response.data.chat_response, "usage") and response.data.chat_response.usage:
+            generation_info["total_tokens"] = response.data.chat_response.usage.total_tokens
+
         if self.chat_tool_calls(response):
             generation_info["tool_calls"] = self.format_response_tool_calls(
                 self.chat_tool_calls(response)
```
