Ahmad · 2mo ago

LangChain+Ollama+Langfuse Not Recording Token Usage

Hello, I am using ChatOllama from LangChain to send LLM requests to a local Ollama server, and I am integrating Langfuse with LangChain to trace the requests. The generation requests are traced successfully, including the model's input and output, but the token usage is always 0. I attached a screenshot of one trace showing zero token usage. I checked the output of LangChain's invoke method and the usage data is present in the response: it's accessible via response.usage_metadata["input_tokens"] and response.usage_metadata["output_tokens"]. I also tried langfuse_context.update_current_observation(usage={"input": response.usage_metadata["input_tokens"], "unit": "TOKENS"}), but it still shows zero. I would appreciate your help in resolving this issue.
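For reference, here is a minimal sketch of the setup (imports, model name, and prompt are assumptions, since they are not shown above):
```python
from langchain_ollama import ChatOllama          # assumed import path
from langfuse.callback import CallbackHandler    # Langfuse LangChain callback integration

langfuse_handler = CallbackHandler()  # credentials via LANGFUSE_* environment variables

llm = ChatOllama(model="llama3")      # example model served by the local Ollama server
response = llm.invoke(
    "Hello!",
    config={"callbacks": [langfuse_handler]},
)

# Token counts are present on the LangChain response object...
print(response.usage_metadata["input_tokens"])
print(response.usage_metadata["output_tokens"])
# ...but the generation traced in Langfuse shows 0 tokens.
```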
Solution
Marc · 2mo ago
thanks for reporting this, can you open an issue on github for this? https://langfuse.com/issues
Ahmad · 2mo ago
bug: LangChain+Ollama+Langfuse Not Recording Token Usage · Issue #2... · langfuse/langfuse (GitHub)
Marc · 2mo ago
thank you
Ahmad · 4w ago
Hi, for your information: as a workaround I am currently using langfuse.generation to update the usage tokens. It's a little slow because I need to fetch the whole generation and re-submit its start and end times so the latency isn't skewed, but it works!
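Roughly, the workaround looks like this (a minimal sketch assuming the v2 Langfuse Python SDK; `generation_id` and `response` are placeholders for the traced generation's id and the ChatOllama response, and the fetched observation's attribute names may differ by SDK version):
```python
from langfuse import Langfuse

langfuse = Langfuse()  # credentials via LANGFUSE_* environment variables

# Fetch the generation that the LangChain callback already created, so its
# original start/end times can be re-submitted (otherwise the latency is skewed).
existing = langfuse.fetch_observation(generation_id).data

# Re-emit the generation under the same id with the token usage attached.
langfuse.generation(
    id=generation_id,
    trace_id=existing.trace_id,
    start_time=existing.start_time,
    end_time=existing.end_time,
    usage={
        "input": response.usage_metadata["input_tokens"],
        "output": response.usage_metadata["output_tokens"],
        "unit": "TOKENS",
    },
)
langfuse.flush()  # make sure the update is sent before the script exits
```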