elsatch•2mo ago

Null values when using Haystack integration

Hi everyone! I've been checking the documentation to set up Haystack / Langfuse using the existing integration. The main problem I am facing right now is that I'm receiving nulls as input/output/metadata.
Current setup: Ubuntu 22.04 running self-hosted Langfuse, launching the Haystack code locally from the very same machine.
Things that work:
- I have set up my project and got API keys on the Langfuse side
- I have set HAYSTACK_CONTENT_TRACING_ENABLED to True
- I have added a tracer component to my pipeline in Haystack
- I have been able to send traces to the server (trace name, description and steps are correctly recorded, but the content is empty)
Additional code in thread
elsatch•2mo ago
This is the code I'm using:
#!/usr/bin/env python3
import os
from datasets import load_dataset

from haystack import Pipeline
from haystack_integrations.components.generators.llama_cpp import LlamaCppChatGenerator

from haystack.components.builders import ChatPromptBuilder
from haystack.components.builders.answer_builder import AnswerBuilder

from haystack.dataclasses import ChatMessage

os.environ["LANGFUSE_HOST"] = "http://LOCAL_NETWORK_IP:3000"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."

# Note: You must set this variable before importing the LangfuseConnector
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "True"

from haystack_integrations.components.connectors.langfuse import LangfuseConnector

# Chat template: system instruction, templated user question, assistant prefix
system_message = ChatMessage.from_system(
    "Answer the question as briefly as possible. If the answer is a number, provide the number only."
)
user_message = ChatMessage.from_user("Question: {{question}}")
assistant_message = ChatMessage.from_assistant("Answer: ")

chat_template = [system_message, user_message, assistant_message]

batch_qa_pipeline = Pipeline()

generator = LlamaCppChatGenerator(
    # The model argument was not included in the post.
    model_kwargs={"n_gpu_layers": -1},
)

# The tracer is added to the pipeline but not connected to any other component
batch_qa_pipeline.add_component("tracer", LangfuseConnector("Batch QA"))
batch_qa_pipeline.add_component(instance=ChatPromptBuilder(template=chat_template), name="prompt_builder")
batch_qa_pipeline.add_component(instance=generator, name="llm")
batch_qa_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")

batch_qa_pipeline.connect("prompt_builder", "llm")
batch_qa_pipeline.connect("llm", "answer_builder")
Code continues:
questions = [
    "What is the value in Ohms of a resistor with the following color codes: red, red, orange?",
    "How many musketeers were there?",
]

for question in questions:
    # Inputs are passed as one dict keyed by component name
    result = batch_qa_pipeline.run(
        {
            "prompt_builder": {"question": question},
            "llm": {"generation_kwargs": {"max_tokens": 128, "temperature": 0.1}},
            "answer_builder": {"query": question},
        }
    )

    generated_answer = result["answer_builder"]["answers"][0]
Outputs are printed to the console properly and traces are recorded on the Langfuse side... but they are empty. If anyone has faced this issue or could offer any guidance on how to solve it, I would appreciate it.
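One thing that can be ruled out before the pipeline is even built is a missing or mistyped environment variable. Below is a minimal sketch of such a pre-flight check using only the standard library. `check_tracing_env` is a hypothetical helper (it is not part of Haystack or Langfuse), and the assumption that the content flag only counts as enabled when it reads "true" in any letter case should be verified against the installed Haystack version.

```python
import os

# Variables named in the post above.
REQUIRED_VARS = ("LANGFUSE_HOST", "LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY")
CONTENT_FLAG = "HAYSTACK_CONTENT_TRACING_ENABLED"


def check_tracing_env(env=None):
    """Return a list of human-readable problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    flag = env.get(CONTENT_FLAG)
    if flag is None:
        problems.append(f"{CONTENT_FLAG} is not set")
    elif flag.strip().lower() != "true":
        # Assumption: the flag only counts as enabled when it reads "true".
        problems.append(f"{CONTENT_FLAG} is {flag!r}, expected 'true'")
    return problems


if __name__ == "__main__":
    # Run this before importing LangfuseConnector.
    for problem in check_tracing_env():
        print("WARNING:", problem)
```

Calling the helper right after the `os.environ[...]` assignments and before the `LangfuseConnector` import would at least confirm that the process sees the values it is expected to see.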
mayank•2mo ago
Yes, I faced a similar issue: my output was not being traced. Try flushing, it works for me. Please also see https://langfuse.com/faq/all/missing-traces
elsatch•2mo ago
I will try flushing then. Copying the reference information from the tracing docs:
If you want to send a batch immediately, you can call the flush method on the client. In case of network issues, flush will log an error and retry the batch, it will never throw an exception.
Decorator:
from langfuse.decorators import langfuse_context
langfuse_context.flush()
Low-level SDK:
langfuse.flush()
If you exit the application, use the shutdown method to make sure all requests are flushed and pending requests are awaited before the process exits. On success of this function, no more events will be sent to the Langfuse API.
langfuse.shutdown()
I have modified my code to add flushing without any significant difference. As a reference to add manual flushing:
# import low level SDK
from langfuse import Langfuse

# start the client
langfuse = Langfuse()

# your code goes here

# At the end of your program add langfuse.shutdown() or...

# Additional info at: https://langfuse.com/docs/tracing#manual-flushing
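To make sure the flush also runs when the pipeline raises, the call can be wrapped in a context manager. This is only a sketch: `flushed` and `FakeClient` are hypothetical names introduced here, and the only thing assumed about the real client is the `flush()` method quoted above. The fake client is there so the snippet runs without Langfuse installed.

```python
from contextlib import contextmanager


@contextmanager
def flushed(client):
    """Yield the client and call its flush() on exit, even if the body raises."""
    try:
        yield client
    finally:
        client.flush()


class FakeClient:
    """Stand-in for the real client; it only counts how often flush() is called."""

    def __init__(self):
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1


if __name__ == "__main__":
    fake = FakeClient()
    with flushed(fake):
        pass  # run the pipeline here
    print(fake.flush_calls)
```

With the real SDK the same wrapper would be used as `with flushed(Langfuse()) as langfuse:` around the pipeline run.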
Just to clarify, I am still getting nulls in the output.
Advances and checks so far:
- I have tried adding flushing to Langfuse to see if it made any difference. Still returns nulls.
- I have tried switching from LlamaCppChatGenerator to OpenAIChatGenerator, querying a local Ollama as an OpenAI-compatible endpoint. Still returns nulls.
- I have tried switching from OpenAIChatGenerator calling Ollama to OpenAIChatGenerator calling GPT-4o Mini at the OpenAI endpoint. Still returns nulls in the traces.
- I have tried switching from local Langfuse to Cloud Langfuse. I am still getting nulls in my traces.
So, at this point I am quite sure there is something wrong in my code 🙂
elsatch•2mo ago
I have tried replicating the basic example in the Haystack-Langfuse integration: https://langfuse.com/docs/integrations/haystack/example-python
This is the output from my command:
Trace url: https://cloud.langfuse.com/trace/d9bbf71d-8461-47e4-8b59-ca5f868861a4
Response: Truman Capote was an American author known for his works such as "Breakfast at Tiffany's" and "In Cold Blood." He was born Truman Persons but was later adopted by his stepfather and took on the last name Capote. Capote was known for his unique writing style and his ability to blend fiction and non-fiction in his works. He was also a prominent figure in the literary world and social scene in the mid-20th century.
And surprisingly... nulls again.
Stack: local environment, running the sample code, using GPT-3.5 Turbo on OpenAI's servers and Langfuse Cloud. Null output. Maybe there is something wrong in my environment then. Time to rebuild from scratch.
elsatch•2mo ago
Second computer using Windows OS instead of Linux. Installed environment from scratch using:
pip install haystack-ai langfuse-haystack langfuse sentence-transformers datasets mwparserfromhell torch==2.3.1
Torch 2.4.0 returns an error about version not found, but 2.3.1 works. When launching the default script from Haystack example:
import os

os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-4..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-3..."
os.environ["OPENAI_API_KEY"] = "sk-proj-N..."

from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

from haystack_integrations.components.connectors.langfuse import LangfuseConnector

if __name__ == "__main__":
    pipe = Pipeline()
    pipe.add_component("tracer", LangfuseConnector("Chat example"))
    pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
    pipe.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))

    pipe.connect("prompt_builder.prompt", "llm.messages")

    messages = [
        ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
        ChatMessage.from_user("Tell me about {{location}}"),
    ]

    response = pipe.run(
        data={"prompt_builder": {"template_variables": {"location": "Berlin"}, "prompt_source": messages}}
    )
    # Print the reply and the trace URL (this is the console output shown in the next message)
    print(response["llm"]["replies"][0])
    print(response["tracer"]["trace_url"])
elsatch•2mo ago
I get as response in the console:
ChatMessage(content='Berlin ist die Hauptstadt Deutschlands und eine der bedeutendsten Metropolen Europas. Die Stadt hat eine reiche Geschichte, die von vielen politischen und kulturellen Veränderungen geprägt wurde. Berlin ist bekannt für seine vielfältige Architektur, von historischen Gebäuden wie dem Brandenburger Tor und dem Reichstag bis hin zu modernen Glasfassaden.\n\nDie Stadt ist auch ein Zentrum für Kunst, Musik und Nachtleben. Es gibt zahlreiche Museen, Galerien und kulturelle Veranstaltungen, die das kreative Flair Berlins widerspiegeln. Die Berliner Mauer, die die Stadt während des Kalten Krieges teilte, ist ein wichtiges Symbol der Geschichte und trägt zur Identität der Stadt bei.\n\nBerlin hat auch eine multikulturelle Bevölkerung und zieht Menschen aus aller Welt an. Die vielfältigen Stadtteile bieten eine breite Palette an gastronomischen und kulturellen Erlebnissen. Die Stadt ist gut erreichbar und bietet ein umfangreiches Verkehrsnetz, das es leicht macht, die verschiedenen Regionen zu erkunden.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 200, 'prompt_tokens': 29, 'total_tokens': 229}})
https://cloud.langfuse.com/trace/6d3d165c-4245-4660-a3ca-12a4bfadd147 And as you may guess again... nulls in the output!
elsatch•2mo ago
I am done for today. Tried two computers, different OSs, my own code, two cookbook examples, self-hosted Langfuse, Langfuse Cloud, and got a total of zero traces containing info using the Haystack-Langfuse integration.
Not done yet! Things I've discovered:
- I was able to create traces when NOT using the Haystack integration. In particular, I managed to trace using OpenAI, Ollama and LiteLLM without problems using the OpenAI SDK compatibility of Langfuse.
So it looks like the problem lies in the integration.
Marc•2mo ago
Thanks for raising this @elsatch! Do the observations within the trace include inputs/outputs? Also, thanks for trying so many variations of the setup here to verify that it is an issue with the integration.
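One way to answer that question without clicking through every span is to look at a trace as JSON and list which observations have null fields. The sketch below assumes a trace dictionary with a top-level "observations" list whose items carry "name", "input" and "output" keys. That layout is an assumption for illustration, not a documented schema, and `null_fields` is a hypothetical helper. The sample data is made up and is not taken from the traces in this thread.

```python
import json


def null_fields(trace):
    """Map each observation name to the fields among input/output that are None."""
    report = {}
    for obs in trace.get("observations", []):
        missing = [key for key in ("input", "output") if obs.get(key) is None]
        if missing:
            report[obs.get("name", "<unnamed>")] = missing
    return report


if __name__ == "__main__":
    # Made-up sample in the assumed layout.
    sample = json.loads(
        '{"input": null, "output": null, "observations": ['
        '{"name": "prompt_builder", "input": null, "output": null},'
        '{"name": "llm", "input": {"messages": []}, "output": null}]}'
    )
    print(null_fields(sample))
```

An empty result would mean the observations do carry content and only the trace-level fields are null, which narrows down where the integration drops the data.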
elsatch•2mo ago
If I use the LiteLLM + OpenAI SDK, correct traces are generated. Fortunately, Vladimir from Deepset is going to review the integration, to see if something has broken unexpectedly.
Marc•2mo ago
awesome, Vladimir is great!