I have had Langfuse deployed and self-hosted for about 2 months now, and it has worked smoothly. Today, I started seeing a strange timeout error. I wouldn't expect it to come from our deployment infra, but I wanted to post here to see if you had any recommendations (see thread).
4 Replies
justanothergraphguy
Giving up execute_task_with_backoff(...) after 3 tries (langfuse.request.APIError: upstream request timeout (504): None)
ERROR:backoff:Giving up execute_task_with_backoff(...) after 3 tries (langfuse.request.APIError: upstream request timeout (504): None)
ERROR:langfuse:error uploading: upstream request timeout (504): None
Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/langfuse/request.py", line 99, in _process_response
    payload = res.json()
  File ".../lib/python3.9/site-packages/httpx/_models.py", line 761, in json
    return jsonlib.loads(self.content, **kwargs)
  File ".../lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File ".../lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File ".../lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/langfuse/task_manager.py", line 138, in upload
    self._upload_batch(batch)
  File ".../lib/python3.9/site-packages/langfuse/task_manager.py", line 165, in _upload_batch
    execute_task_with_backoff(batch)
  File ".../lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File ".../lib/python3.9/site-packages/langfuse/task_manager.py", line 163, in execute_task_with_backoff
    return self._client.batch_post(batch=batch, metadata=metadata)
  File ".../lib/python3.9/site-packages/langfuse/request.py", line 52, in batch_post
    return self._process_response(
  File ".../lib/python3.9/site-packages/langfuse/request.py", line 103, in _process_response
    raise APIError(res.status_code, res.text)
langfuse.request.APIError: upstream request timeout (504): None
Any ideas as to why this request timeout would occur? The error is raised in the requesting application. The traces actually do make it through to the database and show up correctly in the frontend, so I only saw this in the LangServe output logs and not in the Langfuse logs.
Hmm... actually, I am getting some odd traces now as well: some extraneous, erroneous traces that are nonsensical.
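For anyone reading the traceback above: the two chained exceptions are consistent with a proxy or gateway answering with a plain-text 504 body, which the client then fails to parse as JSON before raising its own error. Here is a minimal sketch of that pattern. `FakeResponse`, `APIError`, and `process_response` are hypothetical stand-ins written for illustration, not the Langfuse SDK's actual code.

```python
import json


class APIError(Exception):
    """Hypothetical stand-in for an SDK error type (not the real class)."""

    def __init__(self, status, message):
        super().__init__(f"{message} ({status})")
        self.status = status
        self.message = message


class FakeResponse:
    """Hypothetical response object: a status code plus a raw text body."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        # Raises json.JSONDecodeError when the body is not valid JSON.
        return json.loads(self.text)


def process_response(res):
    """Return the parsed body on 200, otherwise raise APIError."""
    if res.status_code == 200:
        return res.json()
    try:
        payload = res.json()
    except json.JSONDecodeError:
        # A gateway timeout page is plain text, so parsing fails first and
        # the APIError is raised while the decode error is being handled.
        raise APIError(res.status_code, res.text)
    raise APIError(res.status_code, payload)


try:
    process_response(FakeResponse(504, "upstream request timeout"))
except APIError as err:
    caught = err
```

Raising inside the `except` block is what produces the "During handling of the above exception, another exception occurred" line. It would also fit with the traces still reaching the database: the gateway can give up on a slow request while the backend finishes processing it, though that is an inference rather than something the logs confirm.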
Marc (5mo ago)
Interesting. Are you on the latest version of the Langfuse Python sdk?
justanothergraphguy
We are on 2.16.2 for the Python SDK... I think I figured out the issue, actually. It seems we were submitting too many requests to the Langfuse deployment, so I created replica service deployments, which seems to have handled it. I will let you know if the issue arises again.
Marc (5mo ago)
I’d recommend automatically scaling the container instances if possible. If you write heavily in a short period of time, the IOPS of the database could be the bottleneck, but you’d see related errors in the container instance logs.
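As a client-side complement to scaling, the "Giving up ... after 3 tries" log line reflects a bounded retry with backoff. The sketch below shows that general technique using only the standard library. `retry_with_backoff` and its parameters are hypothetical names chosen for illustration; this is not how the SDK or the `backoff` package implements it.

```python
import random
import time


def retry_with_backoff(func, max_tries=3, base_delay=0.5, sleep=time.sleep):
    """Call func until it succeeds or max_tries attempts have failed."""
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_tries:
                # Out of attempts: re-raise the last error to the caller.
                raise
            # Exponential backoff with full jitter, so that many clients
            # retrying at once do not hit the server in lockstep.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)


# Usage: a task that fails twice, then succeeds on the third try.
calls = {"count": 0}
recorded_sleeps = []


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("upstream request timeout")
    return "ok"


result = retry_with_backoff(flaky_upload, sleep=recorded_sleeps.append)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test. Retries only smooth over short spikes; if the server is saturated for longer, scaling the deployment as described above is still what resolves it.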