justanothergraphguy•5mo ago

I have had langfuse deployed and self-hosted for about 2 months now, and it has worked smoothly. Today, I started seeing a strange timeout error. I wouldn't expect it to come from our deployment infra, but I wanted to drop in here to see if you had any recommendations (see thread)

4 Replies

justanothergraphguy•5mo ago

Giving up execute_task_with_backoff(...) after 3 tries (langfuse.request.APIError: upstream request timeout (504): None)
ERROR:backoff:Giving up execute_task_with_backoff(...) after 3 tries (langfuse.request.APIError: upstream request timeout (504): None)
ERROR:langfuse:error uploading: upstream request timeout (504): None
Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/langfuse/request.py", line 99, in _process_response
    payload = res.json()
  File ".../lib/python3.9/site-packages/httpx/_models.py", line 761, in json
    return jsonlib.loads(self.content, **kwargs)
  File ".../lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File ".../lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File ".../lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/langfuse/task_manager.py", line 138, in upload
    self._upload_batch(batch)
  File ".../lib/python3.9/site-packages/langfuse/task_manager.py", line 165, in _upload_batch
    execute_task_with_backoff(batch)
  File ".../lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File ".../lib/python3.9/site-packages/langfuse/task_manager.py", line 163, in execute_task_with_backoff
    return self._client.batch_post(batch=batch, metadata=metadata)
  File ".../lib/python3.9/site-packages/langfuse/request.py", line 52, in batch_post
    return self._process_response(
  File ".../lib/python3.9/site-packages/langfuse/request.py", line 103, in _process_response
    raise APIError(res.status_code, res.text)
langfuse.request.APIError: upstream request timeout (504): None

Any ideas as to why this request timeout would occur? The error is raised in the requesting application. The traces actually do make it through to the database and show up correctly in the frontend, so I only saw this in the langserve output logs and not in the langfuse logs.

Hmm... actually, I am getting some odd traces now as well: some extraneous, erroneous traces that are nonsensical.
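
One possible reading of the chained traceback above, offered as a sketch rather than a diagnosis: the 504 response body was not JSON, so parsing it failed first (`JSONDecodeError`), and the error handler then raised `APIError` with the status code and raw text. A non-JSON body on a 504 often points at a proxy or gateway in front of the service rather than the service itself, though that is an assumption here. The names `FakeResponse`, `process_response`, and this `APIError` are hypothetical stand-ins, not the SDK's actual code.

```python
import json


class APIError(Exception):
    """Hypothetical stand-in for the SDK's APIError."""

    def __init__(self, status, message):
        super().__init__(f"{message} ({status})")
        self.status = status
        self.message = message


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        # Raises json.JSONDecodeError when the body is not valid JSON.
        return json.loads(self.text)


def process_response(res):
    """Parse a response; raise APIError if the body is not JSON or status >= 400."""
    try:
        payload = res.json()
    except json.JSONDecodeError:
        # Raising inside the except block chains the two exceptions, which
        # prints "During handling of the above exception, another exception
        # occurred" in the traceback.
        raise APIError(res.status_code, res.text)
    if res.status_code >= 400:
        raise APIError(res.status_code, payload)
    return payload
```

Calling `process_response(FakeResponse(504, "upstream request timeout"))` raises `APIError` with the `JSONDecodeError` attached as its context, which matches the two-part shape of the traceback.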

Marc•5mo ago

Interesting. Are you on the latest version of the Langfuse Python SDK?

justanothergraphguy•5mo ago

We are on 2.16.2 for the Python SDK.

I think I figured out the issue, actually. It seems we were submitting too many requests to the langfuse deployment, so I created replica service deployments, which seems to have handled it. I will let you know if the issue arises again.
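
A side note on the "Giving up ... after 3 tries" log lines: they suggest each failed batch was attempted three times before being dropped, so retries can add to the request volume on an already overloaded deployment. Below is a minimal standard-library sketch of retrying with exponential backoff and jitter. It is not the SDK's implementation (the traceback shows it uses the `backoff` package); `TransientError` and `retry_with_backoff` are hypothetical names, and the delay values are assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a 504)."""


def retry_with_backoff(task, max_tries=3, base_delay=0.5, sleep=time.sleep):
    """Call task() up to max_tries times, waiting longer after each failure."""
    for attempt in range(1, max_tries + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_tries:
                # Out of attempts: re-raise, analogous to "Giving up ... after 3 tries".
                raise
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1),
            # so many clients do not retry in lockstep.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.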

Marc•5mo ago

I’d recommend automatically scaling the container instance if possible. If you write heavily in a short period of time, the IOPS of the database could be the bottleneck, but you’d see related errors in the container instance logs.