HallowedFlight
HallowedFlight•13mo ago

Extract data from Langfuse

I was just trying to pull data from my project for post-processing. I'm also looking for a way to back up my project. Any suggestions? Is there a way to download a whole project to a file and upload/restore it in case the existing project gets compromised or corrupted?
30 Replies
Max
Max•13mo ago
What are your plans for post-processing? Do you already have something in place?
HallowedFlight
HallowedFlight•13mo ago
Well, for example, suppose I wanted to create a QA dataset from traces that had good feedback.
Max
Max•13mo ago
It depends a bit on your setup: are you self-hosting? If yes, you could just export the Postgres database. If you use the hosted version, we have measures like point-in-time recovery in place.
HallowedFlight
HallowedFlight•13mo ago
yes self-hosting
Max
Max•13mo ago
In that case it depends on which features your Postgres provider offers. If you want to do things manually, pg_dump (https://www.postgresql.org/docs/current/app-pgdump.html) might be the way to go.
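For a Postgres instance running in a local Docker container, a manual backup could look roughly like the sketch below. It only wraps `docker exec` and `pg_dump`; the container name, database user, database name, and output path are placeholders I made up, so swap in whatever your setup actually uses.

```python
import subprocess
from pathlib import Path


def build_dump_command(container: str, user: str, database: str) -> list[str]:
    """Build a `docker exec ... pg_dump` command that writes a custom-format archive to stdout."""
    return [
        "docker", "exec", container,
        "pg_dump",
        f"--username={user}",
        f"--dbname={database}",
        "--format=custom",  # compressed archive format, restorable with pg_restore
    ]


def dump_database(container: str, user: str, database: str, out_path: Path) -> None:
    """Run pg_dump inside the container and stream the archive into a local file."""
    with out_path.open("wb") as out_file:
        subprocess.run(
            build_dump_command(container, user, database),
            stdout=out_file,
            check=True,  # raise CalledProcessError if docker or pg_dump fails
        )
```

Calling `dump_database("langfuse-db", "postgres", "postgres", Path("langfuse_backup.dump"))` (all hypothetical values) would write the archive next to the script. A custom-format dump like this is meant to be restored with `pg_restore` rather than `psql`.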
Max
Max•13mo ago
Also, with https://tableplus.com/ you can download tables or the entire database.
HallowedFlight
HallowedFlight•13mo ago
yeah i'm just using a postgres docker image
Max
Max•13mo ago
on your local machine?
HallowedFlight
HallowedFlight•13mo ago
yes
Max
Max•13mo ago
Ah OK, in that case I guess TablePlus is an easy setup.
HallowedFlight
HallowedFlight•13mo ago
ah, cool. i'll look into that. thanks
Max
Max•13mo ago
Thanks! Are you using our scores to find out which one was good? And also what are you doing with the dataset?
HallowedFlight
HallowedFlight•13mo ago
I'm setting up a model with RAG for a demo with ~20 users in the next few weeks. I have feedback set up on a JS webapp that sends scores to Langfuse. After the demo I'm hoping to examine the data, get a sense of what needs to be improved, and potentially make a QA dataset from it. Thanks for all the hard work, by the way. Having a tool like this is indispensable from my perspective. And being able to self-host is critical, as data privacy is a major concern in my domain (healthcare).
Max
Max•13mo ago
Oh yes, healthcare is tough on that. You're welcome, we enjoy the ride 🙂 I see. So basically, having a dataset based on scores would help you test your application and see whether it improves, correct?
HallowedFlight
HallowedFlight•13mo ago
Yeah, I think being able to filter by user and score will be important.
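To make that concrete, here is a rough sketch of the post-processing step I have in mind once the data is exported. It assumes each exported row is a dict with `user_id`, `score`, `input`, and `output` keys; those field names are placeholders for illustration, not the actual Langfuse schema.

```python
from typing import Any, Optional


def build_qa_dataset(
    rows: list[dict[str, Any]],
    min_score: float,
    user_id: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep rows scored at or above min_score (optionally for one user) as question/answer pairs."""
    dataset = []
    for row in rows:
        if row["score"] < min_score:
            continue  # drop traces with poor feedback
        if user_id is not None and row["user_id"] != user_id:
            continue  # drop traces from other users when a user filter is set
        dataset.append({"question": row["input"], "answer": row["output"]})
    return dataset
```

With a 0/1 thumbs-down/thumbs-up score, `build_qa_dataset(rows, min_score=1)` would keep only the positively rated traces, and passing `user_id` narrows that to a single demo user.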