HS · 11mo ago

Extended scores

Hey, thanks for building this tool, it looks great. I was wondering what the best way would be to track scores from multiple users. Say I want to have two humans rate a completion. How would we separate machine and human feedback? Should each score have a source type?
Marc · 11mo ago
Hi! Both are part of a spec I'm currently working on to extend scores:
- score types: human, user, eval, none
- userId on the score for human and user types, to link back to who submitted it

This also allows adding multiple different scores in the UI, which is helpful when creating a baseline across multiple dimensions, e.g. hallucinations and tonality.
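A minimal sketch of what that extended score shape could look like, purely as an illustration of the idea above; the field and type names here (ScoreSource, traceId, userId) are assumptions, not the actual spec:

```ts
// Hypothetical shape of an extended score; names are illustrative only.
type ScoreSource = "human" | "user" | "eval" | "none";

interface Score {
  id: string;
  traceId: string;     // completion/trace the score is attached to
  name: string;        // dimension, e.g. "hallucination", "tonality"
  value: number;
  source: ScoreSource; // separates machine and human feedback
  userId?: string;     // set for "human"/"user" sources to link back to the rater
}
```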
HS · 11mo ago
Thanks @Marc. So in the first case, let's say two raters of the same source type add the same score, e.g. Human 1 and Human 2 both rate accuracy. Will there be a way to get a distribution? Also, can we add additional metadata to the score, for example which algorithm, version, or tool was used for the evaluation and how long the algorithm or the human evaluation took? This would further help with observing the behavior of evaluators (score types). Another feature could be adding a type for the score value, such as numeric or categorical, so the tool can show the corresponding summary stats.

Hey @Marc, checking in to see your thoughts on my last comment. Thanks
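To make the metadata and value-type idea concrete, here is a rough sketch under the assumption that scores could carry evaluator metadata and a typed value; everything here (ScoreMetadata, durationMs, numericDistribution) is hypothetical and not part of the tool:

```ts
// Hypothetical evaluator metadata attached to a score.
interface ScoreMetadata {
  algorithm?: string;  // eval algorithm/tool that produced the score
  version?: string;
  durationMs?: number; // runtime of the eval or time spent on human review
}

// Hypothetical value type, so the UI can pick the right summary stats.
type ScoreValue =
  | { kind: "numeric"; value: number }       // summarize with mean/stddev
  | { kind: "categorical"; value: string };  // summarize with counts per label

// Example: distribution of an "accuracy" score rated by two humans.
function numericDistribution(values: number[]) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  return { n: values.length, mean, stddev: Math.sqrt(variance) };
}

console.log(numericDistribution([0.8, 0.6])); // Human 1 and Human 2 on "accuracy"
```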