Uncovered two critical bugs in the evaluation platform
First build tracked for this project
Found two bugs in the evaluation platform that were inflating scores, and planned fixes for both.