
25 commits

Author SHA1 Message Date
Isadora White
7e7f893cf3 fix evaluation script bug 2025-03-05 10:10:24 -08:00
Isadora White
6e0d7e1eaf change evaluation_script to allow for new world names 2025-03-04 22:08:04 -08:00
Isadora White
34145168dc refactoring changes 2025-03-04 12:09:23 -08:00
Isadora White
e3ec9d34b4 fixing merge related small bugs 2025-03-04 11:54:09 -08:00
mmaheshwari2
2036288e13 tested eval script, changed model to 4o-mini 2025-03-04 10:28:11 -08:00
Isadora White
44fc1b4618 evaluation script for vllm 2025-03-03 06:10:52 +00:00
Isadora White
37b1fc0bed increase timeout length for adding bot to world for the first time 2025-03-01 21:37:20 -08:00
Isadora White
af79c78fbb fixing evaluation script to actually add bots as op and add new models 2025-03-01 19:21:35 -08:00
Isadora White
ae39028d3b longer sleeps, early breaking for scenarios where there is only one agent 2025-02-28 18:31:19 -08:00
Isadora White
39cec7cf82 changed the checking if complete cycle to be more frequent and updated the collab_profile 2025-02-28 16:59:56 -08:00
Isadora White
7cafc254d1 making the default to load in the collaborative profiles 2025-02-23 21:11:08 -08:00
Isadora White
2da97b5607 adding a mechanism to add environment variables to the keys.json automatically 2025-02-23 18:55:13 -08:00
Isadora White
8a75d8a78e changing the give to player command to account for an edge case where the players are too close together and moving away takes time 2025-02-22 17:53:42 -08:00
Isadora White
719b72da9e set up to use s3 logging instead of wandb 2025-02-21 17:02:21 -08:00
Isadora White
d4565aa68c small fixes, the items were being given twice to the agents on initialization and accounting for blocked_actions not being in the task file 2025-02-20 21:45:29 -08:00
Isadora White
7a19f34e22 Merge branch 'main' into evaluation_parallelization 2025-02-18 18:34:46 -08:00
Isadora White
aad19d616c fixed evaluation script to allow for parallel worlds again 2025-02-18 16:39:31 -08:00
Isadora White
fb5d95debe fixed the issue with garbling commands by instead putting the commands in a bash script and running them that way 2025-02-17 17:25:12 -08:00
Isadora White
bc15700196 fixed wandb logging 2025-02-15 12:02:44 -08:00
Isadora White
0b2624db6d updating evaluation script with wandb logs and fixing error in tasks.js 2025-02-13 10:46:44 -08:00
Isadora White
6e48f06e1d small changes 2025-02-12 15:39:57 -08:00
Ayush Maniar
acf4eece60 Filename includes timestamp of first experiment run 2025-02-11 12:29:29 -08:00
Isadora White
caa5c389e7 logging messages 2025-01-31 12:57:38 -08:00
Isadora White
231204d0c8 using environment variables to help differentiate the settings.js files :D 2025-01-26 11:39:21 -08:00
Ayush Maniar
3bbed21526 Added python script for task evaluation which stores experiment run times and results. 2025-01-23 12:33:21 -08:00