Commit 6e370c6

add README
1 parent 8f05665 commit 6e370c6

2 files changed (+59 additions, -29 deletions)


README.md

Lines changed: 37 additions & 1 deletion
@@ -1 +1,37 @@
Replaces the previous one-line README (```# WebMall```) with:

# WebMall - A Multi-Shop Benchmark for Evaluating Web Agents

## Setting up WebMall

### Environment
- WebMall requires Python 3.11 or 3.12.
- WebMall requires a Python environment in which BrowserGym and AgentLab are not already installed, because we provide modified versions of both that must be installed locally (steps below).
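Both requirements can be checked before installing anything. The following is a minimal sketch, not part of WebMall; the distribution names in `CONFLICTING` are our assumption about what upstream BrowserGym and AgentLab register with pip. Run it before the installation steps below, because afterwards it will also report the local forks.

```python
import sys
from importlib import metadata

# Python versions the README lists as supported.
SUPPORTED = {(3, 11), (3, 12)}

# Assumed pip distribution names of upstream BrowserGym/AgentLab.
CONFLICTING = ("browsergym", "browsergym-core", "agentlab")


def python_supported(version_info=sys.version_info) -> bool:
    """True if the (major, minor) interpreter version is 3.11 or 3.12."""
    return (version_info[0], version_info[1]) in SUPPORTED


def preinstalled(names=CONFLICTING) -> list[str]:
    """Return the distributions from `names` that are already installed."""
    found = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed: good
        found.append(name)
    return found


if __name__ == "__main__":
    if not python_supported():
        print(f"Unsupported Python {sys.version.split()[0]}; use 3.11 or 3.12.")
    clashes = preinstalled()
    if clashes:
        print("Already installed, use a fresh environment:", ", ".join(clashes))
```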

### Install local version of BrowserGym
- Because we use forks of BrowserGym and AgentLab as git submodules, initialize them first with ```git submodule update --init --recursive```.
- In the ```WebMall/BrowserGym``` folder, run ```make install``` to install BrowserGym and Playwright, which is used to run experiments in a browser.

### Install local version of AgentLab
- Run ```pip install -e .``` in ```WebMall/AgentLab```.
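The installation steps above must run in order, each from its own folder. A small driver can make that order explicit. This sketch assumes it is started from the WebMall repository root (the folder containing `BrowserGym/` and `AgentLab/`); it only prints the commands unless called with `dry_run=False`. It invokes pip as `sys.executable -m pip` so that AgentLab is installed into the interpreter running the script, which is our choice rather than a README requirement.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# (folder relative to the WebMall repo root, command), in README order.
INSTALL_STEPS = [
    (".", ["git", "submodule", "update", "--init", "--recursive"]),
    ("BrowserGym", ["make", "install"]),  # BrowserGym fork + Playwright
    ("AgentLab", [sys.executable, "-m", "pip", "install", "-e", "."]),
]


def install(repo_root: Path, dry_run: bool = True) -> list[str]:
    """Return the steps as shell lines; also execute them unless dry_run is set."""
    lines = []
    for subdir, cmd in INSTALL_STEPS:
        cwd = repo_root / subdir
        lines.append(f"(cd {shlex.quote(str(cwd))} && {shlex.join(cmd)})")
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, cwd=cwd, check=True)
    return lines


if __name__ == "__main__":
    print("\n".join(install(Path.cwd())))
```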

### Set up AgentLab
- Set the environment variables:
  - ```export AGENTLAB_EXP_ROOT=<root directory of experiment results>```: where experiment results are stored (defaults to ```$HOME/agentlab_results```).
  - ```export OPENAI_API_KEY=<your openai api key>```: required if OpenAI models are used.
  - ```export ANTHROPIC_API_KEY=<your anthropic api key>```: required if Anthropic models are used.
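Because the key that is needed depends on the model, a pre-run check can derive it from the model name. In this sketch the provider prefix of the `CHAT_MODEL_ARGS_DICT` keys used in `run_webmall_study.py` (for example `openai/gpt-4.1-2025-04-14`) is mapped to the variables listed above; the mapping and the helper names are illustrative, not an AgentLab API.

```python
import os
from pathlib import Path

# Provider prefix of a model key -> environment variable holding its API key.
KEY_FOR_PROVIDER = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def results_root(env=os.environ) -> Path:
    """AGENTLAB_EXP_ROOT, or $HOME/agentlab_results if it is unset."""
    default = Path.home() / "agentlab_results"
    return Path(env.get("AGENTLAB_EXP_ROOT", default)).expanduser()


def missing_keys(model_names, env=os.environ) -> set[str]:
    """API key variables required by `model_names` that are unset or empty in `env`."""
    needed = set()
    for name in model_names:
        provider = name.split("/", 1)[0]
        if provider in KEY_FOR_PROVIDER:  # other providers are not covered here
            needed.add(KEY_FOR_PROVIDER[provider])
    return {var for var in needed if not env.get(var)}


if __name__ == "__main__":
    print("Results will be stored in", results_root())
    missing = missing_keys(["openai/gpt-4.1-2025-04-14", "anthropic/claude-sonnet-4-20250514"])
    if missing:
        print("Missing:", ", ".join(sorted(missing)))
```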

### Set up WebMall environment variables
- WebMall expects a file ```WebMall/.env``` that sets the addresses of the shop websites. Copy ```WebMall/.env.example``` to ```WebMall/.env```, then set SHOP1_URL, SHOP2_URL, SHOP3_URL, SHOP4_URL and FRONTEND_URL to the shop addresses you want to use. The local Docker setup serves the shops on localhost ports 8081-8085; if those ports are available, the copied values are already correct.
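A typo in one of the five URLs only shows up once an agent fails to load a page, so it can pay to validate `.env` first. The sketch below uses a deliberately minimal `KEY=VALUE` parser (no `export` prefixes, escapes or multi-line values) and only checks that each variable is set to an http(s) URL; it does not reflect how WebMall itself loads the file.

```python
from pathlib import Path
from urllib.parse import urlparse

REQUIRED = ("SHOP1_URL", "SHOP2_URL", "SHOP3_URL", "SHOP4_URL", "FRONTEND_URL")


def read_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments; strips simple quotes."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_shop_urls(values: dict[str, str]) -> list[str]:
    """List missing variables and values that are not http(s) URLs."""
    problems = []
    for key in REQUIRED:
        url = values.get(key, "")
        if not url:
            problems.append(f"{key} is not set")
        elif urlparse(url).scheme not in ("http", "https"):
            problems.append(f"{key}={url!r} is not an http(s) URL")
    return problems


if __name__ == "__main__":
    # Run from the WebMall folder so that .env resolves to WebMall/.env.
    for problem in check_shop_urls(read_env_file(Path(".env"))):
        print(problem)
```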

### Set up the WebMall shop websites locally with Docker
1. The local Docker setup requires docker-compose.
2. Run ```bash docker_all/restore_all_and_deploy_local.sh``` to download the required files, start the containers and host the shops locally.
3. If you used the default ports, the setup is complete. Otherwise, update the addresses stored inside the WooCommerce containers by running ```docker_all/fix_urls_deploy.sh``` inside each of the 4 shop containers.
   Example: ```docker exec WebMall_wordpress_shop1 /bin/bash -c "/usr/local/bin/fix_urls_deploy.sh 'http://localhost:8081' 'http://localhost:7733'"```
4. Verify the setup by opening the shop websites and the submission page in your browser.
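When some shops run on non-default ports, the per-container commands from step 3 can be generated rather than typed. This sketch rests on assumptions read off the example only: shops 1-4 default to ports 8081-8084, the containers are named `WebMall_wordpress_shop1` to `WebMall_wordpress_shop4`, the script takes the current URL first and the new one second, and the containers still hold the default URLs. Passing the arguments as a list also avoids the nested quoting of the `bash -c` one-liner.

```python
import shlex
import subprocess

# Assumed defaults: shop N is served on port 8080 + N (8081-8084).
DEFAULT_PORTS = {n: 8080 + n for n in range(1, 5)}
SCRIPT = "/usr/local/bin/fix_urls_deploy.sh"  # path used in the README example


def fix_url_commands(new_ports: dict[int, int], host: str = "http://localhost") -> list[list[str]]:
    """One docker exec command per shop whose port differs from its default."""
    commands = []
    for shop, port in sorted(new_ports.items()):
        old_port = DEFAULT_PORTS[shop]
        if port == old_port:
            continue  # default port: nothing to rewrite
        commands.append([
            "docker", "exec", f"WebMall_wordpress_shop{shop}",  # assumed naming pattern
            "/bin/bash", SCRIPT, f"{host}:{old_port}", f"{host}:{port}",
        ])
    return commands


def apply(commands: list[list[str]]) -> None:
    """Run the commands; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Shop 1 moved to port 7733 as in the README example; shop 2 unchanged.
    for cmd in fix_url_commands({1: 7733, 2: 8082}):
        print(shlex.join(cmd))
```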

## Run the WebMall benchmark
- A WebMall benchmark run can be started with the script ```WebMall/run_webmall_study.py```. Its results are stored in the directory set in AGENTLAB_EXP_ROOT. Choose the task set to run by uncommenting the relevant ```benchmark``` variable in the file.
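Commenting lines in and out works, but a command-line flag avoids editing the script between runs. A minimal sketch with `argparse`; the `--benchmark` flag is hypothetical (`run_webmall_study.py` does not define it), the choices are the two task sets left in the file by this commit, and passing the chosen name on to AgentLab is omitted because that code is not shown here.

```python
import argparse

# Task sets offered in run_webmall_study.py after this commit.
BENCHMARKS = ("webmall_basic_v0.7", "webmall_advanced_v0.7")


def parse_benchmark(argv=None) -> str:
    """Pick the benchmark from the command line instead of editing the file."""
    parser = argparse.ArgumentParser(description="Select a WebMall task set.")
    parser.add_argument(
        "--benchmark",
        choices=BENCHMARKS,
        default=BENCHMARKS[0],  # matches the active line in the file
        help="task set to run (default: %(default)s)",
    )
    return parser.parse_args(argv).benchmark


if __name__ == "__main__":
    print(f"Selected benchmark: {parse_benchmark()}")
```

An unknown name makes `argparse` exit with an error listing the valid choices, which catches typos earlier than a failed benchmark lookup would.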

## Run a single WebMall task
- A single WebMall task can be run with the script ```WebMall/run_single_task.py```. Its results are stored in the directory set in AGENTLAB_EXP_ROOT.
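To find the output of the run that just finished, one option is to pick the newest subdirectory of the results root. This assumes each run writes into its own subdirectory of AGENTLAB_EXP_ROOT, which the README does not specify; `latest_run` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional


def latest_run(root: Optional[Path] = None) -> Optional[Path]:
    """Return the most recently modified subdirectory of the results root, if any."""
    if root is None:
        # Same fallback as documented for AGENTLAB_EXP_ROOT.
        root = Path(os.environ.get("AGENTLAB_EXP_ROOT", Path.home() / "agentlab_results"))
    if not root.is_dir():
        return None
    # Assumption: each run writes into its own subdirectory of the root.
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(latest_run() or "No results found.")
```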

run_webmall_study.py

Lines changed: 22 additions & 28 deletions
@@ -21,15 +21,10 @@
 
 logging.getLogger().setLevel(logging.DEBUG)
 
-#from agentlab.agents.webmall_generic_agent import AGENT_4o_VISION
-#from agentlab.agents.generic_agent import AGENT_4o_VISION
-
 from agentlab.agents import dynamic_prompting as dp
 
-#from agentlab.llm.eco_logits_llm_configs import CHAT_MODEL_ARGS_DICT
 from agentlab.llm.llm_configs import CHAT_MODEL_ARGS_DICT
 
-#from agentlab.agents.webmall_generic_agent.generic_agent import GenericAgent, GenericPromptFlags, GenericAgentArgs
 from agentlab.agents.generic_agent.generic_agent import GenericAgent, GenericPromptFlags, GenericAgentArgs
 
 FLAGS_default = GenericPromptFlags(
@@ -87,13 +82,12 @@
 FLAGS_AX_M.use_memory = True
 FLAGS_AX_M.extra_instructions = 'Use your memory to note down important information like the URLs of potential solutions and corresponding pricing information.'
 
-FLAGS_AX_V_M = FLAGS_default.copy()
-FLAGS_AX_V_M.obs.use_screenshot = True
-FLAGS_AX_V_M.obs.use_som = True
-FLAGS_AX_V_M.use_memory = True
-FLAGS_AX_V_M.extra_instructions = 'Use your memory to note down important information like the URLs of potential solutions and corresponding pricing information.'
-
 AGENT_41_AX = GenericAgentArgs(
+    chat_model_args=CHAT_MODEL_ARGS_DICT["openai/gpt-4.1-2025-04-14"],
+    flags=FLAGS_AX,
+)
+
+AGENT_CLAUDE_AX = GenericAgentArgs(
     chat_model_args=CHAT_MODEL_ARGS_DICT["anthropic/claude-sonnet-4-20250514"],
     flags=FLAGS_AX,
 )
@@ -103,20 +97,29 @@
     flags=FLAGS_V,
 )
 
+AGENT_CLAUDE_V = GenericAgentArgs(
+    chat_model_args=CHAT_MODEL_ARGS_DICT["anthropic/claude-sonnet-4-20250514"],
+    flags=FLAGS_V,
+)
 
 AGENT_41_AX_V = GenericAgentArgs(
+    chat_model_args=CHAT_MODEL_ARGS_DICT["openai/gpt-4.1-2025-04-14"],
+    flags=FLAGS_AX_V,
+)
+
+AGENT_CLAUDE_AX_V = GenericAgentArgs(
     chat_model_args=CHAT_MODEL_ARGS_DICT["anthropic/claude-sonnet-4-20250514"],
     flags=FLAGS_AX_V,
 )
 
 AGENT_41_AX_M = GenericAgentArgs(
-    chat_model_args=CHAT_MODEL_ARGS_DICT["anthropic/claude-sonnet-4-20250514"],
+    chat_model_args=CHAT_MODEL_ARGS_DICT["openai/gpt-4.1-2025-04-14"],
     flags=FLAGS_AX_M,
 )
 
-AGENT_41_AX_V_M = GenericAgentArgs(
-    chat_model_args=CHAT_MODEL_ARGS_DICT["openai/gpt-4.1-2025-04-14"],
-    flags=FLAGS_AX_V_M,
+AGENT_CLAUDE_AX_M = GenericAgentArgs(
+    chat_model_args=CHAT_MODEL_ARGS_DICT["anthropic/claude-sonnet-4-20250514"],
+    flags=FLAGS_AX_M,
 )
 
 current_file = Path(__file__).resolve()
@@ -125,21 +128,12 @@
 
 
 # choose your agent or provide a new agent
-agent_args = [AGENT_41_AX_M]
+agent_args = [AGENT_41_AX]
 
 # ## select the benchmark to run on
-# benchmark = "webmall_a_c_d"
-# benchmark = "webmall_tiny"
-# benchmark = "webmall"
-# benchmark = "miniwob"
-# benchmark = "workarena_l1"
-# benchmark = "workarena_l2"
-# benchmark = "workarena_l3"
-# benchmark = "webarena"
-#benchmark = "webmall_basic_v0.7"
-benchmark = "webmall_advanced_v0.7"
-#benchmark = "webmall_tiny_v0.7"
-# benchmark = "webmall_j_v0.7"
+
+benchmark = "webmall_basic_v0.7"
+# benchmark = "webmall_advanced_v0.7"
 
 # Set reproducibility_mode = True for reproducibility
 # this will "ask" agents to be deterministic. Also, it will prevent you from launching if you have
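After this commit the script spells out one `GenericAgentArgs` per model and flag set by hand. Before the change, `AGENT_41_AX`, `AGENT_41_AX_V` and `AGENT_41_AX_M` were bound to the Claude model despite their names, which is the kind of slip a generated grid avoids. The sketch below uses small stand-in dataclasses instead of AgentLab's `GenericPromptFlags` and `GenericAgentArgs` so that it runs without AgentLab; the stand-in fields mirror only the attributes visible in this diff, and the baseline settings are assumed. It also shows why flag variants should be deep copies: they set nested attributes such as `obs.use_screenshot`, which a shallow copy would share with the base. We have not checked whether `GenericPromptFlags.copy()` copies deeply.

```python
import copy
from dataclasses import dataclass, field
from itertools import product


# Hypothetical, simplified stand-ins for AgentLab's flag and agent-args classes.
@dataclass
class ObsFlags:
    use_screenshot: bool = False
    use_som: bool = False


@dataclass
class PromptFlags:
    obs: ObsFlags = field(default_factory=ObsFlags)
    use_memory: bool = False
    extra_instructions: str = ""


@dataclass
class AgentArgs:
    chat_model: str
    flags: PromptFlags


MODELS = {"41": "openai/gpt-4.1-2025-04-14", "CLAUDE": "anthropic/claude-sonnet-4-20250514"}
MEMORY_HINT = ("Use your memory to note down important information like the URLs "
               "of potential solutions and corresponding pricing information.")


def variant(base: PromptFlags, *, screenshot=False, som=False, memory=False) -> PromptFlags:
    """Deep-copy `base` so that nested `obs` changes never leak back into it."""
    flags = copy.deepcopy(base)
    flags.obs.use_screenshot = screenshot
    flags.obs.use_som = som
    if memory:
        flags.use_memory = True
        flags.extra_instructions = MEMORY_HINT
    return flags


def agent_grid(models: dict[str, str], flag_sets: dict[str, PromptFlags]) -> dict[str, AgentArgs]:
    """One AgentArgs per (model, flag set) pair, named AGENT_<model>_<flags>."""
    return {
        f"AGENT_{m}_{f}": AgentArgs(chat_model=models[m], flags=flag_sets[f])
        for m, f in product(models, flag_sets)
    }


FLAGS_AX = PromptFlags()  # assumed baseline; FLAGS_V omitted, its settings are not in the diff
FLAG_SETS = {
    "AX": FLAGS_AX,
    "AX_V": variant(FLAGS_AX, screenshot=True, som=True),
    "AX_M": variant(FLAGS_AX, memory=True),
}
AGENTS = agent_grid(MODELS, FLAG_SETS)
```

The agent lists then reduce to lookups such as `AGENTS["AGENT_41_AX"]`, and the model in each name is guaranteed to match its configuration.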
