Commit 4f5098d (verified), committed by nielsr (HF Staff)
1 Parent(s): 5958ade

Improve model card: Add tags, link code, and expand description

This PR significantly enhances the model card for the WebAggregator-8B model.

It adds the following metadata tags:
- `pipeline_tag: image-text-to-text`: This accurately reflects the model's capability to process multimodal inputs (images/videos from web environments) and generate text outputs (QA pairs), improving discoverability on the Hub.
- `library_name: transformers`: The `config.json` confirms this (it specifies the `Qwen3ForCausalLM` architecture and a `transformers_version`). The tag enables the automated "how to use" widget and quick integration for users.
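
As a rough illustration of how tooling might read the two added tags, the sketch below parses a model card's front matter and looks them up. The `parse_front_matter` helper is hypothetical and is not part of the Hub or `transformers`. It assumes the front matter contains only flat `key: value` pairs and simple `- item` lists, which holds for this README but not for YAML in general. The sample body text is a placeholder.

```python
def parse_front_matter(text):
    """Parse the simple YAML front matter of a model card into a dict.

    Hypothetical helper: handles only flat `key: value` pairs and
    `- item` lists. It is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter ends the front matter
        if not stripped:
            continue
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # list item belonging to the most recent key
            meta[current_key].append(stripped[2:].strip())
            continue
        # split on the first colon only, so URL values stay intact
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        value = value.strip()
        # an empty value means a list follows on the next lines
        meta[current_key] = value if value else []
    return meta


# Front matter as it appears after this PR; the body is a placeholder.
README = """---
base_model:
- Qwen/Qwen3-8B
license: other
license_name: webaggregator
license_link: https://huggingface.co/CognitiveKernel/WebAggregator-8B/blob/main/LICENSE
pipeline_tag: image-text-to-text
library_name: transformers
---

Model card body goes here.
"""

meta = parse_front_matter(README)
print(meta["pipeline_tag"], meta["library_name"], meta["base_model"])
```

A check like this could run before merging to confirm that both tags are present and that `base_model` survived the reordering.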

Additionally, the PR improves the content by:
- Expanding the model description using details from the paper abstract, providing a clearer understanding of the model's purpose and functionality.
- Including a direct link to the official [GitHub repository](https://github.com/Tencent/WebAggregator) for easy access to the codebase and further resources.
- Adding the academic citation for proper attribution.

Files changed (1)
  1. README.md +31 -3
README.md CHANGED
@@ -1,9 +1,37 @@
  ---
+ base_model:
+ - Qwen/Qwen3-8B
  license: other
  license_name: webaggregator
  license_link: https://huggingface.co/CognitiveKernel/WebAggregator-8B/blob/main/LICENSE
- base_model:
- - Qwen/Qwen3-8B
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---
 
- This model was the **WebAggregator-8B** model mentioned in the paper [Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents](https://arxiv.org/abs/2510.14438).
+ This model was the **WebAggregator-8B** model mentioned in the paper [Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents](https://arxiv.org/abs/2510.14438).
+
+ # WebAggregator: Scaling Evolved Aggregation Logic for Deep Research Agents
+
+ WebAggregator is a series of foundation models for deep research web agents, developed under the novel **Explore to Evolve** paradigm. This approach aims to scalably construct verifiable training data for web agents, thereby enhancing their capabilities in multi-tool usage, information seeking, and crucial information aggregation.
+
+ Unlike existing open-source deep research agents that often prioritize information-seeking, WebAggregator addresses the essential need for rigorous analysis and aggregation of knowledge from diverse sources, including web environments, files, and multimodal inputs. The agent proactively explores the real web to gather grounded information, then self-evolves an aggregation program. This program selects, composes, and refines operations from 12 high-level logical types to synthesize verifiable QA pairs. This process led to the creation of **WebAggregatorQA**, a dataset of 10K samples across 50K websites and 11 domains.
+
+ Finetuned trajectories from this dataset resulted in the WebAggregator models. WebAggregator-8B matches the performance of GPT-4.1, while the 32B variant surpasses GPT-4.1 by over 10% on GAIA-text and closely approaches Claude-3.7-sonnet. It demonstrates strong performance on the challenging human-annotated evaluation split of WebAggregatorQA, highlighting its robust information aggregation capabilities.
+
+ For the codebase, detailed usage instructions, and further information, please refer to the [official GitHub repository](https://github.com/Tencent/WebAggregator).
+
+ ## Citation
+
+ If you find this work helpful, please cite our paper:
+
+ ```bibtex
+ @misc{wang2025exploreevolvescalingevolved,
+ title={Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents},
+ author={Rui Wang and Ce Zhang and Jun-Yu Ma and Jianshu Zhang and Hongru Wang and Yi Chen and Boyang Xue and Tianqing Fang and Zhisong Zhang and Hongming Zhang and Haitao Mi and Dong Yu and Kam-Fai Wong},
+ year={2025},
+ eprint={2510.14438},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2510.14438},
+ }
+ ```