Developing a single foundation model that excels across diverse tasks has been a long-standing objective in artificial intelligence. As general-purpose foundation models spread across domains, their influence has extended to recommender systems. While recent efforts have explored recommendation foundation models for various generative tasks, they often overlook crucial embedding tasks and struggle with the complexities of multi-task learning, including knowledge sharing and conflict resolution across tasks and inconsistent convergence speeds. To address these limitations, we introduce RecFound, a generative representational learning framework for recommendation foundation models. We construct the first comprehensive dataset for recommendation foundation models covering both generative and embedding tasks across diverse scenarios. Based on this dataset, we propose a novel multi-task training scheme featuring a Task-wise Mixture of Low-rank Experts (TMoLE) to handle knowledge sharing and conflict, a Step-wise Convergence-oriented Sample Scheduler (S2Sched) to address inconsistent convergence, and a Model Merge module to balance performance across tasks. Experiments demonstrate that RecFound achieves state-of-the-art performance across various recommendation tasks, outperforming existing baselines.
Table 1: Statistics of the RecFound dataset, which covers both embedding and generative tasks across a variety of recommendation scenarios. The embedding tasks include: (1) User2Item (U2I), which encodes user behavior sequences for personalized retrieval; (2) Query2Item (Q2I), which embeds user queries for precise matching; and (3) Item2Item (I2I), which finds similar items based on item descriptions. All three are primarily built on the Amazon Reviews and Shopping Queries datasets. The generative tasks span 10 subtasks across three categories: (a) General NLP, including query rewriting, attribute extraction, and answer generation; (b) User Understanding, including sequential recommendation, sentiment analysis, user profiling, and answerability prediction; and (c) Item Understanding, including product relevance prediction, cross-platform matching, and item profiling. These tasks are constructed from Amazon Reviews, MovieLens, AmazonQA, Amazon-Google Products, and related datasets.
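As a rough illustration of how the three embedding tasks are typically consumed at retrieval time, the sketch below ranks items by similarity to a user, query, or item vector. It assumes cosine similarity and hand-written toy vectors; the scoring function, the `retrieve` helper, and the item identifiers are hypothetical and are not taken from RecFound.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(anchor_vec, item_vecs, k=2):
    """Return the ids of the k items most similar to the anchor embedding.

    The anchor may be a user embedding (U2I), a query embedding (Q2I),
    or another item's embedding (I2I); the ranking step is the same.
    """
    ranked = sorted(
        item_vecs.items(),
        key=lambda pair: cosine(anchor_vec, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]


# Toy embeddings (assumed values, for illustration only).
items = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [0.0, 1.0, 0.0],
    "item_c": [0.7, 0.7, 0.0],
}
user_vec = [0.9, 0.1, 0.0]
top_items = retrieve(user_vec, items, k=2)
```

In practice the vectors would come from the model's embedding head, and an approximate nearest-neighbor index would replace the full sort.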
Figure 1: Overview of RecFound, a generative representational learning framework for recommendation foundation models. RecFound handles both generative and embedding tasks and targets two multi-task learning challenges: knowledge sharing and conflict resolution across tasks, and inconsistent convergence speeds. It consists of three main components: (1) a Task-wise Mixture of Low-rank Experts (TMoLE) for knowledge sharing and conflict resolution; (2) a Step-wise Convergence-oriented Sample Scheduler (S2Sched) for addressing inconsistent convergence; and (3) a Model Merge module to balance performance across tasks.
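To make the idea of a task-wise mixture of low-rank experts more concrete, here is a minimal sketch of one linear layer whose frozen base weight is augmented by several low-rank (LoRA-style) experts, mixed by a gate that depends only on the task id. The text above does not specify TMoLE's actual routing, initialization, or placement in the model, so the class name, the softmax gate over per-task logits, and the zero-initialized up-projection are all assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TaskwiseLowRankMixture:
    """Hypothetical layer: base(x) + sum_e gate[task][e] * B_e (A_e x)."""

    def __init__(self, d_in, d_out, rank, n_experts, n_tasks, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.base = rand(d_out, d_in)                      # stands in for a frozen pretrained weight
        self.A = [rand(rank, d_in) for _ in range(n_experts)]   # down-projections
        self.B = [[[0.0] * rank for _ in range(d_out)]     # up-projections, zero-initialized
                  for _ in range(n_experts)]
        self.gate_logits = [[0.0] * n_experts for _ in range(n_tasks)]  # one gate row per task

    def gate(self, task_id):
        """Mixture weights over experts for the given task."""
        return softmax(self.gate_logits[task_id])

    def forward(self, x, task_id):
        out = matvec(self.base, x)
        for weight, A, B in zip(self.gate(task_id), self.A, self.B):
            delta = matvec(B, matvec(A, x))                # low-rank update of this expert
            out = [o + weight * d for o, d in zip(out, delta)]
        return out


layer = TaskwiseLowRankMixture(d_in=4, d_out=3, rank=2, n_experts=2, n_tasks=2)
x = [1.0, 0.5, -0.5, 2.0]
y_task0 = layer.forward(x, task_id=0)
```

Because the gate is indexed by task rather than by token, tasks whose gate rows overlap share experts, while tasks with disjoint gate rows are kept apart. This is one simple way to express the sharing-versus-conflict trade-off the caption refers to.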
@misc{zhou2025generativerepresentationallearningfoundation,
title={Generative Representational Learning of Foundation Models for Recommendation},
author={Zheli Zhou and Chenxu Zhu and Jianghao Lin and Bo Chen and Ruiming Tang and Weinan Zhang and Yong Yu},
year={2025},
eprint={2506.11999},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2506.11999},
}