Overview
These papers collectively explore advances in and applications of large models across domains. The Qwen2.5 and Apollo models illustrate efforts to enhance large language models (LLMs) by expanding pre-training data and improving multimodal video understanding through scaling and specialized training strategies. ModernBERT exemplifies progress in optimizing encoder models for efficiency across a range of tasks, while TheAgentCompany highlights both the capabilities and limitations of LLM agents in real-world task execution. Concerns around synthetic data generation and domain-specific evaluation, notably in finance, are addressed by the studies on text synthesis and OmniEval. Finally, the discussions of Large Action Models and GUI agents reflect growing interest in models designed to perform complex interactions within dynamic environments, contributing to the ongoing pursuit of more versatile AI systems.
Spotlight 
Qwen2.5 Technical Report
Hugging Face; ModelScope; Alibaba Cloud Model Studio
This paper presents the Qwen2.5 series of large language models, focusing on improvements achieved through a massively expanded pre-training dataset and post-training techniques such as supervised fine-tuning and reinforcement learning. The models excel across benchmarks for language understanding and reasoning, and are released in a range of sizes suited to diverse needs. The standout is Qwen2.5-72B-Instruct, which outperforms much larger state-of-the-art models such as Llama-3-405B-Instruct. The paper underscores the significance of efficient model improvements and resource optimization in achieving cutting-edge performance. I find the analysis of model efficiency versus size particularly compelling, as it suggests the boundaries of LLM development can be pushed without always opting for larger models.
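For readers who want to try the models, here is a minimal sketch of querying a Qwen2.5 instruct checkpoint through the Hugging Face transformers API. The 7B variant is used purely for illustration; the 72B model discussed above follows the same interface.

```python
# Minimal sketch: chat with a Qwen2.5 instruct checkpoint via transformers.
# Uses the 7B variant for illustration; the 72B model follows the same API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly explain supervised fine-tuning."},
]
# apply_chat_template formats the conversation with Qwen's chat markup.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```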
Spotlight 
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Answer.AI; LightOn; Johns Hopkins University; NVIDIA; HuggingFace
This paper introduces ModernBERT, an encoder-only transformer that surpasses predecessors such as BERT in performance, speed, and memory efficiency. Trained on 2 trillion tokens with a sequence length extended to 8192, it achieves state-of-the-art results across diverse classification and retrieval tasks. The design allows efficient inference on common GPUs, making it particularly suitable for practical applications where hardware resources are limited. Overall, this work represents a significant advance in encoder models, combining a modern architecture with practical applicability for downstream tasks. I find the integration of speed and efficiency improvements particularly compelling, making ModernBERT a valuable tool for both researchers and practitioners.
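As a drop-in encoder, ModernBERT works with the standard transformers pipeline. A minimal sketch of masked-token prediction, assuming a transformers release recent enough to include the ModernBERT architecture:

```python
# Minimal sketch: masked-token prediction with ModernBERT via transformers.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
# Prints the top candidate tokens for the masked position with their scores.
for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```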
Other papers
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Meta GenAI; Stanford University
This paper introduces “Apollo,” a suite of large multimodal models designed to tackle inefficiencies in video understanding. It highlights design findings such as scaling consistency and fps sampling, which samples frames at a fixed rate rather than a fixed count and significantly improves performance (see the sketch below). I was impressed that the Apollo models not only set strong results on existing benchmarks but also handle longer video sequences efficiently, making this a substantial contribution to the video processing field.
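To make the fps-sampling idea concrete, here is a generic sketch contrasting uniform frame-count sampling with fixed-rate sampling. This illustrates the concept only; the function names and defaults are invented, not Apollo's implementation.

```python
# Generic sketch of fps sampling vs. uniform sampling (not Apollo's code):
# uniform sampling picks a fixed number of frames regardless of duration,
# while fps sampling keeps temporal density constant as videos get longer.

def uniform_sample(num_frames: int, duration_s: float, k: int = 32) -> list[int]:
    """Pick k frame indices evenly spaced across the whole video.
    (duration_s is unused; kept for a signature parallel to fps_sample.)"""
    step = num_frames / k
    return [int(i * step) for i in range(k)]

def fps_sample(num_frames: int, duration_s: float, fps: float = 2.0) -> list[int]:
    """Pick frames at a fixed rate, so longer videos yield more frames."""
    native_fps = num_frames / duration_s
    step = max(1, round(native_fps / fps))
    return list(range(0, num_frames, step))

# A 60 s clip at 30 fps: uniform keeps 32 frames; fps sampling at 2 fps keeps 120.
print(len(uniform_sample(1800, 60.0)), len(fps_sample(1800, 60.0)))
```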
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Carnegie Mellon University; Independent; Duke University
This paper presents TheAgentCompany, a benchmark for assessing how well large language model agents perform real-world tasks within a simulated company setting. It shows that while the best agent can complete some tasks autonomously, more intricate and long-horizon tasks still pose a significant challenge. I find that it highlights an interesting gap between current AI capabilities and the demands of complex professional environments, pointing to areas where further improvement is needed.
How to Synthesize Text Data without Model Collapse?
LUMIA Lab, Shanghai Jiao Tong University; State Key Laboratory of General Artificial Intelligence, BIGAI; Department of Electronic Engineering, Tsinghua University; Institute for Artificial Intelligence, Peking University; Shanghai Artificial Intelligence Laboratory
This paper tackles model collapse: the performance degradation that arises when language models are trained on their own synthetic text. The authors propose token-level editing of human-generated data to create semi-synthetic data, aiming to preserve model efficacy. I found the extensive experimental support for this approach particularly compelling, as it suggests a practical solution for maintaining data quality while still enhancing model performance.
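A hedged sketch of what token-level editing might look like: score each token of a human-written sequence under a prior language model and resample only the tokens the model already predicts with high confidence. The gpt2 checkpoint, the 0.99 threshold, and the resampling rule are illustrative choices, not the paper's exact recipe.

```python
# Hedged sketch of token-level editing to build semi-synthetic data.
# Threshold and sampling details are illustrative, not the paper's recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_edit(text: str, threshold: float = 0.99) -> str:
    ids = tok(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = lm(ids.unsqueeze(0)).logits[0]
    # Row t-1 of probs is the model's distribution over the token at position t.
    probs = torch.softmax(logits[:-1], dim=-1)
    edited = ids.clone()
    for t in range(1, len(ids)):  # skip position 0, which has no prefix
        p = probs[t - 1]
        if p[ids[t]] > threshold:  # model is overly confident: resample
            edited[t] = torch.multinomial(p, 1).item()
    return tok.decode(edited)

print(token_edit("The quick brown fox jumps over the lazy dog."))
```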
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
Gaoling School of Artificial Intelligence, Renmin University of China
This paper presents OmniEval, an innovative benchmark for evaluating Retrieval-Augmented Generation (RAG) techniques in the financial sector. By integrating automatic data generation, human annotation, and a structured multi-stage evaluation, the authors provide a nuanced framework that accounts for the complexities of domain-specific tasks. I find the approach effective in identifying performance disparities and see it as a valuable tool for improving RAG systems in specialized fields.
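In the same spirit as the benchmark's multi-stage design, a generic two-stage RAG evaluation might separate a rule-based retrieval check from a model-graded answer check. The sketch below is illustrative, not the authors' code; llm_judge is a hypothetical stand-in for an LLM grader.

```python
# Generic two-stage RAG evaluation sketch (not OmniEval's implementation).

def llm_judge(question: str, reference: str, candidate: str) -> float:
    """Hypothetical judge; in practice this would prompt an LLM grader.
    Placeholder heuristic: token overlap with the reference answer."""
    ref, cand = set(reference.lower().split()), set(candidate.lower().split())
    return len(ref & cand) / max(len(ref), 1)

def evaluate_rag(example: dict, retrieved: list[str], answer: str) -> dict:
    # Stage 1, rule-based: did retrieval surface the gold evidence?
    hit = any(example["gold_passage"] in doc for doc in retrieved)
    # Stage 2, model-based: grade the generated answer against the reference.
    score = llm_judge(example["question"], example["reference_answer"], answer)
    return {"retrieval_hit": hit, "answer_score": score}
```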
Large Action Models: From Inception to Implementation
Microsoft; Peking University; Zhejiang University; Eindhoven University of Technology
This paper explores the evolution from Large Language Models to Large Action Models (LAMs), focusing on their ability to generate and execute actions in dynamic environments. It provides a detailed framework for developing these models and discusses both the limitations and the potential impact of LAMs on real-world applications. I found the case study of a Windows OS-based agent particularly informative as an illustration of a practical implementation.
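At its core, a LAM runs a propose-execute loop: the model emits a structured action, the environment carries it out, and the resulting observation is fed back in. Below is a hedged, generic sketch of that loop; the model and env interfaces are assumptions for illustration, not the paper's Windows agent.

```python
# Hedged sketch of the generate-and-execute loop behind a Large Action Model.
# `model` and `env` are assumed interfaces, not the paper's implementation.
import json

def run_agent(model, env, task: str, max_steps: int = 10) -> None:
    observation = env.reset(task)
    for _ in range(max_steps):
        # The model is assumed to return JSON like:
        # {"action": "click", "target": "Save button"} or {"action": "done"}
        proposal = json.loads(model.propose(task, observation))
        if proposal["action"] == "done":
            break
        observation = env.execute(proposal)  # act, then observe the new state
```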
GUI Agents: A Survey
University of Maryland; State University of New York at Buffalo; University of Oregon; Adobe Research; Meta AI; University of Rochester; University of California, San Diego; Carnegie Mellon University; Dolby Labs; Intel AI Research; University of New South Wales
This paper offers an in-depth look at GUI agents, particularly those enhanced by large foundation models. It does a great job of categorizing the complexities of these agents, delving into their benchmarks, evaluation metrics, architectures, and training methods. I appreciate the authors’ effort to not only highlight current advancements but also to outline the open challenges and future directions, making it a valuable resource for both practitioners and researchers.
Acknowledgements
Papers are retrieved from Hugging Face.