Spotlight
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Authors: Peking University, University of California, Los Angeles, Beijing Institute for General Artificial Intelligence
Project page: RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation (craftjarvis.github.io)
Summary
RAT combines ideas from Retrieval Augmented Generation (RAG) with Chain of Thought (CoT) prompting. First, CoT is used to generate a sequence (chain) of steps. Then, RAT iterates through the steps, revising each one using a RAG approach. The intuition is that retrieval can improve the correctness of individual CoT steps. In extensive experiments, the authors show substantial performance gains across LLMs (GPT-3.5, GPT-4, CodeLlaMA-7B) and tasks (code generation, math reasoning, creative writing, and embodied task planning). Code and a demo are shared on the project page.
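As a rough illustration, here is a minimal sketch of that loop, assuming placeholder `llm` and `retrieve` functions; the prompt wording and the way the draft is split into steps are my own guesses, not the authors' code.

```python
# Hypothetical sketch of the RAT loop, not the authors' implementation.
# `llm` and `retrieve` are placeholders for an LLM completion call and a
# retrieval backend; the prompt wording is illustrative only.
from typing import Callable, List


def rat(task: str,
        llm: Callable[[str], str],
        retrieve: Callable[[str], List[str]]) -> str:
    # 1. Zero-shot CoT: draft a chain of thought in a single pass.
    draft = llm(f"{task}\nLet's think through the problem step by step.")
    steps = [s.strip() for s in draft.split("\n\n") if s.strip()]

    # 2. Revise each draft step with retrieved context, autoregressively:
    #    the retrieval query and the revision prompt both include the steps
    #    that have already been revised.
    revised: List[str] = []
    for step in steps:
        query = "\n".join([task, *revised, step])
        docs_text = "\n".join(retrieve(query))
        revised_text = "\n".join(revised)
        prompt = (
            f"Task: {task}\n"
            f"Retrieved context:\n{docs_text}\n"
            f"Steps revised so far:\n{revised_text}\n"
            f"Revise the next step so it is consistent with the context:\n{step}"
        )
        revised.append(llm(prompt))

    return "\n".join(revised)
```

Here `retrieve` stands in for whatever corpus fits the task (documentation, a wiki, web search).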
Details
Both RAG and CoT have gained significant adoption in the LLM community. RAG addresses the challenge of getting LLMs to work with a specific environment and its diverse data by grounding generation in retrieved context. CoT is a simple prompting technique (injecting a phrase like "let's think through the problem step by step" into the prompt) that extracts improved performance from LLMs on complex tasks by mimicking how humans think. When prompted with CoT, LLMs generate a sequence of steps, some of which may be flawed and could be fixed during a second pass with the help of RAG. It is helpful to think of Daniel Kahneman's Thinking, Fast and Slow framework, where the initial CoT pass is quick and intuitive and the RAG-powered revision steps are slow, deliberate, and logical.
What I like about this paper are a) the simplicity of the idea (I have not looked at the code they shared, but I imagine the implementation is straightforward) and b) the impressive improvement in performance (double-digit percentage gains) across LLMs (GPT-3.5, GPT-4, CodeLlaMA-7B) and tasks (code generation, math reasoning, creative writing, and embodied task planning). For practitioners, the cost of experimenting with RAT is low, with potentially high reward. It is interesting to consider the LLM API cost factor: the initial CoT generation and the subsequent step revisions via RAG mean multiple calls to the LLM. In addition, the performance gain seems larger with the more expensive GPT-4 (potentially explained by GPT-4's more advanced in-context learning/reasoning capabilities). For scenarios where performance is highly prioritized, GPT-4 + RAT may provide the needed solution; just be prepared to pay more for that performance.
How is RAT different from previous work? In the related work section, the authors highlight the key difference versus techniques such as IRCoT, IRGR, GEEK, and ITRG: RAT performs retrieval using draft answers in an autoregressive way (previously revised steps are included in the prompt) to help improve slow thinking.
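To make that distinction concrete, here is a toy illustration of the query construction (my paraphrase; all strings are made up):

```python
# Illustrative only: question-only retrieval vs. RAT-style retrieval that
# conditions on the draft answer revised so far (all strings are made up).
question = "Plan the steps to build a wooden house in Minecraft."
revised_so_far = [
    "Step 1 (revised): gather logs and craft planks.",
    "Step 2 (revised): craft a crafting table and basic tools.",
]
current_draft_step = "Step 3: lay the foundation and build the walls."

rag_query = question                               # retrieve from the question alone
rat_query = "\n".join([question, *revised_so_far,  # retrieve with the question plus the
                       current_draft_step])        # evolving draft (autoregressive)
```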
Noteworthy papers
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Author: Google.
  - This has been widely reported. The results are impressive. Google revealed very little about what’s under the hood (architecture, optimization techniques to reduce training/inference time and cost). Gemini 1.5 is available to a limited set of beta testers. It’s definitely worth checking out.
- Gemma: Open Models Based on Gemini Research and Technology. Author: Google
  - Small (2B and 7B) open-weight models from Google with performance similar to or better than SOTA open models (Llama, Mistral, etc.).
- Chronos: Learning the Language of Time Series. Authors: Amazon, UCSD, Freiburg
  - This paper is not about LLMs; it applies the ideas behind LLMs to time series prediction/forecasting. The authors train T5-based models (20M to 710M parameters) on public datasets and synthetic data (generated by Gaussian processes). Remarkably, the trained models achieve zero-shot performance that is better than or competitive with SOTA approaches based on traditional ML training. Models and inference code are shared on Hugging Face (see the sketch below).
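For the curious, here is a minimal zero-shot forecasting sketch. It assumes the `chronos-forecasting` package and the `amazon/chronos-t5-small` checkpoint; treat the exact package, class, and checkpoint names as assumptions and check the Hugging Face page.

```python
# Usage sketch based on my understanding of the released inference code; the
# chronos-forecasting package name, ChronosPipeline API, and checkpoint name
# are assumptions -- verify against the Hugging Face model card.
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")

# A short univariate series as context; predict() should return sample paths
# with shape [num_series, num_samples, prediction_length].
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
forecast = pipeline.predict(context, prediction_length=4)

# Take the median across samples as the point forecast.
median = forecast[0].float().quantile(0.5, dim=0)
print(median)
```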