|  | 2 jaren geleden | |
|---|---|---|
| pics | 2 jaren geleden | |
| README.md | 2 jaren geleden | 
At DAIR.AI we ❤️ reading ML papers so we've created this repo to highlight the top ML papers of every week.
| Paper | Links | 
|---|---|
| 1) Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning - applies deep reinforcement learning to synthesize agile soccer skills for a miniature humanoid robot; the resulting policy allows dynamic movement skills such as fast recovery, walking, and kicking. | Paper, Tweet | 
| 2) Scaling Transformer to 1M tokens and beyond with RMT - leverages a recurrent memory transformer architecture to increase BERT’s effective context length to two million tokens while maintaining high memory retrieval accuracy. | Paper, Tweet | 
| 3) Track Anything: Segment Anything Meets Videos - an interactive tool for video object tracking and segmentation; it’s built on top segment anything and allows flexible tracking and segmenting via user clicks. | Paper, Tweet | 
| 4) A Cookbook of Self-Supervised Learning - provides an overview of fundamental techniques and key concepts in SSL; it also introduces practical considerations for implementing SSL methods successfully. | Paper, Tweet | 
| 5. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond - a comprehensive and practical guide for practitioners working with LLMs; discusses many use cases with practical applications and limitations of LLMs in real-world scenarios. | Paper , Tweet | 
| 6) AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head - connects ChatGPT with audio foundational models to handle challenging audio tasks and a modality transformation interface to enable spoken dialogue. | Paper , Tweet | 
| 7) DataComp: In search of the next generation of multimodal datasets - releases a new multimodal dataset benchmark containing 12.8B image-text pairs. | Paper, Tweet | 
| 8) **ChatGPT for Information Extraction ** - provides a deeper assessment of ChatGPT's performance on the important information extraction task. | Paper, Tweet | 
| 9) Comparing Physician vs ChatGPT - investigates if chatbot assistants like ChatGPT can provide responses to patient questions while emphasizing quality and empathy; finds that chatbot responses were preferred over physician responses and rated significantly higher in terms of both quality and empathy. | Paper, Tweet | 
| 10) Stable and low-precision training for large-scale vision-language models - introduces methods for accelerating and stabilizing training of large-scale language vision models. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) DINOv2: Learning Robust Visual Features without Supervision - a new method for training high-performance computer vision models based on self-supervised learning; enables learning rich and robust visual features without supervision which are useful for both image-level visual tasks and pixel-level tasks; tasks supported include image classification, instance retrieval, video understanding, depth estimation, and much more. | Paper, Tweet | 
| 2) Learning to Compress Prompts with Gist Tokens - an approach that trains language models to compress prompts into gist tokens reused for compute efficiency; this approach enables 26x compression of prompts, resulting in up to 40% FLOPs reductions. | Paper, Tweet | 
| 3) Scaling the leading accuracy of deep equivariant models to biomolecular simulations of realistic size - presents a framework for large-scale biomolecular simulation; this is achieved through the high accuracy of equivariant deep learning and the ability to scale to large and long simulations; the system is able to “perform nanoseconds-long stable simulations of protein dynamics and scale up to a 44-million atom structure of a complete, all-atom, explicitly solvated HIV capsid on the Perlmutter supercomputer.” | Paper, Tweet | 
| 4) Evaluating Verifiability in Generative Search Engines - performs human evaluation to audit popular generative search engines such as Bing Chat, Perplexity AI, and NeevaAI; finds that, on average, only 52% of generated sentences are supported by citations and 75% of citations support their associated sentence. | Paper, Tweet | 
| 5. Generative Disco: Text-to-Video Generation for Music Visualization - an AI system based on LLMs and text-to-image models that generates music visualizations. | Paper , Tweet | 
| 6) Architectures of Topological Deep Learning: A Survey on Topological Neural Networks | Paper , Tweet | 
| 7) Visual Instruction Tuning - presents an approach that uses language-only GPT-4 to generate multimodal language-image instruction-following data; applies instruction tuning with the data and introduces LLaVA, an end-to-end trained large multimodal model for general-purpose visual and language understanding. | Paper, Tweet | 
| 8) ChatGPT: Applications, Opportunities, and Threats | Paper, Tweet | 
| 9) Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models - a plug-and-play compositional reasoning framework that augments LLMs and can infer the appropriate sequence of tools to compose and execute in order to generate final responses; achieves 87% accuracy on ScienceQA and 99% on TabMWP. | Paper, Tweet | 
| 10) Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models - applies latent diffusion models to high-resolution video generation; validates the model on creative content creation and real driving videos of 512 x 1024 and achieves state-of-the-art performance. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields - combines mip-NeRF 360 and grid-based models to improve NeRFs that train 22x faster than mip-NeRF 360. | Paper, Tweet | 
| 2) Generative Agents: Interactive Simulacra of Human Behavior - proposes an architecture that extends LLMs to build agents that enable simulations of human-like behavior; these capabilities are possible by storing a complete record of an agent's experiences, synthesizing memories over time into higher-level reflections, and retrieving them dynamically to plan behavior. | Paper, Tweet | 
| 3) Emergent autonomous scientific research capabilities of large language models - presents an agent that combines LLMs for autonomous design, planning, and execution of scientific experiments; shows emergent scientific research capabilities, including the successful performance of catalyzed cross-coupling reactions. | Paper, Tweet | 
| 4) Automatic Gradient Descent: Deep Learning without Hyperparameters - derives optimization algorithms that explicitly leverage neural architecture; it proposes a first-order optimizer without hyperparameters that trains CNNs at ImageNet scale. | Paper, Tweet | 
| 5. ChemCrow: Augmenting large-language models with chemistry tools - presents an LLM chemistry agent that performs tasks across synthesis, drug discovery, and materials design; it integrates 13 expert-design tools to augment LLM performance in chemistry and demonstrate effectiveness in automating chemical tasks. | Paper , Tweet | 
| 6) One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era - A Survey of ChatGPT and GPT-4 | Paper , Tweet | 
| 7) OpenAGI: When LLM Meets Domain Experts - an open-source research platform to facilitate the development and evaluation of LLMs in solving complex, multi-step tasks through manipulating various domain expert models. | Paper, Tweet | 
| 8) AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models - a new benchmark to assess foundational models in the context of human-centric standardized exams, including college entrance exams, law school admission tests, and math competitions, among others. | Paper, Tweet | 
| 9) Teaching Large Language Models to Self-Debug - proposes an approach that teaches LLMs to debug their predicted program via few-shot demonstrations; this allows a model to identify its mistakes by explaining generated code in natural language; achieves SoTA on several code generation tasks like text-to-SQL generation. | Paper, Tweet | 
| 10) Segment Everything Everywhere All at Once - a promptable, interactive model for various segmentation tasks that yields competitive performance on open-vocabulary and interactive segmentation benchmarks. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Segment Anything - presents a set of resources to establish foundational models for image segmentation; releases the largest segmentation dataset with over 1 billion masks on 11M licensed images; the model’s zero-shot performance is competitive with or even superior to fully supervised results. | Paper, Tweet | 
| 2) Instruction Tuning with GPT-4 - presents GPT-4-LLM, a "first attempt" to use GPT-4 to generate instruction-following data for LLM fine-tuning; the dataset is released and includes 52K unique English and Chinese instruction-following data; the dataset is used to instruction-tune LLaMA models which leads to superior zero-shot performance on new tasks. | Paper, Tweet | 
| 3) Eight Things to Know about Large Language Models - discusses important considerations regarding the capabilities and limitations of LLMs. | Paper, Tweet | 
| 4) A Survey of Large Language Models - a new 50 pages survey on large language models. | Paper, Tweet | 
| 5. Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data - an open-source chat model fine-tuned with LoRA. Leverages 100K dialogs generated from ChatGPT chatting with itself; it releases the dialogs along with 7B, 13B, and 30B parameter models. | Paper , Tweet | 
| 6) Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark - a new benchmark of 134 text-based Choose-Your-Own-Adventure games to evaluate the capabilities and unethical behaviors of LLMs. | Paper , Tweet | 
| 7) Better Language Models of Code through Self-Improvement - generates pseudo data from knowledge gained through pre-training and fine-tuning; adds the data to the training dataset for the next step; results show that different frameworks can be improved in performance using code-related generation tasks. | Paper, Tweet | 
| 8) Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models - an overview of applications of ChatGPT and GPT-4; the analysis is done on 194 relevant papers and discusses capabilities, limitations, concerns, and more | Paper, Tweet | 
| 9) Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling - a suite for analyzing LLMs across training and scaling; includes 16 LLMs trained on public data and ranging in size from 70M to 12B parameters. | Paper, Tweet | 
| 10) SegGPT: Segmenting Everything In Context - unifies segmentation tasks into a generalist model through an in-context framework that supports different kinds of data. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) BloombergGPT: A Large Language Model for Finance - a new 50B parameter large language model for finance. Claims the largest domain-specific dataset yet with 363 billion tokens... further augmented with 345 billion tokens from general-purpose datasets; outperforms existing models on financial tasks while not sacrificing performance on general LLM benchmarks. | Paper, Tweet | 
| 2) Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware - a low-cost system that performs end-to-end imitation learning from real demonstrations; also presents an algorithm called Action Chunking with Transformers to learn a generative model that allows a robot to learn difficult tasks in the real world. | Paper, Tweet | 
| 3) HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace - a system that leverages LLMs like ChatGPT to conduct task planning, select models and act as a controller to execute subtasks and summarize responses according to execution results. | Paper, Tweet | 
| 4) ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge - a medical chat model fine-tuned on LLaMA using medical domain knowledge. Collects data on around 700 diseases and generated 5K doctor-patient conversations to finetune the LLM. | Paper, Tweet | 
| 5. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention - a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model; generates responses comparable to Alpaca with fully fine-tuned 7B parameter; it’s also extended for multi-modal input support. | Paper , Tweet | 
| 6) ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks - demonstrates that ChatGPT can outperform crowd-workers for several annotation tasks such as relevance, topics, and frames detection; besides better zero-shot accuracy, the per-annotation cost of ChatGPT is less 20 times cheaper than MTurk. | Paper , Tweet | 
| 7) Language Models can Solve Computer Tasks - shows that a pre-trained LLM agent can execute computer tasks using a simple prompting scheme where the agent recursively criticizes and improves its outputs. | Paper, Tweet | 
| 8) DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents - a paradigm to enhance large language model completions by allowing models to communicate feedback and iteratively improve output; DERA outperforms base GPT-4 on clinically-focused tasks. | Paper, Tweet | 
| 9) Natural Selection Favors AIs over Humans - discusses why AI systems will become more fit than humans and the potential dangers and risks involved, including ways to mitigate them. | Paper, Tweet | 
| 10) Machine Learning for Partial Differential Equations - Pa review examining avenues of partial differential equations research advanced by machine learning. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Sparks of Artificial General Intelligence: Early experiments with GPT-4 - a comprehensive investigation of an early version of GPT-4 when it was still in active development by OpenAI. | Paper, Tweet | 
| 2) Reflexion: an autonomous agent with dynamic memory and self-reflection - proposes an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. | Paper, Tweet | 
| 3) Capabilities of GPT-4 on Medical Challenge Problems - shows that GPT-4 exceeds the passing score on USMLE by over 20 points and outperforms GPT-3.5 as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a prompt-tuned version of Flan-PaLM 540B). | Paper, Tweet | 
| 4) GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models - investigates the potential implications of GPT models and related systems on the US labor market. | Paper, Tweet | 
| 5. CoLT5: Faster Long-Range Transformers with Conditional Computation - a long-input Transformer model that employs conditional computation, devoting more resources to important tokens in both feedforward and attention layers. | Paper , Tweet | 
| 6) Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity - compares human-generated ideas with those generated by generative AI chatbots like ChatGPT and YouChat; reports that 9.4% of humans were more creative than GPT-4 and that GAIs are valuable assistants in the creative process. | Paper , Tweet | 
| 7) A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models - a comprehensive capability analysis of GPT series models; evaluates performance on 9 natural language understanding tasks using 21 datasets. | Paper, Tweet | 
| 8) Context-faithful Prompting for Large Language Models - presents a prompting technique that aims to improve LLMs' faithfulness using strategies such as opinion-based prompts and counterfactual demonstrations. | Paper, Tweet | 
| 9) Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models - a method for extracting room-scale textured 3D meshes from 2D text-to-image models. | Paper, ProjectTweet | 
| 10) PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing - a trillion parameter language model with sparse heterogeneous computing. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) GPT-4 Technical Report - GPT-4 - a large multimodal model with broader general knowledge and problem-solving abilities. | Paper, Tweet | 
| 2) LERF: Language Embedded Radiance Fields - a method for grounding language embeddings from models like CLIP into NeRF; this enables open-ended language queries in 3D. | Paper, Tweet | 
| 3) An Overview on Language Models: Recent Developments and Outlook - an overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. | Paper, Tweet | 
| 4) Eliciting Latent Predictions from Transformers with the Tuned Lens - a method for transformer interpretability that can trace a language model predictions as it develops layer by layer. | Paper, Tweet | 
| 5. Meet in the Middle: A New Pre-training Paradigm - a new pre-training paradigm using techniques that jointly improve training data efficiency and capabilities of LMs in the infilling task; performance improvement is shown in code generation tasks. | Paper , Tweet | 
| 6) Resurrecting Recurrent Neural Networks for Long Sequences - demonstrates that careful design of deep RNNs using standard signal propagation arguments can recover the performance of deep state-space models on long-range reasoning tasks. | Paper , Tweet | 
| 7) UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation - a new approach to tune a lightweight and versatile retriever to automatically retrieve prompts to improve zero-shot performance and help mitigate hallucinations. | Paper, Tweet | 
| 8) Patches Are All You Need? - proposes ConvMixer, a parameter-efficient fully-convolutional model which replaces self-attention and MLP layers in ViTs with less-expressive depthwise and pointwise convolutional layers. | Paper, Tweet | 
| 9) NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes - a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach; distills NeRFs into geometrically-accurate 3D meshes. | Paper, Tweet | 
| 10) High-throughput Generative Inference of Large Language Models with a Single GPU - a high-throughput generation engine for running LLMs with limited GPU memory. | Paper, Code , Tweet | 
| Paper | Links | 
|---|---|
| 1) PaLM-E: An Embodied Multimodal Language Model - incorporates real-world continuous sensor modalities resulting in an embodied LM that performs tasks such as robotic manipulation planning, visual QA, and other embodied reasoning tasks. | Paper, Demo , Tweet | 
| 2) Prismer: A Vision-Language Model with An Ensemble of Experts - a parameter-efficient vision-language model powered by an ensemble of domain experts; it efficiently pools expert knowledge from different domains and adapts it to various vision-language reasoning tasks. | Paper, GitHub, Project , Tweet | 
| 3) Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models - it connects ChatGPT and different visual foundation models to enable users to interact with ChatGPT beyond language format. | Paper, GitHub Tweet | 
| 4) A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT - an overview of generative AI - from GAN to ChatGPT. | Paper, Tweet | 
| 5. Larger language models do in-context learning differently - shows that with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when replacing targets with semantically-unrelated targets. | Paper , Tweet | 
| 6) Foundation Models for Decision Making: Problems, Methods, and Opportunities - provides an overview of foundation models for decision making, including tools, methods, and new research directions. | Project , Tweet | 
| 7) Hyena Hierarchy: Towards Larger Convolutional Language Models - a subquadratic drop-in replacement for attention; it interleaves implicit long convolutions and data-controlled gating and can learn on sequences 10x longer and up to 100x faster than optimized attention. | Paper, Code, Blog, Tweet | 
| 8) OpenICL: An Open-Source Framework for In-context Learning - a new open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs. | Paper, Repo, Tweet | 
| 9) MathPrompter: Mathematical Reasoning using Large Language Models - a technique that improves LLM performance on mathematical reasoning problems; it uses zero-shot chain-of-thought prompting and verification to ensure generated answers are accurate. | Paper, Tweet | 
| 10) Scaling up GANs for Text-to-Image Synthesis - enables scaling up GANs on large datasets for text-to-image synthesis; it’s found to be orders of magnitude faster at inference time, synthesizes high-resolution images, & supports various latent space editing applications. | Paper, Project , Tweet | 
| Paper | Links | 
|---|---|
| 1) Language Is Not All You Need: Aligning Perception with Language Models - introduces a multimodal large language model called Kosmos-1; achieves great performance on language understanding, OCR-free NLP, perception-language tasks, visual QA, and more. | Paper, Tweet | 
| 2) Evidence of a predictive coding hierarchy in the human brain listening to speech - finds that human brain activity is best explained by the activations of modern language models enhanced with long-range and hierarchical predictions. | Paper, Tweet | 
| 3) EvoPrompting: Language Models for Code-Level Neural Architecture Search - combines evolutionary prompt engineering with soft prompt-tuning to find high-performing models; it leverages few-shot prompting which is further improved by using an evolutionary search approach to improve the in-context examples. | Paper, Tweet | 
| 4) Consistency Models - a new family of generative models that achieve high sample quality without adversarial training. | Paper, Tweet | 
| 5. Goal Driven Discovery of Distributional Differences via Language Descriptions - a new task that automatically discovers corpus-level differences via language description in a goal-driven way; applications include discovering insights from commercial reviews and error patterns in NLP systems. | Paper , Code, Tweet | 
| 6) High-resolution image reconstruction with latent diffusion models from human brain activity - proposes an approach for high-resolution image reconstruction with latent diffusion models from human brain activity. | Project , Tweet | 
| 7) Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control - a scalable approach to planning with LLMs in embodied settings through grounding functions; GD is found to be a general, flexible, and expressive approach to embodied tasks. | Paper, Project Tweet | 
| 8) Language-Driven Representation Learning for Robotics - a framework for language-driven representation learning from human videos and captions for robotics. | Paper, Models, Evaluation, Tweet | 
| 9) Dropout Reduces Underfitting - demonstrates that dropout can mitigate underfitting when used at the start of training; it counteracts SGD stochasticity and limits the influence of individual batches when training models. | Paper, Tweet | 
| 10) Enabling Conversational Interaction with Mobile UI using Large Language Models - an approach that enables versatile conversational interactions with mobile UIs using a single LLM. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) LLaMA: Open and Efficient Foundation Language Models - a 65B parameter foundation model released by Meta AI; relies on publicly available data and outperforms GPT-3 on most benchmarks despite being 10x smaller. | Paper, Tweet | 
| 2) Composer: Creative and Controllable Image Synthesis with Composable Conditions - a 5B parameter creative and controllable diffusion model trained on billions (text, image) pairs. | Paper, Project , GitHub , Tweet | 
| 3) The Wisdom of Hindsight Makes Language Models Better Instruction Followers - an alternative algorithm to train LLMs from feedback; the feedback is converted to instruction by relabeling the original one and training the model, in a supervised way, for better alignment. | Paper, GitHub Tweet | 
| 4) Active Prompting with Chain-of-Thought for Large Language Models - a prompting technique to adapt LLMs to different task-specific example prompts (annotated with human-designed chain-of-thought reasoning); this process involves finding where the LLM is most uncertain and annotating those. | Paper, Code Tweet | 
| 5. Modular Deep Learning - a survey offering a unified view of the building blocks of modular neural networks; it also includes a discussion about modularity in the context of scaling LMs, causal inference, and other key topics in ML. | Paper , Project, Tweet | 
| 6) Recitation-Augmented Language Models - an approach that recites passages from the LLM’s own memory to produce final answers; shows high performance on knowledge-intensive tasks. | Paper , Tweet | 
| 7) Learning Performance-Improving Code Edits - an approach that uses LLMs to suggest functionally correct, performance-improving code edits. | Paper, Tweet | 
| 8) More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models - a comprehensive analysis of novel prompt injection threats to application-integrated LLMs. | Paper, Tweet | 
| 9) Aligning Text-to-Image Models using Human Feedback - proposes a fine-tuning method to align generative models using human feedback. | Paper, Tweet | 
| 10) MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes - a memory-efficient radiance field representation for real-time view synthesis of large-scale scenes in a browser. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Symbolic Discovery of Optimization Algorithms - a simple and effective optimization algorithm that’s more memory-efficient than Adam. | Paper, Tweet | 
| 2) Transformer models: an introduction and catalog | Paper, Tweet | 
| 3) 3D-aware Conditional Image Synthesis - a 3D-aware conditional generative model extended with neural radiance fields for controllable photorealistic image synthesis. | Paper, Project Tweet | 
| 4) The Capacity for Moral Self-Correction in Large Language Models - finds strong evidence that language models trained with RLHF have the capacity for moral self-correction. The capability emerges at 22B model parameters and typically improves with scale. | Paper, Tweet | 
| 6) Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment - an unsupervised method for text-image alignment that leverages pretrained language models; it enables few-shot image classification with LLMs. | Paper , Code Tweet | 
| 7) Augmented Language Models: a Survey - a survey of language models that are augmented with reasoning skills and the capability to use tools. | Paper, Tweet | 
| 8) Geometric Clifford Algebra Networks - an approach to incorporate geometry-guided transformations into neural networks using geometric algebra. | Paper, Tweet | 
| 9) Auditing large language models: a three-layered approach - proposes a policy framework for auditing LLMs. | Paper, Tweet | 
| 10) Energy Transformer - a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associate Memory model; this follows the popularity that Hopfield Networks have gained in the field of ML. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Toolformer: Language Models Can Teach Themselves to Use Tools - introduces language models that teach themselves to use external tools via simple API calls. | Paper, Tweet | 
| 2) Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents - proposes using language models for open-world game playing. | Paper, Tweet | 
| 3) A Categorical Archive of ChatGPT Failures - a comprehensive analysis of ChatGPT failures for categories like reasoning, factual errors, maths, and coding. | Paper, Tweet | 
| 4) Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery - optimizing hard text prompts through efficient gradient-based optimization. | Paper, Tweet | 
| 5) Data Selection for Language Models via Importance Resampling - proposes a cheap and scalable data selection framework based on an importance resampling algorithm to improve the downstream performance of LMs. | Paper, Tweet | 
| 6) Structure and Content-Guided Video Synthesis with Diffusion Models - proposes an approach for structure and content-guided video synthesis with diffusion models. | Paper , Project, Tweet | 
| 7) A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity - performs a more rigorous evaluation of ChatGPt on reasoning, hallucination, and interactivity. | Paper, Tweet | 
| 8) Noise2Music: Text-conditioned Music Generation with Diffusion Models - proposes diffusion models to generate high-quality 30-second music clips via text prompts. | Paper, Project, Tweet | 
| 9) Offsite-Tuning: Transfer Learning without Full Model - introduces an efficient, privacy-preserving transfer learning framework to adapt foundational models to downstream data without access to the full model. | Paper, Project, Tweet | 
| 10) Zero-shot Image-to-Image Translation - proposes a model for zero-shot image-to-image translation. | Paper, Project, Tweet | 
| Paper | Links | 
|---|---|
| 1) REPLUG: Retrieval-Augmented Black-Box Language Models - a retrieval-augmented LM framework that adapts a retriever to a large-scale, black-box LM like GPT-3. | Paper, Tweet | 
| 2) Extracting Training Data from Diffusion Models - shows that diffusion-based generative models can memorize images from the training data and emit them at generation time. | Paper, Tweet | 
| 3) The Flan Collection: Designing Data and Methods for Effective Instruction Tuning - release a more extensive publicly available collection of tasks, templates, and methods to advancing instruction-tuned models. | Paper, Tweet | 
| 4) Multimodal Chain-of-Thought Reasoning in Language Models - incorporates vision features to elicit chain-of-thought reasoning in multimodality, enabling the model to generate effective rationales that contribute to answer inference. | Paper, Code Tweet | 
| 5) Dreamix: Video Diffusion Models are General Video Editors - a diffusion model that performs text-based motion and appearance editing of general videos. | Paper, Project, Tweet | 
| 6) Benchmarking Large Language Models for News Summarization | Paper , Tweet | 
| 7) Mathematical Capabilities of ChatGPT - investigates the mathematical capabilities of ChatGPT on a new holistic benchmark called GHOSTS. | Paper, Tweet | 
| 8) Emergence of Maps in the Memories of Blind Navigation Agents - trains an AI agent to navigate purely by feeling its way around; no use of vision, audio, or any other sensing (as in animals). | Paper, Project, Tweet | 
| 9) SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections - a generative model that synthesizes large-scale 3D landscapes from random noises. | Paper, Tweet | 
| 10) Large Language Models Can Be Easily Distracted by Irrelevant Context - finds that many prompting techniques fail when presented with irrelevant context for arithmetic reasoning. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) MusicLM: Generating Music From Text - a generative model for generating high-fidelity music from text descriptions. | Paper, Tweet | 
| 2) Hungry Hungry Hippos: Towards Language Modeling with State Space Models - an approach to reduce the gap, in terms of performance and hardware utilization, between state space models and attention for language modeling. | Paper, Tweet | 
| 3) A Watermark for Large Language Models - a watermarking framework for proprietary language models. | Paper, Tweet | 
| 4) Text-To-4D Dynamic Scene Generation - a new text-to-4D model for dynamic scene generation from input text. | Paper, GitHub, Tweet | 
| 5) ClimaX: A foundation model for weather and climate - a foundation model for weather and climate, including many capabilities for atmospheric science tasks. | Paper, Tweet, Blog | 
| 6) Open Problems in Applied Deep Learning - If you're looking for interesting open problems in DL, this is a good reference. Not sure if intentional but it also looks useful to get a general picture of current trends in deep learning with ~300 references. | Paper , Tweet | 
| 7) DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature - an approach for zero-shot machine-generated text detection. Uses raw log probabilities from the LLM to determine if the passage was sampled from it. | Paper, Tweet | 
| 8) StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis - a new model that aims to regain the competitiveness of GANs for fast large-scale text-to-image synthesis. | Paper, Project, Code Tweet | 
| 9) Large language models generate functional protein sequences across diverse families - an LLM that can generate protein sequences with a predictable function across large protein families. | Paper, Tweet | 
| 10) The Impossibility of Parallelizing Boosting - investigates the possibility of parallelizing boosting. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Google AI Research Recap (2022 Edition) - an excellent summary of some notable research Google AI did in 2022. | Blog, Tweet | 
| 2) Dissociating language and thought in large language models: a cognitive perspective - a review paper on the capabilities of LLMs from a cognitive science perspective. | Paper, Tweet | 
| 3) Human-Timescale Adaptation in an Open-Ended Task Space - an agent trained at scale that leads to a general in-content learning algorithm able to adapt to open-ended embodied 3D problems. | Paper, Tweet | 
| 4) AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation - an approach to help provide explanations of generative transformer models through memory-efficient attention manipulation. | Paper, Tweet | 
| 5) Everything is Connected: Graph Neural Networks - short overview of key concepts in graph representation learning. | Paper, Tweet | 
| 6) GLIGEN: Open-Set Grounded Text-to-Image Generation - an approach that extends the functionality of existing pre-trained text-to-image diffusion models by enabling conditioning on grounding inputs. | Paper, Tweet, Project | 
| 7) InstructPix2Pix: Learning to Follow Image Editing Instructions - proposes a method with the capability of editing images from human instructions. | Paper, Tweet | 
| 8) Dataset Distillation: A Comprehensive Review | Paper, Tweet | 
| 9) Learning-Rate-Free Learning by D-Adaptation - a new method for automatically adjusting the learning rate during training, applicable to more than a dozen diverse ML problems. | Paper, Tweet | 
| 10) RecolorNeRF: Layer Decomposed Radiance Field for Efficient Color Editing of 3D Scenes - a user-friendly color editing approach for the neural radiance field to achieve a more efficient view-consistent recoloring. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Mastering Diverse Domains through World Models - a general algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in AI. | Paper, Tweet | 
| 2) Tracr: Compiled Transformers as a Laboratory for Interpretability - a compiler for converting RASP programs into transformer weights. This way of constructing NNs weights enables the development and evaluation of new interpretability tools. | Paper, Tweet, Code | 
| 3) Multimodal Deep Learning - multimodal deep learning is a new book published on ArXiv. | Book, Tweet | 
| 4) Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk - new work analyzing how generative LMs could potentially be misused for disinformation and how to mitigate these types of risks. | Paper, Tweet | 
| 5) Why do Nearest Neighbor Language Models Work? - empirically identifies reasons why retrieval-augmented LMs (specifically k-nearest neighbor LMs) perform better than standard parametric LMs. | Paper, Code, Tweet | 
| 6) Memory Augmented Large Language Models are Computationally Universal - investigates the use of existing LMs (e.g, Flan-U-PaLM 540B) combined with associative read-write memory to simulate the execution of a universal Turing machine. | Paper , Tweet | 
| 7) A Survey on Transformers in Reinforcement Learning - transformers for RL will be a fascinating research area to track. The same is true for the reverse direction (RL for Transformers)... a notable example: using RLHF to improve LLMs (e.g., ChatGPT). | Paper, Tweet | 
| 8) Scaling Laws for Generative Mixed-Modal Language Models - introduces scaling laws for generative mixed-modal language models. | Paper, Tweet | 
| 9) DeepMatcher: A Deep Transformer-based Network for Robust and Accurate Local Feature Matching - a transformer-based network showing robust local feature matching, outperforming the state-of-the-art methods on several benchmarks. | Paper, Tweet | 
| 10) Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement - addresses the time series forecasting problem with generative modeling; involves a bidirectional VAE backbone equipped with diffusion, denoising for prediction accuracy, and disentanglement for model interpretability. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Muse: Text-To-Image Generation via Masked Generative Transformers - introduces Muse, a new text-to-image generation model based on masked generative transformers; significantly more efficient than other diffusion models like Imagen and DALLE-2. | Paper, Project, Code, Tweet | 
| 2) VALL-E Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers - introduces VALL-E, a text-to-audio model that performs state-of-the-art zero-shot performance; the text-to-speech synthesis task is treated as a conditional language modeling task. | Project, Tweet | 
| 3) Rethinking with Retrieval: Faithful Large Language Model Inference - shows the potential of enhancing LLMs by retrieving relevant external knowledge based on decomposed reasoning steps obtained through chain-of-thought prompting. | Paper, Tweet | 
| 4) SparseGPT: Massive Language Models Can Be Accurately Pruned In One-Shot - presents a technique for compressing large language models while not sacrificing performance; "pruned to at least 50% sparsity in one-shot, without any retraining." | Paper, Tweet | 
| 5) ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders - a performant model based on a fully convolutional masked autoencoder framework and other architectural improvements. CNNs are sticking back! | Paper, Code, Tweet | 
| 6) Large Language Models as Corporate Lobbyists - with more capabilities, we are starting to see a wider range of applications with LLMs. This paper utilized large language models for conducting corporate lobbying activities. | Paper , Code, Tweet | 
| 7) Superposition, Memorization, and Double Descent - aims to better understand how deep learning models overfit or memorize examples; interesting phenomena observed; important work toward a mechanistic theory of memorization. | Paper, Tweet | 
| 8) StitchNet: Composing Neural Networks from Pre-Trained Fragments - new idea to create new coherent neural networks by reusing pretrained fragments of existing NNs. Not straightforward but there is potential in terms of efficiently reusing learned knowledge in pre-trained networks for complex tasks. | Paper, Tweet | 
| 9) Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes - proposes integrated decomposition, an approach to improve Science Q&A through a human-in-the-loop workflow for refining compositional LM programs. | Paper, Code Tweet | 
| 10) A Succinct Summary of Reinforcement Learning - a nice overview of some important ideas in RL. | Paper, Tweet | 
We use a combination of AI-powered tools, analytics, and human curation to build the lists of papers.
Subscribe to our NLP Newsletter to stay on top of ML research and trends.
Join our Discord.