At DAIR.AI we ❤️ reading ML papers so we've created this repo to highlight the top ML papers of every week.
| Paper | Links | 
|---|---|
| 1) BloombergGPT: A Large Language Model for Finance - a new 50B parameter large language model for finance. Claims the largest domain-specific dataset yet with 363 billion tokens... further augmented with 345 billion tokens from general-purpose datasets; outperforms existing models on financial tasks while not sacrificing performance on general LLM benchmarks. | Paper, Tweet | 
| 2) Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware - a low-cost system that performs end-to-end imitation learning from real demonstrations; also presents an algorithm called Action Chunking with Transformers to learn a generative model that allows a robot to learn difficult tasks in the real world. | Paper, Tweet | 
| 3) HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace - a system that leverages LLMs like ChatGPT to conduct task planning, select models and act as a controller to execute subtasks and summarize responses according to execution results. | Paper, Tweet | 
| 4) ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge - a medical chat model fine-tuned on LLaMA using medical domain knowledge. Collects data on around 700 diseases and generates 5K doctor-patient conversations to fine-tune the LLM. | Paper, Tweet | 
| 5) LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention - a lightweight adaptation method to efficiently fine-tune LLaMA into an instruction-following model; generates responses comparable to Alpaca with fully fine-tuned 7B parameters; it’s also extended for multi-modal input support. | Paper, Tweet | 
| 6) ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks - demonstrates that ChatGPT can outperform crowd-workers for several annotation tasks such as relevance, topics, and frames detection; besides better zero-shot accuracy, ChatGPT's per-annotation cost is about 20 times lower than MTurk's. | Paper, Tweet | 
| 7) Language Models can Solve Computer Tasks - shows that a pre-trained LLM agent can execute computer tasks using a simple prompting scheme where the agent recursively criticizes and improves its outputs. | Paper, Tweet | 
| 8) DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents - a paradigm to enhance large language model completions by allowing models to communicate feedback and iteratively improve output; DERA outperforms base GPT-4 on clinically-focused tasks. | Paper, Tweet | 
| 9) Natural Selection Favors AIs over Humans - discusses why AI systems will become more fit than humans and the potential dangers and risks involved, including ways to mitigate them. | Paper, Tweet | 
| 10) Machine Learning for Partial Differential Equations - a review examining avenues of partial differential equations research advanced by machine learning. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Sparks of Artificial General Intelligence: Early experiments with GPT-4 - a comprehensive investigation of an early version of GPT-4 when it was still in active development by OpenAI. | Paper, Tweet | 
| 2) Reflexion: an autonomous agent with dynamic memory and self-reflection - proposes an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. | Paper, Tweet | 
| 3) Capabilities of GPT-4 on Medical Challenge Problems - shows that GPT-4 exceeds the passing score on USMLE by over 20 points and outperforms GPT-3.5 as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a prompt-tuned version of Flan-PaLM 540B). | Paper, Tweet | 
| 4) GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models - investigates the potential implications of GPT models and related systems on the US labor market. | Paper, Tweet | 
| 5) CoLT5: Faster Long-Range Transformers with Conditional Computation - a long-input Transformer model that employs conditional computation, devoting more resources to important tokens in both feedforward and attention layers. | Paper, Tweet | 
| 6) Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity - compares human-generated ideas with those generated by generative AI chatbots like ChatGPT and YouChat; reports that 9.4% of humans were more creative than GPT-4 and that GAIs are valuable assistants in the creative process. | Paper, Tweet | 
| 7) A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models - a comprehensive capability analysis of GPT series models; evaluates performance on 9 natural language understanding tasks using 21 datasets. | Paper, Tweet | 
| 8) Context-faithful Prompting for Large Language Models - presents a prompting technique that aims to improve LLMs' faithfulness using strategies such as opinion-based prompts and counterfactual demonstrations. | Paper, Tweet | 
| 9) Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models - a method for extracting room-scale textured 3D meshes from 2D text-to-image models. | Paper, Project, Tweet | 
| 10) PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing - a trillion parameter language model with sparse heterogeneous computing. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) GPT-4 Technical Report - GPT-4 is a large multimodal model with broader general knowledge and problem-solving abilities. | Paper, Tweet | 
| 2) LERF: Language Embedded Radiance Fields - a method for grounding language embeddings from models like CLIP into NeRF; this enables open-ended language queries in 3D. | Paper, Tweet | 
| 3) An Overview on Language Models: Recent Developments and Outlook - an overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. | Paper, Tweet | 
| 4) Eliciting Latent Predictions from Transformers with the Tuned Lens - a method for transformer interpretability that can trace a language model's predictions as they develop layer by layer. | Paper, Tweet | 
| 5) Meet in the Middle: A New Pre-training Paradigm - a new pre-training paradigm using techniques that jointly improve training data efficiency and capabilities of LMs in the infilling task; performance improvement is shown in code generation tasks. | Paper, Tweet | 
| 6) Resurrecting Recurrent Neural Networks for Long Sequences - demonstrates that careful design of deep RNNs using standard signal propagation arguments can recover the performance of deep state-space models on long-range reasoning tasks. | Paper, Tweet | 
| 7) UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation - a new approach to tune a lightweight and versatile retriever to automatically retrieve prompts to improve zero-shot performance and help mitigate hallucinations. | Paper, Tweet | 
| 8) Patches Are All You Need? - proposes ConvMixer, a parameter-efficient fully-convolutional model which replaces self-attention and MLP layers in ViTs with less-expressive depthwise and pointwise convolutional layers. | Paper, Tweet | 
| 9) NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes - a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach; distills NeRFs into geometrically-accurate 3D meshes. | Paper, Tweet | 
| 10) High-throughput Generative Inference of Large Language Models with a Single GPU - a high-throughput generation engine for running LLMs with limited GPU memory. | Paper, Code, Tweet | 
| Paper | Links | 
|---|---|
| 1) PaLM-E: An Embodied Multimodal Language Model - incorporates real-world continuous sensor modalities resulting in an embodied LM that performs tasks such as robotic manipulation planning, visual QA, and other embodied reasoning tasks. | Paper, Demo, Tweet | 
| 2) Prismer: A Vision-Language Model with An Ensemble of Experts - a parameter-efficient vision-language model powered by an ensemble of domain experts; it efficiently pools expert knowledge from different domains and adapts it to various vision-language reasoning tasks. | Paper, GitHub, Project, Tweet | 
| 3) Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models - it connects ChatGPT and different visual foundation models to enable users to interact with ChatGPT beyond language format. | Paper, GitHub, Tweet | 
| 4) A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT - an overview of generative AI - from GAN to ChatGPT. | Paper, Tweet | 
| 5) Larger language models do in-context learning differently - shows that with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when replacing targets with semantically-unrelated targets. | Paper, Tweet | 
| 6) Foundation Models for Decision Making: Problems, Methods, and Opportunities - provides an overview of foundation models for decision making, including tools, methods, and new research directions. | Project, Tweet | 
| 7) Hyena Hierarchy: Towards Larger Convolutional Language Models - a subquadratic drop-in replacement for attention; it interleaves implicit long convolutions and data-controlled gating and can learn on sequences 10x longer and up to 100x faster than optimized attention. | Paper, Code, Blog, Tweet | 
| 8) OpenICL: An Open-Source Framework for In-context Learning - a new open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs. | Paper, Repo, Tweet | 
| 9) MathPrompter: Mathematical Reasoning using Large Language Models - a technique that improves LLM performance on mathematical reasoning problems; it uses zero-shot chain-of-thought prompting and verification to ensure generated answers are accurate. | Paper, Tweet | 
| 10) Scaling up GANs for Text-to-Image Synthesis - enables scaling up GANs on large datasets for text-to-image synthesis; it’s found to be orders of magnitude faster at inference time, synthesizes high-resolution images, and supports various latent space editing applications. | Paper, Project, Tweet | 
| Paper | Links | 
|---|---|
| 1) Language Is Not All You Need: Aligning Perception with Language Models - introduces a multimodal large language model called Kosmos-1; achieves great performance on language understanding, OCR-free NLP, perception-language tasks, visual QA, and more. | Paper, Tweet | 
| 2) Evidence of a predictive coding hierarchy in the human brain listening to speech - finds that human brain activity is best explained by the activations of modern language models enhanced with long-range and hierarchical predictions. | Paper, Tweet | 
| 3) EvoPrompting: Language Models for Code-Level Neural Architecture Search - combines evolutionary prompt engineering with soft prompt-tuning to find high-performing models; it leverages few-shot prompting which is further improved by using an evolutionary search approach to improve the in-context examples. | Paper, Tweet | 
| 4) Consistency Models - a new family of generative models that achieve high sample quality without adversarial training. | Paper, Tweet | 
| 5) Goal Driven Discovery of Distributional Differences via Language Descriptions - a new task that automatically discovers corpus-level differences via language description in a goal-driven way; applications include discovering insights from commercial reviews and error patterns in NLP systems. | Paper, Code, Tweet | 
| 6) High-resolution image reconstruction with latent diffusion models from human brain activity - proposes an approach for high-resolution image reconstruction with latent diffusion models from human brain activity. | Project, Tweet | 
| 7) Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control - a scalable approach to planning with LLMs in embodied settings through grounding functions; GD is found to be a general, flexible, and expressive approach to embodied tasks. | Paper, Project, Tweet | 
| 8) Language-Driven Representation Learning for Robotics - a framework for language-driven representation learning from human videos and captions for robotics. | Paper, Models, Evaluation, Tweet | 
| 9) Dropout Reduces Underfitting - demonstrates that dropout can mitigate underfitting when used at the start of training; it counteracts SGD stochasticity and limits the influence of individual batches when training models. | Paper, Tweet | 
| 10) Enabling Conversational Interaction with Mobile UI using Large Language Models - an approach that enables versatile conversational interactions with mobile UIs using a single LLM. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) LLaMA: Open and Efficient Foundation Language Models - a 65B parameter foundation model released by Meta AI; relies on publicly available data and outperforms GPT-3 on most benchmarks despite being 10x smaller. | Paper, Tweet | 
| 2) Composer: Creative and Controllable Image Synthesis with Composable Conditions - a 5B parameter creative and controllable diffusion model trained on billions of (text, image) pairs. | Paper, Project, GitHub, Tweet | 
| 3) The Wisdom of Hindsight Makes Language Models Better Instruction Followers - an alternative algorithm to train LLMs from feedback; the feedback is converted to instruction by relabeling the original one and training the model, in a supervised way, for better alignment. | Paper, GitHub, Tweet | 
| 4) Active Prompting with Chain-of-Thought for Large Language Models - a prompting technique to adapt LLMs to different task-specific example prompts (annotated with human-designed chain-of-thought reasoning); this process involves finding where the LLM is most uncertain and annotating those. | Paper, Code, Tweet | 
| 5) Modular Deep Learning - a survey offering a unified view of the building blocks of modular neural networks; it also includes a discussion about modularity in the context of scaling LMs, causal inference, and other key topics in ML. | Paper, Project, Tweet | 
| 6) Recitation-Augmented Language Models - an approach that recites passages from the LLM’s own memory to produce final answers; shows high performance on knowledge-intensive tasks. | Paper, Tweet | 
| 7) Learning Performance-Improving Code Edits - an approach that uses LLMs to suggest functionally correct, performance-improving code edits. | Paper, Tweet | 
| 8) More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models - a comprehensive analysis of novel prompt injection threats to application-integrated LLMs. | Paper, Tweet | 
| 9) Aligning Text-to-Image Models using Human Feedback - proposes a fine-tuning method to align generative models using human feedback. | Paper, Tweet | 
| 10) MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes - a memory-efficient radiance field representation for real-time view synthesis of large-scale scenes in a browser. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Symbolic Discovery of Optimization Algorithms - a simple and effective optimization algorithm that’s more memory-efficient than Adam. | Paper, Tweet | 
| 2) Transformer models: an introduction and catalog | Paper, Tweet | 
| 3) 3D-aware Conditional Image Synthesis - a 3D-aware conditional generative model extended with neural radiance fields for controllable photorealistic image synthesis. | Paper, Project, Tweet | 
| 4) The Capacity for Moral Self-Correction in Large Language Models - finds strong evidence that language models trained with RLHF have the capacity for moral self-correction. The capability emerges at 22B model parameters and typically improves with scale. | Paper, Tweet | 
| 6) Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment - an unsupervised method for text-image alignment that leverages pretrained language models; it enables few-shot image classification with LLMs. | Paper, Code, Tweet | 
| 7) Augmented Language Models: a Survey - a survey of language models that are augmented with reasoning skills and the capability to use tools. | Paper, Tweet | 
| 8) Geometric Clifford Algebra Networks - an approach to incorporate geometry-guided transformations into neural networks using geometric algebra. | Paper, Tweet | 
| 9) Auditing large language models: a three-layered approach - proposes a policy framework for auditing LLMs. | Paper, Tweet | 
| 10) Energy Transformer - a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associative Memory model; this follows the popularity that Hopfield Networks have gained in the field of ML. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Toolformer: Language Models Can Teach Themselves to Use Tools - introduces language models that teach themselves to use external tools via simple API calls. | Paper, Tweet | 
| 2) Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents - proposes using language models for open-world game playing. | Paper, Tweet | 
| 3) A Categorical Archive of ChatGPT Failures - a comprehensive analysis of ChatGPT failures for categories like reasoning, factual errors, maths, and coding. | Paper, Tweet | 
| 4) Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery - optimizing hard text prompts through efficient gradient-based optimization. | Paper, Tweet | 
| 5) Data Selection for Language Models via Importance Resampling - proposes a cheap and scalable data selection framework based on an importance resampling algorithm to improve the downstream performance of LMs. | Paper, Tweet | 
| 6) Structure and Content-Guided Video Synthesis with Diffusion Models - proposes an approach for structure and content-guided video synthesis with diffusion models. | Paper, Project, Tweet | 
| 7) A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity - performs a more rigorous evaluation of ChatGPT on reasoning, hallucination, and interactivity. | Paper, Tweet | 
| 8) Noise2Music: Text-conditioned Music Generation with Diffusion Models - proposes diffusion models to generate high-quality 30-second music clips via text prompts. | Paper, Project, Tweet | 
| 9) Offsite-Tuning: Transfer Learning without Full Model - introduces an efficient, privacy-preserving transfer learning framework to adapt foundational models to downstream data without access to the full model. | Paper, Project, Tweet | 
| 10) Zero-shot Image-to-Image Translation - proposes a model for zero-shot image-to-image translation. | Paper, Project, Tweet | 
| Paper | Links | 
|---|---|
| 1) REPLUG: Retrieval-Augmented Black-Box Language Models - a retrieval-augmented LM framework that adapts a retriever to a large-scale, black-box LM like GPT-3. | Paper, Tweet | 
| 2) Extracting Training Data from Diffusion Models - shows that diffusion-based generative models can memorize images from the training data and emit them at generation time. | Paper, Tweet | 
| 3) The Flan Collection: Designing Data and Methods for Effective Instruction Tuning - releases a more extensive publicly available collection of tasks, templates, and methods to advance instruction-tuned models. | Paper, Tweet | 
| 4) Multimodal Chain-of-Thought Reasoning in Language Models - incorporates vision features to elicit chain-of-thought reasoning in multimodality, enabling the model to generate effective rationales that contribute to answer inference. | Paper, Code, Tweet | 
| 5) Dreamix: Video Diffusion Models are General Video Editors - a diffusion model that performs text-based motion and appearance editing of general videos. | Paper, Project, Tweet | 
| 6) Benchmarking Large Language Models for News Summarization | Paper, Tweet | 
| 7) Mathematical Capabilities of ChatGPT - investigates the mathematical capabilities of ChatGPT on a new holistic benchmark called GHOSTS. | Paper, Tweet | 
| 8) Emergence of Maps in the Memories of Blind Navigation Agents - trains an AI agent to navigate purely by feeling its way around; no use of vision, audio, or any other sensing (as in animals). | Paper, Project, Tweet | 
| 9) SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections - a generative model that synthesizes large-scale 3D landscapes from random noise. | Paper, Tweet | 
| 10) Large Language Models Can Be Easily Distracted by Irrelevant Context - finds that many prompting techniques fail when presented with irrelevant context for arithmetic reasoning. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) MusicLM: Generating Music From Text - a generative model for generating high-fidelity music from text descriptions. | Paper, Tweet | 
| 2) Hungry Hungry Hippos: Towards Language Modeling with State Space Models - an approach to reduce the gap, in terms of performance and hardware utilization, between state space models and attention for language modeling. | Paper, Tweet | 
| 3) A Watermark for Large Language Models - a watermarking framework for proprietary language models. | Paper, Tweet | 
| 4) Text-To-4D Dynamic Scene Generation - a new text-to-4D model for dynamic scene generation from input text. | Paper, GitHub, Tweet | 
| 5) ClimaX: A foundation model for weather and climate - a foundation model for weather and climate, including many capabilities for atmospheric science tasks. | Paper, Tweet, Blog | 
| 6) Open Problems in Applied Deep Learning - If you're looking for interesting open problems in DL, this is a good reference. Not sure if intentional but it also looks useful to get a general picture of current trends in deep learning with ~300 references. | Paper, Tweet | 
| 7) DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature - an approach for zero-shot machine-generated text detection. Uses raw log probabilities from the LLM to determine if the passage was sampled from it. | Paper, Tweet | 
| 8) StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis - a new model that aims to regain the competitiveness of GANs for fast large-scale text-to-image synthesis. | Paper, Project, Code, Tweet | 
| 9) Large language models generate functional protein sequences across diverse families - an LLM that can generate protein sequences with a predictable function across large protein families. | Paper, Tweet | 
| 10) The Impossibility of Parallelizing Boosting - investigates the possibility of parallelizing boosting. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Google AI Research Recap (2022 Edition) - an excellent summary of some notable research Google AI did in 2022. | Blog, Tweet | 
| 2) Dissociating language and thought in large language models: a cognitive perspective - a review paper on the capabilities of LLMs from a cognitive science perspective. | Paper, Tweet | 
| 3) Human-Timescale Adaptation in an Open-Ended Task Space - an agent trained at scale that leads to a general in-context learning algorithm able to adapt to open-ended embodied 3D problems. | Paper, Tweet | 
| 4) AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation - an approach to help provide explanations of generative transformer models through memory-efficient attention manipulation. | Paper, Tweet | 
| 5) Everything is Connected: Graph Neural Networks - short overview of key concepts in graph representation learning. | Paper, Tweet | 
| 6) GLIGEN: Open-Set Grounded Text-to-Image Generation - an approach that extends the functionality of existing pre-trained text-to-image diffusion models by enabling conditioning on grounding inputs. | Paper, Tweet, Project | 
| 7) InstructPix2Pix: Learning to Follow Image Editing Instructions - proposes a method with the capability of editing images from human instructions. | Paper, Tweet | 
| 8) Dataset Distillation: A Comprehensive Review | Paper, Tweet | 
| 9) Learning-Rate-Free Learning by D-Adaptation - a new method for automatically adjusting the learning rate during training, applicable to more than a dozen diverse ML problems. | Paper, Tweet | 
| 10) RecolorNeRF: Layer Decomposed Radiance Field for Efficient Color Editing of 3D Scenes - a user-friendly color editing approach for the neural radiance field to achieve a more efficient view-consistent recoloring. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Mastering Diverse Domains through World Models - a general algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in AI. | Paper, Tweet | 
| 2) Tracr: Compiled Transformers as a Laboratory for Interpretability - a compiler for converting RASP programs into transformer weights. This way of constructing NNs weights enables the development and evaluation of new interpretability tools. | Paper, Tweet, Code | 
| 3) Multimodal Deep Learning - multimodal deep learning is a new book published on ArXiv. | Book, Tweet | 
| 4) Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk - new work analyzing how generative LMs could potentially be misused for disinformation and how to mitigate these types of risks. | Paper, Tweet | 
| 5) Why do Nearest Neighbor Language Models Work? - empirically identifies reasons why retrieval-augmented LMs (specifically k-nearest neighbor LMs) perform better than standard parametric LMs. | Paper, Code, Tweet | 
| 6) Memory Augmented Large Language Models are Computationally Universal - investigates the use of existing LMs (e.g., Flan-U-PaLM 540B) combined with associative read-write memory to simulate the execution of a universal Turing machine. | Paper, Tweet | 
| 7) A Survey on Transformers in Reinforcement Learning - transformers for RL will be a fascinating research area to track. The same is true for the reverse direction (RL for Transformers)... a notable example: using RLHF to improve LLMs (e.g., ChatGPT). | Paper, Tweet | 
| 8) Scaling Laws for Generative Mixed-Modal Language Models - introduces scaling laws for generative mixed-modal language models. | Paper, Tweet | 
| 9) DeepMatcher: A Deep Transformer-based Network for Robust and Accurate Local Feature Matching - a transformer-based network showing robust local feature matching, outperforming the state-of-the-art methods on several benchmarks. | Paper, Tweet | 
| 10) Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement - addresses the time series forecasting problem with generative modeling; involves a bidirectional VAE backbone equipped with diffusion, denoising for prediction accuracy, and disentanglement for model interpretability. | Paper, Tweet | 
| Paper | Links | 
|---|---|
| 1) Muse: Text-To-Image Generation via Masked Generative Transformers - introduces Muse, a new text-to-image generation model based on masked generative transformers; significantly more efficient than diffusion models like Imagen and DALLE-2. | Paper, Project, Code, Tweet | 
| 2) VALL-E Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers - introduces VALL-E, a text-to-audio model that achieves state-of-the-art zero-shot performance; the text-to-speech synthesis task is treated as a conditional language modeling task. | Project, Tweet | 
| 3) Rethinking with Retrieval: Faithful Large Language Model Inference - shows the potential of enhancing LLMs by retrieving relevant external knowledge based on decomposed reasoning steps obtained through chain-of-thought prompting. | Paper, Tweet | 
| 4) SparseGPT: Massive Language Models Can Be Accurately Pruned In One-Shot - presents a technique for compressing large language models while not sacrificing performance; "pruned to at least 50% sparsity in one-shot, without any retraining." | Paper, Tweet | 
| 5) ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders - a performant model based on a fully convolutional masked autoencoder framework and other architectural improvements. CNNs are striking back! | Paper, Code, Tweet | 
| 6) Large Language Models as Corporate Lobbyists - with more capabilities, we are starting to see a wider range of applications with LLMs. This paper utilized large language models for conducting corporate lobbying activities. | Paper, Code, Tweet | 
| 7) Superposition, Memorization, and Double Descent - aims to better understand how deep learning models overfit or memorize examples; interesting phenomena observed; important work toward a mechanistic theory of memorization. | Paper, Tweet | 
| 8) StitchNet: Composing Neural Networks from Pre-Trained Fragments - new idea to create new coherent neural networks by reusing pretrained fragments of existing NNs. Not straightforward but there is potential in terms of efficiently reusing learned knowledge in pre-trained networks for complex tasks. | Paper, Tweet | 
| 9) Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes - proposes iterated decomposition, an approach to improve Science Q&A through a human-in-the-loop workflow for refining compositional LM programs. | Paper, Code, Tweet | 
| 10) A Succinct Summary of Reinforcement Learning - a nice overview of some important ideas in RL. | Paper, Tweet | 
We use a combination of AI-powered tools, analytics, and human curation to build the lists of papers.
Subscribe to our NLP Newsletter to stay on top of ML research and trends.
Join our Discord.