Large Language Model
1121 papers with code • 0 benchmarks • 5 datasets
Most implemented papers
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER.
Generative Agents: Interactive Simulacra of Human Behavior
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
Furthermore, we propose a new technique called Self-Distill with Feedback, to further improve the performance of the Baize models with feedback from ChatGPT.
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can yield numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.
Efficient Memory Management for Large Language Model Serving with PagedAttention
On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.
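The two properties named in this excerpt can be illustrated with a toy block allocator. This is a minimal sketch, not vLLM's code or API: it assumes the KV cache is split into fixed-size blocks, that each sequence keeps a table of block ids, and that sharing is done by reference counting with copy-on-write. Every name here (`BLOCK_SIZE`, `BlockAllocator`, `Sequence`) is hypothetical, and no real key/value tensors are stored.

```python
BLOCK_SIZE = 4  # tokens per block; hypothetical value, kept small for illustration


class BlockAllocator:
    """Hands out fixed-size KV cache blocks and tracks how many sequences use each."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.refcount = {}

    def allocate(self):
        if not self.free_blocks:
            raise MemoryError("no free KV cache blocks")
        block = self.free_blocks.pop()
        self.refcount[block] = 1
        return block

    def share(self, block):
        # A second sequence points at the same block; nothing is copied.
        self.refcount[block] += 1
        return block

    def release(self, block):
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]
            self.free_blocks.append(block)


class Sequence:
    """One request; its block table maps logical block positions to block ids."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        if self.num_tokens % BLOCK_SIZE == 0:
            # The last block is full (or there is none yet): take a new one on demand.
            self.block_table.append(self.allocator.allocate())
        else:
            last = self.block_table[-1]
            if self.allocator.refcount[last] > 1:
                # Copy-on-write: stop sharing the partially filled block before
                # writing to it. A real system would also copy its contents.
                self.allocator.release(last)
                self.block_table[-1] = self.allocator.allocate()
        self.num_tokens += 1

    def fork(self):
        # A new sequence that shares every existing block with this one.
        child = Sequence(self.allocator)
        child.block_table = [self.allocator.share(b) for b in self.block_table]
        child.num_tokens = self.num_tokens
        return child

    def wasted_slots(self):
        # Unused token slots; always fewer than BLOCK_SIZE per sequence.
        return len(self.block_table) * BLOCK_SIZE - self.num_tokens

    def free(self):
        for block in self.block_table:
            self.allocator.release(block)
        self.block_table = []
        self.num_tokens = 0
```

Because blocks are allocated only when the previous one fills up, the unused space per sequence stays below one block, and a forked sequence costs no extra memory until it writes to a block it shares.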
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
Fast Transformer Decoding: One Write-Head is All You Need
Multi-head attention layers, as used in the Transformer neural sequence model, are a powerful alternative to RNNs for moving information across and between sequences.
Muse: Text-To-Image Generation via Masked Generative Transformers
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.