
Showing 1–50 of 630 results for author: Song, L

  1. arXiv:2407.19484 [pdf, ps, other]

    cs.IT

    Error Correction Decoding Algorithms of RS Codes Based on An Earlier Termination Algorithm to Find The Error Locator Polynomial

    Authors: Zhengyi Jiang, Hao Shi, Zhongyi Huang, Linqi Song, Bo Bai, Gong Zhang, Hanxu Hou

    Abstract: Reed-Solomon (RS) codes are widely used to correct errors in storage systems. Finding the error locator polynomial is one of the key steps in the error correction procedure of RS codes. Modular Approach (MA), an effective algorithm for solving the Welch-Berlekamp (WB) key-equation problem to find the error locator polynomial, needs $2t$ steps, where $t$ is the error correction capability. In…

    Submitted 28 July, 2024; originally announced July 2024.

  2. arXiv:2407.14733 [pdf, other]

    cs.LG cs.AI cs.CL

    Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL

    Authors: Yunseon Choi, Sangmin Bae, Seonghyun Ban, Minchan Jeong, Chuheng Zhang, Lei Song, Li Zhao, Jiang Bian, Kee-Eung Kim

    Abstract: With the advent of foundation models, prompt tuning has positioned itself as an important technique for directing model behaviors and eliciting desired responses. Prompt tuning involves selecting appropriate keywords to include in the input, thereby adapting to the downstream task without adjusting or fine-tuning the model parameters. There is a wide range of work in prompt tuning, from approaches…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2407.14302 [pdf, other]

    cs.CV

    Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

    Authors: Yurong Zhang, Honghao Chen, Xinyu Zhang, Xiangxiang Chu, Li Song

    Abstract: Parameter-efficient transfer learning (PETL) is a promising task, aiming to adapt the large-scale pre-trained model to downstream tasks with a relatively modest cost. However, current PETL methods struggle in compressing computational complexity and bear a heavy inference burden due to the complete forward process. This paper presents an efficient visual recognition paradigm, called Dynamic Adapte…

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  4. arXiv:2407.14047 [pdf, other]

    cs.CV cs.AI

    OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

    Authors: Zekun Qian, Ruize Han, Wei Feng, Junhui Hou, Linqi Song, Song Wang

    Abstract: We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without the category text list as prompt. To study this problem, the top priority is to build a benchmark. In this work, we build OCTrackB, a large-scale and comprehensiv…

    Submitted 19 July, 2024; originally announced July 2024.

  5. arXiv:2407.13054 [pdf, other]

    cs.AI

    Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

    Authors: Wenjin Niu, Zijun Gao, Liyan Song, Lingbo Li

    Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies and a lack of comprehensive evaluations. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal disc…

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.09887 [pdf, other]

    cs.LG math.OC

    Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis

    Authors: Zhicheng Yang, Yinya Huang, Wei Shi, Liang Feng, Linqi Song, Yiwei Wang, Xiaodan Liang, Jing Tang

    Abstract: Large language models (LLMs) have exhibited their problem-solving ability in mathematical reasoning. Solving realistic optimization (OPT) problems in industrial application scenarios requires advanced and applied math ability. However, current OPT benchmarks that merely solve linear programming are far from complex realistic situations. In this work, we propose E-OPT, a benchmark for end-to-end op…

    Submitted 13 July, 2024; originally announced July 2024.

  7. arXiv:2407.09026 [pdf, other]

    cs.CV cs.LG cs.MM eess.IV

    HPC: Hierarchical Progressive Coding Framework for Volumetric Video

    Authors: Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

    Abstract: Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hie…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  8. arXiv:2407.03636 [pdf, other]

    cs.CV

    Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Zhengxue Cheng, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Image restoration is a classic low-level problem aimed at recovering high-quality images from low-quality images with various degradations such as blur, noise, rain, haze, etc. However, due to the inherent complexity and non-uniqueness of degradation in real-world images, it is challenging for a model trained for single tasks to handle real-world restoration problems effectively. Moreover, existin…

    Submitted 4 July, 2024; originally announced July 2024.

  9. arXiv:2407.03635 [pdf, other]

    cs.CV

    MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Realistic image restoration is a crucial task in computer vision, and the use of diffusion-based models for image restoration has garnered significant attention due to their ability to produce realistic results. However, the quality of the generated images is still a significant challenge due to the severity of image degradation and the uncontrollability of the diffusion model. In this work, we de…

    Submitted 4 July, 2024; originally announced July 2024.

  10. arXiv:2407.01085 [pdf, other]

    cs.LG cs.CL

    Rethinking LLM-based Preference Evaluation

    Authors: Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, Jingang Wang, Zhenyu Chen, Jieyu Zhao, Hui Xiong

    Abstract: Recently, large language model (LLM)-based preference evaluation has been widely adopted to compare pairs of model responses. However, a severe bias towards lengthy responses has been observed, raising concerns about the reliability of this evaluation method. In this work, we designed a series of controlled experiments to study the major impacting factors of the metric of LLM-based preference eval…

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00934 [pdf, other]

    cs.CL

    CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

    Authors: Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

    Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 16 pages, 8 tables, 2 figures. Under review

  12. arXiv:2407.00617 [pdf, other]

    cs.LG cs.AI cs.CL cs.GT

    Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

    Authors: Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

    Abstract: Reinforcement Learning with Human Feedback (RLHF) has achieved great success in aligning large language models (LLMs) with human preferences. Prevalent RLHF approaches are reward-based, following the Bradley-Terry (BT) model assumption, which may not fully capture the complexity of human preferences. In this paper, we explore RLHF under a general preference framework and approach it from a game-th…

    Submitted 7 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  13. arXiv:2407.00320 [pdf, other]

    cs.CL cs.AI cs.LG

    LiteSearch: Efficacious Tree Search for LLM

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Dian Yu, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree s…

    Submitted 29 June, 2024; originally announced July 2024.

  14. arXiv:2406.14408 [pdf, other]

    cs.AI cs.CL cs.LG

    FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

    Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang

    Abstract: Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-crafted rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as anoth…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  15. arXiv:2406.12227 [pdf, other]

    cs.AI

    Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

    Authors: Gangwei Jiang, Caigao Jiang, Zhaoyi Li, Siqiao Xue, Jun Zhou, Linqi Song, Defu Lian, Ying Wei

    Abstract: Fine-tuning large language models (LLMs) can cause them to lose their general capabilities. However, the intrinsic mechanisms behind such forgetting remain unexplored. In this paper, we begin by examining this phenomenon by focusing on knowledge understanding and instruction following, with the latter identified as the main contributor to forgetting during fine-tuning. Consequently, we propose the…

    Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.11385 [pdf, other]

    cs.CL

    MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

    Authors: Yuyan Zhou, Liang Song, Bingning Wang, Weipeng Chen

    Abstract: The advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach for MTL. It enables performance enhancement across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, the current lack…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages

  17. Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization

    Authors: Guopeng Lin, Weili Han, Wenqiang Ruan, Ruisheng Zhou, Lushan Song, Bingshuai Li, Yunfeng Shao

    Abstract: Multi-party training frameworks for decision trees based on secure multi-party computation enable multiple parties to train high-performance models on distributed private data with privacy preservation. The training process essentially involves frequent dataset splitting according to the splitting criterion (e.g. Gini impurity). However, existing multi-party training frameworks for decision trees…

    Submitted 3 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper is the full version of a paper to appear in ACM CCS 2024

  18. arXiv:2406.06586 [pdf, other]

    cs.CL cs.AI

    Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining

    Authors: Shuqi Liu, Bowei He, Linqi Song

    Abstract: Large Language Models (LLMs) have shown human-like reasoning abilities but still face challenges in solving complex logical problems. Existing unidirectional chaining methods, such as forward chaining and backward chaining, suffer from issues like low prediction accuracy and efficiency. To address these, we propose a bidirectional chaining method, Bi-Chainer, which dynamically switches to depth-fi…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  19. arXiv:2406.06572 [pdf, other]

    cs.CL cs.AI cs.IR

    Graph Neural Network Enhanced Retrieval for Question Answering of LLMs

    Authors: Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang

    Abstract: Retrieval augmented generation has revolutionized large language model (LLM) outputs by providing factual support. Nevertheless, it struggles to capture all the necessary knowledge for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages, treating them in isolation. These passages, however, are often interrelated, such as passages that are con…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under review

  20. arXiv:2406.05347 [pdf, other]

    q-bio.BM cs.AI cs.LG

    MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

    Authors: Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song

    Abstract: Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high quality MSA. Although various methods have been proposed to generate virtual MSA under these conditions, they fall short in compre…

    Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  21. arXiv:2406.05223 [pdf, other]

    cs.LG cs.AI

    CorDA: Context-Oriented Decomposition Adaptation of Large Language Models

    Authors: Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem

    Abstract: Current parameter-efficient fine-tuning (PEFT) methods build adapters without considering the context of downstream task to learn, or the context of important knowledge to maintain. As a result, there is often a performance gap compared to full-parameter finetuning, and meanwhile the finetuned model suffers from catastrophic forgetting of the pre-trained world knowledge. In this paper, we propose…

    Submitted 7 June, 2024; originally announced June 2024.

  22. arXiv:2406.03503 [pdf, other]

    cs.AI cs.LG

    Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems

    Authors: Yifan Xia, Xianliang Yang, Zichuan Liu, Zhihao Liu, Lei Song, Jiang Bian

    Abstract: Recent advancements in solving large-scale traveling salesman problems (TSP) utilize the heatmap-guided Monte Carlo tree search (MCTS) paradigm, where machine learning (ML) models generate heatmaps, indicating the probability distribution of each edge being part of the optimal solution, to guide MCTS in solution finding. However, our theoretical and experimental analysis raises doubts about the ef…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  23. arXiv:2406.02395 [pdf, other]

    cs.LG cs.CV

    GrootVL: Tree Topology is All You Need in State Space Model

    Authors: Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

    Abstract: The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, they still fall short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree t…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/EasonXiao-888/GrootVL

  24. arXiv:2406.01721 [pdf, other]

    cs.CL

    Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs

    Authors: Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei

    Abstract: Quantizing large language models (LLMs) presents significant challenges, primarily due to outlier activations that compromise the efficiency of low-bit representation. Traditional approaches mainly focus on solving Normal Outliers-activations with consistently high magnitudes across all tokens. However, these techniques falter when dealing with Massive Outliers, which are significantly higher in v…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  25. arXiv:2406.01363 [pdf, other]

    cs.CL cs.IR

    Privacy in LLM-based Recommendation: Recent Advances and Future Directions

    Authors: Sichun Luo, Wei Shao, Yuxuan Yao, Jian Xu, Mingyang Liu, Qintong Li, Bowei He, Maolin Wang, Guanzhi Deng, Hanxu Hou, Xinyi Zhang, Linqi Song

    Abstract: Nowadays, large language models (LLMs) have been integrated with conventional recommendation models to improve recommendation performance. However, while most of the existing works have focused on improving the model performance, the privacy issue has received comparatively little attention. In this paper, we review recent advancements in privacy within LLM-based recommendation, categorizing th…

    Submitted 3 June, 2024; originally announced June 2024.

  26. arXiv:2406.00977 [pdf, other]

    cs.CV cs.AI

    Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model

    Authors: Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

    Abstract: Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input resolution poses two main challenges: 1) It extends the context length required by the language model, leading to inefficiencies and hitting…

    Submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2405.19425 [pdf, other]

    cs.CL

    Adaptive In-conversation Team Building for Language Model Agents

    Authors: Linxin Song, Jiale Liu, Jieyu Zhang, Shaokun Zhang, Ao Luo, Shijian Wang, Qingyun Wu, Chi Wang

    Abstract: Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks, while the effective design of multiple agents for a particular application remains an art. It is thus intriguing to answer a critical question: Given a task, how can we build a team of LLM agents to solve it effectively? Our new adaptive team-building paradigm offers a flexible so…

    Submitted 29 May, 2024; originally announced May 2024.

  28. arXiv:2405.16854 [pdf, other]

    cs.MA

    Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

    Authors: Zhihao Liu, Xianliang Yang, Zichuan Liu, Yifan Xia, Wei Jiang, Yuanyu Zhang, Lijuan Li, Guoliang Fan, Lei Song, Bian Jiang

    Abstract: Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, the linear increase in the number of agents leads to a combinatorial explosion of the action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While research…

    Submitted 27 May, 2024; originally announced May 2024.

  29. arXiv:2405.14452 [pdf, other]

    cs.CV cs.AI

    JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression

    Authors: Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

    Abstract: Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to facilitate volumetric videos. However, rendering dynamic and long-sequence radiance fields remains challenging due to the significant data required to represent volumetric videos. In this paper, we propose a novel end-to-end joint optimization scheme of dynamic NeRF representation and compressio…

    Submitted 8 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: ICIP2024, 8 pages, 5 figures

  30. arXiv:2405.10347 [pdf, other]

    cs.CV cs.AI cs.CY

    Networking Systems for Video Anomaly Detection: A Tutorial and Survey

    Authors: Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C. M. Leung

    Abstract: The increasing prevalence of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Submitted to ACM Computing Surveys, under review. For more information and supplementary material, please see https://github.com/fdjingliu/NSVAD

  31. arXiv:2405.10345 [pdf, other]

    q-bio.QM cs.AI cs.LG

    Machine Learning Driven Biomarker Selection for Medical Diagnosis

    Authors: Divyagna Bavikadi, Ayushi Agarwal, Shashank Ganta, Yunro Chung, Lusheng Song, Ji Qiu, Paulo Shakarian

    Abstract: Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associated molecular measurements with diseases such as Alzheimer's disease, liver cancer, and gastric cancer. However, the use of thousands of biomarkers selected from the analytes is not practical for real-world medical diagnosis and is likely unde…

    Submitted 15 May, 2024; originally announced May 2024.

  32. arXiv:2405.09308 [pdf, other]

    cs.LG cs.AI

    TimeX++: Learning Time-Series Explanations with Information Bottleneck

    Authors: Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, Dongsheng Luo

    Abstract: Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To add…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  33. arXiv:2405.01043 [pdf, ps, other]

    cs.IT

    Reed-Solomon Codes over Cyclic Polynomial Ring with Lower Encoding/Decoding Complexity

    Authors: Wenhao Liu, Zhengyi Jiang, Zhongyi Huang, Linqi Song, Hanxu Hou

    Abstract: Reed-Solomon (RS) codes, which are constructed over a finite field, have been widely employed in storage and communication systems. Many fast encoding/decoding algorithms such as fast Fourier transform (FFT) and modular approach have been designed for RS codes to reduce the encoding/decoding complexity, defined as the number of XORs involved in the encoding/decoding procedure. In this paper, we present the…

    Submitted 2 May, 2024; originally announced May 2024.

  34. arXiv:2404.16678 [pdf, other]

    cs.CV

    Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

    Authors: Han Wang, Xinning Chai, Yiwen Wang, Yuhong Zhang, Rong Xie, Li Song

    Abstract: Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausi…

    Submitted 25 April, 2024; originally announced April 2024.

  35. arXiv:2404.16271 [pdf]

    cs.CR cond-mat.mtrl-sci

    True random number generation using 1T' molybdenum ditelluride

    Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

    Abstract: True random numbers are essential for scientific research and various engineering problems. Their generation, however, depends on a reliable entropy source. Here, we present true random number generation using the conductance noise probed from structurally metastable 1T' MoTe2 prepared via electrochemical exfoliation. The noise, fitting a Poisson process, is a robust entropy source capable of rema…

    Submitted 29 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  36. arXiv:2404.14396 [pdf, other]

    cs.CV

    SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

    Authors: Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

    Abstract: The rapid evolution of multimodal foundation models has demonstrated significant progress in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However, there remains a gap between its capability and the real-world applicability, primarily due to the model's limited capacity to effectively respond to various user instructions and interact with diverse visual data. I…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project released at: https://github.com/AILab-CVC/SEED-X

  37. arXiv:2404.13968 [pdf, other]

    cs.CL cs.AI cs.CR

    Protecting Your LLMs with Information Bottleneck

    Authors: Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian

    Abstract: The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector…

    Submitted 16 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 23 pages, 7 figures, 8 tables

  38. arXiv:2404.12253 [pdf, other]

    cs.CL cs.LG

    Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

    Authors: Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. I…

    Submitted 18 April, 2024; originally announced April 2024.

  39. arXiv:2404.12020 [pdf, other]

    cs.CV

    Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

    Authors: Jie Ma, Min Hu, Pinghui Wang, Wangchun Sun, Lingyun Song, Hongbin Pei, Jun Liu, Youtian Du

    Abstract: Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackl…

    Submitted 19 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Under Review

    ACM Class: I.2.10

  40. arXiv:2404.10343 [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  41. arXiv:2404.09715 [pdf, other]

    cs.LG cs.AI cs.MA

    Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

    Authors: Linjie Xu, Zichuan Liu, Alexander Dockhorn, Diego Perez-Liebana, Jinyu Wang, Lei Song, Jiang Bian

    Abstract: One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency…

    Submitted 15 April, 2024; originally announced April 2024.

  42. arXiv:2404.09633 [pdf, other]

    cs.CV

    In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation

    Authors: Han Xue, Qianru Sun, Li Song, Wenjun Zhang, Zhiwu Huang

    Abstract: We propose In-Context Translation (ICT), a general learning framework to unify visual recognition (e.g., semantic segmentation), low-level image processing (e.g., denoising), and conditional image generation (e.g., edge-to-image synthesis). Thanks to unification, ICT significantly reduces the inherent inductive bias that comes with designing models for specific tasks, and it maximizes mutual enhan…

    Submitted 15 April, 2024; originally announced April 2024.

  43. arXiv:2404.09338  [pdf, other]

    cs.CL

    Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

    Authors: Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

    Abstract: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve factuality during inference by leveraging LLMs' hierarchical representation of factual knowledge, manipulating the predicted distributions at inference time. Current…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  44. arXiv:2404.04306  [pdf, other]

    cs.CR cs.AI cs.CL cs.CY

    AuditGPT: Auditing Smart Contracts with ChatGPT

    Authors: Shihao Xia, Shuai Shao, Mengting He, Tingting Yu, Linhai Song, Yiying Zhang

    Abstract: To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each containing a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, underscoring the importance of verifying that smart contracts follow ERCs. Today's practices for such verification are to either man…

    Submitted 5 April, 2024; originally announced April 2024.

  45. arXiv:2404.04232  [pdf, other]

    cs.CL

    Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

    Authors: Tianqi Zhong, Zhaoyi Li, Quan Wang, Linqi Song, Ying Wei, Defu Lian, Zhendong Mao

    Abstract: Compositional generalization, the model's ability to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text generation (MCTG) methods. Nonetheless, a comprehensive benchmark for evaluating the compositional generalization of MCTG is still lacking. We propose CompMCTG, a benchmark…

    Submitted 3 June, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to ACL 2024 (Main); 32 pages

  46. arXiv:2403.19949  [pdf, other]

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair…

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  47. arXiv:2403.19094  [pdf, other]

    cs.CL

    Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

    Authors: Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

    Abstract: Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g., tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the…

    Submitted 18 July, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted to COLM 2024

  48. arXiv:2403.17676  [pdf]

    physics.app-ph cs.ET

    Analysis on reservoir activation with the nonlinearity harnessed from solution-processed MoS2 devices

    Authors: Songwei Liu, Yang Liu, Yingyi Wen, Jingfang Pei, Pengyu Liu, Lekai Song, Xiaoyue Fan, Wenchen Yang, Danmei Pan, Teng Ma, Yue Lin, Gang Wang, Guohua Hu

    Abstract: Reservoir computing is a recurrent neural network paradigm that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computations for activating the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity from the underlying devices for activation is an emerging solution to address the computational c…

    Submitted 26 March, 2024; originally announced March 2024.

  49. arXiv:2403.15944  [pdf, other]

    cs.CV cs.AI eess.IV

    Adaptive Super Resolution For One-Shot Talking-Head Generation

    Authors: Luchuan Song, Pinxin Liu, Guojun Yin, Chenliang Xu

    Abstract: One-shot talking-head generation synthesizes a talking-head video from a single source portrait image, driven by a video of the same or a different identity. These methods usually require plane-based pixel transformations via Jacobian matrices or facial image warps to generate novel poses. The constraints of using a single image source and pixel displacements often compromise the clarity…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures

  50. arXiv:2403.09849  [pdf, other]

    cs.CL cs.AI

    Self-Consistency Boosts Calibration for Math Reasoning

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Calibration, the alignment between model confidence and accuracy, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. In evaluations on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and acc…

    Submitted 14 March, 2024; originally announced March 2024.