Showing 1–50 of 3,168 results for author: Li, S

  1. An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation

    Authors: Cheng Yang, Guoping Huang, Mo Yu, Zhirui Zhang, Siheng Li, Mingming Yang, Shuming Shi, Yujiu Yang, Lemao Liu

    Abstract: Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in Computer-aided Translation. Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label (i.e., the candidate target word is treated as a label). Since the context hidden vector itself does not take the label into account…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to TACL 2024

  2. Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation

    Authors: Shiyuan Li, Yixin Liu, Qingfeng Chen, Geoffrey I. Webb, Shirui Pan

    Abstract: Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods ideally assume that the node features are noise-free, which makes them fail to distinguish between useful information and noise when applied to real data with noisy features, thus affecting…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024. 11 pages, 8 figures

  3. arXiv:2407.19187

    cs.LG physics.ao-ph

    Efficiently improving key weather variables forecasting by performing the guided iterative prediction in latent space

    Authors: Shuangliang Li, Siwei Li

    Abstract: Weather forecasting refers to learning the evolutionary patterns of key upper-air and surface variables, a task of great significance. Recently, deep learning-based methods have been increasingly applied in the field of weather forecasting due to their powerful feature learning capabilities. However, prediction methods based on the original space iteration struggle to effectively and efficiently…

    Submitted 27 July, 2024; originally announced July 2024.

  4. arXiv:2407.19089

    cs.CL cs.AI

    Many-Shot In-Context Learning for Molecular Inverse Design

    Authors: Saeed Moayedpour, Alejandro Corrochano-Navarro, Faryad Sahneh, Shahriar Noroozizadeh, Alexander Koetter, Jiri Vymetal, Lorenzo Kogler-Anele, Pablo Mas, Yasser Jangjou, Sizhen Li, Michael Bailey, Marc Bianciotto, Hans Matter, Christoph Grebner, Gerhard Hessler, Ziv Bar-Joseph, Sven Jager

    Abstract: Large Language Models (LLMs) have demonstrated great performance in few-shot In-Context Learning (ICL) for a variety of generative and discriminative chemical design tasks. The newly expanded context windows of LLMs can further improve ICL capabilities for molecular inverse design and lead optimization. To take full advantage of these capabilities we developed a new semi-supervised learning method…

    Submitted 26 July, 2024; originally announced July 2024.

  5. arXiv:2407.18957

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhengting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  6. arXiv:2407.18482

    cs.LG

    Practical Attribution Guidance for Rashomon Sets

    Authors: Sichao Li, Amanda S. Barnard, Quanling Deng

    Abstract: Different prediction models might perform equally well (Rashomon set) in the same task, but offer conflicting interpretations and conclusions about the data. The Rashomon effect in the context of Explainable AI (XAI) has been recognized as a critical factor. Although the Rashomon set has been introduced and studied in various contexts, its practical application is still in its infancy and lacks ad…

    Submitted 25 July, 2024; originally announced July 2024.

  7. arXiv:2407.18248

    cs.CL

    Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

    Authors: Tianduo Wang, Shichen Li, Wei Lu

    Abstract: Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful LMs. However, this knowledge distillation approach can be costly and unstable, particularly when relying on closed-source, proprietary LMs like GPT-4, whose beh…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: ACL 2024. Code and data are available at https://github.com/TianduoWang/DPO-ST

  8. arXiv:2407.17854

    cs.AI cs.CL cs.MM

    Shapley Value-based Contrastive Alignment for Multimodal Information Extraction

    Authors: Wen Luo, Yu Xia, Shen Tianshu, Sujian Li

    Abstract: The rise of social media and the exponential growth of multimodal communication necessitate advanced techniques for Multimodal Information Extraction (MIE). However, existing methodologies primarily rely on direct Image-Text interactions, a paradigm that often faces significant challenges due to semantic and modality gaps between images and text. In this paper, we introduce a new paradigm of Imag…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted at ACM Multimedia 2024

  9. arXiv:2407.17485

    physics.chem-ph cs.ET physics.data-an

    Application of the Digital Annealer Unit in Optimizing Chemical Reaction Conditions for Enhanced Production Yields

    Authors: Shih-Cheng Li, Pei-Hwa Wang, Jheng-Wei Su, Wei-Yin Chiang, Shih-Hsien Huang, Yen-Chu Lin, Chia-Ho Ou, Chih-Yu Chen

    Abstract: Finding appropriate reaction conditions that yield high product rates in chemical synthesis is crucial for the chemical and pharmaceutical industries. However, due to the vast chemical space, conducting experiments for each possible reaction condition is impractical. Consequently, models such as QSAR (Quantitative Structure-Activity Relationship) or ML (Machine Learning) have been developed to pre…

    Submitted 2 July, 2024; originally announced July 2024.

  10. arXiv:2407.17428

    cs.CV cs.AI

    Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation

    Authors: Zijun Zhan, Yaxian Dong, Yuqing Hu, Shuai Li, Shaohua Cao, Zhu Han

    Abstract: Integrating low-light image enhancement techniques, in which diffusion-based AI-generated content (AIGC) models are promising, is necessary to enhance nighttime teleoperation. Remarkably, the AIGC model is computation-intensive, thus necessitating the allocation of AIGC tasks to edge servers with ample computational resources. Given the distinct cost of the AIGC model trained with varying-sized da…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 11 pages, 10 figures

  11. arXiv:2407.16974

    cs.SE

    SelfPiCo: Self-Guided Partial Code Execution with LLMs

    Authors: Zhipeng Xue, Zhipeng Gao, Shaohua Wang, Xing Hu, Xin Xia, Shanping Li

    Abstract: Code executability plays a vital role in software debugging and testing (e.g., detecting runtime exceptions or assertion violations). However, code execution, especially partial or arbitrary code execution, is a non-trivial task due to missing definitions and complex third-party dependencies. To make partial code (such as code snippets posted on the web or code fragments deep inside complex softwa…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ISSTA'24

  12. arXiv:2407.16788

    cs.CV

    Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection

    Authors: Su Li, Wang Liang, Jianye Wang, Ziheng Zhang, Lei Zhang

    Abstract: Estimating abnormal posture based on 3D pose is vital in human pose analysis, yet it presents challenges, especially when reconstructing 3D human poses from monocular datasets with occlusions. Accurate reconstructions enable the restoration of 3D movements, which assist in the extraction of semantic details necessary for analyzing abnormal behaviors. However, most existing methods depend on predef…

    Submitted 23 July, 2024; originally announced July 2024.

  13. arXiv:2407.16337

    cs.LG

    STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments

    Authors: Hao Zhou, Kun Sun, Shaoming Li, Yangfeng Fan, Guibin Jiang, Jiaqi Zheng, Tao Li

    Abstract: Online controlled experiments play a crucial role in enabling data-driven decisions across a wide range of companies. Variance reduction is an effective technique to improve the sensitivity of experiments, achieving higher statistical power while using fewer samples and shorter experimental periods. However, typical variance reduction methods (e.g., regression-adjusted estimators) are built upon t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  14. arXiv:2407.16295

    cs.CR

    Manifoldchain: Maximizing Blockchain Throughput via Bandwidth-Clustered Sharding

    Authors: Chunjiang Che, Songze Li, Xuechao Wang

    Abstract: Bandwidth limitation is the major bottleneck that hinders scaling throughput of proof-of-work blockchains. To guarantee security, the mining rate of the blockchain is determined by the miners with the lowest bandwidth, resulting in inefficient bandwidth utilization among fast miners. We propose Manifoldchain, an innovative blockchain sharding protocol that alleviates the impact of slow miners t…

    Submitted 23 July, 2024; originally announced July 2024.

  15. arXiv:2407.16150

    cs.LG cs.AI

    Predicting Stock Prices with FinBERT-LSTM: Integrating News Sentiment Analysis

    Authors: Wenjun Gu, Yihao Zhong, Shizun Li, Changsong Wei, Liting Dong, Zhuoyue Wang, Chao Yan

    Abstract: The stock market's ascent typically mirrors the flourishing state of the economy, whereas its decline is often an indicator of an economic downturn. Therefore, for a long time, significant correlation elements for predicting trends in financial stock markets have been widely discussed, and people are becoming increasingly interested in the task of financial text mining. The inherent instability of…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 10 pages, 6 figures, 2 tables, 2024 8th International Conference on Cloud and Big Data Computing

  16. arXiv:2407.15083

    cs.LG

    Rocket Landing Control with Random Annealing Jump Start Reinforcement Learning

    Authors: Yuxuan Jiang, Yujie Yang, Zhiqian Lan, Guojian Zhan, Shengbo Eben Li, Qi Sun, Jian Ma, Tianwen Yu, Changwu Zhang

    Abstract: Rocket recycling is a crucial pursuit in aerospace technology, aimed at reducing costs and environmental impact in space exploration. The primary focus centers on rocket landing control, involving the guidance of a nonlinear underactuated rocket with limited fuel in real-time. This challenging task prompts the application of reinforcement learning (RL), yet the goal-oriented nature of the problem pose…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Oral

  17. arXiv:2407.14982

    cs.CV cs.AI

    GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation

    Authors: Jingzhi Gong, Sisi Li, Giordano d'Aloisio, Zishuo Ding, Yulong Ye, William B. Langdon, Federica Sarro

    Abstract: Tuning the parameters and prompts for improving AI-based text-to-image generation has remained a substantial yet unaddressed challenge. Hence we introduce GreenStableYolo, which improves the parameters and prompts for Stable Diffusion to both reduce GPU inference time and increase image generation quality using NSGA-II and Yolo. Our experiments show that despite a relatively slight trade-off (18…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: This paper is published in the SSBSE Challenge Track 2024

  18. arXiv:2407.14768

    cs.LG cs.AI

    Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation

    Authors: Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li

    Abstract: To bridge the gaps between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP. In this paper, we revisit the knowledge samples (nodes) in teacher GNNs from the perspective of hardness, and identify that hard sample distillation may be a major per…

    Submitted 20 July, 2024; originally announced July 2024.

  19. arXiv:2407.14570

    cs.CV

    Are handcrafted filters helpful for attributing AI-generated images?

    Authors: Jialiang Li, Haoyue Wang, Sheng Li, Zhenxing Qian, Xinpeng Zhang, Athanasios V. Vasilakos

    Abstract: Recently, a vast number of image generation models have been proposed, which raises concerns regarding the misuse of these artificial intelligence (AI) techniques for generating fake images. To attribute the AI-generated images, existing schemes usually design and train deep neural networks (DNNs) to learn the model fingerprints, which usually requires a large amount of data for effective learning…

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures

  20. arXiv:2407.14177

    cs.CV

    EVLM: An Efficient Vision-Language Model for Visual Understanding

    Authors: Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong, Kui Xia, Di Xu, Wei Yuan, Yifei Hu, Bin Wen, Tianke Zhang, Changyi Liu, Dewen Fan, Huihui Xiao, Jiahong Wu, Fan Yang, Size Li, Di Zhang

    Abstract: In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the language models alongside textual tokens. However, when dealing with long sequences of visual signals or inputs such as videos, the self-attention mechanism of language models can lead to sig…

    Submitted 19 July, 2024; originally announced July 2024.

  21. PassTSL: Modeling Human-Created Passwords through Two-Stage Learning

    Authors: Yangde Wang, Haozhang Li, Weidong Qiu, Shujun Li, Peng Tang

    Abstract: Textual passwords are still the most widely used user authentication mechanism. Due to the close connections between textual passwords and natural languages, advanced technologies in natural language processing (NLP) and machine learning (ML) could be used to model passwords for different purposes such as studying human password-creation behaviors and developing more advanced password cracking met…

    Submitted 19 July, 2024; originally announced July 2024.

    Journal ref: Lecture Notes in Computer Science 14897 (2024) 404-423

  22. arXiv:2407.14065

    cs.LG stat.ML

    MSCT: Addressing Time-Varying Confounding with Marginal Structural Causal Transformer for Counterfactual Post-Crash Traffic Prediction

    Authors: Shuang Li, Ziyuan Pu, Nan Zhang, Duxin Chen, Lu Dong, Daniel J. Graham, Yinhai Wang

    Abstract: Traffic crashes profoundly impede traffic efficiency and pose economic challenges. Accurate prediction of post-crash traffic status provides essential information for evaluating traffic perturbations and developing effective solutions. Previous studies have established a series of deep learning models to predict post-crash traffic conditions; however, these correlation-based methods cannot accommo…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  23. arXiv:2407.13918

    cs.CR

    Improving Malware Detection with Adversarial Domain Adaptation and Control Flow Graphs

    Authors: Adrian Shuai Li, Arun Iyengar, Ashish Kundu, Elisa Bertino

    Abstract: In the application of deep learning for malware classification, it is crucial to account for the prevalence of malware evolution, which can cause trained classifiers to fail on drifted malware. Existing solutions to combat concept drift use active learning: they select new samples for analysts to label, and then retrain the classifier with the new labels. Our key finding is that the current retraining…

    Submitted 18 July, 2024; originally announced July 2024.

  24. arXiv:2407.13677

    cs.CV cs.AI cs.GR cs.LG

    PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers

    Authors: Songlin Li, Despoina Paschalidou, Leonidas Guibas

    Abstract: The increased demand for tools that automate the 3D content creation process led to tremendous progress in deep generative models that can generate diverse 3D objects of high fidelity. In this paper, we present PASTA, an autoregressive transformer architecture for generating high quality 3D shapes. PASTA comprises two main components: An autoregressive transformer that generates objects as a seque…

    Submitted 18 July, 2024; originally announced July 2024.

  25. arXiv:2407.13664

    cs.LG

    Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization

    Authors: Hao Zhou, Rongxiao Huang, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Bing Cheng, Wei Lin

    Abstract: Marketing optimization plays an important role in enhancing user engagement on online Internet platforms. Existing studies usually formulate this problem as a budget allocation problem and solve it by utilizing two fully decoupled stages, i.e., machine learning (ML) and operation research (OR). However, the learning objective in ML does not take into account the downstream optimization task in OR, whi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  26. arXiv:2407.13480

    cs.RO cs.AI

    Risk-Aware Vehicle Trajectory Prediction Under Safety-Critical Scenarios

    Authors: Qingfan Wang, Dongyang Xu, Gaoyuan Kuang, Chen Lv, Shengbo Eben Li, Bingbing Nie

    Abstract: Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in aut…

    Submitted 18 July, 2024; originally announced July 2024.

  27. arXiv:2407.13460

    cs.CV cs.LG

    SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

    Authors: Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang, Jane Yung-jen Hsu

    Abstract: Existing zero-shot skeleton-based action recognition methods utilize projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE -- Semantic Ali…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  28. arXiv:2407.13362

    cs.CV

    Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

    Authors: Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

    Abstract: The scarcity of large-scale 3D-text paired data poses a great challenge to open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation. However, the existing distillation-based 3D scene understanding approaches rely on the representation capacity of 2D models, disregar…

    Submitted 18 July, 2024; originally announced July 2024.

  29. arXiv:2407.13342

    cs.CV

    Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds

    Authors: Shengtao Li, Ge Gao, Yudong Liu, Ming Gu, Yu-Shen Liu

    Abstract: Neural signed distance functions (SDFs) have shown powerful ability in fitting the shape geometry. However, inferring continuous signed distance fields from discrete unoriented point clouds still remains a challenge. The neural network typically fits the shape with a rough surface and omits fine-grained geometric details such as shape edges and corners. In this paper, we propose a novel non-linear…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://list17.github.io/ImplicitFilter

  30. arXiv:2407.13303

    cs.LG

    Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Zhe Tang, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: Wi-Fi fingerprinting is widely applied for indoor localization due to the widespread availability of Wi-Fi devices. However, traditional methods are not ideal for multi-building and multi-floor environments due to the scalability issues. Therefore, more and more researchers have employed deep learning techniques to enable scalable indoor localization. This paper introduces a novel semi-supervised…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, under preparation for a journal publication

  31. arXiv:2407.13288

    cs.LG

    Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Kyeong Soo Kim, Zhe Tang, Jeremy S. Smith

    Abstract: In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localizati…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures, under review for journal publication

  32. arXiv:2407.13255

    cs.IT eess.SP

    Interleaved Block-Sparse Transform

    Authors: Lei Liu, Ming Wang, Shufeng Li, Yuhao Chi, Ning Wei, ZhaoYang Zhang

    Abstract: Low-complexity Bayes-optimal memory approximate message passing (MAMP) is an efficient signal estimation algorithm in compressed sensing and multicarrier modulation. However, achieving replica Bayes optimality with MAMP necessitates a large-scale right-unitarily invariant transformation, which is prohibitive in practical systems due to its high computational complexity and hardware costs. To solve…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Submitted to the IEEE Journal

  33. arXiv:2407.13137

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu…

    Submitted 17 July, 2024; originally announced July 2024.

  34. arXiv:2407.13126

    cs.DC

    Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

    Authors: Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang

    Abstract: Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated model (if available) to serve inference requests arriving over time. It is generally beneficial to co-locate the retraining and inference together to enabl…

    Submitted 17 July, 2024; originally announced July 2024.

  35. arXiv:2407.13048

    cs.CL

    Establishing Knowledge Preference in Language Models

    Authors: Sizhe Zhou, Sha Li, Yu Meng, Yizhu Jiao, Heng Ji, Jiawei Han

    Abstract: Language models are known to encode a great amount of factual knowledge through pretraining. However, such knowledge might be insufficient to cater to user requests, requiring the model to integrate external knowledge sources and adhere to user-provided specifications. When answering questions about ongoing events, the model should use recent news articles to update its response; when asked to pro…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 27 pages, 8 figures, 23 tables, work in progress

  36. arXiv:2407.12973

    cs.CV cs.AI

    Temporal Label Hierachical Network for Compound Emotion Recognition

    Authors: Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Tianhua Qi, Hao Yang, Yuan Zong, Wenming Zheng

    Abstract: Emotion recognition has attracted increasing attention in recent decades. Although significant progress has been made in recognizing the seven basic emotions, existing methods still struggle with compound emotion recognition, which occurs commonly in practical applications. This article introduces our achievements in the 7th Affective Behavior Analysis in-the-wild (ABAW) competition. I…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: draft for abaw7

  37. arXiv:2407.12825

    cs.CL cs.AI cs.LG

    A Depression Detection Method Based on Multi-Modal Feature Fusion Using Cross-Attention

    Authors: Shengjie Li, Yinhao Xiao

    Abstract: Depression, a prevalent and serious mental health issue, affects approximately 3.8% of the global population. Despite the existence of effective treatments, over 75% of individuals in low- and middle-income countries remain untreated, partly due to the challenge in accurately diagnosing depression in its early stages. This paper introduces a novel method for detecting depression based on multi-m…

    Submitted 2 July, 2024; originally announced July 2024.

  38. arXiv:2407.12593

    cs.CV

    EvSign: Sign Language Recognition and Translation with Streaming Events

    Authors: Pengyu Zhang, Hao Yin, Zeren Wang, Wenyue Chen, Shengming Li, Dong Wang, Huchuan Lu, Xu Jia

    Abstract: Sign language is one of the most effective communication tools for people with hearing difficulties. Most existing works focus on improving the performance of sign language tasks on RGB videos, which may suffer from degraded recording conditions, such as fast movement of hands with motion blur and textured signer's appearance. The bio-inspired event camera, which asynchronously captures brightness…

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: To appear at ECCV 2024

  39. arXiv:2407.12504

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o…

    Submitted 17 July, 2024; originally announced July 2024.

  40. arXiv:2407.12491

    cs.CV

    Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

    Authors: Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Perception is essential for autonomous driving systems. Recent approaches based on Bird's-eye-view (BEV) and deep learning have made significant progress. However, there exist challenging issues including lengthy development cycles, poor reusability, and complex sensor setups in the perception algorithm development process. To tackle the above challenges, this paper proposes a novel hierarchical BEV p…

    Submitted 25 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  41. arXiv:2407.12117

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f…

    Submitted 16 July, 2024; originally announced July 2024.

  42. arXiv:2407.12053

    cs.LG cs.AI q-bio.QM

    Improving AlphaFlow for Efficient Protein Ensembles Generation

    Authors: Shaoning Li, Mingyu Li, Yusong Wang, Xinheng He, Nanning Zheng, Jian Zhang, Pheng-Ann Heng

    Abstract: Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the advantages of efficient sampling afforded by flow-matching, AlphaFlow still r…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024 AI4Science workshop

  43. arXiv:2407.12019

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study addresses Multimodal Entity Linking, which aligns mentions in multimodal information with entities in a knowledge base. Existing methods still face challenges such as ambiguous entity representations and limited use of image information. We therefore propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published at PRCV 2024

  44. arXiv:2407.11781 [pdf, other]

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse-view or limited-view conditions has long been challenging. Traditional iterative 3D reconstruction methods suffer from both slow speed and high memory consumption. Recently, differentiable rendering has made significant progress in computer graphics, particularly with the rise of 3D Gaussian Splatting. Inspired by these advances, we in…

    Submitted 16 July, 2024; originally announced July 2024.

  45. arXiv:2407.11585 [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  46. arXiv:2407.11420 [pdf, other]

    cs.RO

    iKalibr: Unified Targetless Spatiotemporal Calibration for Resilient Integrated Inertial Systems

    Authors: Shuolong Chen, Xingxing Li, Shengyu Li, Yuxuan Zhou, Xiaoteng Yang

    Abstract: An integrated inertial system, which typically combines an IMU with an exteroceptive sensor such as radar, LiDAR, or a camera, has been widely adopted in modern robotic applications for ego-motion estimation, motion control, and autonomous exploration. To improve system accuracy, robustness, and usability, multiple sensors of various types are generally integrated resiliently, whic…

    Submitted 16 July, 2024; originally announced July 2024.

  47. arXiv:2407.11405 [pdf, other]

    cs.CR cs.CV

    Cover-separable Fixed Neural Network Steganography via Deep Generative Models

    Authors: Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

    Abstract: Image steganography is the process of hiding secret data in a cover image by subtle perturbation. Recent studies show that it is feasible to use a fixed neural network for data embedding and extraction. Such Fixed Neural Network Steganography (FNNS) demonstrates favorable performance without the need for training networks, making it more practical for real-world applications. However, the stego-im…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted at ACM MM 2024

  48. arXiv:2407.11034 [pdf]

    cs.LG

    Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis

    Authors: Siqi Li, Xin Li, Kunyu Yu, Di Miao, Mingcheng Zhu, Mengying Yan, Yuhe Ke, Danny D'Agostino, Yilin Ning, Qiming Wu, Ziwen Wang, Yuqing Shang, Molei Liu, Chuan Hong, Nan Liu

    Abstract: Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine l…

    Submitted 4 July, 2024; originally announced July 2024.

  49. arXiv:2407.10550 [pdf, other]

    cs.CV

    Learning Natural Consistency Representation for Face Forgery Video Detection

    Authors: Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, Shiming Ge

    Abstract: Face forgery videos have raised serious public concern, and various detectors have been proposed. However, fully supervised detectors tend to overfit to specific forgery methods or videos, and existing self-supervised detectors impose strict requirements on auxiliary tasks, such as audio or multiple modalities, which limits their generalization and robustness. In this paper, we exa…

    Submitted 15 July, 2024; originally announced July 2024.

  50. arXiv:2407.10457 [pdf, other]

    cs.CL cs.AI

    The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

    Authors: Yifan Song, Guoyin Wang, Sujian Li, Bill Yuchen Lin

    Abstract: Current evaluations of large language models (LLMs) often overlook non-determinism, typically focusing on a single output per example. This limits our understanding of LLM performance variability in real-world applications. Our study addresses this issue by exploring key questions about the performance differences between greedy decoding and sampling, identifying benchmarks' consistency regarding…

    Submitted 15 July, 2024; originally announced July 2024.