Showing 1–50 of 2,061 results for author: Yang, Z

  1. arXiv:2407.19605  [pdf, other]

    cs.CV

    Look Hear: Gaze Prediction for Speech-directed Human Attention

    Authors: Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

    Abstract: For computer systems to effectively interact with humans using spoken language, they need to understand how the words being generated affect the users' moment-by-moment attention. Our study focuses on the incremental prediction of attention as a person is seeing an image and hearing a referring expression defining the object in the scene that should be fixated by gaze. To predict the gaze scanpath…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  2. arXiv:2407.19593  [pdf, other]

    cs.CV

    Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture

    Authors: ShahRukh Athar, Shunsuke Saito, Zhengyu Yang, Stanislav Pidhorsky, Chen Cao

    Abstract: Creating photorealistic avatars for individuals traditionally involves extensive capture sessions with complex and expensive studio devices like the LightStage system. While recent strides in neural representations have enabled the generation of photorealistic and animatable 3D avatars from quick phone scans, they have the capture-time lighting baked-in, lack facial details and have missing region…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  3. arXiv:2407.19239  [pdf, other]

    cs.IR

    MaTrRec: Uniting Mamba and Transformer for Sequential Recommendation

    Authors: Shun Zhang, Runsen Zhang, Zhirong Yang

    Abstract: Sequential recommendation systems aim to provide personalized recommendations by analyzing dynamic preferences and dependencies within user behavior sequences. Recently, Transformer models can effectively capture user preferences. However, their quadratic computational complexity limits recommendation performance on long interaction sequence data. Inspired by the State Space Model (SSM) representat…

    Submitted 27 July, 2024; originally announced July 2024.

  4. arXiv:2407.18930  [pdf, other]

    eess.AS cs.CL cs.LG

    Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition

    Authors: Jingjing Xu, Wei Zhou, Zijian Yang, Eugen Beck, Ralf Schlueter

    Abstract: Varying-size models are often required to deploy ASR systems under different hardware and/or application constraints such as memory and latency. To avoid redundant training and optimization efforts for individual models of different sizes, we present the dynamic encoder size approach, which jointly trains multiple performant models within one supernet from scratch. These subnets of various sizes a…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2407.18564  [pdf, other]

    cs.LG cs.SI

    Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data

    Authors: Hanyang Yuan, Jiarong Xu, Cong Wang, Ziqi Yang, Chunping Wang, Keting Yin, Yang Yang

    Abstract: The public sharing of user information opens the door for adversaries to infer private data, leading to privacy breaches and facilitating malicious activities. While numerous studies have concentrated on privacy leakage via public user attributes, the threats associated with the exposure of user relationships, particularly through network structure, are often neglected. This study aims to fill thi…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: In KDD'24; with full appendix

  6. Adaptive Differentially Private Structural Entropy Minimization for Unsupervised Social Event Detection

    Authors: Zhiwei Yang, Yuecen Wei, Haoran Li, Qian Li, Lei Jiang, Li Sun, Xiaoyan Yu, Chunming Hu, Hao Peng

    Abstract: Social event detection refers to extracting relevant message clusters from social media data streams to represent specific events in the real world. Social event detection is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. These methods need prior knowledge of the events and…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM CIKM 2024

  7. arXiv:2407.17467  [pdf, other]

    cs.CL cs.LG

    CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models

    Authors: Jiawei Gu, Zacc Yang, Chuanghao Ding, Rui Zhao, Fei Tan

    Abstract: Large Language Models (LLMs) excel in diverse tasks but often underperform in specialized fields due to limited domain-specific or proprietary corpus. Continual pre-training (CPT) enhances LLM capabilities by imbuing new domain-specific or proprietary knowledge while replaying general corpus to prevent catastrophic forgetting. The data mixture ratio of general corpus and domain-specific corpus, ho…

    Submitted 24 July, 2024; originally announced July 2024.

  8. arXiv:2407.16888  [pdf, other]

    cs.CY cs.AI cs.HC cs.LG

    A Nested Model for AI Design and Validation

    Authors: Akshat Dubey, Zewen Yang, Georges Hattab

    Abstract: The burgeoning field of artificial intelligence (AI) has yet to fully permeate real-world applications, largely due to issues of trust, transparency, and concerns about fairness and discrimination. Despite the increasing need for new and revised regulations to address the ethical and legal risks of using AI, there is a mismatch between regulatory science and AI, hindering the creation of a consist…

    Submitted 8 June, 2024; originally announced July 2024.

  9. TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou

    Authors: Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, Kai Zheng, Chenbin Zhang, Yanan Niu, Yang Song, Kun Gai

    Abstract: The significance of modeling long-term user interests for CTR prediction tasks in large-scale recommendation systems is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency concerns. The first stage rapidly retrieves a subset of sequences related to…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024

  10. arXiv:2407.15186  [pdf, other]

    cs.CL

    A Survey on Employing Large Language Models for Text-to-SQL Tasks

    Authors: Liang Shi, Zhengju Tang, Zhi Yang

    Abstract: The increasing volume of data stored in relational databases has led to the need for efficient querying and utilization of this data in various sectors. However, writing SQL queries requires specialized knowledge, which poses a challenge for non-professional users trying to access and query databases. Text-to-SQL parsing solves this issue by converting natural language queries into SQL queries, th…

    Submitted 21 July, 2024; originally announced July 2024.

  11. arXiv:2407.15111  [pdf, other]

    cs.CV

    D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

    Authors: Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du

    Abstract: In this paper, we introduce D$^4$-VTON, an innovative solution for image-based virtual try-on. We address challenges from previous studies, such as semantic inconsistencies before and after garment warping, and reliance on static, annotation-driven clothing parsers. Additionally, we tackle the complexities in diffusion-based VTON models when handling simultaneous tasks like inpainting and denoisin…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  12. arXiv:2407.15053  [pdf, ps, other]

    cs.IT

    Stacked Intelligent Metasurfaces for Task-Oriented Semantic Communications

    Authors: Guojun Huang, Jiancheng An, Zhaohui Yang, Lu Gan, Mehdi Bennis, Mérouane Debbah

    Abstract: Semantic communication leveraging advanced deep learning (DL) technologies enhances the efficiency, reliability, and security of information transmission. Emerging stacked intelligent metasurface (SIM) having a diffractive neural network (DNN) architecture allows performing complex calculations at the speed of light. In this letter, we introduce an innovative SIM-aided semantic communication syste…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  13. arXiv:2407.14926  [pdf, other]

    cs.AI

    TraveLLM: Could you plan my new public transit route in face of a network disruption?

    Authors: Bowen Fang, Zixiao Yang, Shukai Wang, Xuan Di

    Abstract: Imagine there is a disruption in train 1 near Times Square metro station. You try to find an alternative subway route to the JFK airport on Google Maps, but the app fails to provide a suitable recommendation that takes into account the disruption and your preferences to avoid crowded stations. We find that in many such situations, current navigation apps may fall short and fail to give a reasonabl…

    Submitted 20 July, 2024; originally announced July 2024.

  14. arXiv:2407.14230  [pdf, other]

    cs.CV cs.LG

    ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading

    Authors: Zhiyuan Yang, Bo Zhang, Yufei Shi, Ningze Zhong, Johnathan Loh, Huihui Fang, Yanwu Xu, Si Yong Yeo

    Abstract: Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, in the field of computer-aided glaucoma diagnosis, multi-modality methods that integrate the CFP and OCT modalities have achieved greater diagnostic accur…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted by Ophthalmic Medical Image Analysis Workshop at MICCAI'24

  15. arXiv:2407.14138  [pdf, other]

    cs.CV

    Visual Text Generation in the Wild

    Authors: Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang

    Abstract: Recently, with the rapid advancements of generative models, the field of visual text generation has witnessed significant progress. However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in…

    Submitted 19 July, 2024; originally announced July 2024.

  16. arXiv:2407.14053  [pdf, other]

    cs.GR cs.CV

    DirectL: Efficient Radiance Fields Rendering for 3D Light Field Displays

    Authors: Zongyuan Yang, Baolin Liu, Yingde Song, Yongping Xiong, Lan Yi, Zhaohe Zhang, Xunbo Yu

    Abstract: Autostereoscopic display, despite decades of development, has not achieved extensive application, primarily due to the daunting challenge of 3D content creation for non-specialists. The emergence of Radiance Field as an innovative 3D representation has markedly revolutionized the domains of 3D reconstruction and generation. This technology greatly simplifies 3D content creation for common users, b…

    Submitted 19 July, 2024; originally announced July 2024.

  17. arXiv:2407.13833  [pdf, other]

    cs.CL cs.AI

    Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

    Authors: Emman Haider, Daniel Perez-Becker, Thomas Portet, Piyush Madan, Amit Garg, David Majercak, Wen Wen, Dongwoo Kim, Ziyi Yang, Jianwen Zhang, Hiteshi Sharma, Blake Bullwinkel, Martin Pouliot, Amanda Minnich, Shiven Chawla, Solianna Herrera, Shahed Warreth, Maggie Engler, Gary Lopez, Nina Chikanov, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Roman Lutz, Richard Lundeen, Tori Westerhoff , et al. (5 additional authors not shown)

    Abstract: Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3…

    Submitted 18 July, 2024; originally announced July 2024.

  18. arXiv:2407.12322  [pdf, other]

    cs.CV

    Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

    Authors: Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu

    Abstract: Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based approaches heavily rely on the naive attention mechanism for capturing the spatiotemporal features, which falls short in learning discriminative representations that…

    Submitted 26 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  19. arXiv:2407.11638  [pdf, other]

    cs.CL cs.IR

    A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

    Authors: He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

    Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluat…

    Submitted 16 July, 2024; originally announced July 2024.

  20. arXiv:2407.11315  [pdf, other]

    cs.AI

    COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

    Authors: Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun

    Abstract: The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal model provides a novel technical approach for the mathematical problem generation because of its wide success in cross-modal data scenarios. However, the traditional method of separating problem solving from problem generation and the mainstream fine-tuning framewo…

    Submitted 15 July, 2024; originally announced July 2024.

  21. arXiv:2407.11087  [pdf, other]

    eess.IV cs.CV

    Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV

    Authors: Zhiwen Yang, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

    Abstract: Transformers have revolutionized medical image restoration, but the quadratic complexity still poses limitations for their application to high-resolution medical images. The recent advent of RWKV in the NLP field has attracted much attention as it can process long sequences efficiently. To leverage its advanced design, we propose Restore-RWKV, the first RWKV-based model for medical image restorati…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: This paper introduces the first RWKV-based model for image restoration

  22. arXiv:2407.10937  [pdf, other]

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  23. arXiv:2407.09887  [pdf, other]

    cs.LG math.OC

    Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis

    Authors: Zhicheng Yang, Yinya Huang, Wei Shi, Liang Feng, Linqi Song, Yiwei Wang, Xiaodan Liang, Jing Tang

    Abstract: Large language models (LLMs) have exhibited their problem-solving ability in mathematical reasoning. Solving realistic optimization (OPT) problems in industrial application scenarios requires advanced and applied math ability. However, current OPT benchmarks that merely solve linear programming are far from complex realistic situations. In this work, we propose E-OPT, a benchmark for end-to-end op…

    Submitted 13 July, 2024; originally announced July 2024.

  24. arXiv:2407.09768  [pdf, other]

    cs.CV

    Prototype Clustered Diffusion Models for Versatile Inverse Problems

    Authors: Jinghao Zhang, Zizheng Yang, Qi Zhu, Feng Zhao

    Abstract: Diffusion models have made remarkable progress in solving various inverse problems, attributing to the generative modeling capability of the data manifold. Posterior sampling from the conditional score function enable the precious data consistency certified by the measurement-based likelihood term. However, most prevailing approaches confined to the deterministic deterioration process of the measu…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 24 pages, 9 figures

  25. arXiv:2407.09268  [pdf, other]

    eess.IV cs.CV

    Region Attention Transformer for Medical Image Restoration

    Authors: Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Zhou, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

    Abstract: Transformer-based methods have demonstrated impressive results in medical image restoration, attributed to the multi-head self-attention (MSA) mechanism in the spatial dimension. However, the majority of existing Transformers conduct attention within fixed and coarsely partitioned regions (e.g., the entire image or fixed patches), resulting in interference from irrelevant regions and fragmen…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by MICCAI 2024

  26. arXiv:2407.09250  [pdf]

    cs.NI cs.LG

    FedsLLM: Federated Split Learning for Large Language Models over Communication Networks

    Authors: Kai Zhao, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang

    Abstract: Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into clie…

    Submitted 12 July, 2024; originally announced July 2024.

  27. arXiv:2407.09157  [pdf, other]

    cs.IR cs.AI cs.LG

    Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion

    Authors: Linhan Xia, Yicheng Yang, Ziou Chen, Zheng Yang, Shengxin Zhu

    Abstract: Pre-trained models learn general representations from large datasets which can be fine-tuned for specific tasks to significantly reduce training time. Pre-trained models like generative pretrained transformers (GPT), bidirectional encoder representations from transformers (BERT), vision transformers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi…

    Submitted 12 July, 2024; originally announced July 2024.

  28. arXiv:2407.09143  [pdf, other]

    cs.PL

    Higher-Order Specifications for Deductive Synthesis of Programs with Pointers (Extended Version)

    Authors: David Young, Ziyi Yang, Ilya Sergey, Alex Potanin

    Abstract: Synthetic Separation Logic (SSL) is a formalism that powers SuSLik, the state-of-the-art approach for the deductive synthesis of provably-correct programs in C-like languages that manipulate heap-based linked data structures. Despite its expressivity, SSL suffers from two shortcomings that hinder its utility. First, its main specification component, inductive predicates, only admits first-order de…

    Submitted 15 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  29. arXiv:2407.09032  [pdf, other]

    math.NA cs.LG

    DRM Revisited: A Complete Error Analysis

    Authors: Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang

    Abstract: In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number o…

    Submitted 12 July, 2024; originally announced July 2024.

  30. arXiv:2407.08639  [pdf, other]

    cs.AI cs.LG

    $β$-DPO: Direct Preference Optimization with Dynamic $β$

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $β$, as well as to the quality of the preference data. We analyze the impact of $β$ and data quality on DPO, uncovering that optimal $β$ values vary with the inf…

    Submitted 11 July, 2024; originally announced July 2024.

  31. arXiv:2407.07880  [pdf, other]

    cs.LG cs.AI cs.CL

    Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robus…

    Submitted 10 July, 2024; originally announced July 2024.

  32. arXiv:2407.06844  [pdf, other]

    cs.CV

    Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

    Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

    Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This pa…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: submitted to TIP

  33. Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification

    Authors: Zitai Wang, Qianqian Xu, Zhiyong Yang, Peisong Wen, Yuan He, Xiaochun Cao, Qingming Huang

    Abstract: Multi-label ranking, which returns multiple top-ranked labels for each instance, has a wide range of applications for visual tasks. Due to its complicated setting, prior arts have proposed various measures to evaluate model performances. However, both theoretical analysis and empirical observations show that a model might perform inconsistently on different measures. To bridge this gap, this paper…

    Submitted 9 July, 2024; originally announced July 2024.

  34. arXiv:2407.06516  [pdf, other]

    cs.CV

    VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

    Authors: Yibo Liu, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Bingbing Liu, Yang Liu, Jinjun Shan

    Abstract: Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot well address this problem because they learn generation merely from image RGB information without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leads to their poor zero-shot prediction capability to handle real-world obser…

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  35. arXiv:2407.05652   

    cs.SE

    StmtTree: An Easy-to-Use yet Versatile Fortran Transformation Toolkit

    Authors: Jingbo Lin, Yi Yu, Zhang Yang, Yafan Zhao

    Abstract: The Fortran programming language continues to dominate the scientific computing community, with many production codes written in the outdated Fortran-77 dialect, yet with many non-standard extensions such as Cray pointers. This creates significant maintenance burden within the community, with tremendous efforts devoted to modernization. However, despite the modern age of advanced compiler framework…

    Submitted 20 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: We are preparing a clearer version

  36. arXiv:2407.05249  [pdf, ps, other]

    cs.IT eess.SP

    RIS-assisted Coverage Enhancement in mmWave Integrated Sensing and Communication Networks

    Authors: Xu Gan, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Yong Liang Guan, Merouane Debbah

    Abstract: Integrated sensing and communication (ISAC) has emerged as a promising technology to facilitate high-rate communications and super-resolution sensing, particularly operating in the millimeter wave (mmWave) band. However, the vulnerability of mmWave signals to blockages severely impairs ISAC capabilities and coverage. To tackle this, an efficient and low-cost solution is to deploy distributed recon…

    Submitted 6 July, 2024; originally announced July 2024.

  37. arXiv:2407.04948  [pdf, other]

    cs.CV

    Zero-shot Object Counting with Good Exemplars

    Authors: Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

    Abstract: Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual asso…

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  38. arXiv:2407.04947  [pdf, other]

    cs.CV

    FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

    Authors: Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen

    Abstract: We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. Rather than concentrating on specific use cases such as appearance editing (image harmonization) or semantic editing (semantic image composition), we showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to Proc. Eur. Conf. Comp. Vision 2024. Project webpage: https://github.com/aim-uofa/FreeCompose

  39. arXiv:2407.04713  [pdf]

    cs.ET physics.optics

    16-channel Photonic Solver for Optimization Problems on a Silicon Chip

    Authors: Jiayi Ouyang, Shengping Liu, Ziyue Yang, Wei Wang, Xue Feng, Yongzhuo Li, Yidong Huang

    Abstract: In this article, we proposed a programmable 16-channel photonic solver for quadratic unconstrained binary optimization (QUBO) problems. The solver is based on a hybrid optoelectronic scheme including a photonic chip and the corresponding electronic driving circuit. The photonic chip is fabricated on silicon on insulator (SOI) substrate and integrates high-speed electro-optic modulators, thermo-opt…

    Submitted 5 June, 2024; originally announced July 2024.

  40. arXiv:2407.04381  [pdf, other]

    cs.CV cs.AI

    Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection

    Authors: Zhiqiang Yang, Qiu Guan, Keer Zhao, Jianmin Yang, Xinli Xu, Haixia Long, Ying Tang

    Abstract: Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi…

    Submitted 5 July, 2024; originally announced July 2024.

  41. arXiv:2407.04031  [pdf]

    cs.CE

    Towards reproducible machine learning-based process monitoring and quality prediction research for additive manufacturing

    Authors: Jiarui Xie, Mutahar Safdar, Andrei Mircea, Yan Lu, Hyunwoong Ko, Zhuo Yang, Yaoyao Fiona Zhao

    Abstract: Machine learning (ML)-based monitoring systems have been extensively developed to enhance the print quality of additive manufacturing (AM). In-situ and in-process data acquired using sensors can be used to train ML models that detect process anomalies, predict part quality, and adjust process parameters. However, the reproducibility of the proposed AM monitoring systems has not been investigated.…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures, 2 tables. This paper has been accepted to be published in the proceedings of IDETC-CIE 2024

  42. arXiv:2407.03776  [pdf, other]

    cs.IT

    Energy-Efficient Probabilistic Semantic Communication over Space-Air-Ground Integrated Networks

    Authors: Zhouxiang Zhao, Zhaohui Yang, Mingzhe Chen, Zhaoyang Zhang, Wei Xu, Kaibin Huang

    Abstract: Space-air-ground integrated networks (SAGINs) are emerging as a pivotal element in the evolution of future wireless networks. Despite their potential, the joint design of communication and computation within SAGINs remains a formidable challenge. In this paper, the problem of energy efficiency in SAGIN-enabled probabilistic semantic communication (PSC) system is investigated. In the considered mod…

    Submitted 4 July, 2024; originally announced July 2024.

  43. arXiv:2407.03314  [pdf, other]

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON), which lets models with limited linguistic abilities enjoy the benefits of Vision Language Models (VLMs) and boosts downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu…

    Submitted 3 July, 2024; originally announced July 2024.

  44. arXiv:2407.03160  [pdf, other]

    cs.CR cs.CL cs.LG

    SOS! Soft Prompt Attack Against Open-Source Large Language Models

    Authors: Ziqing Yang, Michael Backes, Yang Zhang, Ahmed Salem

    Abstract: Open-source large language models (LLMs) have become increasingly popular among both the general public and industry, as they can be customized, fine-tuned, and freely used. However, some open-source LLMs require approval before usage, which has led to third parties publishing their own easily accessible versions. Similarly, third parties have been publishing fine-tuned or quantized variants of th…

    Submitted 3 July, 2024; originally announced July 2024.

  45. arXiv:2407.02964  [pdf, other]

    cs.CL

    FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering

    Authors: Xiaochen Wang, Junqing He, Zhe Yang, Yiru Wang, Xiangdi Meng, Kunhao Pan, Zhifang Sui

    Abstract: Large Language Models (LLMs) with chain-of-thought (COT) prompting have demonstrated impressive abilities on simple natural language inference tasks. However, they tend to perform poorly on Multi-hop Question Answering (MHQA) tasks due to several challenges, including hallucination, error propagation and limited context length. We propose a prompting method, Finite State Machine (FSM), to enhance th…

    Submitted 3 July, 2024; originally announced July 2024.

  46. arXiv:2407.02922  [pdf, other]

    cs.IT

    Fair Resource Allocation for Probabilistic Semantic Communication in IIoT

    Authors: Siyun Liang, Zhouxiang Zhao, Chen Zhu, Zhaohui Yang, Yinchao Yang, Mohammad Shikh-Bahaei, Zhaoyang Zhang

    Abstract: In this paper, the problem of minimum rate maximization for probabilistic semantic communication (PSCom) in industrial Internet of Things (IIoT) is investigated. In the considered model, users employ semantic information extraction techniques to compress the original data before sending it to the base station (BS). During this semantic compression process, knowledge graphs are employed to represen…

    Submitted 8 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  47. arXiv:2407.02906  [pdf, other]

    cs.CV

    Single Image Rolling Shutter Removal with Diffusion Models

    Authors: Zhanglei Yang, Haipeng Li, Mingbo Hong, Bing Zeng, Shuaicheng Liu

    Abstract: We present RS-Diffusion, the first Diffusion Models-based method for single-frame Rolling Shutter (RS) correction. RS artifacts compromise the visual quality of frames due to the row-wise exposure of CMOS sensors. Most previous methods have focused on multi-frame approaches, using temporal information from consecutive frames for motion rectification. However, few approaches address the more challe…

    Submitted 3 July, 2024; originally announced July 2024.

  48. arXiv:2407.02807  [pdf, other]

    cs.SI

    Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada

    Authors: Zachary Yang, Anne Imouza, Maximilian Puelma Touzel, Cecile Amadoro, Gabrielle Desrosiers-Brisebois, Kellin Pelrine, Sacha Levy, Jean-Francois Godbout, Reihaneh Rabbany

    Abstract: Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practise. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages (main paper), 9 figures, 1 table

    ACM Class: J.4

  49. arXiv:2407.02797  [pdf, other]

    cs.RO cs.CV

    Solving Motion Planning Tasks with a Scalable Generative Model

    Authors: Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, Qiang Liu

    Abstract: As autonomous driving systems are being deployed to millions of vehicles, there is a pressing need to improve the system's scalability and safety and to reduce the engineering cost. A realistic, scalable, and practical simulator of the driving world is highly desired. In this paper, we present an efficient solution based on generative models that learn the dynamics of the driving scenes. With this mod…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  50. arXiv:2407.02775  [pdf, other]

    cs.CL cs.LG

    MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models

    Authors: Ying Zhang, Ziheng Yang, Shufan Ji

    Abstract: Knowledge distillation is an effective technique for pre-trained language model compression. Although existing knowledge distillation methods perform well for the most typical model BERT, they could be further improved in two aspects: the relation-level knowledge could be further explored to improve model performance; and the setting of student attention head number could be more flexible to decre…

    Submitted 2 July, 2024; originally announced July 2024.