Showing 1–50 of 85 results for author: Lan, X

  1. arXiv:2407.11699 [pdf, other]

    cs.CV

    Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan

    Abstract: This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment o…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.11497 [pdf, other]

    cs.HC cs.GR

    "I Came Across a Junk": Understanding Design Flaws of Data Visualization from the Public's Perspective

    Authors: Xingyu Lan, Yu Liu

    Abstract: The visualization community has a rich history of reflecting upon flaws of visualization design, and research in this direction has remained lively until now. However, three main gaps still exist. First, most existing work characterizes design flaws from the perspective of researchers rather than the perspective of general users. Second, little work has been done to infer why these design flaws oc…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.07844 [pdf, other]

    cs.CV

    OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

    Authors: Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang

    Abstract: Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training and pseudo-labeling on diverse large-scale datasets. However, these approaches encounter two main challenges: (i) how to effectively eliminate data…

    Submitted 21 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Technical Report

  4. arXiv:2404.01622 [pdf, ps, other]

    cs.HC cs.AI cs.GR

    Gen4DS: Workshop on Data Storytelling in an Era of Generative AI

    Authors: Xingyu Lan, Leni Yang, Zezhong Wang, Yun Wang, Danqing Shi, Sheelagh Carpendale

    Abstract: Storytelling is an ancient and precious human ability that has been rejuvenated in the digital age. Over the last decade, there has been a notable surge in the recognition and application of data storytelling, both in academia and industry. Recently, the rapid development of generative AI has brought new opportunities and challenges to this field, sparking numerous new questions. These questions m…

    Submitted 5 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  5. arXiv:2403.10750 [pdf, other]

    cs.CL cs.AI

    Depression Detection on Social Media with Large Language Models

    Authors: Xiaochong Lan, Yiming Cheng, Li Sheng, Chen Gao, Yong Li

    Abstract: Depression harms. However, due to a lack of mental health awareness and fear of stigma, many patients do not actively seek diagnosis and treatment, leading to detrimental outcomes. Depression detection aims to determine whether an individual suffers from depression by analyzing their history of posts on social media, which can significantly aid in early detection and intervention. It mainly faces…

    Submitted 15 March, 2024; originally announced March 2024.

  6. arXiv:2402.19231 [pdf, other]

    cs.CV cs.RO

    CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

    Authors: Feng Lu, Xiangyuan Lan, Lijun Zhang, Dongmei Jiang, Yaowei Wang, Chun Yuan

    Abstract: Over the past decade, most methods in visual place recognition (VPR) have used neural networks to produce feature representations. These networks typically produce a global representation of a place image using only this image itself and neglect the cross-image variations (e.g. viewpoint and illumination), which limits their robustness in challenging scenes. In this paper, we propose a robust glob…

    Submitted 1 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR2024

  7. arXiv:2402.17978 [pdf, other]

    cs.LG cs.AI cs.MA

    Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

    Authors: Zeyang Liu, Lipeng Wan, Xinrui Yang, Zhuoran Chen, Xingyu Chen, Xuguang Lan

    Abstract: Effective exploration is crucial to discovering optimal strategies for multi-agent reinforcement learning (MARL) in complex coordination tasks. Existing methods mainly utilize intrinsic rewards to enable committed exploration or use role-based learning for decomposing joint action spaces instead of directly conducting a collective search in the entire action-observation space. However, they often…

    Submitted 1 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: The 38th Annual AAAI Conference on Artificial Intelligence

  8. Deep Homography Estimation for Visual Place Recognition

    Authors: Feng Lu, Shuting Dong, Lijun Zhang, Bingxi Liu, Xiangyuan Lan, Dongmei Jiang, Chun Yuan

    Abstract: Visual place recognition (VPR) is a fundamental task for many applications such as robot localization and augmented reality. Recently, the hierarchical VPR methods have received considerable attention due to the trade-off between accuracy and efficiency. They usually first use global features to retrieve the candidate images, then verify the spatial consistency of matched local features for re-ran…

    Submitted 18 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI2024

    Journal ref: AAAI 2024

  9. arXiv:2402.14505 [pdf, other]

    cs.CV cs.AI

    Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

    Authors: Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan

    Abstract: Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model…

    Submitted 3 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ICLR2024

  10. arXiv:2402.11816 [pdf, other]

    cs.CV cs.LG

    Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

    Authors: Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi

    Abstract: Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data. However, a major challenge that hinders both unimodal and multimodal contrastive learning is feature suppression, a phenomenon where the trained model captures only a limited portion of the information from the input data while overlooking other potentially valuable content. This…

    Submitted 15 July, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: ECCV 2024 Camera-Ready

  11. arXiv:2402.11792 [pdf, other]

    cs.RO

    SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction

    Authors: Jie Xu, Hanbo Zhang, Xinghang Li, Huaping Liu, Xuguang Lan, Tao Kong

    Abstract: Linguistic ambiguity is ubiquitous in our daily lives. Previous works adopted interaction between robots and humans for language disambiguation. Nevertheless, when interactive robots are deployed in daily environments, there are significant challenges for natural human-robot interaction, stemming from complex and unpredictable visual inputs, open-ended interaction, and diverse user demands. In thi…

    Submitted 19 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  12. arXiv:2402.03699 [pdf]

    cs.RO cs.CV

    Automatic Robotic Development through Collaborative Framework by Large Language Models

    Authors: Zhirong Luan, Yujun Lai, Rundong Huang, Xiaruiqi Lan, Liangjun Chen, Badong Chen

    Abstract: Despite the remarkable code generation abilities of large language models (LLMs), they still face challenges in complex task handling. Robot development, a highly intricate field, inherently demands human involvement in task allocation and collaborative teamwork. To enhance robot development, we propose an innovative automated collaboration framework inspired by real-world robot developers. This fr…

    Submitted 16 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  13. arXiv:2401.16699 [pdf, other]

    cs.RO

    Towards Unified Interactive Visual Grounding in The Wild

    Authors: Jie Xu, Hanbo Zhang, Qingyi Si, Yifeng Li, Xuguang Lan, Tao Kong

    Abstract: Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical due to the inevitable ambiguity in natural languages. It requires robots to disambiguate the user input by active information gathering. Previous approaches often rely on predefined templates to ask disambiguation questions, resulting in performance reduction in realistic interactive scenarios. In this paper…

    Submitted 18 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to ICRA 2024

  14. arXiv:2401.16355 [pdf, other]

    cs.CV

    PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

    Authors: Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang

    Abstract: The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmarks has impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for Large Multimodal Models (LMMs). It comprises 33,428 multimodal multi-cho…

    Submitted 20 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 27 pages, 12 figures

  15. arXiv:2312.11970 [pdf, other]

    cs.AI cs.CL cs.CY cs.MA

    Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives

    Authors: Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, Yong Li

    Abstract: Agent-based modeling and simulation has evolved as a powerful tool for modeling complex systems, offering insights into emergent behaviors and interactions among diverse agents. Integrating large language models into agent-based modeling and simulation presents a promising avenue for enhancing simulation capabilities. This paper surveys the landscape of utilizing large language models in agent-bas…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 37 pages

  16. arXiv:2310.10467 [pdf, other]

    cs.CL cs.AI

    Stance Detection with Collaborative Role-Infused LLM-Based Agents

    Authors: Xiaochong Lan, Chen Gao, Depeng Jin, Yong Li

    Abstract: Stance detection automatically detects the stance in a text towards a target, vital for content analysis in web and social media research. Despite their promising capabilities, LLMs encounter challenges when directly applied to stance detection. First, stance detection demands multi-aspect knowledge, from deciphering event-related terminologies to understanding the expression styles in social medi…

    Submitted 16 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  17. arXiv:2310.05694 [pdf, other]

    cs.CL

    A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

    Authors: Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria

    Abstract: The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to free-text queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development…

    Submitted 11 June, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  18. FoodSAM: Any Food Segmentation

    Authors: Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue

    Abstract: In this paper, we explore the zero-shot capability of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a novel framework, called FoodSAM. This innovative approach integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality. Besides, we recognize that the ingre…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Code is available at https://github.com/jamesjg/FoodSAM

  19. arXiv:2308.02831 [pdf, other]

    cs.HC

    Affective Visualization Design: Leveraging the Emotional Impact of Data

    Authors: Xingyu Lan, Yanqiu Wu, Nan Cao

    Abstract: In recent years, more and more researchers have reflected on the undervaluation of emotion in data visualization and highlighted the importance of considering human emotion in visualization design. Meanwhile, an increasing number of studies have been conducted to explore emotion-related factors. However, so far, this research area is still in its early stages and faces a set of challenges, such as…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: to appear at IEEE VIS 2023

  20. NEON: Living Needs Prediction System in Meituan

    Authors: Xiaochong Lan, Chen Gao, Shiqi Wen, Xiuqi Chen, Yingge Che, Han Zhang, Huazhou Wei, Hengliang Luo, Yong Li

    Abstract: Living needs refer to the various needs in humans' daily lives for survival and well-being, including food, housing, entertainment, etc. On life service platforms that connect users to service providers, such as Meituan, the problem of living needs prediction is fundamental as it helps understand users and boost various downstream applications such as personalized recommendation. However, the prob…

    Submitted 31 July, 2023; originally announced July 2023.

  21. arXiv:2307.14984 [pdf, other]

    cs.SI

    S3: Social-network Simulation System with Large Language Model-Empowered Agents

    Authors: Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, Yong Li

    Abstract: Social network simulation plays a crucial role in addressing various challenges within social science. It offers extensive applications such as state prediction, phenomena explanation, and policy-making support, among others. In this work, we harness the formidable human-like capabilities exhibited by large language models (LLMs) in sensing, reasoning, and behaving, and utilize these qualities to…

    Submitted 19 October, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  22. arXiv:2307.11458 [pdf, other]

    cs.CV

    Strip-MLP: Efficient Token Interaction for Vision MLP

    Authors: Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, Yaowei Wang, Jianguo Zhang

    Abstract: Token interaction operation is one of the core modules in MLP-based models to exchange and aggregate information between different spatial locations. However, the power of token interaction on the spatial dimension is highly dependent on the spatial resolution of the feature maps, which limits the model's expressive ability, especially in deep layers where the features are down-sampled to a small s…

    Submitted 21 July, 2023; originally announced July 2023.

  23. arXiv:2307.09193 [pdf, other]

    cs.AI cs.IR

    ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

    Authors: Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan

    Abstract: Large-scale online recommender systems are spread all over the Internet and are in charge of two basic tasks: Click-Through Rate (CTR) and Post-Click Conversion Rate (CVR) estimations. However, traditional CVR estimators suffer from well-known Sample Selection Bias and Data Sparsity issues. Entire space models were proposed to address the two issues via tracing the decision-making path of "exposure_clic…

    Submitted 29 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  24. arXiv:2304.12592 [pdf, other]

    cs.CV cs.AI

    MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

    Authors: Han Wang, Jiayuan Zhang, Lipeng Wan, Xingyu Chen, Xuguang Lan, Nanning Zheng

    Abstract: Manipulation relationship detection (MRD) aims to guide the robot to grasp objects in the right order, which is important to ensure the safety and reliability of grasping in object-stacked scenes. Previous works infer manipulation relationships with deep neural networks trained on data collected from a predefined view, which are limited by visual dislocation in unstructured environments. Multi-vi…

    Submitted 25 April, 2023; originally announced April 2023.

  25. arXiv:2304.01171 [pdf, other]

    cs.CV

    Revisiting Context Aggregation for Image Matting

    Authors: Qinglin Liu, Xiaoqian Lv, Quanling Meng, Zonglin Li, Xiangyuan Lan, Shuo Yang, Shengping Zhang, Liqiang Nie

    Abstract: Traditional studies emphasize the significance of context information in improving matting performance. Consequently, deep learning-based matting methods delve into designing pooling or affinity-based context aggregation modules to achieve superior results. However, these modules cannot well handle the context scale shift caused by the difference in image size during training and inference, result…

    Submitted 14 May, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

  26. arXiv:2303.17408 [pdf, other]

    cs.CL

    P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data

    Authors: Yucheng Ruan, Xiang Lan, Daniel J. Tan, Hairil Rizal Abdullah, Mengling Feng

    Abstract: Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, problems remain before existing work can be effectively adapted to the medical domain, such as under-utilization…

    Submitted 9 January, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

  27. arXiv:2303.07828 [pdf, other]

    cs.RO

    Prioritized Planning for Target-Oriented Manipulation via Hierarchical Stacking Relationship Prediction

    Authors: Zewen Wu, Jian Tang, Xingyu Chen, Chengzhong Ma, Xuguang Lan, Nanning Zheng

    Abstract: In scenarios involving the grasping of multiple targets, the learning of stacking relationships between objects is fundamental for robots to execute safely and efficiently. However, current methods lack subdivision for the hierarchy of stacking relationship types. In scenes where objects are mostly stacked in an orderly manner, they are incapable of performing human-like and high-efficient graspin…

    Submitted 25 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: 8 pages, 8 figures. Accepted by 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

  28. arXiv:2302.03357 [pdf, other]

    cs.LG

    Towards Enhancing Time Series Contrastive Learning: A Dynamic Bad Pair Mining Approach

    Authors: Xiang Lan, Hanshu Yan, Shenda Hong, Mengling Feng

    Abstract: Not all positive pairs are beneficial to time series contrastive learning. In this paper, we study two types of bad positive pairs that can impair the quality of time series representation learned through contrastive learning: the noisy positive pair and the faulty positive pair. We observe that, with the presence of noisy positive pairs, the model tends to simply learn the pattern of noise (Noisy…

    Submitted 28 March, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: ICLR 2024 Camera Ready (https://openreview.net/pdf?id=K2c04ulKXn)

  29. arXiv:2211.12075 [pdf, other]

    cs.MA cs.LG

    Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

    Authors: Lipeng Wan, Zeyang Liu, Xingyu Chen, Xuguang Lan, Nanning Zheng

    Abstract: Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they cannot ensure optimal consistency (i.e., the correspondence between individual greedy actions and the maximal true Q value). In this paper, we derive th…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2112.04454

  30. arXiv:2211.03296 [pdf, other]

    cs.HC

    The Chart Excites Me! Exploring How Data Visualization Design Influences Affective Arousal

    Authors: Xingyu Lan, Yanqiu Wu, Qing Chen, Nan Cao

    Abstract: As data visualizations have been increasingly applied in mass communication, designers often seek to grasp viewers immediately and motivate them to read more. Such goals, as suggested by previous research, are closely associated with the activation of emotion, namely affective arousal. Given this motivation, this work takes initial steps toward understanding the arousal-related factors in data vis…

    Submitted 6 November, 2022; originally announced November 2022.

  31. arXiv:2209.03642 [pdf, other]

    cs.HC

    VizBelle: A Design Space of Embellishments for Data Visualization

    Authors: Qing Chen, Ziyan Liu, Chengwei Wang, Xingyu Lan, Ying Chen, Siming Chen, Nan Cao

    Abstract: Visual embellishments, as a form of non-linguistic rhetorical figures, are used to help convey abstract concepts or attract readers' attention. Creating data visualizations with appropriate and visually pleasing embellishments is challenging since this process largely depends on the experience and the aesthetic taste of designers. To help facilitate designers in the ideation and creation process,…

    Submitted 8 September, 2022; originally announced September 2022.

  32. arXiv:2208.07156 [pdf, other]

    cs.NE cs.CE cs.MA eess.SY

    Cooperative guidance of multiple missiles: a hybrid co-evolutionary approach

    Authors: Xuejing Lan, Junda Chen, Zhijia Zhao, Tao Zou

    Abstract: Cooperative guidance of multiple missiles is a challenging task with rigorous constraints of time and space consensus, especially when attacking dynamic targets. In this paper, the cooperative guidance task is described as a distributed multi-objective cooperative optimization problem. To address the issues of non-stationarity and continuous control faced by cooperative guidance, the natural evolu…

    Submitted 14 April, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

    ACM Class: F.2.2; J.2

  33. arXiv:2208.04518 [pdf]

    cs.CY cs.HC

    A Mixed-Methods Analysis of the Algorithm-Mediated Labor of Online Food Deliverers in China

    Authors: Zhilong Chen, Xiaochong Lan, Jinghua Piao, Yunke Zhang, Yong Li

    Abstract: In recent years, China has witnessed the proliferation and success of the online food delivery industry, an emerging type of the gig economy. Online food deliverers who deliver the food from restaurants to customers play a critical role in enabling this industry. Mediated by algorithms and coupled with interactions with multiple stakeholders, this emerging kind of labor has been taken by millions…

    Submitted 26 August, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: Accepted to CSCW 2022

  34. arXiv:2208.04122 [pdf]

    cs.CY cs.HC cs.IR

    Practitioners Versus Users: A Value-Sensitive Evaluation of Current Industrial Recommender System Design

    Authors: Zhilong Chen, Jinghua Piao, Xiaochong Lan, Hancheng Cao, Chen Gao, Zhicong Lu, Yong Li

    Abstract: Recommender systems are playing an increasingly important role in alleviating information overload and supporting users' various needs, e.g., consumption, socialization, and entertainment. However, limited research focuses on how values should be extensively considered in industrial deployments of recommender systems, the ignorance of which can be problematic. To fill this gap, in this paper, we a…

    Submitted 26 August, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: Zhilong Chen and Jinghua Piao contribute equally to this work; Accepted to CSCW 2022

  35. arXiv:2207.08794 [pdf, other]

    cs.CV cs.RO

    DeFlowSLAM: Self-Supervised Scene Motion Decomposition for Dynamic Dense SLAM

    Authors: Weicai Ye, Xingyuan Yu, Xinyue Lan, Yuhang Ming, Jinyu Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

    Abstract: We present a novel dual-flow representation of scene motion that decomposes the optical flow into a static flow field caused by the camera motion and another dynamic flow field caused by the objects' movements in the scene. Based on this representation, we present a dynamic SLAM, dubbed DeFlowSLAM, that exploits both static and dynamic pixels in the images to solve the camera poses, rather than si…

    Submitted 13 January, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: Homepage: https://zju3dv.github.io/deflowslam

  36. arXiv:2207.02705 [pdf, other]

    cs.NI cs.IT

    Incentivizing Proof-of-Stake Blockchain for Secured Data Collection in UAV-Assisted IoT: A Multi-Agent Reinforcement Learning Approach

    Authors: Xiao Tang, Xunqiang Lan, Lixin Li, Yan Zhang, Zhu Han

    Abstract: The Internet of Things (IoT) can be conveniently deployed while empowering various applications, where the IoT nodes can form clusters to finish certain missions collectively. In this paper, we propose to employ unmanned aerial vehicles (UAVs) to assist the clustered IoT data collection with blockchain-based security provisioning. In particular, the UAVs generate candidate blocks based on the coll…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: 14 pages, 10 figures, submitted to IEEE Journal

  37. arXiv:2207.01610 [pdf, other]

    cs.CV cs.RO

    PVO: Panoptic Visual Odometry

    Authors: Weicai Ye, Xinyue Lan, Shuo Chen, Yuhang Ming, Xingyuan Yu, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

    Abstract: We present PVO, a novel panoptic visual odometry framework to achieve more comprehensive modeling of the scene motion, geometry, and panoptic segmentation information. Our PVO models visual odometry (VO) and video panoptic segmentation (VPS) in a unified view, which makes the two tasks mutually beneficial. Specifically, we introduce a panoptic update module into the VO Module with the guidance of…

    Submitted 26 March, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: CVPR2023 Project page: https://zju3dv.github.io/pvo/ code: https://github.com/zju3dv/PVO

  38. arXiv:2203.05243 [pdf, other]

    cs.CV cs.CL cs.MM

    A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

    Authors: Xiaohan Lan, Yitian Yuan, Xin Wang, Long Chen, Zhi Wang, Lin Ma, Wenwu Zhu

    Abstract: Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve SOTA performance. In this paper, we take a closer…

    Submitted 10 March, 2022; originally announced March 2022.

  39. arXiv:2203.01217 [pdf, other]

    cs.CV

    Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation

    Authors: Weicai Ye, Xinyue Lan, Ge Su, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

    Abstract: Video Panoptic Segmentation (VPS) aims to generate coherent panoptic segmentation and track the identities of all pixels across video frames. Existing methods predominantly utilize the trained instance embedding to keep the consistency of panoptic segmentation. However, they inevitably struggle to cope with the challenges of small objects, similar appearance but inconsistent identities, occlusion,…

    Submitted 11 December, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

  40. arXiv:2203.00865 [pdf]

    cs.CY cs.HC

    Beyond Virtual Bazaar: How Social Commerce Promotes Inclusivity for the Traditionally Underserved Community in Chinese Developing Regions

    Authors: Zhilong Chen, Hancheng Cao, Xiaochong Lan, Zhicong Lu, Yong Li

    Abstract: The disadvantaged population is often underserved and marginalized in technology engagement: prior works show they are generally more reluctant and experience more barriers in adopting and engaging with mainstream technology. Here, we contribute to the HCI4D and ICTD literature through a novel "counter" case study on Chinese social commerce (e.g., Pinduoduo), which 1) first prospers among the trad…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: Zhilong Chen and Hancheng Cao contribute equally to this work; Accepted to CHI 2022

  41. arXiv:2202.03631 [pdf, ps, other]

    cs.RO

    Robotic Grasping from Classical to Modern: A Survey

    Authors: Hanbo Zhang, Jian Tang, Shiguang Sun, Xuguang Lan

    Abstract: Robotic Grasping has always been an active topic in robotics since grasping is one of the fundamental but most challenging skills of robots. It demands the coordination of robotic perception, planning, and control for robustness and intelligence. However, current solutions are still far behind humans, especially when confronting unstructured scenarios. In this paper, we survey the advances of robo…

    Submitted 7 February, 2022; originally announced February 2022.

  42. arXiv:2112.04454 [pdf, other]

    cs.MA

    Greedy-based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

    Authors: Lipeng Wan, Zeyang Liu, Xingyu Chen, Han Wang, Xuguang Lan

    Abstract: Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they cannot ensure optimal consistency (i.e., the correspondence between individual greedy actions and the maximal true Q value). In this paper, we derive th…

    Submitted 3 July, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

  43. arXiv:2112.01078  [pdf, other]

    cs.MA

    Multi-Agent Intention Sharing via Leader-Follower Forest

    Authors: Zeyang Liu, Lipeng Wan, Xue sui, Kewu Sun, Xuguang Lan

    Abstract: Intention sharing is crucial for efficient cooperation under partially observable environments in multi-agent reinforcement learning (MARL). However, message deceiving, i.e., a mismatch between the propagated intentions and the final decisions, may happen when agents change strategies simultaneously according to received intentions. Message deceiving leads to potential miscoordination and difficul…

    Submitted 2 December, 2021; originally announced December 2021.

  44. arXiv:2109.08908  [pdf, other]

    cs.LG cs.AI eess.SP

    Intra-Inter Subject Self-supervised Learning for Multivariate Cardiac Signals

    Authors: Xiang Lan, Dianwen Ng, Shenda Hong, Mengling Feng

    Abstract: Learning information-rich and generalizable representations effectively from unlabeled multivariate cardiac signals to identify abnormal heart rhythms (cardiac arrhythmias) is valuable in real-world clinical settings but often challenging due to its complex temporal dynamics. Cardiac arrhythmias can vary significantly in temporal patterns even for the same patient (i.e., intra subject difference…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: preliminary version

  45. arXiv:2109.08903  [pdf, other]

    cs.RO

    Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards

    Authors: Deyu Yang, Hanbo Zhang, Xuguang Lan, Jishiyu Ding

    Abstract: Multi-goal reinforcement learning (RL) aims to qualify the agent to accomplish multi-goal tasks, which is of great importance in learning scalable robotic manipulation skills. However, reward engineering always requires strenuous efforts in multi-goal RL. Moreover, it will introduce inevitable bias causing the suboptimality of the final policy. The sparse reward provides a simple yet efficient way…

    Submitted 24 September, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: 8 pages, 7 figures

  46. arXiv:2109.08039  [pdf, other]

    cs.CV cs.AI cs.MM

    A Survey on Temporal Sentence Grounding in Videos

    Authors: Xiaohan Lan, Yitian Yuan, Xin Wang, Zhi Wang, Wenwu Zhu

    Abstract: Temporal sentence grounding in videos (TSGV), which aims to localize one target segment from an untrimmed video with respect to a given sentence query, has drawn increasing attention in the research community over the past few years. Different from the task of temporal action localization, TSGV is more flexible since it can locate complicated activities via natural languages, without restrictions…

    Submitted 16 September, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: 32 pages with 19 figures

  47. arXiv:2108.12863  [pdf, other]

    cs.CV

    MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection

    Authors: Xun Tan, Xingyu Chen, Guowei Zhang, Jishiyu Ding, Xuguang Lan

    Abstract: Point clouds and images could provide complementary information when representing 3D objects. Fusing the two kinds of data usually helps to improve the detection results. However, it is challenging to fuse the two data modalities, due to their different characteristics and the interference from the non-interest areas. To solve this problem, we propose a Multi-Branch Deep Fusion Network (MBDF-Net)…

    Submitted 29 August, 2021; originally announced August 2021.

  48. arXiv:2108.11092  [pdf, other]

    cs.RO cs.AI

    INVIGORATE: Interactive Visual Grounding and Grasping in Clutter

    Authors: Hanbo Zhang, Yunfan Lu, Cunjun Yu, David Hsu, Xuguang Lan, Nanning Zheng

    Abstract: This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE embodies several challenges: (i) infer the target object among other occluding objects, from input language expressions and RGB images, (ii) infer object blocking relationships…

    Submitted 7 January, 2024; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: 12 pages, full version

  49. arXiv:2107.06564  [pdf, other]

    cs.RO cs.CV

    Probabilistic Human Motion Prediction via A Bayesian Neural Network

    Authors: Jie Xu, Xingyu Chen, Xuguang Lan, Nanning Zheng

    Abstract: Human motion prediction is an important and challenging topic that has promising prospects in efficient and safe human-robot-interaction systems. Currently, the majority of the human motion prediction algorithms are based on deterministic models, which may lead to risky decisions for robots. To solve this problem, we propose a probabilistic model for human motion prediction in this paper. The key…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted at ICRA 2021

  50. arXiv:2106.10262  [pdf, other]

    cs.LG stat.ML

    A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization

    Authors: Xinjie Lan, Kenneth Barner

    Abstract: Recently, Mutual Information (MI) has attracted attention in bounding the generalization error of Deep Neural Networks (DNNs). However, it is intractable to accurately estimate the MI in DNNs, thus most previous works have to relax the MI bound, which in turn weakens the information theoretic explanation for generalization. To address the limitation, this paper introduces a probabilistic represent…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: To appear in the ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI