Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing

Authors: Jessica Yin, Haozhi Qi, Jitendra Malik, James Pikul, Mark Yim, Tess Hellebrekers

Abstract: Recent progress in reinforcement learning (RL) and tactile sensing has significantly advanced dexterous manipulation. However, these methods often utilize simplified tactile signals due to the gap between tactile simulation and the real world. We introduce a sensor model for tactile skin that enables zero-shot sim-to-real transfer of ternary shear and binary normal forces. Using this model, we develop an RL policy that leverages sliding contact for dexterous in-hand translation. We conduct extensive real-world experiments to assess how tactile sensing facilitates policy adaptation to various unseen object properties and robot hand orientations. We demonstrate that our 3-axis tactile policies consistently outperform baselines that use only shear forces, only normal forces, or only proprioception. Website: https://jessicayin.github.io/tactile-skin-rl/

Submitted 10 July, 2024; originally announced July 2024.

Comments: Website: https://jessicayin.github.io/tactile-skin-rl/

arXiv:2405.14144 [pdf, other]

A Single Motor Nano Aerial Vehicle with Novel Peer-to-Peer Communication and Sensing Mechanism

Authors: Jingxian Wang, Andrew G. Curtis, Mark Yim, Michael Rubenstein

Abstract: Communication and position sensing are among the most important capabilities for swarm robots to interact with their peers and perform tasks collaboratively. However, the hardware required to facilitate communication and position sensing is often too complicated, expensive, and bulky to be carried on swarm robots. Here we present Maneuverable Piccolissimo 3 (MP3), a minimalist, single motor drone capable of executing inter-robot communication via infrared light and triangulation-based sensing of relative bearing, distance, and elevation using message arrival time. Thanks to its novel design, MP3 can communicate with peers and localize itself using simple components, keeping its size and mass small and making it inherently safe for human interaction. We present the hardware and software design of MP3 and demonstrate its capability to localize itself, fly stably, and maneuver in the environment using peer-to-peer communication and sensing.

Submitted 3 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.00260 [pdf, other]

CREPE: Coordinate-Aware End-to-End Document Parser

Authors: Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee

Abstract: In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OCR text, and token-triggered coordinate decoding. We also propose a weakly-supervised framework for cost-efficient training, requiring only parsing annotations without high-cost coordinate annotations. Our experimental evaluations demonstrate CREPE's state-of-the-art performance on document parsing tasks. Beyond that, CREPE's adaptability is further highlighted by its successful usage in other document understanding tasks such as layout analysis, document visual question answering, and so on. CREPE's abilities, including OCR and semantic parsing, not only mitigate error propagation issues in existing OCR-dependent methods but also significantly enhance the functionality of sequence generation models, ushering in a new era for document understanding studies.

Submitted 30 April, 2024; originally announced May 2024.

Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024) main conference

arXiv:2404.19205 [pdf, other]

TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

Authors: Yoonsik Kim, Moonbin Yim, Ka Yeon Song

Abstract: In this paper, we establish a benchmark for table visual question answering, referred to as the TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. As such, the primary objective of this paper is to obtain these necessary components. Specifically, images are sourced either through the application of a stylesheet or by employing the proposed table rendering system. QA pairs are generated by exploiting the large language model (LLM) where the input is a text-formatted table. Ultimately, the completed TableVQA-Bench comprises 1,500 QA pairs. We comprehensively compare the performance of various multi-modal large language models (MLLMs) on TableVQA-Bench. GPT-4V achieves the highest accuracy among commercial and open-sourced MLLMs from our experiments. Moreover, we discover that the number of vision queries plays a significant role in TableVQA performance. To further analyze the capabilities of MLLMs in comparison to their LLM backbones, we investigate by presenting image-formatted tables to MLLMs and text-formatted tables to LLMs, respectively. Our findings suggest that processing visual inputs is more challenging than text inputs, as evidenced by the lower performance of MLLMs, despite generally requiring higher computational costs than LLMs. The proposed TableVQA-Bench and evaluation codes are available at https://github.com/naver-ai/tablevqabench.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Technical Report

arXiv:2404.02265 [pdf, other]

Continuous Sculpting: Persistent Swarm Shape Formation Adaptable to Local Environmental Changes

Authors: Andrew G. Curtis, Mark Yim, Michael Rubenstein

Abstract: Despite their growing popularity, swarms of robots remain limited by the operating time of each individual. We present algorithms which allow a human to sculpt a swarm of robots into a shape that persists in space perpetually, independent of onboard energy constraints such as batteries. Robots generate a path through a shape such that robots cycle in and out of the shape. Robots inside the shape react to human initiated changes and adapt the path through the shape accordingly. Robots outside the shape recharge and return to the shape so that the shape can persist indefinitely. The presented algorithms communicate shape changes throughout the swarm using message passing and robot motion. These algorithms enable the swarm to persist through any arbitrary changes to the shape. We describe these algorithms in detail and present their performance in simulation and on a swarm of mobile robots. The result is a swarm behavior more suitable for extended duration, dynamic shape-based tasks in applications such as agriculture and emergency response.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 20 pages, 17 figures

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2211.16611 [pdf, other]

Holonomic Control of Arbitrary Configurations of Docked Modboats

Authors: Zhijie Qiao, Gedaliah Knizhnik, Mark Yim

Abstract: The Modboat is a low-cost, underactuated, modular robot capable of surface swimming, docking to other modules, and undocking from them using only a single motor and two passive flippers. Undocking is achieved by causing intentional self-collision between the tails of neighboring modules in certain configurations; this becomes a challenge, however, when collective swimming as one connected component is desirable. Prior work has developed controllers that turn arbitrary configurations of docked Modboats into steerable vehicles, but they cannot counteract lateral forces and disturbances. In this work we present a centralized control strategy to create holonomic vehicles out of arbitrary configurations of docked Modboats using an iterative potential-field based search. We experimentally demonstrate that our controller performs well and can control surge and sway velocities and yaw angle simultaneously.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.07480 [pdf, other]

Electroadhesive Clutches for Programmable Shape Morphing of Soft Actuators

Authors: Gregory M. Campbell, Jessica Yin, Yuyang Song, Umesh Gandhi, Mark Yim, James Pikul

Abstract: Soft robotic actuators are safe and adaptable devices with inherent compliance, which makes them attractive for manipulating delicate and complex objects. Researchers have integrated stiff materials into soft actuators to increase their force capacity and direct their deformation. However, these embedded materials have largely been pre-prescribed and static, which constrains the actuators to a predetermined range of motion. In this work, electroadhesive (EA) clutches integrated on a single-chamber soft pneumatic actuator (SPA) provide local programmable stiffness modulation to control the actuator deformation. We show that activating different clutch patterns inflates a silicone membrane into pyramidal, round, and plateau shapes. Curvatures from these shapes are combined during actuation to apply forces on both a 3.7 g and 820 g object along five different degrees of freedom (DoF). The actuator workspace is up to 12 mm for light objects. Clutch deactivation, which results in local elastomeric expansion, rapidly applies forces up to 3.2 N to an object resting on the surface and launches a 3.7 g object in controlled directions. The actuator also rotates a heavier, 820 g, object by 5 degrees and rapidly restores it to horizontal alignment after clutch deactivation. This actuator is fully powered by a 5 V battery, AA battery, DC-DC transformer, and 4.5 V (63 g) DC air pump. These results demonstrate a first step towards realizing a soft actuator with high DoF shape change that preserves the inherent benefits of pneumatic actuation while gaining the electrical controllability and strength of EA clutches. We envision such a system supplying human contact forces in the form of a low-profile sit-to-stand assistance device, bed-ridden patient manipulator, or other ergonomic mechanism. This technology was also demonstrated at ICRA 2022: https://www.youtube.com/watch?v=6Y6-iHWNi6s

Submitted 14 November, 2022; originally announced November 2022.

Comments: This work was presented at IEEE International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2211.03256 [pdf, other]

On Web-based Visual Corpus Construction for Visual Document Understanding

Authors: Donghyun Kim, Teakgyu Hong, Moonbin Yim, Yoonsik Kim, Geewook Kim

Abstract: In recent years, research on visual document understanding (VDU) has grown significantly, with a particular emphasis on the development of self-supervised learning methods. However, one of the significant challenges faced in this field is the limited availability of publicly accessible visual corpora or extensive collections of images with detailed text annotations, particularly for non-Latin or resource-scarce languages. To address this challenge, we propose Web-based Visual Corpus Builder (Webvicob), a dataset generator engine capable of constructing large-scale, multilingual visual corpora from raw Wikipedia HTML dumps. Our experiments demonstrate that the data generated by Webvicob can be used to train robust VDU models that perform well on various downstream tasks, such as DocVQA and post-OCR parsing. Furthermore, when using a dataset of 1 million images generated by Webvicob, we observed an improvement of over 13% on the DocVQA Task 3 compared to a dataset of 11 million images from the IIT-CDIP. The implementation of our engine is publicly available on https://github.com/clovaai/webvicob

Submitted 2 May, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

Comments: Accepted at ICDAR2023

arXiv:2209.04000 [pdf, other]

Collective Control for Arbitrary Configurations of Docked Modboats

Authors: Gedaliah Knizhnik, Mark Yim

Abstract: The Modboat is a low-cost, underactuated, modular robot capable of surface swimming, docking to other modules, and undocking from them using only a single motor and two passive flippers. Undocking is achieved by causing intentional self-collision between the tails of neighboring modules in certain configurations; this becomes a challenge, however, when collective swimming as one connected component is desirable. In this work, we develop a centralized control strategy to allow arbitrary configurations of Modboats to swim as a single steerable vehicle and guarantee no accidental undocking. We also present a simplified model for hydrodynamic interactions between boats in a configuration that is tractable for real-time control. We experimentally demonstrate that our controller performs well, is consistent for configurations of various sizes and shapes, and can control both surge velocity and yaw angle simultaneously. Controllability is maintained while swimming, but pure yaw control causes lateral movement that cannot be counteracted by the presented framework.

Submitted 8 September, 2022; originally announced September 2022.

Comments: 11 pages. Submitted for consideration in the IEEE Transactions on Robotics (T-RO)

arXiv:2204.08586 [pdf, other]

Multimodal Proximity and Visuotactile Sensing With a Selectively Transmissive Soft Membrane

Authors: Jessica Yin, Gregory M. Campbell, James Pikul, Mark Yim

Abstract: The most common sensing modalities found in a robot perception system are vision and touch, which together can provide global and highly localized data for manipulation. However, these sensing modalities often fail to adequately capture the behavior of target objects during the critical moments as they transition out of static, controlled contact with an end-effector to dynamic and uncontrolled motion. In this work, we present a novel multimodal visuotactile sensor that provides simultaneous visuotactile and proximity depth data. The sensor integrates an RGB camera and air pressure sensor to sense touch with an infrared time-of-flight (ToF) camera to sense proximity by leveraging a selectively transmissive soft membrane to enable the dual sensing modalities. We present the mechanical design, fabrication techniques, algorithm implementations, and evaluation of the sensor's tactile and proximity modalities. The sensor is demonstrated in three open-loop robotic tasks: approaching and contacting an object, catching, and throwing. The fusion of tactile and proximity data could be used to capture key information about a target object's transition behavior for sensor-based control in dynamic manipulation.

Submitted 18 April, 2022; originally announced April 2022.

Comments: Accepted to IEEE International Conference on Soft Robotics (RoboSoft) 2022

arXiv:2203.00795 [pdf, other]

doi 10.1109/ICRA46639.2022.9812381

Amplitude Control for Parallel Lattices of Docked Modboats

Authors: Gedaliah Knizhnik, Mark Yim

Abstract: The Modboat is a low-cost, underactuated, modular robot capable of surface swimming. It is able to swim individually, dock to other Modboats, and undock from them using only a single motor and two passive flippers. Undocking without additional actuation is achieved by causing intentional self-collision between the tails of neighboring modules; this becomes a challenge when group swimming as one connected component is desirable. In this work, we develop a control strategy to allow parallel lattices of Modboats to swim as a single unit, which conventionally requires holonomic modules. We show that the control strategy is guaranteed to avoid unintentional undocking and minimizes internal forces within the lattice. Experimental verification shows that the controller performs well and is consistent for lattices of various sizes. Controllability is maintained while swimming, but pure yaw control causes lateral movement that cannot be counteracted by the presented framework.

Submitted 21 July, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 7 pages. Accepted to the 2022 International Conference on Robotics and Automation (ICRA)

arXiv:2201.03719 [pdf, other]

doi 10.1115/1.4050249

A Low-Cost, Highly Customizable Solution for Position Estimation in Modular Robots

Authors: Chao Liu, Tarik Tosun, Mark Yim

Abstract: Accurate position sensing is important for state estimation and control in robotics. Reliable and accurate position sensors are usually expensive and difficult to customize. Incorporating them into systems that have very tight volume constraints, such as modular robots, is particularly difficult. PaintPots are low-cost, reliable, and highly customizable position sensors, but their performance is highly dependent on the manufacturing and calibration process. This paper presents a Kalman filter with a simplified observation model developed to deal with the non-linearity issues that result in the use of low-cost microcontrollers. In addition, a complete solution for the use of PaintPots in a variety of sensing modalities including manufacturing, characterization, and estimation is presented for an example modular robot, SMORES-EP. This solution can be easily adapted to a wide range of applications.

Submitted 10 January, 2022; originally announced January 2022.

Comments: 10 pages, 28 figures

Journal ref: ASME. J. Mechanisms Robotics. December 2021; 13(6): 061004

arXiv:2111.15664 [pdf, other]

OCR-free Document Understanding Transformer

Authors: Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park

Abstract: Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.

Submitted 6 October, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

Comments: ECCV 2022. (v5) update table 2 and figures; add LayoutLM and update scores with the latest test script at https://github.com/clovaai/donut

arXiv:2109.15278 [pdf, other]

Coverage Control in Multi-Robot Systems via Graph Neural Networks

Authors: Walker Gosrich, Siddharth Mayya, Rebecca Li, James Paulos, Mark Yim, Alejandro Ribeiro, Vijay Kumar

Abstract: This paper develops a decentralized approach to mobile sensor coverage by a multi-robot system. We consider a scenario where a team of robots with limited sensing range must position itself to effectively detect events of interest in a region characterized by areas of varying importance. Towards this end, we develop a decentralized control policy for the robots -- realized via a Graph Neural Network -- which uses inter-robot communication to leverage non-local information for control decisions. By explicitly sharing information between multi-hop neighbors, the decentralized controller achieves a higher quality of coverage when compared to classical approaches that do not communicate and leverage only local information available to each robot. Simulated experiments demonstrate the efficacy of multi-hop communication for multi-robot coverage and evaluate the scalability and transferability of the learning-based controllers.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2109.00662 [pdf, other]

Quori: A Community-Informed Design of a Socially Interactive Humanoid Robot

Authors: Andrew Specian, Ross Mead, Simon Kim, Maja Matarić, Mark Yim

Abstract: Hardware platforms for socially interactive robotics can be limited by cost or lack of functionality. This paper presents the overall system -- design, hardware, and software -- for Quori, a novel, affordable, socially interactive humanoid robot platform for facilitating non-contact human-robot interaction (HRI) research. The design of the system is motivated by feedback sampled from the HRI research community. The overall design maintains a balance of affordability and functionality. Initial Quori testing and a six-month deployment are presented. Ten Quori platforms have been awarded to a diverse group of researchers from across the United States to facilitate HRI research to build a community database from a common platform.

Submitted 1 September, 2021; originally announced September 2021.

Comments: 20 pages. 21 figures. This was accepted to and will be published to the IEEE Transactions on Robotics Journal

arXiv:2108.00309 [pdf, other]

doi 10.1109/TRO.2022.3228400

Motion Planning for Variable Topology Trusses: Reconfiguration and Locomotion

Authors: Chao Liu, Sencheng Yu, Mark Yim

Abstract: Truss robots are highly redundant parallel robotic systems that can be applied in a variety of scenarios. The variable topology truss (VTT) is a class of modular truss robots. As self-reconfigurable modular robots, a VTT is composed of many edge modules that can be rearranged into various structures depending on the task. These robots change their shape by not only controlling joint positions as with fixed morphology robots, but also reconfiguring the connectivity between truss members in order to change their topology. The motion planning problem for VTT robots is difficult due to their varying morphology, high dimensionality, the high likelihood for self-collision, and complex motion constraints. In this paper, a new motion planning framework to dramatically alter the structure of a VTT is presented. It can also be used to solve locomotion tasks that are much more efficient compared with previous work. Several test scenarios are used to show its effectiveness. Supplementary materials are available at https://www.modlabupenn.org/vtt-motion-planning/.

Submitted 24 September, 2023; v1 submitted 31 July, 2021; originally announced August 2021.

Comments: 20 pages, 36 figures

Journal ref: IEEE Transactions on Robotics, vol. 39, no. 3, pp. 2020-2039, June 2023

arXiv:2107.13055 [pdf, other]

doi 10.1109/IROS51168.2021.9636778

Thrust Direction Control of an Underactuated Oscillating Swimming Robot

Authors: Gedaliah Knizhnik, Mark Yim

Abstract: The Modboat is an autonomous surface robot that turns the oscillation of a single motor into a controlled paddling motion through passive flippers. Inertial control methods developed in prior work can successfully drive the Modboat along trajectories and enable docking to neighboring modules, but have a non-constant cycle time and cannot react to dynamic environments. In this work we present a thrust direction control method for the Modboat that significantly improves the time-response of the system and increases the accuracy with which it can be controlled. We experimentally demonstrate that this method can be used to perform more compact maneuvers than prior methods or comparable robots can. We also present an extension to the controller that solves the reaction wheel problem of unbounded actuator velocity, and show that it further improves performance.

Submitted 3 February, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: 6 pages. Published in and presented at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems

arXiv:2107.11041 [pdf, other]

RewriteNet: Reliable Scene Text Editing with Implicit Decomposition of Text Contents and Styles

Authors: Junyeop Lee, Yoonsik Kim, Seonghyeon Kim, Moonbin Yim, Seung Shin, Gayoung Lee, Sungrae Park

Abstract: Scene text editing (STE), which converts a text in a scene image into the desired text while preserving an original style, is a challenging task due to a complex intervention between text and style. In this paper, we propose a novel STE model, referred to as RewriteNet, that decomposes text images into content and style features and re-writes a text in the original image. Specifically, RewriteNet implicitly distinguishes the content from the style by introducing scene text recognition. Additionally, independent of the exact supervisions with synthetic examples, we propose a self-supervised training scheme for unlabeled real-world images, which bridges the domain gap between synthetic and real data. Our experiments present that RewriteNet achieves better generation performances than other comparisons. Further analysis proves the feature decomposition of RewriteNet and demonstrates the reliability and robustness through diverse experiments. Our implementation is publicly available at https://github.com/clovaai/rewritenet

Submitted 2 May, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: CVPRW 2022 - AI for Content Creation Workshop

arXiv:2107.09313 [pdf, other]

SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models

Authors: Moonbin Yim, Yoonsik Kim, Han-Cheol Cho, Sungrae Park

Abstract: For successful scene text recognition (STR) models, synthetic text image generators have alleviated the lack of annotated text images from the real world. Specifically, they generate multiple text images with diverse backgrounds, font styles, and text shapes and enable STR models to learn visual patterns that might not be accessible from manually annotated data. In this paper, we introduce a new synthetic text image generator, SynthTIGER, by analyzing techniques used for text image synthesis and integrating effective ones under a single algorithm. Moreover, we propose two techniques that alleviate the long-tail problem in length and character distributions of training data. In our experiments, SynthTIGER achieves better STR performance than the combination of synthetic datasets, MJSynth (MJ) and SynthText (ST). Our ablation study demonstrates the benefits of using sub-components of SynthTIGER and the guideline on generating synthetic text images for STR models. Our implementation is publicly available at https://github.com/clovaai/synthtiger.

Submitted 20 July, 2021; originally announced July 2021.

Comments: Accepted at ICDAR 2021, 16 pages, 6 figures

arXiv:2104.02755 [pdf, other]

doi 10.35708/RC1870-126268

A Quadratic Programming Approach to Manipulation in Real-Time Using Modular Robots

Authors: Chao Liu, Mark Yim

Abstract: Motion planning in high-dimensional space is a challenging task. In order to perform dexterous manipulation in an unstructured environment, a robot with many degrees of freedom is usually necessary, which also complicates its motion planning problem. Real-time control brings about more difficulties in which robots have to maintain the stability while moving towards the target. Redundant systems are common in modular robots that consist of multiple modules and are able to be transformed into different configurations with respect to different needs. Different from robots with fixed geometry or configurations, the kinematics model of a modular robotic system can alter as the robot reconfigures itself, and developing a generic control and motion planning approach for such systems is difficult, especially when multiple motion goals are coupled. A new manipulation planning framework is developed in this paper. The problem is formulated as a sequential linearly constrained quadratic program (QP) that can be solved efficiently. Some constraints can be incorporated into this QP, including a novel way to approximate environment obstacles. This solution can be used directly for real-time applications or as an off-line planning tool, and it is validated and demonstrated on the CKBot and SMORES-EP modular robot platforms.

Submitted 31 July, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: 25 pages, 20 figures

Journal ref: International Journal of Robotic Computing, Vol. 3, No. 1, (2021) 121-145

arXiv:2104.00800 [pdf, other]

doi 10.1007/s10514-022-10078-1

SMORES-EP, a Modular Robot with Parallel Self-assembly

Authors: Chao Liu, Qian Lin, Hyun Kim, Mark Yim

Abstract: Self-assembly of modular robotic systems enables the construction of complex robotic configurations to adapt to different tasks. This paper presents a framework for SMORES types of modular robots to efficiently self-assemble into tree topologies. These modular robots form kinematic chains that have been shown to be capable of a large variety of manipulation and locomotion tasks, yet they can reconfigure using a mobile reconfiguration. A desired kinematic topology can be mapped onto a planar pattern with optimal module assignment based on the modules' locations, then the mobile reconfiguration assembly process can be executed in parallel. A docking controller is developed to guarantee the success of docking processes. A hybrid control architecture is designed to handle a large number of modules and complex behaviors of each individual, and achieve efficient and robust self-assembly actions. The framework is demonstrated in both hardware and simulation on the SMORES-EP platform.

Submitted 2 January, 2023; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: 18 pages, 20 figures. Auton Robot (2022)

Journal ref: Autonomous Robots volume 47, pages 211-228 (2023)

arXiv:2102.12909 [pdf, other]

doi 10.1109/ICRA48506.2021.9562033

Docking and Undocking a Modular Underactuated Oscillating Swimming Robot

Authors: Gedaliah Knizhnik, Mark Yim

Abstract: We describe a docking mechanism and strategy to allow modular self-assembly for the Modboat: an inexpensive underactuated oscillating swimming robot powered by a single motor. Because propulsion is achieved through oscillation, orientation can be controlled only in the average; this complicates docking, which requires precise position and orientation control. Given these challenges, we present a docking strategy and a motion primitive for controlling orientation, and show that this strategy allows successful docking in multiple configurations. Moreover, we demonstrate that the Modboat is also capable of undocking and changing its dock configuration, all without any additional actuation. This is unique among similar modular robotic systems.

Submitted 28 October, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: 6 pages. Submitted to the 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2101.05201 [pdf, other]

doi 10.3389/fams.2021.651467

Optimisation of Spectral Wavelets for Persistence-based Graph Classification

Authors: Ka Man Yim, Jacob Leygonie

Abstract: A graph's spectral wavelet signature determines a filtration, and consequently an associated set of extended persistence diagrams. We propose a framework that optimises the choice of wavelet for a dataset of graphs, such that their associated persistence diagrams capture features of the graphs that are best suited to a given data science problem. Since the spectral wavelet signature of a graph is derived from its Laplacian, our framework encodes geometric properties of graphs in their associated persistence diagrams and can be applied to graphs without a priori node attributes. We apply our framework to graph classification problems and obtain performances competitive with other persistence-based architectures. To provide the underlying theoretical foundations, we extend the differentiability result for ordinary persistent homology to extended persistent homology.

Submitted 1 March, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

arXiv:2002.01918 [pdf, other]

doi 10.1109/UR49135.2020.9144872

Design and Experiments with a Low-Cost Single-Motor Modular Aquatic Robot

Authors: Gedaliah Knizhnik, Mark Yim

Abstract: We present a novel design for a low-cost robotic boat powered by a single actuator, useful for both modular and swarming applications. The boat uses the conservation of angular momentum and passive flippers to convert the motion of a single motor into an adjustable paddling motion for propulsion and steering. We develop design criteria for modularity and swarming and present a prototype implementing these criteria. We identify significant mechanical sensitivities with the presented design, theorize about the cause of the sensitivities, and present an improved design for future work.

Submitted 5 May, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Comments: Accepted to the International Conference on Ubiquitous Robots (UR 2020). 8 pages

arXiv:1812.04190 [pdf, other]

Optimal Structure Synthesis for Environment Augmenting Robots

Authors: Tarik Tosun, Cynthia Sung, Colin McCloskey, Mark Yim

Abstract: Building structures can allow a robot to surmount large obstacles, expanding the set of areas it can reach. This paper presents a planning algorithm to automatically determine what structures a construction-capable robot must build in order to traverse its entire environment. Given an environment, a set of building blocks, and a robot capable of building structures, we seek an optimal set of structures (using a minimum number of building blocks) that could be built to make the entire environment traversable with respect to the robot's movement capabilities. We show that this problem is NP-Hard, and present a complete, optimal algorithm that solves it using a branch-and-bound strategy. The algorithm runs in exponential time in the worst case, but solves typical problems with practical speed. In hardware experiments, we show that the algorithm solves 3D maps of real indoor environments in about one minute, and that the structures selected by the algorithm allow a robot to traverse the entire environment. An accompanying video is available online at https://youtu.be/B9WM557NP44.

Submitted 10 December, 2018; originally announced December 2018.

arXiv:1712.02299 [pdf, other]

doi 10.1007/s10514-018-9738-1

Accomplishing High-Level Tasks with Modular Robots

Authors: Gangyuan Jing, Tarik Tosun, Mark Yim, Hadas Kress-Gazit

Abstract: The advantage of modular self-reconfigurable robot systems is their flexibility, but this advantage can only be realized if appropriate configurations (shapes) and behaviors (controlling programs) can be selected for a given task. In this paper, we present an integrated system for addressing high-level tasks with modular robots, and demonstrate that it is capable of accomplishing challenging, multi-part tasks in hardware experiments. The system consists of four tightly integrated components: (1) A high-level mission planner, (2) A large design library spanning a wide set of functionality, (3) A design and simulation tool for populating the library with new configurations and behaviors, and (4) modular robot hardware. This paper builds on earlier work by the authors, extending the original system to include environmentally adaptive parametric behaviors, which integrate motion planners and feedback controllers with the system.

Submitted 1 May, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

Comments: Published in Autonomous Robots, 2018. 18 pages

arXiv:1710.01840 [pdf, other]

Perception-Informed Autonomous Environment Augmentation With Modular Robots

Authors: Tarik Tosun, Jonathan Daudelin, Gangyuan Jing, Hadas Kress-Gazit, Mark Campbell, Mark Yim

Abstract: We present a system enabling a modular robot to autonomously build structures in order to accomplish high-level tasks. Building structures allows the robot to surmount large obstacles, expanding the set of tasks it can perform. This addresses a common weakness of modular robot systems, which often struggle to traverse large obstacles. This paper presents the hardware, perception, and planning tools that comprise our system. An environment characterization algorithm identifies features in the environment that can be augmented to create a path between two disconnected regions of the environment. Specially-designed building blocks enable the robot to create structures that can augment the environment to make obstacles traversable. A high-level planner reasons about the task, robot locomotion capabilities, and environment to decide if and where to augment the environment in order to perform the desired task. We validate our system in hardware experiments.

Submitted 1 March, 2018; v1 submitted 4 October, 2017; originally announced October 2017.

Comments: 2018 IEEE International Conference on Robotics and Automation (ICRA). 7 pages

arXiv:1709.05435 [pdf, other]

doi 10.1126/scirobotics.aat4983

An Integrated System for Perception-Driven Autonomy with Modular Robots

Authors: Jonathan Daudelin, Gangyuan Jing, Tarik Tosun, Mark Yim, Hadas Kress-Gazit, Mark Campbell

Abstract: The theoretical ability of modular robots to reconfigure in response to complex tasks in a priori unknown environments has frequently been cited as an advantage and remains a major motivator for work in the field. We present a modular robot system capable of autonomously completing high-level tasks by reactively reconfiguring to meet the needs of a perceived, a priori unknown environment. The system integrates perception, high-level planning, and modular hardware, and is validated in three hardware demonstrations. Given a high-level task specification, a modular robot autonomously explores an unknown environment, decides when and how to reconfigure, and manipulates objects to complete its task. The system architecture balances distributed mechanical elements with centralized perception, planning, and control. By providing an example of how a modular robot system can be designed to leverage reactive reconfigurability in unknown environments, we have begun to lay the groundwork for modular self-reconfigurable robots to address tasks in the real world.

Submitted 13 December, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

Comments: Published article available at: http://robotics.sciencemag.org/cgi/content/full/3/23/eaat4983?ijkey=iBq7yW7Z8vmjE&keytype=ref&siteid=robotics

Journal ref: Science Robotics 31 Oct 2018. Vol. 3, Issue 23, eaat4983

Showing 1–29 of 29 results for author: Yim, M