subscribe to arXiv mailings

Turkish Delights: a Dataset on Turkish Euphemisms

Authors: Hasan Can Biyik, Patrick Lee, Anna Feldman

Abstract: Euphemisms are a form of figurative language relatively understudied in natural language processing. This research extends the current computational work on potentially euphemistic terms (PETs) to Turkish. We introduce the Turkish PET dataset, the first available of its kind in the field. By creating a list of euphemisms in Turkish, collecting example contexts, and annotating them, we provide both… ▽ More Euphemisms are a form of figurative language relatively understudied in natural language processing. This research extends the current computational work on potentially euphemistic terms (PETs) to Turkish. We introduce the Turkish PET dataset, the first available of its kind in the field. By creating a list of euphemisms in Turkish, collecting example contexts, and annotating them, we provide both euphemistic and non-euphemistic examples of PETs in Turkish. We describe the dataset and methodologies, and also experiment with transformer-based models on Turkish euphemism detection by using our dataset for binary classification. We compare performances across models using F1, accuracy, and precision as evaluation metrics. △ Less

Submitted 17 July, 2024; originally announced July 2024.

Comments: In Proceedings of The First SIGTURK workshop co-located with ACL 2024: https://sigturk.github.io/workshop/

arXiv:2405.16711 [pdf, other]

doi 10.1145/3643834.3661576

The AI-DEC: A Card-based Design Method for User-centered AI Explanations

Authors: Christine P Lee, Min Kyung Lee, Bilge Mutlu

Abstract: Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four di… ▽ More Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four dimensions of AI explanations that are critical for the integration of AI systems -- communication content, modality, frequency, and direction -- and offers design examples for end-users to design AI explanations that meet their needs. We evaluated this method through co-design sessions with workers in healthcare, finance, and management industries who regularly use AI systems in their daily work. Findings indicate that the AI-DEC effectively supported workers in designing explanations that accommodated diverse levels of performance and autonomy needs, which varied depending on the AI system's workplace role and worker values. We discuss the implications of using the AI-DEC for the user-centered design of AI explanations in real-world systems. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Journal ref: Designing Interactive Systems Conference, 2024, (DIS '24)

arXiv:2405.16710 [pdf, other]

doi 10.1145/3643834.3661559

REX: Designing User-centered Repair and Explanations to Address Robot Failures

Authors: Christine P Lee, Pragathi Praveena, Bilge Mutlu

Abstract: Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanati… ▽ More Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanations can be designed to improve user experience with robot failures through two user studies. In our first, online study ($n=162$), users expressed increased trust, satisfaction, and utility with the robot performing automated repair and explanations. However, we also identified risk factors -- safety, privacy, and complexity -- that require adaptive repair strategies. The second, in-person study ($n=24$) elucidated distinct repair and explanation strategies depending on the level of risk severity and type. Using a design-based approach, we explore automated repair with explanations as a solution for robots to handle conflicts and failures, complemented by adaptive strategies for risk factors. Finally, we discuss the implications of incorporating such strategies into robot designs to achieve seamless operation among changing user needs and environments. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Journal ref: Designing Interactive Systems Conference, 2024, (DIS '24)

arXiv:2403.14268 [pdf]

Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

Authors: PeiYing Lee, HauYun Guo, Berlin Chen

Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Tra… ▽ More End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Transformer encoders at the lower layer of EEND-EDA model to enhance the effect of self-attention modules using speaker activity information. The results evaluated on public dataset Mini LibriSpeech, demonstrates the effectiveness of the work, reducing Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to allow further research and reproducibility. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

Report number: TAAI2023-Domestic-131

arXiv:2403.13589 [pdf, other]

ReGround: Improving Textual and Spatial Grounding at No Cost

Authors: Phillip Y. Lee, Minhyuk Sung

Abstract: When an image generation process is guided by both a text prompt and spatial cues, such as a set of bounding boxes, do these elements work in harmony, or does one dominate the other? Our analysis of a pretrained image diffusion model that integrates gated self-attention into the U-Net reveals that spatial grounding often outweighs textual grounding due to the sequential flow from gated self-attent… ▽ More When an image generation process is guided by both a text prompt and spatial cues, such as a set of bounding boxes, do these elements work in harmony, or does one dominate the other? Our analysis of a pretrained image diffusion model that integrates gated self-attention into the U-Net reveals that spatial grounding often outweighs textual grounding due to the sequential flow from gated self-attention to cross-attention. We demonstrate that such bias can be significantly mitigated without sacrificing accuracy in either grounding by simply rewiring the network architecture, changing from sequential to parallel for gated self-attention and cross-attention. This surprisingly simple yet effective solution does not require any fine-tuning of the network but significantly reduces the trade-off between the two groundings. Our experiments demonstrate significant improvements from the original GLIGEN to the rewired version in the trade-off between textual grounding and spatial grounding. △ Less

Submitted 19 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to ECCV 2024. Project page: https://re-ground.github.io/

arXiv:2403.03230 [pdf, other]

Large language models surpass human experts in predicting neuroscience results

Authors: Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata , et al. (14 additional authors not shown)

Abstract: Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created Brain… ▽ More Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors. △ Less

Submitted 21 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.17963 [pdf, other]

The Design and Implementation of a High-Performance Log-Structured RAID System for ZNS SSDs

Authors: Jinhong Li, Qiuping Wang, Shujie Han, Patrick P. C. Lee

Abstract: Zoned Namespace (ZNS) defines a new abstraction for host software to flexibly manage storage in flash-based SSDs as append-only zones. It also provides a Zone Append primitive to further boost the write performance of ZNS SSDs by exploiting intra-zone parallelism. However, making Zone Append effective for reliable and scalable storage, in the form of a RAID array of multiple ZNS SSDs, is non-trivi… ▽ More Zoned Namespace (ZNS) defines a new abstraction for host software to flexibly manage storage in flash-based SSDs as append-only zones. It also provides a Zone Append primitive to further boost the write performance of ZNS SSDs by exploiting intra-zone parallelism. However, making Zone Append effective for reliable and scalable storage, in the form of a RAID array of multiple ZNS SSDs, is non-trivial since Zone Append offloads address management to ZNS SSDs and requires hosts to dedicatedly manage RAID stripes across multiple drives. We propose ZapRAID, a high-performance log-structured RAID system for ZNS SSDs by carefully exploiting Zone Append to achieve high write parallelism and lightweight stripe management. ZapRAID adopts a group-based data layout with a coarse-grained ordering across multiple groups of stripes, such that it can use small-size metadata for stripe management on a per-group basis under Zone Append. It further adopts hybrid data management to simultaneously achieve intra-zone and inter-zone parallelism through a careful combination of both Zone Append and Zone Write primitives. We evaluate ZapRAID using microbenchmarks, trace-driven experiments, and real-application experiments. Our evaluation results show that ZapRAID achieves high write throughput and maintains high performance in normal reads, degraded reads, crash recovery, and full-drive recovery. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 29 pages

ACM Class: C.4; C.5.0

arXiv:2402.06124 [pdf, other]

Teleoscope: Exploring Themes in Large Document Sets By Example

Authors: Paul Bucci, Leo Foord-Kelcey, Patrick Yung Kang Lee, Alamjeet Singh, Ivan Beschastnikh

Abstract: Qualitative thematic exploration of data by hand does not scale and researchers create and update a personalized point of view as they explore data. As a result, machine learning (ML) approaches that might help with exploration are challenging to apply. We developed Teleoscope, a web-based system that supports interactive exploration of large corpora (100K-1M) of short documents (1-3 paragraphs).… ▽ More Qualitative thematic exploration of data by hand does not scale and researchers create and update a personalized point of view as they explore data. As a result, machine learning (ML) approaches that might help with exploration are challenging to apply. We developed Teleoscope, a web-based system that supports interactive exploration of large corpora (100K-1M) of short documents (1-3 paragraphs). Teleoscope provides visual programming workflows that have semantic and computational meaning; helping researchers to retrace, share, and recompute their sense-making process. Attempting to create qualitative "themes" rather than "topics," our NLP approach tunes an ML model to "think like you" without significant retraining. Here, we present our two-year design process and validation of Teleoscope, including a multi-week study with qualitative researchers (N = 5), a six-month field deployment with a qualitative research group, and an on-going public release. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 28 pages, 9 figures, pre-print

arXiv:2401.14838 [pdf, other]

Multi-modality action recognition based on dual feature shift in vehicle cabin monitoring

Authors: Dan Lin, Philip Hann Yung Lee, Yiming Li, Ruoyu Wang, Kim-Hui Yap, Bingbing Li, You Shing Ngim

Abstract: Driver Action Recognition (DAR) is crucial in vehicle cabin monitoring systems. In real-world applications, it is common for vehicle cabins to be equipped with cameras featuring different modalities. However, multi-modality fusion strategies for the DAR task within car cabins have rarely been studied. In this paper, we propose a novel yet efficient multi-modality driver action recognition method b… ▽ More Driver Action Recognition (DAR) is crucial in vehicle cabin monitoring systems. In real-world applications, it is common for vehicle cabins to be equipped with cameras featuring different modalities. However, multi-modality fusion strategies for the DAR task within car cabins have rarely been studied. In this paper, we propose a novel yet efficient multi-modality driver action recognition method based on dual feature shift, named DFS. DFS first integrates complementary features across modalities by performing modality feature interaction. Meanwhile, DFS achieves the neighbour feature propagation within single modalities, by feature shifting among temporal frames. To learn common patterns and improve model efficiency, DFS shares feature extracting stages among multiple modalities. Extensive experiments have been carried out to verify the effectiveness of the proposed DFS model on the Drive\&Act dataset. The results demonstrate that DFS achieves good performance and improves the efficiency of multi-modality driver action recognition. △ Less

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14526 [pdf, other]

MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

Authors: Patrick Lee, Alain Chirino Trujillo, Diana Cuevas Plancarte, Olumide Ebenezer Ojo, Xinyi Liu, Iyanuoluwa Shode, Yuan Zhao, Jing Peng, Anna Feldman

Abstract: This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases… ▽ More This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data of other domains to further understand the nature of the cross-lingual transfer. △ Less

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.13643 [pdf, ps, other]

doi 10.1145/3613905.3638195

Design, Development, and Deployment of Context-Adaptive AI Systems for Enhanced End-User Adoption

Authors: Christine P Lee

Abstract: My research centers on the development of context-adaptive AI systems to improve end-user adoption through the integration of technical methods. I deploy these AI systems across various interaction modalities, including user interfaces and embodied agents like robots, to expand their practical applicability. My research unfolds in three key stages: design, development, and deployment. In the desig… ▽ More My research centers on the development of context-adaptive AI systems to improve end-user adoption through the integration of technical methods. I deploy these AI systems across various interaction modalities, including user interfaces and embodied agents like robots, to expand their practical applicability. My research unfolds in three key stages: design, development, and deployment. In the design phase, user-centered approaches were used to understand user experiences with AI systems and create design tools for user participation in crafting AI explanations. In the ongoing development stage, a safety-guaranteed AI system for a robot agent was created to automatically provide adaptive solutions and explanations for unforeseen scenarios. The next steps will involve the implementation and evaluation of context-adaptive AI systems in various interaction forms. I seek to prioritize human needs in technology development, creating AI systems that tangibly benefit end-users in real-world applications and enhance interaction experiences. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 5 pages

Journal ref: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), May 11--16, 2024, Honolulu, HI, USA

arXiv:2401.03217 [pdf, other]

doi 10.1145/3610977.3634966

Understanding Large-Language Model (LLM)-powered Human-Robot Interaction

Authors: Callie Y. Kim, Christine P. Lee, Bilge Mutlu

Abstract: Large-language models (LLMs) hold significant promise in improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests in various tasks and domains. Despite the potential to transform human-robot interaction, very little is known about the distinctive design requirements for utilizing LLMs in robots, which may differ from t… ▽ More Large-language models (LLMs) hold significant promise in improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests in various tasks and domains. Despite the potential to transform human-robot interaction, very little is known about the distinctive design requirements for utilizing LLMs in robots, which may differ from text and voice interaction and vary by task and context. To better understand these requirements, we conducted a user study (n = 32) comparing an LLM-powered social robot against text- and voice-based agents, analyzing task-based requirements in conversational tasks, including choose, generate, execute, and negotiate. Our findings show that LLM-powered robots elevate expectations for sophisticated non-verbal cues and excel in connection-building and deliberation, but fall short in logical communication and may induce anxiety. We provide design implications both for robots integrating LLMs and for fine-tuning LLMs for use with robots. △ Less

Submitted 6 January, 2024; originally announced January 2024.

Comments: 10 pages, 4 figures. Callie Y. Kim and Christine P. Lee contributed equally to the work. To be published in Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24), March 11--14, 2024, Boulder, CO, USA

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.00083 [pdf, other]

BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos

Authors: Pilhyeon Lee, Hyeran Byun

Abstract: Temporal sentence grounding aims to localize moments relevant to a language description. Recently, DETR-like approaches achieved notable progress by predicting the center and length of a target moment. However, they suffer from the issue of center misalignment raised by the inherent ambiguity of moment centers, leading to inaccurate predictions. To remedy this problem, we propose a novel boundary-… ▽ More Temporal sentence grounding aims to localize moments relevant to a language description. Recently, DETR-like approaches achieved notable progress by predicting the center and length of a target moment. However, they suffer from the issue of center misalignment raised by the inherent ambiguity of moment centers, leading to inaccurate predictions. To remedy this problem, we propose a novel boundary-oriented moment formulation. In our paradigm, the model no longer needs to find the precise center but instead suffices to predict any anchor point within the interval, from which the boundaries are directly estimated. Based on this idea, we design a boundary-aligned moment detection transformer, equipped with a dual-pathway decoding process. Specifically, it refines the anchor and boundaries within parallel pathways using global and boundary-focused attention, respectively. This separate design allows the model to focus on desirable regions, enabling precise refinement of moment predictions. Further, we propose a quality-based ranking method, ensuring that proposals with high localization qualities are prioritized over incomplete ones. Experiments on three benchmarks validate the effectiveness of the proposed methods. The code is available at https://github.com/Pilhyeon/BAM-DETR. △ Less

Submitted 18 July, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: Accepted by ECCV 2024

arXiv:2311.13319 [pdf, other]

Deep Learning for Vascular Segmentation and Applications in Phase Contrast Tomography Imaging

Authors: Ekin Yagis, Shahab Aslani, Yashvardhan Jain, Yang Zhou, Shahrokh Rahmani, Joseph Brunet, Alexandre Bellier, Christopher Werlein, Maximilian Ackermann, Danny Jonigk, Paul Tafforeau, Peter D Lee, Claire Walsh

Abstract: Automated blood vessel segmentation is vital for biomedical imaging, as vessel changes indicate many pathologies. Still, precise segmentation is difficult due to the complexity of vascular structures, anatomical variations across patients, the scarcity of annotated public datasets, and the quality of images. We present a thorough literature review, highlighting the state of machine learning techni… ▽ More Automated blood vessel segmentation is vital for biomedical imaging, as vessel changes indicate many pathologies. Still, precise segmentation is difficult due to the complexity of vascular structures, anatomical variations across patients, the scarcity of annotated public datasets, and the quality of images. We present a thorough literature review, highlighting the state of machine learning techniques across diverse organs. Our goal is to provide a foundation on the topic and identify a robust baseline model for application to vascular segmentation in a new imaging modality, Hierarchical Phase Contrast Tomography (HiP CT). Introduced in 2020 at the European Synchrotron Radiation Facility, HiP CT enables 3D imaging of complete organs at an unprecedented resolution of ca. 20mm per voxel, with the capability for localized zooms in selected regions down to 1mm per voxel without sectioning. We have created a training dataset with double annotator validated vascular data from three kidneys imaged with HiP CT in the context of the Human Organ Atlas Project. Finally, utilising the nnU Net model, we conduct experiments to assess the models performance on both familiar and unseen samples, employing vessel specific metrics. Our results show that while segmentations yielded reasonably high scores such as clDice values ranging from 0.82 to 0.88, certain errors persisted. Large vessels that collapsed due to the lack of hydrostatic pressure (HiP CT is an ex vivo technique) were segmented poorly. Moreover, decreased connectivity in finer vessels and higher segmentation errors at vessel boundaries were observed. Such errors obstruct the understanding of the structures by interrupting vascular tree connectivity. Through our review and outputs, we aim to set a benchmark for subsequent model evaluations using various modalities, especially with the HiP CT imaging database. △ Less

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.00993 [pdf, other]

Scalable Probabilistic Forecasting in Retail with Gradient Boosted Trees: A Practitioner's Approach

Authors: Xueying Long, Quang Bui, Grady Oktavian, Daniel F. Schmidt, Christoph Bergmeir, Rakshitha Godahewa, Seong Per Lee, Kaifeng Zhao, Paul Condylis

Abstract: The recent M5 competition has advanced the state-of-the-art in retail forecasting. However, we notice important differences between the competition challenge and the challenges we face in a large e-commerce company. The datasets in our scenario are larger (hundreds of thousands of time series), and e-commerce can afford to have a larger assortment than brick-and-mortar retailers, leading to more i… ▽ More The recent M5 competition has advanced the state-of-the-art in retail forecasting. However, we notice important differences between the competition challenge and the challenges we face in a large e-commerce company. The datasets in our scenario are larger (hundreds of thousands of time series), and e-commerce can afford to have a larger assortment than brick-and-mortar retailers, leading to more intermittent data. To scale to larger dataset sizes with feasible computational effort, firstly, we investigate a two-layer hierarchy and propose a top-down approach to forecasting at an aggregated level with less amount of series and intermittency, and then disaggregating to obtain the decision-level forecasts. Probabilistic forecasts are generated under distributional assumptions. Secondly, direct training at the lower level with subsamples can also be an alternative way of scaling. Performance of modelling with subsets is evaluated with the main dataset. Apart from a proprietary dataset, the proposed scalable methods are evaluated using the Favorita dataset and the M5 dataset. We are able to show the differences in characteristics of the e-commerce and brick-and-mortar retail datasets. Notably, our top-down forecasting framework enters the top 50 of the original M5 competition, even with models trained at a higher level under a much simpler setting. △ Less

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2308.16585 [pdf]

doi 10.1016/S2589-7500(23)00135-8

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study

Authors: Patrick Saux, Pierre Bauvin, Violeta Raverdy, Julien Teigny, Hélène Verkindt, Tomy Soumphonphakdy, Maxence Debert, Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen , et al. (9 additional authors not shown)

Abstract: Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participa… ▽ More Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5 year followup after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performances of the model were assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75$\bullet$3%) were female, 2530 (24$\bullet$7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2$\bullet$8 kg/m${}^2$ (95% CI 2$\bullet$6-3$\bullet$0) and mean RMSE BMI was 4$\bullet$7 kg/m${}^2$ (4$\bullet$4-5$\bullet$0), and the mean difference between predicted and observed BMI was-0$\bullet$3 kg/m${}^2$ (SD 4$\bullet$7). This model is incorporated in an easy to use and interpretable web-based prediction tool to help inform clinical decision before surgery. InterpretationWe developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: The Lancet Digital Health, 2023

arXiv:2308.10554 [pdf, other]

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

Authors: Seogkyu Jeon, Bei Liu, Pilhyeon Lee, Kibeom Hong, Jianlong Fu, Hyeran Byun

Abstract: Training deep generative models usually requires a large amount of data. To alleviate the data collection cost, the task of zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain without any further training samples. Due to the data absence, the textual description of the target domain and the vision-language models, e.g., CLIP, are utilized… ▽ More Training deep generative models usually requires a large amount of data. To alleviate the data collection cost, the task of zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain without any further training samples. Due to the data absence, the textual description of the target domain and the vision-language models, e.g., CLIP, are utilized to effectively guide the generator. However, with only a single representative text feature instead of real images, the synthesized images gradually lose diversity as the model is optimized, which is also known as mode collapse. To tackle the problem, we propose a novel method to find semantic variations of the target text in the CLIP space. Specifically, we explore diverse semantic variations based on the informative text feature of the target domain while regularizing the uncontrolled deviation of the semantic information. With the obtained variations, we design a novel directional moment loss that matches the first and second moments of image and text direction distributions. Moreover, we introduce elastic weight consolidation and a relation consistency loss to effectively preserve valuable content information from the source domain, e.g., appearances. Through extensive experiments, we demonstrate the efficacy of the proposed methods in ensuring sample diversity in various scenarios of zero-shot GAN adaptation. We also conduct ablation studies to validate the effect of each proposed component. Notably, our model achieves a new state-of-the-art on zero-shot GAN adaptation in terms of both diversity and quality. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023 (poster)

arXiv:2308.03417 [pdf, other]

PURL: Safe and Effective Sanitization of Link Decoration

Authors: Shaoor Munir, Patrick Lee, Umar Iqbal, Zubair Shafiq, Sandra Siby

Abstract: While privacy-focused browsers have taken steps to block third-party cookies and mitigate browser fingerprinting, novel tracking techniques that can bypass existing countermeasures continue to emerge. Since trackers need to share information from the client-side to the server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect… ▽ More While privacy-focused browsers have taken steps to block third-party cookies and mitigate browser fingerprinting, novel tracking techniques that can bypass existing countermeasures continue to emerge. Since trackers need to share information from the client-side to the server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect and sanitize tracking information in decorated links. To this end, we present PURL (pronounced purel-l), a machine-learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration. Our evaluation shows that PURL significantly outperforms existing countermeasures in terms of accuracy and reducing website breakage while being robust to common evasion techniques. PURL's deployment on a sample of top-million websites shows that link decoration is abused for tracking on nearly three-quarters of the websites, often to share cookies, email addresses, and fingerprinting information. △ Less

Submitted 6 March, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.01887 [pdf, other]

Athena 2.0: Discourse and User Modeling in Open Domain Dialogue

Authors: Omkar Patil, Lena Reed, Kevin K. Bowden, Juraj Juraska, Wen Cui, Vrindavan Harrison, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Abstract: Conversational agents are consistently growing in popularity and many people interact with them every day. While many conversational agents act as personal assistants, they can have many different goals. Some are task-oriented, such as providing customer support for a bank or making a reservation. Others are designed to be empathetic and to form emotional connections with the user. The Alexa Prize… ▽ More Conversational agents are consistently growing in popularity and many people interact with them every day. While many conversational agents act as personal assistants, they can have many different goals. Some are task-oriented, such as providing customer support for a bank or making a reservation. Others are designed to be empathetic and to form emotional connections with the user. The Alexa Prize Challenge aims to create a socialbot, which allows the user to engage in coherent conversations, on a range of popular topics that will interest the user. Here we describe Athena 2.0, UCSC's conversational agent for Amazon's Socialbot Grand Challenge 4. Athena 2.0 utilizes a novel knowledge-grounded discourse model that tracks the entity links that Athena introduces into the dialogue, and uses them to constrain named-entity recognition and linking, and coreference resolution. Athena 2.0 also relies on a user model to personalize topic selection and other aspects of the conversation to individual users. △ Less

Submitted 3 August, 2023; originally announced August 2023.

Comments: Alexa Prize Proceedings, 2021. Socialbot Grand Challenge 4

arXiv:2307.09724 [pdf, other]

AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks

Authors: Kibeom Hong, Seogkyu Jeon, Junsoo Lee, Namhyuk Ahn, Kunhee Kim, Pilhyeon Lee, Daesik Kim, Youngjung Uh, Hyeran Byun

Abstract: To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resul… ▽ More To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resulting in disharmonious and evident repetitive artifacts. To overcome this limitation and accomplish impeccable artistic style transfer, we focus on enhancing the attention mechanism and capturing the rhythm of patterns that organize the style. In this paper, we introduce a novel metric, namely pattern repeatability, that quantifies the repetition of patterns in the style image. Based on the pattern repeatability, we propose Aesthetic Pattern-Aware style transfer Networks (AesPA-Net) that discover the sweet spot of local and global style expressions. In addition, we propose a novel self-supervisory task to encourage the attention mechanism to learn precise and meaningful semantic correspondence. Lastly, we introduce the patch-wise style loss to transfer the elaborate rhythm of local patterns. Through qualitative and quantitative evaluations, we verify the reliability of the proposed pattern repeatability that aligns with human perception, and demonstrate the superiority of the proposed framework. △ Less

Submitted 8 August, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023. Code is available at this https://github.com/Kibeom-Hong/AesPA-Net

arXiv:2306.09626 [pdf, other]

PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition

Authors: Jia Le Ngwe, Kian Ming Lim, Chin Poo Lee, Thian Song Ong

Abstract: Facial Expression Recognition (FER) is a machine learning problem that deals with recognizing human facial expressions. While existing work has achieved performance improvements in recent years, FER in the wild and under challenging conditions remains a challenge. In this paper, a lightweight patch and attention network based on MobileNetV1, referred to as PAtt-Lite, is proposed to improve FER per… ▽ More Facial Expression Recognition (FER) is a machine learning problem that deals with recognizing human facial expressions. While existing work has achieved performance improvements in recent years, FER in the wild and under challenging conditions remains a challenge. In this paper, a lightweight patch and attention network based on MobileNetV1, referred to as PAtt-Lite, is proposed to improve FER performance under challenging conditions. A truncated ImageNet-pre-trained MobileNetV1 is utilized as the backbone feature extractor of the proposed method. In place of the truncated layers is a patch extraction block that is proposed for extracting significant local facial features to enhance the representation from MobileNetV1, especially under challenging conditions. An attention classifier is also proposed to improve the learning of these patched feature maps from the extremely lightweight feature extractor. The experimental results on public benchmark databases proved the effectiveness of the proposed method. PAtt-Lite achieved state-of-the-art results on CK+, RAF-DB, FER2013, FERPlus, and the challenging conditions subsets for RAF-DB and FERPlus. The source code for the proposed method will be available at https://github.com/JLREx/PAtt-Lite. △ Less

Submitted 16 June, 2023; originally announced June 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2306.00217 [pdf, other]

FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms

Authors: Patrick Lee, Iyanuoluwa Shode, Alain Chirino Trujillo, Yuan Zhao, Olumide Ebenezer Ojo, Diana Cuevas Plancarte, Anna Feldman, Jing Peng

Abstract: Transformers have been shown to work well for the task of English euphemism disambiguation, in which a potentially euphemistic term (PET) is classified as euphemistic or non-euphemistic in a particular context. In this study, we expand on the task in two ways. First, we annotate PETs for vagueness, a linguistic property associated with euphemisms, and find that transformers are generally better at… ▽ More Transformers have been shown to work well for the task of English euphemism disambiguation, in which a potentially euphemistic term (PET) is classified as euphemistic or non-euphemistic in a particular context. In this study, we expand on the task in two ways. First, we annotate PETs for vagueness, a linguistic property associated with euphemisms, and find that transformers are generally better at classifying vague PETs, suggesting linguistic differences in the data that impact performance. Second, we present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese. We perform euphemism disambiguation experiments in each language using multilingual transformer models mBERT and XLM-RoBERTa, establishing preliminary results from which to launch future work. △ Less

Submitted 6 June, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2304.13678 [pdf, other]

A marker-less human motion analysis system for motion-based biomarker discovery in knee disorders

Authors: Kai Armstrong, Lei Zhang, Yan Wen, Alexander P. Willmott, Paul Lee, Xujioing Ye

Abstract: In recent years the NHS has been having increased difficulty seeing all low-risk patients, this includes but not limited to suspected osteoarthritis (OA) patients. To help address the increased waiting lists and shortages of staff, we propose a novel method of automated biomarker identification for diagnosis of knee disorders and the monitoring of treatment progression. The proposed method allows… ▽ More In recent years the NHS has been having increased difficulty seeing all low-risk patients, this includes but not limited to suspected osteoarthritis (OA) patients. To help address the increased waiting lists and shortages of staff, we propose a novel method of automated biomarker identification for diagnosis of knee disorders and the monitoring of treatment progression. The proposed method allows for the measurement and analysis of biomechanics and analyse their clinical significance, in both a cheap and sensitive alternative to the currently available commercial alternatives. These methods and results validate the capabilities of standard RGB cameras in clinical environments to capture motion and show that when compared to alternatives such as depth cameras there is a comparable accuracy in the clinical environment. Biomarker identification using Principal Component Analysis (PCA) allows the reduction of the dimensionality to produce the most representative features from motion data, these new biomarkers can then be used to assess the success of treatment and track the progress of rehabilitation. This was validated by applying these techniques on a case study utilising the exploratory use of local anaesthetic applied on knee pain, this allows these new representative biomarkers to be validated as statistically significant (p-value < 0.05). △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: 11 pages, 5 figures

arXiv:2303.17285 [pdf, other]

Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection

Authors: Pilhyeon Lee, Taeoh Kim, Minho Shim, Dongyoon Wee, Hyeran Byun

Abstract: Temporal action detection aims to predict the time intervals and the classes of action instances in the video. Despite the promising performance, existing two-stream models exhibit slow inference speed due to their reliance on computationally expensive optical flow. In this paper, we introduce a decomposed cross-modal distillation framework to build a strong RGB-based detector by transferring know… ▽ More Temporal action detection aims to predict the time intervals and the classes of action instances in the video. Despite the promising performance, existing two-stream models exhibit slow inference speed due to their reliance on computationally expensive optical flow. In this paper, we introduce a decomposed cross-modal distillation framework to build a strong RGB-based detector by transferring knowledge of the motion modality. Specifically, instead of direct distillation, we propose to separately learn RGB and motion representations, which are in turn combined to perform action localization. The dual-branch design and the asymmetric training objectives enable effective motion knowledge transfer while preserving RGB information intact. In addition, we introduce a local attentive fusion to better exploit the multimodal complementarity. It is designed to preserve the local discriminability of the features that is important for action localization. Extensive experiments on the benchmarks verify the effectiveness of the proposed method in enhancing RGB-based action detectors. Notably, our framework is agnostic to backbones and detection heads, bringing consistent gains across different model combinations. △ Less

Submitted 30 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023

arXiv:2303.12712 [pdf, other]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Authors: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an earl… ▽ More Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions. △ Less

Submitted 13 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2301.08448 [pdf, other]

Source-free Subject Adaptation for EEG-based Visual Recognition

Authors: Pilhyeon Lee, Seogkyu Jeon, Sunhee Hwang, Minjung Shin, Hyeran Byun

Abstract: This paper focuses on subject adaptation for EEG-based visual recognition. It aims at building a visual stimuli recognition system customized for the target subject whose EEG samples are limited, by transferring knowledge from abundant data of source subjects. Existing approaches consider the scenario that samples of source subjects are accessible during training. However, it is often infeasible a… ▽ More This paper focuses on subject adaptation for EEG-based visual recognition. It aims at building a visual stimuli recognition system customized for the target subject whose EEG samples are limited, by transferring knowledge from abundant data of source subjects. Existing approaches consider the scenario that samples of source subjects are accessible during training. However, it is often infeasible and problematic to access personal biological data like EEG signals due to privacy issues. In this paper, we introduce a novel and practical problem setup, namely source-free subject adaptation, where the source subject data are unavailable and only the pre-trained model parameters are provided for subject adaptation. To tackle this challenging problem, we propose classifier-based data generation to simulate EEG samples from source subjects using classifier responses. Using the generated samples and target subject data, we perform subject-independent feature learning to exploit the common knowledge shared across different subjects. Notably, our framework is generalizable and can adopt any subject-independent learning method. In the experiments on the EEG-ImageNet40 benchmark, our model brings consistent improvements regardless of the choice of subject-independent learning. Also, our method shows promising performance, recording top-1 test accuracy of 74.6% under the 5-shot setting even without relying on source data. Our code can be found at https://github.com/DeepBCI/Deep-BCI/tree/master/1_Intelligent_BCI/Source_Free_Subject_Adaptation_for_EEG. △ Less

Submitted 20 January, 2023; originally announced January 2023.

Comments: Accepted by the 11th IEEE International Winter Conference on Brain-Computer Interface (BCI 2023). Code is available at https://github.com/DeepBCI/Deep-BCI

arXiv:2211.13327 [pdf, other]

A Report on the Euphemisms Detection Shared Task

Authors: Patrick Lee, Anna Feldman, Jing Peng

Abstract: This paper presents The Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022) held in conjunction with EMNLP 2022. Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism. The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected fro… ▽ More This paper presents The Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022) held in conjunction with EMNLP 2022. Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism. The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus (Davies and Fuchs, 2015), and are human-annotated as containing either a euphemistic or literal usage of a PET. In this paper, we present the results and analyze the common themes, methods and findings of the participating teams △ Less

Submitted 3 December, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.13576 [pdf, ps, other]

Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation

Authors: Evonne P. C. Lee, Guangzhi Sun, Chao Zhang, Philip C. Woodland

Abstract: In speaker diarisation, speaker embedding extraction models often suffer from the mismatch between their training loss functions and the speaker clustering method. In this paper, we propose the method of spectral clustering-aware learning of embeddings (SCALE) to address the mismatch. Specifically, besides an angular prototype cal (AP) loss, SCALE uses a novel affinity matrix loss which directly m… ▽ More In speaker diarisation, speaker embedding extraction models often suffer from the mismatch between their training loss functions and the speaker clustering method. In this paper, we propose the method of spectral clustering-aware learning of embeddings (SCALE) to address the mismatch. Specifically, besides an angular prototype cal (AP) loss, SCALE uses a novel affinity matrix loss which directly minimises the error between the affinity matrix estimated from speaker embeddings and the reference. SCALE also includes p-percentile thresholding and Gaussian blur as two important hyper-parameters for spectral clustering in training. Experiments on the AMI dataset showed that speaker embeddings obtained with SCALE achieved over 50% relative speaker error rate reductions using oracle segmentation, and over 30% relative diarisation error rate reductions using automatic segmentation when compared to a strong baseline with the AP-loss-based speaker embeddings. △ Less

Submitted 14 March, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: To appear in ICASSP 2023, 5 pages

arXiv:2208.04286 [pdf]

doi 10.1016/j.patcog.2022.108953

Exploiting Shape Cues for Weakly Supervised Semantic Segmentation

Authors: Sungpil Kho, Pilhyeon Lee, Wonyoung Lee, Minsong Ki, Hyeran Byun

Abstract: Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training. To this end, previous methods adopt the common pipeline: they generate pseudo masks from class activation maps (CAMs) and use such masks to supervise segmentation networks. However, it is challenging to derive comprehensive pseudo masks that cover the whole extent… ▽ More Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training. To this end, previous methods adopt the common pipeline: they generate pseudo masks from class activation maps (CAMs) and use such masks to supervise segmentation networks. However, it is challenging to derive comprehensive pseudo masks that cover the whole extent of objects due to the local property of CAMs, i.e., they tend to focus solely on small discriminative object parts. In this paper, we associate the locality of CAMs with the texture-biased property of convolutional neural networks (CNNs). Accordingly, we propose to exploit shape information to supplement the texture-biased CNN features, thereby encouraging mask predictions to be not only comprehensive but also well-aligned with object boundaries. We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities, in order to generate reliable pseudo masks to supervise the model. Importantly, our model is end-to-end trained within a single-stage framework and therefore efficient in terms of the training cost. Through extensive experiments on PASCAL VOC 2012, we validate the effectiveness of our method in producing precise and shape-aligned segmentation results. Specifically, our model surpasses the existing state-of-the-art single-stage approaches by large margins. What is more, it also achieves a new state-of-the-art performance over multi-stage approaches, when adopted in a simple two-stage pipeline without bells and whistles. △ Less

Submitted 8 August, 2022; originally announced August 2022.

Comments: Accepted by Pattern Recognition. The first two authors contributed equally

Journal ref: Pattern Recognition 132 (2022): 108953

arXiv:2208.00301 [pdf, other]

doi 10.1145/3517428.3550398

Towards Visualization of Time-Series Ecological Momentary Assessment (EMA) Data on Standalone Voice-First Virtual Assistants

Authors: Yichen Han, Christopher Bo Han, Chen Chen, Peng Wei Lee, Michael Hogarth, Alison A. Moore, Nadir Weibel, Emilia Farcas

Abstract: Population aging is an increasingly important consideration for health care in the 21th century, and continuing to have access and interact with digital health information is a key challenge for aging populations. Voice-based Intelligent Virtual Assistants (IVAs) are promising to improve the Quality of Life (QoL) of older adults, and coupled with Ecological Momentary Assessments (EMA) they can be… ▽ More Population aging is an increasingly important consideration for health care in the 21th century, and continuing to have access and interact with digital health information is a key challenge for aging populations. Voice-based Intelligent Virtual Assistants (IVAs) are promising to improve the Quality of Life (QoL) of older adults, and coupled with Ecological Momentary Assessments (EMA) they can be effective to collect important health information from older adults, especially when it comes to repeated time-based events. However, this same EMA data is hard to access for the older adult: although the newest IVAs are equipped with a display, the effectiveness of visualizing time-series based EMA data on standalone IVAs has not been explored. To investigate the potential opportunities for visualizing time-series based EMA data on standalone IVAs, we designed a prototype system, where older adults are able to query and examine the time-series EMA data on Amazon Echo Show - a widely used commercially available standalone screen-based IVA. We conducted a preliminary semi-structured interview with a geriatrician and an older adult, and identified three findings that should be carefully considered when designing such visualizations. △ Less

Submitted 30 July, 2022; originally announced August 2022.

Comments: 4 pages, The 24th International ACM SIGACCESS Conference on Computers and Accessibility

ACM Class: K.4.2; K.6.m; J.3

arXiv:2207.09613 [pdf]

doi 10.1016/j.eswa.2022.117697

Exploiting Domain Transferability for Collaborative Inter-level Domain Adaptive Object Detection

Authors: Mirae Do, Seogkyu Jeon, Pilhyeon Lee, Kibeom Hong, Yu-seung Ma, Hyeran Byun

Abstract: Domain adaptation for object detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations. To tackle the problem, previous works focus on aligning features extracted from partial levels (e.g., image-level, instance-level, RPN-level) in a two-stage detector via adversarial training. However, individual levels in the object detection… ▽ More Domain adaptation for object detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations. To tackle the problem, previous works focus on aligning features extracted from partial levels (e.g., image-level, instance-level, RPN-level) in a two-stage detector via adversarial training. However, individual levels in the object detection pipeline are closely related to each other and this inter-level relation is unconsidered yet. To this end, we introduce a novel framework for DAOD with three proposed components: Multi-scale-aware Uncertainty Attention (MUA), Transferable Region Proposal Network (TRPN), and Dynamic Instance Sampling (DIS). With these modules, we seek to reduce the negative transfer effect during training while maximizing transferability as well as discriminability in both domains. Finally, our framework implicitly learns domain invariant regions for object detection via exploiting the transferable information and enhances the complementarity between different detection levels by collaboratively utilizing their domain information. Through ablation studies and experiments, we show that the proposed modules contribute to the performance improvement in a synergic way, demonstrating the effectiveness of our method. Moreover, our model achieves a new state-of-the-art performance on various benchmarks. △ Less

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted to Expert Systems with Applications. The first three authors contributed equally

Journal ref: Expert Systems with Applications 205 (2022): 117697

arXiv:2207.05138 [pdf, other]

Towards Personalized Healthcare in Cardiac Population: The Development of a Wearable ECG Monitoring System, an ECG Lossy Compression Schema, and a ResNet-Based AF Detector

Authors: Wei-Ying Yi, Peng-Fei Liu, Sheung-Lai Lo, Ya-Fen Chan, Yu Zhou, Yee Leung, Kam-Sang Woo, Alex Pui-Wai Lee, Jia-Min Chen, Kwong-Sak Leung

Abstract: Cardiovascular diseases (CVDs) are the number one cause of death worldwide. While there is growing evidence that the atrial fibrillation (AF) has strong associations with various CVDs, this heart arrhythmia is usually diagnosed using electrocardiography (ECG) which is a risk-free, non-intrusive, and cost-efficient tool. Continuously and remotely monitoring the subjects' ECG information unlocks the… ▽ More Cardiovascular diseases (CVDs) are the number one cause of death worldwide. While there is growing evidence that the atrial fibrillation (AF) has strong associations with various CVDs, this heart arrhythmia is usually diagnosed using electrocardiography (ECG) which is a risk-free, non-intrusive, and cost-efficient tool. Continuously and remotely monitoring the subjects' ECG information unlocks the potentials of prompt pre-diagnosis and timely pre-treatment of AF before the development of any life-threatening conditions/diseases. Ultimately, the CVDs associated mortality could be reduced. In this manuscript, the design and implementation of a personalized healthcare system embodying a wearable ECG device, a mobile application, and a back-end server are presented. This system continuously monitors the users' ECG information to provide personalized health warnings/feedbacks. The users are able to communicate with their paired health advisors through this system for remote diagnoses, interventions, etc. The implemented wearable ECG devices have been evaluated and showed excellent intra-consistency (CVRMS=5.5%), acceptable inter-consistency (CVRMS=12.1%), and negligible RR-interval errors (ARE<1.4%). To boost the battery life of the wearable devices, a lossy compression schema utilizing the quasi-periodic feature of ECG signals to achieve compression was proposed. Compared to the recognized schemata, it outperformed the others in terms of compression efficiency and distortion, and achieved at least 2x of CR at a certain PRD or RMSE for ECG signals from the MIT-BIH database. To enable automated AF diagnosis/screening in the proposed system, a ResNet-based AF detector was developed. For the ECG records from the 2017 PhysioNet CinC challenge, this AF detector obtained an average testing F1=85.10% and a best testing F1=87.31%, outperforming the state-of-the-art. △ Less

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.12980 [pdf]

Detecting Schizophrenia with 3D Structural Brain MRI Using Deep Learning

Authors: Junhao Zhang, Vishwanatha M. Rao, Ye Tian, Yanting Yang, Nicolas Acosta, Zihan Wan, Pin-Yu Lee, Chloe Zhang, Lawrence S. Kegeles, Scott A. Small, Jia Guo

Abstract: Schizophrenia is a chronic neuropsychiatric disorder that causes distinct structural alterations within the brain. We hypothesize that deep learning applied to a structural neuroimaging dataset could detect disease-related alteration and improve classification and diagnostic accuracy. We tested this hypothesis using a single, widely available, and conventional T1-weighted MRI scan, from which we e… ▽ More Schizophrenia is a chronic neuropsychiatric disorder that causes distinct structural alterations within the brain. We hypothesize that deep learning applied to a structural neuroimaging dataset could detect disease-related alteration and improve classification and diagnostic accuracy. We tested this hypothesis using a single, widely available, and conventional T1-weighted MRI scan, from which we extracted the 3D whole-brain structure using standard post-processing methods. A deep learning model was then developed, optimized, and evaluated on three open datasets with T1-weighted MRI scans of patients with schizophrenia. Our proposed model outperformed the benchmark model, which was also trained with structural MR images using a 3D CNN architecture. Our model is capable of almost perfectly (area under the ROC curve = 0.987) distinguishing schizophrenia patients from healthy controls on unseen structural MRI scans. Regional analysis localized subcortical regions and ventricles as the most predictive brain regions. Subcortical structures serve a pivotal role in cognitive, affective, and social functions in humans, and structural abnormalities of these regions have been associated with schizophrenia. Our finding corroborates that schizophrenia is associated with widespread alterations in subcortical brain structure and the subcortical structural information provides prominent features in diagnostic classification. Together, these results further demonstrate the potential of deep learning to improve schizophrenia diagnosis and identify its structural neuroimaging signatures from a single, standard T1-weighted brain MRI. △ Less

Submitted 7 July, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

Comments: 13 pages, 6 figures

arXiv:2205.14555 [pdf, other]

Two New Piggybacking Designs with Lower Repair Bandwidth

Authors: Zhengyi Jiang, Hanxu Hou, Yunghsiang S. Han, Patrick P. C. Lee, Bo Bai, Zhongyi Huang

Abstract: Piggybacking codes are a special class of MDS array codes that can achieve small repair bandwidth with small sub-packetization by first creating some instances of an $(n,k)$ MDS code, such as a Reed-Solomon (RS) code, and then designing the piggyback function. In this paper, we propose a new piggybacking coding design which designs the piggyback function over some instances of both $(n,k)$ MDS cod… ▽ More Piggybacking codes are a special class of MDS array codes that can achieve small repair bandwidth with small sub-packetization by first creating some instances of an $(n,k)$ MDS code, such as a Reed-Solomon (RS) code, and then designing the piggyback function. In this paper, we propose a new piggybacking coding design which designs the piggyback function over some instances of both $(n,k)$ MDS code and $(n,k')$ MDS code, when $k\geq k'$. We show that our new piggybacking design can significantly reduce the repair bandwidth for single-node failures. When $k=k'$, we design piggybacking code that is MDS code and we show that the designed code has lower repair bandwidth for single-node failures than all existing piggybacking codes when the number of parity node $r=n-k\geq8$ and the sub-packetization $α<r$. Moreover, we propose another piggybacking codes by designing $n$ piggyback functions of some instances of $(n,k)$ MDS code and adding the $n$ piggyback functions into the $n$ newly created empty entries with no data symbols. We show that our code can significantly reduce repair bandwidth for single-node failures at a cost of slightly more storage overhead. In addition, we show that our code can recover any $r+1$ node failures for some parameters. We also show that our code has lower repair bandwidth than locally repairable codes (LRCs) under the same fault-tolerance and redundancy for some parameters. △ Less

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.11753 [pdf, other]

Efficient LSM-Tree Key-Value Data Management on Hybrid SSD/HDD Zoned Storage

Authors: Jinhong Li, Qiuping Wang, Patrick P. C. Lee

Abstract: Zoned storage devices, such as zoned namespace (ZNS) solid-state drives (SSDs) and host-managed shingled magnetic recording (HM-SMR) hard-disk drives (HDDs), expose interfaces for host-level applications to support fine-grained, high-performance storage management. Combining ZNS SSDs and HM-SMR HDDs into a unified hybrid storage system is a natural direction to scale zoned storage at low cost, yet… ▽ More Zoned storage devices, such as zoned namespace (ZNS) solid-state drives (SSDs) and host-managed shingled magnetic recording (HM-SMR) hard-disk drives (HDDs), expose interfaces for host-level applications to support fine-grained, high-performance storage management. Combining ZNS SSDs and HM-SMR HDDs into a unified hybrid storage system is a natural direction to scale zoned storage at low cost, yet how to effectively incorporate zoned storage awareness into hybrid storage is a non-trivial issue. We make a case for key-value (KV) stores based on log-structured merge trees (LSM-trees) as host-level applications, and present HHZS, a middleware system that bridges an LSM-tree KV store with hybrid zoned storage devices based on hints. HHZS leverages hints issued by the flushing, compaction, and caching operations of the LSM-tree KV store to manage KV objects in placement, migration, and caching in hybrid ZNS SSD and HM-SMR HDD zoned storage. Experiments show that our HHZS prototype, when running on real ZNS SSD and HM-SMR HDD devices, achieves the highest throughput compared with all baselines under various settings. △ Less

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.10451 [pdf, other]

Searching for PETs: Using Distributional and Sentiment-Based Methods to Find Potentially Euphemistic Terms

Authors: Patrick Lee, Martha Gavidia, Anna Feldman, Jing Peng

Abstract: This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs. Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence and rank them using a set of simple sentiment-based metrics. We present the results of… ▽ More This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs. Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence and rank them using a set of simple sentiment-based metrics. We present the results of our approach tested on a corpus of sentences containing euphemisms, demonstrating its efficacy for detecting single and multi-word PETs from a broad range of topics. We also discuss future potential for sentiment-based methods on this task. △ Less

Submitted 20 May, 2022; originally announced May 2022.

Journal ref: Proceedings of UnImplicit: The Second Workshop on Understanding Implicit and Underspecified Language, NAACL 2022, Seattle

arXiv:2205.02728 [pdf, other]

CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms

Authors: Martha Gavidia, Patrick Lee, Anna Feldman, Jing Peng

Abstract: Euphemisms have not received much attention in natural language processing, despite being an important element of polite and figurative language. Euphemisms prove to be a difficult topic, not only because they are subject to language change, but also because humans may not agree on what is a euphemism and what is not. Nevertheless, the first step to tackling the issue is to collect and analyze exa… ▽ More Euphemisms have not received much attention in natural language processing, despite being an important element of polite and figurative language. Euphemisms prove to be a difficult topic, not only because they are subject to language change, but also because humans may not agree on what is a euphemism and what is not. Nevertheless, the first step to tackling the issue is to collect and analyze examples of euphemisms. We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus. Additionally, we present a subcorpus of texts where these PETs are not being used euphemistically, which may be useful for future applications. We also discuss the results of multiple analyses run on the corpus. Firstly, we find that sentiment analysis on the euphemistic texts supports that PETs generally decrease negative and offensive sentiment. Secondly, we observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not in a subset of our corpus text examples. We attribute the disagreement to a variety of potential reasons, including if the PET was a commonly accepted term (CAT). △ Less

Submitted 5 May, 2022; originally announced May 2022.

Comments: Proceedings of LREC 2022

arXiv:2203.16209 [pdf, other]

Fair Contrastive Learning for Facial Attribute Classification

Authors: Sungho Park, Jewook Lee, Pilhyeon Lee, Sunhee Hwang, Dohyung Kim, Hyeran Byun

Abstract: Learning visual representation of high quality is essential for image classification. Recently, a series of contrastive representation learning methods have achieved preeminent success. Particularly, SupCon outperformed the dominant methods based on cross-entropy loss in representation learning. However, we notice that there could be potential ethical risks in supervised contrastive learning. In t… ▽ More Learning visual representation of high quality is essential for image classification. Recently, a series of contrastive representation learning methods have achieved preeminent success. Particularly, SupCon outperformed the dominant methods based on cross-entropy loss in representation learning. However, we notice that there could be potential ethical risks in supervised contrastive learning. In this paper, we for the first time analyze unfairness caused by supervised contrastive learning and propose a new Fair Supervised Contrastive Loss (FSCL) for fair visual representation learning. Inheriting the philosophy of supervised contrastive learning, it encourages representation of the same class to be closer to each other than that of different classes, while ensuring fairness by penalizing the inclusion of sensitive attribute information in representation. In addition, we introduce a group-wise normalization to diminish the disparities of intra-group compactness and inter-class separability between demographic groups that arouse unfair classification. Through extensive experiments on CelebA and UTK Face, we validate that the proposed method significantly outperforms SupCon and existing state-of-the-art methods in terms of the trade-off between top-1 accuracy and fairness. Moreover, our method is robust to the intensity of data bias and effectively works in incomplete supervised settings. Our code is available at https://github.com/sungho-CoolG/FSCL. △ Less

Submitted 30 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2203.10766 [pdf, other]

An In-Depth Comparative Analysis of Cloud Block Storage Workloads: Findings and Implications

Authors: Jinhong Li, Qiuping Wang, Patrick P. C. Lee, Chao Shi

Abstract: Cloud block storage systems support diverse types of applications in modern cloud services. Characterizing their I/O activities is critical for guiding better system designs and optimizations. In this paper, we present an in-depth comparative analysis of production cloud block storage workloads through the block-level I/O traces of billions of I/O requests collected from two production systems, Al… ▽ More Cloud block storage systems support diverse types of applications in modern cloud services. Characterizing their I/O activities is critical for guiding better system designs and optimizations. In this paper, we present an in-depth comparative analysis of production cloud block storage workloads through the block-level I/O traces of billions of I/O requests collected from two production systems, Alibaba Cloud and Tencent Cloud Block Storage. We study their characteristics of load intensities, spatial patterns, and temporal patterns. We also compare the cloud block storage workloads with the notable public block-level I/O workloads from the enterprise data centers at Microsoft Research Cambridge, and identify the commonalities and differences of the three sources of traces. To this end, we provide 6 findings through the high-level analysis and 16 findings through the detailed analysis on load intensity, spatial patterns, and temporal patterns. We discuss the implications of our findings on load balancing, cache efficiency, and storage cluster management in cloud block storage systems. △ Less

Submitted 19 November, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: 30 pages. Accepted by ACM Transactions on Storage

arXiv:2202.08406 [pdf, other]

doi 10.1145/3491102.3501955

The Unboxing Experience: Exploration and Design of Initial Interactions Between Children and Social Robots

Authors: Christine P Lee, Bengisu Cagiltay, Bilge Mutlu

Abstract: Social robots are increasingly introduced into children's lives as educational and social companions, yet little is known about how these products might best be introduced to their environments. The emergence of the "unboxing" phenomenon in media suggests that introduction is key to technology adoption where initial impressions are made. To better understand this phenomenon toward designing a posi… ▽ More Social robots are increasingly introduced into children's lives as educational and social companions, yet little is known about how these products might best be introduced to their environments. The emergence of the "unboxing" phenomenon in media suggests that introduction is key to technology adoption where initial impressions are made. To better understand this phenomenon toward designing a positive unboxing experience in the context of social robots for children, we conducted three field studies with families of children aged 8 to 13: (1) an exploratory free-play activity ($n=12$); (2) a co-design session ($n=11$) that informed the development of a prototype box and a curated unboxing experience; and (3) a user study ($n=9$) that evaluated children's experiences. Our findings suggest the unboxing experience of social robots can be improved through the design of a creative aesthetic experience that engages the child socially to guide initial interactions and foster a positive child-robot relationship. △ Less

Submitted 16 February, 2022; originally announced February 2022.

Comments: To be published in 2022 CHI Conference on Human Factors in Computing Systems (CHI '22)

Journal ref: CHI Conference on Human Factors in Computing Systems (CHI '22), April 29-May 5, 2022, New Orleans, LA, USA

arXiv:2202.02901 [pdf, other]

Inter-subject Contrastive Learning for Subject Adaptive EEG-based Visual Recognition

Authors: Pilhyeon Lee, Sunhee Hwang, Jewook Lee, Minjung Shin, Seogkyu Jeon, Hyeran Byun

Abstract: This paper tackles the problem of subject adaptive EEG-based visual recognition. Its goal is to accurately predict the categories of visual stimuli based on EEG signals with only a handful of samples for the target subject during training. The key challenge is how to appropriately transfer the knowledge obtained from abundant data of source subjects to the subject of interest. To this end, we intr… ▽ More This paper tackles the problem of subject adaptive EEG-based visual recognition. Its goal is to accurately predict the categories of visual stimuli based on EEG signals with only a handful of samples for the target subject during training. The key challenge is how to appropriately transfer the knowledge obtained from abundant data of source subjects to the subject of interest. To this end, we introduce a novel method that allows for learning subject-independent representation by increasing the similarity of features sharing the same class but coming from different subjects. With the dedicated sampling principle, our model effectively captures the common knowledge shared across different subjects, thereby achieving promising performance for the target subject even under harsh problem settings with limited data. Specifically, on the EEG-ImageNet40 benchmark, our model records the top-1 / top-3 test accuracy of 72.6% / 91.6% when using only five EEG samples per class for the target subject. Our code is available at https://github.com/DeepBCI/Deep-BCI/tree/master/1_Intelligent_BCI/Inter_Subject_Contrastive_Learning_for_EEG. △ Less

Submitted 6 February, 2022; originally announced February 2022.

Comments: Accepted by the 10th IEEE International Winter Conference on Brain-Computer Interface (BCI 2022). Code is available at https://github.com/DeepBCI/Deep-BCI

arXiv:2201.08741 [pdf]

doi 10.3389/fnimg.2022.1023481

Improving Across-Dataset Brain Tissue Segmentation Using Transformer

Authors: Vishwanatha M. Rao, Zihan Wan, Soroush Arabshahi, David J. Ma, Pin-Yu Lee, Ye Tian, Xuzhe Zhang, Andrew F. Laine, Jia Guo

Abstract: Brain tissue segmentation has demonstrated great utility in quantifying MRI data through Voxel-Based Morphometry and highlighting subtle structural changes associated with various conditions within the brain. However, manual segmentation is highly labor-intensive, and automated approaches have struggled due to properties inherent to MRI acquisition, leaving a great need for an effective segmentati… ▽ More Brain tissue segmentation has demonstrated great utility in quantifying MRI data through Voxel-Based Morphometry and highlighting subtle structural changes associated with various conditions within the brain. However, manual segmentation is highly labor-intensive, and automated approaches have struggled due to properties inherent to MRI acquisition, leaving a great need for an effective segmentation tool. Despite the recent success of deep convolutional neural networks (CNNs) for brain tissue segmentation, many such solutions do not generalize well to new datasets, which is critical for a reliable solution. Transformers have demonstrated success in natural image segmentation and have recently been applied to 3D medical image segmentation tasks due to their ability to capture long-distance relationships in the input where the local receptive fields of CNNs struggle. This study introduces a novel CNN-Transformer hybrid architecture designed for brain tissue segmentation. We validate our model's performance across four multi-site T1w MRI datasets, covering different vendors, field strengths, scan parameters, time points, and neuropsychiatric conditions. In all situations, our model achieved the greatest generality and reliability. Out method is inherently robust and can serve as a valuable tool for brain-related T1w MRI studies. The code for the TABS network is available at: https://github.com/raovish6/TABS. △ Less

Submitted 31 January, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

ACM Class: I.4.6

arXiv:2112.09771 [pdf, ps, other]

Privacy Leakage over Dependent Attributes in One-Sided Differential Privacy

Authors: Phillip Lee, Kevin Smith

Abstract: Providing a provable privacy guarantees while maintaining the utility of data is a challenging task in many real-world applications. Recently, a new framework called One-Sided Differential Privacy (OSDP) was introduced that extends existing differential privacy approaches. OSDP increases the utility of the data by taking advantage of the fact that not all records are sensitive. However, the previo… ▽ More Providing a provable privacy guarantees while maintaining the utility of data is a challenging task in many real-world applications. Recently, a new framework called One-Sided Differential Privacy (OSDP) was introduced that extends existing differential privacy approaches. OSDP increases the utility of the data by taking advantage of the fact that not all records are sensitive. However, the previous work assumed that all records are statistically independent from each other. Motivated by occupancy data in building management systems, this paper extends the existing one-sided differential privacy framework. In this paper, we quantify the overall privacy leakage when the adversary is given dependency information between the records. In addition, we show how an optimization problem can be constructed that efficiently trades off between the utility and privacy. △ Less

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2111.02519 [pdf, other]

Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot

Authors: Juraj Juraska, Kevin K. Bowden, Lena Reed, Vrindavan Harrison, Wen Cui, Omkar Patil, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Abstract: Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena's success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena's system design and performance in the Alexa Pr… ▽ More Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena's success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena's system design and performance in the Alexa Prize during the 20/21 competition. A live demo of Athena as well as video recordings will provoke discussion on the state of the art in conversational AI. △ Less

Submitted 3 November, 2021; originally announced November 2021.

Comments: Accepted to EMNLP 2021 System Demonstrations

arXiv:2110.13470 [pdf, other]

Subject Adaptive EEG-based Visual Recognition

Authors: Pilhyeon Lee, Sunhee Hwang, Seogkyu Jeon, Hyeran Byun

Abstract: This paper focuses on EEG-based visual recognition, aiming to predict the visual object class observed by a subject based on his/her EEG signals. One of the main challenges is the large variation between signals from different subjects. It limits recognition systems to work only for the subjects involved in model training, which is undesirable for real-world scenarios where new subjects are freque… ▽ More This paper focuses on EEG-based visual recognition, aiming to predict the visual object class observed by a subject based on his/her EEG signals. One of the main challenges is the large variation between signals from different subjects. It limits recognition systems to work only for the subjects involved in model training, which is undesirable for real-world scenarios where new subjects are frequently added. This limitation can be alleviated by collecting a large amount of data for each new user, yet it is costly and sometimes infeasible. To make the task more practical, we introduce a novel problem setting, namely subject adaptive EEG-based visual recognition. In this setting, a bunch of pre-recorded data of existing users (source) is available, while only a little training data from a new user (target) are provided. At inference time, the model is evaluated solely on the signals from the target user. This setting is challenging, especially because training samples from source subjects may not be helpful when evaluating the model on the data from the target subject. To tackle the new problem, we design a simple yet effective baseline that minimizes the discrepancy between feature distributions from different subjects, which allows the model to extract subject-independent features. Consequently, our model can learn the common knowledge shared among subjects, thereby significantly improving the recognition performance for the target subject. In the experiments, we demonstrate the effectiveness of our method under various settings. Our code is available at https://github.com/DeepBCI/Deep-BCI/tree/master/1_Intelligent_BCI/Subject_Adaptive_EEG_based_Visual_Recognition. △ Less

Submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted by ACPR 2021. Code is available at https://github.com/DeepBCI/Deep-BCI

arXiv:2110.04785 [pdf, ps, other]

A Generalization of Array Codes with Local Properties and Efficient Encoding/Decoding

Authors: Hanxu Hou, Yunghsiang S. Han, Patrick P. C. Lee, You Wu, Guojun Han, Mario Blaum

Abstract: A maximum distance separable (MDS) array code is composed of $m\times (k+r)$ arrays such that any $k$ out of $k+r$ columns suffice to retrieve all the information symbols. Expanded-Blaum-Roth (EBR) codes and Expanded-Independent-Parity (EIP) codes are two classes of MDS array codes that can repair any one symbol in a column by locally accessing some other symbols within the column, where the numbe… ▽ More A maximum distance separable (MDS) array code is composed of $m\times (k+r)$ arrays such that any $k$ out of $k+r$ columns suffice to retrieve all the information symbols. Expanded-Blaum-Roth (EBR) codes and Expanded-Independent-Parity (EIP) codes are two classes of MDS array codes that can repair any one symbol in a column by locally accessing some other symbols within the column, where the number of symbols $m$ in a column is a prime number. By generalizing the constructions of EBR and EIP codes, we propose new MDS array codes, such that any one symbol can be locally recovered and the number of symbols in a column can be not only a prime number but also a power of an odd prime number. Also, we present an efficient encoding/decoding method for the proposed generalized EBR (GEBR) and generalized EIP (GEIP) codes based on the LU factorization of a Vandermonde matrix. We show that the proposed decoding method has less computational complexity than existing methods. Furthermore, we show that the proposed GEBR codes have both a larger minimum symbol distance and a larger recovery ability of erased lines for some parameters when compared to EBR codes. We show that EBR codes can recover any $r$ erased lines of a slope for any parameter $r$, which was an open problem in [2]. △ Less

Submitted 12 September, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

arXiv:2108.11173 [pdf, other]

Incorporating Surprisingly Popular Algorithm and Euclidean Distance-based Adaptive Topology into PSO

Authors: Xuan Wu, Jizong Han, Di Wang, Pengyue Gao, Quanlong Cui, Liang Chen, Yanchun Liang, Han Huang, Heow Pueh Lee, Chunyan Miao, You Zhou, Chunguo Wu

Abstract: While many Particle Swarm Optimization (PSO) algorithms only use fitness to assess the performance of particles, in this work, we adopt Surprisingly Popular Algorithm (SPA) as a complementary metric in addition to fitness. Consequently, particles that are not widely known also have the opportunity to be selected as the learning exemplars. In addition, we propose a Euclidean distance-based adaptive… ▽ More While many Particle Swarm Optimization (PSO) algorithms only use fitness to assess the performance of particles, in this work, we adopt Surprisingly Popular Algorithm (SPA) as a complementary metric in addition to fitness. Consequently, particles that are not widely known also have the opportunity to be selected as the learning exemplars. In addition, we propose a Euclidean distance-based adaptive topology to cooperate with SPA, where each particle only connects to k number of particles with the shortest Euclidean distance during each iteration. We also introduce the adaptive topology into heterogeneous populations to better solve large-scale problems. Specifically, the exploration sub-population better preserves the diversity of the population while the exploitation sub-population achieves fast convergence. Therefore, large-scale problems can be solved in a collaborative manner to elevate the overall performance. To evaluate the performance of our method, we conduct extensive experiments on various optimization problems, including three benchmark suites and two real-world optimization problems. The results demonstrate that our Euclidean distance-based adaptive topology outperforms the other widely adopted topologies and further suggest that our method performs significantly better than state-of-the-art PSO variants on small, medium, and large-scale problems. △ Less

Submitted 12 September, 2023; v1 submitted 25 August, 2021; originally announced August 2021.

arXiv:2108.08596 [pdf, other]

Feature Stylization and Domain-aware Contrastive Learning for Domain Generalization

Authors: Seogkyu Jeon, Kibeom Hong, Pilhyeon Lee, Jewook Lee, Hyeran Byun

Abstract: Domain generalization aims to enhance the model robustness against domain shift without accessing the target domain. Since the available source domains for training are limited, recent approaches focus on generating samples of novel domains. Nevertheless, they either struggle with the optimization problem when synthesizing abundant domains or cause the distortion of class semantics. To these ends,… ▽ More Domain generalization aims to enhance the model robustness against domain shift without accessing the target domain. Since the available source domains for training are limited, recent approaches focus on generating samples of novel domains. Nevertheless, they either struggle with the optimization problem when synthesizing abundant domains or cause the distortion of class semantics. To these ends, we propose a novel domain generalization framework where feature statistics are utilized for stylizing original features to ones with novel domain properties. To preserve class information during stylization, we first decompose features into high and low frequency components. Afterward, we stylize the low frequency components with the novel domain styles sampled from the manipulated statistics, while preserving the shape cues in high frequency ones. As the final step, we re-merge both components to synthesize novel domain features. To enhance domain robustness, we utilize the stylized features to maintain the model consistency in terms of features as well as outputs. We achieve the feature consistency with the proposed domain-aware supervised contrastive loss, which ensures domain invariance while increasing class discriminability. Experimental results demonstrate the effectiveness of the proposed feature stylization and the domain-aware contrastive loss. Through quantitative comparisons, we verify the lead of our method upon existing state-of-the-art methods on two benchmarks, PACS and Office-Home. △ Less

Submitted 19 August, 2021; originally announced August 2021.

Comments: Accepted to ACM MM 2021 (oral)

arXiv:2108.05029 [pdf, other]

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization

Authors: Pilhyeon Lee, Hyeran Byun

Abstract: We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training. Owing to label sparsity, existing work fails to learn action completeness, resulting in fragmentary action predictions. In this paper, we propose a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model. Concretely,… ▽ More We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training. Owing to label sparsity, existing work fails to learn action completeness, resulting in fragmentary action predictions. In this paper, we propose a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model. Concretely, we first select pseudo background points to supplement point-level action labels. Then, by taking the points as seeds, we search for the optimal sequence that is likely to contain complete action instances while agreeing with the seeds. To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively. Experimental results demonstrate that our completeness guidance indeed helps the model to locate complete action instances, leading to large performance gains especially under high IoU thresholds. Moreover, we demonstrate the superiority of our method over existing state-of-the-art methods on four benchmarks: THUMOS'14, GTEA, BEOID, and ActivityNet. Notably, our method even performs comparably to recent fully-supervised methods, at the 6 times cheaper annotation cost. Our code is available at https://github.com/Pilhyeon. △ Less

Submitted 11 August, 2021; originally announced August 2021.

Comments: Accepted by ICCV 2021 (Oral). Code is available at https://github.com/Pilhyeon

Showing 1–50 of 94 results for author: Lee, P