Showing 1–29 of 29 results for author: Peng, I

  1. arXiv:2407.07850  [pdf, other]

    cs.DC

    Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

    Authors: Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, Ivy Peng

    Abstract: Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copy or unified virtual memory. The Grace Hopper Superchip, for the first time, supports an integrated CPU-GPU system page table, hardware-level addressing of system allocated memory, and cache-coherent NVLink-C2C interconnect, bringing an alternative solution for enabl…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ICPP '24 (The 53rd International Conference on Parallel Processing)

  2. arXiv:2406.11760  [pdf, other]

    cs.DC

    Understanding Layered Portability from HPC to Cloud in Containerized Environments

    Authors: Daniel Medeiros, Gabin Schieffer, Jacob Wahlgren, Ivy Peng

    Abstract: Recent development in lightweight OS-level virtualization, containers, provides a potential solution for running HPC applications on the cloud platform. In this work, we focus on the impact of different layers in a containerized environment when migrating HPC containers from a dedicated HPC system to a cloud platform. On three ARM-based platforms, including the latest Nvidia Grace CPU, we use six…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Submitted to ISC Workshop - Workshop on Converged Computing '24, preprint

  3. arXiv:2401.16971  [pdf, other]

    cs.DC

    Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations

    Authors: Francieli Boito, Jim Brandt, Valeria Cardellini, Philip Carns, Florina M. Ciorba, Hilary Egan, Ahmed Eleliemy, Ann Gentile, Thomas Gruber, Jeff Hanson, Utz-Uwe Haus, Kevin Huck, Thomas Ilsche, Thomas Jakobsche, Terry Jones, Sven Karlsson, Abdullah Mueen, Michael Ott, Tapasya Patki, Ivy Peng, Krishnan Raghavan, Stephen Simms, Kathleen Shoga, Michael Showerman, Devesh Tiwari, et al. (2 additional authors not shown)

    Abstract: Many High Performance Computing (HPC) facilities have developed and deployed frameworks in support of continuous monitoring and operational data analytics (MODA) to help improve efficiency and throughput. Because of the complexity and scale of systems and workflows and the need for low-latency response to address dynamic circumstances, automated feedback and response have the potential to be more…

    Submitted 30 January, 2024; originally announced January 2024.

  4. A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

    Authors: Jacob Wahlgren, Gabin Schieffer, Maya Gokhale, Ivy Peng

    Abstract: Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising and non-disruptive option for memory disaggregation is rack-scale memory pooling, where shared memory pools supplement node-local memory. This work outlines the…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted to SC23 (The International Conference for High Performance Computing, Networking, Storage, and Analysis 2023)

  5. arXiv:2308.00763  [pdf, other]

    cs.DC

    Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU

    Authors: Gabin Schieffer, Nattawat Pornthisan, Daniel Araújo de Medeiros, Stefano Markidis, Jacob Wahlgren, Ivy Peng

    Abstract: High-performance GPU-accelerated particle filter methods are critical for object detection applications, ranging from autonomous driving, robot localization, to time-series prediction. In this work, we investigate the design, development and optimization of particle-filter using half-precision on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on N…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 12 pages, 8 figures, conference. To be published in The 21st International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar2023)

  6. arXiv:2307.14860  [pdf, other]

    cs.PF

    Quantum Computer Simulations at Warp Speed: Assessing the Impact of GPU Acceleration

    Authors: Jennifer Faj, Ivy Peng, Jacob Wahlgren, Stefano Markidis

    Abstract: Quantum computer simulators are crucial for the development of quantum computing. In this work, we investigate the suitability and performance impact of GPU and multi-GPU systems on a widely used simulation tool - the state vector simulator Qiskit Aer. In particular, we evaluate the performance of both Qiskit's default Nvidia Thrust backend and the recent Nvidia cuQuantum backend on Nvidia A100 GP…

    Submitted 27 July, 2023; originally announced July 2023.

  7. Leveraging HPC Profiling & Tracing Tools to Understand the Performance of Particle-in-Cell Monte Carlo Simulations

    Authors: Jeremy J. Williams, David Tskhakaya, Stefan Costea, Ivy B. Peng, Marta Garcia-Gasulla, Stefano Markidis

    Abstract: Large-scale plasma simulations are critical for designing and developing next-generation fusion energy devices and modeling industrial plasmas. BIT1 is a massively parallel Particle-in-Cell code designed for specifically studying plasma material interaction in fusion devices. Its most salient characteristic is the inclusion of collision Monte Carlo models for different plasma species. In this work…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by the Euro-Par 2023 workshops (TDLPP 2023), prepared in the standardized Springer LNCS format and consists of 12 pages, which includes the main text, references, and figures

  8. arXiv:2304.03748  [pdf, other]

    cs.LG cs.AI physics.comp-ph physics.data-an

    Perspectives on AI Architectures and Co-design for Earth System Predictability

    Authors: Maruti K. Mudunuru, James A. Ang, Mahantesh Halappanavar, Simon D. Hammond, Maya B. Gokhale, James C. Hoe, Tushar Krishna, Sarat S. Sreepathi, Matthew R. Norman, Ivy B. Peng, Philip W. Jones

    Abstract: Recently, the U.S. Department of Energy (DOE), Office of Science, Biological and Environmental Research (BER), and Advanced Scientific Computing Research (ASCR) programs organized and held the Artificial Intelligence for Earth System Predictability (AI4ESP) workshop series. From this workshop, a critical conclusion that the DOE BER and ASCR community came to is the requirement to develop a new par…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: 23 pages, 1 figure

  9. arXiv:2302.09468  [pdf, other]

    cs.PF cs.OS

    Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory Systems

    Authors: Jie Ren, Dong Xu, Ivy Peng, Junhee Ryu, Kwangsik Shin, Daewoo Kim, Dong Li

    Abstract: Multi-tiered large memory systems call for rethinking of memory profiling and migration because of the unique problems unseen in the traditional memory systems with smaller capacity and fewer tiers. We develop MTM, an application-transparent page management system based on three principles: (1) connecting the control of profiling overhead with the profiling mechanism for high-quality profiling; (2…

    Submitted 1 May, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

  10. Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

    Authors: Jacob Wahlgren, Maya Gokhale, Ivy B. Peng

    Abstract: Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving. Diverse workloads lead to a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. Two promising use cases of compo…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 10 pages, 13 figures. Accepted for publication in Workshop on Memory Centric High Performance Computing (MCHPC'22) at SC22

  11. arXiv:2106.05373  [pdf, other]

    cs.DC cs.LG cs.NE

    StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

    Authors: Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis

    Abstract: The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at the International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021)

  12. arXiv:2010.05348  [pdf, other]

    physics.comp-ph cs.LG

    Automatic Particle Trajectory Classification in Plasma Simulations

    Authors: Stefano Markidis, Ivy Peng, Artur Podobas, Itthinat Jongsuebchoke, Gabriel Bengtsson, Pawel Herman

    Abstract: Numerical simulations of plasma flows are crucial for advancing our understanding of microscopic processes that drive the global plasma dynamics in fusion devices, space, and astrophysical systems. Identifying and classifying particle trajectories allows us to determine specific on-going acceleration mechanisms, shedding light on essential plasma processes. Our overall goal is to provide a gener…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at AI4S: Workshop on Artificial Intelligence and Machine Learning for Scientific Applications

  13. sputniPIC: an Implicit Particle-in-Cell Code for Multi-GPU Systems

    Authors: Steven W. D. Chien, Jonas Nylund, Gabriel Bengtsson, Ivy B. Peng, Artur Podobas, Stefano Markidis

    Abstract: Large-scale simulations of plasmas are essential for advancing our understanding of fusion devices, space, and astrophysical systems. Particle-in-Cell (PIC) codes have demonstrated their success in simulating numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes re…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020)

  14. tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

    Authors: Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis

    Abstract: Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and al…

    Submitted 11 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER 2020)

  15. Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems

    Authors: Ivy Peng, Kai Wu, Jie Ren, Dong Li, Maya Gokhale

    Abstract: The emergence of high-density byte-addressable non-volatile memory (NVM) is promising to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and, thus, are often paired with DRAM in a heterogeneous main memory. Recently, byte-addressable NVM hardware becomes available. This work provides a timely evaluation of representative HPC applicatio…

    Submitted 15 February, 2020; originally announced February 2020.

    Comments: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS2020)

  16. Performance Evaluation of Advanced Features in CUDA Unified Memory

    Authors: Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

    Abstract: CUDA Unified Memory improves the GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to eva…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at Workshop on Memory Centric High Performance Computing (MCHPC'19) in SC19

  17. arXiv:1910.07566  [pdf]

    cs.DC

    UMap: Enabling Application-driven Optimizations for Page Management

    Authors: Ivy B. Peng, Marty McFadden, Eric Green, Keita Iwabuchi, Kai Wu, Dong Li, Roger Pearce, Maya Gokhale

    Abstract: Leadership supercomputers feature a diversity of storage, from node-local persistent memory and NVMe SSDs to network-interconnected flash memory and HDD. Memory mapping files on different tiers of storage provides a uniform interface in applications. However, system-wide services like mmap are optimized for generality and lack flexibility for enabling application-specific optimizations. In this wo…

    Submitted 16 October, 2019; originally announced October 2019.

  18. System Evaluation of the Intel Optane Byte-addressable NVM

    Authors: Ivy B. Peng, Maya B. Gokhale, Eric W. Green

    Abstract: Byte-addressable non-volatile memory (NVM) features high density, DRAM comparable performance, and persistence. These characteristics position NVM as a promising new tier in the memory hierarchy. Nevertheless, NVM has asymmetric read and write performance, and considerably higher write energy than DRAM. Our work provides an in-depth evaluation of the first commercially available byte-addressable N…

    Submitted 18 August, 2019; originally announced August 2019.

    Journal ref: In Proceedings of the International Symposium on Memory Systems, 2019

  19. Posit NPB: Assessing the Precision Improvement in HPC Scientific Applications

    Authors: Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

    Abstract: Floating-point operations can significantly impact the accuracy and performance of scientific applications on large-scale parallel systems. Recently, an emerging floating-point format called Posit has attracted attention as an alternative to the standard IEEE floating-point formats because it could enable higher precision than IEEE formats using the same number of bits. In this work, we first expl…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: Accepted for publication in PPAM 2019 conference

  20. arXiv:1810.04110  [pdf, other]

    cs.DC

    MPI Windows on Storage for HPC Applications

    Authors: Sergio Rivas-Gomez, Roberto Gioiosa, Ivy Bo Peng, Gokcen Kestor, Sai Narasimhamurthy, Erwin Laure, Stefano Markidis

    Abstract: Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present its benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we e…

    Submitted 9 October, 2018; originally announced October 2018.

  21. The SAGE Project: a Storage Centric Approach for Exascale Computing

    Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Steven Wei-der Chien, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Shaun de Witt, Dirk Pleiter, Stefano Markidis

    Abstract: SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale r…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: Submitted to Computing Frontiers 2018. arXiv admin note: substantial text overlap with arXiv:1805.00556

  22. SAGE: Percipient Storage for Exascale Data Centric Computing

    Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Stefano Markidis, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Dirk Pleiter, Shaun de Witt

    Abstract: We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infras…

    Submitted 1 May, 2018; originally announced May 2018.

    Journal ref: Parallel Computing, 23 March 2018

  23. NVIDIA Tensor Core Programmability, Performance & Precision

    Authors: Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, Jeffrey S. Vetter

    Abstract: The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called "Tensor Core" that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to pro…

    Submitted 11 March, 2018; originally announced March 2018.

    Comments: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018

  24. arXiv:1708.01306  [pdf, other]

    cs.DC

    MPI Streams for HPC Applications

    Authors: Ivy Bo Peng, Stefano Markidis, Roberto Gioiosa, Gokcen Kestor, Erwin Laure

    Abstract: Data streams are a sequence of data flowing between source and destination processes. Streaming is widely used for signal, image and video processing for its efficiency in pipelining and effectiveness in reducing demand for memory. The goal of this work is to extend the use of data streams to support both conventional scientific applications and emerging data analytic applications running on HPC p…

    Submitted 3 August, 2017; originally announced August 2017.

    Comments: Advances in Parallel Computing

  25. arXiv:1708.01304  [pdf, other]

    cs.DC

    Preparing HPC Applications for the Exascale Era: A Decoupling Strategy

    Authors: Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

    Abstract: Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which might result inefficient and seriously limit scalability, especially at large scale. We propo…

    Submitted 3 August, 2017; originally announced August 2017.

    Comments: The 46th International Conference on Parallel Processing (ICPP-2017)

  26. arXiv:1704.08492  [pdf]

    cs.DC

    Extending Message Passing Interface Windows to Storage

    Authors: Sergio Rivas-Gomez, Stefano Markidis, Ivy Bo Peng, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

    Abstract: This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could poten…

    Submitted 27 April, 2017; originally announced April 2017.

  27. Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

    Authors: Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

    Abstract: Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventi…

    Submitted 26 April, 2017; originally announced April 2017.

  28. Idle Period Propagation in Message-Passing Applications

    Authors: Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

    Abstract: Idle periods on different processes of Message Passing applications are unavoidable. While the origin of idle periods on a single process is well understood as the effect of system and architectural random delays, yet it is unclear how these idle periods propagate from one process to another. It is important to understand idle period propagation in Message Passing applications as it allows applica…

    Submitted 26 April, 2017; originally announced April 2017.

    Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016

  29. Exploring Application Performance on Emerging Hybrid-Memory Supercomputers

    Authors: Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

    Abstract: Next-generation supercomputers will feature more hierarchical and heterogeneous memory systems with different memory technologies working side-by-side. A critical question is whether at large scale existing HPC applications and emerging data-analytics workloads will have performance improvement or degradation on these systems. We propose a systematic and fair methodology to identify the trend of a…

    Submitted 26 April, 2017; originally announced April 2017.

    Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016