Di Zhuang

Santa Monica, California, United States
1K followers 500+ connections

About

- 7+ years of academic research experience in security, privacy-enhancing technologies…

Experience & Education

  • Snap Inc.

Publications

  • Discriminative Adversarial Domain Generalization with Meta-learning based Cross-domain Validation

    https://arxiv.org/abs/2011.00444

    The generalization capability of machine learning models, which refers to generalizing the knowledge for an "unseen" domain via learning from one or multiple seen domain(s), is of great importance for developing and deploying machine learning applications in real-world conditions. Domain Generalization (DG) techniques aim to enhance such generalization capability of machine learning models, where the learnt feature representation and the classifier are two crucial factors to improve generalization and make decisions. In this paper, we propose Discriminative Adversarial Domain Generalization (DADG) with meta-learning-based cross-domain validation. Our proposed framework contains two main components that work synergistically to build a domain-generalized DNN model: (i) discriminative adversarial learning, which proactively learns a generalized feature representation on multiple "seen" domains, and (ii) meta-learning-based cross-domain validation, which simulates train/test domain shift by applying meta-learning techniques in the training process. In the experimental evaluation, a comprehensive comparison has been made between our proposed approach and other existing approaches on three benchmark datasets. The results show that DADG consistently outperforms a strong baseline, DeepAll, and outperforms the other existing DG algorithms in most of the evaluation cases.

  • Utility-aware Privacy-preserving Data Releasing

    arXiv preprint arXiv:2005.04369

    In the big data era, more and more cloud-based data-driven applications are developed that leverage individual data to provide certain valuable services (the utilities). On the other hand, since the same set of individual data could be utilized to infer certain sensitive information about the individual, it creates new channels to snoop on the individual's privacy. Hence it is of great importance to develop techniques that enable the data owners to release privatized data that can still be utilized for a predefined intended purpose. Existing data releasing approaches, however, are either privacy-emphasized (no consideration of utility) or utility-driven (no guarantees on privacy). In this work, we propose a two-step perturbation-based utility-aware privacy-preserving data releasing framework. First, certain predefined privacy and utility problems are learned from the public domain data (background knowledge). Later, our approach leverages the learned knowledge to precisely perturb the data owners' data into privatized data that can be successfully utilized for a certain intended purpose (learning to succeed), without jeopardizing certain predefined privacy (training to fail). Extensive experiments have been conducted on Human Activity Recognition, Census Income and Bank Marketing datasets to demonstrate the effectiveness and practicality of our framework.

  • CS-AF: A Cost-sensitive Multi-classifier Active Fusion Framework for Skin Lesion Classification

    arXiv preprint arXiv:2004.12064

    Convolutional neural networks (CNNs) have achieved state-of-the-art performance in skin lesion analysis. Compared with a single CNN classifier, combining the results of multiple classifiers via fusion approaches has been shown to be more effective and robust. Since the skin lesion datasets are usually limited and statistically biased, while designing an effective fusion approach, it is important to consider not only the performance of each classifier on the training/validation dataset, but also the relative discriminative power (e.g., confidence) of each classifier regarding an individual sample in the testing phase, which calls for an active fusion approach. Furthermore, in skin lesion analysis, the data of certain classes is usually abundant, making them an over-represented majority (e.g., benign lesions), while the data of some other classes is deficient, making them an underrepresented minority (e.g., cancerous lesions). It is more crucial to precisely identify the samples from an underrepresented (i.e., in terms of the amount of data) but more important (e.g., the cancerous lesions) minority class. In other words, misclassifying a more severe lesion as a benign or less severe lesion should incur a relatively higher cost (e.g., money, time and even lives). To address such challenges, we present CS-AF, a cost-sensitive multi-classifier active fusion framework for skin lesion classification. In the experimental evaluation, we prepared 60 base classifiers (of 10 CNN architectures) on the ISIC research datasets. Our experimental results show that our framework consistently outperforms the static fusion competitors.
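
The cost-sensitive idea in this abstract can be illustrated with a minimal sketch. It is not the CS-AF fusion rule itself, which the abstract does not spell out; it only shows the standard minimum-expected-cost decision applied to an already fused probability vector. The function name, the two-class setup, and the 10x cost ratio are all hypothetical.

```python
def min_expected_cost_class(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs: fused class probabilities for one sample (length n).
    cost:  n x n matrix, cost[i][j] = cost of predicting class j
           when the true class is i (zero on the diagonal).
    """
    n = len(probs)
    # expected[j] = sum over true classes i of P(i) * cost of predicting j
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical example: class 0 = benign, class 1 = malignant.
# Missing a malignant lesion is assumed to cost 10x a false alarm.
cost = [[0.0, 1.0],
        [10.0, 0.0]]
print(min_expected_cost_class([0.8, 0.2], cost))  # prints 1
```

With these assumed costs, a sample that is only 20% likely to be malignant is still flagged, because the expected cost of calling it benign (2.0) exceeds that of calling it malignant (0.8).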

  • SAIA: Split Artificial Intelligence Architecture for Mobile Healthcare System

    arXiv preprint arXiv:2004.12059

    With the advancement of deep learning (DL), the Internet of Things and cloud computing techniques for biomedical and healthcare problems, mobile healthcare systems have received unprecedented attention. Since DL techniques usually require an enormous amount of computation, most of them cannot be directly deployed on resource-constrained mobile and IoT devices. Hence, most mobile healthcare systems leverage the cloud computing infrastructure, where the data collected by the mobile and IoT devices is transmitted to the cloud computing platforms for analysis. However, in contested environments, relying on the cloud might not be practical at all times. For instance, the satellite communication might be denied or disrupted. We propose SAIA, a Split Artificial Intelligence Architecture for mobile healthcare systems. Unlike traditional approaches for artificial intelligence (AI), which solely exploit the computational power of the cloud server, SAIA not only relies on the cloud computing infrastructure while the wireless communication is available, but also utilizes lightweight AI solutions that work locally on the client side; hence it can work even when the communication is impeded. In SAIA, we propose a meta-information based decision unit that determines whether a sample captured by the client should be processed by the embedded AI (i.e., kept on the client) or the networked AI (i.e., sent to the server), under different conditions. In our experimental evaluation, extensive experiments have been conducted on two popular healthcare datasets. Our results show that SAIA consistently outperforms its baselines in terms of both effectiveness and efficiency.

  • AutoGAN-based Dimension Reduction for Privacy Preservation

    Neurocomputing 384, 94-103

    Protecting sensitive information against data exploiting attacks is an emerging research area in data mining. Over the past, several different methods have been introduced to protect individual privacy from such attacks while maximizing data-utility of the application. However, these existing techniques are not sufficient to effectively protect data owner privacy, especially in the scenarios that utilize visualizable data (e.g. images, videos) or the applications that require heavy computations for implementation. To address these problems, we propose a new dimension reduction-based method for privacy preservation. Our method generates dimension-reduced data for performing machine learning tasks and prevents a strong adversary from reconstructing the original data. We first introduce a theoretical approach to evaluate dimension reduction-based privacy preserving mechanisms, then propose a non-linear dimension reduction framework motivated by state-of-the-art neural network structures for privacy preservation. We conducted experiments over three different face image datasets (AT&T, YaleB, and CelebA), and the results show that when the number of dimensions is reduced to seven, we can achieve the accuracies of 79%, 80%, and 73% respectively and the reconstructed images are not recognizable to naked human eyes.

    Other authors
    • Pei-Yuan Wu
    • Hung Nguyen
    • J. Morris Chang
  • DynaMo: Dynamic Community Detection by Incrementally Maximizing Modularity

    Accepted for IEEE Transactions on Knowledge and Data Engineering

    Community detection is of great importance for online social network analysis. The volume, variety and velocity of data generated by today's online social networks are advancing the way researchers analyze those networks. For instance, real-world networks, such as Facebook, LinkedIn and Twitter, are inherently growing rapidly and expanding aggressively over time. However, most of the studies so far have focused on detecting communities on static networks. It is computationally expensive to directly employ a well-studied static algorithm repeatedly on the network snapshots of the dynamic networks. We propose DynaMo, a novel modularity-based dynamic community detection algorithm, aiming to detect communities of dynamic networks as effectively as repeatedly applying static algorithms but in a more efficient way. DynaMo is an adaptive and incremental algorithm, which is designed for incrementally maximizing the modularity gain while updating the community structure of dynamic networks. In the experimental evaluation, a comprehensive comparison has been made among DynaMo, Louvain (static) and 5 other dynamic algorithms. Extensive experiments have been conducted on 6 real-world networks and 10,000 synthetic networks. Our results show that DynaMo outperforms all the other 5 dynamic algorithms in terms of effectiveness, and is 2 to 5 times faster (on average) than the Louvain algorithm.
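
For readers unfamiliar with the quantity being maximized, the sketch below computes Newman modularity for a fixed partition of an undirected, unweighted graph. It is only the objective that modularity-based methods such as Louvain score partitions with, not DynaMo's incremental update procedure, which this abstract does not detail. The function name and the toy graph are illustrative assumptions, and the graph is assumed to have at least one edge.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = 0
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, d in degree.items():
        total_degree[community[node]] += d

    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, split), 4))  # prints 0.3571
```

Putting all six nodes into one community gives Q = 0, so the two-triangle split scores higher. An incremental algorithm avoids recomputing this sum from scratch after every network change.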

  • Enhanced PeerHunter: Detecting Peer-to-peer Botnets through Network-Flow Level Community Behavior Analysis

    IEEE Transactions on Information Forensics and Security 14.6 (2019): 1485-1500.

    Peer-to-peer (P2P) botnets have become one of the major threats in network security, serving as the fundamental infrastructure for various cyber-crimes. Although a few works claim to detect centralized botnets effectively, detecting P2P botnets involves more challenges. We propose Enhanced PeerHunter, a network-flow level community behavior analysis based system, to detect P2P botnets. Our system starts from a P2P network flow detection component. Then, it uses “mutual contacts” to cluster bots into communities. Finally, it uses network-flow level community behavior analysis to detect potential botnets. In the experimental evaluation, we propose two evasion attacks, where we assume the adversaries know our techniques in advance and attempt to evade our system by making the P2P bots mimic the behavior of legitimate P2P applications. Our results show that Enhanced PeerHunter achieves a high detection rate with few false positives, and high robustness against the proposed attacks.
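
The "mutual contacts" step can be pictured with a small sketch: two hosts are linked when they have talked to enough of the same remote addresses, and linked hosts are then grouped. The grouping below uses plain connected components as a stand-in, since the abstract does not say which clustering method the system uses. The function names, the `min_shared` threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def mutual_contact_graph(contacts, min_shared=1):
    """Link two hosts when they share at least `min_shared` remote contacts.

    contacts: dict mapping host -> set of remote addresses it communicated with.
    Returns a dict mapping host -> set of neighbouring hosts.
    """
    graph = {host: set() for host in contacts}
    for a, b in combinations(contacts, 2):
        if len(contacts[a] & contacts[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group hosts into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical traffic summary: h1..h3 overlap pairwise, h4 is unrelated.
contacts = {
    "h1": {"a", "b", "c"},
    "h2": {"b", "c", "d"},
    "h3": {"c", "d", "e"},
    "h4": {"x", "y"},
}
groups = connected_components(mutual_contact_graph(contacts, min_shared=2))
print(sorted(sorted(g) for g in groups))
```

Here h1 and h3 share only one contact, yet they land in the same group through h2, which is the kind of transitive structure a community-level analysis can then examine.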

  • FRiPAL: Face Recognition in Privacy Abstraction Layer

    IEEE Conference on Dependable and Secure Computing

    Data-driven mobile applications are becoming increasingly popular in civilian and law-enforcement settings. RapidGather, for instance, is a smartphone application that collects data from individuals and disseminates rapid emergency responses. Image data is widely used in such applications, and machine learning methods could be utilized to analyze the image data. However, people would hesitate to share the data without protecting their privacy. In this paper, we propose to utilize dimensionality reduction techniques for privacy-preserving machine learning in face recognition for the image data. To demonstrate the proposed approach, we implement a client-server system, FRiPAL. With extensive experiments, we show that FRiPAL is efficient, and could preserve the privacy of data owners while maintaining the utility for data users.

  • PeerHunter: Detecting peer-to-peer botnets through community behavior analysis

    IEEE Conference on Dependable and Secure Computing

    Peer-to-peer (P2P) botnets have become one of the major threats in network security, serving as the infrastructure that is responsible for various cyber-crimes. Though a few existing works claim to detect traditional botnets effectively, the problem of detecting P2P botnets involves more challenges. In this paper, we present PeerHunter, a community behavior analysis based method, which is capable of detecting botnets that communicate via a P2P structure. PeerHunter starts from a P2P host detection component. Then, it uses mutual contacts as the main feature to cluster bots into communities. Finally, it uses community behavior analysis to detect potential botnet communities and further identify bot candidates. Through extensive experiments with real and simulated network traces, we show that PeerHunter achieves a very high detection rate with few false positives.


Courses

  • Advanced Protocols and Network Security

    CPR E 530

  • Assembly Language Programming

  • Computational Perception

    CPR E 575

  • Computer Network

  • Computer and Network Forensics

    CPR E 536

  • Cryptography and Coding Theory

    MAT 5932

  • Data Analytics in Electrical and Computer Engineering

    E E 525X

  • Design and Analysis of Algorithms

    CPR E 511

  • Embedded Control System

  • Information System Security

    CPR E 531

  • Information Warfare

    CPR E 532

  • Linear Programming

    I E 534

  • Machine Learning

    COM S 573

  • Network Science

    EEL 6935

  • Operating System

  • Principles of Computer Organization

  • Principles of Database System

  • Probabilistic Methods in Computer Engineering

    CPR E 528

  • Statistical Inference

    EEL 6936

Projects

  • Data-Driven Intelligence for Active Identification and Characterization (Sponsor: U.S. DoD)

    - Present

    - Led a team of 7 Ph.D. students in the EE Department of USF to build an active Identification and Characterization System (IDCS) that would actively integrate real-time information derived from physical and behavioral biometrics, efficiently process big data using machine learning/deep learning techniques, and effectively integrate the results using fusion algorithms

    - Proposed and implemented a Multi-classifier Active Fusion approach on the ISIC 2019 skin lesion dataset (8 classes, 25,331 images) using 96 DNN classifiers of 12 recent DNN architectures (e.g., SENet, PNASNet, NAS-Net, EfficientNet-B7, etc.), achieving an accuracy of 90.6% (vs. 89.1% for the weighted average ensemble algorithm and 88% for the SOTA DNN classifier, SENet, under the same settings) (PyTorch) (link to preprint: https://arxiv.org/pdf/2004.12064.pdf)

    - Proposed Split AI Architecture (SAIA) for automated medical diagnosis on resource-constrained mobile devices (e.g., Android smartphones), which can intelligently determine whether an image captured by the mobile device should be diagnosed by the embedded AI (TensorFlow Lite) or uploaded to the server and processed by more powerful server-side AI solutions, according to the accuracy requirement of disease diagnosis (done), communication availability (done), and energy consumption (ongoing) (link to preprint: https://arxiv.org/pdf/2004.12059.pdf)

    - Implemented SAIA on the ISIC 2019 skin lesion dataset (8 classes, 25,331 images) and a nail fungus dataset (2 classes, 53,794 useful images), and demonstrated that the design of SAIA dramatically enhances the efficiency and practicality of automated medical diagnosis on resource-constrained mobile devices (achieved an accuracy of 90.6% for skin lesion classification and an accuracy of 93.2% for nail fungus identification while sending 35%-55% fewer images to the server side)

  • Locally Differentially Private Deep Learning (LDP-DL) (Sponsor: Cyber Florida)

    - Present

    - Designed and developed a privacy-preserving distributed deep learning framework via local differential privacy and knowledge distillation (Differential Privacy, PyTorch, IBM Diffprivlib, NumPy)

    - Designed an active query sampling (active learning) approach, that actively selects a subset of the unlabeled data to query their labels from the LDP-DL models, and dramatically reduces the privacy budget required by LDP (25%-50% privacy budget reduction, evaluated on CIFAR10, MNIST, and Fashion-MNIST)
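
To make "privacy budget" concrete, here is a minimal sketch of one generic local differential privacy mechanism, k-ary randomized response, applied to a single label query. That the project uses this particular mechanism is an assumption; the text above names only local differential privacy in general. The function names and parameters are hypothetical.

```python
import math
import random


def keep_probability(epsilon, num_labels):
    """Probability of reporting the true label under k-ary randomized response."""
    e = math.exp(epsilon)
    return e / (e + num_labels - 1)


def randomized_response(label, num_labels, epsilon, rng=random):
    """Report `label` truthfully with the keep probability, else a uniform other label.

    The ratio between the probability of any output given two different true
    labels is at most exp(epsilon), which is the epsilon-LDP guarantee.
    """
    if rng.random() < keep_probability(epsilon, num_labels):
        return label
    other = rng.randrange(num_labels - 1)  # index among the k - 1 other labels
    return other if other < label else other + 1  # skip over the true label


rng = random.Random(0)
noisy = [randomized_response(3, num_labels=10, epsilon=1.0, rng=rng)
         for _ in range(5)]
print(noisy)
```

Under sequential composition each answered query spends its own epsilon, so issuing fewer, better-chosen queries lowers the total budget. That is the lever an active query sampling strategy pulls.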

  • Dynamic Community Detection in Social Networks (Sponsor: USF Dissertation Completion Fellowship)

    - Designed a novel, effective, and efficient Dynamic Community Detection (DCD) algorithm, DynaMo, that can dynamically detect non-overlapped communities in real-world social networks

    - Developed DynaMo on real-world social networks (e.g., Facebook of 60k users, Flickr of 780k users, DBLP of 1.4 million authors, YouTube of 3.2 million users) and 10k synthetic networks, and demonstrated that DynaMo consistently outperforms 5 recent DCD competitors in terms of effectiveness and is 2 to 5 times faster than the Louvain algorithm (one of the best static algorithms) (publication: https://ieeexplore.ieee.org/document/8890861 + code in Java: https://github.com/nogrady/dynamo)

  • Efficient Privacy-Preserving Machine Learning (Sponsor: DARPA)

    - Led a team of 4-7 EE/CS Ph.D. students at ISU/USF to develop Privacy-Preserving Machine Learning techniques to help data owners and data users make use of the data without compromising the data owner’s privacy; coordinated and successfully delivered collaborative research projects and demonstrations between team members at ISU/USF and Princeton University

    - Designed and developed the Face Recognition in Privacy Abstraction Layer (FRiPAL) system, where an Android app client can capture/select, segment, and send facial images to the server that conducts privacy-preserving facial recognition (using Java, MySQL, RabbitMQ, Protocol Buffers, Maven, VMware, and Docker) (publication: https://ieeexplore.ieee.org/document/8073826)

    - Designed and developed efficient privacy-preserving machine learning algorithms using dimensionality reduction (e.g., PCA), differential privacy, and homomorphic encryption (e.g., Paillier) techniques (Java and Python)

  • Peer-to-peer Botnet Detection using Community Behavior Analysis

    - Designed and developed a novel, effective, and efficient P2P botnet detection algorithm, PeerHunter, using network-flow level community behavior analysis (achieved 100% detection rate with less than 3% false-positive rate) (Java, MapReduce, Apache Hadoop, Apache Giraph) (publication: https://ieeexplore.ieee.org/document/8536452 + code in Java: https://github.com/nogrady/Enhanced_PeerHunter_Botnet_Detection)

  • Capturing Cognitive Fingerprints for Active Authentication (Sponsor: DARPA)

    - Proposed to capture cognitive fingerprints from individuals’ web browsing behavior and use them as biometrics for active authentication

    - Designed and developed Chrome extensions to collect, extract and analyze users’ web browsing behaviors for active authentication with machine learning approaches, involving Java, PHP, JavaScript, MySQL and HTML

  • Markov Chain Analysis on trend & number prediction of CyRide passengers

    - Cpr E 528: Probabilistic Methods in Computer Engineering course project

    - Developed several Markov Chain models for trend and ridership-count prediction on 8 years of real CyRide passenger data

    - Implemented and simulated each model using Matlab, Excel and Access
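
As a rough illustration of the kind of model involved, the sketch below fits a first-order Markov chain to a discretized sequence and takes one prediction step. The three ridership levels, the sample sequence, and the function names are made up for illustration. The project itself used Matlab, Excel and Access, and its actual models are not described here.

```python
def estimate_transition_matrix(states, num_states):
    """Maximum-likelihood transition matrix from an observed state sequence."""
    counts = [[0] * num_states for _ in range(num_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # States never left in the data get a uniform row so it sums to 1.
            matrix.append([1 / num_states] * num_states)
    return matrix


def step(distribution, matrix):
    """Advance a probability distribution over states by one time step."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n))
            for j in range(n)]


# Hypothetical daily ridership levels: 0 = low, 1 = medium, 2 = high.
observed = [0, 1, 1, 2, 1, 0, 1]
P = estimate_transition_matrix(observed, num_states=3)
print(step([0.0, 1.0, 0.0], P))  # distribution for the day after a medium day
```

Repeated calls to `step` give multi-day forecasts, and the row of `P` for the current state gives the one-day-ahead trend.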

  • Solving Sudoku with binary integer linear programming

    - IE 534 Linear Programming course project

    - Designed and implemented a Sudoku solving algorithm with binary integer linear programming using Matlab

    - Implemented an automatic Sudoku-solving program using mixed C# and Matlab
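
For context, the textbook binary integer programming formulation of Sudoku is sketched below. The project description does not give its own formulation, so this is the standard one and may differ in detail. The variable $x_{r,c,d}$ equals 1 when the cell in row $r$, column $c$ holds digit $d$.

```latex
\begin{align*}
& x_{r,c,d} \in \{0,1\}, \qquad r, c, d \in \{1,\dots,9\} \\
& \sum_{d=1}^{9} x_{r,c,d} = 1 \quad \forall\, r, c
  && \text{(exactly one digit per cell)} \\
& \sum_{c=1}^{9} x_{r,c,d} = 1 \quad \forall\, r, d
  && \text{(each digit once per row)} \\
& \sum_{r=1}^{9} x_{r,c,d} = 1 \quad \forall\, c, d
  && \text{(each digit once per column)} \\
& \sum_{r=3p+1}^{3p+3} \; \sum_{c=3q+1}^{3q+3} x_{r,c,d} = 1
  \quad \forall\, d,\; p, q \in \{0,1,2\}
  && \text{(each digit once per } 3 \times 3 \text{ box)} \\
& x_{r,c,d} = 1 \quad \text{for every given clue } (r, c, d)
\end{align*}
```

This is a pure feasibility problem with 729 binary variables and 324 equality constraints plus one per clue, so any constant objective (for example, minimize 0) works with a standard solver.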

  • Design & Implementation of an RSSI-based Indoor Location System

    - Designed an RSSI-based indoor location algorithm, which can, to a certain extent, overcome the dependence on the complex environment of the location area

    - Developed the Client module on the Android platform and implemented the Server module using C#

  • Campus Vehicle GPS Navigation System

    - Led a four-person team in designing and implementing a vehicle GPS navigation system on the ARM-Linux platform using the map of Nankai University

    - Implemented and tested the GPS signal acquisition module on ARM 11 (FriendlyArm mini6410)

    - Designed and implemented path selection algorithm using C

    - Implemented the GUI using Qt

  • Student Information Management System

    - Designed the back-end database of this system using Microsoft SQL Server

    - Implemented this browser/server (B/S) system using C# and ASP.NET

Honors & Awards

  • Allan R. Gondeck Memorial Scholarship

    University of South Florida

  • Dissertation Completion Fellowship

    University of South Florida

  • R. Howard and Hazel Porter Scholarship

    Iowa State University

    A scholarship for high-performing international university students.

Languages

  • English

    Full professional proficiency

  • Chinese

    Native or bilingual proficiency

Organizations

  • Phi Kappa Phi

    Member

    - Present
  • IEEE - Eta Kappa Nu (IEEE - HKN)

    Member

    - Present
  • Tau Beta Pi - The Engineering Honor Society

    Member

    - Present
  • IEEE

    Student Member
