- 2025
- New Algorithms for Incremental Minimum Spanning Trees and Temporal Graph Applications
- Xiangyun Ding, Yan Gu, Yihan Sun
- SIAM Conference on Applied and Computational Discrete Algorithms (ACDA 2025)
Abstract
Processing graphs with temporal information (temporal graphs) has become increasingly important in real-world applications. In this paper, we study efficient solutions to temporal graph applications using new algorithms for Incremental Minimum Spanning Trees (MST). The first contribution of this work is to formally discuss how a broad set of setting-problem combinations in temporal graph processing can be solved using incremental MST, along with their theoretical guarantees.
Despite the importance of the problem, we observe a gap between theory and practice in efficient incremental MST algorithms. While many classic data structures, such as the link-cut tree, provide strong bounds for incremental MST, their performance is limited in practice. Meanwhile, existing practical solutions used in applications do not have any non-trivial theoretical guarantees. Our second and main contribution is a set of new algorithms for incremental MST that are efficient both in theory and in practice. Our new data structure, the AM-tree, achieves the same theoretical bounds as the link-cut tree for temporal graph processing and shows strong performance in practice. In our experiments, the AM-tree performs competitively with or better than existing practical solutions thanks to its theoretical guarantees, and can be significantly faster than the link-cut tree (7.8-11x in updates and 7.7-13.7x in queries).
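To make the update rule concrete, below is a minimal, unoptimized sketch of incremental MST maintenance (illustrative only; this is not the paper's AM-tree): each inserted edge either links two components or replaces the heaviest edge on the tree path between its endpoints. The naive path search here costs O(n) per update; a link-cut tree (or the AM-tree) supports the same logic in O(log n) time.

```python
# Minimal sketch of the incremental MST update rule (not the AM-tree).
# The spanning forest is an adjacency map; inserting an edge either
# links two components or swaps out the heaviest edge on the tree path
# between the endpoints.
from collections import defaultdict

class NaiveIncrementalMST:
    def __init__(self):
        self.adj = defaultdict(dict)  # adj[u][v] = weight of tree edge (u, v)

    def _path(self, u, v):
        """Return the tree path from u to v as a list of edges, or None."""
        stack, parent = [u], {u: None}
        while stack:
            x = stack.pop()
            if x == v:
                edges = []
                while parent[x] is not None:
                    edges.append((parent[x], x))
                    x = parent[x]
                return edges
            for y in self.adj[x]:
                if y not in parent:
                    parent[y] = x
                    stack.append(y)
        return None  # u and v lie in different components

    def insert(self, u, v, w):
        path = self._path(u, v)
        if path is None:              # new edge links two components
            self.adj[u][v] = self.adj[v][u] = w
            return
        a, b = max(path, key=lambda e: self.adj[e[0]][e[1]])
        if self.adj[a][b] > w:        # replace the heaviest path edge
            del self.adj[a][b], self.adj[b][a]
            self.adj[u][v] = self.adj[v][u] = w
```

When edge weights are timestamps, as is common in temporal-graph applications, the same swap rule with reversed comparisons maintains a maximum spanning forest.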
- 2024
- Parallel and (Nearly) Work-Efficient Dynamic Programming
- Xiangyun Ding, Yan Gu, Yihan Sun
- The 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2024)
- Outstanding Paper! (Best Paper Finalist) 🎉
Abstract
The idea of dynamic programming (DP), proposed by Bellman in the 1950s, is one of the most important algorithmic techniques. In parallel, however, many fundamental and sequentially simple problems become more challenging, and remain open with respect to (nearly) work-efficient solutions (i.e., solutions whose work is off by at most a polylogarithmic factor from the best sequential solution). In fact, sequential DP algorithms employ many advanced optimizations, such as decision monotonicity or special data structures, to achieve better work than straightforward solutions. Many such optimizations are inherently sequential, which creates extra challenges for a parallel algorithm trying to achieve the same work bound.
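As a concrete illustration of such an optimized sequential baseline, here is a textbook sketch (not taken from the paper) of the classic O(n log n) algorithm for LIS, one of the problems studied below; the binary-search placement of each element is exactly the kind of inherently sequential step a work-efficient parallel algorithm must replicate.

```python
# Textbook O(n log n) sequential LIS (patience sorting): tails[k] holds
# the smallest possible tail of an increasing subsequence of length k+1,
# so tails stays sorted and each element is placed by binary search.
# That binary-search step is inherently sequential across elements.
import bisect

def lis_length(a):
    tails = []
    for x in a:
        i = bisect.bisect_left(tails, x)  # first tail >= x
        if i == len(tails):
            tails.append(x)               # extends the longest subsequence
        else:
            tails[i] = x                  # improves a tail in place
    return len(tails)

assert lis_length([3, 1, 4, 1, 5, 9, 2, 6]) == 4  # e.g., 1, 4, 5, 9
```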
The goal of this paper is to achieve (nearly) work-efficient parallel DP algorithms by parallelizing classic, highly optimized, and practical sequential algorithms. We show a general framework called the Cordon Algorithm for parallel DP algorithms, and use it to solve several classic problems. Our selection of problems includes Longest Increasing Subsequence (LIS), sparse Longest Common Subsequence (LCS), convex/concave generalized Least Weight Subsequence (LWS), Optimal Alphabetic Tree (OAT), and more. We show how the Cordon Algorithm can be used to achieve the same level of optimization as the sequential algorithms while achieving good parallelism. Many of our algorithms are conceptually simple, and we show some experimental results as proofs-of-concept.
- Fast and Space-Efficient Parallel Algorithms for Influence Maximization
- Letong Wang, Xiangyun Ding, Yan Gu, Yihan Sun
- Proceedings of the VLDB Endowment (VLDB), 2024
Abstract
Influence Maximization (IM) is a crucial problem in data science. The goal is to find a fixed-size set of highly influential seed vertices on a network to maximize the influence spread along the edges. While IM is NP-hard on commonly used diffusion models, a greedy algorithm can achieve a (1 - 1/e)-approximation by repeatedly selecting the vertex with the highest marginal gain in influence as the next seed. However, we observe two performance issues in existing work that prevent it from scaling to today's large-scale graphs: space-inefficient memoization to estimate marginal gain, and a time-inefficient seed selection process due to a lack of parallelism.
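For concreteness, below is a minimal sketch of this greedy baseline, with influence estimated by Monte Carlo simulation of the independent cascade (IC) model. The graph format, edge probability `p`, and simulation count are illustrative assumptions; PaC-IM replaces this brute-force estimation with (compressed) sketches and parallelism.

```python
# Sketch of the greedy (1 - 1/e)-approximate seed selection, with
# influence estimated by Monte Carlo simulation of the IC model.
import random

def ic_spread(adj, seeds, p=0.1, sims=200):
    """Average number of vertices activated from `seeds` under IC."""
    total = 0
    for _ in range(sims):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in adj.get(u, ()):
                if v not in active and random.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / sims

def greedy_im(adj, k):
    """adj: vertex -> list of neighbors (one entry per vertex)."""
    seeds = []
    for _ in range(k):
        base = ic_spread(adj, seeds)
        # pick the vertex with the highest estimated marginal gain
        best = max((u for u in adj if u not in seeds),
                   key=lambda u: ic_spread(adj, seeds + [u]) - base)
        seeds.append(best)
    return seeds
```

The repeated marginal-gain evaluations above are exactly where the space and time issues noted in the abstract arise.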
This paper significantly improves the scalability of IM using two key techniques. The first is a sketch-compression technique for the independent cascade model on undirected graphs, which allows combining the simulation and sketching approaches to achieve a time-space tradeoff. The second is a set of new data structures for parallel seed selection. Using our new approaches, we implemented PaC-IM: Parallel and Compressed IM.
We compare PaC-IM with state-of-the-art parallel IM systems on a 96-core machine with 1.5TB memory. PaC-IM can process the ClueWeb graph with 978M vertices and 75B edges in about 2 hours. On average, across all tested graphs, our uncompressed version is 5-18x faster and about 1.4x more space-efficient than existing parallel IM systems. Using compression further saves 3.8x space with only 70% overhead in time on average.
- 2023
- Efficient Parallel Output-Sensitive Edit Distance
- Xiangyun Ding, Xiaojun Dong, Yan Gu, Youzhe Liu, Yihan Sun
- 31st Annual European Symposium on Algorithms (ESA 2023)
- Best Paper Award! 🎉
Abstract
In this paper, we study efficient parallel edit distance algorithms, both in theory and in practice. Given two strings A[1..n] and B[1..m], and a set of operations allowed to edit the strings, the edit distance between A and B is the minimum number of operations required to transform A into B. In this paper, we use edit distance to refer to the Levenshtein distance, which allows for unit-cost single-character edits (insertions, deletions, substitutions). Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with Θ(nm) cost. In many real-world applications, the strings to be compared are similar to each other and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., algorithms that achieve asymptotically better cost bounds than the standard Θ(nm) algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS), and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using a parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaptation to make it output-sensitive. All of the algorithms studied in this paper have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space.
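To give a flavor of the BFS-based approach, here is a textbook sequential sketch of the O(nd) furthest-reaching-point idea behind Landau-Vishkin (and Myers' algorithm), where d is the edit distance. It is for intuition only and is not the paper's parallel algorithm, which additionally replaces the character-by-character match extension below with LCP queries.

```python
# O(nd) "diagonal BFS" sketch: fr[k] is the furthest row i reachable on
# diagonal k = j - i within d edits, extended greedily along free
# character matches before spending the next edit.
def edit_distance(a, b):
    n, m = len(a), len(b)
    fr = {0: 0}                          # diagonal -> furthest-reaching row
    for d in range(n + m + 1):
        nxt = {}
        for k, i in fr.items():
            j = i + k
            while i < n and j < m and a[i] == b[j]:   # slide along matches
                i, j = i + 1, j + 1
            if i == n and j == m:
                return d
            # spend one edit: insertion, deletion, substitution
            for k2, i2 in ((k + 1, i), (k - 1, i + 1), (k, i + 1)):
                if 0 <= i2 <= n and 0 <= i2 + k2 <= m:
                    nxt[k2] = max(nxt.get(k2, -1), i2)
        fr = nxt
    return n + m

assert edit_distance("kitten", "sitting") == 3
```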
We test and compare our algorithms on both synthetic data and real-world data, including DNA sequences, Wikipedia texts, GitHub repositories, etc. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. On cases with fewer than 10⁵ edits, our algorithm can process input sequences of size 10⁹ in about ten seconds, while ParlayLib can only process sequences of sizes up to 10⁶ in the same amount of time. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.
- 2022
- DP-Nets: Dynamic programming assisted quantization schemes for DNN compression and acceleration
- Dingcheng Yang, Wenjian Yu, Xiangyun Ding, Ao Zhou, Xiaoyi Wang
- Integration, the VLSI Journal
Abstract
In this work, we present effective quantization schemes called DP-Nets for the compression and acceleration of deep neural networks (DNNs). A key ingredient is a novel dynamic programming (DP) based algorithm that obtains the optimal solution of scalar K-means clustering. Based on it, two weight quantization approaches for compressing standard DNNs are proposed: DPR, which uses regularization, and DPQ, which uses a quantization function. Accordingly, a DP-Nets-based technique for inference acceleration is presented. Experiments show that DP-Nets produce models with higher inference accuracy than recently proposed counterparts while achieving the same or larger compression. They are also extended to compressing robust DNNs, and the relevant experiments show 16X compression of the robust ResNet-18 model with less than 3% accuracy drop on both natural and adversarial examples. The experiments with FPGA demonstrate that the technique for inference acceleration brings over 5X speedup on matrix–vector multiplication.
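The scalar K-means DP admits a compact formulation: after sorting, every cluster in an optimal solution is a contiguous interval. Below is a minimal O(K·n²) sketch of this classic DP (illustrative only; the paper's algorithm and its optimizations differ in detail).

```python
# Optimal scalar K-means by DP over sorted values, with sse(j, i) the
# within-interval sum of squared deviations (via prefix sums):
#   dp[k][i] = min_j dp[k-1][j-1] + sse(j, i)
def kmeans_1d(values, K):
    x = sorted(values)
    n = len(x)
    p1 = [0.0] * (n + 1)   # prefix sums of x
    p2 = [0.0] * (n + 1)   # prefix sums of x squared
    for i, v in enumerate(x):
        p1[i + 1] = p1[i] + v
        p2[i + 1] = p2[i] + v * v

    def sse(j, i):  # cost of putting x[j..i] (inclusive) in one cluster
        s, q, c = p1[i + 1] - p1[j], p2[i + 1] - p2[j], i - j + 1
        return q - s * s / c

    INF = float("inf")
    dp = [[INF] * n for _ in range(K + 1)]
    for i in range(n):
        dp[1][i] = sse(0, i)
    for k in range(2, K + 1):
        for i in range(k - 1, n):
            dp[k][i] = min(dp[k - 1][j - 1] + sse(j, i)
                           for j in range(k - 1, i + 1))
    return dp[K][n - 1]   # optimal total within-cluster cost

assert kmeans_1d([1, 2, 10], 2) == 0.5   # clusters {1, 2} and {10}
```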
- 2020
- Efficient model-based collaborative filtering with fast adaptive PCA
- Xiangyun Ding, Wenjian Yu, Yuyang Xie, Shenghua Liu
- IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI 2020)
Abstract
A model-based collaborative filtering (CF) approach utilizing fast adaptive randomized singular value decomposition (SVD) is proposed for the matrix completion problem in recommender systems. First, a fast adaptive PCA framework is presented, which combines the fixed-precision randomized matrix factorization algorithm [1] with acceleration techniques for handling large sparse data. Then, a novel termination mechanism for the adaptive PCA is proposed to automatically determine the number of latent factors needed to achieve near-optimal prediction accuracy in the subsequent model-based CF. The resulting CF approach achieves good accuracy while retaining high runtime efficiency. Experiments on real data show that the proposed adaptive PCA is up to 2.7X and 6.7X faster than the original fixed-precision SVD approach [1] and svds in Matlab respectively, while preserving accuracy. The proposed model-based CF approach is able to efficiently process the MovieLens data with 20M ratings and exhibits more than 10X speedup over the regularized matrix factorization based approach [2] and the fast singular value thresholding approach [3] with comparable or better accuracy. Compared with the deep-learning-based CF approach, the proposed approach is much more computationally efficient, with just marginal performance loss.
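For context, here is a minimal sketch of the basic randomized SVD (Halko et al.) that fixed-precision algorithms like [1] build on. This fixed-rank version is illustrative only: the adaptive PCA in the paper instead grows the sample until a precision target is met, with accelerations for large sparse data, and its termination rule picks the number of latent factors automatically.

```python
# Basic fixed-rank randomized SVD: sample the range of A with a Gaussian
# test matrix, sharpen the basis with power iterations, then take an
# exact SVD in the small projected subspace.
import numpy as np

def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A, then orthonormalize via QR.
    Q = np.linalg.qr(A @ rng.standard_normal((n, rank + oversample)))[0]
    for _ in range(power_iters):          # power iterations sharpen the basis
        Q = np.linalg.qr(A @ (A.T @ Q))[0]
    # Project onto the small subspace and take an exact SVD there.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]
```

In the CF setting, the top singular vectors returned here play the role of the latent factors used for rating prediction.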