Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2026
- MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation. Advances in Ninth Annual Conference on Machine Learning and Systems, MLSys 2026, 2026
- From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters. Advances in 40th IEEE International Parallel & Distributed Processing Symposium, IPDPS 2026, 2026
2025
- Training ultra long context language model with fully pipelined distributed transformer. Advances in Eighth Annual Conference on Machine Learning and Systems, MLSys 2025, 2025
2024
- Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. Advances in 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS 24), 2024
2023
- Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. Advances in 30th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC 23), 2023
- A novel framework for efficient offloading of communication operations to BlueField SmartNICs. In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS 23), 2023
- MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators. In Proceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023
2021
- Soft: Softmax-free transformer with linear complexity. Advances in Neural Information Processing Systems (NeurIPS 21), 2021
2020
- SPRNet: single-pixel reconstruction for one-stage instance segmentation. IEEE Transactions on Cybernetics, 2020