CS59200: AI/DC Networking
Instructor: Vamsi Addanki
Department: Computer Science, Purdue University
Semester: Fall 2025
Location: LWSN B134
Time: TR 3:00PM - 4:15PM
Credit hours: 3
Email: vaddank@purdue.edu
Target Audience
This course is particularly valuable for early-stage PhD students in computer science who are interested in pursuing research at the intersection of AI, networking, and computing systems.
Course Overview
Have you ever wondered what makes massive AI models possible?
How do thousands of GPUs communicate to train today’s largest models? Can AI itself help configure and optimize the networks that connect them?
From cutting-edge datacenter architectures to adaptive photonic interconnects that bend light, this course dives into the critical role networking plays in enabling AI at scale — and how AI, in turn, is transforming the way we build and manage networks.
In this course, we will explore the algorithms, architectures, and open research challenges at the frontier where AI and networking meet.
This seminar-style course covers a range of topics, including datacenter network topologies, collective communication algorithms (GPU-to-GPU communication), photonic interconnects, network congestion control and load balancing, AI-assisted algorithm design, and the use of AI in network management and optimization. The tentative weekly schedule is below. The optional readings are pointers into the surrounding literature; exploring them is strongly recommended but not required for the course.
Tentative Schedule
| Week | Date | Paper Title | Presenter | Optional Reading |
|------|------|-------------|-----------|------------------|
| | | **Warmup!** | | |
| 1 | Aug 26 | Introduction (Slides) | All | – |
| 1 | Aug 28 | How to read a paper [1] | All/Discussion | [2, 3] |
| | | **LLM Training Architectures (Hyperscaler Experience)** | | |
| 2 | Sep 2 | RDMA over Ethernet for Distributed Training at Meta Scale [4] | [Student] | [5, 6, 7] |
| 2 | Sep 4 | Alibaba HPN: A Data Center Network for LLM Training [8] | [Student] | [9, 10] |
| | | **Collective Communication I: Primitives & AllReduce** | | |
| 3 | Sep 9 | Optimization of Collective Communication in MPICH [11] | [Student] | [12] |
| 3 | Sep 11 | Swing: Short-cutting Rings for Higher Bandwidth AllReduce [13] | [Student] | [14] |
| | | **Collective Communication II: Synthesis** | | |
| 4 | Sep 16 | Synthesizing Optimal Collective Algorithms [15] | [Student] | [16] |
| 4 | Sep 18 | Collectives as Multi-Commodity Flow [17] | [Student] | [18, 19] |
| | | **Collective Communication III: Stragglers** | | |
| 5 | Sep 23 | Accelerating AllReduce with a Persistent Straggler [20] | [Student] | [21] |
| 5 | Sep 25 | OptiReduce: Tail-Optimal AllReduce [22] | [Student] | [23, 24] |
| | | *Assignment 1 Due* | | |
| | | **Photonic Interconnects I: Oblivious & Traffic-Aware** | | |
| 6 | Sep 30 | RotorNet [25] | [Student] | [26, 27, 28, 29] |
| 6 | Oct 2 | Scheduling in Hybrid Networks [30] | [Student] | [31, 32, 33, 34] |
| | | **Photonic Interconnects II: TPU Clusters** | | |
| 7 | Oct 7 | TPU v4 Supercomputer [35] | [Student] | [36] |
| 7 | Oct 9 | Resiliency at Scale: TPUv4 [37] | [Student] | [38] |
| | | **Photonic Interconnects III: Topologies for Collectives** | | |
| 8 | Oct 14 | SiP-ML [39] | [Student] | [40] |
| 8 | Oct 16 | TopoOpt [41] | [Student] | [42] |
| | | *Assignment 2 Due* | | |
| | | **Photonic Interconnects IV: Chip-to-Chip** | | |
| 9 | Oct 21 | Server-Scale Photonic Connectivity [43] | [Student] | [44, 45] |
| 9 | Oct 23 | Midterm Examination | – | – |
| | | **AI for Networks I: LLMs & Fun Stuff** | | |
| 10 | Oct 28 | Enhancing Network Management Using Code Generated by LLMs [46] | [Student] | [47] |
| 10 | Oct 30 | What do LLMs need to Synthesize Correct Router Configurations? [48] | [Student] | [49, 50] |
| | | **AI for Networks II: Performance Guarantees** | | |
| 11 | Nov 4 | Credence: Augmenting Switch Buffer Sharing with ML Predictions [51] | [Student] | [52] |
| 11 | Nov 6 | Towards Integrating Formal Methods into ML-Based Systems [53] | [Student] | [54] |
| | | **AI for Networks III: Wide-Area Networks** | | |
| 12 | Nov 11 | DOTE: Rethinking (Predictive) WAN Traffic Engineering [55] | [Student] | [56] |
| 12 | Nov 13 | Transferable Neural WAN TE for Changing Topologies [57] | [Student] | [58] |
| | | **AI for Networks IV: Congestion Control** | | |
| 13 | Nov 18 | TCP ex Machina [59] | [Student] | [60] |
| 13 | Nov 20 | PCC: Re-architecting Congestion Control [61] | [Student] | [62] |
| | | *Assignment 3 Due* | | |
| | | **Thanksgiving Break** | | |
| 14 | Nov 25 | – | – | – |
| 14 | Nov 27 | – | – | – |
| | | **Projects & Feedback** | | |
| 15 | Dec 2 | Project submissions due next week | – | – |
| 15 | Dec 4 | Project submissions due next week | – | – |
| | | **Finals Week** | | |
| 16 | Dec 9 | Final Presentations (All) | – | – |
| 16 | Dec 11 | Final Presentations (All) | – | – |
Assignments, Midterm, and Final Project
The course is structured around student-led presentations and discussions held during weekly sessions, with the instructor providing guidance and facilitating exploration of the material. Each student will present assigned research papers to the class and participate in discussions to enhance collective understanding. Course evaluation is based on three assignments, one midterm exam, and a final research project.
- Assignment 1. Each student will be assigned an AllReduce algorithm (or a synthesized variant) to implement in the Astra-Sim simulator. The simulation should use a ring topology of 16 nodes, each connected by 400 Gbps links with a 500 ns propagation delay. The goal is to evaluate the algorithm's completion time and compare it against the baseline Ring AllReduce across a range of message sizes (see the cost-model sketch after this list).
- Assignment 2. Building on the first assignment, extend the implementation to a reconfigurable ring topology where nodes are connected via a photonic switch. The objective is to optimize the circuit-switching schedule to minimize the AllReduce completion time (the sketch after this list includes one illustrative baseline). Students may submit either: (i) a proof showing the minimized completion time based on an optimized schedule, or (ii) simulation results using Astra-Sim, along with a clear description of the optimization method used.
- Assignment 3.
- Option 1: Implement the assigned AllReduce algorithm with the NVIDIA Collective Communication Library (NCCL), either by writing CUDA code directly or through PyTorch, and evaluate its performance on an 8-GPU cluster (access will be provided). The nccl-tests repository provides a good starting point for the implementation; a minimal PyTorch benchmark appears after this list.
- Option 2: Extend the HTTP/3 QUIC transport protocol with a learning-augmented congestion control algorithm (e.g., using Cubic or Reno as the base algorithm and leveraging ML predictions about network conditions) and implement it in aioquic. Test the final implementation by sending iperf traffic to different remote servers and compare its throughput, latency, and loss against the baseline QUIC implementation. Students may explore any learning algorithm of their choice, with emphasis on the techniques and methods discussed in the course schedule (a self-contained window-update sketch appears after this list).
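For Assignments 1 and 2, the standard alpha-beta cost model from the collective communication literature (e.g., Thakur et al. [11]) gives closed-form completion-time estimates that are useful as sanity checks against Astra-Sim output. The sketch below is back-of-the-envelope only: the recursive-halving schedule and the 1 µs reconfiguration delay `delta` are illustrative assumptions, not part of the assignment specification.

```python
import math

# alpha: per-step latency (here, the 500 ns propagation delay);
# beta:  seconds per byte on a 400 Gbps link.
# The model ignores congestion and compute overlap, so treat the
# output as a sanity check for simulation results, not ground truth.

def ring_allreduce_time(n, msg_bytes, alpha, beta):
    """Ring AllReduce: 2(n-1) steps, each carrying msg_bytes/n."""
    return 2 * (n - 1) * (alpha + (msg_bytes / n) * beta)

def reconfigurable_allreduce_time(n, msg_bytes, alpha, beta, delta):
    """One illustrative Assignment 2 baseline (an assumption, not the
    required answer): recursive halving/doubling, with the photonic
    switch re-wiring the ring before each step at a cost of delta."""
    steps = int(math.log2(n))
    # reduce-scatter halves the payload each step; all-gather mirrors it
    return 2 * sum(alpha + delta + (msg_bytes / 2 ** (k + 1)) * beta
                   for k in range(steps))

n = 16              # ring nodes (Assignment 1 setup)
alpha = 500e-9      # 500 ns propagation delay
beta = 8 / 400e9    # seconds per byte on a 400 Gbps link

for mb in (1, 16, 256):
    size = mb * 2 ** 20
    t_ring = ring_allreduce_time(n, size, alpha, beta)
    t_reconf = reconfigurable_allreduce_time(n, size, alpha, beta, delta=1e-6)
    print(f"{mb:4d} MiB: ring {t_ring * 1e3:8.3f} ms, "
          f"reconfigured {t_reconf * 1e3:8.3f} ms")
```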
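For Option 1, a minimal PyTorch benchmark in the spirit of nccl-tests might look as follows, launched with `torchrun --nproc_per_node=8 allreduce_bench.py`. The script name, message sizes, and iteration count are arbitrary choices, not requirements of the assignment.

```python
import os
import time
import torch
import torch.distributed as dist

def main():
    # one process per GPU; torchrun sets RANK/WORLD_SIZE/LOCAL_RANK
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    for numel in (1 << 20, 1 << 24, 1 << 26):  # 4 MB .. 256 MB of fp32
        x = torch.ones(numel, device="cuda")
        dist.all_reduce(x)                      # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        iters = 10
        for _ in range(iters):
            dist.all_reduce(x, op=dist.ReduceOp.SUM)
        torch.cuda.synchronize()
        elapsed = (time.perf_counter() - start) / iters
        if dist.get_rank() == 0:
            gbytes = numel * x.element_size() / 1e9
            print(f"{gbytes:7.3f} GB: {elapsed * 1e3:8.3f} ms/iter")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```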
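For Option 2, the sketch below illustrates the learning-augmented idea in isolation: a Reno-style window update blended with an ML prediction of the bandwidth-delay product (BDP). It deliberately avoids aioquic's internal congestion-control interface, which varies across releases (inspect `aioquic.quic.congestion` in your installed version before integrating). The `trust` parameter, the BDP predictor, and the clamping rule are assumptions for illustration, echoing the consistency-robustness trade-off in learning-augmented algorithms such as Credence [51].

```python
MSS = 1280  # bytes; a typical QUIC max datagram payload

def reno_on_ack(cwnd, ssthresh, acked_bytes):
    """Baseline Reno: slow start below ssthresh, then linear growth."""
    if cwnd < ssthresh:
        return cwnd + acked_bytes                  # slow start
    return cwnd + MSS * acked_bytes // cwnd        # congestion avoidance

def augmented_on_ack(cwnd, ssthresh, acked_bytes, predicted_bdp, trust=0.5):
    """Blend the Reno update with an ML prediction of the BDP.

    trust=0 falls back to pure Reno (robustness); trust=1 jumps straight
    to the predicted operating point (consistency). Clamping the blended
    window between the two keeps worst-case behavior bounded.
    """
    base = reno_on_ack(cwnd, ssthresh, acked_bytes)
    blended = (1 - trust) * base + trust * predicted_bdp
    return int(max(min(blended, max(base, predicted_bdp)), MSS))

# toy usage: a (hypothetical) predictor says the pipe holds ~100 packets
cwnd, ssthresh = 10 * MSS, 64 * MSS
for _ in range(20):
    cwnd = augmented_on_ack(cwnd, ssthresh, acked_bytes=10 * MSS,
                            predicted_bdp=100 * MSS, trust=0.5)
print(f"final cwnd: {cwnd / MSS:.1f} packets")
```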
Midterm: The format of the midterm will be announced during the semester and will focus on the core concepts underlying the algorithms and protocol designs covered in the weekly readings.
Final project: The course concludes with a final research project, to be submitted via an internal HotCRP portal. Evaluation of the project will primarily consider the originality of the algorithmic or systems design proposal, the depth of related work understanding, and the quality of the presentation.
Learning Objectives
- Develop critical thinking in networking.
- Lead and participate in academic discussions.
- Analyze and present research papers.
- Explore and propose innovative solutions.
- Implement/test GPU communication algorithms.
- Write research papers on AI/DC networking.
Evaluation
- Assignments: 25%
- Midterm exam: 25%
- Final project/paper: 50%
References
How to read a paper
S. Keshav.
SIGCOMM Comput. Commun. Rev.,
2007.
Abstract
BibTeX
Click
here to close the dropdown!
Researchers spend a great deal of time reading research papers. However, this skill is rarely taught, leading to much wasted effort. This article outlines a practical and efficient three-pass method for reading research papers. I also describe how to use this method to do a literature survey.
Click
here to close the dropdown!
@article{10.1145/1273445.1273458,
author = {Keshav, S.},
title = {How to read a paper},
year = {2007},
issue_date = {July 2007},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {37},
number = {3},
issn = {0146-4833},
url = {https://doi.org/10.1145/1273445.1273458},
doi = {10.1145/1273445.1273458},
journal = {SIGCOMM Comput. Commun. Rev.},
pages = {83–84},
numpages = {2},
keywords = {hints, paper, reading}
}
Writing reviews for systems conferences
Timothy Roscoe.
BibTeX
Click
here to close the dropdown!
@misc{roscoe2007writing,
title = {Writing reviews for systems conferences},
author = {Roscoe, Timothy},
year = {2007},
url = {https://people.inf.ethz.ch/troscoe/pubs/review-writing.pdf}
}
Writing Technical Articles
Henning Schulzrinne.
BibTeX
Click
here to close the dropdown!
@misc{henning,
title = {Writing Technical Articles},
author = {Schulzrinne, Henning},
url = {https://www.cs.columbia.edu/~hgs/etc/writing-style.html}
}
RDMA over Ethernet for Distributed Training at Meta Scale
Adithya Gangidi, Rui Miao, Shengbao Zheng, Sai Jayesh Bondu, Guilherme Goes, Hany Morsy, Rohit Puri, Mohammad Riftadi, Ashmitha Jeevaraj Shetty, Jingyi Yang, Shuqiang Zhang, Mikel Jimenez Fernandez, Shashidhar Gandham, and Hongyi Zeng.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
The rapid growth in both computational density and scale in AI models in recent years motivates the construction of an efficient and reliable dedicated network infrastructure. This paper presents the design, implementation, and operation of Meta’s Remote Direct Memory Access over Converged Ethernet (RoCE) networks for distributed AI training.Our design principles involve a deep understanding of the workloads, and we translated these insights into the design of various network components: Network Topology - To support the rapid evolution of generations of AI hardware platforms, we separated GPU-based training into its own "backend" network. Routing - Training workloads inherently impose load imbalance and burstiness, so we deployed several iterations of routing schemes to achieve near-optimal traffic distribution. Transport - We outline how we initially attempted to use DCQCN for congestion management but then pivoted away from DCQCN to instead leverage the collective library itself to manage congestion. Operations - We share our experience operating large-scale AI networks, including toolings we developed and troubleshooting examples.
Click
here to close the dropdown!
@inproceedings{10.1145/3651890.3672233,
author = {Gangidi, Adithya and Miao, Rui and Zheng, Shengbao and Bondu, Sai Jayesh and Goes, Guilherme and Morsy, Hany and Puri, Rohit and Riftadi, Mohammad and Shetty, Ashmitha Jeevaraj and Yang, Jingyi and Zhang, Shuqiang and Fernandez, Mikel Jimenez and Gandham, Shashidhar and Zeng, Hongyi},
title = {RDMA over Ethernet for Distributed Training at Meta Scale},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672233},
doi = {10.1145/3651890.3672233},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {57–70},
numpages = {14},
keywords = {RDMA, distributed training},
series = {ACM SIGCOMM '24}
}
RDMA over Commodity Ethernet at Scale
Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitu Padhye, and Marina Lipshteyn.
Proceedings of the 2016 ACM SIGCOMM Conference,
Florianopolis, Brazil,
2016.
Abstract
BibTeX
Click
here to close the dropdown!
Over the past one and half years, we have been using RDMA over commodity Ethernet (RoCEv2) to support some of Microsoft’s highly-reliable, latency-sensitive services. This paper describes the challenges we encountered during the process and the solutions we devised to address them. In order to scale RoCEv2 beyond VLAN, we have designed a DSCP-based priority flow-control (PFC) mechanism to ensure large-scale deployment. We have addressed the safety challenges brought by PFC-induced deadlock (yes, it happened!), RDMA transport livelock, and the NIC PFC pause frame storm problem. We have also built the monitoring and management systems to make sure RDMA works as expected. Our experiences show that the safety and scalability issues of running RoCEv2 at scale can all be addressed, and RDMA can replace TCP for intra data center communications and achieve low latency, low CPU overhead, and high throughput.
Click
here to close the dropdown!
@inproceedings{10.1145/2934872.2934908,
author = {Guo, Chuanxiong and Wu, Haitao and Deng, Zhong and Soni, Gaurav and Ye, Jianxi and Padhye, Jitu and Lipshteyn, Marina},
title = {RDMA over Commodity Ethernet at Scale},
year = {2016},
isbn = {9781450341936},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2934872.2934908},
doi = {10.1145/2934872.2934908},
booktitle = {Proceedings of the 2016 ACM SIGCOMM Conference},
pages = {202–215},
numpages = {14},
keywords = {RoCEv2, RDMA, PFC propagation, PFC, Deadlock},
series = {SIGCOMM '16}
}
ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms
Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, and Tushar Krishna.
2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS),
2020.
BibTeX
Click
here to close the dropdown!
@inproceedings{a9238637,
author = {Rashidi, Saeed and Sridharan, Srinivas and Srinivasan, Sudarshan and Krishna, Tushar},
booktitle = {2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)},
title = {ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms},
year = {2020},
volume = {},
number = {},
pages = {81-92},
keywords = {Training;Technological innovation;Navigation;Network topology;Software algorithms;Software;Scheduling;Distributed training;Collective communication;Training parallelism;High performance training systems},
doi = {10.1109/ISPASS48437.2020.00018}
}
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, and Tushar Krishna.
2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS),
2023.
BibTeX
Click
here to close the dropdown!
@inproceedings{a10158106,
author = {Won, William and Heo, Taekyung and Rashidi, Saeed and Sridharan, Srinivas and Srinivasan, Sudarshan and Krishna, Tushar},
booktitle = {2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)},
title = {ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale},
year = {2023},
volume = {},
number = {},
pages = {283-294},
keywords = {Training;Semiconductor device modeling;Analytical models;Network topology;Systems modeling;Throughput;Data models;Distributed training;High-performance training;Multi-dimensional network;Disaggregated memory system},
doi = {10.1109/ISPASS57527.2023.00035}
}
Alibaba HPN: A Data Center Network for Large Language Model Training
Kun Qian, Yongqing Xi, Jiamin Cao, Jiaqi Gao, Yichi Xu, Yu Guan, Binzhang Fu, Xuemei Shi, Fangbo Zhu, Rui Miao, Chao Wang, Peng Wang, Pengcheng Zhang, Xianlong Zeng, Eddie Ruan, Zhiping Yao, Ennan Zhai, and Dennis Cai.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
This paper presents HPN, Alibaba Cloud’s data center network for large language model (LLM) training. Due to the differences between LLMs and general cloud computing (e.g., in terms of traffic patterns and fault tolerance), traditional data center networks are not well-suited for LLM training. LLM training produces a small number of periodic, bursty flows (e.g., 400Gbps) on each host. This characteristic of LLM training predisposes Equal-Cost Multi-Path (ECMP) to hash polarization, causing issues such as uneven traffic distribution. HPN introduces a 2-tier, dual-plane architecture capable of interconnecting 15K GPUs within one Pod, typically accommodated by the traditional 3-tier Clos architecture. Such a new architecture design not only avoids hash polarization but also greatly reduces the search space for path selection. Another challenge in LLM training is that its requirement for GPUs to complete iterations in synchronization makes it more sensitive to singlepoint failure (typically occurring on ToR). HPN proposes a new dual-ToR design to replace the single-ToR in traditional data center networks. HPN has been deployed in our production for more than eight months. We share our experience in designing, and building HPN, as well as the operational lessons of HPN in production.
Click
here to close the dropdown!
@inproceedings{10.1145/3651890.3672265,
author = {Qian, Kun and Xi, Yongqing and Cao, Jiamin and Gao, Jiaqi and Xu, Yichi and Guan, Yu and Fu, Binzhang and Shi, Xuemei and Zhu, Fangbo and Miao, Rui and Wang, Chao and Wang, Peng and Zhang, Pengcheng and Zeng, Xianlong and Ruan, Eddie and Yao, Zhiping and Zhai, Ennan and Cai, Dennis},
title = {Alibaba HPN: A Data Center Network for Large Language Model Training},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672265},
doi = {10.1145/3651890.3672265},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {691–706},
numpages = {16},
keywords = {network architecture, AI infrastructure, large language model, model training, data center networks},
series = {ACM SIGCOMM '24}
}
SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision
Xizheng Wang, Qingxu Li, Yichi Xu, Gang Lu, Dan Li, Li Chen, Heyang Zhou, Linkang Zheng, Sen Zhang, Yikai Zhu, Yang Liu, Pengcheng Zhang, Kun Qian, Kunling He, Jiaqi Gao, Ennan Zhai, Dennis Cai, and Binzhang Fu.
22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25),
2025.
BibTeX
Click
here to close the dropdown!
@inproceedings{a305348,
author = {Wang, Xizheng and Li, Qingxu and Xu, Yichi and Lu, Gang and Li, Dan and Chen, Li and Zhou, Heyang and Zheng, Linkang and Zhang, Sen and Zhu, Yikai and Liu, Yang and Zhang, Pengcheng and Qian, Kun and He, Kunling and Gao, Jiaqi and Zhai, Ennan and Cai, Dennis and Fu, Binzhang},
title = {{SimAI}: Unifying Architecture Design and Performance Tuning for {Large-Scale} Large Language Model Training with Scalability and Precision},
booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
year = {2025},
isbn = {978-1-939133-46-5},
address = {Philadelphia, PA},
pages = {541--558},
url = {https://www.usenix.org/conference/nsdi25/presentation/wang-xizheng-simai},
publisher = {USENIX Association}
}
I’ve Got 99 Problems But FLOPS Ain’t One
Alexandru M. Gherghescu, Vlad-Andrei Bădoiu, Alexandru Agache, Mihai-Valentin Dumitru, Iuliu Vasilescu, Radu Mantu, and Costin Raiciu.
Proceedings of the 23rd ACM Workshop on Hot Topics in Networks,
Irvine, CA, USA,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
Hyperscalers dominate the landscape of large network deployments, yet they rarely share data or insights about the challenges they face. In light of this supremacy, what problems can we find to solve in this space? We take an unconventional approach to find relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications [53]. Leveraging the language models scaling laws, we discover what workloads such a datacenter might carry and explore the challenges one may encounter in doing so, with a focus on networking research. We conclude that building the datacenter and training such models is technically possible, but this requires novel wide-area transports for inter-DC communication, a multipath transport and novel datacenter topologies for intra-datacenter communication, high speed scale-up networks and transports, outlining a rich research agenda for the networking community.
Click
here to close the dropdown!
@inproceedings{a10.1145/3696348.3696893,
author = {Gherghescu, Alexandru M. and B\u{a}doiu, Vlad-Andrei and Agache, Alexandru and Dumitru, Mihai-Valentin and Vasilescu, Iuliu and Mantu, Radu and Raiciu, Costin},
title = {I've Got 99 Problems But FLOPS Ain't One},
year = {2024},
isbn = {9798400712722},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3696348.3696893},
doi = {10.1145/3696348.3696893},
booktitle = {Proceedings of the 23rd ACM Workshop on Hot Topics in Networks},
pages = {195–204},
numpages = {10},
keywords = {Datacenter Networking, Large Language Models Training},
series = {HotNets '24}
}
Optimization of Collective Communication Operations in MPICH
Rajeev Thakur, Rolf Rabenseifner, and William Gropp.
The International Journal of High Performance Computing Applications,
2005.
BibTeX
Click
here to close the dropdown!
@article{doi:10.1177/1094342005051521,
author = {Thakur, Rajeev and Rabenseifner, Rolf and Gropp, William},
title = {Optimization of Collective Communication Operations in MPICH},
journal = {The International Journal of High Performance Computing Applications},
volume = {19},
number = {1},
pages = {49-66},
year = {2005},
doi = {10.1177/1094342005051521},
url = {https://doi.org/10.1177/1094342005051521},
eprint = {https://doi.org/10.1177/1094342005051521}
}
Collective communication: theory, practice, and experience
Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn.
Concurrency and Computation: Practice and Experience,
2007.
Abstract
BibTeX
Click
here to close the dropdown!
Abstract We discuss the design and high-performance implementation of collective communications operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that have improved performance in most situations compared to those currently supported by public domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4 (R) processor cluster are included. Copyright © 2007 John Wiley & Sons, Ltd.
Click
here to close the dropdown!
@article{https://doi.org/10.1002/cpe.1206,
author = {Chan, Ernie and Heimlich, Marcel and Purkayastha, Avi and van de Geijn, Robert},
title = {Collective communication: theory, practice, and experience},
journal = {Concurrency and Computation: Practice and Experience},
volume = {19},
number = {13},
pages = {1749-1783},
keywords = {collective communication, distributed-memory architecture, clusters},
doi = {https://doi.org/10.1002/cpe.1206},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.1206},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.1206},
year = {2007}
}
Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Daniele De Sensi, Tommaso Bonato, David Saam, and Torsten Hoefler.
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24),
2024.
BibTeX
Click
here to close the dropdown!
@inproceedings{a295653,
author = {Sensi, Daniele De and Bonato, Tommaso and Saam, David and Hoefler, Torsten},
title = {Swing: Short-cutting Rings for Higher Bandwidth Allreduce},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {1445--1462},
url = {https://www.usenix.org/conference/nsdi24/presentation/de-sensi},
publisher = {USENIX Association}
}
A generalization of the allreduce operation
Dmitry Kolmakov and Xuecang Zhang.
arXiv preprint arXiv:2004.09362,
2020.
BibTeX
Click
here to close the dropdown!
@article{kolmakov2020generalization,
title = {A generalization of the allreduce operation},
author = {Kolmakov, Dmitry and Zhang, Xuecang},
journal = {arXiv preprint arXiv:2004.09362},
url = {https://arxiv.org/abs/2004.09362},
year = {2020}
}
Synthesizing optimal collective algorithms
Zixian Cai, Zhengyang Liu, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Jacob Nelson, and Olli Saarikivi.
Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
Virtual Event, Republic of Korea,
2021.
Abstract
BibTeX
Click
here to close the dropdown!
Collective communication algorithms are an important component of distributed computation. Indeed, in the case of deep-learning, collective communication is the Amdahl’s bottleneck of data-parallel training.This paper introduces SCCL (for Synthesized Collective Communication Library), a systematic approach to synthesizing collective communication algorithms that are explicitly tailored to a particular hardware topology. SCCL synthesizes algorithms along the Pareto-frontier spanning from latency-optimal to bandwidth-optimal implementations of a collective. The paper demonstrates how to encode the synthesis problem as a quantifier-free SMT formula which can be discharged to a theorem prover. We show how our carefully built encoding enables SCCL to scale.We synthesize novel latency and bandwidth optimal algorithms not seen in the literature on two popular hardware topologies. We also show how SCCL efficiently lowers algorithms to implementations on two hardware architectures (NVIDIA and AMD) and demonstrate competitive performance with hand optimized collective communication libraries.
Click
here to close the dropdown!
@inproceedings{10.1145/3437801.3441620,
author = {Cai, Zixian and Liu, Zhengyang and Maleki, Saeed and Musuvathi, Madanlal and Mytkowicz, Todd and Nelson, Jacob and Saarikivi, Olli},
title = {Synthesizing optimal collective algorithms},
year = {2021},
isbn = {9781450382946},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3437801.3441620},
doi = {10.1145/3437801.3441620},
booktitle = {Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {62–75},
numpages = {14},
keywords = {synthesis, network, interconnection, collective communication, GPU},
series = {PPoPP '21}
}
Blink: Fast and Generic Collectives for Distributed ML
Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Nikhil Devanur, Jorgen Thelin, and Ion Stoica.
Proceedings of Machine Learning and Systems,
2020.
BibTeX
Click
here to close the dropdown!
@inproceedings{MLSYS2020_cd3a9a55,
author = {Wang, Guanhua and Venkataraman, Shivaram and Phanishayee, Amar and Devanur, Nikhil and Thelin, Jorgen and Stoica, Ion},
booktitle = {Proceedings of Machine Learning and Systems},
editor = {Dhillon, I. and Papailiopoulos, D. and Sze, V.},
pages = {172--186},
title = {Blink: Fast and Generic Collectives for Distributed ML},
url = {https://proceedings.mlsys.org/paper_files/paper/2020/file/cd3a9a55f7f3723133fa4a13628cdf03-Paper.pdf},
volume = {2},
year = {2020}
}
Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
Xuting Liu, Behnaz Arzani, Siva Kesava Reddy Kakarla, Liangyu Zhao, Vincent Liu, Miguel Castro, Srikanth Kandula, and Luke Marshall.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
Cloud operators utilize collective communication optimizers to enhance the efficiency of the single-tenant, centrally managed training clusters they manage. However, current optimizers struggle to scale for such use cases and often compromise solution quality for scalability. Our solution, TE-CCL, adopts a traffic-engineering-based approach to collective communication. Compared to a state-of-the-art optimizer, TACCL, TE-CCL produced schedules with 2\texttimes better performance on topologies TACCL supports (and its solver took a similar amount of time as TACCL’s heuristic-based approach). TECCL additionally scales to larger topologies than TACCL. On our GPU testbed, TE-CCL outperformed TACCL by 2.14\texttimes and RCCL by 3.18\texttimes in terms of algorithm bandwidth.
Click
here to close the dropdown!
@inproceedings{10.1145/3651890.3672249,
author = {Liu, Xuting and Arzani, Behnaz and Kakarla, Siva Kesava Reddy and Zhao, Liangyu and Liu, Vincent and Castro, Miguel and Kandula, Srikanth and Marshall, Luke},
title = {Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672249},
doi = {10.1145/3651890.3672249},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {16–37},
numpages = {22},
keywords = {GPU, collective communication, traffic engineering},
series = {ACM SIGCOMM '24}
}
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, and Rachee Singh.
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23),
2023.
BibTeX
Click
here to close the dropdown!
@inproceedings{a285084,
author = {Shah, Aashaka and Chidambaram, Vijay and Cowan, Meghan and Maleki, Saeed and Musuvathi, Madan and Mytkowicz, Todd and Nelson, Jacob and Saarikivi, Olli and Singh, Rachee},
title = {{TACCL}: Guiding Collective Algorithm Synthesis using Communication Sketches},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {593--612},
url = {https://www.usenix.org/conference/nsdi23/presentation/shah},
publisher = {USENIX Association}
}
AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
Guanbin Xu, Zhihao Le, Yinhe Chen, Zhiqi Lin, Zewen Jin, Youshan Miao, and Cheng Li.
22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25),
2025.
BibTeX
Click
here to close the dropdown!
@inproceedings{a305967,
author = {Xu, Guanbin and Le, Zhihao and Chen, Yinhe and Lin, Zhiqi and Jin, Zewen and Miao, Youshan and Li, Cheng},
title = {{AutoCCL}: Automated Collective Communication Tuning for Accelerating Distributed and Parallel {DNN} Training},
booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
year = {2025},
isbn = {978-1-939133-46-5},
address = {Philadelphia, PA},
pages = {667--683},
url = {https://www.usenix.org/conference/nsdi25/presentation/xu-guanbin},
publisher = {USENIX Association}
}
Accelerating AllReduce with a Persistent Straggler
Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, and Rachee Singh.
arXiv preprint arXiv:2505.23523,
2025.
BibTeX
Click
here to close the dropdown!
@article{devraj2025accelerating,
title = {Accelerating AllReduce with a Persistent Straggler},
author = {Devraj, Arjun and Ding, Eric and Kumar, Abhishek Vijaya and Kleinberg, Robert and Singh, Rachee},
journal = {arXiv preprint arXiv:2505.23523},
year = {2025},
url = {https://arxiv.org/abs/2505.23523}
}
Addressing the straggler problem for iterative convergent parallel ML
Aaron Harlap, Henggang Cui, Wei Dai, Jinliang Wei, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, and Eric P. Xing.
Proceedings of the Seventh ACM Symposium on Cloud Computing,
Santa Clara, CA, USA,
2016.
Abstract
BibTeX
Click
here to close the dropdown!
FlexRR provides a scalable, efficient solution to the straggler problem for iterative machine learning (ML). The frequent (e.g., per iteration) barriers used in traditional BSP-based distributed ML implementations cause every transient slowdown of any worker thread to delay all others. FlexRR combines a more flexible synchronization model with dynamic peer-to-peer re-assignment of work among workers to address straggler threads. Experiments with real straggler behavior observed on Amazon EC2 and Microsoft Azure, as well as injected straggler behavior stress tests, confirm the significance of the problem and the effectiveness of FlexRR’s solution. Using FlexRR, we consistently observe near-ideal run-times (relative to no performance jitter) across all real and injected straggler behaviors tested.
Click
here to close the dropdown!
@inproceedings{10.1145/2987550.2987554,
author = {Harlap, Aaron and Cui, Henggang and Dai, Wei and Wei, Jinliang and Ganger, Gregory R. and Gibbons, Phillip B. and Gibson, Garth A. and Xing, Eric P.},
title = {Addressing the straggler problem for iterative convergent parallel ML},
year = {2016},
isbn = {9781450345255},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2987550.2987554},
doi = {10.1145/2987550.2987554},
booktitle = {Proceedings of the Seventh ACM Symposium on Cloud Computing},
pages = {98–111},
numpages = {14},
series = {SoCC '16}
}
OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, and Muhammad Shahbaz.
22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25),
2025.
BibTeX
Click
here to close the dropdown!
@inproceedings{a305995,
author = {Warraich, Ertza and Shabtai, Omer and Manaa, Khalid and Vargaftik, Shay and Piasetzky, Yonatan and Kadosh, Matty and Suresh, Lalith and Shahbaz, Muhammad},
title = {{OptiReduce}: Resilient and {Tail-Optimal} {AllReduce} for Distributed Deep Learning in the Cloud},
booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
year = {2025},
isbn = {978-1-939133-46-5},
address = {Philadelphia, PA},
pages = {685--703},
url = {https://www.usenix.org/conference/nsdi25/presentation/warraich},
publisher = {USENIX Association}
}
Straggler Mitigation in Distributed Optimization Through Data Encoding
Can Karakus, Yifan Sun, Suhas Diggavi, and Wotao Yin.
Advances in Neural Information Processing Systems,
2017.
BibTeX
Click
here to close the dropdown!
@inproceedings{NIPS2017_663772ea,
author = {Karakus, Can and Sun, Yifan and Diggavi, Suhas and Yin, Wotao},
booktitle = {Advances in Neural Information Processing Systems},
editor = {Guyon, I. and Luxburg, U. Von and Bengio, S. and Wallach, H. and Fergus, R. and Vishwanathan, S. and Garnett, R.},
pages = {},
publisher = {Curran Associates, Inc.},
title = {Straggler Mitigation in Distributed Optimization Through Data Encoding},
url = {https://proceedings.neurips.cc/paper_files/paper/2017/file/663772ea088360f95bac3dc7ffb841be-Paper.pdf},
volume = {30},
year = {2017}
}
Solving the straggler problem with bounded staleness
James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R Ganger, Garth Gibson, Kimberly Keeton, and Eric Xing.
14th Workshop on Hot Topics in Operating Systems (HotOS XIV),
2013.
BibTeX
Click
here to close the dropdown!
@inproceedings{cipar2013solving,
title = {Solving the straggler problem with bounded staleness},
author = {Cipar, James and Ho, Qirong and Kim, Jin Kyu and Lee, Seunghak and Ganger, Gregory R and Gibson, Garth and Keeton, Kimberly and Xing, Eric},
booktitle = {14th Workshop on Hot Topics in Operating Systems (HotOS XIV)},
year = {2013}
}
RotorNet: A Scalable, Low-complexity, Optical Datacenter Network
William M. Mellette, Rob McGuinness, Arjun Roy, Alex Forencich, George Papen, Alex C. Snoeren, and George Porter.
Proceedings of the Conference of the ACM Special Interest Group on Data Communication,
Los Angeles, CA, USA,
2017.
Abstract
BibTeX
Click
here to close the dropdown!
The ever-increasing bandwidth requirements of modern datacenters have led researchers to propose networks based upon optical circuit switches, but these proposals face significant deployment challenges. In particular, previous proposals dynamically configure circuit switches in response to changes in workload, requiring network-wide demand estimation, centralized circuit assignment, and tight time synchronization between various network elements— resulting in a complex and unwieldy control plane. Moreover, limitations in the technologies underlying the individual circuit switches restrict both the rate at which they can be reconfigured and the scale of the network that can be constructed.We propose RotorNet, a circuit-based network design that addresses these two challenges. While RotorNet dynamically reconfigures its constituent circuit switches, it decouples switch configuration from traffic patterns, obviating the need for demand collection and admitting a fully decentralized control plane. At the physical layer, RotorNet relaxes the requirements on the underlying circuit switches—in particular by not requiring individual switches to implement a full crossbar—enabling them to scale to 1000s of ports. We show that RotorNet outperforms comparably priced Fat Tree topologies under a variety of workload conditions, including traces taken from two commercial datacenters. We also demonstrate a small-scale RotorNet operating in practice on an eight-node testbed.
Click
here to close the dropdown!
@inproceedings{10.1145/3098822.3098838,
author = {Mellette, William M. and McGuinness, Rob and Roy, Arjun and Forencich, Alex and Papen, George and Snoeren, Alex C. and Porter, George},
title = {RotorNet: A Scalable, Low-complexity, Optical Datacenter Network},
year = {2017},
isbn = {9781450346535},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3098822.3098838},
doi = {10.1145/3098822.3098838},
booktitle = {Proceedings of the Conference of the ACM Special Interest Group on Data Communication},
pages = {267–280},
numpages = {14},
keywords = {optical switching, Datacenter},
series = {SIGCOMM '17}
}
Realizing RotorNet: Toward Practical Microsecond Scale Optical Networking
William M. Mellette, Alex Forencich, Rukshani Athapathu, Alex C. Snoeren, George Papen, and George Porter.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
We describe our experience building and deploying a demand-oblivious optically-switched network based on the RotorNet and Opera architectures. We detail the design, manufacture, deployment, and end-to-end operation of a 128-port optical rotor switch along with supporting NIC hardware and host software. Using this prototype, we assess yield, synchronization, and interoperability with commodity hardware and software at a scale of practical relevance. We provide the first real-world measurements of Linux TCP throughput and host-to-host latency in an operational RotorNet, achieving 98% of link rate with 99th-percentile ping times faster than commodity packet-switching hardware. In the process, we uncover unexpected challenges with link-level dropouts and devise a novel and flexible way to address them. Our deployment experience demonstrates the feasibility of our implementation approach and identifies opportunities for future exploration.
Click
here to close the dropdown!
@inproceedings{10.1145/3651890.3672273,
author = {Mellette, William M. and Forencich, Alex and Athapathu, Rukshani and Snoeren, Alex C. and Papen, George and Porter, George},
title = {Realizing RotorNet: Toward Practical Microsecond Scale Optical Networking},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672273},
doi = {10.1145/3651890.3672273},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {392–414},
numpages = {23},
keywords = {circuit-switching, optical networking, datacenter networks},
series = {ACM SIGCOMM '24}
}
Sirius: A Flat Datacenter Network with Nanosecond Optical Switching
Hitesh Ballani, Paolo Costa, Raphael Behrendt, Daniel Cletheroe, Istvan Haller, Krzysztof Jozwik, Fotini Karinou, Sophie Lange, Kai Shi, Benn Thomsen, and Hugh Williams.
Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication,
Virtual Event, USA,
2020.
Abstract
BibTeX
Click
here to close the dropdown!
The increasing gap between the growth of datacenter traffic and electrical switch capacity is expected to worsen due to the slowdown of Moore’s law, motivating the need for a new switching technology for the post-Moore’s law era that can meet the increasingly stringent requirements of hardware-driven cloud workloads. We propose Sirius, an optically-switched network for datacenters providing the abstraction of a single, high-radix switch that can connect thousands of nodes—racks or servers—in a datacenter while achieving nanosecond-granularity reconfiguration. At its core, Sirius uses a combination of tunable lasers and simple, passive gratings that route light based on its wavelength. Sirius’ switching technology and topology is tightly codesigned with its routing and scheduling and with novel congestion-control and time-synchronization mechanisms to achieve a scalable yet flat network that can offer high bandwidth and very low end-to-end latency. Through a small-scale prototype using a custom tunable laser chip that can tune in less than 912 ps, we demonstrate 3.84 ns end-to-end reconfiguration atop 50 Gbps channels. Through large-scale simulations, we show that Sirius can approximate the performance of an ideal, electrically-switched non-blocking network with up to 74-77% lower power.
Click
here to close the dropdown!
@inproceedings{10.1145/3387514.3406221,
author = {Ballani, Hitesh and Costa, Paolo and Behrendt, Raphael and Cletheroe, Daniel and Haller, Istvan and Jozwik, Krzysztof and Karinou, Fotini and Lange, Sophie and Shi, Kai and Thomsen, Benn and Williams, Hugh},
title = {Sirius: A Flat Datacenter Network with Nanosecond Optical Switching},
year = {2020},
isbn = {9781450379557},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3387514.3406221},
doi = {10.1145/3387514.3406221},
booktitle = {Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication},
pages = {782–797},
numpages = {16},
keywords = {Datacenter Networks, Fast Tunable Lasers, Nanosecond Switching, Optical Switches, Scheduler-less design, Vertical Integration},
series = {SIGCOMM '20}
}
Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks
Vamsi Addanki, Chen Avin, and Stefan Schmid.
Proc. ACM Meas. Anal. Comput. Syst.,
2023.
Abstract
BibTeX
Click
here to close the dropdown!
The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new technology to deal with the explosive growth of datacenter traffic. Specifically, periodic reconfigurable datacenter networks (RDCNs) such as RotorNet (SIGCOMM 2017), Opera (NSDI 2020) and Sirius (SIGCOMM 2020) have been shown to provide high throughput, by emulating a complete graph through fast periodic circuit switch scheduling.However, to achieve such a high throughput, existing reconfigurable network designs pay a high price: in terms of potentially high delays, but also, as we show as a first contribution in this paper, in terms of the high buffer requirements. In particular, we show that under buffer constraints, emulating the high-throughput complete graph is infeasible at scale, and we uncover a spectrum of unvisited and attractive alternative RDCNs, which emulate regular graphs, but with lower node degree than the complete graph.We present Mars, a periodic reconfigurable topology which emulates ad-regular graph with near-optimal throughput. In particular, we systematically analyze how the degree d can be optimized for throughput given the available buffer and delay tolerance of the datacenter. We further show empirically that Mars achieves higher throughput compared to existing systems when buffer sizes are bounded.
Click
here to close the dropdown!
@article{10.1145/3579312,
author = {Addanki, Vamsi and Avin, Chen and Schmid, Stefan},
title = {Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks},
year = {2023},
issue_date = {March 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {7},
number = {1},
url = {https://doi.org/10.1145/3579312},
doi = {10.1145/3579312},
journal = {Proc. ACM Meas. Anal. Comput. Syst.},
articleno = {2},
numpages = {43},
keywords = {buffer requirements, datacenter, reconfigurable networks, throughput}
}
Shale: A Practical, Scalable Oblivious Reconfigurable Network
Daniel Amir, Nitika Saran, Tegan Wilson, Robert Kleinberg, Vishal Shrivastav, and Hakim Weatherspoon.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
Circuit-switched technologies have long been proposed for handling high-throughput traffic in datacenter networks, but recent developments in nanosecond-scale reconfiguration have created the enticing possibility of handling low-latency traffic as well. The novel Oblivious Reconfigurable Network (ORN) design paradigm promises to deliver on this possibility. Prior work in ORN designs achieved latencies that scale linearly with system size, making them unsuitable for large-scale deployments. Recent theoretical work showed that ORNs can achieve far better latency scaling, proposing theoretical ORN designs that are Pareto optimal in latency and throughput.In this work, we bridge multiple gaps between theory and practice to develop Shale, the first ORN capable of providing low-latency networking at datacenter scale while still guaranteeing high throughput. By interleaving multiple Pareto optimal schedules in parallel, both latency- and throughput-sensitive flows can achieve optimal performance. To achieve the theoretical low latencies in practice, we design a new congestion control mechanism which is best suited to the characteristics of Shale. In datacenter-scale packet simulations, our design compares favorably with both an in-network congestion mitigation strategy, modern receiver-driven protocols such as NDP, and an idealized analog for sender-driven protocols. We implement an FPGA-based prototype of Shale, achieving orders of magnitude better resource scaling than existing ORN proposals. Finally, we extend our congestion control solution to handle node and link failures.
Click
here to close the dropdown!
@inproceedings{10.1145/3651890.3672248,
author = {Amir, Daniel and Saran, Nitika and Wilson, Tegan and Kleinberg, Robert and Shrivastav, Vishal and Weatherspoon, Hakim},
title = {Shale: A Practical, Scalable Oblivious Reconfigurable Network},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672248},
doi = {10.1145/3651890.3672248},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {449–464},
numpages = {16},
keywords = {optical switches, datacenter networks, nanosecond switching},
series = {ACM SIGCOMM '24}
}
Scheduling techniques for hybrid circuit/packet networks
He Liu, Matthew K. Mukerjee, Conglong Li, Nicolas Feltman, George Papen, Stefan Savage, Srinivasan Seshan, Geoffrey M. Voelker, David G. Andersen, Michael Kaminsky, George Porter, and Alex C. Snoeren.
Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies,
Heidelberg, Germany,
2015.
Abstract
BibTeX
Click
here to close the dropdown!
A range of new datacenter switch designs combine wireless or optical circuit technologies with electrical packet switching to deliver higher performance at lower cost than traditional packet-switched networks. These "hybrid" networks schedule large traffic demands via a high-rate circuits and remaining traffic with a lower-rate, traditional packet-switches. Achieving high utilization requires an efficient scheduling algorithm that can compute proper circuit configurations and balance traffic across the switches. Recent proposals, however, provide no such algorithm and rely on an omniscient oracle to compute optimal switch configurations.Finding the right balance of circuit and packet switch use is difficult: circuits must be reconfigured to serve different demands, incurring non-trivial switching delay, while the packet switch is bandwidth constrained. Adapting existing crossbar scheduling algorithms proves challenging with these constraints. In this paper, we formalize the hybrid switching problem, explore the design space of scheduling algorithms, and provide insight on using such algorithms in practice. We propose a heuristic-based algorithm, Solstice that provides a 2.9\texttimes increase in circuit utilization over traditional scheduling algorithms, while being within 14% of optimal, at scale.
Click
here to close the dropdown!
@inproceedings{10.1145/2716281.2836126,
author = {Liu, He and Mukerjee, Matthew K. and Li, Conglong and Feltman, Nicolas and Papen, George and Savage, Stefan and Seshan, Srinivasan and Voelker, Geoffrey M. and Andersen, David G. and Kaminsky, Michael and Porter, George and Snoeren, Alex C.},
title = {Scheduling techniques for hybrid circuit/packet networks},
year = {2015},
isbn = {9781450334129},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2716281.2836126},
doi = {10.1145/2716281.2836126},
booktitle = {Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies},
articleno = {41},
numpages = {13},
keywords = {circuit networks, hybrid networks, packet networks},
series = {CoNEXT '15}
}
Integrating microsecond circuit switching into the data center
George Porter, Richard Strong, Nathan Farrington, Alex Forencich, Pang Chen-Sun, Tajana Rosing, Yeshaiahu Fainman, George Papen, and Amin Vahdat.
Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM,
Hong Kong, China,
2013.
Abstract
BibTeX
Click
here to close the dropdown!
Recent proposals have employed optical circuit switching (OCS) to reduce the cost of data center networks. However, the relatively slow switching times (10–100 ms) assumed by these approaches, and the accompanying latencies of their control planes, has limited its use to only the largest data center networks with highly aggregated and constrained workloads. As faster switch technologies become available, designing a control plane capable of supporting them becomes a key challenge.In this paper, we design and implement an OCS prototype capable of switching in 11.5 us, and we use this prototype to expose a set of challenges that arise when supporting switching at microsecond time scales. In response, we propose a microsecond-latency control plane based on a circuit scheduling approach we call Traffic Matrix Scheduling (TMS) that proactively communicates circuit assignments to communicating entities so that circuit bandwidth can be used efficiently.
Click
here to close the dropdown!
@inproceedings{10.1145/2486001.2486007,
author = {Porter, George and Strong, Richard and Farrington, Nathan and Forencich, Alex and Chen-Sun, Pang and Rosing, Tajana and Fainman, Yeshaiahu and Papen, George and Vahdat, Amin},
title = {Integrating microsecond circuit switching into the data center},
year = {2013},
isbn = {9781450320566},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2486001.2486007},
doi = {10.1145/2486001.2486007},
booktitle = {Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM},
pages = {447–458},
numpages = {12},
keywords = {optical networks, data center networks},
series = {SIGCOMM '13}
}
Helios: a hybrid electrical/optical switch architecture for modular data centers
Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat.
Proceedings of the ACM SIGCOMM 2010 Conference,
New Delhi, India,
2010.
Abstract
BibTeX
Click
here to close the dropdown!
The basic building block of ever larger data centers has shifted from a rack to a modular container with hundreds or even thousands of servers. Delivering scalable bandwidth among such containers is a challenge. A number of recent efforts promise full bisection bandwidth between all servers, though with significant cost, complexity, and power consumption. We present Helios, a hybrid electrical/optical switch architecture that can deliver significant reductions in the number of switching elements, cabling, cost, and power consumption relative to recently proposed data center network architectures. We explore architectural trade offs and challenges associated with realizing these benefits through the evaluation of a fully functional Helios prototype.
Click
here to close the dropdown!
@inproceedings{10.1145/1851182.1851223,
author = {Farrington, Nathan and Porter, George and Radhakrishnan, Sivasankar and Bazzaz, Hamid Hajabdolali and Subramanya, Vikram and Fainman, Yeshaiahu and Papen, George and Vahdat, Amin},
title = {Helios: a hybrid electrical/optical switch architecture for modular data centers},
year = {2010},
isbn = {9781450302012},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1851182.1851223},
doi = {10.1145/1851182.1851223},
booktitle = {Proceedings of the ACM SIGCOMM 2010 Conference},
pages = {339–350},
numpages = {12},
keywords = {optical networks, data center networks},
series = {SIGCOMM '10}
}
ProjecToR: Agile Reconfigurable Data Center Interconnect
Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre-Alexandre Blanche, Houman Rastegarfar, Madeleine Glick, and Daniel Kilper.
Proceedings of the 2016 ACM SIGCOMM Conference,
Florianopolis, Brazil,
2016.
Abstract
BibTeX
Click
here to close the dropdown!
We explore a novel, free-space optics based approach for building data center interconnects. It uses a digital micromirror device (DMD) and mirror assembly combination as a transmitter and a photodetector on top of the rack as a receiver (Figure 1). Our approach enables all pairs of racks to establish direct links, and we can reconfigure such links (i.e., connect different rack pairs) within 12 us. To carry traffic from a source to a destination rack, transmitters and receivers in our interconnect can be dynamically linked in millions of ways. We develop topology construction and routing methods to exploit this flexibility, including a flow scheduling algorithm that is a constant factor approximation to the offline optimal solution. Experiments with a small prototype point to the feasibility of our approach. Simulations using realistic data center workloads show that, compared to the conventional folded-Clos interconnect, our approach can improve mean flow completion time by 30-95% and reduce cost by 25-40%.
Click
here to close the dropdown!
@inproceedings{10.1145/2934872.2934911,
author = {Ghobadi, Monia and Mahajan, Ratul and Phanishayee, Amar and Devanur, Nikhil and Kulkarni, Janardhan and Ranade, Gireeja and Blanche, Pierre-Alexandre and Rastegarfar, Houman and Glick, Madeleine and Kilper, Daniel},
title = {ProjecToR: Agile Reconfigurable Data Center Interconnect},
year = {2016},
isbn = {9781450341936},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2934872.2934911},
doi = {10.1145/2934872.2934911},
booktitle = {Proceedings of the 2016 ACM SIGCOMM Conference},
pages = {216–229},
numpages = {14},
keywords = {Reconfigurability, Free-Space Optics, Data Centers},
series = {SIGCOMM '16}
}
NegotiaToR: Towards A Simple Yet Effective On-demand Reconfigurable Datacenter Network
Cong Liang, Xiangli Song, Jing Cheng, Mowei Wang, Yashe Liu, Zhenhua Liu, Shizhen Zhao, and Yong Cui.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
BibTeX
Click
here to close the dropdown!
Recent advances in fast optical switching technology show promise in meeting the high goodput and low latency requirements of datacenter networks (DCN). We present NegotiaToR, a simple network architecture for optical reconfigurable DCNs that utilizes on-demand scheduling to handle dynamic traffic. In NegotiaToR, racks exchange scheduling messages through an in-band control plane and distributedly calculate non-conflicting paths from binary traffic demand information. Optimized for incasts, it also provides opportunities to bypass scheduling delays. NegotiaToR is compatible with prevalent flat topologies, and is tailored towards a minimalist design for on-demand reconfigurable DCNs, enhancing practicality. Through large-scale simulations, we show that NegotiaToR achieves both small mice flow completion time (FCT) and high goodput on two representative flat topologies, especially under heavy loads. Particularly, the FCT of mice flows is one to two orders of magnitude better than the state-of-the-art traffic-oblivious reconfigurable DCN design.
BibTeX
@inproceedings{10.1145/3651890.3672222,
author = {Liang, Cong and Song, Xiangli and Cheng, Jing and Wang, Mowei and Liu, Yashe and Liu, Zhenhua and Zhao, Shizhen and Cui, Yong},
title = {NegotiaToR: Towards A Simple Yet Effective On-demand Reconfigurable Datacenter Network},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672222},
doi = {10.1145/3651890.3672222},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {415–432},
numpages = {18},
keywords = {datacenter network, optical switching},
series = {ACM SIGCOMM '24}
}
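A toy rendition of the abstract's distributed-agreement idea, not NegotiaToR's protocol: once every rack has seen the same binary demand bits over the in-band control plane, each rack can run the identical deterministic matching locally, so the resulting circuit choices cannot conflict. Rack names are hypothetical.

def conflict_free_matching(binary_demand):
    # binary_demand: set of (src, dst) pairs with pending traffic; identical
    # at every rack after the in-band exchange, so every rack computes the
    # same matching without a central scheduler.
    used_src, used_dst, matching = set(), set(), []
    for src, dst in sorted(binary_demand):  # same deterministic order everywhere
        if src not in used_src and dst not in used_dst:
            matching.append((src, dst))
            used_src.add(src)
            used_dst.add(dst)
    return matching

print(conflict_free_matching({("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r3", "r1")}))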
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
Norm Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Clifford Young, Xiang Zhou, Zongwei Zhou, and David A. Patterson.
Proceedings of the 50th Annual International Symposium on Computer Architecture,
Orlando, FL, USA,
2023.
Abstract
In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x–7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus nearly 10x faster overall, which along with OCS flexibility and availability allows a large language model to train at an average of 60% of peak FLOPS/second. For similar sized systems, it is 4.3x–4.5x faster than the Graphcore IPU Bow and is 1.2x–1.7x faster and uses 1.3x–1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use 2–6x less energy and produce 20x less CO2e than contemporary DSAs in typical on-premise data centers.
BibTeX
@inproceedings{10.1145/3579371.3589350,
author = {Jouppi, Norm and Kurian, George and Li, Sheng and Ma, Peter and Nagarajan, Rahul and Nai, Lifeng and Patil, Nishant and Subramanian, Suvinay and Swing, Andy and Towles, Brian and Young, Clifford and Zhou, Xiang and Zhou, Zongwei and Patterson, David A},
title = {TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings},
year = {2023},
isbn = {9798400700958},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3579371.3589350},
doi = {10.1145/3579371.3589350},
booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture},
articleno = {82},
numpages = {14},
keywords = {machine learning, domain specific architecture, TPU, GPU, IPU, supercomputer, optical interconnect, reconfigurable, embeddings, large language model, power usage effectiveness, warehouse scale computer, carbon emissions, energy, CO2 equivalent emissions},
series = {ISCA '23}
}
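A minimal sketch of the wrap-around structure that makes a 3D torus attractive for collectives. A 4096-chip TPU v4 pod is logically a 16x16x16 torus assembled from 4x4x4 blocks; the OCS layer rewires the inter-block links, and the twisted variant mentioned in the abstract permutes the wrap links, which this plain-torus sketch does not model.

def torus_neighbors(coord, shape):
    # The six neighbors of a chip at (x, y, z) in a plain 3D torus,
    # with wrap-around on every axis.
    x, y, z = coord
    X, Y, Z = shape
    return [((x + 1) % X, y, z), ((x - 1) % X, y, z),
            (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),
            (x, y, (z + 1) % Z), (x, y, (z - 1) % Z)]

print(torus_neighbors((0, 0, 0), (16, 16, 16)))  # a corner chip wraps around to 15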
A Machine Learning Supercomputer with an Optically Reconfigurable Interconnect and Embeddings Support
Norman P. Jouppi and Andy Swing.
2023 IEEE Hot Chips 35 Symposium (HCS),
2023.
BibTeX
@inproceedings{a10254691,
author = {Jouppi, Norman P. and Swing, Andy},
booktitle = {2023 IEEE Hot Chips 35 Symposium (HCS)},
title = {A Machine Learning Supercomputer with an Optically Reconfigurable Interconnect and Embeddings Support},
year = {2023},
pages = {1-24},
keywords = {Optical interconnections;Optical switches;Integrated circuit interconnections;Machine learning;Supercomputers;Switching circuits},
doi = {10.1109/HCS59251.2023.10254691}
}
Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer
Yazhou Zu, Alireza Ghaffarkhah, Hoang-Vu Dang, Brian Towles, Steven Hand, Safeen Huda, Adekunle Bello, Alexander Kolbasov, Arash Rezaei, Dayou Du, Steve Lacy, Hang Wang, Aaron Wisner, Chris Lewis, and Henri Bahini.
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24),
2024.
BibTeX
@inproceedings{a295551,
author = {Zu, Yazhou and Ghaffarkhah, Alireza and Dang, Hoang-Vu and Towles, Brian and Hand, Steven and Huda, Safeen and Bello, Adekunle and Kolbasov, Alexander and Rezaei, Arash and Du, Dayou and Lacy, Steve and Wang, Hang and Wisner, Aaron and Lewis, Chris and Bahini, Henri},
title = {Resiliency at Scale: Managing {Google{\textquoteright}s} {TPUv4} Machine Learning Supercomputer},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {761--774},
url = {https://www.usenix.org/conference/nsdi24/presentation/zu},
publisher = {USENIX Association}
}
Jupiter evolving: transforming Google’s datacenter network via optical circuit switches and software-defined networking
Leon Poutievski, Omid Mashayekhi, Joon Ong, Arjun Singh, Mukarram Tariq, Rui Wang, Jianan Zhang, Virginia Beauregard, Patrick Conner, Steve Gribble, Rishi Kapoor, Stephen Kratzer, Nanfang Li, Hong Liu, Karthik Nagaraj, Jason Ornstein, Samir Sawhney, Ryohei Urata, Lorenzo Vicisano, Kevin Yasumura, Shidong Zhang, Junlan Zhou, and Amin Vahdat.
Proceedings of the ACM SIGCOMM 2022 Conference,
Amsterdam, Netherlands,
2022.
Abstract
We present a decade of evolution and production experience with Jupiter datacenter network fabrics. In this period Jupiter has delivered 5x higher speed and capacity, 30% reduction in capex, 41% reduction in power, incremental deployment and technology refresh all while serving live production traffic. A key enabler for these improvements is evolving Jupiter from a Clos to a direct-connect topology among the machine aggregation blocks. Critical architectural changes for this include: A datacenter interconnection layer employing Micro-Electro-Mechanical Systems (MEMS) based Optical Circuit Switches (OCSes) to enable dynamic topology reconfiguration, centralized Software-Defined Networking (SDN) control for traffic engineering, and automated network operations for incremental capacity delivery and topology engineering. We show that the combination of traffic and topology engineering on direct-connect fabrics achieves similar throughput as Clos fabrics for our production traffic patterns. We also optimize for path lengths: 60% of the traffic takes direct path from source to destination aggregation blocks, while the remaining transits one additional block, achieving an average block-level path length of 1.4 in our fleet today. OCS also achieves 3x faster fabric reconfiguration compared to pre-evolution Clos fabrics that used a patch panel based interconnect.
BibTeX
@inproceedings{a10.1145/3544216.3544265,
author = {Poutievski, Leon and Mashayekhi, Omid and Ong, Joon and Singh, Arjun and Tariq, Mukarram and Wang, Rui and Zhang, Jianan and Beauregard, Virginia and Conner, Patrick and Gribble, Steve and Kapoor, Rishi and Kratzer, Stephen and Li, Nanfang and Liu, Hong and Nagaraj, Karthik and Ornstein, Jason and Sawhney, Samir and Urata, Ryohei and Vicisano, Lorenzo and Yasumura, Kevin and Zhang, Shidong and Zhou, Junlan and Vahdat, Amin},
title = {Jupiter evolving: transforming google's datacenter network via optical circuit switches and software-defined networking},
year = {2022},
isbn = {9781450394208},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3544216.3544265},
doi = {10.1145/3544216.3544265},
booktitle = {Proceedings of the ACM SIGCOMM 2022 Conference},
pages = {66–85},
numpages = {20},
keywords = {datacenter network, optical circuit switches, software-defined networking, topology engineering, traffic engineering},
series = {SIGCOMM '22}
}
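The abstract's average block-level path length follows directly from its traffic split; a one-line check:

# 60% of traffic takes the 1-hop direct path between aggregation blocks;
# the remaining 40% transits one intermediate block (2 hops).
print(0.60 * 1 + 0.40 * 2)  # 1.4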
SiP-ML: high-bandwidth optical network interconnects for machine learning training
Mehrdad Khani, Manya Ghobadi, Mohammad Alizadeh, Ziyi Zhu, Madeleine Glick, Keren Bergman, Amin Vahdat, Benjamin Klenk, and Eiman Ebrahimi.
Proceedings of the ACM SIGCOMM 2021 Conference,
Virtual Event, USA,
2021.
Abstract
This paper proposes optical network interconnects as a key enabler for building high-bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML, accelerates the training time of popular DNN models using silicon photonics links capable of providing multiple terabits-per-second of bandwidth per GPU. SiP-ML partitions the training job across GPUs with hybrid data and model parallelism while ensuring the communication pattern can be supported efficiently on the network interconnect. We develop task partitioning and device placement methods that take the degree and reconfiguration latency of optical interconnects into account. Simulations using real DNN models show that, compared to the state-of-the-art electrical networks, our approach improves training time by 1.3–9.1x.
BibTeX
@inproceedings{10.1145/3452296.3472900,
author = {Khani, Mehrdad and Ghobadi, Manya and Alizadeh, Mohammad and Zhu, Ziyi and Glick, Madeleine and Bergman, Keren and Vahdat, Amin and Klenk, Benjamin and Ebrahimi, Eiman},
title = {SiP-ML: high-bandwidth optical network interconnects for machine learning training},
year = {2021},
isbn = {9781450383837},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3452296.3472900},
doi = {10.1145/3452296.3472900},
booktitle = {Proceedings of the 2021 ACM SIGCOMM 2021 Conference},
pages = {657–675},
numpages = {19},
keywords = {distributed machine learning, optical networks, reconfigurable networks, silicon photonics},
series = {SIGCOMM '21}
}
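A back-of-the-envelope view of why terabit-class per-GPU links matter, using the textbook ring all-reduce bound rather than SiP-ML's simulator; the gradient size, GPU count, and bandwidths below are illustrative.

def ring_allreduce_seconds(grad_bytes, n_gpus, link_bps):
    # In a bandwidth-optimal ring, each GPU moves 2*(n-1)/n of the
    # gradient volume over its link.
    bits_on_wire = 2 * (n_gpus - 1) / n_gpus * grad_bytes * 8
    return bits_on_wire / link_bps

for bw in (100e9, 4e12):  # 100 Gbps electrical vs. 4 Tbps photonic
    print(f"{bw:.0e} bps -> {ring_allreduce_seconds(10e9, 64, bw):.3f} s")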
Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, and Arvind Krishnamurthy.
22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25),
2025.
BibTeX
@inproceedings{a305352,
author = {Zhao, Liangyu and Pal, Siddharth and Chugh, Tapan and Wang, Weiyang and Fantl, Jason and Basu, Prithwish and Khoury, Joud and Krishnamurthy, Arvind},
title = {Efficient {Direct-Connect} Topologies for Collective Communications},
booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
year = {2025},
isbn = {978-1-939133-46-5},
address = {Philadelphia, PA},
pages = {705--737},
url = {https://www.usenix.org/conference/nsdi25/presentation/zhao-liangyu},
publisher = {USENIX Association}
}
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang, Moein Khazraee, Zhizhen Zhong, Manya Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, and Anthony Kewitsch.
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23),
2023.
BibTeX
@inproceedings{a285119,
author = {Wang, Weiyang and Khazraee, Moein and Zhong, Zhizhen and Ghobadi, Manya and Jia, Zhihao and Mudigere, Dheevatsa and Zhang, Ying and Kewitsch, Anthony},
title = {{TopoOpt}: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {739--767},
url = {https://www.usenix.org/conference/nsdi23/presentation/wang-weiyang},
publisher = {USENIX Association}
}
LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics
Abhishek Vijaya Kumar, Eric Ding, Arjun Devraj, Darius Bunandar, and Rachee Singh.
arXiv preprint arXiv:2505.23105,
2025.
BibTeX
@article{kumar2025lumion,
title = {LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics},
author = {Kumar, Abhishek Vijaya and Ding, Eric and Devraj, Arjun and Bunandar, Darius and Singh, Rachee},
journal = {arXiv preprint arXiv:2505.23105},
year = {2025},
url = {https://arxiv.org/abs/2505.23105}
}
A case for server-scale photonic connectivity
Abhishek Vijaya Kumar, Arjun Devraj, Darius Bunandar, and Rachee Singh.
Proceedings of the 23rd ACM Workshop on Hot Topics in Networks,
Irvine, CA, USA,
2024.
Abstract
The commoditization of machine learning is fuelling the demand for compute required to both train large models and infer from them. At the same time, scaling the performance of individual microprocessors to satisfy the demand for compute has become increasingly difficult since the end of Moore’s law and Dennard scaling. As a result, compute resources in modern servers are distributed across multiple accelerators on the server board. In this work, we make the case for using optics to interconnect accelerators within a server. A key benefit of on-board chip-to-chip optical connectivity is its ability to dynamically allocate bandwidth between accelerators, where necessary, rather than the common practice of statically dividing bandwidth among links within the topology of a multi-accelerator server, as seen in popular direct-connect architectures. This property prevents bandwidth under-utilization in state-of-the-art rack-scale multi-accelerator deployments. Moreover, server-scale optical connectivity can reduce the blast radius of individual accelerator failures in rack-scale ML deployments. Our early experiments with the prototype of a newly commercialized server-scale photonic interconnect show how the capability of the hardware can enable our vision.
BibTeX
@inproceedings{10.1145/3696348.3696856,
author = {Kumar, Abhishek Vijaya and Devraj, Arjun and Bunandar, Darius and Singh, Rachee},
title = {A case for server-scale photonic connectivity},
year = {2024},
isbn = {9798400712722},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3696348.3696856},
doi = {10.1145/3696348.3696856},
booktitle = {Proceedings of the 23rd ACM Workshop on Hot Topics in Networks},
pages = {290–299},
numpages = {10},
keywords = {Silicon photonics, collective communication, distributed machine learning, optical networks, reconfigurable networks},
series = {HotNets '24}
}
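A toy version of the abstract's dynamic-allocation argument: split a server's total optical bandwidth across accelerator pairs in proportion to instantaneous demand instead of the static equal split of fixed direct-connect links. All names and numbers are invented.

def allocate_bandwidth(total_bps, demands):
    # demands: {(src_accel, dst_accel): relative demand}
    total = sum(demands.values())
    return {pair: total_bps * d / total for pair, d in demands.items()}

print(allocate_bandwidth(3.2e12, {("g0", "g1"): 8, ("g0", "g2"): 1, ("g2", "g3"): 1}))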
PipSwitch: A Circuit Switch Using Programmable Integrated Photonics
Eric Ding and Rachee Singh.
arXiv preprint arXiv:2501.18136,
2025.
BibTeX
@article{ding2025pipswitch,
title = {PipSwitch: A Circuit Switch Using Programmable Integrated Photonics},
author = {Ding, Eric and Singh, Rachee},
journal = {arXiv preprint arXiv:2501.18136},
year = {2025},
url = {https://arxiv.org/abs/2501.18136}
}
TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O
Mark Wade, Erik Anderson, Shahab Ardalan, Pavan Bhargava, Sidney Buchbinder, Michael L. Davenport, John Fini, Haiwei Lu, Chen Li, Roy Meade, Chandru Ramamurthy, Michael Rust, Forrest Sedgwick, Vladimir Stojanovic, Derek Van Orden, Chong Zhang, Chen Sun, Sergey Y. Shumarayev, Conor O’Keeffe, Tim T. Hoang, David Kehlet, Ravi V. Mahajan, Matthew T. Guzy, Allen Chan, and Tina Tran.
IEEE Micro,
2020.
BibTeX
@article{a9007742,
author = {Wade, Mark and Anderson, Erik and Ardalan, Shahab and Bhargava, Pavan and Buchbinder, Sidney and Davenport, Michael L. and Fini, John and Lu, Haiwei and Li, Chen and Meade, Roy and Ramamurthy, Chandru and Rust, Michael and Sedgwick, Forrest and Stojanovic, Vladimir and Van Orden, Derek and Zhang, Chong and Sun, Chen and Shumarayev, Sergey Y. and O'Keeffe, Conor and Hoang, Tim T. and Kehlet, David and Mahajan, Ravi V. and Guzy, Matthew T. and Chan, Allen and Tran, Tina},
journal = {IEEE Micro},
title = {TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O},
year = {2020},
volume = {40},
number = {2},
pages = {63-71},
keywords = {Optical fibers;Photonics;High-speed optical techniques;Energy efficiency;Bandwidth;Packaging;optical I/O;silicon photonics;FPGA;chiplets;multi-chip package;AIB;EMIB},
doi = {10.1109/MM.2020.2976067}
}
Enhancing Network Management Using Code Generated by Large Language Models
Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh, Santiago Segarra, Trevor Eberl, Eliran Azulai, Ido Frizler, Ranveer Chandra, and Srikanth Kandula.
Proceedings of the 22nd ACM Workshop on Hot Topics in Networks,
Cambridge, MA, USA,
2023.
Abstract
Analyzing network topologies and communication graphs is essential in modern network management. However, the lack of a cohesive approach results in a steep learning curve, increased errors, and inefficiencies. In this paper, we present a novel approach that enables natural-language-based network management experiences, leveraging large language models (LLMs) to generate task-specific code from natural language queries. This method addresses the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, removing the need to share network data with LLMs, and focusing on application-specific requests combined with program synthesis techniques. We develop and evaluate a prototype system using benchmark applications, demonstrating high accuracy, cost-effectiveness, and potential for further improvements using complementary program synthesis techniques.
BibTeX
@inproceedings{10.1145/3626111.3628183,
author = {Mani, Sathiya Kumaran and Zhou, Yajie and Hsieh, Kevin and Segarra, Santiago and Eberl, Trevor and Azulai, Eliran and Frizler, Ido and Chandra, Ranveer and Kandula, Srikanth},
title = {Enhancing Network Management Using Code Generated by Large Language Models},
year = {2023},
isbn = {9798400704154},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3626111.3628183},
doi = {10.1145/3626111.3628183},
booktitle = {Proceedings of the 22nd ACM Workshop on Hot Topics in Networks},
pages = {196–204},
numpages = {9},
keywords = {Communication graphs, Graph manipulation, Large language model, Natural language processing, Network lifecycle management, Network management, Program synthesis},
series = {HotNets '23}
}
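A hypothetical example of the kind of task-specific code such a system might generate for the operator query "which hosts lose connectivity to the core if switch s1 fails?". The function name and toy topology are invented; the point is that the generated code runs locally on the operator's graph, so no network data is shared with the LLM.

import networkx as nx

def unreachable_after_failure(g, failed_node, core="core"):
    # Remove the failed element, then report everything cut off from the core.
    h = g.copy()
    h.remove_node(failed_node)
    reachable = nx.node_connected_component(h, core) if core in h else set()
    return set(h.nodes) - reachable

g = nx.Graph([("core", "s1"), ("core", "s2"), ("s1", "h1"), ("s2", "h2")])
print(unreachable_after_failure(g, "s1"))  # {'h1'}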
Learning to Configure Computer Networks with Neural Algorithmic Reasoning
Luca Beurer-Kellner, Martin Vechev, Laurent Vanbever, and Petar Veličković.
Advances in Neural Information Processing Systems,
2022.
BibTeX
@inproceedings{NEURIPS2022_04cc90ec,
author = {Beurer-Kellner, Luca and Vechev, Martin and Vanbever, Laurent and Veli\v{c}kovi\'{c}, Petar},
booktitle = {Advances in Neural Information Processing Systems},
editor = {Koyejo, S. and Mohamed, S. and Agarwal, A. and Belgrave, D. and Cho, K. and Oh, A.},
pages = {730--742},
publisher = {Curran Associates, Inc.},
title = {Learning to Configure Computer Networks with Neural Algorithmic Reasoning},
url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/04cc90ec6868b97b7423dc38ced1e35c-Paper-Conference.pdf},
volume = {35},
year = {2022}
}
What do LLMs need to Synthesize Correct Router Configurations?
Rajdeep Mondal, Alan Tang, Ryan Beckett, Todd Millstein, and George Varghese.
Proceedings of the 22nd ACM Workshop on Hot Topics in Networks,
Cambridge, MA, USA,
2023.
Abstract
We investigate whether Large Language Models (e.g., GPT-4) can synthesize correct router configurations with reduced manual effort. We find GPT-4 works very badly by itself, producing promising draft configurations but with egregious errors in topology, syntax, and semantics. Our strategy, that we call Verified Prompt Programming, is to combine GPT-4 with verifiers, and use localized feedback from the verifier to automatically correct errors. Verification requires a specification and actionable localized feedback to be effective. We show results for two use cases: translating from Cisco to Juniper configurations on a single router, and implementing a no-transit policy on multiple routers. While human input is still required, if we define the leverage as the number of automated prompts to the number of human prompts, our experiments show a leverage of 10X for Juniper translation, and 6X for implementing the no-transit policy, ending with verified configurations.
BibTeX
@inproceedings{10.1145/3626111.3628194,
author = {Mondal, Rajdeep and Tang, Alan and Beckett, Ryan and Millstein, Todd and Varghese, George},
title = {What do LLMs need to Synthesize Correct Router Configurations?},
year = {2023},
isbn = {9798400704154},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3626111.3628194},
doi = {10.1145/3626111.3628194},
booktitle = {Proceedings of the 22nd ACM Workshop on Hot Topics in Networks},
pages = {189–195},
numpages = {7},
keywords = {CoSynth, large language models (LLMs), network verification and synthesis},
series = {HotNets '23}
}
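A skeleton of the verify-and-correct loop the abstract describes, with llm and verifier as hypothetical callables. The two essential ingredients, per the abstract, are a specification and localized, actionable feedback from the verifier.

def synthesize(spec, llm, verifier, max_rounds=20):
    # llm(prompt) -> candidate config; verifier(config) -> list of localized
    # errors such as ["line 12: wrong peer ASN"]; an empty list means verified.
    config, automated_prompts = llm(f"Write a router config for: {spec}"), 0
    for _ in range(max_rounds):
        errors = verifier(config)
        if not errors:
            return config, automated_prompts  # verified configuration
        automated_prompts += 1
        config = llm(f"Fix exactly these errors:\n{errors}\n---\n{config}")
    return None, automated_prompts  # give up and hand back to the human

# The abstract's "leverage" is then the ratio of automated prompts to
# human prompts over the whole session.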
Designing Network Algorithms via Large Language Models
Zhiyuan He, Aashish Gottipati, Lili Qiu, Xufang Luo, Kenuo Xu, Yuqing Yang, and Francis Y. Yan.
Proceedings of the 23rd ACM Workshop on Hot Topics in Networks,
Irvine, CA, USA,
2024.
Abstract
We introduce Nada, the first framework to autonomously design network algorithms by leveraging the generative capabilities of large language models (LLMs). Starting with an existing algorithm implementation, Nada enables LLMs to create a wide variety of alternative designs in the form of code blocks. It then efficiently identifies the top-performing designs through a series of filtering techniques, minimizing the need for full-scale evaluations and significantly reducing computational costs. Using adaptive bitrate (ABR) streaming as a case study, we demonstrate that Nada produces novel ABR algorithms—previously unknown to human developers—that consistently outperform the original algorithm in diverse network environments, including broadband, satellite, 4G, and 5G.
BibTeX
@inproceedings{10.1145/3696348.3696868,
author = {He, Zhiyuan and Gottipati, Aashish and Qiu, Lili and Luo, Xufang and Xu, Kenuo and Yang, Yuqing and Yan, Francis Y.},
title = {Designing Network Algorithms via Large Language Models},
year = {2024},
isbn = {9798400712722},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3696348.3696868},
doi = {10.1145/3696348.3696868},
booktitle = {Proceedings of the 23rd ACM Workshop on Hot Topics in Networks},
pages = {205–212},
numpages = {8},
keywords = {Large Language Models, Network Algorithms},
series = {HotNets '24}
}
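A sketch of the generate, filter, evaluate pipeline from the abstract. llm, compiles, quick_score, and full_eval are hypothetical stubs standing in for Nada's actual components; the point is that cheap filters prune most candidates before the expensive full-scale evaluation.

def design_algorithm(seed_impl, llm, compiles, quick_score, full_eval, k=5):
    # 1) The LLM proposes many alternative designs as code blocks.
    candidates = [llm(f"Propose a variant of:\n{seed_impl}") for _ in range(100)]
    # 2) Cheap filters prune most of them before costly evaluation.
    candidates = [c for c in candidates if compiles(c)]
    candidates = sorted(candidates, key=quick_score, reverse=True)[:k]
    # 3) Only the few survivors get full-scale (e.g., trace-driven) evaluation.
    return max(candidates, key=full_eval)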
NetConfEval: Can LLMs Facilitate Network Configuration?
Changjie Wang, Mariano Scazzariello, Alireza Farshin, Simone Ferlin, Dejan Kostić, and Marco Chiesa.
Proc. ACM Netw.,
2024.
Abstract
This paper explores opportunities to utilize Large Language Models (LLMs) to make network configuration human-friendly, simplifying the configuration of network devices & development of routing algorithms and minimizing errors. We design a set of benchmarks (NetConfEval) to examine the effectiveness of different models in facilitating and automating network configuration. More specifically, we focus on the scenarios where LLMs translate high-level policies, requirements, and descriptions (i.e., specified in natural language) into low-level network configurations & Python code. NetConfEval considers four tasks that could potentially facilitate network configuration, such as (i) generating high-level requirements into a formal specification format, (ii) generating API/function calls from high-level requirements, (iii) developing routing algorithms based on high-level descriptions, and (iv) generating low-level configuration for existing and new protocols based on input documentation. Learning from the results of our study, we propose a set of principles to design LLM-based systems to configure networks. Finally, we present two GPT-4-based prototypes to (i) automatically configure P4-enabled devices from a set of high-level requirements and (ii) integrate LLMs into existing network synthesizers.
BibTeX
@article{10.1145/3656296,
author = {Wang, Changjie and Scazzariello, Mariano and Farshin, Alireza and Ferlin, Simone and Kosti\'{c}, Dejan and Chiesa, Marco},
title = {NetConfEval: Can LLMs Facilitate Network Configuration?},
year = {2024},
issue_date = {June 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {2},
number = {CoNEXT2},
url = {https://doi.org/10.1145/3656296},
doi = {10.1145/3656296},
journal = {Proc. ACM Netw.},
articleno = {7},
numpages = {25},
keywords = {benchmark, code generation, function calling, large language models (llms), network configuration, network synthesizer, p4, rag, routing algorithms}
}
Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions
Vamsi Addanki, Maciej Pacut, and Stefan Schmid.
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24),
2024.
BibTeX
@inproceedings{a295535,
author = {Addanki, Vamsi and Pacut, Maciej and Schmid, Stefan},
title = {Credence: Augmenting Datacenter Switch Buffer Sharing with {ML} Predictions},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {613--634},
url = {https://www.usenix.org/conference/nsdi24/presentation/addanki-credence},
publisher = {USENIX Association}
}
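Loosely in the spirit of the title, and emphatically not Credence's algorithm: a toy shared-buffer admission check that behaves like a classic static threshold while queues are short and consults an ML prediction only beyond it. All names and the threshold value are invented.

def admit_packet(queue_len, free_buffer, predicted_drain_pkts, threshold=64):
    if queue_len < threshold:
        return free_buffer > 0            # classic threshold behavior
    # Beyond the threshold, lean on the predictor: admit only if the queue
    # is expected to drain past its current length soon.
    return free_buffer > 0 and predicted_drain_pkts > queue_len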
Improving Online Algorithms via ML Predictions
Manish Purohit, Zoya Svitkina, and Ravi Kumar.
Advances in Neural Information Processing Systems,
2018.
BibTeX
@inproceedings{NEURIPS2018_73a427ba,
author = {Purohit, Manish and Svitkina, Zoya and Kumar, Ravi},
booktitle = {Advances in Neural Information Processing Systems},
editor = {Bengio, S. and Wallach, H. and Larochelle, H. and Grauman, K. and Cesa-Bianchi, N. and Garnett, R.},
publisher = {Curran Associates, Inc.},
title = {Improving Online Algorithms via ML Predictions},
url = {https://proceedings.neurips.cc/paper_files/paper/2018/file/73a427badebe0e32caa2e1fc7530b7f3-Paper.pdf},
volume = {31},
year = {2018}
}
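The paper's flagship example is ski rental with a predicted number of ski days. The sketch below follows the commonly presented form of its algorithm, where lam in (0, 1] trades consistency (cost when the prediction is good) against robustness (worst-case cost when it is not); treat the exact form as a paraphrase rather than a verbatim transcription.

import math

def buy_day(b, pred, lam):
    # b = purchase price (renting costs 1 per day); pred = predicted ski days.
    # If the prediction says we ski at least b days, trust it and buy early.
    return math.ceil(lam * b) if pred >= b else math.ceil(b / lam)

def total_cost(b, true_days, day_bought):
    if true_days < day_bought:
        return true_days                  # rented every day, never bought
    return (day_bought - 1) + b           # rented, then bought

b, lam = 10, 0.5
print(total_cost(b, 30, buy_day(b, pred=25, lam=lam)))  # good prediction: 14
print(total_cost(b, 30, buy_day(b, pred=3, lam=lam)))   # bad prediction: 29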
Towards Integrating Formal Methods into ML-Based Systems for Networking
Fengchen Gong, Divya Raghunathan, Aarti Gupta, and Maria Apostolaki.
Proceedings of the 22nd ACM Workshop on Hot Topics in Networks,
Cambridge, MA, USA,
2023.
Abstract
Owing to its adaptability and scalability, Machine Learning (ML) has gained significant momentum in the networking community. Yet, ML models can still produce outputs that contradict knowledge, i.e., established networking rules and principles. On the other hand, Formal Methods (FM) use rigorous mathematical reasoning based on knowledge, but suffer from the lack of scalability. To capitalize on the complementary strengths of both approaches, we advocate for the integration of knowledge-based FM into ML-based systems for networking problems. Through a case study, we demonstrate the benefits and limitations of using ML models or FM alone. We find that incorporating FM in the training and inference of an ML model yields not only more reliable results but also better performance in various downstream tasks. We hope that our paper inspires a tighter integration of FM-based and ML-based approaches in networking, facilitating the development of more robust and dependable systems.
BibTeX
@inproceedings{10.1145/3626111.3628188,
author = {Gong, Fengchen and Raghunathan, Divya and Gupta, Aarti and Apostolaki, Maria},
title = {Towards Integrating Formal Methods into ML-Based Systems for Networking},
year = {2023},
isbn = {9798400704154},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3626111.3628188},
doi = {10.1145/3626111.3628188},
booktitle = {Proceedings of the 22nd ACM Workshop on Hot Topics in Networks},
pages = {48–55},
numpages = {8},
keywords = {Formal Methods, Imputation, Telemetry, Transformer},
series = {HotNets '23}
}
Zoom2Net: Constrained Network Telemetry Imputation
Fengchen Gong, Divya Raghunathan, Aarti Gupta, and Maria Apostolaki.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
Fine-grained monitoring is crucial for multiple data-driven tasks such as debugging, provisioning, and securing networks. Yet, practical constraints in collecting, extracting, and storing data often force operators to use coarse-grained sampled monitoring, degrading the performance of the various tasks. In this work, we explore the feasibility of leveraging the correlations among coarse-grained time series to impute their fine-grained counterparts in software. We present Zoom2Net, a transformer-based model for network imputation that incorporates domain knowledge through operational and measurement constraints, ensuring that the imputed network telemetry time series are not only realistic but align with existing measurements. This approach enhances the capabilities of current monitoring infrastructures, allowing operators to gain more insights into system behaviors without the need for hardware upgrades. We evaluate Zoom2Net on four diverse datasets (e.g., cloud telemetry and Internet data transfer) and use cases (e.g., bursts analysis and traffic classification). We demonstrate that Zoom2Net consistently achieves high imputation accuracy with a zoom-in factor of up to 100 and performs better on downstream tasks compared to baselines by an average of 38%.
BibTeX
@inproceedings{10.1145/3651890.3672225,
author = {Gong, Fengchen and Raghunathan, Divya and Gupta, Aarti and Apostolaki, Maria},
title = {Zoom2Net: Constrained Network Telemetry Imputation},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672225},
doi = {10.1145/3651890.3672225},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {764–777},
numpages = {14},
keywords = {telemetry, imputation, formal methods, transformer},
series = {ACM SIGCOMM '24}
}
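A minimal instance of the measurement-constraint idea in the abstract: whatever the model imputes, project it back so the fine-grained series aggregates exactly to the coarse sample the operator actually measured. This assumes the coarse counter is a sum; Zoom2Net's constraint set is richer.

def enforce_measurement(imputed, coarse_sum):
    # Rescale fine-grained guesses so that sum(fine) == coarse measurement.
    s = sum(imputed)
    if s == 0:
        return [coarse_sum / len(imputed)] * len(imputed)
    return [v * coarse_sum / s for v in imputed]

print(enforce_measurement([2.0, 6.0, 2.0], coarse_sum=20.0))  # [4.0, 12.0, 4.0]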
DOTE: Rethinking (Predictive) WAN Traffic Engineering
Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, and Aviv Tamar.
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23),
2023.
BibTeX
@inproceedings{a286421,
author = {Perry, Yarin and Frujeri, Felipe Vieira and Hoch, Chaim and Kandula, Srikanth and Menache, Ishai and Schapira, Michael and Tamar, Aviv},
title = {{DOTE}: Rethinking (Predictive) {WAN} Traffic Engineering},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {1557--1581},
url = {https://www.usenix.org/conference/nsdi23/presentation/perry},
publisher = {USENIX Association}
}
RedTE: Mitigating Subsecond Traffic Bursts with Real-time and Distributed Traffic Engineering
Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min, and Yi Wang.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
Internet traffic bursts usually happen within a second, thus conventional burst mitigation methods ignore the potential of Traffic Engineering (TE). However, our experiments indicate that a TE system, with a sub-second control loop latency, can effectively alleviate burst-induced congestion. TE-based methods can leverage network-wide tunnel-level information to make globally informed decisions (e.g., balancing traffic bursts among multiple paths). Our insight in reducing control loop latency is to let each router make local TE decisions, but this introduces the key challenge of minimizing performance loss compared to centralized TE systems. In this paper, we present RedTE, a novel distributed TE system with a control loop latency of < 100ms, while achieving performance comparable to centralized TE systems. RedTE’s innovation is the modeling of TE as a distributed cooperative multi-agent problem, and we design a novel multi-agent deep reinforcement learning algorithm to solve it, which enables each agent to make globally informed decisions solely based on local information. We implement real RedTE routers and deploy them on a WAN spanning six city datacenters. Evaluation reveals notable improvements compared to existing solutions: < 100ms of control loop latency, a 37.4% reduction in maximum link utilization, and a 78.9% reduction in average queue length.
BibTeX
@inproceedings{10.1145/3651890.3672231,
author = {Gui, Fei and Wang, Songtao and Li, Dan and Chen, Li and Gao, Kaihui and Min, Congcong and Wang, Yi},
title = {RedTE: Mitigating Subsecond Traffic Bursts with Real-time and Distributed Traffic Engineering},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672231},
doi = {10.1145/3651890.3672231},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {71–85},
numpages = {15},
keywords = {traffic engineering, network optimization, machine learning},
series = {ACM SIGCOMM '24}
}
Transferable Neural WAN TE for Changing Topologies
Abd AlRhman AlQiam, Yuanjun Yao, Zhaodong Wang, Satyajeet Singh Ahuja, Ying Zhang, Sanjay G. Rao, Bruno Ribeiro, and Mohit Tawarmalani.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
Recently, researchers have proposed ML-driven traffic engineering (TE) schemes where a neural network model is used to produce TE decisions in lieu of conventional optimization solvers. Unfortunately, existing ML-based TE schemes are not explicitly designed to be robust to topology changes that may occur due to WAN evolution, failures or planned maintenance. In this paper, we present HARP, a neural model for TE explicitly capable of handling variations in topology including those not observed in training. HARP is designed with two principles in mind: (i) ensure invariances to natural input transformations (e.g., permutations of node ids, tunnel reordering); and (ii) align neural architecture to the optimization model. Evaluations on a multi-week dataset of a large private WAN show HARP achieves an MLU at most 11% higher than optimal over 98% of the time despite encountering significantly different topologies in testing relative to training data. Further, comparisons with state-of-the-art ML-based TE schemes indicate the importance of the mechanisms introduced by HARP to handle topology variability. Finally, when predicted traffic matrices are provided, HARP outperforms classic optimization solvers achieving a median reduction in MLU of 5 to 10% on the true traffic matrix.
BibTeX
@inproceedings{10.1145/3651890.3672237,
author = {AlQiam, Abd AlRhman and Yao, Yuanjun and Wang, Zhaodong and Ahuja, Satyajeet Singh and Zhang, Ying and Rao, Sanjay G. and Ribeiro, Bruno and Tawarmalani, Mohit},
title = {Transferable Neural WAN TE for Changing Topologies},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672237},
doi = {10.1145/3651890.3672237},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {86–102},
numpages = {17},
keywords = {traffic engineering, wide-area networks, network optimization, machine learning},
series = {ACM SIGCOMM '24}
}
FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering
Ximeng Liu, Shizhen Zhao, Yong Cui, and Xinbing Wang.
Proceedings of the ACM SIGCOMM 2024 Conference,
Sydney, NSW, Australia,
2024.
Abstract
Traffic Engineering (TE) is critical for improving network performance and reliability. A key challenge in TE is the management of sudden traffic bursts. Existing TE schemes either do not handle traffic bursts or uniformly guard against traffic bursts, thereby facing difficulties in achieving a balance between normal-case performance and burst-case performance. To address this issue, we introduce FIGRET, a Fine-Grained Robustness-Enhanced TE scheme. FIGRET offers a novel approach to TE by providing varying levels of robustness enhancements, customized according to the distinct traffic characteristics of various source-destination pairs. By leveraging a burst-aware loss function and deep learning techniques, FIGRET is capable of generating high-quality TE solutions efficiently. Our evaluations of real-world production networks, including Wide Area Networks and data centers, demonstrate that FIGRET significantly outperforms existing TE schemes. Compared to the TE scheme currently deployed in Google’s Jupiter data center networks, FIGRET achieves a 9%-34% reduction in average Maximum Link Utilization and improves solution speed by 35×–1800×. Against DOTE, a state-of-the-art deep learning-based TE method, FIGRET substantially lowers the occurrence of significant congestion events triggered by traffic bursts by 41%-53.9% in topologies with high traffic dynamics.
BibTeX
@inproceedings{10.1145/3651890.3672258,
author = {Liu, Ximeng and Zhao, Shizhen and Cui, Yong and Wang, Xinbing},
title = {FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering},
year = {2024},
isbn = {9798400706141},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3651890.3672258},
doi = {10.1145/3651890.3672258},
booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
pages = {117–135},
numpages = {19},
keywords = {traffic engineering, wide-area networks, datacenter networks, machine learning},
series = {ACM SIGCOMM '24}
}
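A schematic of a burst-aware training loss in the spirit of the abstract, not FIGRET's exact formulation: normal-case max link utilization plus a penalty that charges headroom only on source-destination pairs that are actually bursty, rather than guarding uniformly.

def burst_aware_loss(mlu, pair_util, pair_burstiness, alpha=0.1):
    # pair_util / pair_burstiness: {(src, dst): value}. A uniform guard
    # would use one burstiness weight for every pair instead.
    penalty = sum(pair_burstiness[p] * u for p, u in pair_util.items())
    return mlu + alpha * penalty

print(burst_aware_loss(0.7, {("a", "b"): 0.5, ("a", "c"): 0.2},
                       {("a", "b"): 2.0, ("a", "c"): 0.1}))  # 0.802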
TCP ex machina: computer-generated congestion control
Keith Winstein and Hari Balakrishnan.
Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM,
Hong Kong, China,
2013.
Abstract
This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint’s reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion-control algorithms to run at the endpoints. In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm—the control rules for the independent endpoints—that tries to achieve this objective. In simulations with ns-2, Remy-generated algorithms outperformed human-designed end-to-end techniques, including TCP Cubic, Compound, and Vegas. In many cases, Remy’s algorithms also outperformed methods that require intrusive in-network changes, including XCP and Cubic-over-sfqCoDel (stochastic fair queueing with CoDel for active queue management). Remy can generate algorithms both for networks where some parameters are known tightly a priori, e.g. datacenters, and for networks where prior knowledge is less precise, such as cellular networks. We characterize the sensitivity of the resulting performance to the specificity of the prior knowledge, and the consequences when real-world conditions contradict the assumptions supplied at design-time.
BibTeX
@inproceedings{10.1145/2486001.2486020,
author = {Winstein, Keith and Balakrishnan, Hari},
title = {TCP ex machina: computer-generated congestion control},
year = {2013},
isbn = {9781450320566},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2486001.2486020},
doi = {10.1145/2486001.2486020},
booktitle = {Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM},
pages = {123–134},
numpages = {12},
keywords = {computer-designed algorithms, congestion control},
series = {SIGCOMM '13}
}
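Remy's searchable object is a rule table mapping a small congestion-signal state (EWMAs of ACK and send interarrival times plus an RTT ratio) to sender actions. The toy table below only shows the shape of such a computer-generated controller; the buckets and action values are invented, not Remy's output.

# Action: (window multiplier m, window increment b, pacing gap in ms).
RULES = {
    (0, 0, 0): (2.0, 4, 0.5),   # quiet network: ramp up
    (1, 0, 0): (1.0, 2, 1.0),
    (1, 1, 1): (0.5, 0, 3.0),   # delay rising: back off
}

def action(ack_ewma_ms, send_ewma_ms, rtt_ratio):
    # Coarsen the signals into buckets, then look up the action.
    state = (int(ack_ewma_ms >= 50), int(send_ewma_ms >= 50), int(rtt_ratio >= 1.5))
    return RULES.get(state, (1.0, 1, 1.0))

print(action(80.0, 20.0, 1.1))  # (1.0, 2, 1.0)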
Mowgli: Passively Learned Rate Control for Real-Time Video
Neil Agarwal, Rui Pan, Francis Y. Yan, and Ravi Netravali.
22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25),
2025.
BibTeX
Click
here to close the dropdown!
@inproceedings{a306039,
author = {Agarwal, Neil and Pan, Rui and Yan, Francis Y. and Netravali, Ravi},
title = {Mowgli: Passively Learned Rate Control for {Real-Time} Video},
booktitle = {22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
year = {2025},
isbn = {978-1-939133-46-5},
address = {Philadelphia, PA},
pages = {579--594},
url = {https://www.usenix.org/conference/nsdi25/presentation/agarwal},
publisher = {USENIX Association}
}
PCC: Re-architecting congestion control for consistent high performance
Mo Dong, Qingxi Li, Doron Zarchy, P. Brighten Godfrey, and Michael Schapira.
12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15),
2015.
BibTeX
@inproceedings{dong2015pcc,
author = {Dong, Mo and Li, Qingxi and Zarchy, Doron and Godfrey, P. Brighten and Schapira, Michael},
title = {{PCC}: Re-architecting congestion control for consistent high performance},
booktitle = {12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15)},
year = {2015},
address = {Oakland, CA},
pages = {395--408},
url = {https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-dong.pdf},
publisher = {USENIX Association}
}
PCC Vivace: Online-Learning Congestion Control
Mo Dong, Tong Meng, Doron Zarchy, Engin Arslan, Yossi Gilad, Brighten Godfrey, and Michael Schapira.
15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18),
2018.
BibTeX
@inproceedings{a211245,
author = {Dong, Mo and Meng, Tong and Zarchy, Doron and Arslan, Engin and Gilad, Yossi and Godfrey, Brighten and Schapira, Michael},
title = {{PCC} Vivace: {Online-Learning} Congestion Control},
booktitle = {15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18)},
year = {2018},
isbn = {978-1-939133-01-4},
address = {Renton, WA},
pages = {343--356},
url = {https://www.usenix.org/conference/nsdi18/presentation/dong},
publisher = {USENIX Association}
}
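Vivace picks sending rates by online gradient ascent on an empirically measured utility of throughput, latency gradient, and loss. The sketch below follows that general shape; the exponent and penalty coefficients are illustrative rather than the paper's tuned values, and measure is a hypothetical probe returning what was observed at a given rate.

def utility(rate, rtt_gradient, loss, b=900.0, c=11.35):
    # Sublinear throughput reward; penalties for rising RTT and for loss.
    return rate ** 0.9 - b * rate * max(rtt_gradient, 0.0) - c * rate * loss

def next_rate(rate, measure, eps=0.05, step=1e-3):
    # Probe slightly above and below the current rate, estimate the utility
    # gradient empirically, and move the rate uphill.
    hi, lo = rate * (1 + eps), rate * (1 - eps)
    u_hi, u_lo = utility(hi, *measure(hi)), utility(lo, *measure(lo))
    grad = (u_hi - u_lo) / (hi - lo)
    return max(rate + step * grad, 1.0)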