Computer Science Thesis Oral
- Gates Hillman 8102 and Zoom
- In Person and Virtual ET
- DAEHYEOK KIM
- Ph.D. Student
- Computer Science Department
- Carnegie Mellon University
Towards Elastic and Resilient In-Network Computing
Recent advances in programmable networking hardware, such as programmable switches and network interface cards, have created a new computing paradigm called in-network computing. This paradigm allows functionality traditionally provided by servers or proprietary hardware devices, ranging from network middleboxes to components of distributed systems, to be performed in the network instead. The demand for higher performance and the commercial availability of programmable hardware have driven the popularity of in-network computing.
While many recent efforts have demonstrated the performance benefits of in-network computing, we observe that there is still a large gap between what it offers today and what evolving applications demand. In particular, we argue that in-network computing lacks resource elasticity and fault resiliency, which are essential building blocks for any practical computing platform, and that their absence limits its potential. Elasticity can address the shortcoming that today's in-network computing supports only a simple deployment model, in which a single application runs on a single device with fixed and limited resources. Similarly, fault resiliency is critical for handling common device failures without compromising application correctness and performance, yet it has received little attention. Although resource elasticity and fault resiliency have been studied extensively for traditional server-based computing, we find that enabling them on programmable networking devices is challenging, particularly because of these devices' hardware constraints and workload characteristics.
In this thesis, we argue that by designing abstractions that effectively leverage resources available outside a single type of device, while hiding the complexity of dealing with device heterogeneity, we can make in-network computing more elastic and resilient without any hardware modifications. This concept, which we call device resource augmentation, is a key enabler of resource elasticity and fault resiliency for stateful in-network applications written for programmable switches. In particular, we design three systems, TEA, ExoPlane, and RedPlane, that use this concept to provide elastic memory, elastic compute and memory, and fault resiliency, respectively. Each system consists of a key abstraction, programming APIs, and a runtime environment. We demonstrate their feasibility and effectiveness with prototype implementations and evaluations using various in-network applications.
Srinivasan Seshan (Co-Chair)
Vyas Sekar (Co-Chair)
Jennifer Rexford (Princeton University)
Jitendra Padhye (Microsoft Research)