Intuitionistic type theory is a widely-used framework for constructive mathematics and computer programming. Martin-Lof's meaning explanations justify the rules of type theory by defining types as classifications of programs according to their behaviors. Recently, researchers noticed that the rules of type theory also admit homotopy-theoretic models, and subsequently extended type theory with constructs inspired by these models: higher inductive types and Voevodsky's univalence axiom. Although such higher-dimensional type theories have proved useful for homotopy-theoretic reasoning, they lack a computational interpretation.

In this proposal, I describe a modification of the meaning explanations that accounts for higher-dimensional types as classifications of higher-dimensional programs. I aim to extend these cubical meaning explanations with additional features inspired by other higher-dimensional and computational type theories, and explore the programming applications of higher-dimensional programs.

**Thesis Committee: **

Robert Harper (Chair)

Jeremy Avigad

Karl Crary

Daniel R. Licata (Wesleyan University)

Todd Wilson (California State University, Fresno)

# CSD

In this talk, we will discuss the classic bipartite matching problem in the online setting, first introduced in the seminal work of Karp, Vazirani and Vazirani (STOC '90). Specifically, we consider the problem for the well-studied class of *regular* graphs. Matching in this class of graphs was studied extensively in the offline setting. In the online setting, an optimal deterministic algorithm, as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to *one* on this family of graphs, under *adversarial* arrival order. Our main result is an algorithm which achieves competitive ratio 1-O(\sqrt{\log d} / \sqrt{d}) in expectation on d-regular graphs, and thus the problem becomes easier for randomized algorithms as d increases (for deterministic algorithms, the optimal competitive ratio is known to decrease as d increases, and tends to 1-1/e).

In contrast, we show that all previously-known online algorithms, such as the generally worst-case optimal RANKING algorithm of Karp et al., are restricted to a competitive ratio strictly bounded away from one, even as d grows. Moreover, we show the converge rate of our algorithm's competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1 / \sqrt{d}). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each offline vertex a probability of being matched tending to one.

*Joint work with Ilan R. Cohen (Hebrew University of Jerusalem). *

Machine learning (ML) methods are used to analyze data which are collected from various sources. As the problem size grows, we turn to distributed parallel computation to complete ML training in a reasonable amount of time. However, naive parallelization of ML algorithms often hurts the effectiveness of parameter updates due to the dependency structure among model parameters, and a subset of model parameters often bottlenecks the completion of ML algorithms due to the uneven convergence rate. In this proposal, I propose two efforts: 1) STRADS that improves the training speed in an order of magnitude and 2) STRADS-AP that makes parallel ML programming easier.

In STRADS, I will first present scheduled model-parallel approach with two specific scheduling schemes: 1) model parameter dependency checking to avoid updating dependent parameters concurrently; 2) parameter prioritization to give more update chances to the parameters far from its convergence point. To efficiently run the scheduled model-parallel in a distributed system, I implement a prototype framework called STRADS. STRADS improves the parameter update throughput by pipelining iterations and overlapping update computations with network communication for parameter synchronization. With ML scheduling and system optimizations, STRADS improves the ML training time by an order of magnitude. However, these performance gains are at the cost of extra programming burden when writing ML schedules. In STRADS-AP, I will present a high-level programming library and a system infrastructure that automates ML scheduling. The STRADS-AP library consist of three programming constructs: 1) a set of distributed data structures (DDS); 2) a set of functional style operators; and 3) an imperative style loop operator. Once an ML programmer writes an ML program using STRADS-AP library APIs, the STRADS-AP runtime automatically parallelizes the user program over a cluster ensuring data consistency.

**Thesis Committee: **

Garth A. Gibson (Co-chair)

Eric P. Xing (Co-chair)

Phillip B. Gibbons

Joseph E. Gonzalez (University of California, Berkeley)

The identification of anomalies and communities of nodes in real-world graphs has applications in widespread domains, from the automatic categorization of wikipedia articles or websites to bank fraud detection. While recent and ongoing research is supplying tools for the analysis of simple unlabeled data, it is still a challenge to find patterns and anomalies in large labeled datasets such as time evolving networks. What do real communities identified in big networks look like? How can we find sources of infections in bipartite networks? Can we predict who is most likely to join an online discussion on a given topic?

We model interaction networks using appropriate matrix and tensor representations in order to gain insights into these questions. We incorporate edge attributes, such as timestamps in phone-call networks or airline codes in flight networks, and complex node side-information, such as who-retweets-whom in order to understand who uses a given hashtag on Twitter. We provide three major contributions:

1. Hyperbolic communities: Our exploration of real communities provides novel theoretical ideas regarding their structure, and we show how typical long-tail degree distributions can be leveraged to create efficient algorithms on problems that seem to suffer from quadratic explosion.

2. Anomaly detection: Novel distributed algorithms that identify problematic nodes whose influence can only be detected on their neighbors, validated through the analysis of data breaches in bank transactions.

3. Forecasting: New techniques that forecast network evolution by incorporating contextual side-information and the evolution of independent community structures.

Leveraging these techniques, we explore massive datasets such as networks with billions of credit card transactions, Twitter graphs with over 300 million interactions and phone-call networks with over 50 million phone-calls.

**Thesis Committee: **

Christos Faloutsos (Co-Chair)

Pedro Ribeiro (Co-Chair)

William Cohen

Aarti Singh

Tina Eliassi-Rad (Northeastern University)

Beatriz Santos (University of Aveiro)

Alexandre Francisco (University of Lisbon)

Join the students in the Cognitive Robotics class, who will be demoing their final projects. All welcomed.

This class uses Anki's new Cozmo robot: the first truly affordable vision-based mobile manipulator. Anki was founded in 2010 by three CMU Robotics PhD students: Boris Sofman, Mark Palatucci, and Hanns Tappeiner. The company has received $208 million in venture funding; Marc Andreesen serves on their board.

*Faculty: *David Touretzky

An optimization problem seeks to minimize or maximize the value of an objective function subject to a set of equality and inequality constraints. Many applications in science and engineering often require solving optimization problems, with matrices possibly appearing in the objective function, the constraints set or both. Although most of these problems can be solved exactly using polynomial time algorithms, these algorithms are too slow to deal with the large size of modern datasets and the matrices obtained by the latter. In practice, fast approximate algorithms, that are designed by carefully trading the accuracy of a solution with the runtime complexity, are preferred to slow exact algorithms. This talk will introduce linear sketching and highlight how it can be used to obtain fast approximate algorithms for least squares regression, least absolute deviations regression and low rank approximation.

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement*

In today's datacenters, storage and network resources are shared to lower costs and achieve better utilization, but how does one meet tail latency performance goals in these shared environments? Workloads exhibit different behaviors (e.g., burstiness, load) and can have different latency requirements. Furthermore, storage and networks each exhibit unique performance characteristics that make it hard to provide latency guarantees. For example, in SSD storage devices, writes are slower than reads, while in networks, packets traverse a series of queues and congest with different workloads at each queue.

In this talk, I will describe my thesis work on meeting tail latency goals when sharing storage and networks. Our work looks at the problem from the perspective of scheduling policies, admission control, and workload placement. In particular, we implement new policies and mechanisms for controlling the congestion between multiple workloads of varying burstiness. Our system is unique in that it incorporates mathematically grounded techniques such as Deterministic Network Calculus (DNC) and Stochastic Network Calculus (SNC) to guarantee tail latency. In fact, we are the first to implement SNC in a computer system.

**Thesis Committee: **

Mor Harchol-Balter (Chair)

Gregory R. Ganger

David G. Andersen

Michael A. Kozuch (Intel Labs)

Arif Merchant (Google)

Though a small part of the body, the human hand is complex and remarkably versatile and multipurpose. Much work has gone into understanding the hand, such as understanding the physical capabilities of the human hand, how humans develop manipulation skills throughout the lifespan, how people translate task requirements into grasping strategy, and so on. Despite that, human manipulation is still not well understood. For example, how many grasps or manipulation actions do people use in daily life? How often and under what circumstances do people use both hands simultaneously instead of one? A better understanding of how humans grasp can improve our ability to control robotic hands, which are still far behind human hands in dexterity.

In our work we have used a variety of methods to observe how humans grasp and manipulate in natural, everyday settings. We have used photos taken throughout a normal day; high-framerate video in a specific setting (that of a convenience store); and cameras and motion capture systems in the context of a controlled experiment involving transporting a bowl from one location to another. In these studies we found that a single grasp pose can be used for a variety of actions, were able to observe the grasping process in detail, and found that minimizing body rotation plays a large role in the use of one hand vs. two in transport tasks.

We propose applications of some of the main findings of these studies to the goals of improving the success or naturalness of grasping performed by robotic hands and virtual characters. In particular, we propose using the detailed grasping behavior found in the high-framerate video to create a virtual hand controller capable of levering up objects into the hand. We also propose using the results of the bowl transport experiment to create a character whose transporting behavior looks natural and believable.

This work thus presents the results and insights from investigations of human manipulation and lays out ways in which those insights can be used to improve the capabilities of artificial manipulators.

**Thesis Committee:**

Nancy Pollard (Chair)

Jessica Hodgins

Roberta Klatzky

Carol O'Sullivan (Trinity College, Dublin)

Copy of Thesis Summary

Increasingly, decisions and actions affecting people's lives are determined by automated systems processing personal data. Excitement about these systems has been accompanied by serious concerns about their opacity and the threats that they pose to privacy, fairness, and other values. Recognizing these concerns, it is important to make real-world automated decision-making systems accountable for privacy and fairness by enabling them to detect and explain violations of these values. System maintainers may leverage such accounts to repair systems to avoid future violations with minimal impact on the utility goals.

In this thesis, I aim to develop theories and tools for analyzing information use that enable practical accountability mechanisms and ensure data-driven systems respect meaningful privacy and fairness properties. In particular, I focus on two forms of information use: (i) explicit use, the direct causal influence of information, and (ii) proxy use, the indirect use of information through associations. In prior work, I have developed methods for detecting and explaining explicit information use. In this proposal, I will address issues due to covariate shifts in causal testing of machine-learned systems. Further, I will focus on a new formalization of proxy use, and tools for its detection and repair. Finally, I will explore theories of use privacy and proxy non-discrimination built on top of this formalization of proxy use.

**Thesis Committee:**

Anupam Datta (Chair)

Jaime Carbonell

Matt Fredrikson

Sriram Rajamani (Microsoft Research)

Jeannette Wing (Microsoft Research)

Copy of Thesis Summary

In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik and Bottou, 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.

*This is joint work with Mu Li, Krishna Pillutla, Colin White, Nina Balcan, and Alex Smola. *

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement. *

*This talk is also part of the AI Lunch Seminar Series.*