Adults read text at an average speed of 3 words per seconds. At this rate, they perceive letters, identify word forms, construct the meaning of sentences and interpret the undergoing narrative. Story comprehension is therefore a rich phenomenon, requiring multiple simultaneous processes. On the other hand, fMRI, a commonly used brain-imaging tool to study reading, has a time resolution of a couple seconds along with slowly varying time dynamics. Classically fMRI has been used by localizing which areas correspond to a certain element of language processing (e.g. syntax) by presenting stimuli in blocks that vary in a few critical conditions (e.g. sentences with simple syntax vs. complex syntax). We are interested in a more extensive exploration of reading that accounts for all the levels of processing that are involved when subjects read a complex non-experimental text under close-to-normal conditions. Our aim is to reveal distinct patterns of neural representation for these levels of cognitive processing. These patterns can be used to identify the cognitive roles of different brain areas in language processing and story understanding, to investigate the mechanisms of reading and other high-level tasks, and to relate individual differences in reading to neural differences.

This proposal is based on existing results from a reading experiment for which we developed a predictive model of brain activity that was able to tell with 85% accuracy which of two passages of a story a subject was reading from a segment of unlabeled fMRI data. This model learned different patterns of representation for the different levels of story processing and those patterns aligned with many of the key findings in the literature of language processing. The model replicated all these results simultaneously and using only one experiment. This dissertation will elaborate this method of studying the brain representations involved in reading along three axes: (1) it will further develop fMRI and MEG experimental paradigms and associated approaches to model the content of the stories in a detailed intermediate feature space, (2) it will develop machine learning tools to study the mapping between this story feature space and the brain activity and (3) it will develop statistical tests and classification approaches to characterize the information content of brain activity. The result will be a comprehensive, state-of-the-art computational method to study language processing and story understanding in the brain.

Thesis Committee:
Tom Mitchell (Chair)
Eduard Hovy
Cosma Shalizi
Jack Gallant (University of California, Berkeley)
Brian Murphy(Queen's University, Belfast)

The monotone linear complementarity problem (LCP) unifies the primal and dual aspects of linear and convex quadratic programs. These optimization problems occur often in machine learning and optimal control. For example, training support vector machines (SVMs) and finding optimal policies in Markov decision processes (MDPs) can be cast as monotone LCPs. LCP instances are commonly too large to solve exactly in applications like MDP planning. This thesis will focus on fast but approximate LCP algorithms that carry approximation bounds–bounds on the distance between an approximate solution and the set of actual solutions. We have already developed several fixed point algorithms that carry bounds. These algorithms are based on Galerkin approximations. Many LCP instances are estimated from data, so we also want to bound how far a solution of an estimated LCP can be from the true set of solutions. The reasoning and theoretical tools associated with bounding approximation error can also be used to bound the error introduced by estimation and other perturbations. These bounds can be combine with probabilistic perturbation models to get tail bounds on how extreme error can be; we provide examples of the kind of perturbation bounds that we want. We also intend to run a number of experiments based on applications drawn from real world problems including large SVMs and a large MDP based on the one used in the Airborne Collision Avoidance System (ACAS). These experiments will allow us to compare our general-purpose LCP algorithms to problem-specific algorithms and characterize their relative performances. The data will also allow us to determine how tight our approximation and perturbation bounds are in practice.

Thesis Committee:
Geoff Gordon (Co-chair)
André Platzer (Co-chair)
Drew Bagnell
Ron Parr (Duke University)

Thesis Summary

The challenge this thesis addresses is to speed up the development of concurrent programs by increasing the efficiency with which concurrent programs can be tested and consequently evolved. The goal of this thesis is to generate methods and tools that help software engineers increase confidence in the correct operation of their programs. To achieve this goal, this thesis advocates testing of concurrent software using a systematic approach capable of enumerating possible executions of a concurrent program. The practicality of the systematic testing approach is demonstrated by presenting a novel software infrastructure that repeatedly executes a program test, controlling the order in which concurrent events happen so that different behaviors can be explored across different test executions. By doing so, systematic testing circumvents the limitations of stochastic testing, which relies on chance to discover concurrency errors. However, the idea of systematic testing alone does not quite solve the problem of concurrent software testing. The combinatorial nature of the number of ways in which concurrent events of a program can execute causes an explosion of the number of possible interleavings of these events, a problem referred to as state space explosion. To address the state space explosion problem, this thesis studies techniques for quantifying the extent of state space explosion and explores several directions for mitigating state space explosion: parallel state space exploration, restricted runtime scheduling, and abstraction reduction. In the course of its research exploration, this thesis pushes the practical limits of systematic testing by orders of magnitude, scaling systematic testing to real-world programs of unprecedented complexity.

Thesis Committee:
Randal E. Bryant (Co-Chair)
Garth A. Gibson (Co-Chair)
David G. Andersen
André Platzer
Junfeng Yang (Columbia University)



In this thesis I present a higher order logic programming language, Caledon, with a pure type system and a Turing complete type inference and im-plicit argument system based on the same logic programming semantics. Because the language has dependent types and type inference, terms can be generated by providing type constraints. I design the dynamic semantics of this language to be the same used to perform type inference, such that there is no disparity between compilation and running. The lack of distinction between compilation and execution permits certain metaprogramming techniques which are normally either unavailable or only possible with second thought extensions. The addition of control structures such as implicit arguments, shared holes, polymorphism, and nondeterminism control makes programming computation during type inference more natural. As a consequence of these extensions, unification problems must be generated to solve for terms in addition to the usual problems generated to solve for types. Furthermore, because every result of execution is a term in the consistent calculus of constructions, Caledon can be considered an interactive theorem prover with a less orthogonal combination of proof search and proof checking than has previously been designed or implemented.

Thesis Committee: 
Frank Pfenning (Chair)
Karl Crary

Thesis Summary


This thesis proposal aims to study problems that arise in economic settings from two aspects – game theoretic and computational. Economic settings form a natural setting where self interested agents interact with each other, often with a competitive spirit. In this thesis proposal, we ask how to design mechanisms that achieve the desired objectives such as social welfare or privacy, in the presence of the game theoretic interactions of the agents whose utilities are often not aligned with the objectives of the mechanism designer. We consider the design of mechanisms both in the presence and absence of money – a tool that is very useful, if present, to a mechanism designer to help align the utility of the agent with the overall objective.

In BGMS'11, we considered designing a pricing scheme to allocate limited resources to an online stream of buyers to maximize social welfare. This pricing scheme however fails to maintain the privacy of the purchase decisions of the buyers. In this proposal, we explore techniques to incorporate privacy in the pricing mechanism. The pricing mechanism naturally involves money. As a second question, we propose how in the absence of money, one can allocate limited resources to buyers to achieve proportional fairness in a strategy-proof manner.

The other perspective is computational. In DDSSS'13, we explored how well submodular functions, a class of functions used to model buyer valuations, can be approximated by some of its better understood sub-classes. Online allocation is an optimization problem that occurs often in economic settings. We propose to study online allocation of impressions to advertisers in the presence of multiple constraints (e.g. budget, number of impressions etc.) on the part of the advertisers. Prior work in this setting has considered the case where only one constraint is present.

Thesis Committee:
Avrim Blum (Co-chair)
Anupam Gupta (Co-chair)
Ariel Procaccia
Jan Vondrak (IBM Almaden)

Thesis Summary

Given the massive size of many networks, conventional algorithms which scale as polynomials in the network size are woefully inadequate. In the first part of this talk, we consider how to use locality to deliver much more efficient algorithms (quasi-linear or even sub-linear in the network size) for quantities and questions like pagerank, coverage, diffusion, and determining the most influential nodes. In this second part of this talk, we consider another aspect of locality, namely the question of local data access. Many large networks are encoded locally, e.g., with adjacency lists. How does the nature of the data access impact the efficiency of algorithms? Surprisingly, we show that small differences in data access can lead to very large differences in efficiency and approximability of network algorithms. As an example, we consider a covering problem which arises naturally for recruiters on social networks, and show how small differences between the information on neighbors in LinkedIn and Facebook lead to large differences in their utility to recruiters.


Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. Chayes was Research Area Manager for Mathematics, Theoretical Computer Science and Cryptography at Microsoft Research Redmond until 2008. Chayes joined Microsoft Research in 1997, when she co-founded the Theory Group. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, and algorithmic game theory. She is the co-author of over 110 scientific papers and the co-inventor of more than 25 patents.

She was for many years Professor of Mathematics at UCLA. She serves on numerous institute boards, advisory committees and editorial boards, including the Turing Award Selection Committee of the Association for Computing Machinery, the Board of Trustees of the Mathematical Sciences Research Institute and the Institute for Computational and Experimental Research in Mathematics, the Advisory Boards of the Center for Discrete Mathematics and Computer Science, the Howard Hughes Medical Institute Janelia Farm Research Campus, and Women Entrepreneurs in Science and Technology. Chayes is a past Chair of the Mathematics Section of the American Association for the Advancement of Science, and a past Vice-President of the American Mathematical Society.

Chayes received her B.A. in biology and physics at Wesleyan University, where she graduated first in her class, and her Ph.D. in mathematical physics at Princeton. She did her postdoctoral work in the mathematics and physics departments at Harvard and Cornell. She is the recipient of a National Science Foundation Postdoctoral Fellowship, a Sloan Fellowship, and the UCLA Distinguished Teaching Award. Chayes has recently been the recipient of many leadership awards including the Leadership Award of Women Entrepreneurs in Science and Technology, the Leading Women Award of the Girl Scouts of Eastern Massachusetts, the Women to Watch Award of the Boston Business Journal, and the Women of Leadership Vision Award of the Anita Borg Institute. She has twice been a member of the Institute for Advanced Study in Princeton. Chayes is a Fellow of the American Association for the Advancement of Science the Fields Institute, the Association for Computing Machinery, the American Mathematical Society, and a National Associate of the National Academies.


Faculty Host: Ariel Procaccia


Subscribe to CSD