ISR

Cyber-Physical Systems (CPS) are software-controlled systems that have complex interactions with the physical world. Many CPS, such as autonomous drones and self-driving cars, are becoming increasingly embedded in our society; they are therefore safety-critical and demand rigorous quality assurance. To this end, CPS engineering relies on modeling methods from diverse scientific and engineering fields, for example control theory and real-time scheduling. These modeling methods are difficult to combine with each other due to their complexity and heterogeneity. Inconsistencies between models and analyses that come from different modeling methods often lead to implicit design errors, which can subsequently cause critical CPS failures that cost lives and substantial material resources.

To detect and prevent inconsistencies between CPS modeling methods, this thesis investigates an improved architectural approach to integrating CPS modeling methods. This approach relies on architectural views (annotated component-and-connector models) to abstract out and check integration-relevant information from detailed models (e.g., hybrid programs). On top of these views, I introduce a novel integration perspective based on analyses -- algorithms and procedures that interpret and augment models. For each analysis, I propose to specify a contract that captures its inputs, outputs, assumptions, and guarantees in terms of view elements. A particularly challenging task is creating a language to express assumptions, guarantees, and consistency statements over heterogeneous models and views. To be effective in the CPS context, this language needs to strike a balance between expressiveness and decidability.
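
As a concrete illustration, the minimal sketch below records an analysis's inputs, outputs, assumption, and guarantee over view elements. The field names and the example scheduling analysis are hypothetical, not the thesis's actual contract language:

```java
import java.util.List;

// A minimal sketch of an analysis contract. Field names and the example
// scheduling analysis are hypothetical, not the thesis's actual
// contract language.
public class ContractSketch {
    record AnalysisContract(String analysis,
                            List<String> inputs,    // view elements the analysis reads
                            List<String> outputs,   // view elements it writes
                            String assumption,      // must hold before it runs
                            String guarantee) {}    // holds afterwards if the assumption held

    public static void main(String[] args) {
        // A hypothetical real-time scheduling analysis over a thread view.
        AnalysisContract c = new AnalysisContract(
            "bin-packing scheduler",
            List.of("thread.period", "thread.wcet"),
            List.of("thread.cpuAffinity"),
            "all threads are periodic",
            "per-CPU utilization <= 1");
        System.out.println(c);
    }
}
```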

The conceptual advances of this thesis enable a new level of automation for CPS modeling method integration. I will implement these advances in a toolset that will support automated model-view synchronization, analysis execution, and verification of semantic consistency between models. This toolset will serve as a means of evaluating the proposed integration approach in case studies of realistic CPS, such as autonomous spacecraft and collaborative robots. I will validate claims about the correctness, effectiveness, and generality of my approach.

Thesis Committee:
David Garlan (Chair, ISR)
André Platzer (CSD CMU)
Bruce Krogh (ECE CMU)
Dionisio de Niz (Software Engineering Institute, CMU)
John Day (Jet Propulsion Laboratory/NASA)

Solving the expression problem requires a language or system to allow one to add new variants to a datatype as well as new operations over it, while ensuring strong static type safety and without re-compiling the existing implementation. The independently-extensible version goes a step further and requires that these extensions can be added in any order. In extensible languages, this amounts to adding new (abstract) syntax to a language and new semantic analyses or translations over it. Attribute grammars solve these versions of the expression problem, but with forwarding and a modular well-definedness analysis they also solve a version that requires that no 'glue' code be written to combine the various extensions. This means that the composition can always be done safely and automatically. Using these techniques we have built ableC, an extensible language framework for C in which non-expert programmers can select the language extensions that best fit the task at hand, with the confidence that the supporting tools can generate a working compiler for their custom language.
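
To see the tension the expression problem captures, consider this minimal sketch in plain Java (the names are illustrative, not taken from ableC or Silver): adding a new variant is just a new class, but adding a new operation forces edits to the interface and every existing class.

```java
// Illustrative names only; not taken from ableC or Silver.
interface Expr {
    int eval();
}

class Lit implements Expr {
    final int value;
    Lit(int value) { this.value = value; }
    public int eval() { return value; }
}

class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public int eval() { return left.eval() + right.eval(); }
}

public class ExpressionProblem {
    public static void main(String[] args) {
        Expr e = new Add(new Lit(1), new Lit(2));
        System.out.println(e.eval()); // prints 3

        // Adding a new variant (say, Neg) is easy: one new class.
        // Adding a new operation (say, pretty-printing) requires editing
        // Expr and every existing class -- i.e., re-compiling the existing
        // implementation. Attribute grammars with forwarding instead let
        // both new syntax and new attributes be declared in separate,
        // composable modules.
    }
}
```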

Eric Van Wyk's research focuses on programming languages, in particular extensible programming languages and compilers, applications of temporal logic, and algebraic compilers. He received the National Science Foundation's CAREER award in 2004 and was awarded a McKnight Land-Grant Professorship in 2005.

He has authored or co-authored more than 25 publications, including journal and conference papers, articles, and technical reports. Van Wyk has developed various software packages, including the Silver attribute grammar specification and evaluation system, extensible specifications of Java 1.4 and ANSI C written in Silver, and various domain-specific language extensions for these Java and C specifications. He is a member of ACM, ACM SIGPLAN, IEEE, and the IEEE Computer Society, and serves on numerous conference committees. Van Wyk is also active in outreach, serving as a member of the St. Louis Park High School Business and Information Technology Advisory Board.

Faculty Host: Jonathan Aldrich

Structured probabilistic inference has been shown to be useful in modeling complex latent structures of data. One successful application of this technique is the discovery of latent topical structures in text data, usually referred to as topic modeling. With the recent popularity of mobile devices and social networking, we can now easily acquire text data with attached meta information, such as geo-spatial coordinates and timestamps. This metadata can provide rich and accurate information that is helpful in answering many research questions related to spatial and temporal reasoning. However, such data must be treated differently from text data. For example, spatial data is usually organized over a two-dimensional region, while temporal information can exhibit periodicities. While some existing work in the topic modeling community utilizes some of this meta information, these models largely focus on incorporating metadata into text analysis, rather than modeling the full joint distribution of meta-information and text.

In this thesis, I propose the event detection problem, a multi-dimensional latent clustering problem over spatial, temporal, and textual data. The event detection problem can be treated as a generalization of the topic modeling problem, where events are topics augmented with location and time. Preliminary models can effectively learn representations of major events covered in a corpus of Twitter data, and can also be used for various prediction tasks, such as predicting the spatial coordinates and timestamps of documents and estimating the life cycles of newborn events.

The approaches proposed in this thesis are largely based on Bayesian non-parametric methods, to deal with streaming data and an unpredictable number of data clusters. The proposed research will not only serve the event detection problem itself, but also shed light on the more general problem of structured clustering over spatial, temporal, and textual data.
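
The toy sketch below illustrates the generative story in which an event couples a word distribution with a spatial center and a time. Everything here is an illustrative simplification: a fixed number of events in place of a Bayesian non-parametric prior, and toy distributions in place of the proposed models.

```java
import java.util.Random;

// Illustrative simplification: a fixed number of events rather than a
// Bayesian non-parametric prior, and toy distributions in place of the
// proposed models.
public class EventSketch {
    static final Random rng = new Random(42);

    public static void main(String[] args) {
        // Each event couples a word distribution (here, a word list)
        // with a spatial center and a time center.
        String[][] eventWords = {
            {"flood", "rescue", "river"},
            {"concert", "stage", "tickets"}
        };
        double[][] centers = {{40.44, -79.99}, {34.05, -118.24}};
        double[] times = {10.0, 200.0}; // abstract time units

        // Generate one synthetic geo-tagged, time-stamped "tweet" per event.
        for (int k = 0; k < eventWords.length; k++) {
            String word = eventWords[k][rng.nextInt(eventWords[k].length)];
            double lat = centers[k][0] + 0.1 * rng.nextGaussian();
            double lon = centers[k][1] + 0.1 * rng.nextGaussian();
            double t = times[k] + rng.nextGaussian();
            System.out.printf("event %d: word=%s loc=(%.2f, %.2f) t=%.1f%n",
                              k, word, lat, lon, t);
        }
    }
}
```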

Thesis Committee:
Kathleen M. Carley (Chair)
Huan Liu (School of Computing, Informatics, & Decision Systems Engineering, Arizona State University)
Tom Mitchell (Machine Learning/CMU)
Alexander J. Smola (Machine Learning/CMU)

Software development environments are increasingly integrating social media features that make software projects and developers transparent, exposing all work-related activity in ways developers can use to enhance their work. Developers in these environments often choose to interact with other projects or developers, whether by forming dependencies, learning from an expert, or evaluating code contributions. Transparency can inform these decisions by allowing interested developers to see an entire network of information, such as all the developers that work on a particular project and their prior development history. With this promise of staying aware of and interacting with an ecosystem of potentially millions of projects and developers comes the peril of consuming overwhelming amounts of mostly noisy information generated by the broadcast development activity of these projects and developers. However, by identifying what information developers need to be aware of in transparent environments, there is an opportunity to inform new practices and tools that assist developers in their tasks by displaying only the information most relevant to the current task.

My dissertation work identifies the signals that developers make use of during tasks in transparent development environments and uses this knowledge to create tools that assist developers in performing these tasks. When developers evaluate contributions, they make use of technical signals from the contribution and social signals from the submitter, such as social connections and prior interaction on the project. Developers solving problems through discussion on contributions use signals such as the political influence of the community and how new the submitter is to inform decisions such as the scope of implemented solutions and the necessary amount of etiquette in a response. When deciding whether to use or participate in a project, developers make inferences about working dynamics, personal utility, and the project’s community. I propose the development of a tool that uses signals to assist developers in finding and evaluating projects in transparent development environments. Depending on the user’s current task, the tool should visualize different relevant project-related signals to assist the developer in evaluating projects. For example, a user looking for a project to form a dependency on will see signals for a project’s maturity, while a user looking for a project to learn from might see signals for popularity. The development of this tool will occur in two phases: 1) enhanced project summary and 2) personalized project search. The enhanced project summary phase augments potential projects with additional signals that assist the user in evaluating and selecting projects. The personalized project search phase uses information about the user and potential projects to select only the projects relevant for the particular user to evaluate.
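
As a rough illustration of task-dependent signal weighting, the sketch below scores projects differently depending on whether the user's task is forming a dependency or learning. The signal names and weights are hypothetical, not the dissertation's actual model:

```java
import java.util.List;
import java.util.Map;

// Signal names and weights are hypothetical illustrations, not the
// dissertation's actual model.
public class ProjectSignals {
    record Project(String name, Map<String, Double> signals) {}

    // Different tasks emphasize different signals.
    static final Map<String, Map<String, Double>> TASK_WEIGHTS = Map.of(
        "dependency", Map.of("maturity", 0.7, "activity", 0.3),
        "learning",   Map.of("popularity", 0.6, "docsQuality", 0.4));

    // Weighted sum of a project's signals under the current task.
    static double score(Project p, String task) {
        return TASK_WEIGHTS.get(task).entrySet().stream()
            .mapToDouble(e -> e.getValue() * p.signals().getOrDefault(e.getKey(), 0.0))
            .sum();
    }

    public static void main(String[] args) {
        List<Project> projects = List.of(
            new Project("libA", Map.of("maturity", 0.9, "activity", 0.4,
                                       "popularity", 0.2, "docsQuality", 0.5)),
            new Project("libB", Map.of("maturity", 0.3, "activity", 0.8,
                                       "popularity", 0.9, "docsQuality", 0.7)));
        for (String task : TASK_WEIGHTS.keySet()) {
            Project best = projects.stream()
                .max((a, b) -> Double.compare(score(a, task), score(b, task)))
                .orElseThrow();
            System.out.println(task + " -> " + best.name());
        }
    }
}
```

For the data above, the dependency task surfaces the mature libA, while the learning task surfaces the popular, well-documented libB.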

Thesis Committee:
James Herbsleb (Co-chair)
Laura Dabbish (Co-chair)
Claire Le Goues
André van der Hoek (University of California, Irvine)

Online privacy notices are supposed to act as the primary mechanism for informing users about the data practices of online services. In practice, users ignore notices because they are too long and complex to read. Instead, users rely on expectations to determine which sites they feel comfortable interacting with. Mismatches between actual practices and users’ expectations may result in users exposing themselves to unanticipated privacy risks. One approach to mitigating these risks is to highlight the elements of privacy notices that users likely do not expect.

I propose to simplify privacy notices by understanding mismatches between users’ privacy expectations regarding data practices of online services and actual data practices of online services. I present an approach for identifying such mismatches. I distinguish between two types of privacy expectations: subjective expectation (“desire” or “should”) and objective expectation (“likelihood” or “will”). I identify different types of mismatches that result from each expectation type and investigate their impact on users’ privacy. I study how expectations and mismatches vary by contextual factors and user characteristics. Based on the understanding gained from studying expectations and mismatches, I design and test simplified privacy notices. I show that such simplified notices can be shorter and easier to comprehend for users.
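
A minimal sketch of how such a mismatch might be classified, assuming a hypothetical encoding of objective expectations and actual practices (not the study's actual instrument):

```java
// The encoding of expectations and practices here is hypothetical,
// for illustration only.
public class MismatchSketch {
    enum Expectation { WILL_NOT, UNSURE, WILL } // objective ("will") expectation

    static String classify(Expectation expected, boolean actuallyCollected) {
        if (expected == Expectation.WILL_NOT && actuallyCollected)
            return "mismatch: unexpected practice -- candidate for highlighting";
        if (expected == Expectation.WILL && !actuallyCollected)
            return "mismatch: practice less invasive than expected";
        return "match or uncertain -- lower priority in a simplified notice";
    }

    public static void main(String[] args) {
        // A user believes a site will not collect location data, but it does.
        System.out.println(classify(Expectation.WILL_NOT, true));
    }
}
```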

Thesis Committee:
Norman Sadeh (Chair)
Alessandro Acquisti (Heinz College)
James Herbsleb
Joel Reidenberg (School of Law, Fordham University)
Florian Schaub

Software bugs and ineffective testing cost the US economy tens of billions of dollars each year. Performance bugs are programming mistakes that slow down program execution. They affect user-perceived software quality, degrade application responsiveness, and lower system throughput. In addition to impacting everyday software usage, performance bugs have also caused high-profile incidents, e.g., bringing down the Wikipedia and Facebook servers. In this talk I will present my recent work on understanding, detecting, and fixing performance bugs. I will first discuss Caramel, a static analysis technique that detects and fixes performance bugs that have non-intrusive fixes. I will then discuss Toddler, a dynamic analysis technique that detects a different class of performance bugs than Caramel. The idea behind Caramel and Toddler is to identify code and execution patterns that are indicative of common programming mistakes affecting performance.
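
As a simplified, representative example of the kind of pattern such analyses target (Caramel's actual detection rules are more general), the sketch below shows a loop that keeps scanning after its result can no longer change, together with the non-intrusive fix of stopping early:

```java
import java.util.List;

// A simplified, representative example only; Caramel's actual
// detection rules are more general.
public class LoopExample {
    // Performance bug: the loop keeps scanning after the result is fixed.
    static boolean containsSlow(List<String> items, String target) {
        boolean found = false;
        for (String item : items) {
            if (item.equals(target)) {
                found = true; // the answer can no longer change here
            }
        }
        return found;
    }

    // Non-intrusive fix: stop as soon as the result is determined.
    static boolean containsFixed(List<String> items, String target) {
        for (String item : items) {
            if (item.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> xs = List.of("a", "b", "c");
        System.out.println(containsSlow(xs, "a"));  // true, after scanning everything
        System.out.println(containsFixed(xs, "a")); // true, stops at the first match
    }
}
```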

I will also briefly present several of my other projects on performance, concurrency, and mobile bugs. Caramel and Toddler found over 190 new performance bugs in widely used Java (Ant, Lucene, Google Core Libraries, Groovy, Tomcat, etc.) and C/C++ (GCC, Google Chrome, Mozilla, MySQL) applications. 140 of these bugs have already been fixed by developers based on our reports.

Adrian Nistor is an Assistant Professor of Computer Science in the Schmid College of Science and Technology at Chapman University. He joined Chapman in August 2014. Adrian received his PhD from the Department of Computer Science at the University of Illinois, Urbana-Champaign in May 2014. Adrian's research interests are in software engineering, with a focus on detecting, repairing, and preventing bugs in real-world applications.

His current research projects investigate performance bugs and concurrency bugs. His techniques have found more than 150 previously unknown bugs in widely used software, including Google Chrome, Mozilla, Google Core Libraries, GCC, MySQL, Ant, Lucene, Groovy, Tomcat, JUnit, JMeter, Log4J, and Struts. More than 100 of these bugs have already been fixed by developers. Adrian's research includes empirical and analytical work, static and dynamic techniques, and bugs from various application types---client, server, mobile, and scientific applications. His Caramel paper won an ACM SIGSOFT Distinguished Paper award at ICSE 2015. He received an NSF grant to investigate performance bugs that have non-intrusive fixes.

Faculty Host: Claire Le Goues

Cloud computing is an important industry trend that is having a significant impact on businesses. Many computer science research communities have responded to this trend with vibrant research activities. However, the programming languages community seems to be mostly ignoring this trend, possibly because of a perception that there are no interesting problems to solve.

In this talk, I'll explain the motivations for the cloud computing trend, focusing on the impact on software development. I'll explain how this impact creates many research opportunities in the traditional areas of programming languages and software engineering, giving examples of research projects from IBM Research.

Michael Hind is a Distinguished Research Staff Member and Senior Manager of the Programming Technologies Department at the T.J. Watson Research Center in Yorktown Heights, New York.

After receiving his Ph.D. from NYU in 1991, Michael spent 7 years as an assistant/associate professor of computer science at SUNY New Paltz and a post-doc/academic visitor at IBM Research. For the past 17 years, Michael has been a Research Staff Member, Manager, and Senior Manager in the Programming Technologies Department at IBM Research, where he has focused on programming languages, program analysis, tools, and language optimization, with a particular focus on open source infrastructure. His department of about 40 researchers is currently focusing on applying programming languages and software engineering expertise to cloud computing. Michael's team has successfully transferred technology to various parts of IBM.

Michael is an ACM Distinguished Scientist, an Associate Editor of ACM TACO, has served on over 30 program committees, given talks at top universities and conferences, and co-authored over 40 publications. His 2000 paper on Adaptive Optimization was recognized as the OOPSLA'00 Most Influential Paper and his work on Jikes RVM was recognized with the SIGPLAN Software Award in 2012.

Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a novel crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be re-identified, and can be used to draw sensitive inferences.

Jonathan Mayer is a Ph.D. candidate in computer science and a lawyer at Stanford University, where he received his J.D. in 2013. He was named one of the Forbes 30 Under 30 in 2014 for his work on technology security and privacy. Jonathan’s research and commentary frequently appear in national publications, and he has contributed to federal and state law enforcement actions. Jonathan is a Cybersecurity Fellow at the Center for International Security and Cooperation, a Junior Affiliate Scholar at the Center for Internet and Society, and a Stanford Interdisciplinary Graduate Fellow. He earned his A.B. at Princeton University in 2009, concentrating in the Woodrow Wilson School of Public and International Affairs.

This talk is based on Jonathan Mayer’s forthcoming PNAS paper, “The Privacy Properties of Telephone Metadata.” Contact ttodd@cs.cmu.edu if you would like a pre-print.

I am a policy technologist: a technically trained person who applies those skills to affect law and policy. In this talk, I will explore the work of a policy technologist and cover a number of emerging themes that may be useful to you in your own work, or in thinking about possible career paths. First, I'll make the case that in a world of Moore's and Metcalfe's laws, sound technical input is increasingly necessary for making good policy; I'll cover the ongoing Crypto Wars (CALEA II), privacy in mobile devices against law enforcement (Riley v. California), and efforts to technically ground the often confused net neutrality debate (BITAG). Second, it is also increasingly important that human rights and public interest values be embedded in technology and infrastructure; I'll cover ongoing work at the IETF and W3C that aims to do this.

Finally, there are a number of efforts by technologists working across diverse communities, or within relatively entrenched cultures, to effect important change; here, I'll cover the HTTPS-Only work in the US Federal Government -- almost all .gov domains will be strict HTTPS by the end of 2016 -- and US civil society's response to the potentially chilling Wassenaar export control rules, which would have grave consequences for common privacy and security tools as well as for the discussion of potential security vulnerabilities.

Joseph Lorenzo Hall is the Chief Technologist and Director of the Internet Architecture project at the Center for Democracy & Technology, a Washington, DC-based non-profit advocacy organization dedicated to ensuring the internet remains open, innovative and free.

Hall's work focuses on the intersection of technology, law, and policy, working to ensure that technical considerations are appropriately embedded into legal and policy instruments. Supporting work across all of CDT's programmatic areas, Hall provides substantive technical expertise to CDT's programs, and interfaces externally with CDT supporters, stakeholders, academics, and technologists. Hall leads CDT's Internet Architecture project, which focuses on embedding human rights values into core internet standards and infrastructure, engaging technologists in policy work, and producing accessible technical material for policymakers.
