Software bugs are expensive, costing companies billions of dollars in repairs, lawsuits, and lost sales. A 2013 study by the University of Cambridge (UK) estimated this cost to the economy at $312 billion annually. Furthermore, bugs are introduced faster than developers can fix them; according to the same Cambridge study, developers spend more than 50% of their time debugging. Automatic program repair aims to remove bugs with little or no human intervention. In this talk, I will describe my work, done as part of my PhD dissertation, that uses state repair as the basis for automatic program repair. In a traditional debugging environment, a developer manually traces the program on a failing test input. At every control point in the program, the developer explicitly or implicitly has an idea of the program invariants that must be enforced, and makes changes so that the desired final state is reached without causing any other errors. I will describe mechanical steps that an automatic program repair approach can use to mimic this behavior. Using our implementation, we were able to repair errors in programs that manipulate textbook data structures, as well as in open-source programs such as ANTLR and RayTrace.

Muhammad Zubair Malik earned his PhD in Software Engineering from the University of Texas at Austin in 2014, where he was part of the Center for Identity and the Center for Advanced Research in Software Engineering (Software Verification, Validation and Testing group). His work focuses on applying machine learning and heuristics to program transformation and repair.

People are living increasingly large swaths of their lives through their online accounts. These accounts are brimming with sensitive data, yet they are often protected only by a text password. Attackers can break into service providers' systems and steal the files that store users' hashed passwords, which lets them make a large number of guesses to crack those passwords. The stronger a password is, the more difficult it is for an attacker to guess.

Many service providers have implemented password-composition policies. These policies constrain passwords in order to prevent users from creating easily guessed ones. Too lenient a policy may permit easily cracked passwords, and too strict a policy may burden users; the ideal password-composition policy balances security and usability. Prior to the work in this thesis, many password-composition policies were based on heuristics and speculation rather than scientific analysis. Password research often examined passwords created under a single uniform policy, or under unknown policies.
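To make the idea of a password-composition policy concrete, the following is a minimal Python sketch of how a policy might be expressed and checked. The `CompositionPolicy` class, the `complies` and `char_classes` functions, and the three example policies are hypothetical illustrations, not the policies studied in this thesis; real policies may add further rules (such as dictionary checks) that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositionPolicy:
    """A hypothetical password-composition policy (illustrative thresholds only)."""
    min_length: int
    # Minimum number of distinct character classes: lowercase, uppercase, digit, symbol.
    min_char_classes: int = 1


def char_classes(password: str) -> int:
    """Count how many of the four character classes occur in the password."""
    return sum([
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),  # anything else counts as a symbol
    ])


def complies(password: str, policy: CompositionPolicy) -> bool:
    """Return True if the password meets the policy's length and class requirements."""
    return (len(password) >= policy.min_length
            and char_classes(password) >= policy.min_char_classes)


# Hypothetical example policies: lenient, stricter on composition, stricter on length.
lenient = CompositionPolicy(min_length=8)
strict = CompositionPolicy(min_length=8, min_char_classes=4)
longer = CompositionPolicy(min_length=16)

if __name__ == "__main__":
    for pw in ["password", "Pa55word!", "correcthorsebattery"]:
        print(pw, [complies(pw, p) for p in (lenient, strict, longer)])
```

A checker like this only enforces a policy; it says nothing about which thresholds yield passwords that are both hard to guess and usable, which is the empirical question the studies address.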

In this thesis, we contrast the strength and usability of passwords created under different policies. We do this through online, crowdsourced human-subjects studies with randomized, controlled password-composition policies. The result is a scientific comparison of how different password-composition policies affect both password strength and usability. We studied a range of policies, including policies similar to those found in the wild, policies that trade usability for security by requiring longer passwords, and policies in which passwords are system-assigned with known security.

One contribution of this thesis is a tested methodology for collecting passwords under different policies. Another contribution is the comparison between password policies. We find that some password-composition policies make more favorable tradeoffs between security and usability, which allows us to make evidence-based recommendations to service providers. Drawing on our experience collecting data from tens of thousands of participants, we also offer insights for researchers interested in conducting larger-scale online studies.

Thesis Committee:
Lorrie Faith Cranor (Chair)
Lujo Bauer
Nicolas Christin
Brian LaMacchia (Microsoft Research)


System design involves modeling structural and behavioral properties. However, existing languages rarely integrate these two aspects well. There are famous examples of languages that are very good at modeling structure (e.g., Ecore), some that are very good at modeling behaviour (e.g., Promela), and even some that can do both, but for the latter it is quite unclear how the two aspects of their semantics relate (the famous case of UML).

I will present the design of “Clafer with Behaviour”, a language that combines rich modeling of structures with specification of their dynamics. Clafer with Behaviour allows incremental specification of systems, mixing partial elaboration of structure with partial elaboration of behaviour, using information as it becomes available. I will show how Clafer, with a single unified syntax and semantics, can capture feature models, component models, discrete control models (automata, traces), and variability encompassing all these aspects. The example model presented will have a similar flavour to models in established architecture description languages such as AADL or EAST-ADL, even though it will be built using only a handful of basic concepts.

The semantic base of Clafer is a language of traces over structures. The trace language is specified using first-order logic with quantifiers over basic entities (for modeling structures) combined with linear temporal logic (for modeling dynamics). The two are combined using the classic scheme proposed by Abadi. On top of this basic semantic structure, we build a simple but expressive syntax, enriched with carefully selected syntactic expansions that cover hierarchical modeling, associations, automata, scenarios, and Dwyer’s property patterns.

Andrzej Wasowski is an Associate Professor at the IT University of Copenhagen. He previously worked at Aalborg University in Denmark and was a visiting professor at INRIA Rennes and the University of Waterloo, Ontario. His interests are in semantic foundations and tool support for model-driven development, especially for software product lines and component-based systems. Many of his projects involve commercial or open-source partners, primarily in the domain of safety-critical embedded systems. Wasowski holds a PhD from the IT University of Copenhagen, Denmark (2005) and an MSc Eng degree from the Warsaw University of Technology, Poland (2000). He received the Sapere Aude research leadership award from the Danish Council for Independent Research (2012).

Faculty Host: Christian Kästner

What is the greatest threat to our privacy today? Not the NSA, but trusted American companies…

Facebook, Twitter, Google, Amazon purchases, frequent-flyer numbers and loyalty cards.  Every day we share personal information about ourselves, usually to buy something, gain access or perks, or share a bit of our daily lives with others.  Any one piece of information that we share isn’t that important, we think. Why worry?

But each bit of personal data we give out can be combined easily and with alarming speed into a personal profile that others—companies, marketing services, or more nefarious groups—can use to their own advantage. In WHAT STAYS IN VEGAS, investigative business reporter Adam Tanner penetrates the world of big data to lay bare these tactics.

Tanner goes inside Caesars Entertainment, one of the savviest companies using data for marketing today, whose pioneering loyalty program lets it know more about casino-goers than its competitors do. Caesars knows exactly which games its customers like to play, what foods they enjoy, when they prefer to visit, who their favorite hosts and hostesses might be, and how to keep them coming back for more. Caesars’ data-gathering methods have allowed it to grow its business dramatically, and have also inspired companies across industries to ramp up their own data mining in the hope of boosting their profitability.

But this abundance of personal data, our willingness to share it, and the trails we leave behind can also create some terrifying situations. Tanner includes cautionary tales of the trouble individuals can get into once their data and photos land in the hands of companies that highlight the worst episodes in our lives, such as the mug shot website Busted! or background-check sites.

Adam Tanner writes about the business of personal data. He is a fellow at the Institute for Quantitative Social Science at Harvard University and was previously a Nieman fellow there. Adam Tanner has worked for Reuters News Agency as Balkans bureau chief based in Belgrade, Serbia, as well as San Francisco bureau chief, and has had previous postings in Berlin, Moscow, and Washington, DC. He also contributes to Forbes and other magazines. 

Adam’s book will be available for purchase during the seminar.


The students and their projects:

Alessandro Iorio, Tepper School of Business
Public Policies and Network Structure: The Case of Italian Board Interlocks

Anton Pleshakov, Tepper School of Business
Propagation of Emotion in 4CHAN /B/ Threads

Ashwini Rao, Institute for Software Research
Network Analysis of Tracking on the Internet

Austin Ankney, College of Engineering, Dietrich College of Humanities and Social Sciences
Social Juicing: Buying Followers on Twitter to Enhance Reputation

Benjamin Chung, Institute for Software Research
Diffusion of Programming Languages on Github

Bob Fang, Heinz College
Analyzing Salary Distributions on Job Communities

Carl Malings, Department of Civil and Environmental Engineering
Comparing Approaches for Network Reliability Analysis in Systems with Non-Independent Components

Chalalai Chaihirunkarn, Institute for Software Research
Understanding Open-source Scientific Software Ecosystems: What Do We Learn from Code Contribution on GitHub?

Chris Tomaszewski, Robotics Institute
Graph Partitioning for Distributed Task Allocation

Evan Lee, Carnegie Institute of Technology
Applications of Network Science in Game Character: Design and Balance in Competitive Team Based Games

Gabriel Ferreira, Institute for Software Research
Improving Software Comprehension of Highly Configurable Systems with Network Analysis

Hemank Lamba, Institute for Software Research
Analyzing Change in Beer-Review Networks

Holly Li, Lane Center of Computational Biology
Interaction Network of RNA-Binding Proteins and Small Non-coding RNA

Ian Quah, Department of Psychology
Modeling the Diffusion of Information through a Social Group

Jingxia Pang, Heinz College
Chinese Weibo (Twitter) Analysis and PersonRank Algorithm Design

Kevin Eng, Department of Mathematical Sciences
Air Traffic Freight Hubs

Mark Helenurm, Department of Mathematical Sciences
The Wikipedia Internal Link Graph

Nick Ettlinger, Statistics Department
The Social Networks of Lawyers

Olutayo Fabusuyi, Department of Engineering and Public Policy
Analyzing Regional Economies

Sejal Popat, College of Fine Arts
Comparing Artist Residency Alumni Networks

While getting rid of passwords is a laudable goal, passwords represent the proof-of-knowledge part of the user authentication problem. Arguably, if you run a web site, you want to know that the sentient, responsible person is there before you let them in and give them the power to buy, move money, or possibly just search for or provide information. For the last five years, I have been researching a platform for deeply exploring the "proof of knowledge" side of the equation. It deeply integrates current efforts in NIST NSTIC, privacy, and cognitive testing (going back to Cattell in the 1800s). In this talk, I will describe the architecture and show the live system, to prompt discussion with people interested in human factors, computer security, web services, and the like.

Dr. Robert Thibadeau is a Pittsburgh resident with a long history as a faculty member in Robotics, starting in 1980, and has taught computer security part-time in SCS since 1996. In 2002 he joined Seagate Research. He is well recognized for creating self-encrypting drive (SED) technology, now deployed by all major storage device vendors, including Seagate, Micron, SanDisk, HGST, and Samsung, under the industry standards he created while Chief Technologist at Seagate. He is currently SVP and Chief Scientist at Wave (WAVX on NASDAQ), the leading supplier of software for SEDs. This talk, however, concerns a private project and venture he has had underway since 2008: a radical new way to think about user authentication for the web.

Introductions: Virgil Gligor, Co-Director, CyLab.

Code review is an important component of software engineering, practiced in both open-source and industrial contexts. Review today differs from the code inspections performed and studied in the 1970s and 1980s; it is now less formal and more "lightweight". Over the past two years, we have been investigating many aspects of code review, both at Microsoft and in open source. In this talk, I will discuss our exploration of the motivations, challenges, and outcomes of tool-based code reviews, including our finding that code reviews are less about finding defects than expected and instead provide additional benefits to software teams, such as knowledge transfer, increased team awareness, and the creation of alternative solutions to problems. I will also present results from our analysis of a broad spectrum of projects, including Office, Bing, Chrome, and Android, that uncovered a phenomenon we term "convergent practices of peer review". Finally, I will show the beginning of our efforts to address one of the largest challenges in code review: change understanding.

Christian Bird is a researcher in the empirical software engineering group at Microsoft Research. He is primarily interested in the relationship between software design, social dynamics, and development processes in large software projects, and in developing tools and techniques to help software teams. He has studied software development at Microsoft, IBM, and in the open-source realm, examining the effects of distributed development, ownership policies, and the ways in which teams complete software tasks. He has published in top software engineering venues, including three ACM SIGSOFT Distinguished Papers, and in Communications of the ACM. Christian received his B.S. from BYU and his Ph.D. from U.C. Davis under Prem Devanbu.

Faculty Host: Claire Le Goues


It’s hard to believe that it has been almost twenty-five years since the Master of Software Engineering program was conceived at CMU. But it’s true, and next spring we’ll be celebrating this remarkable milestone here in Pittsburgh with an MSE-wide community reunion. The two-day event will kick off the evening of Friday, March 20, with an all-classes cocktail reception at the fun and fascinating Senator John Heinz History Center in Pittsburgh’s famous Strip District. Then, during the day on Saturday (3/21), we’ll return to campus for talks, panels, and social networking events designed to inform and engage our entire community. To view and download a save-the-date reminder card, click here.

Information regarding the schedule, registration, and lodging will be coming shortly. In the meantime, help get the word out! Consider volunteering to be a class, company, or general reunion representative and help us to make this gathering the biggest to date!

To volunteer, contact Josh Quicksall, Student-Alumni Relations Coordinator.


Software development is inherently incremental; however, it is challenging to correctly introduce changes on top of existing code. Recent studies show that 15%–24% of bug fixes are incorrect, and that the most important yet hardest-to-acquire information when making a change is whether it breaks any code elsewhere.

In this talk, I will present a framework, called Hydrogen, for patch verification. Hydrogen aims to automatically determine whether a patch correctly fixes a bug, whether the change introduces a new bug, whether a bug impacts multiple software releases, and whether the patch is applicable to all the impacted releases.

Hydrogen consists of a novel program representation, namely the multiversion interprocedural control flow graph (MVICFG), which integrates and compares the control flow of multiple versions of a program, and a demand-driven, path-sensitive symbolic analysis that traverses the MVICFG to detect bugs related to software changes and versions. Our experimental results show that Hydrogen correctly builds the desired MVICFGs and scales to real-life programs such as libpng, tightvnc, and putty. We experimentally demonstrate that MVICFGs can enable efficient patch verification. Using the results generated by Hydrogen, we found a few documentation errors related to patches in a set of open-source programs.


Dr. Wei Le is an assistant professor in the B. Thomas Golisano College of Computing and Information Sciences at the Rochester Institute of Technology. She received her Ph.D. in Computer Science from the University of Virginia in 2010. Her research focuses on program analysis and testing for improving software reliability, security, and productivity.

Dr. Le has published papers in ICSE, FSE, TOSEM, TSE, and ISSTA. She has received an NSF CAREER Award (2014), a Google Faculty Research Award (2011), the FSE Best Presentation Award (2008), and the Google Anita Borg Memorial Scholarship (2007).

Faculty Host: Christian Kaestner

Software developers rely on media to communicate, learn, collaborate, and coordinate with others. Recently, social media has dramatically changed the landscape of software engineering, challenging some old assumptions about how developers learn and work with one another. We see the rise of the social programmer who actively participates in online communities and openly contributes to the creation of a large body of crowdsourced socio-technical content.

In this talk, I will present the past, present, and future roles of socially enabled tools in software engineering, reviewing research that examines the use of different media channels in software engineering from 1968 to the present day. I will also provide preliminary results from a large survey of developers who actively use social media, conducted to understand how they communicate and collaborate and to gain insights into the challenges they face. We found that while this particular population values social media, traditional channels, such as face-to-face communication, are still considered crucial. I will further synthesize findings from our historical review and survey to propose a roadmap for future research on this topic.

This talk will present joint work with Leif Singer, Brendan Cleary, Alexey Zagalsky and Fernando Figueira Filho.


Margaret-Anne (Peggy) Storey is a professor of computer science at the University of Victoria and a Canada Research Chair in Human Computer Interaction for Software Engineering. She is a Visiting Scientist at the IBM Centre for Advanced Studies in Toronto and one of the principal investigators for the National Center for Biomedical Ontology, US. Her research goal is to understand how technology can help people explore, understand and share complex information and knowledge. She applies and evaluates techniques from knowledge engineering, social software and visual interface design to applications such as collaborative software development, program comprehension, biomedical ontology development, and learning in web-based environments.

Faculty Host: Jim Herbsleb


