Robotics Thesis Proposal
- Remote Access - Zoom
- Virtual Presentation
- NICHOLAS GISOLFI
- Ph.D. Student
- Robotics Institute
- Carnegie Mellon University
Verification and Accreditation of Artificial Intelligence
This work involves formally verifying a trained model's adherence to important design specifications for the purpose of model accreditation. Accreditation of a trained model requires enumeration of the explicit operational conditions under which the model is certified to meet all necessary specifications. By verifying model adherence to specifications set by developers, we increase the trustworthiness of the model along the dimensions that are most relevant to earning a developer's trust. We argue that this gives developers a tool to quantitatively define and formally answer the fundamental question of 'Should we trust this model?'
We intend to demonstrate the utility of the framework by verifying a repertoire of design specifications on voting tree ensemble models for classification. Verification is posed as an instance of the Boolean satisfiability (SAT) problem. Our novel SAT encoding yields disruptive speed gains over related tree ensemble verification techniques, which facilitates the verification of harder specifications on larger models than previously reported in the literature. The framework can be extended to other model classes and design specifications beyond our current scope.
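The reduction above can be illustrated on a toy case. The sketch below (all thresholds, variables, and clauses are illustrative, not the thesis's actual encoding) poses a specification check for a tiny voting ensemble of three decision stumps as a CNF formula whose unsatisfiability certifies that the specification holds; a brute-force enumerator stands in for a real SAT solver.

```python
from itertools import product

# Toy voting ensemble of three decision stumps over one feature x,
# with thresholds 2 < 4 < 6.  Boolean variable i stands for the
# threshold predicate "x > t_i".  Clauses are lists of signed ints:
# +i asserts variable i true, -i asserts it false.
clauses = [
    [2],            # region of interest: x > 4 holds
    [-2, 1],        # consistency: x > 4 implies x > 2
    [-3, 2],        # consistency: x > 6 implies x > 4
    # "ensemble predicts class 0" = fewer than 2 of the 3 stumps vote
    # for class 1, i.e. at most one predicate is true (pairwise bans):
    [-1, -2], [-1, -3], [-2, -3],
]

def satisfiable(clauses, n_vars=3):
    """Brute-force SAT check: try every assignment of the n variables."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# UNSAT means no input in the region x > 4 can be assigned class 0,
# so the specification "x > 4 implies class 1" holds for this ensemble.
print(satisfiable(clauses))  # → False
```

Dropping the region clause makes the formula satisfiable, which is the same query pattern in reverse: a satisfying assignment is a concrete counterexample input that violates the specification.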
We show that our framework can also provide explanations of model behavior. We introduce a quantitative definition of interpretability: a provably correct interpretation of a model is a causal specification relating inputs to outputs that the model never violates. We demonstrate how to mine data for a specific type of causal structure: 2D, axis-aligned range rules that serve as candidate specifications for verification. These bounding-box rules represent intelligible, intermediate-level concepts that, once verified, offer provably correct summaries of model behavior. Beyond explaining the input-output behavior of a model, we demonstrate that it is possible to provide explanations that diagnose discrepancies between what is observed and what was prescribed: if the model violates a required specification, we want to understand why. We provide such explanations by summarizing the proofs of unsatisfiability that accompany the certificates. A minimized unsatisfiable set comprises the literals and clauses implicated in a logical contradiction; these provide the basis for provably correct explanations that are counterfactual and contrastive in nature.
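The rule-mining step can be sketched as follows. This minimal example (the toy data and function names are illustrative, not the thesis's actual procedure) extracts a 2D axis-aligned bounding box around one class's points; the box is only a candidate specification, and it becomes a provably correct interpretation only after verification against the model itself.

```python
# Each item pairs a 2D point with its class label (toy data).
points = [((1.0, 1.2), "A"), ((1.5, 0.8), "A"), ((1.2, 1.0), "A"),
          ((4.0, 3.5), "B"), ((3.8, 4.1), "B")]

def mine_box(points, label):
    """Axis-aligned bounding box of all points carrying the given label."""
    xs = [p[0] for p, l in points if l == label]
    ys = [p[1] for p, l in points if l == label]
    return (min(xs), max(xs)), (min(ys), max(ys))

def box_is_pure(points, label, box):
    """Screen the candidate rule: no point of another class lies inside."""
    (lo1, hi1), (lo2, hi2) = box
    return all(not (lo1 <= p[0] <= hi1 and lo2 <= p[1] <= hi2)
               for p, l in points if l != label)

# Candidate range rule: "if x1 in [1.0, 1.5] and x2 in [0.8, 1.2],
# then class A" -- pure on the data, pending verification on the model.
box = mine_box(points, "A")
print(box, box_is_pure(points, "A", box))
```

Purity on the data is only a heuristic filter; the thesis's point is that the surviving boxes are then verified formally, so the resulting summaries hold for every input in the box, not just the observed samples.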
We propose several directions to further demonstrate and expand the utility of our accreditation framework. For a supervised radiation risk assessment application, where safety is paramount, we propose verifying a suite of new safety criteria that will increase domain experts' trust in the model's predictions. For selected clinical decision support applications, we propose to diagnose the sources of systematic disagreement among the experts who annotate training data, and to assess the model's adherence to common-sense expectations of its behavior. We expect these new capabilities to enhance the adoption and effectiveness of artificial intelligence applications that guide clinical decisions.
Artur Dubrawski (Chair)
Madalina Fiterau (University of Massachusetts Amherst)
Zoom Participation. See announcement.