If algorithms are going to impact nearly every aspect of our lives, they should meet the highest standards of accountability. Yet little is known—particularly to those outside the field of computer science—about how these algorithms actually operate.
Making this form of artificial intelligence more transparent to the lay audience is the driving force behind the work of Lehigh University faculty member Sihong Xie. “Creating accountable machine learning is the ultimate goal of our research,” he says.
Xie, an assistant professor of computer science and engineering in the P.C. Rossin College of Engineering and Applied Science, recently won support from the National Science Foundation’s Faculty Early Career Development (CAREER) program for his proposal to make machine learning models more transparent, more robust, and more fair.
The prestigious NSF CAREER award is given annually to junior faculty members across the U.S. who exemplify the role of teacher-scholars through outstanding research, excellent education, and the integration of those two pursuits. Each award provides stable support at the level of approximately $500,000 for a five-year period.
Machine learning models are capable of solving complex problems by analyzing an enormous amount of data. But how they computationally solve the problem is a mystery. “It’s difficult for humans to make sense of the reasoning process of the program,” says Xie. “How do we know that the artificial intelligence is analyzing the data like a human domain expert would analyze it?”
For example, he says, scientists may turn to a machine learning model to tell them which molecular combination to test in a particular experiment. The model could analyze thousands of molecules in seconds and come up with a list of promising candidates.
“So the model says, try A, B, and C. But the researchers might not be confident in that list because they don’t know why they should try A, B, and C,” says Xie. “Conducting experiments in such a domain can be highly expensive, and I want the machine to also tell chemical engineers that A, B, and C have certain characteristics that make them more promising for whatever reason, and that’s why they should try that combination.”
He says that making machine learning algorithms explainable in this way will give human users greater confidence and trust in the models. To establish that explainability, he’ll work with domain experts to combine human knowledge with machine learning programs. He’ll incorporate the constraints that guide these professionals in their decision-making into the development of algorithms that more closely reflect human domain knowledge and reasoning patterns.
“If the model can satisfy those constraints, then they behave pretty much like the human is expected to behave,” he says. “The technical intent should be general enough to apply to many different domains, including cybersecurity, computational neuroscience, and smart cities.”
In practice, however, few human experts will be able to dedicate the time necessary to fully compile the constraints around any one question. Xie therefore intends to automate the creation of these constraint checklists by collecting relevant data from the experts instead.
“They have years’ worth of data, so the idea is to have my machine run through it all, and create the checklist for us. And of course, that’s the problem,” he says. “That list may not be 100 percent accurate.”
It could be missing things. It could introduce noise. In short, he says, the checklist of constraints the model might develop on its own (a checklist it will then use to answer a question such as which combination of molecules to study) could be too sensitive, or not sensitive enough. Xie and his team will design another algorithm to find what he calls the sweet spot in this checklist creation: a checklist that is sensitive enough to detect subtle but useful positives, but not so sensitive that it generates too many false positives.
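To make the idea of a sweet spot concrete, here is a minimal sketch of one common way to balance missed positives against false positives: sweep a decision threshold and keep the value with the best F1 score. This is an illustration only, not Xie's algorithm; the scores, labels, and function names are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and l for f, l in zip(flagged, labels))          # true positives
    fp = sum(f and not l for f, l in zip(flagged, labels))      # false positives
    fn = sum((not f) and l for f, l in zip(flagged, labels))    # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores, labels):
    """Return (threshold, F1) for the observed score that maximizes F1."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best


# Hypothetical data: a confidence score per candidate constraint, and
# whether a domain expert would actually accept that constraint.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.20]
labels = [True, True, False, True, False, False]

threshold, f1 = best_threshold(scores, labels)
print(threshold, round(f1, 3))
```

A low threshold accepts every useful constraint but lets in noise, while a high threshold is clean but misses subtle cases. F1 is only one possible way to score that balance.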
“Real-world data are dirty,” he says. “We want the machine program to be robust enough so that if it’s dealing with reasonably dirty data, it will still generate reliable output.”
Along with questions about accountability come concerns about algorithmic fairness: If machine learning algorithms now influence what we read in our social feeds, which job postings we see, and how our loan applications are processed, how can we be sure that choices are made ethically?
To address those concerns, Xie will use multi-objective optimization to find the most efficient ways of balancing competing perspectives on what’s considered fair.
“Different people, different organizations, different countries, they all have their own definition of fairness,” he says. “So we have to explore all the possible trade-offs between these different definitions, and that’s the technical challenge, because there are so many different ways to trade off. The computer has to actually search for how much each of these fairness standards has to be respected.”
He will provide algorithmic solutions that can efficiently search such trade-offs. “We did recognize that trading one objective for another is a ubiquitous situation in the broader accountable machine learning,” he adds, “for example, trading transparency and accuracy, or multiple forms of explanations, etc.”
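One standard way to reason about such trade-offs in multi-objective optimization is the Pareto front: the set of options that no other option beats on every objective at once. The sketch below is illustrative rather than a description of Xie's method; the model names and the two fairness scores (higher is better) are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical models scored under two different fairness definitions.
candidates = {
    "model_a": (0.90, 0.60),
    "model_b": (0.85, 0.80),
    "model_c": (0.80, 0.75),  # worse than model_b on both definitions
    "model_d": (0.70, 0.95),
}

front = pareto_front(candidates)
print(front)
```

Every model on the front represents a different compromise between the two definitions. Choosing among them is a judgment about how much each fairness standard should be respected, which is the search problem the quote describes.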
The implications of this research could be profound. Xie says that, eventually, experts could have much more confidence in artificial intelligence, and algorithms could become more responsive to social norms.
“The biggest motivation for me in conducting this research is that it has the potential to make a real social impact,” he says. “And because we always have humans in the loop, we’re going to ensure that these models inspire more confidence and treat people fairly.”
About Sihong Xie
Sihong Xie is an assistant professor in the Department of Computer Science and Engineering at Lehigh University. His research interests include misinformation detection in adversarial environments, interpretable and fair graphical models, and human–machine learning collaboration in data annotation.
Xie received his PhD from the Department of Computer Science at the University of Illinois Chicago in 2016. He holds bachelor’s and master’s degrees from the School of Software Engineering at Sun Yat-Sen University in China.
Xie has published over 60 papers in major data mining and artificial intelligence venues, such as KDD, ICDM, WWW, AAAI, IJCAI, WSDM, SDM, and TKDE, with over 2,000 citations and an h-index of 17. He serves on the Senior Program Committee for AAAI and is a program committee member for other ML and AI conferences, including KDD, ICLR, ICDM, and SIGIR.