What is a Software Supply Chain?
Software has become an integral part of critical infrastructure throughout the United States. Modern software systems rest on a supply chain of open-source components, such as Apache Spark, whose functionality is reused and integrated into the systems that underpin modern society.
Risks in Software Supply Chains
While software supply chains enable the rapid development of software systems, they also increase risk: any bug, vulnerability, or unauthorized change in an upstream component can propagate to downstream systems and cause severe consequences. This is evident in several software crises of recent years, such as the Heartbleed bug, the Equifax data breach, and the npm left-pad incident that almost broke the Internet.
Our Solution
In this project, our team aims to develop a unified knowledge graph that captures rich, up-to-date information about software components in heterogeneous software ecosystems. Building upon our prior work on noise-robust open knowledge extraction, we will develop a new neural knowledge acquisition pipeline that (1) extracts software information from various sources, including but not limited to official documentation, software release notes, bug reports, CVEs, and online discussions, (2) consolidates the extracted information via an array of quality control and fact-checking mechanisms, and (3) continuously updates the knowledge graph by tracking new information from those sources. The resulting knowledge graph will allow us to develop a novel multi-modal query interface for knowledge dissemination, as well as new risk mitigation approaches that perform deep scans on software systems, detect potential risks, and automatically repair them. Finally, we will collaborate closely with our industrial partners to deploy the resulting knowledge graph and knowledge-based techniques and evaluate their usefulness in real-world software systems.
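The three pipeline stages described above (extract, consolidate, update) can be illustrated with a deliberately simplified sketch. This is not the project's neural pipeline: the regular expression stands in for a learned extractor, the "at least two independent sources" rule stands in for the quality control and fact-checking mechanisms, and all function names, source names, and library names (`extract`, `consolidate`, `update`, `libA`, and so on) are hypothetical.

```python
import re
from collections import defaultdict

# Toy stand-in for a learned extractor: matches "X depends on Y".
PATTERN = re.compile(r"(\S+) depends on (\S+)")

def extract(source_name, text):
    """Stage 1: pull (subject, relation, object, source) facts out of raw text."""
    return [
        (m.group(1), "dependsOn", m.group(2).rstrip(".,;"), source_name)
        for m in PATTERN.finditer(text)
    ]

def consolidate(facts, min_sources=2):
    """Stage 2: keep only triples reported by at least `min_sources` distinct sources."""
    sources_per_triple = defaultdict(set)
    for subj, rel, obj, source in facts:
        sources_per_triple[(subj, rel, obj)].add(source)
    return {t for t, srcs in sources_per_triple.items() if len(srcs) >= min_sources}

def update(graph, triples):
    """Stage 3: merge triples into the graph and return the ones that were new."""
    added = set(triples) - graph
    graph |= added
    return added

# Hypothetical input documents, keyed by the kind of source they came from.
documents = {
    "release-notes": "libA depends on libB. libA depends on libC.",
    "bug-report": "Crash observed because libA depends on libB.",
}

facts = [f for name, text in documents.items() for f in extract(name, text)]
trusted = consolidate(facts)

graph = set()
added = update(graph, trusted)
print(added)
```

Only the dependency confirmed by both sources reaches the graph; running `update` again with the same triples adds nothing, which is the behavior an incremental refresh would rely on.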
The figure below shows an example knowledge graph for software supply chain security, where each entity (such as a software library or a vulnerability) is represented as a node, and the relations between entities are depicted as edges. For instance, the entity "Apache Log4j," a widely used logging library, is shown as a node connected to its different versions, such as v2.16.0. These versions are in turn linked to other entities: applications related to them by a dependency (e.g., Cisco CX Cloud) and vulnerabilities associated with them (e.g., "Denial of Service").
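The node-and-edge structure described above can be sketched as a small set of (subject, relation, object) triples. The entity names come from the example; the relation names (`hasVersion`, `dependsOn`, `affectedBy`) and the direction of the dependency edge are assumptions made for illustration and are not taken from the project's schema.

```python
from collections import defaultdict

# Each triple is one edge of the graph: (source node, edge label, target node).
# Relation names are illustrative assumptions, not the project's vocabulary.
TRIPLES = [
    ("Apache Log4j", "hasVersion", "Apache Log4j v2.16.0"),
    ("Cisco CX Cloud", "dependsOn", "Apache Log4j v2.16.0"),
    ("Apache Log4j v2.16.0", "affectedBy", "Denial of Service"),
]

def build_index(triples):
    """Index outgoing edges by (node, relation) for quick lookups."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[(subj, rel)].append(obj)
    return index

def neighbors(index, node, relation):
    """Return the nodes reachable from `node` over one edge labeled `relation`."""
    return list(index.get((node, relation), []))

index = build_index(TRIPLES)
versions = neighbors(index, "Apache Log4j", "hasVersion")
issues = neighbors(index, versions[0], "affectedBy")
print(versions, issues)
```

Walking two edges from the library node (first to a version, then to a vulnerability) mirrors how the figure is read.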
Introduction
The Secure Chain Knowledge Graph is a comprehensive knowledge graph designed to model the relationships between software, hardware, vulnerabilities, and other entities to support secure and transparent management of software supply chains.
Resources
Design
The Secure Chain Ontology builds on top of Schema.org, as shown in the figure below, extending its vocabulary to seamlessly integrate with its metadata properties and enhance interoperability across various systems.
We use sc:Software as a central concept in the Secure Chain Ontology to represent software within secure supply chains, with associated sc:SoftwareVersions capturing the evolution of software over time. These versions are critical for tracking vulnerabilities, compliance, and updates. The ontology models dependencies between software versions and other components, such as hardware, through properties like sc:dependsOn and sc:OperatesOn, which help assess potential risks and identify vulnerabilities. Additionally, sc:License links each software version to its legal aspects, ensuring compliance across the supply chain. The ontology also extends to hardware through sc:Hardware and sc:HardwareVersions, allowing for comprehensive tracking of both digital and physical components. Vulnerabilities are represented through sc:Vulnerability and sc:VulnerabilityType, with links to the entities that discover them, providing a detailed view of security risks across software and hardware versions.
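As a rough illustration of how dependency links between versions help assess risk, the sketch below follows `sc:dependsOn` edges backwards from a vulnerable version to find every version that depends on it, directly or transitively. Only `sc:dependsOn` is taken from the description above; the `ex:vulnerableTo` property, the `ex:` identifiers, and the function name are hypothetical placeholders, and the actual ontology may link versions to vulnerabilities differently.

```python
from collections import defaultdict, deque

DEPENDS_ON = "sc:dependsOn"        # property named in the ontology description
VULNERABLE_TO = "ex:vulnerableTo"  # hypothetical property, for illustration only

# Hypothetical software versions and one hypothetical vulnerability.
triples = [
    ("ex:app-1.0", DEPENDS_ON, "ex:libA-2.3"),
    ("ex:libA-2.3", DEPENDS_ON, "ex:libB-0.9"),
    ("ex:tool-4.1", DEPENDS_ON, "ex:libC-1.1"),
    ("ex:libB-0.9", VULNERABLE_TO, "ex:vuln-001"),
]

def affected_versions(triples, vulnerability):
    """Return versions exposed to `vulnerability`, directly or via dependencies."""
    dependents = defaultdict(set)  # version -> versions that depend on it
    directly_affected = set()
    for subj, pred, obj in triples:
        if pred == DEPENDS_ON:
            dependents[obj].add(subj)
        elif pred == VULNERABLE_TO and obj == vulnerability:
            directly_affected.add(subj)

    # Breadth-first search upward through the dependents of each affected version.
    affected = set(directly_affected)
    queue = deque(directly_affected)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

result = affected_versions(triples, "ex:vuln-001")
print(sorted(result))
```

The `affected` set doubles as the visited set, so the traversal terminates even if the dependency data contains cycles.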
Year One Deliverables
Other Tools
Yuan Tian, Toby Li, Jonathan K. Kummerfeld, and Tianyi Zhang
Proceedings of the 37th ACM User Interface Software and Technology Symposium (UIST), 20 pages, 2024.
Zihan Zhou, Zhongkai Zhao, Bonan Kou, and Tianyi Zhang
In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 547-551. 2024.
Bonan Kou, Shengmai Chen, Zhijie Wang, Lei Ma, and Tianyi Zhang.
Proceedings of the ACM on Software Engineering 1, no. FSE (2024): 2261-2284.
Zian Su, Xiangzhe Xu, Ziyang Huang, Kaiyuan Zhang, and Xiangyu Zhang.
NeurIPS 2024: Conference and Workshop on Neural Information Processing Systems
Fei Wang, Wenjie Mo, Yiwei Wang, Wenxuan Zhou, Muhao Chen
EMNLP 2023: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, Xiaoheng Xie, and Xiangyu Zhang.
NeurIPS 2024
Yiwei Wang, Bryan Hooi, Fei Wang, Yujun Cai, Yuxuan Liang, Wenxuan Zhou, Jing Tang, Manjuan Duan, and Muhao Chen
NeurIPS 2024: Conference and Workshop on Neural Information Processing Systems
Yuan Tian, Zheng Zhang, Zheng Ning, Toby Li, Jonathan K. Kummerfeld, Tianyi Zhang
EMNLP 2023: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Tai Nguyen, Yifeng Di, Joohan Lee, Muhao Chen, Tianyi Zhang
ASE'23: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering
Weihao Chen, Xiaoyu Liu, Jiacheng Zhang, Ian Iong Lam, Zhicheng Huang, Rui Dong, Xinyu Wang, Tianyi Zhang
UIST'23: Proceedings of the 36th ACM User Interface Software and Technology Symposium
Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon
Zhongkai Zhao, Bonan Kou, Mohamed Yilmaz Ibrahim, Muhao Chen, Tianyi Zhang
ESEC/FSE'23: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Bonan Kou, Muhao Chen, Tianyi Zhang
ICSE'23: Proceedings of the 45th International Conference on Software Engineering
Purdue University
University of Southern California
Purdue University
University of Southern California
Purdue University
Purdue University
Purdue University
Purdue University
University of California, Davis
University of Southern California
Assistant Professor, Department of Computer Science, Purdue University
Lawson 3154H, Purdue University