Final Thesis: URL2SBOM

Abstract: This thesis addresses the identification of third-party JavaScript libraries used on websites. Client-side scripts are likely to be slightly modified, minified, obfuscated, or bundled together, so to identify the libraries and their versions from such scripts, a corpus of popular client-side JavaScript libraries is constructed. The entry point of each library is determined via the jsDelivr API, and the corresponding file is downloaded in minified form and, if available, also in non-minified form. These files serve as input for Siamese (Ragkhitwetsagul & Krinke, 2019), which constructs multiple representations of each file and indexes them in ElasticSearch for later queries. Siamese performs static analysis, comparing multiple representations of source code to identify potential code clone pairs. A small proto-benchmark consisting of four popular JavaScript libraries in multiple transformed variations is created to evaluate Siamese’s suitability for code clone search on JavaScript libraries, that is, for matching scripts against the corpus in order to identify them. Testing on this benchmark revealed that Siamese is not able to process all of the supplied input data correctly and often fails silently. The tool is also inherently susceptible to obfuscated code.
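
The corpus construction step described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `entrypoints` endpoint path of the jsDelivr data API, the helper names (`entrypoints_url`, `candidate_files`, `cdn_url`), and the rule of deriving the non-minified file name by dropping `.min` are all assumptions made here, and the package and version in the usage note are purely illustrative.

```python
from urllib.parse import quote

# Assumed base URLs; the abstract only says "the jsDelivr API".
DATA_API = "https://data.jsdelivr.com/v1/packages/npm"
CDN = "https://cdn.jsdelivr.net/npm"


def entrypoints_url(package, version):
    """Build the (assumed) data-API URL that reports a package's entry point."""
    # Keep '@' and '/' unescaped so scoped names such as '@scope/name' survive.
    return f"{DATA_API}/{quote(package, safe='@/')}@{version}/entrypoints"


def candidate_files(entry_file):
    """Return (minified_path, non_minified_path) for a reported entry file.

    Hypothetical naming rule: the two variants differ only by a '.min'
    infix. Whether the non-minified file actually exists still has to be
    checked with an HTTP request; this function only derives the name.
    """
    path = entry_file.lstrip("/")
    if path.endswith(".min.js"):
        return path, path[: -len(".min.js")] + ".js"
    if path.endswith(".js"):
        return path[: -len(".js")] + ".min.js", path
    raise ValueError(f"not a JavaScript entry point: {entry_file!r}")


def cdn_url(package, version, path):
    """Build the CDN download URL for one file of a package version."""
    return f"{CDN}/{package}@{version}/{path}"
```

For an illustrative entry file `/dist/jquery.min.js`, `candidate_files` yields `dist/jquery.min.js` and `dist/jquery.js`; both paths would then be passed to `cdn_url`, and the non-minified one kept only if the download succeeds.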

Keywords: Third-party JavaScript dependencies, JavaScript library identification, Siamese, jsDelivr, ElasticSearch, Code clone search, Static analysis, Obfuscation, Minification

PDF: Master Thesis

Reference: Nadine Laube. URL2SBOM. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2024.