Abstract: Artificial Intelligence (AI) is one of the central technological topics of our time. The quality of AI models depends strongly on the quality and timeliness of the underlying data: insufficient or outdated data lead directly to poorer results. Regular and reliable data provision is therefore essential for the sustainable use of AI systems. The JValue project uses specialized web crawlers to provide structured data sets for AI systems. As the number and complexity of crawlers grow and their runtimes lengthen, orchestrating them becomes significantly more difficult. Existing workflow and orchestration frameworks can execute and orchestrate workflows, but they offer no domain-specific functionality such as crawler-specific metrics and analyses. The aim of this thesis is to design a web-based crawler management system in the context of the JValue project. The system builds on an existing orchestration framework and extends it with domain-specific functionality without being tightly coupled to that framework. It enables structured orchestration of distributed crawlers and provides domain-specific metrics and analyses. The results show how a crawler management system can provide an up-to-date and reliable data basis, which is a key prerequisite for high-quality AI systems. Although the system was developed for the JValue project, it is generic and can therefore also be used in other scenarios that require reliable data provision by crawlers.
Keywords: None
PDF: Master Thesis
Reference: Florian Oberndörfer. Konzeption und Implementierung einer generischen Crawler-Management-Plattform auf Basis bestehender Workflow-Orchestrierungstools. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2026.