Facepalm: Generative AI services lack true intelligence and have yet to contribute anything meaningful to open-source development. A security expert, frustrated with spammy, hallucinated bug reports, is urging the FOSS community to disregard low-quality, AI-generated submissions.
Generative AI models have already proven to be powerful tools for cybercriminals and fraudsters. Now they are also being used by hucksters to inundate open-source projects with useless bug reports. According to Seth Larson, there has been a surge in extremely low-quality, spammy, and LLM-hallucinated security reports, forcing maintainers to waste time on worthless submissions.
Seth Larson, security developer-in-residence at the Python Software Foundation, also volunteers on the "triage teams" that vet security reports for popular open-source projects such as CPython, pip, urllib3, and Requests. In a recent blog post, Larson calls out the troubling trend of sloppy, AI-generated security reports.
These AI-generated reports are particularly pernicious because they initially look legitimate and worth investigating. As the Curl project and others have found, however, they turn out to be well-written but worthless. Thousands of open-source projects face the same problem, and maintainers are often reluctant to discuss it publicly due to the sensitive nature of security-related development.
“If this is happening to the few projects I oversee, I suspect it is widespread across open source,” Larson commented.
Hallucinated reports waste the time of volunteer maintainers, creating confusion, stress, and frustration. Larson recommends that the community treat low-quality AI reports as malicious, even if this wasn’t the senders’ original aim.
Larson offers practical guidance for projects facing a rise in AI-hallucinated reports: deploy CAPTCHAs and other anti-spam measures to deter automated report submission. He also urges reporters not to rely on AI models to identify security vulnerabilities in open-source projects.
Large language models do not actually understand code; discovering valid security flaws requires grasping "human-level concepts" such as intent, common usage, and context. When a report appears to be AI-generated, Larson suggests maintainers respond with minimal effort, matching the "near zero" effort put in by the original submitter.
Larson acknowledges that many vulnerability reporters operate in good faith and usually produce high-quality reports. Nonetheless, the rising number of low-effort, low-quality reports is detrimental to everyone involved in development.