Supplemental material for the paper From Zero-Shot to Reward-Aware: Evaluating Prompting and Memory in LLM-Based Cyber Defense Agents
Large language models (LLMs) have emerged as promising candidates for autonomous cyber defense (ACD), yet their reliability and adaptability remain uncertain.
This work presents a systematic evaluation of LLM-based defense agents in the CybORG++ environment. We compare multiple models and prompting strategies against a Proximal Policy Optimization (PPO) Reinforcement Learning (RL) baseline across diverse adversaries and network topologies.
The repository provides the code, configuration files, prompts, and step-by-step instructions required to reproduce the experiments described in the paper From Zero-Shot to Reward-Aware: Evaluating Prompting and Memory in LLM-Based Cyber Defense Agents. It also includes logs for each experiment reported in the paper, together with code to replicate the statistical analysis.