Supplemental material to the paper From Zero-Shot to Reward-Aware: Evaluating Prompting and Memory in LLM-Based Cyber Defense Agents


Content

Large language models (LLMs) have emerged as promising candidates for autonomous cyber defense (ACD), yet their reliability and adaptability remain uncertain.

This work presents a systematic evaluation of LLM-based defense agents in the CybORG++ environment. We compare multiple models and prompting strategies against a Proximal Policy Optimization (PPO) reinforcement learning (RL) baseline, across diverse adversaries and network topologies.

The repository provides the code, configuration files, prompts, and step-by-step instructions required to reproduce the experiments described in the paper From Zero-Shot to Reward-Aware: Evaluating Prompting and Memory in LLM-Based Cyber Defense Agents. It also includes the logs of every experiment reported in the paper, together with the code to replicate the statistical analysis.