Autopentest-drl

The agent learns basics: scan → detect vulnerable service → execute correct exploit. Rewards are given immediately.
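The scan, detect, exploit loop with immediate rewards can be sketched as a toy environment. This is a minimal illustration, not the project's implementation: the `ToyHostEnv` class, the action names, and the reward values are all hypothetical, and plain tabular Q-learning stands in for a deep RL agent to keep the example short.

```python
import random

ACTIONS = ["scan", "exploit_ftp", "exploit_smb"]  # hypothetical action set


class ToyHostEnv:
    """One simulated host with a single vulnerable service (illustrative only)."""

    def __init__(self, vulnerable_service="ftp"):
        self.vulnerable_service = vulnerable_service
        self.reset()

    def reset(self):
        self.scanned = False
        return self._observe()

    def _observe(self):
        # The vulnerable service stays hidden until the host has been scanned.
        return (self.scanned, self.vulnerable_service if self.scanned else None)

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        reward, done = -1.0, False  # default: wasted action (assumed cost)
        if action == "scan" and not self.scanned:
            self.scanned = True
            reward = 1.0  # immediate reward for discovering the service
        elif self.scanned and action == "exploit_" + self.vulnerable_service:
            reward, done = 10.0, True  # immediate reward for the right exploit
        return self._observe(), reward, done


def greedy_action(q, state):
    """Pick the action with the highest Q-value in this state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the Q-table as {(state, action): value}."""
    rng = random.Random(seed)
    env = ToyHostEnv()
    q = {}
    for _ in range(episodes):
        state = env.reset()
        for _ in range(20):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = greedy_action(q, state)  # exploit current estimate
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in ACTIONS
            )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
            if done:
                break
    return q


q_table = train()
print(greedy_action(q_table, (False, None)))
print(greedy_action(q_table, (True, "ftp")))
```

Because every useful step is rewarded as soon as it happens, the learned greedy policy should settle on scanning first and then choosing the exploit that matches the discovered service.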

Introduction: The End of Manual Poking and Prodding

For decades, penetration testing has relied on a paradoxical blend of high-level intuition and repetitive, low-level grunt work. A human pentester spends roughly 70% of their time on reconnaissance, credential stuffing, and basic exploitation (tasks ripe for automation) and only 30% on creative lateral movement and zero-day discovery. As networks grow to cloud scale and attack surfaces keep expanding, the traditional "man-with-a-laptop" model is breaking.

The agent must pivot from Host A to Host B. It learns credential reuse and lateral movement.
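The pivot itself can be pictured as a purely simulated state transition: credentials harvested on an owned host unlock a neighbor that accepts them. The sketch below models only those environment dynamics, not the learning. The `NETWORK` layout, its field names, and the `lateral_movement` helper are hypothetical.

```python
# Hypothetical two-host network: outgoing links, credentials stored on each
# host, and the credentials each host accepts for login.
NETWORK = {
    "host_a": {"links": ["host_b"], "stores": {"alice:pw1"}, "accepts": set()},
    "host_b": {"links": [], "stores": set(), "accepts": {"alice:pw1"}},
}


def lateral_movement(network, foothold):
    """Return the hosts owned, in order, starting from an existing foothold."""
    owned = [foothold]
    creds = set(network[foothold]["stores"])
    progress = True
    while progress:  # repeat until no new host can be reached
        progress = False
        for host in list(owned):
            for neighbor in network[host]["links"]:
                # A neighbor falls only if a harvested credential is reused there.
                if neighbor not in owned and creds & network[neighbor]["accepts"]:
                    owned.append(neighbor)
                    creds |= network[neighbor]["stores"]
                    progress = True
    return owned


print(lateral_movement(NETWORK, "host_a"))
```

In a training environment, an agent would have to discover this sequence through its own actions rather than have it computed for it.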

Defenders deploy simple firewalls and IDS alerts. The agent learns to add random delays or route through decoys.

Simulators are imperfect. They do not model network latency jitter, packet loss, or ephemeral service failures. An agent that thrives in CybORG may freeze when a real web server occasionally drops a FIN packet, interpreting it as a firewall.
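The failure mode is easy to reproduce in a toy simulation. The sketch below assumes a hypothetical `flaky_probe` that loses replies from an open port with probability `drop_prob`. It compares a rule that reads one missing reply as a firewall with a rule that retries before deciding. All names and numbers are illustrative.

```python
import random


def flaky_probe(rng, drop_prob):
    """Simulated probe of an open port; returns False when the reply is lost."""
    return rng.random() >= drop_prob


def classify_once(rng, drop_prob):
    # Naive rule: a single missing reply is read as a firewall.
    return "open" if flaky_probe(rng, drop_prob) else "filtered"


def classify_with_retries(rng, drop_prob, attempts=3):
    # Conclude "filtered" only when every attempt goes unanswered.
    for _ in range(attempts):
        if flaky_probe(rng, drop_prob):
            return "open"
    return "filtered"


def error_rate(classifier, drop_prob, trials=10_000, seed=0):
    """Fraction of trials in which an open port is mislabeled."""
    rng = random.Random(seed)
    wrong = sum(classifier(rng, drop_prob) != "open" for _ in range(trials))
    return wrong / trials


print(error_rate(classify_once, 0.1))
print(error_rate(classify_with_retries, 0.1))
```

An agent trained only on a loss-free simulator has never needed the second rule, which is one way the sim-to-real gap described above shows up.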

The agent encounters varied topologies, forcing generalization beyond memorization.
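One way to supply varied topologies is to sample a fresh random network for each training episode. The following is a minimal sketch under assumed conventions: the `SERVICES` pool, the host naming, and the size and link parameters are hypothetical and not taken from any particular simulator.

```python
import random

SERVICES = ["ssh", "ftp", "smb", "http"]  # hypothetical service pool


def sample_topology(rng, min_hosts=3, max_hosts=8, extra_link_prob=0.2):
    """Return a random connected network as {host: {"services", "links"}}."""
    n = rng.randint(min_hosts, max_hosts)
    hosts = [f"host{i}" for i in range(n)]
    net = {
        h: {
            "services": rng.sample(SERVICES, rng.randint(1, len(SERVICES))),
            "links": set(),
        }
        for h in hosts
    }

    def link(a, b):
        net[a]["links"].add(b)
        net[b]["links"].add(a)

    # A random spanning tree guarantees that every host is reachable.
    for i in range(1, n):
        link(hosts[i], hosts[rng.randrange(i)])
    # Extra random links vary the shape beyond a plain tree.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < extra_link_prob:
                link(hosts[i], hosts[j])
    return net


def is_connected(net):
    """Depth-first search from an arbitrary host."""
    start = next(iter(net))
    seen, stack = {start}, [start]
    while stack:
        for nxt in net[stack.pop()]["links"]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(net)


rng = random.Random(0)
episodes = [sample_topology(rng) for _ in range(5)]
print([len(net) for net in episodes])
```

Since host count, links, and services change between episodes, a policy that only memorizes one network layout stops paying off.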