Microsoft has recently open-sourced CyberBattleSim, a Python-based simulator of enterprise environments.
It is an experimental research project that investigates how autonomous agents operate in a simulated enterprise environment built on high-level abstractions of computer networks and cybersecurity concepts.
The toolkit exposes its environments through the Python-based OpenAI Gym interface, so automated agents can be trained with reinforcement learning algorithms.
CyberBattleSim lets researchers build highly abstract simulations of complex computer systems, which makes it possible to frame cybersecurity challenges as reinforcement learning problems.
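A Gym-style environment is driven by two calls: reset() starts an episode and returns an observation, and step(action) returns an observation, a reward, a done flag and an info dict. The sketch below is a hypothetical toy environment that only mimics that classic interface shape; it is not CyberBattleSim's environment, and the names ToyLateralMovementEnv and run_episode are made up for illustration. It uses only the standard library so it runs without gym installed.

```python
import random


class ToyLateralMovementEnv:
    """Hypothetical toy environment mimicking the classic Gym reset()/step() shape.

    Not CyberBattleSim's environment: the attacker simply "takes over" whichever
    node index it picks, and the episode ends once every node is owned.
    """

    def __init__(self, num_nodes=5):
        self.num_nodes = num_nodes
        self.owned = set()

    def reset(self):
        # The attacker starts with a foothold on node 0 only.
        self.owned = {0}
        return self._observation()

    def _observation(self):
        # Observation: 1 for each owned node, 0 otherwise.
        return [1 if i in self.owned else 0 for i in range(self.num_nodes)]

    def step(self, action):
        # Reward 1.0 for a newly owned node, 0.0 for picking one already owned.
        reward = 0.0
        if action not in self.owned:
            self.owned.add(action)
            reward = 1.0
        done = len(self.owned) == self.num_nodes
        return self._observation(), reward, done, {}


def run_episode(env, seed=42):
    """Roll out one episode with a uniformly random policy (no learning)."""
    rng = random.Random(seed)
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = rng.randrange(env.num_nodes)
        obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return obs, total_reward, steps


obs, total_reward, steps = run_episode(ToyLateralMovementEnv(num_nodes=5))
print(f"owned={obs} reward={total_reward} steps={steps}")
```

A reinforcement learning agent would replace the random choice in run_episode with a learned policy. Note that newer Gymnasium releases return a five-tuple from step() (terminated and truncated instead of done); the four-tuple here follows the older Gym convention.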
“We encourage the community to investigate how cyber agents interact and evolve in simulated environments and research how high-level abstractions of cybersecurity concepts help us understand how cyber-agents would behave in actual enterprise networks”, says Microsoft.
How CyberBattleSim works
CyberBattleSim focuses on threat modeling the post-breach lateral movement stage of a cyberattack. The environment consists of a network of computer nodes. It is parameterized by a fixed network topology and a set of predefined vulnerabilities that an agent can exploit to move laterally through the network.
The simulated attacker’s goal is to take ownership of some portion of the network by exploiting these planted vulnerabilities. While the simulated attacker moves through the network, a defender agent watches the network activity to detect the presence of the attacker and contain the attack.
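To make the attacker's objective concrete, here is a minimal sketch assuming a hand-written topology and a set of planted vulnerabilities (TOPOLOGY, VULNERABLE and lateral_movement are hypothetical names). It uses a scripted breadth-first attacker rather than a trained agent: from the nodes it owns, it exploits every reachable vulnerable neighbour until it owns the target fraction of the network, or runs out of moves.

```python
from collections import deque

# Hypothetical fixed topology: which nodes each node can reach over the network.
TOPOLOGY = {
    "client": ["web", "fileshare"],
    "web": ["db"],
    "fileshare": ["db", "backup"],
    "db": [],
    "backup": [],
}

# Hypothetical planted vulnerabilities: nodes the attacker is able to exploit.
VULNERABLE = {"web", "fileshare", "db"}


def lateral_movement(start, topology, vulnerable, goal_fraction):
    """Scripted breadth-first takeover from a single foothold.

    Returns the order in which nodes were taken and whether the owned fraction
    of the network reached goal_fraction.
    """
    owned = {start}
    takeover_order = [start]
    frontier = deque([start])
    while frontier and len(owned) / len(topology) < goal_fraction:
        node = frontier.popleft()
        for neighbour in topology[node]:
            # Only reachable, not-yet-owned nodes with a planted vulnerability can be taken.
            if neighbour not in owned and neighbour in vulnerable:
                owned.add(neighbour)
                takeover_order.append(neighbour)
                frontier.append(neighbour)
    return takeover_order, len(owned) / len(topology) >= goal_fraction


print(lateral_movement("client", TOPOLOGY, VULNERABLE, goal_fraction=0.8))
```

Because "backup" has no planted vulnerability in this toy network, a goal of owning the whole network is unreachable, which illustrates why the attacker's objective is framed as owning some portion of the network.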
The graph below depicts an example of a network with machines running various operating systems and software. Each machine has a set of properties, a value, and pre-assigned vulnerabilities. Black edges represent traffic running between nodes and are labelled by the communication protocol.
Visual representation of lateral movement in a computer network simulation
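The network described above (machines with properties, a value and pre-assigned vulnerabilities, linked by edges labelled with a protocol) maps naturally onto a small data model. The sketch below assumes hypothetical class names (Node, Edge), node names, property strings and vulnerability identifiers; it is not CyberBattleSim's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical fields: free-form properties, a value, and planted vulnerabilities.
    properties: set[str] = field(default_factory=set)
    value: int = 0
    vulnerabilities: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    protocol: str  # the label on the edge, e.g. "HTTPS" or "SSH"


NODES = {
    "client": Node({"Windows", "Office"}, value=10, vulnerabilities=["CachedCredentials"]),
    "web": Node({"Linux", "Apache"}, value=50, vulnerabilities=["RemoteCodeExecution"]),
    "db": Node({"Linux", "PostgreSQL"}, value=100),
}

EDGES = [
    Edge("client", "web", "HTTPS"),
    Edge("client", "web", "SSH"),
    Edge("web", "db", "PostgreSQL"),
]


def outgoing_protocols(node_id, edges):
    """Distinct protocols on traffic leaving node_id, sorted for stable output."""
    return sorted({e.protocol for e in edges if e.source == node_id})


def value_of(owned, nodes):
    """Total value of the nodes an attacker owns."""
    return sum(nodes[n].value for n in owned)


print(outgoing_protocols("client", EDGES), value_of({"client", "web"}, NODES))
```

Keeping edges as separate labelled records, rather than a plain adjacency list, lets two machines be linked by more than one protocol, as "client" and "web" are here.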
Exploiting a vulnerability yields one of several predefined outcomes: leaked credentials, leaked references to other computer nodes, leaked node properties, ownership of a node, or privilege escalation on the node.
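One way to picture these outcomes is as an enumeration, each member updating what the attacker knows or controls. The sketch below assumes hypothetical names (Outcome, AttackerState, apply_outcome) and a simple three-level privilege model; it illustrates the idea and does not reproduce CyberBattleSim's own types.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Outcome(Enum):
    LEAKED_CREDENTIALS = auto()
    LEAKED_NODE_REFERENCES = auto()
    LEAKED_NODE_PROPERTIES = auto()
    NODE_OWNED = auto()
    PRIVILEGE_ESCALATION = auto()


# Hypothetical privilege levels on a node.
NO_ACCESS, USER, ADMIN = 0, 1, 2


@dataclass
class AttackerState:
    credentials: set[str] = field(default_factory=set)
    discovered_nodes: set[str] = field(default_factory=set)
    node_properties: dict[str, set[str]] = field(default_factory=dict)
    privileges: dict[str, int] = field(default_factory=dict)

    @property
    def owned_nodes(self):
        return {n for n, level in self.privileges.items() if level > NO_ACCESS}


def apply_outcome(state, node, outcome, payload=()):
    """Update the attacker's state after exploiting a vulnerability on `node`."""
    if outcome is Outcome.LEAKED_CREDENTIALS:
        state.credentials.update(payload)
    elif outcome is Outcome.LEAKED_NODE_REFERENCES:
        state.discovered_nodes.update(payload)
    elif outcome is Outcome.LEAKED_NODE_PROPERTIES:
        state.node_properties.setdefault(node, set()).update(payload)
    elif outcome is Outcome.NODE_OWNED:
        state.privileges[node] = max(state.privileges.get(node, NO_ACCESS), USER)
    elif outcome is Outcome.PRIVILEGE_ESCALATION:
        # Assumption: escalation only applies to a node the attacker already owns.
        if state.privileges.get(node, NO_ACCESS) >= USER:
            state.privileges[node] = ADMIN
    return state


state = AttackerState()
apply_outcome(state, "client", Outcome.LEAKED_NODE_REFERENCES, {"web", "db"})
apply_outcome(state, "web", Outcome.NODE_OWNED)
apply_outcome(state, "web", Outcome.PRIVILEGE_ESCALATION)
print(state.owned_nodes, state.privileges)
```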
“We implement mitigation by reimaging the infected nodes, a process abstractly modeled as an operation spanning multiple simulation steps”, says Microsoft.
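A multi-step reimaging operation can be modelled as a countdown: once the defender starts reimaging a node, the node stays unavailable for a fixed number of simulation steps. The sketch below assumes a hypothetical Defender class, a reimage_duration parameter, and that the attacker loses the node as soon as reimaging starts; those are illustrative choices, not details taken from Microsoft's implementation.

```python
class Defender:
    """Hypothetical defender whose reimaging spans several simulation steps."""

    def __init__(self, reimage_duration=3):
        self.reimage_duration = reimage_duration
        self.reimaging = {}  # node -> simulation steps left until it is back online

    def reimage(self, node, attacker_owned):
        # Assumption: the attacker is evicted immediately and the node goes offline.
        if node not in self.reimaging:
            self.reimaging[node] = self.reimage_duration
            attacker_owned.discard(node)

    def is_available(self, node):
        return node not in self.reimaging

    def tick(self):
        """Advance one simulation step; return nodes whose reimaging just finished."""
        finished = []
        for node in list(self.reimaging):
            self.reimaging[node] -= 1
            if self.reimaging[node] == 0:
                del self.reimaging[node]
                finished.append(node)
        return finished


owned = {"web", "db"}
defender = Defender(reimage_duration=3)
defender.reimage("web", owned)
print(owned, [defender.tick() for _ in range(3)])
```

Spreading the operation over several steps means mitigation has a cost: while a node is offline, it cannot serve legitimate traffic either.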
To compare the performance of the agents, two metrics are used: the number of simulation steps an agent takes to reach its goal, and its cumulative reward over simulation steps across training epochs.
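Both metrics are easy to compute from per-step rewards logged during training. The sketch below assumes a hypothetical log format: one (step_rewards, goal_reached) tuple per training epoch, where step_rewards lists the reward received at each simulation step. summarize_epochs is an illustrative helper, not part of CyberBattleSim.

```python
from itertools import accumulate


def summarize_epochs(episodes):
    """Compute both metrics per training epoch.

    episodes: hypothetical list of (step_rewards, goal_reached) tuples, one per epoch.
    """
    summary = []
    for epoch, (step_rewards, goal_reached) in enumerate(episodes):
        summary.append({
            "epoch": epoch,
            # Steps taken to reach the goal; None if the goal was never reached.
            "steps_to_goal": len(step_rewards) if goal_reached else None,
            # Running total of reward after each simulation step.
            "cumulative_rewards": list(accumulate(step_rewards)),
        })
    return summary


for row in summarize_epochs([([0, 1, 0, 2], True), ([1, 1], True), ([0, 0, 0], False)]):
    print(row)
```

Plotting cumulative_rewards for successive epochs shows whether an agent collects more reward in fewer steps as training progresses.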
“To perform well, agents now must learn from observations that are not specific to the instance they are interacting with. They cannot just remember node indices or any other value related to the network size. They can instead observe temporal features or machine properties”, says Microsoft.
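The idea of observations that do not depend on network size can be illustrated with a fixed-length feature vector: instead of one slot per node index, the agent sees the fraction of discovered machines having each property from a fixed vocabulary, plus a temporal feature. The sketch below assumes a hypothetical PROPERTY_VOCAB, a hypothetical steps_since_last_takeover counter, and an illustrative function name; it is not how CyberBattleSim encodes observations.

```python
# Hypothetical fixed vocabulary of machine properties.
PROPERTY_VOCAB = ("Windows", "Linux", "SMB", "HTTPS", "SQL")


def size_independent_features(discovered, owned, steps_since_last_takeover):
    """Build an observation whose length does not depend on network size.

    discovered: dict mapping node id -> set of known properties.
    owned: set of node ids the attacker owns.
    """
    total = len(discovered) or 1  # avoid division by zero before any discovery
    property_fractions = [
        sum(1 for props in discovered.values() if p in props) / total
        for p in PROPERTY_VOCAB
    ]
    owned_fraction = len(owned) / total
    # Temporal feature: how long since the attacker last took over a node.
    return property_fractions + [owned_fraction, float(steps_since_last_takeover)]


small = size_independent_features({"a": {"Windows", "SMB"}, "b": {"Linux"}}, {"a"}, 3)
large = size_independent_features({f"n{i}": {"Linux"} for i in range(100)}, {"n0"}, 0)
print(len(small), len(large), small)
```

Because node identities never appear in the vector, a policy trained on one network can, in principle, be evaluated on a larger or smaller one.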
“A potential area for improvement is the realism of the simulation. The simulation in CyberBattleSim is simplistic, which has advantages: Its highly abstract nature prohibits direct application to real-world systems, thus providing a safeguard against potential nefarious use of automated agents trained with it”, concludes the post published by Microsoft.