Build a reinforcement learning environment using Unity ML-Agents
September 01, 2021 • Joy Zhang • Tutorial • 9 minutes
This article is part 2 of the series 'A hands-on introduction to deep reinforcement learning using Unity ML-Agents'. It's also suitable for anyone new to Unity interested in using ML-Agents for their own reinforcement learning project.
In my previous post, I went over how to set up ML-Agents and train an agent.
In this article, I'll walk through how to build a 3D physics-based volleyball environment in Unity. We'll use this environment later to train agents that can successfully play volleyball using deep reinforcement learning.
Skip ahead to Part 4 if you just want to dive straight into the training.
Open the Volleyball.unity scene.
Add the VolleyballArea.prefab object into the scene.
If you click Play ▶️ above the Scene viewer, you'll notice some weird things happening because we haven't added any physics or logic to define how the game objects should interact yet. We'll do that in the next section.
⚠ Before we start, open the VolleyballArea prefab (Project panel > Assets > Prefabs). We'll make our edits to the base prefab, so that they are reflected in all instances of this prefab. This will come in handy later when we duplicate our environment multiple times for parallel training.
Make our volleyball subject to Unity's physics engine:
Add a Rigidbody component to the ball.
Add 'bounciness' to our ball:
Drag Bouncy.physicMaterial into the 'Material' slot of the ball's collider.
Edit Bouncy.physicMaterial to change the 'bounciness'.
Both blue and purple agent cubes have already been set up for you in a similar way to the Volleyball.
Tag the ground as walkableSurface. This is used later to check whether or not the agent is 'grounded' for its jump action.
Goals are represented by a thin layer on top of the ground.
When a game object is set as a trigger, it no longer registers any physics-based collisions. Even though the goals are placed above the ground layer, technically the agents are moving on the Ground layer collider we created earlier.
Setting triggers allows us to use the OnTriggerEnter method later, which will detect when a ball has hit the collider.
💡 Some shortcuts: Alt+click to rotate, middle-click to pan, middle mouse wheel to zoom in/out.
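There are three invisible boundaries, identified by the tags the scripts check for later: boundary (out of bounds), blueBoundary (the blue side of the court), and purpleBoundary (the purple side of the court).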
Colliders, tags, and triggers for these boundaries have already been set up for you.
In this section, we'll add scripts that define the environment behavior (e.g. what happens when the ball hits the floor or when the episode starts).
VolleyballSettings.cs
Our first script will simply hold some constants that we'll reuse throughout the project.
public float agentRunSpeed = 1.5f;
public float agentJumpHeight = 2.75f;
public float agentJumpVelocity = 777;
public float agentJumpVelocityMaxChange = 10;
// Slows down strafe & backward movement
public float speedReductionFactor = 0.75f;
public Material blueGoalMaterial;
public Material purpleGoalMaterial;
public Material defaultMaterial;
// This is a downward force applied when falling to make jumps look less floaty
public float fallingForce = 150;
Note: there is also a ProjectSettingsOverride.cs script provided. This contains additional default settings related to time-stepping and resolving physics.
Go back to the Unity editor and select the VolleyballSettings game object. You should see that these variables are shown in the Inspector panel.
VolleyballController.cs
This script is attached to the Volleyball game object and lets us detect when the ball has hit our boundary or goal trigger.
Open the VolleyballController.cs script attached to the Volleyball.
In the VolleyballController : MonoBehaviour class (above the Start() method), declare the variables:
[HideInInspector]
public VolleyballEnvController envController;
public GameObject purpleGoal;
public GameObject blueGoal;
Collider purpleGoalCollider;
Collider blueGoalCollider;
This will allow us to access their child objects later.
Start()
This method is called once, before the first frame update. It will:
Get the goal colliders using the GetComponent<Collider> method:
purpleGoalCollider = purpleGoal.GetComponent<Collider>();
blueGoalCollider = blueGoal.GetComponent<Collider>();
Get the environment controller from a parent object:
envController = GetComponentInParent<VolleyballEnvController>();
Copy these statements into the Start() method:
void Start()
{
    envController = GetComponentInParent<VolleyballEnvController>();
    purpleGoalCollider = purpleGoal.GetComponent<Collider>();
    blueGoalCollider = blueGoal.GetComponent<Collider>();
}
OnTriggerEnter(Collider other)
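This method is called when the ball enters a trigger collider.
Some scenarios to detect are:
The ball went out of bounds
The ball was hit into the blue side
The ball was hit into the purple side
The ball hit the purple goal (blue side court)
The ball hit the blue goal (purple side court)
This method will detect each scenario and pass this information to envController (which we'll add in the next section). Copy the following block into this method: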
if (other.gameObject.CompareTag("boundary"))
{
    // ball went out of bounds
    envController.ResolveEvent(Event.HitOutOfBounds);
}
else if (other.gameObject.CompareTag("blueBoundary"))
{
    // ball hit into blue side
    envController.ResolveEvent(Event.HitIntoBlueArea);
}
else if (other.gameObject.CompareTag("purpleBoundary"))
{
    // ball hit into purple side
    envController.ResolveEvent(Event.HitIntoPurpleArea);
}
else if (other.gameObject.CompareTag("purpleGoal"))
{
    // ball hit purple goal (blue side court)
    envController.ResolveEvent(Event.HitPurpleGoal);
}
else if (other.gameObject.CompareTag("blueGoal"))
{
    // ball hit blue goal (purple side court)
    envController.ResolveEvent(Event.HitBlueGoal);
}
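If the chain of tag checks grows, the same mapping can be written as a lookup table. The sketch below is plain C# with no Unity dependency, so it can be tried outside the editor. The TriggerTagMap class and its TryGetEvent method are hypothetical names introduced for illustration, and the nested Event enum is only assumed to mirror the one in the skeleton scripts.

```csharp
using System.Collections.Generic;

public static class TriggerTagMap
{
    // Assumed to mirror the Event enum from the skeleton scripts; the member
    // names are the ones passed to ResolveEvent() in this article.
    public enum Event
    {
        HitOutOfBounds,
        HitIntoBlueArea,
        HitIntoPurpleArea,
        HitPurpleGoal,
        HitBlueGoal
    }

    // Hypothetical lookup table from a trigger's tag to the event it raises.
    static readonly Dictionary<string, Event> TagToEvent = new Dictionary<string, Event>
    {
        { "boundary", Event.HitOutOfBounds },
        { "blueBoundary", Event.HitIntoBlueArea },
        { "purpleBoundary", Event.HitIntoPurpleArea },
        { "purpleGoal", Event.HitPurpleGoal },
        { "blueGoal", Event.HitBlueGoal }
    };

    // Returns true and sets triggerEvent when the tag is one we handle;
    // returns false for any other tag (for example "walkableSurface").
    public static bool TryGetEvent(string tag, out Event triggerEvent)
    {
        return TagToEvent.TryGetValue(tag, out triggerEvent);
    }
}
```

For only five tags the if/else chain above is just as readable, so treat this as an optional alternative. In the real script you would use the project's own Event enum instead of the nested copy.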
VolleyballEnvController.cs
This script holds all the main logic for the environment: the max steps it should run for, how the ball and agents should spawn, when the episode should end, how rewards should be assigned, etc.
In the sample skeleton script, some variables and helper methods are already provided:
Start(): fetches the components and objects we'll need later
UpdateLastHitter(): keeps track of which agent was last in control of the ball
GoalScoredSwapGroundMaterial(): changes the color of the ground (helps us visualise which agent scored)
FixedUpdate()
This is called by the Unity engine on every fixed physics time step (set to FixedDeltaTime=0.02 seconds in ProjectSettingsOverride.cs), independently of the rendering frame rate.
This will control the max number of updates (i.e. 'steps') the environment takes before we interrupt the episode (e.g. if the ball gets stuck somewhere).
Add the following to void FixedUpdate():
/// <summary>
/// Called every step. Control max env steps.
/// </summary>
void FixedUpdate()
{
    resetTimer += 1;
    if (resetTimer >= MaxEnvironmentSteps && MaxEnvironmentSteps > 0)
    {
        blueAgent.EpisodeInterrupted();
        purpleAgent.EpisodeInterrupted();
        ResetScene();
    }
}
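Because each step corresponds to one fixed time step, the step limit translates directly into simulated time. Here is a minimal plain C# sketch of that arithmetic. EpisodeBudget and TimeLimitSeconds are hypothetical names, and the 5000-step figure in the note below is an example value, not one taken from the project.

```csharp
public static class EpisodeBudget
{
    // Fixed time step quoted in this article (set in ProjectSettingsOverride.cs).
    public const float FixedDeltaTime = 0.02f;

    // Simulated seconds that pass before the episode is interrupted.
    // A limit of 0 or less means "no limit", mirroring the
    // "MaxEnvironmentSteps > 0" guard in FixedUpdate(), so null is returned.
    public static float? TimeLimitSeconds(int maxEnvironmentSteps)
    {
        if (maxEnvironmentSteps <= 0)
        {
            return null;
        }
        return maxEnvironmentSteps * FixedDeltaTime;
    }
}
```

For example, EpisodeBudget.TimeLimitSeconds(5000) gives roughly 100 seconds of simulated play before the episode is cut off.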
ResetScene()
This controls the starting spawn behavior.
Our goal is to learn a model that allows our agent to return the ball from its side of the court no matter where the ball is sent. To help with training, we'll randomise the starting conditions of the agents and ball within some reasonable boundaries:
/// <summary>
/// Reset agent and ball spawn conditions.
/// </summary>
public void ResetScene()
{
    resetTimer = 0;
    lastHitter = Team.Default; // reset last hitter
    foreach (var agent in AgentsList)
    {
        // randomise starting positions and rotations
        var randomPosX = Random.Range(-2f, 2f);
        var randomPosZ = Random.Range(-2f, 2f);
        var randomPosY = Random.Range(0.5f, 3.75f); // depends on jump height
        var randomRot = Random.Range(-45f, 45f);
        agent.transform.localPosition = new Vector3(randomPosX, randomPosY, randomPosZ);
        agent.transform.eulerAngles = new Vector3(0, randomRot, 0);
        agent.GetComponent<Rigidbody>().velocity = default(Vector3);
    }
    // reset ball to starting conditions
    ResetBall();
}
/// <summary>
/// Reset ball spawn conditions
/// </summary>
void ResetBall()
{
    var randomPosX = Random.Range(-2f, 2f);
    var randomPosZ = Random.Range(6f, 10f);
    var randomPosY = Random.Range(6f, 8f);
    // alternate ball spawn side
    // -1 = spawn blue side, 1 = spawn purple side
    ballSpawnSide = -1 * ballSpawnSide;
    if (ballSpawnSide == -1)
    {
        ball.transform.localPosition = new Vector3(randomPosX, randomPosY, randomPosZ);
    }
    else if (ballSpawnSide == 1)
    {
        ball.transform.localPosition = new Vector3(randomPosX, randomPosY, -1 * randomPosZ);
    }
    ballRb.angularVelocity = Vector3.zero;
    ballRb.velocity = Vector3.zero;
}
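To see the alternating spawn logic in isolation, here is a small plain C# sketch that reproduces it without Unity. BallSpawnSketch is a hypothetical class, System.Random stands in for UnityEngine.Random, and the starting value of ballSpawnSide is an assumption, since the skeleton script's initial value isn't shown here.

```csharp
using System;

public class BallSpawnSketch
{
    readonly Random rng;

    // Assumed starting value; it is flipped before every spawn.
    int ballSpawnSide = 1;

    public BallSpawnSketch(int seed)
    {
        rng = new Random(seed);
    }

    // Stand-in for UnityEngine.Random.Range(float, float).
    float Range(float min, float max)
    {
        return min + (float)rng.NextDouble() * (max - min);
    }

    // Returns the ball's next local spawn position as (x, y, z).
    public (float x, float y, float z) NextSpawn()
    {
        float x = Range(-2f, 2f);
        float z = Range(6f, 10f);
        float y = Range(6f, 8f);

        // Flip the side first, as ResetBall() does.
        ballSpawnSide = -1 * ballSpawnSide;

        // -1 = blue side (positive z), 1 = purple side (negative z)
        return (x, y, ballSpawnSide == -1 ? z : -z);
    }
}
```

Calling NextSpawn() repeatedly yields z values that alternate in sign while staying 6 to 10 units from the net, so each agent gets the serve on every other episode.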
ResolveEvent()
This method will resolve the scenarios we defined earlier in VolleyballController.cs.
We can use this method to assign rewards in different ways to encourage different types of behavior. In general, it's good practice to keep rewards within [-1,1].
To keep it simple, our goal for now is to train agents that can bounce the ball back and forth and keep the ball in play. We'll assign a reward of +1 each time an agent hits the ball over the net using the AddReward(1f) method in the corresponding scenario:
case Event.HitIntoBlueArea:
    if (lastHitter == Team.Purple)
    {
        purpleAgent.AddReward(1f);
    }
    break;
case Event.HitIntoPurpleArea:
    if (lastHitter == Team.Blue)
    {
        blueAgent.AddReward(1f);
    }
    break;
We won't assign any rewards for now if a goal is scored or the ball is hit out of bounds. If either of these scenarios happens, we'll just end the episode. Add the following code block to the sections indicated by the // end episode comment.
blueAgent.EndEpisode();
purpleAgent.EndEpisode();
ResetScene();
Here's what ResolveEvent should look like:
/// <summary>
/// Resolves scenarios when ball enters a trigger and assigns rewards
/// </summary>
public void ResolveEvent(Event triggerEvent)
{
    switch (triggerEvent)
    {
        case Event.HitOutOfBounds:
            if (lastHitter == Team.Blue)
            {
                // apply penalty to blue agent
            }
            else if (lastHitter == Team.Purple)
            {
                // apply penalty to purple agent
            }
            // end episode
            blueAgent.EndEpisode();
            purpleAgent.EndEpisode();
            ResetScene();
            break;
        case Event.HitBlueGoal:
            // blue wins
            // turn floor blue
            StartCoroutine(GoalScoredSwapGroundMaterial(volleyballSettings.blueGoalMaterial, RenderersList, .5f));
            // end episode
            blueAgent.EndEpisode();
            purpleAgent.EndEpisode();
            ResetScene();
            break;
        case Event.HitPurpleGoal:
            // purple wins
            // turn floor purple
            StartCoroutine(GoalScoredSwapGroundMaterial(volleyballSettings.purpleGoalMaterial, RenderersList, .5f));
            // end episode
            blueAgent.EndEpisode();
            purpleAgent.EndEpisode();
            ResetScene();
            break;
        case Event.HitIntoBlueArea:
            if (lastHitter == Team.Purple)
            {
                purpleAgent.AddReward(1f);
            }
            break;
        case Event.HitIntoPurpleArea:
            if (lastHitter == Team.Blue)
            {
                blueAgent.AddReward(1f);
            }
            break;
    }
}
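It can help to check the reward rules away from the editor. The sketch below restates them as a pure function in plain C#, so they can be unit tested without running a scene. RallyRewards and Resolve are hypothetical names, and the nested Team and Event enums are only assumed to match the ones in the skeleton scripts.

```csharp
public static class RallyRewards
{
    // Assumed to mirror the enums used by the skeleton scripts.
    public enum Team { Blue, Purple, Default }
    public enum Event
    {
        HitOutOfBounds,
        HitBlueGoal,
        HitPurpleGoal,
        HitIntoBlueArea,
        HitIntoPurpleArea
    }

    // Returns the reward each agent receives for one event and whether
    // the episode ends, following the rules in ResolveEvent() above.
    public static (float blue, float purple, bool done) Resolve(Event triggerEvent, Team lastHitter)
    {
        switch (triggerEvent)
        {
            case Event.HitIntoBlueArea:
                // Purple is rewarded only if it was the last to touch the ball.
                return lastHitter == Team.Purple ? (0f, 1f, false) : (0f, 0f, false);
            case Event.HitIntoPurpleArea:
                // Blue is rewarded only if it was the last to touch the ball.
                return lastHitter == Team.Blue ? (1f, 0f, false) : (0f, 0f, false);
            default:
                // Goals and out-of-bounds give no reward and end the episode.
                return (0f, 0f, true);
        }
    }
}
```

One thing this makes easy to see: each successful hit over the net is worth +1 and does not end the episode, so the total reward an agent collects over a long rally can grow well beyond 1 even though every single reward stays within [-1,1].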
Now when you click Play ▶️ you should see the environment working correctly: the ball is affected by gravity, the agents can stand on the ground, and the episode resets when the ball hits the floor.
You should now have a volleyball environment ready for our agents to train in. It will assign our agents rewards to encourage a certain type of behavior (volleying the ball back and forth).
In the next section, we'll design our agents and give them actions to choose from and a way to observe their environment.