Simple AWS DeepRacer Reward Function Using Waypoints

AWS DeepRacer is a fun way to begin practicing and exploring machine learning, particularly reinforcement learning.

While there are many approaches to designing a reward function depending on your goal, I found that using the “waypoints” input parameter in the reward function let me build a relatively stable model.

In my experiment, although it didn’t produce the fastest model, this reward function consistently kept my car’s off-track rate at nearly zero.

Here’s how I did it.

Download Waypoints Files for DeepRacer Tracks

In case you’re unaware, there’s a dedicated community of DeepRacer enthusiasts called the AWS DeepRacer Community, which publishes insightful tutorials and other DeepRacer resources.

To access the desired waypoints files, let’s visit the AWS DeepRacer Community’s GitHub page and find the specific track we’re interested in.

In this example, let’s choose the Smile Speedway track.

Let’s download the NumPy (.npy) file by clicking the View raw button.
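
If you’d rather script the download than click through the GitHub page, a small helper like this works; the URL below is just a placeholder, so paste in the actual View raw link of the track you picked:

```python
import urllib.request

# Paste the "View raw" link of the track file you chose from the
# AWS DeepRacer Community GitHub page
url = 'PASTE_THE_VIEW_RAW_LINK_HERE'
urllib.request.urlretrieve(url, 'reInvent2019_track.npy')
```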

View the Track Waypoints

For plotting the track, you can utilize either Jupyter Notebook or Google Colab. In this blog, I’ve opted for Google Colab.

Execute the following code to upload the file to your Google Colab notebook.
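
A minimal version of that snippet uses Colab’s built-in files helper:

```python
# Standard Colab helper: shows a "Choose File" button and uploads the
# selected file into the notebook's working directory
from google.colab import files

uploaded = files.upload()
```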

This will generate a “Choose File” button allowing you to select a numpy file from your local computer. The track numpy file downloaded earlier is named ‘reInvent2019_track.npy’, so let’s select that.

We’ll visualize our track waypoints using blue dots on the plot.
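
Here’s a sketch of that plotting step. It assumes the community file layout of six columns per waypoint (the x/y coordinates of the center line, inner border, and outer border):

```python
import numpy as np
import matplotlib.pyplot as plt

track = np.load('reInvent2019_track.npy')
center_line = track[:, 0:2]  # first two columns are the center-line x/y

plt.figure(figsize=(10, 6))
plt.scatter(center_line[:, 0], center_line[:, 1], c='blue', s=10)
plt.title('Smile Speedway waypoints')
plt.show()
```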

It should look something like this.

Now, rather than displaying the waypoints as blue dots, let’s visualize them using numerical labels to precisely identify the locations of the curves.
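
One way to do that is to draw a text label at each waypoint index instead of a dot. Note that plt.text() doesn’t autoscale the axes, so the limits are set explicitly:

```python
plt.figure(figsize=(16, 10))
for i, (x, y) in enumerate(center_line):
    plt.text(x, y, str(i), fontsize=8)

# plt.text() does not trigger autoscaling, so set the limits by hand
plt.xlim(center_line[:, 0].min() - 1, center_line[:, 0].max() + 1)
plt.ylim(center_line[:, 1].min() - 1, center_line[:, 1].max() + 1)
plt.show()
```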

Now that we’ve successfully displayed the numbered waypoints, let’s proceed to the next step.

Reward Function

Within this function, I penalize the car for failing to brake, that is, for carrying too much speed through the corners, rather than rewarding raw speed everywhere. I specifically targeted two of the most challenging curves, the sections spanning waypoints 77-99 and 117-150.
To ensure the car slows down while turning, I established a condition: if the car’s speed exceeds 2.5 m/s on these curves, it incurs a penalty; otherwise, it receives a reward proportional to its speed.

This approach encourages the car to carry maximum speed on the straights while easing off through the turns, which keeps it from going off-track.
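
Here’s a sketch of a reward function along those lines. The waypoint ranges and the 2.5 m/s threshold follow the description above, while the variable names and structure are illustrative:

```python
def reward_function(params):
    '''
    Reward speed everywhere, but penalize the car for taking the two
    hardest curves (waypoints 77-99 and 117-150) too fast.
    '''
    # Index of the waypoint just ahead of the car
    next_waypoint = params['closest_waypoints'][1]
    speed = params['speed']  # current speed in m/s

    in_hard_curve = (77 <= next_waypoint <= 99) or (117 <= next_waypoint <= 150)

    if in_hard_curve and speed > 2.5:
        reward = 1e-3  # penalty: the car should brake through these curves
    else:
        reward = speed  # reward proportional to speed

    return float(reward)
```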

If you’re curious, here’s a glimpse of the training configuration.

Here’s how the graph appears after three hours of training. Perhaps stopping the training at the two-hour mark would have yielded better results.


Now, let’s have a look at the evaluation result.

The car ran relatively smoothly and remained stable throughout. In this evaluation, it never veered off-track across five consecutive trials.

Conclusion

While it may not achieve the highest speed, crafting a reward function utilizing waypoints, as demonstrated above, contributes to achieving a consistently smooth and stable drive across various tracks. Feel free to adapt and experiment with the code and training time to explore further possibilities!

For the full list of DeepRacer reward function input parameters, see the AWS DeepRacer developer guide.
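
For reference, here are a few documented parameters that pair well with the waypoints approach:

```python
# A few of the other input parameters available in the params dict:
#   params['waypoints']             - ordered list of (x, y) milestones on the center line
#   params['closest_waypoints']     - indices of the waypoints behind and ahead of the car
#   params['heading']               - yaw of the car in degrees
#   params['track_width']           - width of the track in meters
#   params['distance_from_center']  - distance between the car and the center line
#   params['progress']              - percentage of the track completed
#   params['is_offtrack']           - True if the car has left the track
```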

Esar Suwandi
Software engineer based in Tokyo focused on Web Development. 4x AWS Certified: CLF, SAA, SAP, DOP. Let's build something awesome!
