Self-Driving Mini Car on a Hiking Trail
Introduction
Self-driving cars on urban streets make effective use of road markers, traffic signs, and various sensors to stay on the road. The roads are generally well illuminated and maintained in order to make day-to-day traffic as smooth as possible. Our project aims to explore what can be done in more chaotic conditions without explicit road markings and armed with only a camera. More specifically, we attempt to teach a robot platform to navigate narrow footpaths in the wild using computer vision and neural networks.
The DonkeyCar platform
The chosen platform is DonkeyCar, a Raspberry Pi-based robot platform aimed at developing self-driving cars using neural networks and computer vision. The platform comes with several convenient software utilities to get up and running as quickly as possible, as well as a few predefined neural network architectures to start from. The stock DonkeyCar platform was modified by adding a 3S LiPo battery for a longer runtime and an Xbox controller receiver for controlling the robot remotely with low latency.
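For reference, enabling the gamepad on a DonkeyCar typically comes down to a few lines in the car's myconfig.py. The snippet below is a sketch based on the standard configuration options; the exact option names and defaults can differ between DonkeyCar releases.

# Excerpt from a DonkeyCar myconfig.py -- a sketch of how the Xbox receiver
# can be enabled; option names vary between DonkeyCar releases.
USE_JOYSTICK_AS_DEFAULT = True   # start the car in manual (gamepad) mode
CONTROLLER_TYPE = 'xbox'         # use the Xbox gamepad driver
JOYSTICK_MAX_THROTTLE = 0.5      # cap throttle while collecting training data
JOYSTICK_STEERING_SCALE = 1.0    # keep the full steering range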
Linear models
Humble beginnings
To prove the concept of driving on the desired type of trail, the first model was trained on a short track to give it a better chance of success. To collect the training data, the track was driven several times in both directions. As the track was curved, this let us check whether steering worked correctly in both directions, giving very clear visual feedback on whether the model behaved as expected.
Our first experiment made use of the default DonkeyCar linear model architecture. Like most of the models we tried, it takes the camera image as input, feeds it through five convolutional layers, and flattens the output. In the linear model, the flattened output then passes through two feed-forward layers with 100 and 50 neurons. The model outputs two values: one for steering and one for throttle.
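The sketch below shows this architecture in tf.keras, roughly as we understand the default linear model; the exact filter counts, kernel sizes, and dropout values in the shipped DonkeyCar implementation may differ slightly.

# A sketch of the DonkeyCar-style linear model described above.
from tensorflow.keras import layers, Model

img_in = layers.Input(shape=(120, 160, 3), name='img_in')   # camera frame
# Five convolutional layers, then flatten
x = layers.Conv2D(24, (5, 5), strides=(2, 2), activation='relu')(img_in)
x = layers.Conv2D(32, (5, 5), strides=(2, 2), activation='relu')(x)
x = layers.Conv2D(64, (5, 5), strides=(2, 2), activation='relu')(x)
x = layers.Conv2D(64, (3, 3), strides=(2, 2), activation='relu')(x)
x = layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu')(x)
x = layers.Flatten()(x)
# Two feed-forward layers with 100 and 50 neurons
x = layers.Dense(100, activation='relu')(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(50, activation='relu')(x)
x = layers.Dropout(0.1)(x)
# Two linear outputs: steering angle and throttle
angle_out = layers.Dense(1, name='angle_out')(x)
throttle_out = layers.Dense(1, name='throttle_out')(x)

model = Model(inputs=img_in, outputs=[angle_out, throttle_out])
model.compile(optimizer='adam', loss='mse')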
According to the DonkeyCar website, the linear model runs well on a Raspberry Pi, is quite robust, and steers smoothly, and our results support that claim. The results with this simple model were satisfying: over roughly six test runs it made no errors on the track it was trained on. If anything, it drove slightly better than when it was controlled manually.
The same model was then tested on a similar track, but with the surrounding environment slightly changed. During this test the car mostly behaved correctly, but it also made some errors. It became clear that the model could not make the necessary corrections once the driving direction had drifted significantly.
Further testing confirmed this issue, which can be explained by the lack of training data demonstrating how to act when the car is drifting off the track. Such data is also harder to gather: the off-track situation has to be created many times, while the act of creating it must not be recorded, since that is exactly the behaviour we want to avoid during normal driving.
Another issue that we came across when testing our first models was the control of throttle. Namely, when driving on different terrain, the throttle required to hold the same speed varies a lot. Inferring this from the image alone is much harder than estimating the steering angle. One option to address this is to feed IMU (inertial measurement unit) data into the network as an additional input to help with throttle control. Still, this is not the only variable that determines how fast the car drives: the speed is also highly dependent on the remaining battery voltage.
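As an illustration of that idea, the sketch below shows one way the IMU readings could be concatenated with the image features. This is a hypothetical design, not something we trained or deployed, and the six-value IMU vector (accelerometer plus gyroscope) is an assumption.

# A sketch of feeding IMU readings into the network alongside the camera image.
from tensorflow.keras import layers, Model

img_in = layers.Input(shape=(120, 160, 3), name='img_in')
imu_in = layers.Input(shape=(6,), name='imu_in')   # assumed: 3-axis accel + 3-axis gyro

# Image branch: a small convolutional stack
x = layers.Conv2D(24, (5, 5), strides=(2, 2), activation='relu')(img_in)
x = layers.Conv2D(32, (5, 5), strides=(2, 2), activation='relu')(x)
x = layers.Conv2D(64, (5, 5), strides=(2, 2), activation='relu')(x)
x = layers.Flatten()(x)

# IMU branch: a small dense layer, then merge the two streams
y = layers.Dense(14, activation='relu')(imu_in)
z = layers.Concatenate()([x, y])
z = layers.Dense(50, activation='relu')(z)

angle_out = layers.Dense(1, name='angle_out')(z)
throttle_out = layers.Dense(1, name='throttle_out')(z)
model = Model(inputs=[img_in, imu_in], outputs=[angle_out, throttle_out])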
The search for training data
A versatile dataset is required in order to successfully train an end-to-end model that is able to generalize over different tracks in different weather conditions. This includes varying the geometry of the track, the amount of foliage around it, and the color of the soil. Lighting also affects the results, so data recorded in different weather conditions is required.
To build this versatile dataset, we collected data by driving the DonkeyCar in several different places around Tartu over the span of several months. To test how well a model generalized to different trails, we visually evaluated how well the car kept itself on the track. The parts of the tracks we tested on were specifically chosen for their distinctive features, such as straight sections, curves, and hilly terrain. Overall we collected around 70 000 samples of training data, each consisting of the camera image along with the corresponding throttle and steering values.
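Each sample is stored in a DonkeyCar "tub" directory. The helper below sketches how a single record can be read back; the key names and file layout follow the tub format we worked with and may vary between DonkeyCar versions.

# A sketch of reading one sample from a DonkeyCar "tub" data directory.
import json
from pathlib import Path
from PIL import Image

def load_sample(tub_dir, idx):
    tub = Path(tub_dir)
    record = json.loads((tub / f'record_{idx}.json').read_text())
    image = Image.open(tub / record['cam/image_array'])   # 160x120 RGB frame
    angle = record['user/angle']        # steering in [-1, 1]
    throttle = record['user/throttle']  # throttle in [-1, 1]
    return image, angle, throttle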
Limitations of the platform
The DonkeyCar platform has several inherent limitations that we did not realize before testing the system. The first is the lack of computational power. This resource limitation leads to latency between the input image and the output commands (steering and throttle). The latency was around half a second for the simplest linear models and grew as model complexity increased. As a result, the car only drove well at low speed; whenever the speed increased, oscillations increased as well, eventually driving the car off the road.
The limitation of having to use a low speed leads to another problem with the platform, namely control over speed. Although the speed can be controlled with a continuous input ranging from -1 (reversing at full speed) to 1 (driving forward at full speed), the DonkeyCar platform has no feedback about its actual speed. In practice this means that a throttle value of 0.6 produces a speed of several meters per second when the track goes slightly downhill, while the car gets stuck when the track goes slightly uphill. Such drastic changes in speed are very hard to handle for an algorithm with around a second of latency. In principle the model could learn to estimate the vehicle's speed from the camera feed, but that is a much harder learning problem than steering.
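For contrast, the snippet below sketches the kind of simple closed-loop throttle correction that would compensate for slopes, which the stock platform cannot run because it has no speed measurement to feed into it. The function and its gain value are purely hypothetical.

# Illustration only: a closed-loop throttle step that would hold speed on slopes,
# impossible on the stock platform because there is no measured speed.
def throttle_controller(target_speed, measured_speed, throttle, gain=0.05):
    """Nudge the throttle towards whatever value holds target_speed."""
    error = target_speed - measured_speed          # m/s, needs a speed sensor
    throttle = throttle + gain * error             # simple proportional step
    return max(-1.0, min(1.0, throttle))           # clamp to the valid range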
The final limitation of the platform relates to its size and mechanical rigidity. The platform is small and mechanically not very rigid; a sophisticated way of saying it is not made for off-road tracks. Besides the weak drivetrain covered above, its steering is also somewhat loose. Once the front wheels touch larger strands of grass by the side of the track, they tend to get caught, which pulls the car even further off the track. The controlling software can correct this behavior, but with a latency of around a second the correction usually comes too late.
Visualizing the model
Model output vs training data
We visualized the model's throttle and steering angle outputs and compared them to the corresponding values in the training data. Here is a short example.
As the angle output shows, the model steers significantly more smoothly than the human driver, effectively smoothing out the jitter in the recorded steering inputs.
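The comparison itself is straightforward to produce. The sketch below shows one way to plot the recorded human steering against the model's predictions; images, user_angles, and model stand in for data loaded from a tub and a trained network.

# A sketch of the comparison plot: recorded human steering vs. model predictions.
# `images`, `user_angles` and `model` are placeholders for loaded tub data and
# a trained network.
import matplotlib.pyplot as plt
import numpy as np

pred_angles, pred_throttles = model.predict(np.asarray(images) / 255.0)

plt.plot(user_angles, label='human steering')
plt.plot(pred_angles.squeeze(), label='model steering')
plt.xlabel('frame')
plt.ylabel('steering angle [-1, 1]')
plt.legend()
plt.show()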
Feature maps
Most of the predefined models feed the input image through the convolutional block. The block is relatively small, containing five convolutional layers: the first two have 24 and 32 filters, the last three 64 each. We visualized these by plotting the feature maps of each layer for different input images, to get some idea of which features are detected in the image.
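One way to obtain these feature maps, sketched below, is to build a probe model that returns the activation of every convolutional layer for a single frame; model and frame here are placeholders for the trained network and one camera image.

# A sketch of extracting feature maps from the convolutional block.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import Model

# Collect the output tensor of every convolutional layer
conv_outputs = [l.output for l in model.layers if 'conv' in l.name]
probe = Model(inputs=model.inputs, outputs=conv_outputs)
activations = probe.predict(frame[np.newaxis] / 255.0)

# Plot the first 16 feature maps of the first convolutional layer
first = activations[0][0]
for i in range(16):
    plt.subplot(4, 4, i + 1)
    plt.imshow(first[:, :, i], cmap='viridis')
    plt.axis('off')
plt.show()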
With the first layers, it is easier to understand what the feature maps represent. Let's take a look at one example image, where the trail turns right a short distance ahead, with some trees in the distance and a lot of grass around.
Looking at the output of the first convolutional layer, some filters capture the left edge of the trail quite nicely, and the trail's location and some of the trees stand out clearly in the feature maps.
When we look at the output of the final layer, it is rather difficult to identify what the feature maps represent and how the decision is made.
RNN model
We also experimented with a recurrent neural network (RNN) architecture. The idea was that using several sequential frames would improve corrections, since the model can infer which way the car is currently moving. The model took six sequential frames as input and learned to steer successfully.
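The sketch below outlines this kind of sequence model in tf.keras: a shared convolutional block applied to each of the six frames, followed by an LSTM. The layer sizes are illustrative, and the shipped DonkeyCar RNN model differs in its details.

# A sketch of a sequence model over six consecutive camera frames.
from tensorflow.keras import layers, Model

seq_in = layers.Input(shape=(6, 120, 160, 3), name='img_seq')   # 6 frames
# Apply the same convolutional block to every frame in the sequence
x = layers.TimeDistributed(
        layers.Conv2D(24, (5, 5), strides=(2, 2), activation='relu'))(seq_in)
x = layers.TimeDistributed(
        layers.Conv2D(32, (5, 5), strides=(2, 2), activation='relu'))(x)
x = layers.TimeDistributed(
        layers.Conv2D(64, (3, 3), strides=(2, 2), activation='relu'))(x)
x = layers.TimeDistributed(layers.Flatten())(x)
x = layers.LSTM(128)(x)                       # summarize the sequence
x = layers.Dense(64, activation='relu')(x)

angle_out = layers.Dense(1, name='angle_out')(x)
throttle_out = layers.Dense(1, name='throttle_out')(x)
model = Model(inputs=seq_in, outputs=[angle_out, throttle_out])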
Using the model on a real track, however, failed because of the limited computational power of the Raspberry Pi. Calculating the forward pass of the network was so slow that by the time the system produced a response for a frame, it was already irrelevant.
Conclusion
We were able to implement an autonomous vehicle that can navigate hiking trails. Most of the difficulties arose from physical limitations of the DonkeyCar robotics platform, such as low computational power and lackluster speed controllers. As such, the best performing models were the ones that had simple architectures and a short computing time.