DARPA Challenge in a sandbox
When online education was taking off, I took a course at ai-class.com. It gave me the idea of building a game in which a neural network learns to play by observing the user’s actions. I wanted something as simple as Flappy Bird, with the model trained in real time. I ended up building a simple simulated environment in which a car learns to drive autonomously by observing the user’s actions. It turned out to be an interesting combination of PyGame, PyTorch, and multiprocessing.
The original game idea was narrowed down to a car simulator with a random road and obstacles. The screenshot above shows an untrained net (on the left) and a trained model (on the right).
How it works
In the beginning, the network is initialized with random weights. After each frame, the normalized LIDAR values and the last command (left, right, or straight) are stored as one training sample, which makes this a multiclass classification problem. Once N new training samples have been collected (500 in this case), they are sent to the training process via the task_queue. When training is complete, the state of the trained model is sent back to the UI process via the result_queue. At that point the new state is rendered and the user can switch to self-driving mode.
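The hand-off between the two processes can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SampleBuffer`, `training_worker`, and `train_fn` are hypothetical names, and only `task_queue`, `result_queue`, and the batch size of 500 come from the description above. The worker accepts any queue-like objects, so in a real application they would presumably be `multiprocessing.Queue` instances, with the worker running in its own process.

```python
import queue

BATCH_SIZE = 500  # N new samples per training round, as described above


class SampleBuffer:
    """Collects (lidar_values, command) pairs and emits full batches."""

    def __init__(self, task_queue, batch_size=BATCH_SIZE):
        self.task_queue = task_queue
        self.batch_size = batch_size
        self.samples = []

    def add(self, lidar_values, command):
        # Called once per frame on the UI side.
        self.samples.append((list(lidar_values), command))
        if len(self.samples) >= self.batch_size:
            # Hand the full batch to the trainer and start a new one.
            self.task_queue.put(self.samples)
            self.samples = []


def training_worker(task_queue, result_queue, train_fn):
    """Trains on each incoming batch until a None sentinel arrives."""
    while True:
        batch = task_queue.get()
        if batch is None:  # sentinel: shut the worker down
            break
        # train_fn is a placeholder for the real training routine.
        result_queue.put(train_fn(batch))
```

With PyTorch, `train_fn` could return `model.state_dict()`, and the UI loop could poll `result_queue` with `get_nowait()` once per frame so that rendering never blocks on training.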
Two issues with training on this data are worth mentioning:
- The most common command is “straight”, so the training set is unbalanced and the network may learn to prefer going straight in most cases. This can be fixed by downsampling the majority class.
- When the self-driving car gets into a critical situation it was not trained for, its behavior is unpredictable. In my case the car hits the obstacle, but this can be fixed by “teleporting” the car into a critical situation and training it on the right strategy.
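Downsampling can be sketched in a few lines of plain Python. The function name `downsample_majority` and the integer labels are assumptions made for illustration. This version is also slightly stricter than trimming only the majority class: it cuts every class down to the size of the rarest class that is present.

```python
import random
from collections import defaultdict


def downsample_majority(samples, seed=None):
    """Randomly drops samples so every class matches the rarest one present.

    `samples` is a list of (features, label) pairs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))

    if not by_label:
        return []

    # Size of the smallest class that actually occurs in the data.
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)  # avoid feeding the classes in blocks
    return balanced
```

One caveat of this approach: if a class is very rare in a batch, most of the batch is thrown away, so it may be worth balancing over a larger pool of collected samples.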
Model and training
For this setup, I used a network with 24 input neurons (one per LIDAR reading), 3 hidden layers, and 3 output neurons (one per command):
from torch import nn


class Model(nn.Module):
    def __init__(self, in_features=24, hidden=[56, 48, 48], out_features=3):
        super().__init__()
        layer_sizes = [in_features] + hidden
        layers = []
        for i in range(len(layer_sizes) - 1):
            layers.append(nn.Linear(layer_sizes[i], layer_sizes[i + 1]))
            layers.append(nn.ReLU(inplace=True))
        # Output layer: one raw score (logit) per command.
        layers.append(nn.Linear(layer_sizes[-1], out_features))
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)
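The three outputs are raw scores, one per command, and in self-driving mode the car would take the command with the highest score. Here is a minimal sketch in plain Python, so that it runs without PyTorch. The name `pick_command` and the order of `COMMANDS` are assumptions, since the source does not say which output index maps to which command.

```python
# Hypothetical class order; the original project may index commands differently.
COMMANDS = ("left", "straight", "right")


def pick_command(logits):
    """Returns the command whose output score is highest."""
    if len(logits) != len(COMMANDS):
        raise ValueError("expected one score per command")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return COMMANDS[best]
```

On a batch of tensor outputs, `logits.argmax(dim=1)` gives the same index for every row at once.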
The training loop, including backpropagation, looks as follows:
import torch

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)

epochs = 7000
for i in range(epochs):
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    if i % 100 == 1:
        print(f'epoch: {i:2} loss: {loss.item():10.8f}')
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
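To make the printed loss easier to read, it helps to see what `nn.CrossEntropyLoss` computes: it takes the raw outputs and the integer class labels, applies a log-softmax, and averages the negative log-probability of the correct class over the batch. The sketch below reproduces that for a single sample in plain Python. The helper `cross_entropy` is written for illustration and is not part of the project.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]
```

With three classes, a network that is equally unsure about all commands scores ln 3 ≈ 1.10 per sample, which is a handy baseline: a printed loss well below that means the network is doing better than guessing.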
The video below shows how it works:
I encourage you to explore the source code, play with the network architecture, and train the car in your own driving style.
Thanks for reading!
P.S. Assets were provided by ilyar