A video-streaming robot combines two of the most satisfying builds in the maker world into a single project. You get a working mobile platform that you can control, and a live camera feed that shows exactly where it is going in your phone or laptop browser. The ESP32-CAM robot car project achieves both using a single compact module, a two-wheel chassis, an L298N motor driver, and firmware that serves a live MJPEG stream alongside a movement-control web page over Wi-Fi.
No external server, no app installation, no subscription service. The ESP32-CAM hosts the entire control interface itself. Open a browser, connect to the robot's IP address, and drive it from anywhere on the same Wi-Fi network.
This guide covers every stage from components to wiring to firmware to first drive, with each step explained clearly enough to follow without prior robotics experience.
Understanding the ESP32-CAM Module
The ESP32-CAM is a development board featuring the ESP32 microcontroller and an integrated OV2640 camera module, capable of capturing images at up to 1600x1200 resolution. It supports JPEG compression for bandwidth-efficient wireless transmission and includes 4 MB of flash memory, along with an SD card slot. It exposes 9 GPIO pins for interfacing with external components, although several of them are shared with the SD card slot, so they are free for other uses only when no card is fitted.
For a robot car application, the ESP32-CAM handles two jobs concurrently on its dual-core processor. The first is the camera capture pipeline together with the MJPEG stream server. The second is the Wi-Fi web server, which receives movement commands from the browser and translates them into GPIO signals to control the motor driver. Running the stream and the control page as separate servers keeps a long-lived video connection from blocking motor response and vice versa.
The ESP32-CAM does not have a built-in USB-to-serial converter. Programming requires either an FTDI programmer or the ESP32-CAM-MB programmer board, which clips onto the module to provide USB programming directly.
Components Required
| Component | Quantity | Purpose |
|---|---|---|
| ESP32-CAM with OV2640 camera | 1 | Brain, camera, and Wi-Fi |
| ESP32-CAM-MB programmer board | 1 | USB programming interface |
| L298N dual H-bridge motor driver | 1 | Controls two DC motors |
| Two-wheel robot chassis with DC motors | 1 | Physical platform |
| Li-ion or LiPo battery pack (7.4 V, 2S) | 1 | Powers motors and ESP32-CAM |
| LM2596 buck (step-down) voltage regulator | 1 | Steps 7.4 V down to 5 V for ESP32-CAM |
| Jumper wires | As needed | Connections |
| Small breadboard (optional) | 1 | Prototyping connections |
You can source the ESP32-CAM module, robot chassis kits, motor drivers, and supporting electronics from the Think Robotics robotics kits and components collection, which carries the parts needed for this build, along with compatible accessories.
How the System Works
Understanding the signal flow before wiring anything makes the build much easier to follow.
The ESP32-CAM connects to your home Wi-Fi as a station. It starts two servers on the same IP address. The first server on port 80 serves an HTML page that contains arrow buttons for forward, backward, left, right, and stop. The second server on port 81 serves the MJPEG video stream. When you open the IP address in a browser, you see the control interface with the live feed embedded in it. Pressing a button sends an HTTP GET request to the ESP32-CAM. The firmware receives the request, identifies the direction command, and sets the four L298N control pins accordingly. The L298N translates those logic signals into motor current, and the wheels move.
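The request-to-pins step can be tried on a desktop compiler before touching any hardware. The snippet below is a minimal host-side model, not the firmware itself: the names Command, queryValue, parseCommand, and pinsFor are hypothetical, and it assumes the go=<direction> query parameter and the pin patterns used by the firmware in this guide. One deliberate difference is that a missing go parameter is treated as a stop here, whereas the firmware simply ignores such a request.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the five movement commands.
enum class Command { Forward, Backward, Left, Right, Stop };

// Logic levels for the four L298N inputs.
struct PinLevels { bool in1, in2, in3, in4; };

// Return the value of `key` in a query string such as "x=1&go=left",
// or an empty string when the key is absent.
std::string queryValue(const std::string& query, const std::string& key) {
    std::size_t pos = 0;
    while (pos <= query.size()) {
        std::size_t end = query.find('&', pos);
        if (end == std::string::npos) end = query.size();
        std::size_t eq = query.find('=', pos);
        if (eq != std::string::npos && eq < end &&
            query.compare(pos, eq - pos, key) == 0) {
            return query.substr(eq + 1, end - eq - 1);
        }
        pos = end + 1;  // move on to the next key=value pair
    }
    return "";
}

// Map the "go" parameter to a command. Anything unrecognised becomes Stop.
Command parseCommand(const std::string& query) {
    const std::string go = queryValue(query, "go");
    if (go == "forward")  return Command::Forward;
    if (go == "backward") return Command::Backward;
    if (go == "left")     return Command::Left;
    if (go == "right")    return Command::Right;
    return Command::Stop;
}

// Pin pattern for each command, mirroring the firmware's motor functions.
PinLevels pinsFor(Command c) {
    switch (c) {
        case Command::Forward:  return {true,  false, true,  false};
        case Command::Backward: return {false, true,  false, true};
        case Command::Left:     return {false, true,  true,  false};
        case Command::Right:    return {true,  false, false, true};
        default:                return {false, false, false, false};
    }
}
```

Calling pinsFor(parseCommand("go=left")) yields IN1 LOW, IN2 HIGH, IN3 HIGH, IN4 LOW, which is the same pattern turnLeft() writes to the driver.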
The L298N motor driver contains two independent H-bridge circuits. Each H-bridge controls one motor. By setting the IN1 and IN2 pins HIGH or LOW in combination, you control the direction of motor A. IN3 and IN4 control motor B. The ENA and ENB pins accept PWM signals to control speed. For this project, the ENA and ENB jumpers stay in place, which ties both enable pins HIGH for full-speed operation. This simplifies the firmware without materially affecting the driving experience on a small robot chassis.
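The input combinations for one H-bridge can be written out as a small truth table in code. This is a sketch for illustration only: bridgeState and dutyFromPercent are hypothetical helper names, which physical direction counts as "Forward" depends on how the motor is wired, and the 8-bit duty range is an assumption about the PWM resolution you would choose if you later drove ENA and ENB from a GPIO.

```cpp
#include <cstdint>

// Possible states of one L298N H-bridge.
enum class BridgeState { Forward, Reverse, Brake, Coast };

// Truth table for one bridge: enable LOW lets the motor coast, equal inputs
// brake it, and unequal inputs select the direction.
BridgeState bridgeState(bool enable, bool inA, bool inB) {
    if (!enable) return BridgeState::Coast;
    if (inA == inB) return BridgeState::Brake;
    return inA ? BridgeState::Forward : BridgeState::Reverse;
}

// Convert a speed percentage to an 8-bit PWM duty value (0 to 255),
// clamping out-of-range requests.
std::uint8_t dutyFromPercent(int percent) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    return static_cast<std::uint8_t>((percent * 255) / 100);
}
```

With the enable jumpers fitted, the firmware's stop pattern (both inputs LOW) falls into the Brake row, while a request such as dutyFromPercent(50) gives 127, roughly half speed, if PWM is added later.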
Wiring the Circuit
The L298N has two power inputs. The 12V input (which typically accepts 7 to 35V) powers the motors. The 5V output pin on the L298N can power the ESP32-CAM if your battery voltage is 7.4V and the onboard regulator is within its operating range. Alternatively, use a dedicated LM2596 buck converter to step the battery voltage down to a stable 5V for the ESP32-CAM, which is the more reliable approach for consistent Wi-Fi performance.
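A quick calculation shows why the onboard 5V output is the weaker option. The sketch below assumes the regulator on the L298N module is a linear type that needs about 2 V of headroom, and it uses a load of 0.31 A as a round figure for the ESP32-CAM at peak. Both numbers are assumptions to check against your own module, and the function names are hypothetical.

```cpp
// Heat dissipated by a linear regulator: the full voltage drop times the load current.
double linearRegulatorLossWatts(double vin, double vout, double loadAmps) {
    return (vin - vout) * loadAmps;
}

// A linear regulator only holds its output while the input exceeds the
// output by at least its dropout voltage.
bool hasHeadroom(double vin, double vout, double dropoutVolts) {
    return (vin - vout) >= dropoutVolts;
}
```

At a nominal 7.4 V the regulator would burn about 0.74 W as heat, and hasHeadroom(6.4, 5.0, 2.0) returns false, meaning a partly discharged 2S pack could let the 5V rail sag. A switching buck converter avoids both problems, which is why it is the more reliable choice.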
Connect the components as follows:
| L298N Pin | ESP32-CAM GPIO | Function |
|---|---|---|
| IN1 | GPIO 14 | Motor A direction 1 |
| IN2 | GPIO 15 | Motor A direction 2 |
| IN3 | GPIO 13 | Motor B direction 1 |
| IN4 | GPIO 12 | Motor B direction 2 |
| GND | GND | Common ground |
| 5V out | 5V | Power to ESP32-CAM |
Connect Motor A (left wheel) to the OUT1 and OUT2 terminals on the L298N. Connect Motor B (right wheel) to OUT3 and OUT4. Connect the battery's positive terminal to the 12V input terminal and the negative terminal to GND. Leave the ENA and ENB jumpers in place for full speed operation.
One important consideration: GPIO 12 on the ESP32-CAM is a strapping pin that affects the boot voltage selection. On most ESP32-CAM boards, GPIO 12 must be LOW during boot. The L298N IN4 pin connected to GPIO 12 should be confirmed as LOW at power-on. If the robot fails to boot consistently, add a 10 kΩ pull-down resistor from GPIO 12 to GND to keep it LOW during the boot sequence.
Installing Required Libraries and Board Support
Open the Arduino IDE. Confirm the ESP32 board support package from Espressif is installed through the Boards Manager. Select AI Thinker ESP32-CAM from the Boards menu. This is the correct board target for the standard ESP32-CAM module.
No additional libraries are required beyond what ships with the ESP32 board package. The camera driver, Wi-Fi server, and all streaming components are included in the ESP32 Arduino core.
Firmware
Create a new sketch and upload the following code. Replace the Wi-Fi credentials with your own before uploading.
#include "esp_camera.h"
#include <WiFi.h>
#include "esp_http_server.h"
// Wi-Fi credentials
const char* ssid = "YourWiFiName";
const char* password = "YourWiFiPassword";
// Motor control pins
#define IN1 14
#define IN2 15
#define IN3 13
#define IN4 12
// Camera pin definitions for AI Thinker ESP32-CAM
#define PWDN_GPIO_NUM 32
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 0
#define SIOD_GPIO_NUM 26
#define SIOC_GPIO_NUM 27
#define Y9_GPIO_NUM 35
#define Y8_GPIO_NUM 34
#define Y7_GPIO_NUM 39
#define Y6_GPIO_NUM 36
#define Y5_GPIO_NUM 21
#define Y4_GPIO_NUM 19
#define Y3_GPIO_NUM 18
#define Y2_GPIO_NUM 5
#define VSYNC_GPIO_NUM 25
#define HREF_GPIO_NUM 23
#define PCLK_GPIO_NUM 22
// Drive both H-bridges at once: (in1, in2) set motor A, (in3, in4) set motor B.
static void setMotors(bool in1, bool in2, bool in3, bool in4) {
  digitalWrite(IN1, in1 ? HIGH : LOW);
  digitalWrite(IN2, in2 ? HIGH : LOW);
  digitalWrite(IN3, in3 ? HIGH : LOW);
  digitalWrite(IN4, in4 ? HIGH : LOW);
}
void stopMotors()   { setMotors(false, false, false, false); }
void moveForward()  { setMotors(true,  false, true,  false); }
void moveBackward() { setMotors(false, true,  false, true);  }
void turnLeft()     { setMotors(false, true,  true,  false); }  // motor A reverses, motor B drives forward
void turnRight()    { setMotors(true,  false, false, true);  }  // motor A drives forward, motor B reverses
// Control web server handler: reads /control?go=<direction> and drives the motors
static esp_err_t control_handler(httpd_req_t *req) {
  char buf[50];
  // Copy the query string (everything after '?') into buf
  if (httpd_req_get_url_query_str(req, buf, sizeof(buf)) == ESP_OK) {
    char cmd[10];
    // Extract the value of the "go" parameter
    if (httpd_query_key_value(buf, "go", cmd, sizeof(cmd)) == ESP_OK) {
      if (strcmp(cmd, "forward") == 0) moveForward();
      else if (strcmp(cmd, "backward") == 0) moveBackward();
      else if (strcmp(cmd, "left") == 0) turnLeft();
      else if (strcmp(cmd, "right") == 0) turnRight();
      else stopMotors();  // "stop" and any unrecognised value
    }
  }
  const char* resp = "OK";
  httpd_resp_send(req, resp, strlen(resp));
  return ESP_OK;
}
// HTML page with live stream and controls
static esp_err_t index_handler(httpd_req_t *req) {
  // The custom raw-string delimiter (rawliteral) stops the onclick attributes
  // in the markup from ending the literal early. The page is built as a String
  // so the robot's IP address can be spliced into the stream URL.
  String html = R"rawliteral(
<!DOCTYPE html><html><head>
<title>ESP32-CAM Robot</title>
<style>
body { background:#111; color:#fff; text-align:center; font-family:sans-serif; }
img { width:320px; border:2px solid #555; margin:10px; }
button { padding:14px 24px; margin:6px; font-size:16px;
background:#333; color:#fff; border:1px solid #888;
border-radius:6px; cursor:pointer; }
button:active { background:#555; }
</style></head><body>
<h2>ESP32-CAM Robot</h2>
<img src="http://)rawliteral";
  html += WiFi.localIP().toString();
  html += R"rawliteral(:81/stream"><br>
<button onclick="go('forward')">Forward</button><br>
<button onclick="go('left')">Left</button>
<button onclick="go('stop')">Stop</button>
<button onclick="go('right')">Right</button><br>
<button onclick="go('backward')">Backward</button>
<script>
function go(cmd) {
fetch('/control?go=' + cmd);
}
</script>
</body></html>
)rawliteral";
  httpd_resp_set_type(req, "text/html");
  httpd_resp_send(req, html.c_str(), html.length());
  return ESP_OK;
}
// Control server on port 80: serves the page at "/" and commands at "/control"
void startControlServer() {
  httpd_handle_t server = NULL;
  httpd_config_t config = HTTPD_DEFAULT_CONFIG();
  config.server_port = 80;
  if (httpd_start(&server, &config) == ESP_OK) {
    httpd_uri_t index_uri = { "/", HTTP_GET, index_handler, NULL };
    httpd_uri_t ctrl_uri = { "/control", HTTP_GET, control_handler, NULL };
    httpd_register_uri_handler(server, &index_uri);
    httpd_register_uri_handler(server, &ctrl_uri);
  }
}
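// Minimal MJPEG stream handler, modelled on the ESP32 CameraWebServer example.
// Each JPEG frame is sent as one part of a multipart/x-mixed-replace response.
static esp_err_t stream_handler(httpd_req_t *req) {
  char part[96];
  esp_err_t res = httpd_resp_set_type(req, "multipart/x-mixed-replace;boundary=frame");
  while (res == ESP_OK) {
    camera_fb_t *fb = esp_camera_fb_get();
    if (!fb) return ESP_FAIL;
    int len = snprintf(part, sizeof(part),
                       "\r\n--frame\r\nContent-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n",
                       (unsigned)fb->len);
    res = httpd_resp_send_chunk(req, part, len);
    if (res == ESP_OK) res = httpd_resp_send_chunk(req, (const char *)fb->buf, fb->len);
    esp_camera_fb_return(fb);  // always hand the frame buffer back to the driver
  }
  return res;  // the loop ends when the browser disconnects
}
// Stream server on port 81, separate from the control server on port 80
void startStreamServer() {
  httpd_handle_t server = NULL;
  httpd_config_t config = HTTPD_DEFAULT_CONFIG();
  config.server_port = 81;
  config.ctrl_port += 1;  // internal control port must differ from the first server's
  if (httpd_start(&server, &config) == ESP_OK) {
    httpd_uri_t stream_uri = { "/stream", HTTP_GET, stream_handler, NULL };
    httpd_register_uri_handler(server, &stream_uri);
  }
}
void setup() {
  Serial.begin(115200);
  pinMode(IN1, OUTPUT); pinMode(IN2, OUTPUT);
  pinMode(IN3, OUTPUT); pinMode(IN4, OUTPUT);
  stopMotors();
  // Camera configuration (zero-initialised so unset fields take their defaults)
  camera_config_t config = {};
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM; config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM; config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM; config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM; config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM; config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM; config.pin_href = HREF_GPIO_NUM;
  // Newer camera driver versions may prefer the names pin_sccb_sda / pin_sccb_scl
  config.pin_sscb_sda = SIOD_GPIO_NUM; config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM; config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  config.frame_size = FRAMESIZE_QVGA; // 320x240 for smooth streaming
  config.jpeg_quality = 12;
  config.fb_count = 2;
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed: 0x%x\n", err);
    return;
  }
  WiFi.begin(ssid, password);
  Serial.print("Connecting to Wi-Fi");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500); Serial.print(".");
  }
  Serial.println("\nConnected.");
  Serial.print("Control page: http://");
  Serial.println(WiFi.localIP());
  startControlServer();
  startStreamServer();  // MJPEG stream on port 81
}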
void loop() {
delay(10);
}
After uploading, open the Serial Monitor at 115200 baud. Once Wi-Fi connects, the Serial Monitor prints the IP address. Type that address into a browser on any device connected to the same Wi-Fi network. The control page loads with the live camera feed and the five direction buttons.
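The live feed on port 81 is an MJPEG stream. In the ESP32 camera web server example, this is sent as a single long-lived HTTP response of type multipart/x-mixed-replace, where every JPEG frame is preceded by a boundary line and a small header. The host-side sketch below only builds those strings so the framing is visible. The boundary token "frame" and the function names are hypothetical, chosen for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical boundary token separating one JPEG frame from the next.
const std::string kBoundary = "frame";

// Content-Type header value that tells the browser to keep replacing the image.
std::string streamContentType() {
    return "multipart/x-mixed-replace;boundary=" + kBoundary;
}

// Header sent before each frame: boundary line, part type, and byte count.
std::string partHeader(std::size_t jpegBytes) {
    return "\r\n--" + kBoundary + "\r\n"
           "Content-Type: image/jpeg\r\n"
           "Content-Length: " + std::to_string(jpegBytes) + "\r\n\r\n";
}
```

The browser treats each part as a replacement for the previous image, which is why a plain img element pointing at the stream URL is enough to display video without any JavaScript.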
First Drive Checklist
Before driving, confirm each of the following on the ground with the robot stationary.
Press Forward. Both motors should spin in the direction that moves the robot away from you. If one motor spins in reverse, swap the two wires on that motor at the L298N output terminals. This is faster than modifying the firmware.
Press Left. The left motor should reverse while the right motor drives forward, pivoting the robot to the left. If the robot turns the wrong way, the two motors are effectively swapped. Exchange the pin definitions in the firmware so that IN1 and IN2 take the GPIO numbers of IN3 and IN4, and vice versa.
Press Stop. Both motors should stop immediately. Confirm this works before driving near any obstacle.
Check that the ESP32-CAM module does not become too hot during extended streaming. It runs warm under continuous camera and Wi-Fi load. Ensure adequate airflow around the module, particularly in an enclosed chassis.
Optimising Video Stream Quality
The firmware sets FRAMESIZE_QVGA (320x240 pixels) and a JPEG quality value of 12 as defaults. These settings balance stream smoothness with Wi-Fi bandwidth demand. On a strong 2.4 GHz Wi-Fi connection within 5 to 10 metres, QVGA typically produces 8 to 15 frames per second, which is sufficient for driving control.
If the stream lags or freezes, reduce the frame size to FRAMESIZE_QQVGA (160x120) or raise the jpeg_quality value to 20. In the ESP32 camera API a higher number means more compression and lower image quality, which is the opposite of the usual convention. If the stream is smooth and you want better image detail, try FRAMESIZE_VGA (640x480) on a strong Wi-Fi connection, though motor response may feel slightly slower as the processor handles the larger frame buffers.
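A rough bandwidth estimate helps when choosing between these settings. The sketch below is a back-of-envelope calculation, not a measurement: the function names are hypothetical, and the average frame size you feed in is an assumption that depends heavily on the scene and the quality setting.

```cpp
#include <cstddef>

// Approximate stream bitrate in kilobits per second for a given
// average JPEG frame size and frame rate.
double streamKilobitsPerSecond(std::size_t avgFrameBytes, double fps) {
    return static_cast<double>(avgFrameBytes) * 8.0 * fps / 1000.0;
}

// Highest frame rate a link could carry if the whole usable throughput
// were spent on video frames of the given average size.
double maxFps(double linkKilobitsPerSecond, std::size_t avgFrameBytes) {
    const double frameKilobits = static_cast<double>(avgFrameBytes) * 8.0 / 1000.0;
    return linkKilobitsPerSecond / frameKilobits;
}
```

For example, if QVGA frames averaged 10,000 bytes, 12 frames per second would need about 960 kbit/s. Halving the frame size through a smaller resolution or heavier compression halves that figure, which is why both adjustments help on a weak link.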
For a technical deep dive into the OV2640 camera sensor specifications, resolution modes, and JPEG compression characteristics, the OV2640 datasheet from OmniVision provides the full sensor specification including all supported frame sizes and their pixel formats.
Troubleshooting
Camera init failed on Serial Monitor. This most often indicates a power supply issue, although a loosely seated camera ribbon cable produces the same error and is worth checking first. The ESP32-CAM draws up to 310 mA when camera initialisation and Wi-Fi connection happen at the same time. If the 5V supply cannot sustain this current, the camera fails to initialise. Use a dedicated LM2596 buck converter rather than the L298N onboard 5V output, which is often too weak for reliable ESP32-CAM operation.
Robot connects to Wi-Fi but control page does not load. Confirm the IP address printed in the Serial Monitor is correct. Confirm the device viewing the page is on the same Wi-Fi network as the robot. Corporate and school networks often isolate devices from each other, which prevents the browser from reaching the robot's IP directly.
Motors do not respond to button presses but stream works. Check that GPIO 14, 15, 13, and 12 are correctly wired to IN1, IN2, IN3, and IN4 on the L298N. Confirm the L298N motor power input is connected to the battery and not only to the 5V logic supply.
Stream works on laptop but not on phone. The src attribute of the img element in the HTML points to the ESP32-CAM's IP address on port 81. Some mobile browsers restrict plain HTTP content or non-standard ports by default. Try loading the stream address (the robot's IP followed by :81/stream) directly in the phone browser first, then reload the control page.
For a comprehensive reference on the ESP32 camera web server architecture, MJPEG streaming implementation, and additional resolution options, the Espressif ESP32 camera driver repository on GitHub contains the full driver source, example streaming server code, and all supported frame size definitions.
Extending the Project
This build is a complete, working robot with a live video feed and browser control. It is also a foundation that extends naturally in several directions.
The General Driver for Robots board available at Think Robotics is built on the ESP32-WROOM-32 and provides onboard motor control interfaces for up to four DC motors, a 9-axis IMU for orientation sensing, serial bus servo control, Wi-Fi, Bluetooth, and ESP-NOW communication, all on a single board designed specifically for robot development. For builders who want to move beyond breadboard wiring to a cleaner integrated platform, this board is a direct upgrade path from the L298N-based build described here.
Adding an ultrasonic sensor turns the manually driven robot into a semi-autonomous vehicle. An obstacle detection routine calls stopMotors() automatically when an object is detected within 20 cm, which prevents collisions during remote operation. GPIO 2 is one candidate pin, but note that a common HC-SR04 module needs separate trigger and echo pins and produces a 5 V echo signal, which must be reduced to 3.3 V before it reaches the ESP32.
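The distance check itself is simple arithmetic on the echo time. The sketch below is host-side C++ with hypothetical function names. It assumes the echo duration is measured in microseconds, that a value of 0 means no echo was received (as Arduino's pulseIn() reports on timeout), and a speed of sound of about 343 m/s.

```cpp
// Speed of sound at roughly 20 degrees C, in centimetres per microsecond.
const double kSoundCmPerMicro = 0.0343;

// Convert a round-trip echo time to a one-way distance in centimetres.
double echoMicrosToCm(unsigned long echoMicros) {
    return echoMicros * kSoundCmPerMicro / 2.0;
}

// Decide whether the robot should stop. A zero reading means no echo came
// back, so it is treated as "nothing in range" rather than as distance zero.
bool shouldStop(unsigned long echoMicros, double thresholdCm) {
    if (echoMicros == 0) return false;
    return echoMicrosToCm(echoMicros) <= thresholdCm;
}
```

In firmware, the main loop would take a reading periodically and call stopMotors() whenever shouldStop(reading, 20.0) is true. An echo of 1000 microseconds corresponds to about 17 cm, so it would trigger a stop, while 2000 microseconds (about 34 cm) would not.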
For sourcing ESP32-CAM modules, robot chassis kits, L298N motor drivers, and voltage regulators for this build, the Think Robotics robot chassis and motor driver collection carries all the components needed to take this project from parts list to finished robot.
Conclusion
The ESP32-CAM robot car project teaches the full stack of connected robotics in a single build. Camera initialisation and MJPEG streaming, HTTP server hosting, GPIO-controlled motor direction, Wi-Fi client connection, and browser-based control all come together in one system that fits on a palm-sized chassis.
The ESP32-CAM handles every function from a single module at a cost that makes building multiple units for experimentation completely practical. Get the wiring right, confirm motor directions on the bench before driving, and the project runs reliably from the first power-on.
From here, every further robotics concept builds on the same pattern. Sensors add awareness. Autonomous logic replaces manual input. The camera feed becomes the input to image processing. All of it starts with a working robot that drives where you tell it and shows you what it sees.