How Machine Learning Could Help Detect a Shooter: A Practical Approach to Object Detection and Person Re-Identification with CNNs (and a Dash of AI Detective Work)
What just happened!? An assassination attempt on a former president, in this day and age? Do we just have too many tools to harm each other now? Is that what it is? No amount of education or technology alone can save us from this. All the news channels are flooded, each spinning its own sympathetic narrative. Or maybe it’s simply high time to reform gun policies.
There will be plenty of conspiracy theories around it, but it’s not our job to believe them. Let’s stay away from the drama; we can’t afford to lose our minds or what little attention span we have.
So let’s focus on our machine learning baby, but with a serious question in mind: how could ML have helped detect the shooter?
I’ll outline the process of creating a system that leverages a Convolutional Neural Network (CNN) for object detection and person re-identification, which can be a crucial part of identifying suspicious individuals in video feeds.
Overview
- Data Collection and Preprocessing: Collecting and preprocessing video footage data for training.
- Model Selection: Choosing a suitable object detection model (e.g., YOLO, Faster R-CNN).
- Training the Model: Training the object detection model to identify people and weapons.
- Inference: Running the trained model on live or recorded video footage.
- Person Re-identification: Using a re-identification model to track individuals across different camera feeds.
Step-by-Step Code Example
1. Data Collection and Preprocessing
Collect annotated datasets containing images of people with and without weapons. Preprocess the images to be fed into the neural network.
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load and preprocess the dataset: resize every image to 224x224 and
# label it 1 (weapon) or 0 (no weapon) based on its filename.
def load_images_from_folder(folder):
    images = []
    labels = []
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, filename))
        if img is not None:
            images.append(cv2.resize(img, (224, 224)))
            labels.append(1 if 'weapon' in filename else 0)
    return np.array(images), np.array(labels)

images, labels = load_images_from_folder('dataset_folder')
# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(images, labels, test_size=0.2, random_state=42)

# Data augmentation for the training set; validation images are only rescaled
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow(X_train, y_train, batch_size=32)
val_generator = val_datagen.flow(X_val, y_val, batch_size=32)
2. Model Selection
We’ll use a pre-trained YOLO (You Only Look Once) model, which is well suited to real-time object detection.
import torch

# Load a pre-trained YOLOv5 model from PyTorch Hub.
# The hub model accepts raw images (NumPy arrays, PIL images, or file paths)
# and handles resizing, normalization, and NMS internally, so no manual
# transform pipeline is needed.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Perform object detection on an image (expects RGB)
def detect_objects(image):
    results = model(image)
    return results
3. Training the Model
For simplicity, we’ll use the pre-trained model directly; if you have a custom dataset, you would fine-tune the model on it.
# Fine-tuning sketch (if you have a custom dataset).
# Note: YOLOv5 uses its own composite detection loss (box + objectness + class),
# so in practice you fine-tune it with the training script shipped in the
# ultralytics/yolov5 repo rather than a plain classification loop like this:
# model.train()
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# for epoch in range(num_epochs):
#     for images, labels in train_generator:
#         outputs = model(images)
#         loss = compute_detection_loss(outputs, labels)  # YOLO's own loss
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
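In practice, fine-tuning is done with the training script in the ultralytics/yolov5 repository, pointed at a dataset YAML; the resulting checkpoint can then be loaded back through the same hub interface. The path below is just an example of where that script typically saves its best weights:

# Load custom fine-tuned weights back into a hub model.
# 'runs/train/exp/weights/best.pt' is the default location YOLOv5's training
# script writes to; adjust the path to your own training run.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/exp/weights/best.pt')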
4. Inference
Run the trained model on live or recorded video footage to detect people and weapons.
# Capture video
cap = cv2.VideoCapture('video_file.mp4')

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # YOLOv5 expects RGB images; OpenCV reads frames as BGR
    results = detect_objects(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    # Draw bounding boxes for confident detections
    for box in results.xyxy[0]:
        x1, y1, x2, y2 = map(int, box[:4])
        conf, cls = float(box[4]), int(box[5])
        if conf > 0.5:
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f'{model.names[cls]} {conf:.2f}', (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
5. Person Re-identification
For person re-identification, we use another model to match individuals across different camera feeds.
# Placeholder for person re-identification (Re-ID).
# A real system would use a trained Re-ID model that embeds each person crop
# and matches it against a gallery of known embeddings.
def re_identify_person(image, known_people):
    # Match the detected person against the known gallery.
    # This is a placeholder function.
    return None

# Example usage in the video loop
known_people = []  # Gallery of known person images/embeddings
cap = cv2.VideoCapture('video_file.mp4')  # Reopen the video for this pass

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    results = detect_objects(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    for box in results.xyxy[0]:
        x1, y1, x2, y2 = map(int, box[:4])
        conf, cls = float(box[4]), int(box[5])
        # Class 0 in the COCO-pretrained model is 'person'
        if conf > 0.5 and cls == 0:
            person_image = frame[y1:y2, x1:x2]
            matched = re_identify_person(person_image, known_people)
            label, color = ('Known Person', (0, 255, 0)) if matched else ('Unknown Person', (0, 0, 255))
            cv2.putText(frame, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)

    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
The algorithm described in the example is a combination of object detection and person re-identification.
Here are the key components:
- Object Detection: This involves detecting and localizing objects (e.g., people, weapons) in images or video frames. The algorithm used for object detection in the example is YOLO (You Only Look Once).
- Person Re-identification: This involves recognizing and tracking individuals across different camera feeds or video frames to determine if they are the same person. This typically uses a deep learning model trained on person re-identification datasets.
Object Detection: YOLO (You Only Look Once)
YOLO is a state-of-the-art, real-time object detection system that can detect multiple objects in an image and predict their bounding boxes and class probabilities. YOLO frames object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
Key features of YOLO:
- Speed: YOLO is known for its fast detection speeds.
- Unified Architecture: YOLO applies a single neural network to the full image, which makes it efficient.
- Generalization: YOLO generalizes well to new domains and unexpected inputs.
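To inspect what the hub model actually returns, YOLOv5 exposes its detections as a pandas DataFrame, one row per detected object. The image filename below is just a placeholder:

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('crowd_photo.jpg')  # placeholder path; any image or array works

# Each row: xmin, ymin, xmax, ymax, confidence, class, name
detections = results.pandas().xyxy[0]
print(detections[detections['name'] == 'person'])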
Person Re-identification
Person re-identification (Re-ID) is a computer vision task that aims to match a given person across different images or video frames, even when captured by different cameras or under different conditions. This is particularly useful for surveillance and tracking applications. Re-ID models typically use deep learning to extract features from images of people and compare these features to find matches.
Key steps in a typical Re-ID pipeline:
- Feature Extraction: Use a convolutional neural network (CNN) to extract features from the images.
- Feature Matching: Compare the extracted features using distance metrics (e.g., Euclidean distance or cosine similarity) to determine if they represent the same person (a minimal sketch of this follows the list).
- Ranking and Identification: Rank the matches based on similarity scores to identify the most likely matches.
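As a minimal sketch of the first two steps, assuming an ImageNet-pretrained ResNet-50 as a stand-in feature extractor (a production Re-ID model would instead be trained on Re-ID datasets such as Market-1501), embeddings can be compared with cosine similarity:

import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Stand-in feature extractor: ResNet-50 with the classifier head removed.
# A real Re-ID system would use a backbone trained on person Re-ID data.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # outputs 2048-d embeddings
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((256, 128)),  # common person-crop shape in Re-ID work
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(person_crop):
    # person_crop: an RGB NumPy array (e.g., a crop from a video frame)
    with torch.no_grad():
        feat = backbone(preprocess(person_crop).unsqueeze(0))
    return F.normalize(feat, dim=1)  # L2-normalize so dot product = cosine similarity

def same_person(crop_a, crop_b, threshold=0.7):
    # Cosine similarity between the two embeddings; the threshold is arbitrary
    score = (embed(crop_a) @ embed(crop_b).T).item()
    return score > threshold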
This example provides a high-level overview of how an ML system could help detect a shooter in surveillance footage. The system involves:
- Data Collection and Preprocessing: Preparing the dataset for training.
- Model Selection: Using a pre-trained object detection model.
- Training the Model: (Optional) Fine-tuning the model on a custom dataset.
- Inference: Detecting objects in video footage.
- Person Re-identification: Tracking individuals across different camera feeds.
In a real-world scenario, further steps would include more sophisticated data augmentation, model tuning, deployment to edge devices, and integrating the system into a comprehensive surveillance and alerting infrastructure.
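As one hypothetical sketch of the alerting piece: assuming a model fine-tuned to include weapon classes (the stock COCO-trained YOLOv5 has a 'knife' class but no 'gun' class), a simple hook could flag high-confidence weapon detections per frame. The class names and the alert channel here are assumptions, not real APIs:

def maybe_alert(detections, weapon_classes=('gun', 'knife'), threshold=0.8):
    # 'detections' is the per-frame DataFrame from results.pandas().xyxy[0]
    hits = detections[detections['name'].isin(weapon_classes) &
                      (detections['confidence'] > threshold)]
    for _, hit in hits.iterrows():
        # Placeholder: replace print with a real channel (SMS, pager, control room)
        print(f"ALERT: {hit['name']} detected with confidence {hit['confidence']:.2f}")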
Follow for more on AI! The Journey — AI By Jasmin Bharadiya