Nov 5, 2024
In the rapidly evolving landscape of artificial intelligence and robotics, innovative applications are emerging that have the potential to revolutionize various industries. One such application, presented recently by Godfrey Nolan, President of RIIS, at the OpenAI Applications Explorers meetup, showcases the integration of OpenAI’s ChatGPT with a Xiaomi Cyberdog to create a system capable of assisting first responders in emergency situations.
Introduction
Today’s project aims to utilize a Cyberdog equipped with ChatGPT capabilities to interview injured individuals in catastrophic events, providing crucial information to medical personnel before they arrive on the scene.
Why Use AI and Robotics?
The implementation of AI and robotics in emergency response situations presents several advantages:
Scalability: Multiple Cyberdogs can assess numerous people simultaneously, increasing the efficiency of triage operations.
Safety: By sending robots into potentially hazardous areas, the risk to human responders is significantly reduced.
Efficiency: The system enables faster medical response times, crucial in life-threatening situations.
Resilience: Cyberdogs demonstrate impressive adaptability to challenging terrains and obstacles.
Accuracy: AI-powered analysis can provide consistent and emotion-free evaluations of patient conditions.
The First Responder Challenge
First responders face numerous challenges in their line of work:
Immediate needs and urgent calls require quick decision-making and action.
Critical assessments must be made under pressure and with limited information.
Responders are often stretched thin, handling multiple emergencies simultaneously.
The need for rapid triage and response is paramount in saving lives.
Combining Robotics and AI for Triage
Having a Cyberdog powered by OpenAI technology can potentially revolutionize emergency response by navigating hazardous areas inaccessible to humans, collecting crucial audio-visual data, and communicating with injured individuals through pre-recorded questions. In this tutorial we’re going to utilize OpenAI's Whisper API for audio transcription and ChatGPT for analysis to enable rapid assessment of victims' conditions, allowing for efficient prioritization of medical assistance.
Why the Cyberdog, and why not Spot
The Xiaomi Cyberdog 1.0 used in this project is a quadruped robot powered by NVIDIA’s Jetson Xavier AI platform. Key features include:
Price point of $2,500, significantly more affordable than alternatives like Boston Dynamics’ Spot ($75,000)
Maximum speed of 3.2 m/s
Multiple sensors including cameras, GPS, and an ultra-wide angle fisheye lens
Intel RealSense camera for depth perception
Compatibility with ROS 2 (Robot Operating System)
The Cyberdog’s balance of affordability and capabilities make it an ideal platform for research and development in AI-assisted emergency response. Boston Dynamic’s Spot is somewhere around $75,000, so the choice is easy for this research experiment.
Integration with OpenAI
Here’s the flow. First, we are going to use Whisper with Text-to-Speech to have the Cyberdog ask our first aid questions. Since the Cyberdog is literally a walking Unix box with lots of sensors, we will use it to collect our data, both visual and audio. Once we have our data, the audio from the patient will be transcribed from Speech-to-Text (opposite of before). Chat GPT will analyze the text to determine the person’s condition. Those who need immediate help will then be flagged for the operator.
Our pre-recorded questions aren’t recorded in the traditional sense. Instead we use OpenAI’s Text-to-Speech API, to turn a list of pre-generated questions stored on the Cyberdog into audio for the injured person to hear that we will later store on Cyberdog.
This simple Python script below shows how you can query the Whisper API with your text and have it respond with a finished audio file.
Getting data on and off the dog
The Cyberdog wasn’t built for this use, so we have to take advantage of the available ports. Out of the available HDMI, Charger, Extension for Sensors, and Download ports, we are going to use the Download port. We simply connect to it, then issue a simple Unix SCP command to copy over the necessary files. The SCP (Secure Copy) command is a tool used in Unix-like operating systems to securely transfer files between hosts on a network.
Next, to capture and stream video and audio from the Cyberdog, we set up a Real-Time Streaming Protocol (RTSP) server:
Setting up RTSP Feed: A Real-Time Streaming Protocol server is established on the Cyberdog to transmit video and audio data.
Data Capture: An application running on the operator’s laptop captures the RTSP feed, recording the audio for further analysis.
Transcription and Analysis: The recorded audio is transcribed using OpenAI’s Whisper API and then analyzed using ChatGPT to summarize the patient’s condition.
This integration allows the Cyberdog to function as an intelligent first responder, capable of gathering and analyzing critical information in emergency situations.
Cyberdog Questions
The system uses a set of pre-defined questions to assess the patient’s condition:
These questions are designed to quickly identify critical issues that may require immediate medical attention. The responses to these questions are analyzed by the AI to determine the severity of the patient’s condition and prioritize care.
Push Questions to Cyberdog
To enable the Cyberdog to communicate with injured individuals, pre-recorded questions are pushed to the robot’s onboard storage. This process involves using the OpenAI API to generate audio files of the questions, which are then transferred to the Cyberdog.
RTSP Streaming
To enable real-time video and audio streaming from the Cyberdog, a Real-Time Streaming Protocol (RTSP) server is set up on the robot. RTSP is a network protocol designed for use in entertainment and communications systems to control streaming media servers.
The following Python script sets up an RTSP server on the Cyberdog:
This script creates an RTSP server that streams video and audio from the Cyberdog’s camera and microphone. The stream can be accessed by connecting to the specified IP address and port. Note that you will have to change the IP address, Video Device, and Audio Device to your own. The script also allows multiple clients to connect to the stream simultaneously, making it suitable for real-time audio/video streaming applications. When you run, it prints the URL where the stream can be accessed.
Data Capture Overview
The data capture process involves recording the audio and video feed from the Cyberdog for further analysis. This is crucial for the AI system to process the responses from injured individuals and assess their condition.
Data Capture Steps
To play the recordings to the patient, we implemented a simple bash script to run.
All this is doing is taking our recorded questions and issuing them at 10 second intervals. Although in a perfect world you would have the dog wait for a user response, but to reduce the variables that could lead to failure, this method is very efficient.
So that covers the prompts to the patient. Now we have to capture the data from them.
The next immediate thing to do is check that you have FFmpeg installed. It’s going to be important for processing our audio and video files.
We’ll implement these steps in a Python script that runs on the operator’s laptop. The script will connect to the RTSP stream provided by the Cyberdog and records the incoming audio data. This is what it looks like:
In the above code, capture_audio_stream
function uses FFmpeg to capture the audio stream from the RTSP URL. It sets up FFmpeg to output raw audio data to stdout, which we'll process later. Then our start_recording
function starts the recording process. It captures audio data in chunks, converts it to a NumPy array, and writes it to a WAV file using the soundfile
library. The handle_exit_signal
ensures we have a graceful termination of the signal. Finally, The main()
function sets up signal handlers, starts the recording in a separate thread, and waits for the recording to finish or for a keyboard interrupt.
Transcribing the audio.
The next block of code we are going to add is a very basic call to the Whisper API with our new captured and processed audio.
As you can see, there’s not a lot happening. The script loads an audio file, transcribes it, and then uses the AI model to interpret the transcription. Boom! If everything goes as planned you should get a console output, similar to this:
f you want to troubleshoot and see what has been transcribed from the audio file, just uncomment out the #print(transcription.text)
line.
Conclusion
The integration of AI and robotics in emergency response situations, as demonstrated by the Cyberdog project, represents a significant advancement in first responder capabilities. By combining the mobility and resilience of the Cyberdog with the analytical power of ChatGPT, this system offers a unique solution to the challenges faced by emergency responders.
By following this guide, you've learned how to set up an RTSP server on the Cyberdog, capture audio data from a remote source, transcribe it using OpenAI's Whisper API, and analyze the content with GPT-3.5-turbo. This system you've built can potentially revolutionize how first responders gather critical information in emergency situations, making triage more efficient and potentially saving lives. You've combined cutting-edge technologies to create a solution that addresses real-world challenges faced by emergency responders.