# Real Time Object Detection from a Webcam Stream with WebRTC

Tags: VISION, STREAMING, WEBCAM

In this guide, we'll use YOLOv10 to perform real-time object detection in Gradio from a user's webcam feed. We'll utilize the latest streaming features introduced in Gradio 5.0. You can see the finished product in action below:

<video src="https://github.com/user-attachments/assets/4584cec6-8c1a-401b-9b61-a4fe0718b558" controls height="600" width="600" style="display: block; margin: auto;" autoplay="true" loop="true"></video>

## Setting up

Start by installing all the dependencies. Add the following lines to a `requirements.txt` file and run `pip install -r requirements.txt`:

```bash
opencv-python
twilio
gradio>=5.0
gradio-webrtc
onnxruntime-gpu
```

We'll use the ONNX runtime to speed up YOLOv10 inference. This guide assumes you have access to a GPU. If you don't, change `onnxruntime-gpu` to `onnxruntime`. Without a GPU, the model will run slower, resulting in a laggy demo.
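To check which execution providers your ONNX Runtime installation actually exposes (and thus whether the GPU path is available), you can run a quick sanity check with `get_available_providers`, which is part of the onnxruntime Python API:

```python
# Quick sanity check: list the execution providers ONNX Runtime can use.
# With onnxruntime-gpu and a working CUDA setup, 'CUDAExecutionProvider'
# should appear in the list; otherwise inference falls back to the CPU.
try:
    import onnxruntime as ort
    providers = ort.get_available_providers()
except ImportError:  # onnxruntime not installed yet
    providers = []

print(providers)
```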

We'll use OpenCV for image manipulation and the [Gradio WebRTC](https://github.com/freddyaboulton/gradio-webrtc) custom component to use [WebRTC](https://webrtc.org/) under the hood, achieving near-zero latency.

**Note**: If you want to deploy this app on any cloud provider, you'll need to use the free Twilio API for their [TURN servers](https://www.twilio.com/docs/stun-turn). Create a free account on Twilio. If you're not familiar with TURN servers, consult this [guide](https://www.twilio.com/docs/stun-turn/faq#faq-what-is-nat).

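If you do deploy to a cloud provider, the TURN configuration can be built from Twilio's Network Traversal Service. Below is a sketch that reads credentials from environment variables; the exact shape of the `rtc_configuration` dictionary is an assumption based on the component's documentation, so adapt it as needed:

```python
import os

# Falls back to None (fine for local development) when no Twilio
# credentials are set in the environment.
rtc_configuration = None

account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
auth_token = os.environ.get("TWILIO_AUTH_TOKEN")

if account_sid and auth_token:
    from twilio.rest import Client

    client = Client(account_sid, auth_token)
    token = client.tokens.create()  # Network Traversal Service token
    rtc_configuration = {
        "iceServers": token.ice_servers,
        "iceTransportPolicy": "relay",
    }
```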
## The Inference Function

We'll download the YOLOv10 model from the Hugging Face Hub and instantiate a custom inference class to use this model.

The implementation of the inference class isn't covered in this guide, but you can find the source code [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n/blob/main/inference.py#L9) if you're interested. This implementation borrows heavily from this [GitHub repository](https://github.com/ibaiGorordo/ONNX-YOLOv8-Object-Detection).

We're using the `yolov10-n` variant because it has the lowest latency. See the [Performance](https://github.com/THU-MIG/yolov10?tab=readme-ov-file#performance) section of the README in the YOLOv10 GitHub repository for a comparison of the model variants.

```python
import cv2
from huggingface_hub import hf_hub_download

from inference import YOLOv10

model_file = hf_hub_download(
    repo_id="onnx-community/yolov10n", filename="onnx/model.onnx"
)

model = YOLOv10(model_file)


def detection(image, conf_threshold=0.3):
    # Resize the webcam frame to the model's expected input size
    image = cv2.resize(image, (model.input_width, model.input_height))
    new_image = model.detect_objects(image, conf_threshold)
    return new_image
```

Our inference function, `detection`, accepts a numpy array from the webcam and a desired confidence threshold. Object detection models like YOLO identify many objects and assign a confidence score to each. The lower the confidence, the higher the chance of a false positive. We'll let users adjust the confidence threshold.

The function returns a numpy array corresponding to the same input image with all detected objects drawn in bounding boxes.

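To build intuition for what the confidence slider does, here's a toy filter over made-up `(label, score)` pairs. This is not YOLOv10's actual output format, just an illustration of thresholding:

```python
# Hypothetical detections as (label, confidence) pairs; real YOLO output
# also includes bounding-box coordinates.
detections = [("person", 0.92), ("dog", 0.41), ("kite", 0.18)]

def filter_by_confidence(detections, conf_threshold=0.3):
    # Keep only predictions at or above the threshold; low-confidence
    # predictions are more likely to be false positives.
    return [d for d in detections if d[1] >= conf_threshold]

print(filter_by_confidence(detections, conf_threshold=0.3))
# → [('person', 0.92), ('dog', 0.41)]
```

Raising the slider toward 1.0 trims uncertain predictions; lowering it shows more boxes at the cost of more false positives.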
## The Gradio Demo

The Gradio demo is straightforward, but we'll implement a few specific features:

1. Use the `WebRTC` custom component to ensure input and output are sent to/from the server with WebRTC.
2. The [WebRTC](https://github.com/freddyaboulton/gradio-webrtc) component will serve as both an input and output component.
3. Utilize the `time_limit` parameter of the `stream` event. This parameter sets a processing time for each user's stream. In a multi-user setting, such as on Spaces, we'll stop processing the current user's stream after this period and move on to the next.

We'll also apply custom CSS to center the webcam and slider on the page.

```python
import gradio as gr
from gradio_webrtc import WebRTC

css = """.my-group {max-width: 600px !important; max-height: 600px !important;}
.my-column {display: flex !important; justify-content: center !important; align-items: center !important;}"""

# None works for local development; when deploying, supply a TURN
# configuration (e.g. built from your Twilio credentials, see "Setting up").
rtc_configuration = None

with gr.Blocks(css=css) as demo:
    gr.HTML(
        """
    <h1 style='text-align: center'>
    YOLOv10 Webcam Stream (Powered by WebRTC ⚡️)
    </h1>
    """
    )
    with gr.Column(elem_classes=["my-column"]):
        with gr.Group(elem_classes=["my-group"]):
            image = WebRTC(label="Stream", rtc_configuration=rtc_configuration)
            conf_threshold = gr.Slider(
                label="Confidence Threshold",
                minimum=0.0,
                maximum=1.0,
                step=0.05,
                value=0.30,
            )

        image.stream(
            fn=detection, inputs=[image, conf_threshold], outputs=[image], time_limit=10
        )

if __name__ == "__main__":
    demo.launch()
```

## Conclusion

Our app is hosted on Hugging Face Spaces [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n).

You can use this app as a starting point to build real-time image applications with Gradio. Don't hesitate to open issues in the space or in the [WebRTC component GitHub repo](https://github.com/freddyaboulton/gradio-webrtc) if you have any questions or encounter problems.