2022-09-28 23:03:07 +08:00
|
|
|
# Running Background Tasks
|
|
|
|
|
|
|
|
Related spaces: https://huggingface.co/spaces/freddyaboulton/gradio-google-forms
|
|
|
|
Tags: TASKS, SCHEDULED, TABULAR, DATA
|
|
|
|
|
|
|
|
## Introduction
|
|
|
|
|
|
|
|
This guide explains how you can run background tasks from your gradio app.
|
|
|
|
Background tasks are operations that you'd like to perform outside the request-response
|
|
|
|
lifecycle of your app either once or on a periodic schedule.
|
|
|
|
Examples of background tasks include periodically synchronizing data to an external database or
|
|
|
|
sending a report of model predictions via email.
|
|
|
|
|
|
|
|
## Overview
|
|
|
|
|
|
|
|
We will be creating a simple "Google-forms-style" application to gather feedback from users of the gradio library.
|
|
|
|
We will use a local sqlite database to store our data, but we will periodically synchronize the state of the database
|
|
|
|
with a [HuggingFace Dataset](https://huggingface.co/datasets) so that our user reviews are always backed up.
|
|
|
|
The synchronization will happen in a background task running every 60 seconds.
|
|
|
|
|
|
|
|
At the end of the demo, you'll have a fully working application like this one:
|
|
|
|
|
|
|
|
<gradio-app space="freddyaboulton/gradio-google-forms"> </gradio-app>
|
|
|
|
|
|
|
|
|
|
|
|
## Step 1 - Write your database logic 💾
|
|
|
|
Our application will store the name of the reviewer, their rating of gradio on a scale of 1 to 5, as well as
|
|
|
|
any comments they want to share about the library. Let's write some code that creates a database table to
|
|
|
|
store this data. We'll also write some functions to insert a review into that table and fetch the latest 10 reviews.
|
|
|
|
|
|
|
|
We're going to use the `sqlite3` library to connect to our sqlite database but gradio will work with any library.
|
|
|
|
|
|
|
|
The code will look like this:
|
|
|
|
|
|
|
|
```python
|
|
|
|
DB_FILE = "./reviews.db"
|
|
|
|
db = sqlite3.connect(DB_FILE)
|
|
|
|
|
|
|
|
# Create table if it doesn't already exist
|
|
|
|
try:
|
|
|
|
db.execute("SELECT * FROM reviews").fetchall()
|
|
|
|
db.close()
|
|
|
|
except sqlite3.OperationalError:
|
|
|
|
db.execute(
|
|
|
|
'''
|
|
|
|
CREATE TABLE reviews (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
|
|
|
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
|
|
|
|
name TEXT, review INTEGER, comments TEXT)
|
|
|
|
''')
|
|
|
|
db.commit()
|
|
|
|
db.close()
|
|
|
|
|
|
|
|
def get_latest_reviews(db: sqlite3.Connection):
|
|
|
|
reviews = db.execute("SELECT * FROM reviews ORDER BY id DESC limit 10").fetchall()
|
|
|
|
total_reviews = db.execute("Select COUNT(id) from reviews").fetchone()[0]
|
|
|
|
reviews = pd.DataFrame(reviews, columns=["id", "date_created", "name", "review", "comments"])
|
|
|
|
return reviews, total_reviews
|
|
|
|
|
|
|
|
|
|
|
|
def add_review(name: str, review: int, comments: str):
|
|
|
|
db = sqlite3.connect(DB_FILE)
|
|
|
|
cursor = db.cursor()
|
|
|
|
cursor.execute("INSERT INTO reviews(name, review, comments) VALUES(?,?,?)", [name, review, comments])
|
|
|
|
db.commit()
|
|
|
|
reviews, total_reviews = get_latest_reviews(db)
|
|
|
|
db.close()
|
|
|
|
return reviews, total_reviews
|
|
|
|
```
|
|
|
|
|
|
|
|
Let's also write a function to load the latest reviews when the gradio application loads:
|
|
|
|
```python
|
|
|
|
def load_data():
|
|
|
|
db = sqlite3.connect(DB_FILE)
|
|
|
|
reviews, total_reviews = get_latest_reviews(db)
|
|
|
|
db.close()
|
|
|
|
return reviews, total_reviews
|
|
|
|
```
|
|
|
|
|
|
|
|
## Step 2 - Create a gradio app ⚡
|
|
|
|
Now that we have our database logic defined, we can use gradio create a dynamic web page to ask our users for feedback!
|
|
|
|
|
|
|
|
```python
|
|
|
|
with gr.Blocks() as demo:
|
|
|
|
with gr.Row():
|
|
|
|
with gr.Column():
|
|
|
|
name = gr.Textbox(label="Name", placeholder="What is your name?")
|
|
|
|
review = gr.Radio(label="How satisfied are you with using gradio?", choices=[1, 2, 3, 4, 5])
|
|
|
|
comments = gr.Textbox(label="Comments", lines=10, placeholder="Do you have any feedback on gradio?")
|
|
|
|
submit = gr.Button(value="Submit Feedback")
|
|
|
|
with gr.Column():
|
|
|
|
data = gr.Dataframe(label="Most recently created 10 rows")
|
|
|
|
count = gr.Number(label="Total number of reviews")
|
|
|
|
submit.click(add_review, [name, review, comments], [data, count])
|
|
|
|
demo.load(load_data, None, [data, count])
|
|
|
|
```
|
|
|
|
|
|
|
|
## Step 3 - Synchronize with HuggingFace Datasets 🤗
|
|
|
|
|
|
|
|
We could call `demo.launch()` after step 2 and have a fully functioning application. However,
|
|
|
|
our data would be stored locally on our machine. If the sqlite file were accidentally deleted, we'd lose all of our reviews!
|
|
|
|
Let's back up our data to a dataset on the HuggingFace hub.
|
|
|
|
|
|
|
|
Create a dataset [here](https://huggingface.co/datasets) before proceeding.
|
|
|
|
|
|
|
|
Now at the **top** of our script, we'll use the [huggingface hub client library](https://huggingface.co/docs/huggingface_hub/index)
|
|
|
|
to connect to our dataset and pull the latest backup.
|
|
|
|
|
|
|
|
```python
|
|
|
|
TOKEN = os.environ.get('HUB_TOKEN')
|
|
|
|
repo = huggingface_hub.Repository(
|
|
|
|
local_dir="data",
|
|
|
|
repo_type="dataset",
|
|
|
|
clone_from="<name-of-your-dataset>",
|
|
|
|
use_auth_token=TOKEN
|
|
|
|
)
|
|
|
|
repo.git_pull()
|
|
|
|
|
|
|
|
shutil.copyfile("./data/reviews.db", DB_FILE)
|
|
|
|
```
|
|
|
|
|
|
|
|
Note that you'll have to get an access token from the "Settings" tab of your HuggingFace for the above code to work.
|
|
|
|
In the script, the token is securely accessed via an environment variable.
|
|
|
|
|
2023-07-04 21:33:22 +08:00
|
|
|
![access_token](https://github.com/gradio-app/gradio/blob/main/guides/assets/access_token.png?raw=true)
|
2022-09-28 23:03:07 +08:00
|
|
|
|
|
|
|
Now we will create a background task to synch our local database to the dataset hub every 60 seconds.
|
|
|
|
We will use the [AdvancedPythonScheduler](https://apscheduler.readthedocs.io/en/3.x/) to handle the scheduling.
|
|
|
|
However, this is not the only task scheduling library available. Feel free to use whatever you are comfortable with.
|
|
|
|
|
|
|
|
The function to back up our data will look like this:
|
|
|
|
|
|
|
|
```python
|
|
|
|
from apscheduler.schedulers.background import BackgroundScheduler
|
|
|
|
|
|
|
|
def backup_db():
|
|
|
|
shutil.copyfile(DB_FILE, "./data/reviews.db")
|
|
|
|
db = sqlite3.connect(DB_FILE)
|
|
|
|
reviews = db.execute("SELECT * FROM reviews").fetchall()
|
|
|
|
pd.DataFrame(reviews).to_csv("./data/reviews.csv", index=False)
|
|
|
|
print("updating db")
|
|
|
|
repo.push_to_hub(blocking=False, commit_message=f"Updating data at {datetime.datetime.now()}")
|
|
|
|
|
|
|
|
|
|
|
|
scheduler = BackgroundScheduler()
|
|
|
|
scheduler.add_job(func=backup_db, trigger="interval", seconds=60)
|
|
|
|
scheduler.start()
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
## Step 4 (Bonus) - Deployment to HuggingFace Spaces
|
|
|
|
You can use the HuggingFace [Spaces](https://huggingface.co/spaces) platform to deploy this application for free ✨
|
|
|
|
|
|
|
|
If you haven't used Spaces before, follow the previous guide [here](/using_hugging_face_integrations).
|
|
|
|
You will have to use the `HUB_TOKEN` environment variable as a secret in the Guides.
|
|
|
|
|
|
|
|
## Conclusion
|
|
|
|
Congratulations! You know how to run background tasks from your gradio app on a schedule ⏲️.
|
|
|
|
|
|
|
|
Checkout the application running on Spaces [here](https://huggingface.co/spaces/freddyaboulton/gradio-google-forms).
|
|
|
|
The complete code is [here](https://huggingface.co/spaces/freddyaboulton/gradio-google-forms/blob/main/app.py)
|