====== 📥 Step 5: Download HTML Files from Google Drive to AWS Server ======
This guide shows how to download `.html` files from a specific Google Drive folder directly to your AWS server using a Python script and a service account.
===== 📁 Requirements =====
Before you start, make sure you have:
* Created a service account and downloaded its JSON key.
* Shared your Google Drive folder with the service account's email address.
* Installed the required Python packages in a virtual environment.
* Stored your `service-account.json` in a secure folder, such as:
`/home/ec2-user/credentials/service-account.json`
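Before moving on, it can help to confirm that the key file is where the script expects it and looks like a service account key. The following is a minimal sketch using only the standard library; the function name `check_service_account_key` is hypothetical, and the permission check assumes a Linux server where the key should be readable only by its owner.

```python
import json
import os
import stat


def check_service_account_key(path):
    """Return a list of human-readable problems found with the key file."""
    if not os.path.isfile(path):
        return [f"key file not found: {path}"]

    problems = []

    # Flag keys that group or other users can access (e.g. mode 644).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"permissions {oct(mode)} are too open; consider chmod 600")

    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        problems.append(f"cannot read key as JSON: {exc}")
        return problems

    if not isinstance(data, dict):
        problems.append("key file does not contain a JSON object")
        return problems

    # Fields that a downloaded service account key normally carries.
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    for field in ("client_email", "private_key"):
        if not data.get(field):
            problems.append(f'missing field "{field}"')
    return problems
```

Calling `check_service_account_key('/home/ec2-user/credentials/service-account.json')` returns an empty list when nothing looks wrong. The `client_email` value in the key is the address the Drive folder must be shared with.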
===== 📝 1. Create the Python Script =====
- Open a terminal on your AWS server and navigate to your working directory (e.g., your home directory):
cd ~
- Create the Python script:
nano download_html.py
- Paste the following code into the editor.
**Replace** `YOUR_HTML_FOLDER_ID` with your actual Google Drive folder ID.
import io
import os

from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

SERVICE_ACCOUNT_FILE = '/home/ec2-user/credentials/service-account.json'
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
HTML_FOLDER_ID = 'YOUR_HTML_FOLDER_ID'
# Relative path: resolved against the directory you run the script from.
DESTINATION_FOLDER = './downloaded_html_files'

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES
)
service = build('drive', 'v3', credentials=credentials)


def list_html_files(folder_id):
    """Return every non-trashed HTML file in the folder, following result pages."""
    query = f"'{folder_id}' in parents and mimeType='text/html' and trashed=false"
    files, page_token = [], None
    while True:
        # files().list returns one page at a time; keep going until no token is left.
        results = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
        ).execute()
        files.extend(results.get('files', []))
        page_token = results.get('nextPageToken')
        if not page_token:
            return files


def download_files_from_folder(folder_id, destination_folder):
    if not os.path.exists(destination_folder):
        os.makedirs(destination_folder)
        print(f"[+] Created folder: {destination_folder}")

    files = list_html_files(folder_id)
    if not files:
        print("[-] No HTML files found.")
        return

    for file in files:
        print(f"[~] Downloading {file['name']}")
        request = service.files().get_media(fileId=file['id'])
        file_path = os.path.join(destination_folder, file['name'])
        # Stream the file to disk in chunks and report progress after each chunk.
        with io.FileIO(file_path, 'wb') as fh:
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while not done:
                status, done = downloader.next_chunk()
                if status:
                    print(f" {int(status.progress() * 100)}% complete")
        print(f"[+] Downloaded: {file['name']}")


if __name__ == '__main__':
    download_files_from_folder(HTML_FOLDER_ID, DESTINATION_FOLDER)
- Save and exit:
* Press `Ctrl + O`, then `Enter` to save.
* Press `Ctrl + X` to exit the editor.
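The script builds each local path from the Drive file name as-is. Google Drive allows several files with the same name in one folder and allows `/` inside a name, so a later download could overwrite an earlier one or point outside the destination folder. As a possible refinement, the sketch below maps a Drive name to a safe, unused local path; `safe_local_path` and the `name (1).html` numbering scheme are illustrative assumptions, not part of the original script.

```python
import os


def safe_local_path(destination_folder, drive_name, used_names):
    """Map a Drive file name to a local path that is safe and not yet used.

    used_names is a set that the caller keeps across calls; it is updated here.
    """
    # Replace path separators so the name cannot escape the destination folder.
    name = drive_name.replace("/", "_").replace("\\", "_").strip() or "unnamed"
    if name in (".", ".."):
        name = "unnamed"

    # Append " (1)", " (2)", ... before the extension until the name is free.
    stem, ext = os.path.splitext(name)
    candidate, counter = name, 1
    while candidate in used_names:
        candidate = f"{stem} ({counter}){ext}"
        counter += 1

    used_names.add(candidate)
    return os.path.join(destination_folder, candidate)
```

To try it, create `used = set()` before the download loop and compute `file_path = safe_local_path(destination_folder, file['name'], used)` in place of the `os.path.join(...)` line.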
===== ▶️ 2. Run the Script =====
- Make sure your virtual environment is active:
source ~/gdrive-env/bin/activate
- Run the script:
python download_html.py
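If the script stops with an `ImportError`, the active environment is probably missing one of the packages it imports. The sketch below is one way to check this without running the download; `missing_modules` is a hypothetical helper, and it assumes the two imports used by the script (`google.oauth2`, normally provided by the `google-auth` package, and `googleapiclient`, normally provided by `google-api-python-client`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the subset of module_names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Running `missing_modules(["google.oauth2", "googleapiclient"])` inside the activated environment should return an empty list when both packages are installed.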
===== ✅ Result =====
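HTML files from the specified Google Drive folder (matched by the MIME type `text/html`, not by file extension) are saved to `downloaded_html_files/` inside the directory you ran the script from. If you ran it from your home directory as shown above, that is: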
~/downloaded_html_files/
You’ll see logs like:
[~] Downloading filename.html
100% complete
[+] Downloaded: filename.html
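The log lines only show that each transfer finished. To double-check what actually landed on disk, a short listing of file names and sizes can be useful. This is a minimal sketch; `summarize_downloads` and `empty_files` are hypothetical helper names, and treating a zero-byte file as suspicious is an assumption rather than a rule.

```python
import os


def summarize_downloads(folder):
    """Return (name, size_in_bytes) for each regular file in folder, sorted by name."""
    with os.scandir(folder) as entries:
        files = [(e.name, e.stat().st_size) for e in entries if e.is_file()]
    return sorted(files)


def empty_files(summary):
    """Return the names of files whose size is zero bytes."""
    return [name for name, size in summary if size == 0]
```

For example, `empty_files(summarize_downloads('./downloaded_html_files'))` lists any downloads worth re-checking.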
===== 🧼 Optional: Deactivate Virtual Environment =====
Once you’re done, leave the virtual environment:
deactivate