====== 📥 Batched PDF to HTML Conversion (Google Apps Script) ======
This guide shows how to convert the PDFs in a Google Drive folder to HTML files in **batches** using Google Apps Script. It works around the Apps Script execution-time limit by recording which files have already been processed and converting only a few per run.
===== 📁 Folder Setup =====
1. Place all the PDFs you want to convert in a Google Drive folder.
2. Make sure the Google account that runs the Apps Script project has access to this folder (share the folder with it if it belongs to a different account).
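3. The script creates (or reuses) a folder named after the original with an `_html` suffix (e.g., `MyFolder_html`) and stores the converted `.html` files there. As written, a new folder is created in the root of your My Drive, not beside the original folder.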
===== 🛠️ Script: Batched PDF Conversion with Resume Support =====
  // Requires the advanced Drive service (Drive API v2) to be enabled in the project.
  function convertPdfsToHtml_Batched() {
    const BATCH_SIZE = 5; // Convert at most 5 files per run
    const originalFolderId = 'YOUR_FOLDER_ID_HERE';
    const originalFolder = DriveApp.getFolderById(originalFolderId);

    // Reuse an existing "<name>_html" folder, or create one in the Drive root.
    const newFolderName = originalFolder.getName() + '_html';
    const folders = DriveApp.getFoldersByName(newFolderName);
    const newFolder = folders.hasNext() ? folders.next() : DriveApp.createFolder(newFolderName);
    Logger.log('Using folder: ' + newFolder.getName());

    // Script properties map the ID of each finished file to 'done'.
    const scriptProperties = PropertiesService.getScriptProperties();
    const processed = scriptProperties.getProperties();
    const files = originalFolder.getFilesByType(MimeType.PDF);
    let processedCount = 0;

    while (files.hasNext() && processedCount < BATCH_SIZE) {
      const file = files.next();
      const fileId = file.getId();

      if (processed[fileId]) {
        Logger.log('Skipping already processed: ' + file.getName());
        continue;
      }

      try {
        Logger.log('Processing PDF: ' + file.getName());
        const pdfBlob = file.getBlob();

        // Upload the PDF as a temporary Google Doc so that Drive converts it.
        const convertedFile = Drive.Files.insert({
          title: file.getName(),
          mimeType: MimeType.GOOGLE_DOCS
        }, pdfBlob);

        // getText() returns plain text only; formatting and images are dropped.
        const doc = DocumentApp.openById(convertedFile.id);
        const docContent = doc.getBody().getText();

        // Swap only a trailing ".pdf" (any letter case) for ".html".
        const htmlFileName = file.getName().replace(/\.pdf$/i, '.html');
        newFolder.createFile(htmlFileName, docContent, MimeType.HTML);
        Logger.log('✅ Converted and created: ' + htmlFileName);

        // Trash the temporary Google Doc and record the PDF as done.
        DriveApp.getFileById(convertedFile.id).setTrashed(true);
        scriptProperties.setProperty(fileId, 'done');
        processedCount++;
      } catch (e) {
        // Failed files are not marked as done, so the next run retries them.
        Logger.log('Error converting: ' + file.getName() + ': ' + e.message);
      }
    }

    Logger.log(`Batch complete: ${processedCount} file(s) processed`);
  }
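Because `getBody().getText()` returns plain text, the file written by the script contains no markup even though it has an `.html` extension. The following is a minimal sketch of one way to wrap that text in a basic HTML page. `escapeHtml` and `textToHtml` are hypothetical helper names, not part of Apps Script, and treating each non-empty line as one paragraph is an assumption about how the extracted text is laid out.

```javascript
// Hypothetical helpers (not part of Apps Script): plain JavaScript only.

// Escape the characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wrap plain text in a minimal HTML document.
// Assumption: each non-empty line of the text becomes one <p> element.
function textToHtml(title, text) {
  const paragraphs = text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => '<p>' + escapeHtml(line) + '</p>');

  return '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8"><title>' +
    escapeHtml(title) + '</title></head>\n<body>\n' +
    paragraphs.join('\n') + '\n</body>\n</html>';
}
```

To try it in the script, one option is to pass `textToHtml(htmlFileName, docContent)` to `newFolder.createFile` in place of `docContent`.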
===== ▶️ How to Use =====
1. Go to https://script.google.com and open or create a project.
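2. Paste the code into a new `.gs` file.
3. In the editor, add the advanced **Drive API** service under *Services* and select version v2. The script calls `Drive.Files.insert`, which is only available once this service is enabled.
4. Replace `'YOUR_FOLDER_ID_HERE'` with your actual Drive folder ID.
5. Run `convertPdfsToHtml_Batched` and approve the authorization prompt on the first run.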
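The folder ID is the last path segment of the folder's URL when it is open in the browser. As a small sketch, assuming the URL has the usual `https://drive.google.com/drive/folders/<ID>` shape, the ID can be pulled out like this. `extractFolderId` is a hypothetical helper name, not an Apps Script function.

```javascript
// Hypothetical helper: extract the folder ID from a Drive folder URL.
// Assumption: the URL contains a "/folders/<ID>" path segment.
// Returns null when no such segment is found.
function extractFolderId(url) {
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(url);
  return match ? match[1] : null;
}
```

The returned string is what replaces `'YOUR_FOLDER_ID_HERE'` in the script.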
===== 🔁 Optional: Set Up Trigger for Automation =====
- Go to *Triggers* in Apps Script
- Click "Add Trigger"
- Choose:
  - Function: `convertPdfsToHtml_Batched`
  - Event: *Time-driven* → *Every 5 minutes*
- Save
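The trigger starts a new batch every 5 minutes, so the conversion continues across runs until every PDF has been processed. The trigger does not stop on its own: once the log reports `0 file(s) processed` with no errors, delete it from the *Triggers* page.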
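The same trigger can also be created from code instead of through the UI. The sketch below uses the `ScriptApp` trigger builder. `installBatchTrigger` and `removeBatchTrigger` are hypothetical function names chosen for this example, and the 5-minute interval simply mirrors the setting above.

```javascript
// Name of the function the trigger should call.
const HANDLER = 'convertPdfsToHtml_Batched';

// Delete every project trigger that calls the batch function.
function removeBatchTrigger() {
  ScriptApp.getProjectTriggers()
    .filter(trigger => trigger.getHandlerFunction() === HANDLER)
    .forEach(trigger => ScriptApp.deleteTrigger(trigger));
}

// Create a time-driven trigger that runs the batch function every 5 minutes.
// Any existing trigger for the same function is removed first to avoid duplicates.
function installBatchTrigger() {
  removeBatchTrigger();
  ScriptApp.newTrigger(HANDLER)
    .timeBased()
    .everyMinutes(5)
    .create();
}
```

Run `installBatchTrigger` once from the editor to start the schedule, and `removeBatchTrigger` when all files have been converted.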
===== ✅ Notes =====
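- The script avoids reprocessing by storing the ID of each completed file as a script property via `PropertiesService`. A file that fails to convert is not recorded and is retried on the next run.
- You can increase or decrease `BATCH_SIZE`, as long as a single run still finishes within the Apps Script execution-time limit.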
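Since completed files are remembered in script properties, converting the same PDFs again means clearing those entries first. A minimal sketch follows. `resetProgress` is a hypothetical function name, and the code assumes that any script property with the value `'done'` was written by the conversion script.

```javascript
// Hypothetical helper: forget which files were already converted.
// Only properties whose value is 'done' are removed, so unrelated
// script properties are left alone. Returns the number removed.
function resetProgress() {
  const props = PropertiesService.getScriptProperties();
  const all = props.getProperties();
  let removed = 0;

  Object.keys(all).forEach(key => {
    if (all[key] === 'done') {
      props.deleteProperty(key);
      removed++;
    }
  });

  return removed;
}
```

After running `resetProgress`, the next batch run treats every PDF in the folder as unprocessed.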