
📥 Batched PDF to HTML Conversion (Google Apps Script)

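This guide shows how to convert the PDFs in a Google Drive folder to HTML files in batches with Google Apps Script. A single Apps Script execution is limited to six minutes, so converting a large folder in one run can time out. The script avoids this by recording which files it has already converted and processing only a few new ones per run.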
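The resume logic boils down to one rule: walk the PDFs in order, skip any already marked as done, and stop once the batch is full. The sketch below isolates that rule as a pure function so it can be tried outside Drive. `pickNextBatch` is a hypothetical name, not part of the script; `processed` is assumed to have the same shape as the object returned by `PropertiesService.getScriptProperties().getProperties()` (file ID mapped to `'done'`).

```javascript
/**
 * Hypothetical helper illustrating the batching rule.
 * allFileIds: file IDs in the order the folder iterator returns them.
 * processed:  object whose keys are IDs converted on earlier runs.
 * Returns at most batchSize IDs that still need converting.
 * Skipped IDs do not count toward the batch.
 */
function pickNextBatch(allFileIds, processed, batchSize) {
  const batch = [];
  for (const id of allFileIds) {
    if (batch.length >= batchSize) break; // batch is full, stop this run
    if (processed[id]) continue;          // converted on an earlier run
    batch.push(id);
  }
  return batch;
}

// Five PDFs, two already converted, batch size 2.
const ids = ['a', 'b', 'c', 'd', 'e'];
const done = { a: 'done', c: 'done' };
console.log(JSON.stringify(pickNextBatch(ids, done, 2))); // ["b","d"]
```

Each run picks up where the previous one stopped, because the previous run's conversions are now in `processed`. Once every ID is marked, the function returns an empty array.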

📁 Folder Setup

1. Place all the PDFs you want to convert in a Google Drive folder.
2. Make sure the Google account that runs the Apps Script project can access this folder (it owns the folder or the folder is shared with it).
3. The script creates a new folder named after the original with an `_html` suffix (e.g., `MyFolder_html`) next to the original folder and stores the converted `.html` files there.
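The naming convention can be expressed as two small functions. This is a sketch: `outputFolderName` and `htmlFileNameFor` are hypothetical helper names. Anchoring the regex to the end of the name means `Report.PDF` is handled and a `.pdf` appearing earlier in a name is left untouched, which a plain string `replace('.pdf', '.html')` would not do.

```javascript
// Hypothetical helpers mirroring the naming convention described above.

// Output folder: source folder name plus "_html".
function outputFolderName(sourceFolderName) {
  return sourceFolderName + '_html';
}

// Output file: the PDF's name with a trailing ".pdf" (any letter case)
// removed, then ".html" appended.
function htmlFileNameFor(pdfName) {
  return pdfName.replace(/\.pdf$/i, '') + '.html';
}

console.log(outputFolderName('MyFolder'));        // MyFolder_html
console.log(htmlFileNameFor('Report.PDF'));       // Report.html
console.log(htmlFileNameFor('a.pdf.notes.pdf'));  // a.pdf.notes.html
```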

🛠️ Script: Batched PDF Conversion with Resume Support

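function convertPdfsToHtml_Batched() {
  const BATCH_SIZE = 5;  // Maximum number of PDFs to convert per run
  const originalFolderId = 'YOUR_FOLDER_ID_HERE';
  const originalFolder = DriveApp.getFolderById(originalFolderId);
  const newFolderName = originalFolder.getName() + '_html';

  // Find or create the output folder next to the source folder
  // (DriveApp.createFolder would put it in the root of My Drive instead).
  const parents = originalFolder.getParents();
  const parentFolder = parents.hasNext() ? parents.next() : DriveApp.getRootFolder();
  const existing = parentFolder.getFoldersByName(newFolderName);
  const newFolder = existing.hasNext() ? existing.next() : parentFolder.createFolder(newFolderName);
  Logger.log('Using folder: ' + newFolder.getName());

  // Script properties map file ID -> 'done' for PDFs converted on earlier runs.
  const scriptProps = PropertiesService.getScriptProperties();
  const processed = scriptProps.getProperties();
  const files = originalFolder.getFilesByType(MimeType.PDF);
  let processedCount = 0;

  while (files.hasNext() && processedCount < BATCH_SIZE) {
    const file = files.next();
    const fileId = file.getId();

    if (processed[fileId]) {
      Logger.log('Skipping already processed: ' + file.getName());
      continue;
    }

    let tempDocId = null;
    try {
      Logger.log('Processing PDF: ' + file.getName());

      // Upload a copy of the PDF as a temporary Google Doc
      // (Drive API v2 advanced service).
      const convertedFile = Drive.Files.insert({
        title: file.getName(),
        mimeType: MimeType.GOOGLE_DOCS
      }, file.getBlob());
      tempDocId = convertedFile.id;

      // Export the temporary Doc as HTML; getBody().getText() would
      // only return plain text without any markup.
      const exportUrl = 'https://docs.google.com/document/d/' + tempDocId + '/export?format=html';
      const html = UrlFetchApp.fetch(exportUrl, {
        headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() }
      }).getContentText();

      // "Report.pdf" or "Report.PDF" -> "Report.html"
      const htmlFileName = file.getName().replace(/\.pdf$/i, '') + '.html';
      newFolder.createFile(htmlFileName, html, MimeType.HTML);
      Logger.log('✅ Converted and created: ' + htmlFileName);

      // Mark this PDF as done so later runs skip it.
      scriptProps.setProperty(fileId, 'done');
      processedCount++;

    } catch (e) {
      Logger.log('Error converting: ' + file.getName() + ' – ' + e.message);
    } finally {
      // Trash the temporary Google Doc even if the export failed.
      if (tempDocId) {
        DriveApp.getFileById(tempDocId).setTrashed(true);
      }
    }
  }

  Logger.log(`Batch complete: ${processedCount} file(s) processed`);
}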

▶️ How to Use

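1. Go to https://script.google.com and open or create a project.
2. Paste the code into a `.gs` file.
3. Enable the Drive API advanced service: click *+* next to *Services* in the editor, choose *Drive API*, and select version v2 (the script calls `Drive.Files.insert`, which exists only in v2).
4. Replace `'YOUR_FOLDER_ID_HERE'` with your folder's ID, the string after `/folders/` in the folder's Drive URL.
5. Run `convertPdfsToHtml_Batched` and grant the requested permissions on the first run.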
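If you copy the folder link from Drive, the ID can be pulled out of it with a small helper. This is a sketch under the assumption that the link contains a `/folders/<ID>` segment, as links of the form `https://drive.google.com/drive/folders/<ID>` do; `folderIdFromUrl` is a hypothetical name and the sample IDs are made up.

```javascript
// Hypothetical helper: extract the folder ID from a Drive folder URL.
// Drive IDs consist of letters, digits, "-" and "_", so the match stops
// at "?", "/" or the end of the string. Returns null if no
// "/folders/<ID>" segment is present.
function folderIdFromUrl(url) {
  const match = url.match(/\/folders\/([A-Za-z0-9_-]+)/);
  return match ? match[1] : null;
}

console.log(folderIdFromUrl('https://drive.google.com/drive/folders/1AbC_d-EF?usp=sharing')); // 1AbC_d-EF
console.log(folderIdFromUrl('https://drive.google.com/drive/u/0/folders/XYZ123'));           // XYZ123
```

The returned string is what goes in place of `'YOUR_FOLDER_ID_HERE'`.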

🔁 Optional: Set Up a Trigger for Automation

- Go to *Triggers* in the Apps Script editor.
- Click "Add Trigger".
- Choose:

  1. Function: `convertPdfsToHtml_Batched`
  2. Event source: *Time-driven* → *Minutes timer* → *Every 5 minutes*

- Save.

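The trigger then runs a new batch every five minutes until all PDFs are converted. It keeps firing after that (each run simply skips every file), so delete it once a run logs `Batch complete: 0 file(s) processed` without any errors.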
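The same trigger can also be managed from code. The sketch below uses the standard `ScriptApp` trigger builder; `installBatchTrigger`, `removeBatchTriggers` and `HANDLER` are hypothetical names, and it assumes the batch function keeps the name `convertPdfsToHtml_Batched`.

```javascript
// Sketch: create or remove the every-5-minutes trigger from code.
// Assumes the batch function is named convertPdfsToHtml_Batched.
const HANDLER = 'convertPdfsToHtml_Batched';

function installBatchTrigger() {
  // Remove any existing trigger for the handler first,
  // so running this twice does not create duplicates.
  removeBatchTriggers();
  ScriptApp.newTrigger(HANDLER)
    .timeBased()
    .everyMinutes(5)
    .create();
}

function removeBatchTriggers() {
  // Delete only the triggers that call the batch function;
  // other triggers in the project are left alone.
  for (const trigger of ScriptApp.getProjectTriggers()) {
    if (trigger.getHandlerFunction() === HANDLER) {
      ScriptApp.deleteTrigger(trigger);
    }
  }
}
```

Run `installBatchTrigger` once to start the automation, and `removeBatchTriggers` once the conversion is finished.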

✅ Notes

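- The script avoids reprocessing by storing the ID of each converted PDF in the script properties (`PropertiesService`). To convert everything again, delete those entries under *Project Settings* → *Script Properties*.
- You can increase or decrease `BATCH_SIZE` to suit your needs; keep each run comfortably within the six-minute execution limit.
- A PDF that fails to convert is not marked as done, so it is retried on the next run.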
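Clearing the progress markers can also be done from code. This sketch assumes, as the batch script does, that each converted PDF is stored as a script property whose key is the file ID and whose value is `'done'`; it deletes only those entries. `resetConversionProgress` is a hypothetical name, and any unrelated property that happens to have the value `'done'` would also be removed.

```javascript
// Sketch: forget which PDFs were converted so the next run starts over.
// Deletes only script properties whose value is 'done'; other script
// properties are kept. Returns the number of markers removed.
function resetConversionProgress() {
  const props = PropertiesService.getScriptProperties();
  const all = props.getProperties(); // snapshot of key -> value
  let removed = 0;
  for (const key of Object.keys(all)) {
    if (all[key] === 'done') {
      props.deleteProperty(key);
      removed++;
    }
  }
  Logger.log('Cleared ' + removed + ' processed marker(s)');
  return removed;
}
```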

googleconvert.txt · Last modified: 2025/03/23 05:44 by lwattsii