Automating Legal Translation Workflows with AI-Powered Document Processing
- Client: DFDL (Internal Tool / Proof of Concept)
- Role: Designer & Developer
- Focus: AI, Document Automation, Translation Workflows
- Tech Stack: JavaScript, Tesseract.js, PDF.js, GPT-4 (via OpenAI), Netlify Functions, docx
🎯 Project Overview
Legal and compliance teams often receive government publications, announcements, and legal updates either as digital or local-language PDFs. These documents cannot be easily searched or copied and pasted, so legal staff have to translate them manually. That takes time away from billable work, which is rightly the main priority of such teams.
This repetitive, manual process was ripe for automation.
⚙️ The Solution
I designed and developed a browser-based tool that extracts, translates, and summarizes legal PDFs, all in one click. Whilst it's not perfect (largely due to the limited ability of LLMs to translate lesser-known languages such as Khmer), the proof of concept saves time and allows non-speakers of the local language to get high-quality summaries and basic translations without human involvement.
Core Features:
- OCR-based Text Extraction: Uses Tesseract.js to read scanned Khmer PDFs in-browser
- GPT-4 Translation: Seamless, chunked translation into fluent, legal-grade English using OpenAI’s GPT models
- Smart Layout Preservation: Extracted paragraphs are intelligently grouped and styled in the final Word doc
- Summarization Mode: Automatically distills long translated documents into concise executive summaries
- Serverless Architecture: Powered by Netlify Functions — no backend server needed
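To illustrate the serverless piece, here is a minimal sketch of what a translation function could look like. It is not the tool's actual code: the prompt wording, the `paragraphs` request field, the `OPENAI_API_KEY` environment variable name, and the `buildTranslationRequest` helper are all assumptions made for the example. It relies on the global `fetch` available in Node 18+.

```javascript
// Hypothetical Netlify Function that forwards one batch of paragraphs to OpenAI.
const OPENAI_URL = 'https://api.openai.com/v1/chat/completions';

// Builds the Chat Completions request body for one batch (prompt is illustrative).
function buildTranslationRequest(paragraphs, model = 'gpt-4') {
  return {
    model,
    temperature: 0,
    messages: [
      {
        role: 'system',
        content:
          'Translate the Khmer legal text into formal English. ' +
          'Return one translated paragraph per input paragraph, separated by blank lines.',
      },
      { role: 'user', content: paragraphs.join('\n\n') },
    ],
  };
}

async function handler(event) {
  if (event.httpMethod !== 'POST') {
    return { statusCode: 405, body: 'Method Not Allowed' };
  }

  // Assumed request shape: { "paragraphs": ["...", "..."] }
  const { paragraphs } = JSON.parse(event.body || '{}');
  if (!Array.isArray(paragraphs) || paragraphs.length === 0) {
    return { statusCode: 400, body: JSON.stringify({ error: 'No paragraphs provided' }) };
  }

  const response = await fetch(OPENAI_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // The key stays on the server side and is never shipped to the browser.
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(buildTranslationRequest(paragraphs)),
  });

  if (!response.ok) {
    return { statusCode: 502, body: JSON.stringify({ error: 'Translation request failed' }) };
  }

  const data = await response.json();
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ translation: data.choices[0].message.content }),
  };
}

// In a Netlify function file this would be exported, e.g. `exports.handler = handler;`
```

Keeping the request builder separate from the handler makes the prompt easy to adjust and test without calling the API.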
🧠 How It Works
User Uploads PDFs
- “Upload and Translate” → returns a polished .docx in English
- “Create AI Summary” → returns a structured executive summary
OCR + Grouping
- PDFs are rendered to canvas
- Tesseract performs Khmer + English OCR
- Lines are grouped into logical paragraphs and detected as body text or headings
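The grouping step can be sketched roughly as follows. This is an illustrative heuristic, not the tool's actual logic: it assumes each OCR line arrives in reading order as `{ text, bbox: { x0, y0, x1, y1 } }` (the shape Tesseract.js uses for line bounding boxes), starts a new paragraph when the vertical gap is large, and treats unusually tall lines as headings. The `gapRatio` and `headingRatio` thresholds are made-up defaults.

```javascript
// Median of a list of numbers; used as the "typical" line height on a page.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Groups OCR lines into blocks of type 'heading' or 'body'.
function groupLines(lines, { gapRatio = 0.8, headingRatio = 1.3 } = {}) {
  const clean = lines.filter((line) => line.text.trim() !== '');
  if (clean.length === 0) return [];

  const typicalHeight = median(clean.map((line) => line.bbox.y1 - line.bbox.y0));
  const blocks = [];
  let current = null;
  let previous = null;

  for (const line of clean) {
    const height = line.bbox.y1 - line.bbox.y0;
    const isHeading = height >= typicalHeight * headingRatio;
    const gap = previous ? line.bbox.y0 - previous.bbox.y1 : Infinity;

    // A new block starts at a heading, right after a heading, or after a large gap.
    const startsNewBlock =
      current === null ||
      isHeading ||
      current.type === 'heading' ||
      gap > typicalHeight * gapRatio;

    if (startsNewBlock) {
      current = { type: isHeading ? 'heading' : 'body', text: line.text.trim() };
      blocks.push(current);
    } else {
      // Joining with a space is an assumption; Khmer text may need different handling.
      current.text += ' ' + line.text.trim();
    }
    previous = line;
  }
  return blocks;
}
```

For example, a tall first line followed by two closely spaced lines and then a line after a wide gap would yield one heading block and two body blocks.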
Chunked AI Translation
- Text is translated in small batches to stay within serverless function limits
- Layout is retained for readability
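One way to batch the text is sketched below, under assumptions: paragraphs are packed into chunks up to a character budget (`maxChars`, an arbitrary default here) so each request stays small enough to finish within a function's time limit. The `/.netlify/functions/translate` endpoint name and the `{ paragraphs }` / `{ translation }` payload shapes are hypothetical.

```javascript
// Packs paragraphs into batches whose combined length stays within maxChars.
// A single paragraph longer than maxChars is kept whole in its own batch.
function chunkParagraphs(paragraphs, maxChars = 2000) {
  const chunks = [];
  let current = [];
  let size = 0;

  for (const paragraph of paragraphs) {
    if (current.length > 0 && size + paragraph.length > maxChars) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(paragraph);
    size += paragraph.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Sends batches one at a time so document order is preserved.
// Endpoint name and response shape are assumptions for this sketch.
async function translateInBatches(paragraphs, endpoint = '/.netlify/functions/translate') {
  const translated = [];
  for (const chunk of chunkParagraphs(paragraphs)) {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ paragraphs: chunk }),
    });
    if (!response.ok) throw new Error(`Translation failed with status ${response.status}`);
    const { translation } = await response.json();
    translated.push(translation);
  }
  return translated;
}
```

Batching on paragraph boundaries rather than raw character offsets keeps sentences intact, which also makes it easier to map translated text back onto the original layout.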
Word Output Generation
- Final result is formatted, styled, paginated, and downloadable
✨ Outcomes
- ⚡ Reduced hours of manual translation down to seconds
- 🧩 Demonstrated how AI and document automation can augment compliance and legal teams
- 📁 Delivered structured .docx files, ready for internal distribution or client communication
🔗 Tech that Powered It
- Tesseract.js for OCR
- pdf.js for rendering
- docx for Word doc creation
- Netlify Functions + OpenAI for translation & summary logic
- Vanilla JS + modular architecture (progressive enhancement ready)
🧭 Why This Matters (and What I Learned)
This project lives at the intersection of AI, automation, and a real business use case. Whilst still only a proof of concept, it solves a clear operational inefficiency using reliable, extensible tech.
Most importantly, it proved how AI tooling isn’t just chatbots — it’s infrastructure that improves process at scale.
📌 Next Steps
I’m continuing to iterate and build a suite of microtools like this to streamline work for legal, finance, and operations teams. If that’s a space you're in, let’s connect.
📕 Get my free eBook
If you're interested in learning the mental models that allow me to come up with ideas and tools like this, download my free eBook today to Master Problem Solving with Human Centered Design.