Research Technology | AI | iOS | Android

Mental Health Research Platform

Field researchers needed a way to capture, protect, and export qualitative data without spending half their time on paperwork. We built them a platform that handles everything from recording to PII redaction to bulk data export.

Client
Confidential
Platform
iOS | Android

The Problem

A mental health research organisation was running qualitative studies across multiple sites in India. Researchers in the field needed to capture journal entries, audio recordings, and video interviews from participants in English, Hindi, Kannada, and Tamil, and then get that data back to analysts in a form that was usable and compliant with participant privacy requirements.

The existing process was entirely manual: researchers used their personal phones to record sessions, transferred files via WhatsApp, and then spent hours each week manually reviewing transcripts to redact participant names, locations, and any identifiable detail before the data could be used. It was slow, inconsistent, and the kind of tedious work that burns out good researchers fast.

They needed a purpose-built platform that could handle the full lifecycle: structured capture in the field, secure upload with unreliable connectivity, AI-assisted privacy redaction, multilingual transcription, and finally a clean bulk export that analysts could actually work with.

What We Built

We designed and engineered the platform from the ground up: a Flutter mobile app for field researchers and a Django REST API backend with a fully automated AI processing pipeline.

Mobile App (Flutter)

The iOS and Android app is built offline-first. Researchers can create rich journal entries with text (using a WYSIWYG editor), attach audio recordings, video interviews, documents, and photos, and tag content with location and language metadata. Everything is stored locally in a SQLite database via Drift so researchers can work in low-connectivity environments without interruption.

Uploads happen via a background queue with presigned S3 URLs. Each file goes through a three-phase flow: create the upload record, upload directly to S3, then confirm with the backend. The queue retries failed uploads up to five times with structured per-event logging. Researchers can open a full upload log inside the app by tapping the app logo ten times, which lets them diagnose field connectivity issues without anyone pulling server logs.
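The app itself is Dart/Flutter, but the retry logic is easiest to show in plain Python. This is a minimal sketch of the three-phase flow, assuming hypothetical `create_record`, `put_to_s3`, and `confirm` callables standing in for the app's API client:

```python
MAX_RETRIES = 5

def process_upload(item, create_record, put_to_s3, confirm, log):
    """Run the 3-phase upload flow with up to 5 retries.

    create_record / put_to_s3 / confirm are illustrative stand-ins for the
    app's API client; log receives one structured event per attempt.
    """
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            record = create_record(item)              # phase 1: register upload, get presigned URL
            put_to_s3(record["presigned_url"], item)  # phase 2: direct PUT to S3
            confirm(record["id"])                     # phase 3: tell the backend the object exists
            log({"item": item, "attempt": attempt, "status": "ok"})
            return True
        except IOError as exc:
            log({"item": item, "attempt": attempt, "status": "error", "detail": str(exc)})
    return False
```

The per-attempt log events are what the hidden in-app upload log surfaces to researchers in the field.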

The app supports four roles: Researcher (creates posts), Reviewer (reviews and approves), Admin (full management), and Viewer (read-only). Posts move through a 5-stage review pipeline with a full audit trail on every state transition.

Backend (Django + Celery)

The API is built on Django 5 with a Celery + Redis task queue for all async operations. Every AI and media processing job runs through a ProcessingJob model that tracks status, progress, queue name, and error output. Jobs can have parent dependencies, so the pipeline enforces correct sequencing: video must be transcoded before transcription, transcription must complete before translation.
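The parent-dependency gating can be sketched in plain Python. The real model is a Django `ProcessingJob`; the dict fields here (`id`, `status`, `parent`) are an assumed simplification to show the sequencing rule:

```python
def ready_jobs(jobs):
    """Return pending jobs whose parent job (if any) has completed.

    `jobs` is a list of dicts mirroring a few hypothetical ProcessingJob
    fields: id, status, and parent (a job id or None). A job is dispatchable
    only once its parent has finished, which is how transcode -> transcribe
    -> translate ordering is enforced.
    """
    done = {j["id"] for j in jobs if j["status"] == "completed"}
    return [
        j for j in jobs
        if j["status"] == "pending" and (j["parent"] is None or j["parent"] in done)
    ]
```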

Seven specialised Celery queues handle different workload types (video_processing, audio_processing, transcription, translation, etc.) to prevent heavy video jobs from starving lighter document tasks.
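Queue separation like this is typically wired up through Celery's `task_routes` setting. A minimal sketch, where the queue names come from the case study but the task module paths are placeholders:

```python
# Hypothetical Celery routing configuration: each task name maps to a
# dedicated queue so heavy video jobs never starve lighter work.
task_routes = {
    "media.tasks.transcode_video": {"queue": "video_processing"},
    "media.tasks.normalise_audio": {"queue": "audio_processing"},
    "ai.tasks.transcribe":         {"queue": "transcription"},
    "ai.tasks.translate":          {"queue": "translation"},
}

# Workers are then started per queue, e.g.:
#   celery -A project worker -Q video_processing --concurrency=2
#   celery -A project worker -Q transcription --concurrency=8
```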

The entire service layer uses an adapter pattern: transcription, translation, transcoding, storage, and SMS backends are all swappable via environment variables. The same codebase can run with Google Speech, OpenAI Whisper, or AWS Transcribe depending on deployment context.
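One way to sketch that adapter selection is a registry keyed by an environment variable. The variable name, class names, and return payloads below are illustrative assumptions, not the production interface:

```python
import os

class TranscriptionBackend:
    """Minimal adapter interface; concrete backends wrap vendor SDKs."""
    def transcribe(self, audio_path: str, language: str) -> str:
        raise NotImplementedError

class WhisperBackend(TranscriptionBackend):
    def transcribe(self, audio_path, language):
        return f"whisper:{audio_path}:{language}"  # placeholder for the real API call

class GoogleSpeechBackend(TranscriptionBackend):
    def transcribe(self, audio_path, language):
        return f"google:{audio_path}:{language}"   # placeholder for the real API call

_BACKENDS = {"whisper": WhisperBackend, "google": GoogleSpeechBackend}

def get_transcription_backend() -> TranscriptionBackend:
    # TRANSCRIPTION_BACKEND is an assumed variable name for illustration.
    name = os.environ.get("TRANSCRIPTION_BACKEND", "whisper")
    return _BACKENDS[name]()
```

The rest of the pipeline only ever sees the `TranscriptionBackend` interface, which is what makes the vendors swappable per deployment.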

AI Privacy Pipeline

The privacy pipeline chains four operations for every uploaded video:

  1. Face detection and blurring using InsightFace SCRFD at frame level. Each face bounding box gets a Gaussian blur (99x99 kernel, sigma 30). The original audio is preserved and muxed back onto the blurred video via FFmpeg.

  2. Multilingual transcription using a large language model via cloud APIs. Long media is split into 60-second chunks, each transcribed with word-level timestamps, then reassembled with corrected absolute offsets. The system handles English, Hindi, Kannada, and Tamil including code-mixed speech.

  3. PII detection and audio muting using an AI model engineered for multilingual named entity recognition. Detected names and locations are marked [REDACTED] in the transcript and the corresponding audio segments are muted with FFmpeg's volume filter (with 20ms padding around each interval to avoid clipping).

  4. Translation using an AI model that preserves word-level timestamps and passes [REDACTED] tokens through unchanged so they survive into translated outputs.
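Two of the steps above lend themselves to a compact sketch: reassembling 60-second transcription chunks into absolute timestamps, and building the FFmpeg `volume` filter that mutes detected PII intervals with the 20 ms padding. The word-dict fields are assumed; the filter string uses FFmpeg's standard timeline-enabled `volume` syntax:

```python
CHUNK_SECONDS = 60
PAD = 0.02  # 20 ms padding around each muted interval

def reassemble(chunks):
    """Shift per-chunk word timestamps to absolute offsets.

    `chunks` is an ordered list of word lists; each word dict carries
    `start`/`end` seconds relative to its own 60-second slice, as a cloud
    transcription API might return them.
    """
    words = []
    for i, chunk in enumerate(chunks):
        offset = i * CHUNK_SECONDS
        for w in chunk:
            words.append({**w, "start": w["start"] + offset, "end": w["end"] + offset})
    return words

def mute_filter(intervals):
    """Build an FFmpeg volume-filter expression muting PII intervals.

    Each (start, end) interval from the NER step is widened by 20 ms and
    clamped at zero; the result can be passed to `ffmpeg -af`.
    """
    return ",".join(
        f"volume=enable='between(t,{max(0.0, s - PAD):.3f},{e + PAD:.3f})':volume=0"
        for s, e in intervals
    )
```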

The pipeline is non-destructive: original media files are never modified. Every processing step creates a new media variant (original, proxy, redacted, transcoded) linked back to the parent via a parent foreign key. Researchers always have access to the original alongside any processed version.

Bulk Data Export

Analysts need to download entire research datasets for offline analysis. The export engine runs as an async Celery job and packages posts into ZIP files using a threshold-based batching algorithm: as soon as adding the next post would push the current batch over 500 MB, the batch is zipped and uploaded to S3 and a new batch begins. Each ZIP follows a consistent folder structure with metadata JSON files at every level.
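The threshold-based batching described above reduces to a short loop. A minimal sketch, assuming posts arrive as (id, size) pairs and a single oversized post still gets its own batch:

```python
LIMIT = 500 * 1024 * 1024  # 500 MB per ZIP batch

def batch_posts(posts):
    """Group posts into batches, closing the current batch as soon as
    adding the next post would push it over the size limit.

    `posts` is an ordered list of (post_id, size_bytes) tuples.
    """
    batches, current, current_size = [], [], 0
    for post_id, size in posts:
        if current and current_size + size > LIMIT:
            batches.append(current)
            current, current_size = [], 0
        current.append(post_id)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch is then zipped, uploaded to S3, and issued a signed download URL.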

For a dataset of 26 posts totalling 9.6 GB (including several large video interviews), the system produced 4 ZIP files with signed download URLs valid for 7 days. The export status screen in the admin app shows real-time progress and per-batch download links.

Results

  • 96% PII redaction accuracy across English, Hindi, Kannada, and Tamil
  • 9.6 GB single export dataset: 26 posts across 4 ZIP batches
  • 4-language transcription support: English, Hindi, Kannada, and Tamil, including code-mixed speech

The research team went from spending 2 to 3 hours per document on manual redaction to a 15 to 20 minute AI-assisted review cycle. Analysts receive structured, consistently formatted exports they can work with directly. Field researchers spend their time doing research instead of data administration.

Technology Stack

  • Mobile: Flutter (iOS and Android), Drift (offline SQLite), GoRouter, Provider + StateNotifier
  • Backend: Django 5, Django REST Framework, PostgreSQL, Celery, Redis
  • AI: Large language models (transcription, PII detection, translation), InsightFace SCRFD (face detection)
  • Media: FFmpeg (video processing, audio muting), OpenCV (frame-level operations)
  • Storage: AWS S3 (primary), Google Cloud Storage (AI processing), presigned URL upload
  • Auth: Firebase Authentication, JWT, RBAC (4-role permission model)
  • Infrastructure: Docker, Celery Flower (queue monitoring), AWS SNS / Twilio (notifications)

Due to confidentiality agreements, the client name and specific deployment metrics have been generalised.

Want results like these for your product?

Let's talk about what you're building.

Book a strategy call