showcAIse — AI Presentation Coach

Overview

3rd place overall | 2nd place Huawei Challenge — LauzHack 2025, EPFL

Built in under 48 hours at LauzHack, EPFL's annual hackathon. I led a team of 4 to build an AI presentation coach that analyzes your speaking performance and can regenerate an improved version of your presentation in your own voice.

The presentation itself was part of the demo. We deliberately started poorly (lots of "ums" and "uhs", plus awkward pauses), then stopped and said: "We can tell we're presenting badly... if only there was some tool that could help us do better." From there, we showed showcAIse analyzing our own bad delivery in real time and suggesting improvements. The judges loved it.

How It Works

Upload a presentation video and the platform delivers:

Overview Dashboard — word count, speaking pace, filler word frequency, overall confidence score (0–100)
Key Moments — identifies strong and weak segments with specific categorization and improvement suggestions
Sentiment Analysis — tone evaluation and emotional progression throughout the presentation
Delivery Metrics — confidence breakdowns, performance timelines, detailed speech analysis
Recommendations — prioritized actionable improvements by severity
Voice Cloning — generates an improved version of your presentation in your own voice, with filler words removed and uncertain language replaced with confident phrasing
Transcript View — full text with highlighted filler words and hedge phrases
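The Overview Dashboard numbers (word count, speaking pace, filler frequency) can be derived from a transcript plus the video's duration. Below is a minimal sketch, assuming a plain-text transcript, regex tokenization, and a filler list limited to the three examples named in this write-up; `transcript_stats` and `TranscriptStats` are hypothetical names, not the project's actual code.

```python
import re
from dataclasses import dataclass

# Hypothetical filler list built from the examples named in this write-up.
# Note: counting every "like" overcounts the ordinary verb.
FILLERS = {"um", "uh", "like"}


@dataclass
class TranscriptStats:
    word_count: int
    words_per_minute: float
    filler_count: int
    filler_rate: float  # fraction of all words that are fillers


def transcript_stats(text: str, duration_seconds: float) -> TranscriptStats:
    """Compute dashboard-style metrics from a transcript and its duration."""
    # Lowercased word tokens; apostrophes kept so "don't" stays one word.
    words = re.findall(r"[a-z']+", text.lower())
    word_count = len(words)
    minutes = duration_seconds / 60 if duration_seconds > 0 else 0.0
    wpm = word_count / minutes if minutes else 0.0
    filler_count = sum(1 for w in words if w in FILLERS)
    filler_rate = filler_count / word_count if word_count else 0.0
    return TranscriptStats(word_count, wpm, filler_count, filler_rate)


stats = transcript_stats("Um, so today we, uh, present showcAIse", 60.0)
print(stats.word_count, stats.filler_count)  # 7 2
```

Treating every "like" as a filler is the main weakness of a pure word-list approach; telling the filler apart from the verb needs some context, such as surrounding commas.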

Voice Cloning Pipeline

This is the standout feature. The pipeline:

Extracts audio from the uploaded video (MoviePy + FFmpeg)
Transcribes using Together AI's Whisper API
Generates an improved script — removes fillers ("um", "uh", "like"), replaces hedge words ("I guess", "kind of") with confident alternatives
Clones the speaker's voice using Coqui TTS XTTS v2 (~2 GB model)
Outputs a WAV file with the improved presentation in the original speaker's voice
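The script-improvement step can be approximated with regular expressions. This is a minimal sketch under stated assumptions: the write-up names "um", "uh", "like", "I guess", and "kind of" but does not document the real rules, so the patterns and the replacements ("I guess" becoming "I believe", "kind of" being dropped) are hypothetical, as is the name `improve_script`.

```python
import re

# Hypothetical rules: fillers ("um", "uh", and "like" only when followed by a
# comma, to spare the verb) are deleted; "I guess" becomes "I believe" and
# "kind of" is dropped. The project's real patterns are not documented.
FILLER_RE = re.compile(r"\b(?:um+|uh+)\b,?\s*|\blike,\s*", re.IGNORECASE)
HEDGES = [
    (re.compile(r"\bI guess\b,?\s*", re.IGNORECASE), "I believe "),
    (re.compile(r"\bkind of\s+", re.IGNORECASE), ""),
]


def improve_script(text: str) -> str:
    """Remove filler words and replace hedges with more confident phrasing."""
    cleaned = FILLER_RE.sub("", text)
    for pattern, replacement in HEDGES:
        cleaned = pattern.sub(replacement, cleaned)
    # Collapse leftover whitespace and drop spaces before punctuation.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    cleaned = re.sub(r"\s+([,.!?])", r"\1", cleaned)
    # Re-capitalize sentence starts exposed by removing a leading filler.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        cleaned,
    )


print(improve_script("Um, so I guess our tool is kind of fast. Uh, it works."))
# So I believe our tool is fast. It works.
```

Regex rules are cheap and predictable but blind to context: "It was, like, great." becomes "It was, great.", leaving a stray comma that a smarter rewriter would fix.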

Processing takes 1–2 minutes after the initial model download.

Confidence Scoring

The scoring algorithm starts from a base of 50 and adjusts the score along four dimensions:

Pacing (±25): optimal range 130–160 words per minute
Filler Words (±30): threshold penalties at 4%, 8%, and 15% filler rate
Sentiment (±20): positive/negative/neutral tone via DistilBERT
Language Quality (±15): hedge word detection via regex patterns
Final score: 0–100, with 70+ classified as strong delivery
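The scoring rules above can be sketched as a single function. The base of 50, the four adjustment ranges, the 130–160 wpm window, the 4%/8%/15% thresholds, and the 70-point cutoff come from this write-up; the slopes and step sizes inside each range, the sentiment mapping, and the per-hedge penalty are assumptions chosen only for illustration, and all function names are hypothetical.

```python
# Hypothetical sketch of the scoring rules described above.
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def pacing_adj(wpm: float) -> float:
    # +25 inside 130-160 wpm; assumed slope of -1 point per wpm outside it.
    deviation = max(130 - wpm, wpm - 160, 0)
    return clamp(25 - deviation, -25, 25)


def filler_adj(filler_rate: float) -> float:
    # Steps at the 4%, 8% and 15% thresholds; step sizes are assumed.
    if filler_rate < 0.04:
        return 30
    if filler_rate < 0.08:
        return 10
    if filler_rate < 0.15:
        return -10
    return -30


# Assumed mapping from the sentiment label to its adjustment.
SENTIMENT_ADJ = {"positive": 20, "neutral": 0, "negative": -20}


def language_adj(hedge_count: int) -> float:
    # Assumed: start at +15, lose 5 points per detected hedge, floor at -15.
    return clamp(15 - 5 * hedge_count, -15, 15)


def confidence_score(wpm: float, filler_rate: float,
                     sentiment: str, hedge_count: int) -> tuple[float, bool]:
    """Return (score in 0-100, is_strong_delivery)."""
    raw = (50 + pacing_adj(wpm) + filler_adj(filler_rate)
           + SENTIMENT_ADJ[sentiment] + language_adj(hedge_count))
    score = clamp(raw, 0, 100)
    return score, score >= 70


print(confidence_score(190, 0.10, "neutral", 2))  # (40, False)
```

Because the four adjustments together span ±90 (25 + 30 + 20 + 15) around a base of 50, the raw sum can range from −40 to 140, so some form of clamping is needed to keep the final score within 0–100.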

Technical Stack

Backend: FastAPI (Python 3.11.6), Together AI Whisper for transcription, Coqui TTS XTTS v2 for voice synthesis, DistilBERT for sentiment analysis, MoviePy + FFmpeg for media processing
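Because Coqui TTS only behaved with one exact interpreter version, a fail-fast check at backend startup can surface a mismatched environment before the ~2 GB model is loaded. A minimal sketch: `REQUIRED_PYTHON`, `python_version_ok`, and `require_python` are hypothetical names, and pinning the micro version mirrors the 3.11.6 requirement reported here rather than anything Coqui itself documents.

```python
import sys

# Assumption: pin the exact interpreter version this write-up reports
# working with Coqui TTS XTTS v2.
REQUIRED_PYTHON = (3, 11, 6)


def python_version_ok(version_info=sys.version_info) -> bool:
    """True if major.minor.micro matches the pinned version."""
    return tuple(version_info[:3]) == REQUIRED_PYTHON


def require_python(version_info=sys.version_info) -> None:
    """Fail fast with a readable error instead of a harder-to-read dependency error later."""
    if not python_version_ok(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        wanted = ".".join(str(part) for part in REQUIRED_PYTHON)
        raise RuntimeError(f"Coqui TTS setup expects Python {wanted}, found {found}")
```

Calling `require_python()` near the top of the backend entry module (a hypothetical placement) turns a confusing install or import failure into one clear message; pinning the same version in the backend's Docker image would keep the check from firing in the normal setup.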

Frontend: React 18.2 with a segment-isolated video player that auto-pauses at defined segment boundaries for focused review

Deployment: Docker Compose multi-container architecture with hot reload for development

What I Learned

Hackathons force you to prioritize ruthlessly — the voice cloning feature was the "wow factor" that made us stand out, so we built that first and polished the dashboard second
Presentation is half the battle at hackathons — our deliberately bad opening got the judges' attention more than any slide could
Coqui TTS delivers impressive voice-cloning quality but is extremely sensitive to the Python version (3.11.6 specifically) and requires careful dependency management
Leading a team under extreme time pressure means making fast architectural decisions you'd normally deliberate on — Docker Compose saved us from "works on my machine" issues across 4 developers