How to Clean Up Interview Recordings
Techniques for editing interview audio and video to remove technical issues, improve pacing, and create professional final recordings.

Interview recordings present unique editing challenges. A typical hour-long interview contains 8-15 minutes of dead air, 150-300 filler words, uneven audio levels between speakers, background noise, and 20-40 instances of crosstalk or interruptions.
Interview cleanup is the process of removing technical imperfections, balancing multi-speaker audio, improving pacing, and eliminating distracting elements from interview recordings while preserving the natural flow of conversation. This transforms raw interview files into polished, professional content suitable for publication.
Common Problems in Raw Interview Recordings
Unedited interview files typically contain multiple issue types:
Technical Issues
- Level imbalances: One speaker 8-15dB quieter than the other
- Background noise: HVAC systems, traffic, room echo
- Mic pops and clicks: Plosives and handling noise
- Connection problems: Dropouts, glitches, or buffering in remote recordings
- Uneven room tone: Different acoustic environments for each speaker
Pacing Issues
- Extended pauses: 2-5 second gaps while thinking or searching for words
- Dead air: 5-30 second gaps during technical issues or breaks
- Rambling sections: Long tangential discussions
- Repetition: Same point made multiple times in different ways
- False starts: Multiple attempts at phrasing before a successful version
Communication Issues
- Filler words: Um, uh, like, you know (averaging 150-300 per hour)
- Crosstalk: Both speakers talking simultaneously
- Interruptions: One speaker cutting off the other
- Side conversations: Off-topic discussion between main points
- Verbal tics: Repeated phrases or sounds specific to speakers
Interview-Specific Editing Challenges
Interviews differ from single-speaker content:
Multi-Track Considerations
- Each speaker needs individual level adjustment
- EQ and compression settings differ by voice characteristics
- Noise reduction must be applied per-speaker
- Timing edits affect both tracks simultaneously
Conversational Flow
- Natural overlap between speakers should be preserved
- Turn-taking pauses are typically shorter than thinking pauses
- Interruptions might be intentional and conversational
- Some repetition emphasizes important points
Content Decisions
- Determining which tangents add value vs distract
- Identifying natural segment boundaries
- Deciding when crosstalk should be kept vs cleaned
- Balancing authenticity with polish
Manual Interview Cleanup Workflow
Traditional approach to interview editing:
Step 1: Audio Technical Cleanup
- Import separate speaker tracks or split stereo to mono
- Analyze peak levels for each speaker
- Apply gain adjustment to balance speakers within 2-3dB
- Apply noise reduction to each track separately
- Set EQ for clarity and consistency
- Apply compression to reduce dynamic range
- Check for pops and clicks, and handle each one individually
Time: 45-75 minutes per hour of content
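The compression step in the list above can be illustrated with the static gain curve of a downward compressor. This is a minimal sketch rather than the behaviour of any particular plugin: the threshold and ratio values are illustrative assumptions, and the attack and release smoothing that real compressors apply is omitted.

```python
def compressed_level_db(input_db: float,
                        threshold_db: float = -18.0,
                        ratio: float = 3.0) -> float:
    """Output level of a hard-knee downward compressor (static curve only)."""
    if input_db <= threshold_db:
        return input_db  # below threshold: level passes through unchanged
    # Above threshold: every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


# Hypothetical speaker whose loud phrases reach -6 dB and quiet ones -24 dB.
loud_db, quiet_db = -6.0, -24.0
range_before = loud_db - quiet_db
range_after = compressed_level_db(loud_db) - compressed_level_db(quiet_db)
print(f"Dynamic range: {range_before:.0f} dB before, {range_after:.0f} dB after")
```

With these assumed settings only the loud phrase is turned down, so the gap between loud and quiet speech narrows while quiet passages are left untouched.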
Step 2: Silence and Dead Air Removal
- Scan both tracks for simultaneous silence
- Identify gaps exceeding normal turn-taking pauses (1+ seconds)
- Measure each pause duration
- Shorten or remove pauses exceeding threshold
- Verify edits didn't remove meaningful pauses
Time: 60-90 minutes per hour of content
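The scan for simultaneous silence can be sketched as follows. The sketch assumes each track has already been reduced to one level reading (in dBFS) per 100 ms frame; the frame size, the -45 dB threshold, and the 500 ms of pause kept after shortening are illustrative assumptions, not recommended settings.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms)


def find_shared_silences(levels_a: Sequence[float],
                         levels_b: Sequence[float],
                         frame_ms: int = 100,
                         threshold_db: float = -45.0,
                         min_gap_ms: int = 1000) -> List[Span]:
    """Return spans where BOTH tracks stay below threshold for min_gap_ms or longer."""
    gaps: List[Span] = []
    run_start = None
    n = min(len(levels_a), len(levels_b))
    for i in range(n + 1):  # one extra pass closes a run that reaches the end
        silent = (i < n and levels_a[i] < threshold_db
                  and levels_b[i] < threshold_db)
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            if (i - run_start) * frame_ms >= min_gap_ms:
                gaps.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    return gaps


def cuts_to_shorten(gaps: List[Span], keep_ms: int = 500) -> List[Span]:
    """Keep the first keep_ms of each gap and mark the remainder for removal."""
    return [(start + keep_ms, end) for start, end in gaps if end - start > keep_ms]


# Hypothetical per-frame levels for a 3-second excerpt.
host = [-20.0] * 5 + [-60.0] * 20 + [-20.0] * 5
guest = [-60.0] * 18 + [-25.0] * 4 + [-60.0] * 8
gaps = find_shared_silences(host, guest)
cuts = cuts_to_shorten(gaps)
print(gaps, cuts)
```

Requiring both tracks to be quiet is what keeps a listener's silent track from being mistaken for a pause while the other person is still speaking.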
Step 3: Filler Word Removal
- Listen to each speaker track
- Mark filler words (um, uh, like, you know)
- Evaluate context for each instance
- Delete fillers that don't serve communicative purpose
- Smooth transitions around deletions
- Verify that natural speech rhythm is maintained
Time: 60-90 minutes per hour of content
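The marking and context-evaluation steps above can be sketched against a word-level transcript. The `Word` structure and its timestamps are hypothetical (any speech-to-text output with per-word timing could be mapped onto it), and the split between "always a filler" and "needs review" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[float, float]  # (start_s, end_s)


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


ALWAYS_FILLER = {"um", "uh"}  # assumed safe to cut without context


def _norm(word: Word) -> str:
    return word.text.lower().strip(".,!?")


def mark_fillers(words: List[Word]) -> Tuple[List[Span], List[Span]]:
    """Return (spans safe to cut, spans that need a human decision)."""
    cut: List[Span] = []
    review: List[Span] = []
    i = 0
    while i < len(words):
        token = _norm(words[i])
        following = _norm(words[i + 1]) if i + 1 < len(words) else ""
        if token in ALWAYS_FILLER:
            cut.append((words[i].start, words[i].end))
        elif token == "you" and following == "know":
            review.append((words[i].start, words[i + 1].end))
            i += 1  # skip "know"; it belongs to the same phrase
        elif token == "like":
            review.append((words[i].start, words[i].end))
        i += 1
    return cut, review


transcript = [
    Word("Um,", 0.0, 0.3), Word("I", 0.4, 0.5), Word("think,", 0.5, 0.8),
    Word("you", 0.9, 1.0), Word("know,", 1.0, 1.2), Word("it", 1.3, 1.4),
    Word("was", 1.4, 1.6), Word("like", 1.6, 1.8), Word("magic.", 1.8, 2.3),
]
cut, review = mark_fillers(transcript)
print("cut:", cut, "review:", review)
```

"Like" and "you know" are routed to review rather than cut because both also occur as ordinary, meaningful words.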
Step 4: Crosstalk and Overlap Management
- Identify sections where both speakers talk simultaneously
- Determine if overlap is natural conversation or problematic
- For problematic crosstalk, isolate cleaner sections
- Rearrange timing or reduce volume of one speaker if needed
- Ensure transitions feel natural
Time: 30-60 minutes per hour of content
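Identifying simultaneous speech can be sketched as an interval intersection. The sketch assumes each speaker's speech has already been segmented into sorted, non-overlapping (start, end) intervals in seconds; the 0.75-second cutoff separating brief natural overlap from crosstalk worth reviewing is an illustrative assumption.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def find_overlaps(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted interval lists with a two-pointer sweep."""
    overlaps: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


def needs_review(overlaps: List[Interval],
                 max_natural_s: float = 0.75) -> List[Interval]:
    """Flag overlaps longer than the assumed 'natural overlap' duration."""
    return [(s, e) for s, e in overlaps if e - s > max_natural_s]


# Hypothetical speech intervals for two speakers.
host = [(0.0, 4.0), (9.0, 12.0)]
guest = [(3.5, 9.5), (11.0, 15.0)]
overlaps = find_overlaps(host, guest)
flagged = needs_review(overlaps)
print(overlaps, flagged)
```

Duration alone cannot tell an enthusiastic agreement from a disruptive interruption, so the flagged list is a starting point for listening, not a cut list.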
Step 5: Content Editing
- Remove false starts and repeated phrasings
- Trim or remove tangential sections
- Tighten rambling answers to key points
- Ensure logical flow between topics
- Add intro/outro if required
Time: 45-90 minutes per hour of content
Step 6: Final Review and Export
- Listen to complete edited interview
- Check for jarring transitions or errors
- Verify consistent levels throughout
- Apply master processing if needed
- Export in required formats
Time: 30-45 minutes per hour of content
Total manual cleanup time: 270-450 minutes (4.5-7.5 hours) per hour of interview content
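The total can be checked by summing the per-step estimates. This short sketch simply restates the six step ranges given above; the dictionary name is an illustrative choice.

```python
# (low, high) minutes per hour of content, taken from Steps 1-6 above.
MANUAL_STEPS = {
    "Audio technical cleanup": (45, 75),
    "Silence and dead air removal": (60, 90),
    "Filler word removal": (60, 90),
    "Crosstalk and overlap management": (30, 60),
    "Content editing": (45, 90),
    "Final review and export": (30, 45),
}

low = sum(lo for lo, _ in MANUAL_STEPS.values())
high = sum(hi for _, hi in MANUAL_STEPS.values())
print(f"{low}-{high} minutes ({low / 60:.1f}-{high / 60:.1f} hours)")
```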
Limitations of Manual Interview Cleanup
Manual interview editing faces significant obstacles:
Complexity: Managing two or more speaker tracks multiplies editing decisions.
Attention requirements: Evaluating conversational dynamics requires sustained focus.
Technical skill needed: Proper audio balancing and processing requires experience.
Time investment: 4.5-7.5 hours per interview makes frequent publishing difficult.
Consistency challenges: Maintaining standards across multiple interviews is difficult.
For podcasters conducting weekly interviews, manual cleanup consumes 18-30 hours per month (four interviews at 4.5-7.5 hours each), equivalent to a part-time job.
Automatic Cleanup Capabilities
Modern tools automate many interview cleanup tasks:
Automated Processes
Silence and pause removal: Detect and shorten or remove gaps in conversation
- Processing time: 8-12 minutes
- Replaces: 60-90 minutes of manual work
Filler word detection: Use speech recognition to identify and remove common fillers
- Processing time: 10-15 minutes
- Replaces: 60-90 minutes of manual work
Dead air removal: Identify and delete extended gaps
- Processing time: 5-8 minutes
- Replaces: 20-30 minutes of manual work
Basic level balancing: Normalize speaker volumes to similar ranges
- Processing time: 3-5 minutes
- Replaces: 15-25 minutes of manual work
Manual-Required Processes
Crosstalk management: Requires judgment about conversational appropriateness
Content trimming: Needs editorial evaluation of value and relevance
Advanced audio processing: Speaker-specific EQ and compression for optimal sound
Quality verification: Ensuring automated edits maintained intended meaning
Remote Interview Special Considerations
Remote interviews via Zoom, Riverside, or similar platforms have additional issues:
Common Remote Recording Problems
- Connection instability: Audio dropouts, glitches, robot voices
- Compression artifacts: Platform audio compression creates quality issues
- Echo or feedback: Especially when participants don't use headphones
- Inconsistent audio quality: Different microphone and room setups
- Sync drift: Audio/video gradually falling out of sync
Remote Cleanup Strategies
- Record local tracks when possible for better source quality
- Use noise reduction aggressively on problematic tracks
- Replace robotic or glitchy sections with clean audio if backup exists
- Apply more aggressive EQ to compensate for platform limitations
- Verify sync every 10-15 minutes of edited content for video
Time addition for remote issues: 30-90 minutes per hour of content depending on severity.
Multi-Speaker Balancing Techniques
Consistent levels between speakers improve the listener experience:
Manual Balancing
- Identify average peak levels for each speaker
- Calculate gain adjustment needed to match levels
- Apply gain to quieter speaker(s)
- Use compression to reduce dynamic range
- Verify perceived loudness, not just peak levels
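The difference between peak level and perceived loudness can be shown with a small sketch. RMS level is used here as a rough stand-in for perceived loudness, which is an assumption; the test signals are synthetic sine waves, not real speech, and the function names are illustrative.

```python
import math
from typing import List, Sequence

FLOOR = 1e-9  # avoids log10(0) on digital silence


def peak_dbfs(samples: Sequence[float]) -> float:
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(max(abs(s) for s in samples), FLOOR))


def rms_dbfs(samples: Sequence[float]) -> float:
    """Root-mean-square level in dBFS, a rough proxy for loudness."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, FLOOR))


def matching_gain_db(quiet: Sequence[float], reference: Sequence[float]) -> float:
    """Gain (dB) to apply to `quiet` so its RMS level matches `reference`."""
    return rms_dbfs(reference) - rms_dbfs(quiet)


def sine(amplitude: float, n: int = 480) -> List[float]:
    return [amplitude * math.sin(2 * math.pi * i / 48) for i in range(n)]


host = sine(0.5)
guest = sine(0.125) + [0.5]  # quiet voice plus a single loud click

print(f"Peak difference: {peak_dbfs(host) - peak_dbfs(guest):.1f} dB")
print(f"Gain needed by RMS: {matching_gain_db(guest, host):.1f} dB")
```

The click gives the quiet track the same peak as the loud one, so peak readings alone would suggest no adjustment, while the RMS comparison still reports a large gap.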
Automated Balancing
Modern tools can:
- Analyze LUFS (Loudness Units relative to Full Scale) for each speaker
- Apply calculated gain to match perceived loudness
- Set consistent levels across entire interview
This produces consistent results in 3-5 minutes vs 15-25 minutes manually.
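Once a loudness meter has reported each speaker's integrated loudness, the gain calculation itself is simple. In this sketch the measurements are assumed inputs (measuring LUFS requires the filtering and gating defined in ITU-R BS.1770, which is not implemented here), and the -16 LUFS target and -1 dBTP ceiling are illustrative assumptions rather than requirements.

```python
TARGET_LUFS = -16.0        # assumed loudness target
PEAK_CEILING_DBTP = -1.0   # assumed true-peak ceiling


def loudness_gain(measured_lufs: float,
                  true_peak_dbtp: float,
                  target_lufs: float = TARGET_LUFS,
                  ceiling_dbtp: float = PEAK_CEILING_DBTP) -> dict:
    """Gain needed to reach the target, plus a warning if peaks would exceed the ceiling."""
    gain_db = target_lufs - measured_lufs
    return {
        "gain_db": gain_db,
        "linear_gain": 10 ** (gain_db / 20),  # multiplier for sample values
        "needs_limiting": true_peak_dbtp + gain_db > ceiling_dbtp,
    }


# Hypothetical meter readings: (integrated LUFS, true peak in dBTP).
speakers = {"host": (-19.0, -6.0), "guest": (-28.0, -10.0)}
plan = {name: loudness_gain(lufs, peak) for name, (lufs, peak) in speakers.items()}

for name, result in plan.items():
    print(name, result)
```

The headroom check matters for very quiet speakers: a large gain can push peaks past the ceiling, in which case limiting or compression is needed before the gain is applied in full.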
Efficient Interview Cleanup Workflow
Combined approach leveraging automation:
Phase 1: Automated Processing
- Upload raw recording (2-5 minutes)
- Automated cleanup processes:
  - Dead air removal
  - Pause shortening
  - Silence removal
  - Optional filler word removal
- Processing completes (8-15 minutes)
Total: 10-20 minutes
Phase 2: Manual Refinement
- Download processed file (1-3 minutes)
- Import to editing software
- Apply speaker-specific audio processing (15-30 minutes)
- Review and adjust any problematic automated edits (10-20 minutes)
- Handle crosstalk and complex timing issues (15-30 minutes)
- Content-level editing (trim tangents, rearrange if needed) (20-40 minutes)
- Final review (10-15 minutes)
Total: 71-138 minutes
Combined workflow total: 81-158 minutes (1.3-2.6 hours) per hour of interview
Time savings: 189-292 minutes (3.2-4.9 hours) per interview, a reduction of roughly 65-70%.
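The comparison between the two workflows can be reproduced from the phase totals. This sketch pairs the low estimates with each other and the high estimates with each other, which is one reasonable reading of the ranges; the variable names are illustrative.

```python
# (low, high) minutes per hour of interview, taken from the totals above.
manual = (270, 450)
phase_1 = (10, 20)    # automated processing
phase_2 = (71, 138)   # manual refinement

combined = tuple(a + b for a, b in zip(phase_1, phase_2))
saved = tuple(m - c for m, c in zip(manual, combined))
reduction_pct = tuple(round(100 * s / m) for s, m in zip(saved, manual))

print(f"Combined workflow: {combined[0]}-{combined[1]} minutes")
print(f"Saved: {saved[0]}-{saved[1]} minutes "
      f"({saved[0] / 60:.1f}-{saved[1] / 60:.1f} hours)")
print(f"Reduction: {min(reduction_pct)}-{max(reduction_pct)}%")
```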
Rendezvous handles the automated phase, processing interview recordings to remove dead air, tighten pauses, and optionally remove filler words. The tool works with multi-speaker content, producing files that are typically 20-40% shorter than originals with improved pacing. Users then apply speaker-specific audio processing and content-level editing as needed.
Interview Cleanup Checklist
Before publishing an interview, verify:
- [ ] Both speakers are at similar perceived volume
- [ ] No silence gaps exceeding 2 seconds
- [ ] Filler words reduced by 70-80%
- [ ] False starts and repetitions removed
- [ ] Background noise minimized or removed
- [ ] Pops, clicks, and mouth sounds addressed
- [ ] Crosstalk manageable and natural-sounding
- [ ] Content flows logically from topic to topic
- [ ] Introduction and conclusion present (if applicable)
- [ ] Total length reasonable for content (20-40% shorter than raw)
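The measurable items on this checklist can be verified automatically, as in the sketch below. The `EditMetrics` fields are hypothetical and would need to be filled in from whatever tools produced the edit; the 3 dB loudness tolerance is an assumption, while the other thresholds come from the checklist itself. Items that need a listener's judgment, such as natural-sounding crosstalk, are not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditMetrics:
    raw_minutes: float
    edited_minutes: float
    raw_fillers: int
    edited_fillers: int
    longest_gap_s: float
    loudness_diff_db: float  # difference between the two speakers


def failed_checks(m: EditMetrics) -> List[str]:
    """Return the measurable checklist items that the edit does not meet."""
    failures: List[str] = []
    if abs(m.loudness_diff_db) > 3.0:  # assumed tolerance
        failures.append("speaker volumes differ by more than 3 dB")
    if m.longest_gap_s > 2.0:
        failures.append("silence gap exceeds 2 seconds")
    filler_cut = 1 - m.edited_fillers / m.raw_fillers if m.raw_fillers else 1.0
    if filler_cut < 0.70:
        failures.append("filler words reduced by less than 70%")
    shortened = 1 - m.edited_minutes / m.raw_minutes
    if not 0.20 <= shortened <= 0.40:
        failures.append("length reduction outside 20-40%")
    return failures


good = EditMetrics(60, 45, 200, 50, 1.4, 1.5)
rough = EditMetrics(60, 55, 200, 120, 3.2, 1.5)
print(failed_checks(good))
print(failed_checks(rough))
```

A length reduction outside the 20-40% range is reported as a prompt to take a second look, since some interviews are legitimately tighter or looser than typical.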
Summary
Interview cleanup involves technical audio processing, pacing improvement, and content refinement. Manual cleanup takes 4.5-7.5 hours per hour of interview, while automated tools reduce this to 1.3-2.6 hours including all manual work.
Key principles for effective interview cleanup:
- Balance speaker levels for consistent listening experience
- Automate silence, pause, and filler removal (saves 2-4 hours per interview)
- Preserve natural conversational overlap and energy
- Apply speaker-specific audio processing for optimal quality
- Make content-level decisions based on editorial value
For interview-based podcasts and video channels, automated cleanup tools save 12-20 hours per month while producing consistently polished content.
Content reviewed in January 2026.