COMPLETE GUIDE

Pro Tools for Audio Post-Production: The Full Workflow

Session setup at 48kHz/24-bit, dialogue editing, ADR recording, foley, music mixing, bus structure, iZotope RX roundtrip, loudness metering (LUFS), and stem delivery for video sync. Every setting, every routing decision, every export step. Written by audio engineers who mix in Pro Tools daily.

Avid Pro Tools remains the industry standard for audio post-production in film, television, advertising, and online content. Nuendo competes on features, Logic Pro offers a lower entry cost, and Reaper provides extreme customization, but Pro Tools holds the largest market share in professional post facilities worldwide. This guide covers the complete audio post-production pipeline in Pro Tools: session setup, dialogue editing, ADR, foley, music mixing, loudness compliance, and stem delivery. Every recommendation has been tested on commercial, music video, and short film projects at BLKRIP Studio in Da Nang.

1. Session Setup for Audio Post

Sample Rate and Bit Depth

Create a new session: File > New Session. Set Sample Rate to 48kHz and Bit Depth to 24-bit. These are the broadcast and film standards. Broadcast and streaming deliverables expect 48kHz: Netflix, Amazon, and broadcasters working to the EBU R128 or ATSC A/85 loudness specs all use it. 96kHz is occasionally used for high-end film mixes, but it doubles file sizes and CPU load without audible benefit on most monitoring systems. Stick with 48kHz unless the delivery spec explicitly requires 96kHz.

Bit Depth: 24-bit provides a theoretical 144 dB of dynamic range, which is more than enough for any post-production workflow. 32-bit float is available in Pro Tools but is unnecessary for recording and editing. Use 32-bit float only when bouncing stems that will undergo further processing.

Audio File Format: BWF (.WAV), the Broadcast Wave Format. A BWF file is a WAV file with an extra metadata chunk that carries a timecode reference, which a plain WAV file does not have. This matters when you exchange files with video editors working in Premiere Pro or DaVinci Resolve, who need to sync audio to picture. Always use BWF for post-production.
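The two figures above, 144 dB for 24-bit and double the storage at 96kHz, follow from simple arithmetic. Here is a minimal Python sketch. It assumes uncompressed linear PCM and counts a megabyte as 10^6 bytes.

```python
import math


def pcm_dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM in dB: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def pcm_mb_per_minute(sample_rate: int, bits: int, channels: int = 1) -> float:
    """Uncompressed PCM data rate in megabytes (10**6 bytes) per minute."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return bytes_per_second * 60 / 1_000_000


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {pcm_dynamic_range_db(bits):.1f} dB")
    for rate in (48_000, 96_000):
        print(f"{rate} Hz, 24-bit mono: {pcm_mb_per_minute(rate, 24):.2f} MB/min")
```

A 24-bit mono track works out to 8.64 MB per minute at 48kHz and 17.28 MB at 96kHz. The dB figure is a theoretical ceiling; real converters and microphones deliver less.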

Session Parameters Checklist

Sample Rate: 48kHz.
Bit Depth: 24-bit.
Audio Format: BWF (.WAV).
Timecode Rate: match the video edit (23.976fps for film-style HD and web projects, 24fps for theatrical cinema, 25fps for PAL broadcast, 29.97fps for NTSC).
Start Time: 01:00:00:00 (the industry standard for post sessions).
Track Timestamp: Timestamped (absolute timecode position).
Elastic Audio: enable on dialogue tracks (Polyphonic algorithm) for time compression/expansion without pitch shift.
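Imported production files that do not match these parameters are a common source of sync problems. The sketch below uses Python's standard-library `wave` module to read a file's header and flag any sample rate or bit depth that differs from the session. It assumes uncompressed PCM files. `wave` skips BWF metadata chunks and cannot read 32-bit float files. It only accepts WAVE_FORMAT_EXTENSIBLE headers on Python 3.12 or later.

```python
import wave


def check_wav_format(path, expected_rate=48_000, expected_bits=24):
    """Return a list of mismatches between a PCM WAV/BWF header and the session settings."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if bits != expected_bits:
            problems.append(f"bit depth is {bits}-bit, expected {expected_bits}-bit")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py file1.wav file2.wav ...
    for path in sys.argv[1:]:
        issues = check_wav_format(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

The script name `check_wav.py` is hypothetical. Run it over a folder of production audio before import, and any 44.1kHz or 16-bit file shows up immediately.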

Bus Structure and Routing

Post-production sessions use a structured bus hierarchy. Set up your buses before importing any media. Create these Aux Input tracks, each fed by an internal bus of the same name:
- DX BUS: All dialogue tracks route here. This bus receives dialogue from boom, lav, and plant mics.
- FX BUS: All sound effects and foley route here.
- MX BUS: All music tracks route here.
- BG BUS: All backgrounds and ambience route here.
- STEM MSTR: The master bus that receives the DX, FX, MX, and BG buses. This is your print bus for the final mix.

Why bus through aux inputs instead of routing directly to the master? Because each bus gets its own insert chain. You can apply bus-level compression, EQ, and limiting to the dialogue bus without affecting the music bus. This is the standard routing in every professional post facility.

Create the buses in Setup > I/O Setup > Bus tab and name them DX, FX, MX, BG, and STEM MSTR. Then create the matching Aux Input tracks (Track > New) and set each one's input to its bus. Assign colors: DX = green, FX = blue, MX = orange, BG = purple, STEM MSTR = red.
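A routing mistake silently drops audio from a stem. Two common examples are a track left on the main outputs and a bus fed back into itself. The sketch below models the hierarchy as a plain Python dictionary and walks each track's output chain to confirm it reaches STEM MSTR. The track names are hypothetical, and the dictionary is typed by hand; nothing here reads a Pro Tools session.

```python
# Hypothetical routing table: each track or aux input -> the bus its output feeds.
ROUTING = {
    "DX_BOOM": "DX", "DX_LAV1": "DX", "DX_LAV2": "DX", "DX_PLANT": "DX",
    "FOLEY_FS": "FX", "SFX_HARD": "FX",
    "MX_SCORE": "MX",
    "BG_CITY": "BG",
    "DX": "STEM MSTR", "FX": "STEM MSTR", "MX": "STEM MSTR", "BG": "STEM MSTR",
}


def path_to_print_bus(source, routing, print_bus="STEM MSTR"):
    """Follow outputs from `source` until the print bus; raise on dead ends or loops."""
    path = [source]
    node = source
    while node != print_bus:
        if node not in routing:
            raise ValueError(f"{source} dead-ends at {node}")
        node = routing[node]
        if node in path:
            raise ValueError("feedback loop: " + " -> ".join(path + [node]))
        path.append(node)
    return path


if __name__ == "__main__":
    for track in ROUTING:
        print(" -> ".join(path_to_print_bus(track, ROUTING)))
```

Each printed line shows one signal path, such as `DX_BOOM -> DX -> STEM MSTR`. A track that never reaches the print bus raises an error instead of being skipped.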

2. Dialogue Editing

Dialogue Edit Workflow

Dialogue editing is the foundation of audio post. The goal is clean, intelligible dialogue with natural room tone, smooth transitions between takes, and no audible edits.

Import the guide track (the mixed-down audio from the video editor's timeline). This becomes your sync reference. Lock the guide clip (right-click > Lock) so you cannot move it by accident.

Import production audio: boom mic tracks, lavalier tracks, and any plant mic tracks. These arrive either as polyphonic WAV files (multiple channels in one file) or as separate mono WAV files, and Pro Tools handles both. For polyphonic files, use Track > Split into Mono to separate the channels onto individual tracks.

Sync check: Play the production audio alongside the guide track. The two should be in sample-accurate sync. If they drift, compare the sample rate of the production audio with the session sample rate. A mismatch, such as 44.1kHz audio in a 48kHz session, makes the audio play at the wrong speed, so the offset grows over the length of the clip.

Edit passes:
1. Comping: Select the best take for each line. If the boom sounds better for one line and the lav for the next, crossfade between them. Use equal-power crossfades (Setup > Preferences > Editing > Default Crossfade = Equal Power).
2. Noise removal: Send problematic sections to iZotope RX (covered in section 6) to remove air-conditioner hum, traffic noise, and room echo.
3. Fills: Fill every gap in dialogue with room tone. Record at least 30 seconds of room tone on set, with everyone standing still in silence. Copy and paste it into every gap. Never leave digital silence, because the ear hears absolute silence as unnatural.
4. De-breath: Reduce (do not remove) loud breaths. Use clip gain to bring breaths down 6-12 dB rather than deleting them, because deleted breaths create audible gaps.
5. De-plosive: Reduce plosive hits (P, B, and T sounds that pop the microphone). Use a high-pass filter at 80-120 Hz on the dialogue tracks, or use the iZotope RX De-plosive module.
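An equal-power crossfade uses cosine and sine gain curves, so the summed power of the two clips stays constant across the fade. At the midpoint, each clip sits about 3 dB down. That suits boom-to-lav comps and room-tone fills, whose signals are largely uncorrelated. For identical or strongly correlated material, an equal-gain (linear) fade avoids a bump in level. Below is a minimal pure-Python sketch. It assumes both clips are equal-length lists of float samples covering the overlap.

```python
import math


def equal_power_gains(n):
    """Fade-out and fade-in gain curves for an n-sample equal-power crossfade."""
    fade_out, fade_in = [], []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 1.0  # position in the fade, 0.0 -> 1.0
        fade_out.append(math.cos(t * math.pi / 2))
        fade_in.append(math.sin(t * math.pi / 2))
    return fade_out, fade_in


def crossfade(outgoing, incoming):
    """Blend two equal-length overlapping segments with an equal-power curve."""
    if len(outgoing) != len(incoming):
        raise ValueError("overlap segments must be the same length")
    out_gain, in_gain = equal_power_gains(len(outgoing))
    return [a * g_out + b * g_in
            for a, b, g_out, g_in in zip(outgoing, incoming, out_gain, in_gain)]


if __name__ == "__main__":
    n = 48_000 * 5 // 1000  # a 5 ms crossfade at 48 kHz = 240 samples
    out_gain, in_gain = equal_power_gains(n)
    mid = n // 2
    print(n, round(out_gain[mid] ** 2 + in_gain[mid] ** 2, 6))
```

The squared gains always sum to 1. That property is what "equal power" refers to.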

Dialogue Editing Tips

Use Strip Silence (Edit > Strip Silence) to quickly identify and remove long silence sections between dialogue lines. Set Threshold to -40 dB, Minimum Strip Duration to 500ms. This speeds up the initial pass on long-form content.
Crossfade every edit point. Even if two clips appear to butt perfectly, a crossfade prevents clicks at the edit boundary. Minimum crossfade length: 5ms for tight edits, 50-100ms for overlap blends.
Match room tone between takes. If one mic has noticeably more room reverb than the other (usually the boom, which sits farther from the mouth than the lav), the crossfade between them is audible. Apply light reverb matching or use clip-level EQ to balance the tonal difference.
Check dialogue on headphones after editing on monitors. Headphones reveal low-frequency rumble, mouth clicks, and clothing rustle that nearfield monitors miss at low volumes.
Label every track and every clip. A session with 40 tracks of DX_BOOM_TAKE3_V2 helps no one. Use consistent naming: DX_BOOM, DX_LAV1, DX_LAV2, DX_PLANT.
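Strip Silence comes down to two settings: a threshold and a minimum duration. The simplified sketch below finds runs of samples whose absolute value stays below the threshold (-40 dBFS, converted to linear amplitude) for at least the minimum duration (500 ms). Short dips, such as zero crossings inside a word, are ignored because they never last 500 ms. This is not Avid's implementation. Pro Tools also offers start and end pad settings, and a production detector would normally follow a smoothed envelope rather than raw samples.

```python
def find_silent_regions(samples, sample_rate, threshold_db=-40.0, min_duration_s=0.5):
    """Return (start, end) sample indices of quiet runs lasting at least min_duration_s."""
    threshold = 10 ** (threshold_db / 20)  # -40 dBFS -> 0.01 linear amplitude
    min_len = int(min_duration_s * sample_rate)
    regions, start = [], None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if start is None:
                start = i  # a quiet run begins
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))  # long enough to strip
            start = None
    if start is not None and len(samples) - start >= min_len:
        regions.append((start, len(samples)))  # quiet run reaching the end of the clip
    return regions


if __name__ == "__main__":
    rate = 1_000  # toy sample rate keeps the example short
    clip = [0.5] * 100 + [0.001] * 600 + [0.5] * 100 + [0.001] * 200
    print(find_silent_regions(clip, rate))
```

The samples are floats normalized to ±1.0. The 600-sample quiet run qualifies (600 ms at the toy rate). The 200-sample run at the end is too short to strip.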

3. ADR (Automated Dialogue Replacement)

Recording and Editing ADR

ADR replaces production dialogue that is unusable due to noise, technical problems, or performance issues. The actor re-records their lines in a controlled environment while watching the picture.

Session setup for ADR: Create a mono audio track for recording. Route the video output to a display the actor can see. Send the guide track (original production audio) to the actor's headphones at a comfortable level. Set input monitoring to Input Only on the record track so the actor hears themselves live.

Recording workflow:
1. Play the clip with the original dialogue while the actor listens.
2. Roll recording. The actor performs the line, matching the original timing and emotion.
3. Record 3-5 takes of each line. More takes give you more options during the edit.
4. Move to the next line.

ADR editing: The recorded ADR must sync with the actor's lip movements on screen. Pro Tools has no built-in tool that automatically aligns one clip's timing to another, so use one of these approaches:

Elastic Audio: Enable Elastic Audio (Polyphonic) on the ADR track. Switch to Warp view, then drag warp markers to stretch or compress the audio until it matches the lip sync. This works for small timing adjustments (10-50ms).

Revoice Pro (Synchro Arts): The industry standard for ADR alignment. Send the original production dialogue and the ADR take to Revoice Pro, and it automatically aligns the ADR to the original timing, pitch, and energy. Alignment that would take 30-60 minutes by hand takes Revoice Pro about 10 seconds. Cost: $599.

VocALign (Synchro Arts): A lighter alternative to Revoice Pro that aligns the timing of one audio clip to match another. Select the guide (original dialogue) and the dub (ADR take), then click Align. VocALign compresses and stretches the ADR to match the guide timing. Cost: $399.

Acoustic matching: ADR recorded in a treated booth sounds drier than production dialogue recorded on location. Apply convolution reverb to the ADR track. Use an impulse response captured at the filming location if one is available, or one from a similar acoustic space. Match the reverb tail length and level to the production dialogue.
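Revoice Pro and VocALign perform time-varying alignment with their own proprietary algorithms, which this guide does not document. A much simpler illustration of the underlying idea is a single global offset between the guide and an ADR take. The sketch below estimates it by brute-force cross-correlation: it tries each lag in a window and keeps the one where the two signals agree most. It assumes mono float samples at the same sample rate. A practical tool would correlate downsampled envelopes for speed, then warp the take in segments rather than shifting it once.

```python
def best_offset(guide, take, max_lag):
    """Lag in samples that best aligns `take` to `guide` (positive = take is late)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, g in enumerate(guide):
            j = i + lag  # sample in the take that lines up with guide[i]
            if 0 <= j < len(take):
                score += g * take[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_ms(lag, sample_rate=48_000):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag / sample_rate


if __name__ == "__main__":
    guide = [0.0] * 20 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 20
    take = [0.0] * 5 + guide[:-5]  # the same "line", delivered 5 samples late
    lag = best_offset(guide, take, max_lag=10)
    print(lag, lag_to_ms(lag))
```

A positive result means the take should move earlier by that many samples. Keep `max_lag` around the 10-50ms range mentioned above, because the brute-force loop gets slow quickly.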

4. Foley and Sound Effects

Recording and Editing Foley

Foley is the reproduction of everyday sound effects, added to a film in post-production to enhance its audio. A foley artist performs footsteps, cloth movement, prop handling, and door closes while watching the picture.

Foley recording setup: You need a foley stage or a quiet room with different floor surfaces (concrete, wood, carpet, gravel pit). Use a pair of matched condenser microphones in an ORTF or XY stereo configuration, positioned 30-50 cm above the performance surface. Record to a stereo audio track in Pro Tools at 48kHz/24-bit.

Three categories of foley:
1. Moves: Cloth and clothing sounds. The foley artist moves fabric in sync with the actor's on-screen movement. Record on a dedicated MOVES track.
2. Footsteps: The foley artist walks on the matching surface in sync with the actor's on-screen footsteps. Record on dedicated FS tracks (separate tracks for left and right shoes if the mix demands it).
3. Specifics: Individual sound effects such as a door opening, a glass being set down, or paper rustling. Record on dedicated SPEC tracks.

Editing foley: Trim each foley hit to the exact frame. Adjust clip gain to match the perspective of the shot (closer = louder, wider = quieter). Add perspective EQ: roll off high frequencies for distant sounds and boost presence (2-5 kHz) for close sounds. Fades: apply a 5ms fade-in and a 10-20ms fade-out to each hit to remove handling noise.
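Frame-accurate trims and the 5ms / 10-20ms fades translate directly into sample counts at 48kHz. The sketch below treats 23.976fps as 24000/1001. It computes samples per video frame (2000 at 24fps, 2002 at 23.976fps) and applies linear fades to a single foley hit stored as a list of float samples. The linear fade shape and the 15 ms fade-out default are assumptions for illustration; Pro Tools lets you choose other fade shapes.

```python
from fractions import Fraction


def samples_per_frame(fps, sample_rate=48_000):
    """Samples in one video frame; pass fractional rates as Fraction(24000, 1001)."""
    return Fraction(sample_rate) / Fraction(fps)


def apply_hit_fades(hit, sample_rate=48_000, fade_in_ms=5, fade_out_ms=15):
    """Return a copy of `hit` with a linear fade-in and fade-out (lengths in ms)."""
    out = list(hit)
    n_in = min(len(out), int(sample_rate * fade_in_ms / 1000))
    n_out = min(len(out), int(sample_rate * fade_out_ms / 1000))
    for i in range(n_in):
        out[i] *= i / n_in  # ramps from 0.0 up toward 1.0
    for i in range(n_out):
        out[-1 - i] *= i / n_out  # the last sample ends at 0.0
    return out


if __name__ == "__main__":
    print(samples_per_frame(24), samples_per_frame(Fraction(24000, 1001)))
    faded = apply_hit_fades([1.0] * 4800)  # a 100 ms hit at 48 kHz
    print(faded[0], faded[2400], faded[-1])
```

Using `Fraction` keeps fractional rates exact. At 29.97fps (30000/1001), for example, a frame is 1601.6 samples, so frame boundaries do not always land on whole samples.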

Sound Effects Libraries and Layering

Not every sound needs to be recorded by a foley artist. Library sound effects handle off-screen sounds, ambient backgrounds, and effects that are impractical to perform. Recommended libraries:
- Pro Sound Effects (PSE) Core: 12,000+ general effects. $299/year.
- Boom Library: specialized libraries for gunshots, vehicles, weather, and UI sounds. $99-299 per library.
- Sound Ideas: the original broadcast SFX library. Comprehensive but dated.

Layering technique: Professional sound design layers multiple sounds to create a single effect. A gunshot is not one sound but four layers:
- The crack: the high-frequency transient.
- The boom: the low-frequency body.
- The tail: the reverb/echo of the environment.
- The reaction: glass breaking, debris, character response.

Each layer occupies a different frequency range and has its own dynamics envelope. This layering produces richer, more convincing sound design than a single library effect.
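Layers are summed after each one's gain is applied, so stacking a crack, boom, tail, and reaction quickly eats headroom. Here is a minimal sketch. It assumes each layer is a list of float samples (full scale = 1.0) already aligned to the same start point, and the per-layer gains in dB are hypothetical.

```python
import math


def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix_layers(layers):
    """Sum (samples, gain_db) layers sample by sample; shorter layers are zero-padded."""
    length = max(len(samples) for samples, _ in layers)
    mix = [0.0] * length
    for samples, gain_db in layers:
        gain = db_to_gain(gain_db)
        for i, x in enumerate(samples):
            mix[i] += gain * x
    return mix


def peak_dbfs(samples):
    """Sample peak in dBFS (ignores inter-sample peaks)."""
    peak = max(abs(x) for x in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


if __name__ == "__main__":
    crack = [0.9, -0.7, 0.3]           # short high-frequency transient (toy values)
    boom = [0.5, 0.6, 0.4, 0.2]        # low-frequency body
    tail = [0.0, 0.1, 0.2, 0.15, 0.1]  # environment reverb
    gunshot = mix_layers([(crack, 0.0), (boom, -3.0), (tail, -6.0)])
    print(f"peak {peak_dbfs(gunshot):.1f} dBFS")
```

A peak above 0 dBFS means the stacked layers would clip. Pull the layer gains down or print the effect to a track with more headroom.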

5. Mixing and Loudness Metering

Setting Mix Levels

Mixing audio for video is about balance and intelligibility, not loudness. The dialogue must be clearly understood above music and effects. Level hierarchy (relative to dialogue at a 0 dB reference):
- Dialogue: 0 dB (the anchor everything else references)
- Music (under dialogue): 8 to 14 dB below dialogue
- Music (alone, no dialogue): full level, within the loudness target
- Sound effects (hard effects): -2 to +2 dB relative to dialogue
- Backgrounds/ambience: 12 to 20 dB below dialogue

Mix workflow:
1. Start with dialogue. Set the DX BUS fader so dialogue peaks at -10 dBFS on the master meter. This leaves headroom for effects and music.
2. Bring in backgrounds and set them so they are barely audible under dialogue.
3. Bring in music and set it to sit under dialogue without competing for the same frequency range.
4. Bring in effects last and set their level to match the on-screen action.

EQ for clarity: Most of the energy that carries speech intelligibility sits between roughly 300 Hz and 5 kHz. Music and effects that compete in this range make dialogue harder to understand. During dialogue passages, use an automation lane to apply a mid-range dip (2-4 kHz, -3 to -6 dB) on the music bus. This frequency-selective form of dialog ducking keeps music from masking speech.
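Once you know the current peak, the -10 dBFS dialogue peak target becomes a simple gain calculation. The sketch below assumes a list of float samples (full scale = 1.0) from a representative dialogue passage and returns the fader change in dB. It also converts the hierarchy's dB offsets into linear amplitude ratios, which shows how small -14 dB really is: about a fifth of the dialogue's amplitude. It measures sample peaks only and does not account for inter-sample (true) peaks.

```python
import math


def fader_change_for_peak(samples, target_peak_dbfs=-10.0):
    """dB of gain that moves the sample peak of `samples` to the target level."""
    peak = max(abs(x) for x in samples)
    if peak == 0:
        raise ValueError("passage is silent")
    return target_peak_dbfs - 20 * math.log10(peak)


def db_to_ratio(db):
    """Amplitude ratio for a level offset in dB (e.g. -6 dB -> about 0.5)."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    dialogue = [0.12, -0.5, 0.31, -0.08]  # toy samples peaking at about -6 dBFS
    print(f"move DX BUS fader by {fader_change_for_peak(dialogue):+.1f} dB")
    for label, offset in [("music under dialogue", -14), ("backgrounds", -20)]:
        print(f"{label}: {offset} dB = {db_to_ratio(offset):.2f}x dialogue amplitude")
```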

LUFS Loudness Standards

All delivery platforms measure loudness in LUFS (Loudness Units relative to Full Scale; the LKFS unit used in US standards is the same measurement). Different platforms have different targets:
- YouTube: -14 LUFS integrated (measured across the entire program)
- Netflix: -27 LUFS integrated (dialogue-gated)
- Amazon Prime: -24 LUFS integrated
- Spotify: -14 LUFS integrated
- Apple Music / Podcasts: -16 LUFS integrated
- Broadcast (EBU R128, Europe): -23 LUFS integrated, +/- 1 LU tolerance
- Broadcast (ATSC A/85, US): -24 LKFS integrated

Pro Tools does not include a LUFS meter. Install a third-party meter plugin on the master bus. Recommended: Youlean Loudness Meter (free), iZotope Insight 2 ($99), or Waves WLM Plus ($29).

Measurement method: Play the entire mix from start to finish with the LUFS meter active. The integrated (average) reading must hit the target. True peak must stay below the spec's ceiling: -1 dBTP under EBU R128, -2 dBTP for Netflix and ATSC A/85 deliveries.

If your mix is too loud, lower the master fader and re-bounce. If your mix is too quiet, do not simply raise the master fader. The peaks rise by the same amount and will overshoot the true-peak ceiling. Instead, put a limiter (Waves L2, FabFilter Pro-L 2) on the master bus to catch peaks while you raise the average level.
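A static gain change shifts integrated loudness and true peak by the same number of dB. This holds to a close approximation; the absolute -70 LUFS gate can include slightly different blocks. As a result, two meter readings are enough to predict whether a plain fader move will do or a limiter is needed. The minimal sketch below takes those readings typed in by hand from your meter plugin. The -1 dBTP ceiling in the example is an assumed value for illustration.

```python
def loudness_plan(measured_lufs, measured_tp_dbtp, target_lufs, max_tp_dbtp):
    """Predict the gain needed to hit a loudness target and whether peaks will overshoot."""
    gain_db = target_lufs - measured_lufs
    predicted_tp = measured_tp_dbtp + gain_db  # true peak moves with the gain change
    needs_limiter = predicted_tp > max_tp_dbtp
    return gain_db, predicted_tp, needs_limiter


if __name__ == "__main__":
    # Example readings: a mix at -18 LUFS / -3.0 dBTP, aiming for -14 LUFS
    # with an assumed -1 dBTP ceiling.
    gain, tp, limit = loudness_plan(-18.0, -3.0, target_lufs=-14.0, max_tp_dbtp=-1.0)
    print(f"gain {gain:+.1f} dB, predicted true peak {tp:+.1f} dBTP, limiter needed: {limit}")
```

In this example the mix needs +4 dB, which would push the true peak to about +1 dBTP. That is the "too quiet" case described above, where a limiter is required.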

6. iZotope RX Integration

Pro Tools to iZotope RX Roundtrip

iZotope RX integrates with Pro Tools through two methods: RX Connect (an AudioSuite plugin) and standalone RX with manual file exchange.

RX Connect (preferred):
1. Select the clip you want to clean up and open RX Connect from the AudioSuite menu to send it to the RX editor.
2. Perform noise reduction, de-hum, de-reverb, or any other RX module processing.
3. Click Send Back in RX.
4. Click Render in the RX Connect window in Pro Tools.

The processed audio returns to Pro Tools as a new clip in place of the original. The workflow is non-destructive: the original audio file stays on disk, so you can undo if the result is unsatisfactory.

Standalone RX:
1. Export the audio clip from Pro Tools (right-click clip > Export Clips as Files).
2. Open the exported file in RX standalone and process it with any RX module.
3. Export the cleaned file.
4. Import it back into Pro Tools and replace the original clip on the timeline.

Common RX modules for post-production:
- Voice De-noise: Real-time noise reduction. Use Learn to capture the noise profile from a section of silence, then adjust the Reduction amount (6-12 dB for light noise, 12-20 dB for heavy noise). Avoid exceeding 20 dB of reduction, where artifacts become audible.
- De-hum: Removes electrical hum. In Vietnam, set it to 50Hz (with harmonics at 100, 150, and 200 Hz). Set Q narrow (10-20) for precise removal without affecting nearby frequencies.
- De-reverb: Reduces room echo. Set Reduction to 4-8 dB for moderate echo; higher settings produce watery artifacts.
- Mouth De-click: Removes lip smacks and mouth clicks. Set Sensitivity to 6-8. Higher sensitivity catches more clicks but risks removing legitimate transient consonants.
- Spectral Repair: Manual frequency repair for isolated noises (a door slam during a line, a phone ring). Select the noise in the spectrogram and click Replace. RX interpolates the surrounding audio to fill the gap.
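RX De-hum's internals are not documented here. Still, ordinary notch filters can illustrate what the fundamental, harmonics, and Q settings do. The sketch below uses the widely published RBJ Audio EQ Cookbook formulas to build one biquad notch per harmonic of 50 Hz, then evaluates each filter's gain. It shows that a Q of 15 removes the hum frequency while leaving a frequency 25 Hz away almost untouched. This is a stand-in technique for illustration, not what RX does internally.

```python
import cmath
import math


def notch_coefficients(f0, q, fs=48_000):
    """Biquad notch (RBJ Audio EQ Cookbook), normalized so that a[0] == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [1 / a0, -2 * cos_w0 / a0, 1 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def gain_at(b, a, f, fs=48_000):
    """Linear magnitude of the filter's response at frequency f (Hz)."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return abs(num / den)


def to_db(gain):
    """Linear gain to dB, floored to avoid log10(0)."""
    return 20 * math.log10(max(gain, 1e-12))


if __name__ == "__main__":
    mains_hz = 50  # mains frequency in Vietnam
    for f0 in (mains_hz * k for k in range(1, 5)):  # 50, 100, 150, 200 Hz
        b, a = notch_coefficients(f0, q=15)
        print(f"{f0} Hz notch: {to_db(gain_at(b, a, f0)):.0f} dB at {f0} Hz, "
              f"{to_db(gain_at(b, a, f0 + 25)):.1f} dB at {f0 + 25} Hz")
```

Raising Q narrows each notch further, which matches the guide's advice that a narrow Q avoids affecting nearby frequencies.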

7. Stem Delivery for Video Sync

Bouncing Stems

Stems are individual mix buses exported as separate audio files. Video editors, colorists, and broadcast engineers need stems to make final adjustments without requiring a full remix. Required stems for standard delivery:
- DX STEM: Dialogue mix (boom + lav + ADR, processed)
- FX STEM: Sound effects and foley mix
- MX STEM: Music mix
- BG STEM: Backgrounds and ambience
- M&E STEM: Music and effects only (no dialogue), required for international distribution
- FULL MIX: All stems combined (the complete mix)

Bounce procedure in Pro Tools:
1. Solo the DX BUS aux input and choose File > Bounce Mix. Use these settings:
   - Format: BWF (.WAV).
   - Sample Rate: 48kHz.
   - Bit Depth: 24-bit.
   - Channels: mono for DX, stereo for FX/MX/BG/M&E and the Full Mix.
   - File type: Interleaved (a single file for stereo).
   - Name: ProjectName_DX_STEM_4824_01.wav (see the naming convention below).
2. Repeat for each stem.
3. Bounce the Full Mix last, with all buses unsoloed.

Timecode: In every bounce, the program must start at the video's start timecode (normally 01:00:00:00), regardless of where other material sits in the session. Check the start timecode in the bounce dialog before you bounce. If it does not match, the stems will not sync when imported into the video editor.

Head and tail handles: Include 2 seconds of silence before the program start and after the program end, so a program starting at 01:00:00:00 is bounced from 00:59:58:00. This gives the video editor room for crossfades at the beginning and end of the timeline.
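With 2-second handles, the bounce range starts 2 seconds before the program's first frame and ends 2 seconds after its last. Here is a minimal sketch of that arithmetic for non-drop-frame timecode. `fps` is the timecode's integer frame count per second: 24 for 23.976, 25, or 30 for 29.97 non-drop. Drop-frame 29.97 timecode skips frame numbers and is not handled here. The program end timecode in the example is hypothetical.

```python
def tc_to_frames(tc, fps):
    """Non-drop-frame timecode 'HH:MM:SS:FF' -> total frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_tc(total, fps):
    """Total frame count -> non-drop-frame timecode 'HH:MM:SS:FF'."""
    seconds, frames = divmod(total, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def bounce_range(program_start, program_end, fps, handle_seconds=2):
    """Bounce start/end timecodes with head and tail handles added."""
    handle = handle_seconds * fps
    return (frames_to_tc(tc_to_frames(program_start, fps) - handle, fps),
            frames_to_tc(tc_to_frames(program_end, fps) + handle, fps))


if __name__ == "__main__":
    # Hypothetical 2.5-minute spot at 23.976fps (24-frame timecode base).
    print(bounce_range("01:00:00:00", "01:02:30:12", fps=24))
```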

Standard Stem Naming Convention

ProjectName_DX_STEM_4824_01.wav (Dialogue, 48kHz, 24-bit, version 1)
ProjectName_FX_STEM_4824_01.wav (Effects)
ProjectName_MX_STEM_4824_01.wav (Music)
ProjectName_BG_STEM_4824_01.wav (Backgrounds)
ProjectName_ME_STEM_4824_01.wav (Music & Effects, no dialogue)
ProjectName_FULLMIX_4824_01.wav (Complete mix)
Include a text file (ProjectName_AudioDelivery_Notes.txt) with: sample rate, bit depth, timecode start, duration, LUFS measurement, any known issues.
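Generating every name from one template keeps a delivery consistent. Below is a minimal sketch of the convention above. The stem codes and the `4824` rate/bit-depth tag come from the list. The function names and the layout of the notes file are illustrative assumptions.

```python
STEM_CODES = ["DX", "FX", "MX", "BG", "ME"]


def delivery_filenames(project, version, rate_khz=48, bits=24):
    """File names following ProjectName_<STEM>_STEM_<rate><bits>_<version>.wav."""
    tag = f"{rate_khz}{bits}"  # 48 kHz / 24-bit -> "4824"
    names = [f"{project}_{code}_STEM_{tag}_{version:02d}.wav" for code in STEM_CODES]
    names.append(f"{project}_FULLMIX_{tag}_{version:02d}.wav")
    return names


def delivery_notes(rate_hz, bits, tc_start, duration, integrated_lufs, issues=""):
    """Text for ProjectName_AudioDelivery_Notes.txt with the fields listed above."""
    return "\n".join([
        f"Sample rate: {rate_hz} Hz",
        f"Bit depth: {bits}-bit",
        f"Timecode start: {tc_start}",
        f"Duration: {duration}",
        f"Integrated loudness: {integrated_lufs} LUFS",
        f"Known issues: {issues or 'none'}",
    ])


if __name__ == "__main__":
    for name in delivery_filenames("ProjectName", 1):
        print(name)
```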

8. Hardware and Monitoring

Monitoring for Post-Production

Accurate monitoring is essential for making mixing decisions that translate across playback systems. You need two monitoring paths: nearfield monitors and headphones.

Nearfield monitors: Place them on speaker stands 1-1.5 meters from the listening position, forming an equilateral triangle with your head, with the tweeters at ear height. Recommended nearfield monitors for post: Genelec 8030C ($700/pair), Focal Solo6 Be ($1,400/each), Adam A7V ($700/each).

Calibrate each monitor to 79 dB SPL (C-weighted) using pink noise at -20 dBFS RMS and an SPL meter. 79 dB is the common reference level for small post rooms and television/streaming mixes; theatrical film mixing stages calibrate to 85 dB SPL.

Headphones: Use closed-back headphones for dialogue editing and noise assessment. Recommended: Sony MDR-7506 ($100), Beyerdynamic DT 770 Pro ($150), Audio-Technica ATH-M50x ($150). Check your mix on headphones after every major session. Headphones reveal problems at the edges of the frequency spectrum that monitors miss.

Subwoofer: Add a subwoofer (Genelec 7050C, $1,200) for LFE monitoring if your content includes significant low-frequency material (explosions, bass music, vehicle sounds). By the film standard, the LFE channel plays back with +10 dB of in-band gain relative to the main channels. Calibrate the subwoofer so an LFE signal reads 10 dB louder in-band than the same signal on a main channel.

Room treatment: Even the best monitors sound wrong in an untreated room. Add bass traps in the corners (first priority), absorption panels at the first reflection points on the side walls, and a diffuser panel on the rear wall. Budget $500-1,500 for acoustic treatment of a small room. Without treatment, your mixing decisions respond to the room's resonances, not to the audio.
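The equilateral-triangle rule fixes the geometry. Each monitor sits 30 degrees off the centre line, the two monitors are as far apart as each is from your head, and each is toed in 30 degrees to aim at the listening position. The small sketch below turns a chosen listening distance into floor-plan coordinates, with the listener at the origin facing +y.

```python
import math


def monitor_positions(distance_m):
    """(x, y) positions in metres of the left and right monitors for an equilateral triangle."""
    angle = math.radians(30)  # each monitor is 30 degrees off the centre line
    x = distance_m * math.sin(angle)
    y = distance_m * math.cos(angle)
    return (-x, y), (x, y)


if __name__ == "__main__":
    for d in (1.0, 1.2, 1.5):  # the 1-1.5 m range suggested above
        left, right = monitor_positions(d)
        spacing = right[0] - left[0]
        print(f"{d} m: monitors {spacing:.2f} m apart, {right[1]:.2f} m in front of you")
```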

9. Pro Tools Shortcuts for Audio Post

Essential Post-Production Shortcuts (Mac / Windows)

Action	Mac	Windows
New session	Cmd+N	Ctrl+N
Import audio	Cmd+Shift+I	Ctrl+Shift+I
Bounce mix	Opt+Cmd+B	Alt+Ctrl+B
Separate clip at selection	Cmd+E	Ctrl+E
Trim to selection	Cmd+T	Ctrl+T
Consolidate clip	Opt+Shift+3	Alt+Shift+3
Fade/crossfade with default shape	F (commands key focus)	F (commands key focus)
Fades dialog (fade or crossfade)	Cmd+F	Ctrl+F
Zoom to selection	Opt+Cmd+[	Alt+Ctrl+[
Toggle solo on selected tracks	Shift+S	Shift+S
Toggle mute on selected tracks	Shift+M	Shift+M
Edit modes (Shuffle/Slip/Spot/Grid)	F1/F2/F3/F4	F1/F2/F3/F4

10. Honest Limitations of Pro Tools for Audio Post

What Pro Tools Does Not Do Well

ARA (Audio Random Access) support. Pro Tools added ARA 2 support only in version 2022.9, starting with Celemony Melodyne, and its ARA integration remains more limited than Nuendo's, which has supported ARA natively for years. iZotope RX has no ARA path in Pro Tools. You must use the RX Connect roundtrip or offline AudioSuite rendering, which is slower than working with direct audio access.
Video playback quality. Pro Tools includes a basic video engine that plays reference video alongside audio. It does not support high-bitrate or high-resolution video well. Large ProRes files cause dropped frames and stuttering. Transcode reference video to H.264 at 1080p for smooth playback in Pro Tools.
Spatial audio mixing. Pro Tools supports 7.1.4 Dolby Atmos mixing with the Dolby Atmos Renderer, but the workflow is clunky compared to Nuendo's integrated Atmos panner. For serious Atmos mixing, consider Nuendo or DaVinci Resolve Fairlight.
Collaboration. Pro Tools projects do not support real-time multi-user editing. Cloud collaboration exists (Pro Tools | Cloud Collaboration) but is unreliable for professional deadlines. Nuendo and Reaper handle multi-user workflows more effectively.
Price. Pro Tools Studio (the tier needed for post-production features) costs $39.92/month or $399/year. It is sold as a recurring subscription rather than a one-time purchase. Reaper provides 90% of the post-production functionality for a one-time $60 license. The remaining 10% (HDX hardware support, certain broadcast deliverables) matters only in high-end facility environments.


Pro Tools Audio Post-Production FAQ

What sample rate should I use for audio post-production?

48kHz at 24-bit. This is the standard for broadcast and streaming delivery to Netflix, Amazon, YouTube, and broadcasters working to the EBU R128 or ATSC A/85 loudness specs. Use 96kHz only if the delivery spec explicitly requires it. Use BWF (.WAV) format for timecode metadata compatibility with video editors.

How do I integrate iZotope RX with Pro Tools?

Select the clip you want to process and open the RX Connect AudioSuite plugin to send the audio to the RX editor. Process it with De-noise, De-hum, De-reverb, or other modules, click Send Back in RX, then click Render in Pro Tools to write the processed audio back to the timeline. This roundtrip workflow is non-destructive and preserves the original audio file.

What LUFS level should my final mix target?

It depends on the platform. YouTube: -14 LUFS. Netflix: -27 LUFS (dialogue-normalized). Broadcast (Europe): -23 LUFS. Broadcast (US): -24 LKFS. Always check the specific platform's delivery specification before mastering. Use a LUFS meter plugin on the master bus and measure integrated loudness across the entire program.

What audio stems do I need to deliver for video production?

Six stems minimum: DX (dialogue), FX (sound effects), MX (music), BG (backgrounds), M&E (music and effects without dialogue, for international distribution), and Full Mix (complete mix). All at 48kHz/24-bit BWF, starting at the same timecode as the video, with 2-second handles.

Pro Tools or DaVinci Resolve Fairlight for audio post?

Pro Tools if you work in a facility that uses it (most do) or need to exchange sessions with other studios. Fairlight if you are already in the Resolve ecosystem and want to avoid the Pro Tools subscription cost. Fairlight has improved significantly and handles most post-production tasks, but Pro Tools remains the industry standard for session interchange.

How do I remove 50Hz electrical hum from audio recorded in Vietnam?

Use iZotope RX De-hum module. Set the fundamental frequency to 50Hz (Vietnam uses 50Hz power, unlike 60Hz in the US). Enable harmonics up to 10x (500Hz). Set Q to 10-20 for narrow notch filtering. Start with 6-8 dB reduction and increase if needed. Avoid over-processing — too much De-hum makes dialogue sound thin and phasey.

Need Professional Audio Post-Production?

Our Pro Tools engineers handle dialogue editing, ADR, foley, sound design, mixing, and loudness-compliant stem delivery for music videos, commercials, short films, and long-form content. Free audio consultation within 24 hours.
