COMPLETE GUIDE

iZotope RX Dialogue Cleanup: Settings That Actually Work

Module-by-module walkthrough of dialogue cleanup in iZotope RX: Voice De-noise, De-hum (50Hz Vietnam power grid), De-reverb, Mouth De-click, Spectral Repair, and De-wind. Specific settings for real noise problems recorded on location in Vietnam — aircon rumble, motorbike traffic, monsoon wind, tile-room echo. Not theoretical. Tested on 50+ projects.

iZotope RX is one of the most widely used tools for dialogue cleanup in film, television, advertising, and online video production. Few other applications combine spectral editing, machine-learning noise reduction, and targeted repair modules in a single package. This guide covers each module relevant to dialogue cleanup with specific starting settings, honest limitations, and real examples from production work in Vietnam. The settings here assume RX 10 or RX 11; the module interfaces are largely consistent between these versions.

1. Voice De-noise

Adaptive vs Manual Noise Reduction

Voice De-noise removes broadband noise (hiss, aircon, fan noise, ambient room noise) from dialogue recordings. It offers two modes: Learn and Adaptive.

Learn mode: Select a section of pure noise (no dialogue, just the background) and click Learn. RX captures a noise profile, a spectral snapshot of the noise floor. Then process the entire clip. The module subtracts the noise profile from the audio, leaving dialogue. This is the most reliable method because you control exactly what gets removed.

Adaptive mode: The module continuously analyzes the audio and separates noise from dialogue in real time. No noise profile is needed. Adaptive is faster but less precise: it can misinterpret quiet dialogue as noise and remove it. Use Adaptive only when there is no clean noise-only section to learn from.

Settings for common scenarios:

Light aircon noise (hotel conference room, office):
  • Mode: Learn (capture 2-3 seconds of aircon-only sound between dialogue lines)
  • Reduction: 6-10 dB
  • Noise profile: captures the low-frequency rumble (60-200 Hz) plus broadband hiss
  • Learn time: 2-3 seconds minimum
  • Result: dialogue sounds natural, aircon fades to inaudible. No artifacts at these levels.

Moderate aircon + fan noise (Vietnam cafe, restaurant kitchen area):
  • Mode: Learn
  • Reduction: 10-16 dB
  • The noise profile captures both the low-frequency component and the mid-range fan whir (800-2000 Hz)
  • Potential artifacts: slight thinning of dialogue above 10 dB reduction. Compensate with a gentle high-shelf boost (1-2 dB at 8 kHz) after De-noise.

Heavy background noise (street recording, motorbike traffic in Hanoi/Ho Chi Minh City):
  • Mode: Adaptive (traffic noise changes constantly, so a static noise profile cannot capture the variation)
  • Reduction: 12-18 dB
  • Adaptive speed: Medium (a faster speed tracks noise changes but risks pulling dialogue with it)
  • Result: significant noise reduction, but some residual noise remains. Heavy traffic cannot be fully removed without artifacts. Expect 60-80% improvement, not 100%.

Honest limitation: above 18-20 dB of reduction, Voice De-noise introduces audible artifacts, a watery, phasey quality on dialogue that sounds unnatural. This is true for all noise reduction algorithms, not just RX. If you need more than 18 dB of reduction, the recording is too noisy for dialogue and should be re-recorded (ADR) if possible.

Voice De-noise Best Practices

  • Always learn the noise profile from the quietest section of the recording — not the loudest noise. The quietest section contains the consistent noise floor without transient contamination.
  • Process in multiple light passes instead of one heavy pass. Two passes at 6-8 dB reduction sound better than one pass at 14 dB. The artifacts compound less aggressively with lighter processing.
  • Listen to the Difference signal (toggle in the module panel) — this plays what RX is removing. If you hear dialogue in the Difference signal, you are removing too much. Reduce the amount.
  • Use the Frequency range control to limit De-noise to specific bands. If noise is only below 500 Hz, set the high frequency limit to 500 Hz. This protects dialogue clarity above 500 Hz from unnecessary processing.
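  • When broadband noise is heavy, apply De-noise before De-hum. De-noise may partially reduce hum as broadband noise, making the De-hum module's job easier and reducing the risk of musical noise artifacts from aggressive De-hum settings. This is an exception to the general processing order in section 8, which runs De-hum first.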
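
The dB figures above are logarithmic, which is why the reduction from successive passes adds up. The sketch below shows only that arithmetic. The function names are illustrative, nothing here calls RX, and treating each pass as a plain gain change on the noise floor is a simplification: it says how much total reduction is being requested, not how the result will sound.

```python
import math

def db_to_gain(reduction_db: float) -> float:
    """Convert a reduction in dB to a linear amplitude multiplier."""
    return 10 ** (-reduction_db / 20)

def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier back to a reduction in dB."""
    return -20 * math.log10(gain)

def total_reduction_db(passes_db):
    """Total reduction of several passes: gains multiply, so dB values add."""
    gain = 1.0
    for reduction in passes_db:
        gain *= db_to_gain(reduction)
    return gain_to_db(gain)

# Two light passes of 7 dB request the same total as one 14 dB pass.
print(round(total_reduction_db([7, 7]), 1))   # 14.0
# An 18 dB reduction leaves roughly one eighth of the noise amplitude.
print(round(db_to_gain(18), 3))               # 0.126
```

The totals match on paper, so any audible advantage of two light passes comes from how the algorithm behaves at gentler settings, not from a different amount of reduction.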

2. De-hum: 50Hz Vietnam Power Grid

Removing 50Hz Electrical Hum

Vietnam's power grid operates at 50Hz (like Europe and most of Asia, unlike the 60Hz used in the US, Canada, and parts of South America). Electrical hum from grounding problems, cheap power supplies, and nearby electrical equipment introduces a 50Hz fundamental plus harmonics at 100, 150, 200, 250, 300 Hz and beyond.

De-hum module settings for Vietnam:
  • Fundamental Frequency: 50 Hz
  • Filter Type: Notch (default and recommended)
  • Number of Harmonics: 10 (covers the harmonic series up to 500 Hz)
  • Filter Q: 10-20 (a narrow notch that removes the hum frequency without affecting surrounding audio)
  • Reduction Amount: start at 6 dB, increase to 12 dB if hum is still audible

Ground loop hum (constant, present in the entire recording):
  • This is the easiest hum to remove because it is consistent.
  • Reduction: 8-12 dB usually eliminates it completely.
  • Q: 15-20 for precise removal.
  • Result: hum gone, no audible side effects at these settings.

Intermittent hum (equipment cycling on and off, refrigerator compressor):
  • Learn mode: capture the hum profile during a loud section.
  • Adaptive mode: enables the module to track hum amplitude changes.
  • Reduction: 6-10 dB (conservative, because the hum is not constant and aggressive settings remove too much during quiet periods).
  • Result: hum reduced, but it may not be fully eliminated during loud bursts. Manual repair with the Spectral Repair module on the worst sections may be needed as a follow-up.

Honest limitation: De-hum cannot fix a ground loop that saturates the recording. If the hum is so loud that it distorts the audio waveform (visible clipping in the waveform display), the damage is permanent. De-hum removes the hum frequency but cannot reconstruct the dialogue that was distorted by the hum.
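
To see which frequencies a 50 Hz setting with 10 harmonics touches, the short sketch below lists the notch centers and an approximate width for each. It assumes the harmonic count includes the fundamental (consistent with the series ending at 500 Hz) and uses the textbook relation Q = center frequency / bandwidth. RX may define its Q control differently, so read the widths as rough guidance only. The function name `hum_notches` is hypothetical.

```python
def hum_notches(fundamental_hz: float, count: int, q: float):
    """Return (center_hz, bandwidth_hz) for each notch, fundamental included.

    Assumes Q = center / bandwidth, the textbook definition; the Q control
    in RX may be scaled differently.
    """
    return [
        (fundamental_hz * k, fundamental_hz * k / q)
        for k in range(1, count + 1)
    ]

# 50 Hz mains, 10 notches, Q of 15 (middle of the suggested range).
for center, width in hum_notches(50.0, 10, 15.0):
    print(f"{center:6.1f} Hz  notch width ~{width:5.2f} Hz")
```

With a constant Q the notches get wider in Hz as the harmonics climb, which is one reason a higher Q is the cautious choice when many harmonics are enabled.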

De-hum Troubleshooting: When Hum Persists

If De-hum does not fully remove the hum, check these:
1. Is the fundamental actually 50Hz? Some equipment introduces hum at slightly offset frequencies (49.8 Hz, 50.3 Hz). Set the Fundamental Frequency control to match exactly. Use the Spectrum Analyzer (Window > Spectrum Analyzer) to identify the precise peak frequency.
2. Are harmonics above 500 Hz contributing? Increase Number of Harmonics (to 15 or 20, or as high as your version of the module allows) to capture higher harmonic content.
3. Is the hum narrowband, or is there also a broadband noise floor? If the recording has significant broadband noise (hiss, fan), De-hum alone will not fix it. Run Voice De-noise first to lower the noise floor, then De-hum.
4. Is the recording 24-bit or 16-bit? 16-bit recordings have a higher theoretical noise floor (about -96 dBFS versus about -144 dBFS for 24-bit). Hum can be harder to separate cleanly in 16-bit recordings and may require more aggressive De-hum settings.
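
Finding an offset fundamental is normally done by eye in the Spectrum Analyzer. For readers who want to see the idea behind that reading, here is a minimal, standard-library Python sketch that scans a narrow band around 50 Hz and reports the strongest frequency. The signal is synthetic, the 50.3 Hz offset is made up for the demonstration, and the function names are hypothetical. This is not how RX measures hum internally.

```python
import math

def tone_strength(samples, sample_rate, freq_hz):
    """Magnitude of the correlation between the signal and one test frequency."""
    re = im = 0.0
    step = 2 * math.pi * freq_hz / sample_rate
    for n, x in enumerate(samples):
        re += x * math.cos(step * n)
        im += x * math.sin(step * n)
    return math.hypot(re, im)

def find_hum_peak(samples, sample_rate, lo_hz=49.0, hi_hz=51.0, step_hz=0.1):
    """Scan a narrow band and return the frequency with the strongest response."""
    steps = int(round((hi_hz - lo_hz) / step_hz))
    candidates = [round(lo_hz + i * step_hz, 3) for i in range(steps + 1)]
    return max(candidates, key=lambda f: tone_strength(samples, sample_rate, f))

# Synthetic test signal: 4 seconds of hum at 50.3 Hz, sampled at 2 kHz.
rate = 2000
hum = [0.1 * math.sin(2 * math.pi * 50.3 * n / rate) for n in range(4 * rate)]
print(find_hum_peak(hum, rate))   # 50.3
```

A longer analysis window sharpens the peak, so when checking a real recording, measure across several seconds of hum-only audio if they are available.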

3. De-reverb

Reducing Room Echo and Reverb

De-reverb reduces the reflections and tail of room reverb on dialogue recordings. It cannot remove reverb completely, but it can reduce it by 4-10 dB, making dialogue sound closer and more present.

When De-reverb works well: moderate reverb from hard surfaces (tile floors, concrete walls, glass windows). This is the most common reverb problem in Vietnamese interiors. Modern apartments, hotel lobbies, restaurants, and offices often have tile or marble floors with minimal soft furnishings.

Settings for tile/marble room echo (common in Vietnam):
  • Reduction: 6-8 dB
  • Reverb Profile: Auto (RX analyzes the reverb tail automatically)
  • Enhancement: 0-2 dB (adds presence to dialogue that was softened by the reverb)
  • Result: dialogue sounds noticeably closer and more direct. Echo is reduced but not eliminated. This is a good result, so accept it.

Settings for church/hall reverb (high ceilings, long reverb tail):
  • Reduction: 4-6 dB
  • Reverb Profile: Auto
  • Enhancement: 2-3 dB
  • Result: moderate improvement. Long reverb tails are mathematically intertwined with the direct dialogue. Aggressive De-reverb on long reverb produces hollow, metallic-sounding dialogue.

When De-reverb fails:
  • Reverb with a very short tail (under 200ms): there is not enough reverb tail for the algorithm to analyze.
  • Very long reverb (over 3 seconds): the reverb energy exceeds the direct sound energy, and the algorithm cannot separate them.

Honest limitation: De-reverb at any setting above 10 dB reduction produces a hollow, phasey artifact on dialogue. The voice starts sounding like it was recorded through a metal pipe. This is the hard limit of current De-reverb technology. If your dialogue needs more than 10 dB of reverb reduction, ADR is the better solution.

4. Mouth De-click

Removing Lip Smacks and Mouth Clicks

Mouth De-click removes lip smacks, tongue clicks, and wet mouth sounds from dialogue recordings. These sounds are caused by saliva and are particularly problematic in close-mic recordings (lavalier microphones) and with dry-mouthed speakers.

Settings:
  • Sensitivity: 6 (default starting point). Range 1-10. Higher sensitivity detects more clicks but risks flagging legitimate consonant sounds (T, K, D) as clicks and softening them.
  • Click Widening: 1.0ms (default). Controls how much audio around each detected click is processed. Increase to 2.0ms for loud, broad mouth clicks. Decrease to 0.5ms for subtle clicks.
  • Frequency Skew: 0 (neutral). Positive values prioritize high-frequency clicks, negative values prioritize low-frequency smacks.

Workflow: Run Mouth De-click on the entire dialogue track at Sensitivity 6. Listen to the result. If mouth clicks remain, increase to 7 and re-run on the specific sections. If legitimate consonants sound softened, decrease to 5.

Pro tip: Mouth De-click works best after Voice De-noise. Broadband noise interferes with click detection, because the algorithm cannot distinguish between noise transients and mouth click transients when the noise floor is high.
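
Click Widening values are easier to judge once they are converted to samples. The sketch below does that conversion for a 48 kHz file. It is plain arithmetic with a hypothetical function name, and it does not assume anything about whether RX applies the widening to one side of the click or both.

```python
def widening_in_samples(widening_ms: float, sample_rate: int) -> int:
    """Number of samples covered by a widening value given in milliseconds."""
    return round(widening_ms * sample_rate / 1000)

# The three widening values discussed above, at a 48 kHz sample rate.
for ms in (0.5, 1.0, 2.0):
    print(ms, "ms at 48 kHz =", widening_in_samples(ms, 48000), "samples")
```

Even the widest setting covers fewer than a hundred samples at 48 kHz, which is why small changes to this control mostly affect how cleanly a click is caught, not how much dialogue is touched.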

5. Spectral Repair

Manual Frequency Repair for Isolated Noises

Spectral Repair is a manual tool for removing isolated noises that are too short or too specific for the automatic modules. It works in the spectrogram view: you select the offending frequency range and time range, and RX interpolates the surrounding audio to fill the gap.

Common uses:
  • Door slam during a dialogue line
  • Phone ring or notification sound in the background
  • Single cough or sneeze from someone off-mic
  • Bird call or insect buzz during an outdoor take
  • Equipment beep or alarm

How to use:
1. Open the spectrogram view (Tab > Spectrogram).
2. Zoom to the offending sound. Adjust frequency zoom to see the noise clearly.
3. Select the noise with the Time-Frequency Selection tool (R key). Select only the noise, not the dialogue around it.
4. Open the Spectral Repair module.
5. Choose a mode: Replace (fills the selection with interpolated audio from surrounding time/frequency data) or Pattern (replaces with a pattern matched from adjacent regions).
6. Click Render.

Settings:
  • Replace mode with default interpolation settings handles 90% of spectral repair needs.
  • Pattern mode works better for sustained tones (phone ring, alarm) because Replace mode can produce a gap in sustained frequencies.
  • Band Limiting: set the frequency range to match the noise. A door slam occupies 0-500 Hz. A phone ring occupies 800-2000 Hz. Limiting the repair to these frequencies protects the rest of the spectrum.

Honest limitation: Spectral Repair cannot fix noise that overlaps dialogue in the same frequency range and time. A door slam often shares the low-frequency range of the dialogue, so removing the slam also removes that portion of the dialogue. In these cases, Spectral Repair can reduce the noise but cannot eliminate it without damaging dialogue.

6. De-wind

Removing Wind Noise from Outdoor Dialogue

De-wind removes low-frequency wind rumble from dialogue recordings. Wind hitting the microphone diaphragm produces broadband low-frequency noise (0-300 Hz) that varies in amplitude as gusts change. This is a common problem for outdoor shoots in Vietnam, particularly during the monsoon season (July-September) and on coastal locations (Da Nang beach, Hoi An waterfront).

Settings:
  • Strength: 3-6 (default starting point). Range 1-10. Higher strength removes more wind but risks thinning dialogue fundamentals.
  • Cutoff Frequency: 200 Hz (default). Wind noise is predominantly below 200 Hz. Increase to 300 Hz if wind rumble extends higher.
  • Adaptation: Medium. Controls how quickly the module tracks wind gust amplitude changes.

Coastal wind (Da Nang beach, steady breeze with gusts):
  • Strength: 4-6
  • Cutoff: 200 Hz
  • Result: wind rumble significantly reduced. Dialogue sounds more present. Some low-frequency body may be lost; compensate with a gentle low-shelf boost (1-2 dB at 80 Hz) after De-wind.

Monsoon wind (strong, gusty, with rain):
  • Strength: 7-9
  • Cutoff: 300 Hz
  • Result: wind reduced, but dialogue sounds thin below 300 Hz. This is a difficult scenario because monsoon wind is aggressive and unpredictable. Consider ADR for the worst sections.

Honest limitation: De-wind cannot fix wind that clipped the recording. If wind gusts overloaded the microphone preamp (visible as flat-topped waveform peaks), the damage is permanent. Use a windscreen (deadcat) on all outdoor microphones to prevent this problem at the source.
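
Flat-topped peaks can also be found numerically, which helps when triaging a long outdoor take before deciding what is worth processing. The following is a minimal sketch that flags runs of consecutive samples sitting at or near full scale. The threshold and minimum run length are illustrative values chosen for the example, not RX defaults, and the function name is hypothetical.

```python
def find_clipped_runs(samples, threshold=0.999, min_run=3):
    """Return (start, length) for runs of consecutive samples at or above threshold.

    Samples are floats in the range -1.0 to 1.0. The threshold and minimum
    run length are illustrative values, not RX defaults.
    """
    runs = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i          # a new run of full-scale samples begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that continues to the end of the signal.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs

# A gust that flattens four samples at full scale, then normal audio.
audio = [0.2, 0.7, 1.0, 1.0, 1.0, 1.0, 0.6, -0.3, -1.0, 0.1]
print(find_clipped_runs(audio))   # [(2, 4)]
```

The single full-scale sample near the end is ignored because one sample at the limit is a normal peak, while several in a row indicate a flattened waveform.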

7. Real Examples from Vietnam Production

Example 1: Hotel Interview with Aircon and 50Hz Hum

Problem: Interview recorded in a Da Nang hotel conference room. Aircon unit produces constant low-frequency rumble. Electrical grounding issue introduces 50Hz hum at -35 dBFS. Tile floor creates moderate reverb on the subject's voice. Lavalier mic picks up mouth clicks.

Processing chain:
1. Voice De-noise: Learn from a 2-second gap between sentences. Reduction: 10 dB. Removes aircon broadband noise.
2. De-hum: Fundamental 50Hz, 10 harmonics, Q=15, Reduction: 8 dB. Removes electrical hum completely.
3. De-reverb: Reduction: 6 dB, Enhancement: 1 dB. Reduces tile-floor echo to an acceptable level.
4. Mouth De-click: Sensitivity: 6. Removes 90% of mouth clicks.

Result: dialogue is clean, present, and natural-sounding. Residual room tone is barely audible. Total processing time: 2 minutes per minute of audio.

Example 2: Street Vlog with Motorbike Traffic

Problem: Presenter speaking directly to camera on a Da Nang street. Motorbikes pass every 3-5 seconds, producing broadband noise in the 200-4000 Hz range. Individual motorbike passes cannot be removed because they overlap with dialogue in frequency and time. Constant low-level traffic noise.

Processing chain:
1. Voice De-noise: Adaptive mode. Reduction: 14 dB. Reduces constant traffic bed noise significantly.
2. Spectral Repair: Manual removal of two isolated horn honks that occurred between words (selected in the spectrogram, replaced with interpolation).
3. De-hum: Fundamental 50Hz, Reduction: 6 dB. Removes hum from a nearby electrical transformer.

Result: presenter's voice is significantly clearer. Motorbike passes are reduced but still audible; they cannot be fully removed without destroying dialogue. Recommend re-recording the voiceover in a studio and syncing to picture for a clean final result. This is the honest answer for noisy street recordings.

Example 3: Beach Wedding with Coastal Wind

Problem: Wedding ceremony on My Khe Beach, Da Nang. Steady coastal wind producing low-frequency rumble on the officiant's lapel mic. Wind gusts causing momentary overload on 3 occasions. Guest background chatter during quiet passages.

Processing chain:
1. De-wind: Strength: 5, Cutoff: 200 Hz. Reduces steady wind rumble effectively.
2. Voice De-noise: Learn from a quiet passage. Reduction: 8 dB. Reduces guest chatter bed noise.
3. Spectral Repair: Manual repair of 3 clipped gust moments, selected and replaced with interpolation. Some dialogue loss in those 3 moments is unavoidable.
4. De-reverb: Reduction: 4 dB. Beach has surprisingly little reverb (sand absorbs sound), but the open-air environment adds a slight diffuse quality.

Result: ceremony audio is usable. The 3 clipped moments have slight quality loss, but the words are still intelligible. Wind is reduced to a barely audible bed. For a highlight film, this is acceptable. For a documentary, re-record the officiant's key vows as ADR.

8. Processing Order and Workflow

Recommended Module Processing Order

The order in which you apply RX modules matters. Earlier modules can interfere with later modules if applied in the wrong order.

Recommended processing chain:
1. Spectral Repair: fix isolated problems first (door slams, phone rings, equipment clicks) so they do not confuse the automatic modules.
2. De-wind: remove low-frequency wind rumble that affects the noise floor for De-noise.
3. De-hum: remove electrical hum. Narrow, specific frequencies. Best done before broadband noise reduction.
4. Voice De-noise: reduce the broadband noise floor. Now that hum and wind are already addressed, De-noise can focus on the remaining noise.
5. De-reverb: reduce room echo. De-reverb works better on clean dialogue than on noisy dialogue.
6. Mouth De-click: remove mouth clicks last, when the noise floor is at its lowest and click detection is most accurate.

This order is a starting point. Adjust based on the specific problems in your recording. If the recording has no wind noise, skip De-wind. If the recording has no electrical hum, skip De-hum. Never apply a module to fix a problem that does not exist, because every processing step introduces a small quality cost.
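
The recommended order and the rule of skipping unneeded modules can be written down as a small planning helper. The Python sketch below is only a checklist generator: the problem labels and the `build_chain` function are made up for this example, and nothing in it controls RX.

```python
# Modules in the recommended order, each paired with the problem it addresses.
RECOMMENDED_ORDER = [
    ("Spectral Repair", "isolated noises"),
    ("De-wind", "wind rumble"),
    ("De-hum", "electrical hum"),
    ("Voice De-noise", "broadband noise"),
    ("De-reverb", "room echo"),
    ("Mouth De-click", "mouth clicks"),
]

def build_chain(problems):
    """Return the modules to run, in the recommended order, for the given problems."""
    known = {problem for _, problem in RECOMMENDED_ORDER}
    unknown = set(problems) - known
    if unknown:
        raise ValueError(f"no module listed for: {sorted(unknown)}")
    # Keep the fixed order, but drop every module whose problem is absent.
    return [module for module, problem in RECOMMENDED_ORDER if problem in problems]

print(build_chain({"room echo", "electrical hum", "broadband noise"}))
# ['De-hum', 'Voice De-noise', 'De-reverb']
```

Listing the problems you actually hear before opening any module keeps the chain short, which matters because each step carries a quality cost.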

RX Workflow Best Practices

  • Work on short sections (10-30 seconds) at a time, not the entire file. This lets you A/B compare before/after and adjust settings per section.
  • Always keep the original file. RX modifies the audio in place by default. Use File > Save As to create a new file, keeping the original untouched.
  • Use the History panel (Window > History) to undo any processing step. RX keeps a full undo history, so you can revert to any point in the processing chain.
  • Render and listen after each module. Do not stack 4 modules and render all at once — you cannot identify which module caused a problem if artifacts appear.
  • Compare against the original frequently. After 3-4 modules of processing, ear fatigue sets in and you lose perspective. Take a 5-minute break, then compare your processed audio against the original.
  • Export a Difference track. After processing, export the removed noise (Module > Output: Difference). Listen to it. If you hear dialogue in the difference track, you have removed too much.
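
Conceptually, a difference track is the original minus the processed audio, sample by sample. The sketch below illustrates that idea on a few made-up sample values and reports the level of what was removed. It assumes both signals are time-aligned and the same length, uses hypothetical function names, and is not a description of how RX computes its Difference output.

```python
import math

def difference_signal(original, processed):
    """Sample-by-sample difference: what the processing took out."""
    if len(original) != len(processed):
        raise ValueError("signals must have the same length")
    return [o - p for o, p in zip(original, processed)]

def rms_dbfs(samples):
    """RMS level in dBFS for float samples in the range -1.0 to 1.0."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return float("-inf") if mean_square == 0 else 10 * math.log10(mean_square)

# Made-up sample values standing in for a short clip before and after cleanup.
original = [0.50, -0.40, 0.30, -0.20]
processed = [0.45, -0.38, 0.28, -0.15]
removed = difference_signal(original, processed)
print([round(x, 2) for x in removed])   # [0.05, -0.02, 0.02, -0.05]
print(round(rms_dbfs(removed), 1))      # -28.4
```

A level reading alone does not tell you whether dialogue was removed, so it complements listening to the difference track and does not replace it.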

iZotope RX Dialogue Cleanup FAQ

How much noise reduction is too much in iZotope RX?
Above 18-20 dB of reduction in Voice De-noise, audible artifacts appear — a watery, phasey quality on dialogue. Stay under 12 dB for clean results. If you need more than 18 dB of reduction, the recording is too noisy and ADR should be considered. Multiple light passes (6-8 dB each) sound better than one heavy pass.
How do I remove 50Hz hum from audio recorded in Vietnam?
Use the De-hum module. Set Fundamental Frequency to 50Hz (Vietnam uses 50Hz power). Enable 10 harmonics. Set Q to 10-20 for narrow notch filtering. Start with 6 dB reduction and increase to 12 dB if needed. If hum persists, use the Spectrum Analyzer to check if the actual frequency is offset from exactly 50Hz.
Can De-reverb completely remove room echo from dialogue?
No. De-reverb can reduce room echo by 4-10 dB, making dialogue sound closer and more present. Complete removal is not possible with current technology. Above 10 dB reduction, dialogue starts sounding hollow and metallic. For severe reverb problems, ADR is the practical solution.
What is the correct processing order for RX modules?
Recommended order: Spectral Repair (fix isolated problems first), De-wind (remove wind rumble), De-hum (remove electrical hum), Voice De-noise (reduce broadband noise), De-reverb (reduce room echo), Mouth De-click (remove mouth clicks last when noise floor is lowest). Adjust based on the specific problems in your recording.
Can iZotope RX fix clipped or distorted audio?
The De-clip module can repair mild clipping (a few consecutive flat-topped samples) by interpolating the missing waveform peaks. It cannot fix severe clipping where large portions of the waveform are squared off. For moderate clipping, set Threshold to just above the clip level and De-clip will reconstruct the peaks. Results vary — test on your specific audio.
How do I use RX with Pro Tools?
Use the RX Connect plugin on the audio track in Pro Tools. Click the plugin to send audio to the RX editor, process it, then click Send Back to return the processed audio. This is non-destructive. Alternatively, export clips from Pro Tools, process in standalone RX, and re-import. The plugin method is faster for most workflows.

Need Professional Dialogue Cleanup?

Our audio engineers clean dialogue recorded in Vietnam's challenging environments — aircon rumble, motorbike traffic, monsoon wind, tile-room echo. We use iZotope RX, Pro Tools, and 15 years of location recording experience. Send us a 30-second sample for a free assessment.