Tools and techniques for removing unwanted noise from vocal recordings
by Richie Nieto
One of the biggest differences between film and documentary sound versus animation and video game sound is that, usually, in films and documentaries, the recording environments are not fully controlled and are often chaotic. When shooting a scene in the middle of a busy street intersection, for instance, the recorded audio will contain much more than just the voice of the subjects being filmed. Even in a closed filming environment on a “quiet” set, there is a lot of ambient noise that will end up in the dialogue tracks.
Traditionally, in the film world, dialogue lines that are unusable due to poor sound quality are replaced in a recording studio, in a process called ADR (Automated Dialogue Replacement). Documentaries don’t have the same luxury, as interview answers are not scripted and the interviewed subjects are not actors (and wouldn’t easily be able to duplicate their own previously spoken words accurately in the studio). There are also issues of budget limitations – ADR is expensive, and most small-budget productions can’t afford to replace every line that needs to be replaced.
So, the next best option is to clean up what is already recorded, as best as we can. I’ll explain some of the tips and techniques to ensure that you get the most out of the material you have. As a brief disclaimer, keep in mind that some of these techniques are divided up between dialogue editors and re-recording mixers on most professional-level projects, so if you’re not mixing, make sure to consult with your mixer before doing any kind of processing.
As an example of bad-sounding dialogue, we have the following clip:
This clip has a number of problems (aside from the poor performance by yours truly). There is hum and hiss in the background and, due to improper microphone technique, there are loud pops from air hitting the diaphragm too hard, and the voice sounds very boomy. This would be an immediate candidate for ADR. However, we will assume the budget doesn’t allow for it to be replaced, or the actor is not available. I’ve had situations where the actor just doesn’t want to come into the studio to do ADR, even though it’s in their contract, and no amount of legal threats will convince them otherwise, so the only course of action in those cases has been to make the bad-sounding lines good enough to pass a network’s quality control.
Okay, so let’s get to it. The first step is to filter out some of the boominess with an EQ plug-in. For this example, I’m just using one of the stock plug-ins in Pro Tools. All of the processing here is file-based, as opposed to real-time processing, mostly to be able to show how each step affects the clip.
By listening and a bit of experimenting, we can hear that there is a lot of bass around 100 Hz in our audio file. Here’s how the clip sounds after removing some of the offending low-frequency content:
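In code terms, a low-end cut like this can be sketched as a simple high-pass filter. This is only an illustration, not what the stock EQ plug-in does internally: the first-order filter, the 200 Hz cutoff, and the helper names (`high_pass`, `sine`, `steady_peak`) are all assumptions chosen for the example.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate):
    """First-order (gentle, 6 dB/octave) high-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def sine(freq_hz, sample_rate, seconds=0.5):
    """Full-scale test tone."""
    count = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

def steady_peak(samples):
    """Peak level of the second half, skipping the filter's start-up transient."""
    tail = samples[len(samples) // 2:]
    return max(abs(s) for s in tail)

SAMPLE_RATE = 48000
CUTOFF_HZ = 200.0  # hypothetical setting; in practice you tune this by ear

# A 100 Hz tone stands in for the boom, a 1 kHz tone for the body of the voice.
boom_after = steady_peak(high_pass(sine(100.0, SAMPLE_RATE), CUTOFF_HZ, SAMPLE_RATE))
voice_after = steady_peak(high_pass(sine(1000.0, SAMPLE_RATE), CUTOFF_HZ, SAMPLE_RATE))
print(f"100 Hz level after filter: {boom_after:.2f}")
print(f"1 kHz level after filter:  {voice_after:.2f}")
```

The boomy tone comes out at less than half its original level while the 1 kHz tone is almost untouched. A real EQ would normally use a steeper or more targeted curve.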
Next, we’ll use a noise reduction plug-in to get rid of some of the constant background noise. There are plenty of other options in the market, but I’ll use Waves’ X-Noise for this example. The trick here is to not go overboard; if you start to hear the voice breaking down and getting “phasey”-sounding, you need to pull back on the Threshold and the Reduction parameters. You won’t get rid of all the noise with this step, but I find it yields better results to use moderate amounts of processing in different stages instead of trying to cure the problem by using a single tool.
After having the plug-in “learn” the noise and then adjusting the Threshold, Reduction, Attack and Release parameters, we process the clip, which now will sound like this:
There is still a fair bit of background noise in there, so now we’re going to use a multiband compressor/expander to deal with it. In this particular case I’ll use Waves’ C4, but, once again, there are many equivalent plug-ins to choose from. I am just very familiar with the C4 and how it behaves with different kinds of sounds.
We need to set the parameters for expansion, which does the exact opposite of compression: it makes quiet things quieter. That’s why we apply it after the noise reduction plug-in, so that the noise level is much lower than the voice when it goes through the expander. A normal single-band expander will not work as well because the noise lives in different areas of the frequency spectrum, and those need to be addressed independently with different amounts of expansion.
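The idea of expansion, and of using different amounts in different bands, can be sketched with a static gain calculation. This is a simplified model and not the C4's actual algorithm: the band names, thresholds, and ratios below are hypothetical, and a real expander also smooths its gain with attack and release times.

```python
def expander_gain_db(level_db, threshold_db, ratio):
    """Static gain change (in dB) of a downward expander.

    At or above the threshold the signal passes unchanged. Below it,
    every 1 dB under the threshold becomes `ratio` dB under the
    threshold, so quiet material (the noise) gets quieter.
    """
    if level_db >= threshold_db:
        return 0.0
    return (threshold_db - level_db) * (1.0 - ratio)

# Hypothetical per-band settings: noise is treated harder where it is worse.
BANDS = {
    "low (hum)": {"threshold_db": -35.0, "ratio": 3.0},
    "mid (voice)": {"threshold_db": -45.0, "ratio": 1.5},
    "high (hiss)": {"threshold_db": -40.0, "ratio": 2.0},
}

def multiband_gains(band_levels_db):
    """Gain change per band for the measured level in each band."""
    return {
        name: expander_gain_db(band_levels_db[name], **settings)
        for name, settings in BANDS.items()
    }

# Between words, every band is sitting at a noise floor of -50 dB.
pause = multiband_gains({name: -50.0 for name in BANDS})
# While the voice is speaking, every band is well above its threshold.
speech = multiband_gains({name: -20.0 for name in BANDS})
print(pause)
print(speech)
```

With these made-up settings, the same -50 dB noise floor is pushed down much further in the hum band than in the voice band, which is the advantage of working in multiple bands.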
Now there is a vast improvement in the noise level of the clip, as we can hear:
Okay! The following step is to tackle the pops caused by the microphone’s diaphragm being slammed hard by the air coming out of my mouth. Obviously, this is a problem caused by bad planning, and it is replicated here to illustrate a very common mistake in recording voiceovers. In this case, the most offending pops are in the words “demonstrate” and
A solution that has worked really well for me many times is actually very simple. It involves three quick steps. First, select the part of the clip that contains the pop, and be sure to include a good portion of the adjacent audio before and after the pop in the selection.
Second, use an EQ to filter out most of the low end of that selection. This will automatically create a new region in the middle of the original one.
Then crossfade the resulting regions to eliminate any clicks and to smooth the transitions. You will need to experiment with the crossfades’ proper positions and lengths, and the exact frequency and the amount of low end content to be removed, based on the severity of the pop.
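The three steps above can be sketched in code as follows. This is a rough model, not what the editing software does internally: the `repair_pop` helper, the first-order filter, the 300 Hz cutoff, the fade length, and the synthetic test clip are all assumptions. For simplicity the sketch filters a copy of the whole clip and then blends only the selected region back in.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter used to strip the low end."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def repair_pop(samples, start, end, fade, cutoff_hz, sample_rate):
    """Replace samples[start:end] with a low-cut version, crossfaded
    over `fade` samples at each edge so there is no click."""
    if fade < 1 or end - start < 2 * fade:
        raise ValueError("selection must be at least two fades long")
    filtered = high_pass(samples, cutoff_hz, sample_rate)
    out = list(samples)
    for i in range(start, end):
        # 0.0 at both edges of the selection, 1.0 in the middle.
        wet = min(1.0, (i - start) / fade, (end - 1 - i) / fade)
        out[i] = (1.0 - wet) * samples[i] + wet * filtered[i]
    return out

SAMPLE_RATE = 8000
# Synthetic stand-in for a voice: a steady 1 kHz tone.
clip = [0.3 * math.sin(2.0 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(8000)]
# Synthetic "pop": a loud 40 Hz thump on top of the voice.
for i in range(3000, 3400):
    clip[i] += 0.9 * math.sin(2.0 * math.pi * 40 * (i - 3000) / SAMPLE_RATE)

# Select generously around the pop, as described above.
repaired = repair_pop(clip, start=2800, end=3600, fade=100,
                      cutoff_hz=300.0, sample_rate=SAMPLE_RATE)
peak_before = max(abs(s) for s in clip[2800:3600])
peak_after = max(abs(s) for s in repaired[2800:3600])
print(f"peak before: {peak_before:.2f}, peak after: {peak_after:.2f}")
```

Audio outside the selection is left exactly as it was, which is why the selection and crossfade positions matter as much as the filter setting.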
And now, without those loud bumps, our clip sounds like this:
If you compare the first version of our audio file to this last one, you’ll hear the huge difference that is accomplished in sound quality by using several different steps and combining tools and tricks. As you know, there are always better and more affordable software applications being created for dealing with noisy audio, and some of the newer ones are able to cover several of the stages that I’ve described here. Others, which use spectral analysis algorithms, can even isolate and eliminate incidental background noises that happen at the same time as the dialogue, like a glass clinking or a dog bark. So the game is constantly changing.
In closing, hopefully this article will serve as a guide on how to tackle some problems with audio material that, for any reason, can’t be recorded again. It’s by no means a definitive approach to eliminating noise, since the number of variables and tools out there is staggering. So experiment, and have fun!
About the author: Richie Nieto has been a professional composer and sound designer since the early nineties. He has been involved with projects for DreamWorks, Lucasfilm, Dimension Films, Sony Pictures, HBO, VH1, FOX Sports, Sony Music, BMG, EA, THQ, Harmonix and many more of the biggest companies in the entertainment industry. His work can be heard on many commercially-released CDs, feature films, documentaries, video games and over 30 television series for the U.S. and Canada. Recently, Richie has composed music and/or designed sounds for projects like EA’s “Nerf N-Strike”, “Nerf 2: N-Strike Elite”, “Littlest Pet Shop”, “Littlest Pet Shop Friends”, and THQ/Marvel’s “Marvel Super Hero Squad”. He also finished work on Ubisoft’s “James Cameron’s AVATAR: The Game” and is currently a contract composer and sound designer for EA. Visit Richie’s website at www.richienieto.com
Voiceover talent doesn’t come naturally to everyone, but having the right tools, advice, and some great background music from Shockwave-Sound.com is a great way to get started.
Whether you’re working on a low-budget podcast, giving prerecorded presentations, or planning to work in the film industry, everyone looking to accomplish voiceover work has to start somewhere. Sometimes just knowing what you need can be a challenge. Max Laing has spent decades working with audio recording, and he recently shared some advice on how to get started with voiceover recording.
How and why did you choose to get started in radio and voice work?
Somebody made the mistake of telling me I had a good voice a long time ago, and when people weren’t looking I’d walk up to a microphone that somebody happened to leave on or whatever and I could hear myself rumble through these big speakers. All of a sudden, that was a new level of power. I was very used to low frequencies anyway because I played the tuba, so anything that was very low and the rumbling tones, the fact that I could produce that with my voice back then was just an amazing thing. Of course you can’t do that very loud with the human body. Most people can’t, and I’m the same. Just because I can produce the tone doesn’t mean that I could project it. But the microphones – now that allowed my voice to be projected and I’ll tell you what, one’s rumbling voice combined with amplification the first time and it was over with. I was hooked. After that, came the whole new world of equalization and taking out the highs and lows and making it sound like you know what you’re talking about.
What would you say is your formula for creating a good voiceover?
Well, certainly it goes without saying, if it’s not something that’s whimsical, if it’s going to be meaningful and means something to the person that’s asking to do it, then you’ve got to be in the right mindset. If you’re aggravated, it’s going to come out in your voice. It’s very hard to mask that. That’s just on the personal end. The flip side of that on the technology end is you’ve got to have a flat room. You’ve got to get rid of all the extra noise in the background. You’ve got to be able to amp up your mic as much as possible while you deaden the rest of the room. You want a flat, flat, flat response. Nothing extra boomy, nothing extra tinny, all the popping p’s – you want to get everything out of the way. You warm up with some of the standard little things that everybody says, “Wheat checks, rice checks, corn checks” – you want to make sure the mic is not popping on your voice. You’ll say a few “p” words, a few “s” words, making sure you’re not hissing. You listen to a couple of the initial recordings back to make sure you don’t have a fan running in the background; as we’ve converted from tape machines, well we’ve now gone into computers and with computers comes what? Fans. And they have fans all over the place. Some of them enough to cause a small tornado. So you’ve got to really keep down as much of that background noise as possible. One of the tricks is to put a noise gate on the mic so that the gate or openness of the mic isn’t actually on until you’re making a noise above a certain level. So when you get it set just right, your voice will be over any extra background noise that you’re not able to contain. Then as soon as you stop speaking, it “closes” the gate, preventing that background noise from seeping through. That’s one of the very powerful tricks that everybody in that business uses.
To be able to suppress as much sound as possible and mask it in ways – and that’s what a noise gate allows you to do is mask it with your own voice – but when you’re not speaking and the gaps between what you’re saying, where the background noise is, still doesn’t get to seep in then either because the gate is very quickly responding to you not making a sound at a particular level. When you have stopped, just for that brief moment of making a sound, it shuts the gate very, very quickly. It’s all electronic so it can move literally at the speed of light, and if you’ve got the thing set just right, your audience will not be able to tell. The listening ear will not be able to tell on the other end that you’re dealing with a noise gate.
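A noise gate of the kind described here can be sketched in a few lines. This is a bare-bones model, not any particular product's behavior: the `noise_gate` function, the threshold, and the hold time are hypothetical, and real gates usually fade open and closed with attack and release times instead of switching instantly.

```python
def noise_gate(samples, threshold, hold_samples):
    """Pass audio only while it is above `threshold`, plus a short
    hold afterwards so word endings are not chopped off."""
    out = []
    remaining = 0
    for x in samples:
        if abs(x) >= threshold:
            remaining = hold_samples  # loud enough: gate is open
            out.append(x)
        elif remaining > 0:
            remaining -= 1            # just went quiet: still holding open
            out.append(x)
        else:
            out.append(0.0)           # gate closed: background is muted
    return out

# Quiet fan noise (0.01 to 0.02) around two louder "words" (0.5 and 0.6).
recording = [0.01, 0.5, 0.02, 0.01, 0.01, 0.6, 0.01]
gated = noise_gate(recording, threshold=0.1, hold_samples=1)
print(gated)
```

Note that the noise is still present while the gate is open. It is simply masked by the voice, as described above.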
You mentioned once that your setup is the same whether the recording is going to be 12 minutes or 12 hours. What kind of audio equipment do you currently use for your voiceover work?
Well I’ve got a microphone from Carvin, and it is an XLR mic. We run it completely straight, balanced with 48 volt phantom power on it. It’s a condenser mic so it picks up very, very rich sounds. Obviously I have a windscreen on that. We’re recording with Sonar, the next to latest version of Sonar. We’ll probably be upgrading soon. There’s far more elaborate software out there, far more elaborate programs for anyone to draw from, but the needs for simple voiceover work are not that complex. As a matter of fact, the program that I’m using far exceeds the need that I have on getting the recording into the machine, but that’s okay, that means that there’s going to be a nice crisp, clear recording, usually around at least 24, maybe 48-bit level. So we’re exceeding right out of the gate the 44.1 kHz frequencies of what you’d get off of a CD, of what the limitations of a CD would be. We’re already recording it better than standards. So, if we ever reduce audio down to the CD level, the initial recordings are better than what you would expect when you go and purchase something out of a store. Not that any of us could really tell with our ear; the fact of the matter is recording things this way just allows you to not lose anything.
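As a rough worked example of recording above CD quality: bit depth and sample rate are two separate settings, and standard CD audio is 16-bit at 44.1 kHz in stereo. The sketch below compares that with a 24-bit, 48 kHz recording. That second format is chosen only for illustration, since the exact settings used in the interview are not stated, and the function names are hypothetical.

```python
import math

def pcm_bytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Storage needed for one minute of uncompressed PCM audio."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM (about 6 dB per bit)."""
    return 20.0 * math.log10(2 ** bit_depth)

cd_bytes = pcm_bytes_per_minute(44100, 16, 2)
studio_bytes = pcm_bytes_per_minute(48000, 24, 2)  # illustrative format
print(f"CD:            {cd_bytes:,} bytes/min, {dynamic_range_db(16):.1f} dB")
print(f"24-bit/48 kHz: {studio_bytes:,} bytes/min, {dynamic_range_db(24):.1f} dB")
```

The higher-resolution recording costs more disk space per minute, but it leaves headroom so that nothing is lost when the audio is later reduced to CD level.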
Speaking of microphones, what should someone look for when choosing a microphone for voice recording? And is it different if you’re singing as opposed to speaking?
Yes. There are differences. Depending on your application you have to figure out what the best kind of mic is for your application. If you are trying to record a complete room, well then you’re wanting an omnidirectional mic. If you’re trying to record a person speaking straight on into that mic, or maybe an instrument playing directly into it, then you would go for the unidirectional. Meaning from one direction. The mic that I use is unidirectional. There’s a sweet spot, and wherever you can position your mouth in relevance to your mic or whatever’s making the noise you’re trying to record in relevance to the mic, you adjust it in that uni position, that one position that’s the sweet spot for that mic. That gives you the best sound for the recording. If you’re going with an omnidirectional, you may have 4 or 5 omni mics strategically placed around the room.
What about the computer? Can you do audio work with a simple desktop system or does it have to be a $3,000 computer with some fancy audio card?
It used to have to be a fancy $3,000 computer. Now we’re up into 48 and 96 bit recording arenas, but 24 bit was the lowest anyone could go if you were going to be real serious in capturing audio on your computer and the processing speed – we just didn’t have a lot of that back then. So $3,000 was not out of the question. You really had to have a beefy machine back then and what made it so costly, aside from obviously the processor, was the separate sound card. You needed these very specialized sound cards and all the RAM that you could possibly get put into these machines. Of course hard drive price was at a premium back then and if you were going to do anything more than a minute or two recording at a time, you’d better have a very fast hard drive. You had to have processors with L1 and L2 cache that was just maxed out as much as possible, giving as much buffer as you possibly could to the hard drives because they just were not transferring data. Hard drives back then, if they even had a buffer at all, were certainly nothing to write home about. So you had to get the fastest drives possible. 5400 rpm would never do. Anything under 7200 rpm would just cause a lot of hiccups in the audio when it would go to write out of the memory. And of course if you didn’t have enough RAM, well you’d wind up crashing your system or just all sorts of stuff, trying to get throughput back to the drive. So yeah there were many growing pains back then. But fortunately a lot of people stuck with it and hardware prices came down. You can literally now set up a turnkey studio, with the recording software from Sonar, you can do this for around $600-700 now and have a very nice system.
Does that include the $200 microphone?
No, absolutely not. You gotta buy gas for your car too.
What would you recommend to someone who is interested in starting to record voiceovers? Like training, demos, lots of practice, that sort of thing?
Well, back when, if we weren’t in today’s age, I would say call up radio stations and see if the DJs would release their little demo recordings. A lot of DJs would do that; they would just put out their stuff like that, or you could flip back from one radio station to another and listen to the commercials, because the commercials contain a tremendous amount of talent. They’ve got to sell people on the idea that those people in the local area – their good name is going to sound great coming out of that box at the radio station. And the only way to do that is to show pizzazz in the voice so the radio stations were looking to get as many different ways of speaking and saying various things out, becomes their commercials. That’s a great way to study what’s going on.
In today’s more so modern technological advances, I would say if someone’s interested, is to go online to various radio stations and find the individual web site for each one of the DJs. There’s just a preponderance of information available on the web. Free audio out there on various talents’ web sites – a DJ or someone doing voiceover, that’s called a “talent” – that’s what you’re going to be if you’re getting into this line, so you stumble onto their web sites, you find where their audio demos are, you download those and just have at it. And take a wide variety of what’s out there. You can get more now in one concerted effort in one sitting online and finding this information than you could flipping back and forth between radio stations for an entire month. You can really go out there and get a lot of this audio down and in front of you so that you can consume it at your own pace.
Would you recommend that someone get a voice coach or acting lessons or some kind of formal training?
Especially if it were someone younger, and maybe they have a great voice but they’re not so good on their pronunciation, or maybe they’re not so good on being able to speak the King’s English without an accent applied, a voice coach is something they could explore, but most people can just listen to sitcoms. If they can hear something and then replicate it with their own voice, they can work around whatever their local social conditioning is. But they have to be able to hear it.
Great speaking isn’t about speaking, it’s about hearing and converting that into the presentation.
You’ve got to be able to hear it first, even if you’re just hearing it in your head you’ve got to be able to hear it first before you can present it. Sadly, that’s a step that most people skip.
I know a lot of voiceover professionals work out of their homes. What kind of challenges are there with working out of a home studio?
Other people. If you’re not living in your own bat cave; if you don’t have absolute control over your environment; you might even be living alone and recording in an apartment arena but your next door neighbors have kids or maybe they like to fight a lot and beat on the walls – who knows what – but if you cannot control your recording environment, it’s just nothing but frustration. You’ve got to be able to control the world around you because if you can’t do that and reduce stress – because that’s going to come out in your voice too – if this isn’t something that is easy for you to control, then maybe you ought to consider moving or taking on another line of work or whatever. You’ve really got to be able to convey for your client whatever it is that they’re wanting, and if the stress is coming through your voice, that’s a big problem. The other thing is, other people have lives too and they cannot operate around your schedule or your client’s schedule, so you know, many people who do this type of work, typically work late at night, they’re more so on the end of a recluse. If they can’t do full sound proof they’ll keep their studio as isolated as possible. It’s sometimes very difficult, but you don’t want to have to go back and fix something in post by canceling out sounds unless it was something you accidentally boo-booed during the recording and didn’t fix in real time. Other than that, trying to cancel out all other types of noises after-the-fact, you’re going to quadruple if not even more of the time you spend in this versus putting something in the can and getting it to your client. You shouldn’t have to fix that much in post outside of just the things you’re responsible for personally. You can’t take on the rest of the planet because that too will be recorded if you can’t control your environment. That’s one of the biggest frustrations about any of this is the working at home element. 
You’ve got to be able to cage yourself off somewhere; and it’s not just about the passerby or the random noise or someone setting something down on the coffee table, it could be something as simple as a ticking clock in the other room. Believe it or not it could be that. Or the dog that’s been sleeping all night and you’ve just gotten full stride and are in your first minute or minute and a half of recording and it’s fixing to go very well and all of a sudden the dog will bark for no apparent reason. If you’re not controlling your explosion, if you’re not controlling your surroundings, then you’re just in for a huge emotional ride.
About the author: Max Laing has worked with audio professionals for over 20 years. His latest endeavor, the 2 CD dichotic audio kit titled Allowing Success, is helping individuals around the globe to realize their own potential.
I’ve been creating custom radio ads, television commercial soundtracks and special audio pieces for over 26 years and have a few tips to help in creating your productions from scratch:
I’ve seen so many people write copy and then go searching for that “perfect” music track afterwards and rarely, if ever, find it. I’ve found it’s always best to start with the music and write to it. Here’s how I do it most of the time:
Before searching for music, I’ll have a few copy ideas written along with the key points needed in the copy. If the client has a slogan or a specific one-liner that has to be mentioned, I have that written down as well.
Since I know the target demo we’ll be trying to reach, I’ll begin my music search within the musical styles that would best appeal to that demo. On the other hand, if the target demo range is too broad, I’ll simply look for a style that would best match the feel for the intended ad (e.g., humorous, romantic, western, etc.).
While listening, I look for things like interesting stops, beats, instrumentation, chord changes, tempo shifts or whatever, that adds interest to the piece. Just like a hit single, there has to be a hook of some sort. Sometimes I’ll hear a measure or two that sounds like a perfect place to announce the slogan or a key point. If it repeats again, within the track, all the better. It’s quite similar to writing lyrics for a song but in this case, the words will be spoken.
In essence, I keep thinking about the product, the slogan, key points and so forth, while listening and just find that something always seems to pop up, that inspires the imagination and generates even more ideas for the copy. When I think I have one, I’ll keep playing it, while adding rough copy and reading along. If it just feels right, then I’ll download it and begin fine tuning the copy.
Once the copy’s written and I’ve purchased the music, I’ll begin tracking. Many times I’ll change my voice for separate tracks to reinforce slogans or key lines. On the changed voices, I may add outboard effects to really make them stand out. I will also add sound effects like sweepers or other effects, appropriate to the style, either to help punch those areas or to act as transition segues to return to the previous voice(s), etc.
All the while, I keep the music track up pretty loud in my headsets so I can let the words flow to the rhythm. Since I’ve already ‘written to the music’ the words flow quite naturally. I’ll often edit some words during the tracking process, to take full advantage of certain nuances in the music, as described above. I feel the more the announcer blends with the feel and flow of the music, the more the ad will sing, if you will.
I use one of the broadcast industry’s standard microphones, the EV RE-20. Actually, I bought their PL-20 version because it was a tad less expensive. (An engineering buddy said they’re the exact same mic, but I didn’t say that.) Anyway, both are dynamic microphones. When I used to have a full-fledged recording studio, for jingle work etc., my favorite was a Neumann U87. Unfortunately, those babies can pick up a hair falling on carpet from forty feet, and they are not practical, nor is any condenser mic, when used close to noisy computers or computer-driven gear. (Fan noise, to be exact, but all condensers are quite sensitive.)
One thing I learned early on when announcing is that the closer one gets to the microphone, the more the lower frequencies will be accentuated. If I’m doing a soft, romantic number, I’ll literally eat the microphone. Of course one has to be very careful with breath, sibilance and other nuances when mic-ing so closely. When recording others less familiar with technique etc., I’ll generally have them voice in a perpendicular fashion rather than with their mouth facing head on. This helps reduce popping “P”s. I don’t use a windscreen but probably used to all the time with condensers.
One thing I always have to watch out for, while doing my voice tracks, is the sound level in my headphones. I have a tendency to raise the volume level once my ears become accustomed. That’s a problem when stacking voice tracks, because the previously recorded tracks can seep through the earpiece and either interfere with the new track or at least be apparent in those pre-roll areas, where I’m waiting to announce the new stuff.
As a rule, before mixing down, I always go back through all of the voice tracks on screen and edit out all of the non-wave-apparent areas. Be careful around the exit areas though, because you can clip off the end sibilance or such from a word that doesn’t show up, or barely shows up, on a wave pattern that’s zoomed out a bit. What I usually do with exit areas of a voice track is listen to it and stop it physically when I know the word is over. I then delete to the right.
If you’re recording where the gear is, there’s bound to be some noise apparent, even with dynamic microphones. If you’re using music in the background, no problemo. It’ll never be heard. 95% of my work is done with music so I can live with the super subtle noise. If I have to do a straight voiceover, I have a section of a closet I climb into that’s lined with acoustic foam. It works.
Here’s a neat trick. I just did an ad recently where I wanted to get the effect of an entire baseball stadium chanting the name of the client’s business. I achieved it with myself and my wife only and you’d swear it’s the entire stadium. I voiced a little 1, 2, 3 count and then yelled the line (about a foot or two from the microphone). I then continued stacking tracks and changed my voice up and down and with different timbres etc., about 24 to 30 times. I then brought my wife in to do the same. She’s not a pro so I had to mess with her EQ on different tracks. Once I had them all, I spread them across the stereo spectrum and added a little bit of dark reverb as well as some subtle, delayed-slap echo. I was close at that point but something was missing.
I then added the crowd ambience of a baseball game that wasn’t during any cheering section. Once that was added, lo and behold, it sounded exactly like the whole crowd was chanting my client’s message and he loved it. So did I!
Once I’ve recorded all the tracks, I have to mix down to stereo or mono. The best tip I’ve ever had for mixing for radio is to do so with small, near-field monitors and at a medium to medium-loud level. Too much volume seems to accentuate the music (and bass frequencies) but when played back at normal levels, it almost disappears.
I remember years ago while mixing to huge JBL monitors, at loud levels, everything smoked and sounded great! Then, when I’d hear the track on radio, I couldn’t understand what happened to the music and certain effects. They literally vanished into the background. It was a rude awakening for sure and after talking with some true audio gurus, I learned the error of my ways.
I still have a pair of larger JBL monitors and when I think I’ve achieved a great mix in the near-fields, I’ll flip to them quick. Sometimes the bass is a tad too much, so I’ll roll off a little bottom end. Again though, how many radio listeners are listening to radio, through studio monitors driven by a Crown power amp?
There are a zillion monitors out there and were I doing more sophisticated music tracking etc., I would definitely invest handsomely into monitors. Since the majority of my work is for radio or television and because I’ve been doing this for so many years and know my sound etc, it really doesn’t matter that much to me.
My near-fields are just a small pair of Roland powered speakers. I can adjust the tone on them as well to make up for my subtle, mid-hi hearing loss.
I’m sure most already know, but if your stereo mix track will be played on mono AM stations, you may suffer phase-out that can wipe away A LOT of things in your mix. I always make separate mono mixes for those instances. (If you do that, remember to adjust the outboard effects, like reverb, to mono as well, or some REALLY strange things can happen.)
Another thing to keep in mind is not OVERDOING compression. Radio stations are renowned for having gobs and gobs of compression on their signals, and too much on your end will render your track a compressed alien entity that you’ll actually hear breathing! I use a subtle 1.5 or 2 to 1 compression on my voice trax and a very subtle 3:1 on the master out. I hate hearing compression inhale and exhale.
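Here is a tiny sketch of why phase-out happens when a stereo mix is summed to mono. It is a simplified worst case that assumes an effect whose right channel is an exact inverted copy of its left channel; real mixes only cancel partially. The `mono_sum` helper is a hypothetical name.

```python
import math

def mono_sum(left, right):
    """Fold a stereo pair down to mono by averaging the two channels."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

tone = [math.sin(2.0 * math.pi * 440 * i / 44100) for i in range(1000)]

# Centered voice: identical in both channels, survives the mono sum.
voice_mono = mono_sum(tone, tone)

# "Wide" effect: right channel is the left channel with inverted polarity.
effect_mono = mono_sum(tone, [-s for s in tone])

print(max(abs(s) for s in voice_mono))
print(max(abs(s) for s in effect_mono))
```

The centered signal comes through at full level while the out-of-phase effect sums to silence, which is why checking a dedicated mono mix is worth the extra step.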
Bob’s recommended compressor settings
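To put numbers on what those ratios mean, here is a small sketch of a compressor's static gain. Only the 2:1 and 3:1 ratios come from the text above. The thresholds and levels are hypothetical, the `compressor_gain_db` helper is an illustrative name, and real compressors also have attack and release times that shape the "breathing" mentioned earlier.

```python
def compressor_gain_db(level_db, threshold_db, ratio):
    """Static gain change (in dB) of a compressor.

    Below the threshold nothing happens. Above it, every `ratio` dB
    of input over the threshold yields only 1 dB of output over it.
    """
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return over / ratio - over

# Voice track at 2:1 with a hypothetical -20 dB threshold.
voice_gain = compressor_gain_db(level_db=-10.0, threshold_db=-20.0, ratio=2.0)
# Master output at 3:1 with a hypothetical -12 dB threshold.
master_gain = compressor_gain_db(level_db=-6.0, threshold_db=-12.0, ratio=3.0)
# A heavy 10:1 setting on the same voice peak, for comparison.
heavy_gain = compressor_gain_db(level_db=-10.0, threshold_db=-20.0, ratio=10.0)

print(f"2:1 voice:  {voice_gain:.1f} dB")
print(f"3:1 master: {master_gain:.1f} dB")
print(f"10:1 voice: {heavy_gain:.1f} dB")
```

At gentle ratios a loud peak is only pulled down by a few dB, which leaves room for the station's own compression further down the chain.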
Well, there’s a lot more to it but the above tips and thoughts have worked quite well for me in radio production. Best of luck with your endeavors and thanks Shockwave for providing such fabulous tracks and effects! Kudos as well to all the great composers and musicianship!
About the author: Bob Harper started as a DJ on an FM radio station when he was 15. Since that time, he has written, produced and/or performed in thousands of broadcast & custom audio productions for clients worldwide. In his early 20’s he was a SAG/AFTRA spokesman/voice actor in NYC and in his late 20’s, started his own ad agency/jingle production house. After 10 years, he shifted his entire attention to custom broadcast and original audio productions. Recently, he wrote and fully produced a short story collection, Twisted Rhymes, that aired on XM Radio and is now distributed through audible.com & others around the world. www.hearbob.com.