audio notes
Make audio test files using Sox
The different synth types are sine, square, triangle, sawtooth, trapezium, exp, [white]noise, tpdfnoise, pinknoise, brownnoise, pluck.
DTMF frequencies
       | 1209 Hz  1336 Hz  1477 Hz  1633 Hz
-------------------------------------------
697 Hz |    1        2        3        A
770 Hz |    4        5        6        B
852 Hz |    7        8        9        C
941 Hz |    *        0        #        D
Play a 3 second sine wave tone at a given frequency (440 Hz in this example).
play -n synth 3 sine 440
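Mix some tones
Play three DTMF digits (4, 1, and 5). Each is a pair of sine waves, one at the column frequency and one at the row frequency. Note that synth puts each tone in its own channel; add channels 1 (as in the swept-tone example) to mix them into a single channel.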
play -n synth 1 sine 1209 sine 770
play -n synth 1 sine 1209 sine 697
play -n synth 1 sine 1336 sine 770
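The lookup in the DTMF table can be scripted. The following is a minimal sketch, assuming a hypothetical helper named dtmf_freqs (it is not part of Sox). It prints the column frequency and then the row frequency for one key, in the same order the play commands above use.

```shell
# Hypothetical helper (not part of Sox): print "COLUMN ROW" frequencies in Hz
# for a single DTMF key, following the DTMF table.
dtmf_freqs() {
    case "$1" in
        1) echo "1209 697" ;;
        2) echo "1336 697" ;;
        3) echo "1477 697" ;;
        A) echo "1633 697" ;;
        4) echo "1209 770" ;;
        5) echo "1336 770" ;;
        6) echo "1477 770" ;;
        B) echo "1633 770" ;;
        7) echo "1209 852" ;;
        8) echo "1336 852" ;;
        9) echo "1477 852" ;;
        C) echo "1633 852" ;;
        '*') echo "1209 941" ;;
        0) echo "1336 941" ;;
        '#') echo "1477 941" ;;
        D) echo "1633 941" ;;
        *) echo "unknown DTMF key: $1" >&2; return 1 ;;
    esac
}

# Print the frequency pair for each key of a sample sequence.
for key in 4 1 5; do
    dtmf_freqs "$key"
done
```

To hear a key, the pair can be passed on to play, for example: set -- $(dtmf_freqs 5); play -n synth 1 sine "$1" sine "$2"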
Play each string of a 6-string guitar in standard tuning
for note in E2 A2 D3 G3 B3 E4; do
    play -n synth 3 pluck $note
done
Constant tone mixed with a swept tone
play --bits=16 -n synth 5 sine 1000 synth 4 sine mix 100-1000 channels 1 gain -3
play --bits=16 -n synth 9 sine 1000 synth 2 sine mix 1-2000 synth 2 sine mix 2000-1 channels 1 gain -3
# Same thing saved to a file:
sox --bits=16 -n test-sound.wav synth 9 sine 1000 synth 2 sine mix 1-2000 synth 2 sine mix 2000-1 channels 1 gain -3
Almost play a tune
for note in C3 F4 A4 F4 C3 F4 A4 F4 C3 F4 F4 F4 E4 D4 C3; do
    play -n synth 0.25 pluck $note
done
Remove/reduce background noise and hiss from an audio file
This is a two-step process, although it can be run as a single pipeline. First, analyze a stretch of the audio that contains only the background noise you want to remove, to build a noise profile. Second, subtract that profile from the whole recording. The first second of an audio file is often a usable noise sample. This doesn't always work, but it mostly works. In the command below the profile is piped from noiseprof to noisered, and 0.2 is the amount of noise reduction to apply.
sox audio_recording.wav -n trim 0 1 noiseprof | play audio_recording.wav noisered - 0.2
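The two steps can also be run separately, keeping the noise profile in a file and writing the cleaned audio to a new file. This is a minimal sketch: the wrapper name denoise, the profile file name, and the SOX variable are assumptions made for illustration. Setting SOX=echo prints the arguments of each step instead of running Sox.

```shell
# Hypothetical wrapper (not part of Sox). Set SOX=echo to print the arguments
# of each step instead of running them.
denoise() {
    local src=$1 dst=$2 amount=${3:-0.2}
    local profile="$dst.noiseprof"   # assumed name for the saved noise profile
    # Step 1: profile the noise in the first second of the recording.
    "${SOX:-sox}" "$src" -n trim 0 1 noiseprof "$profile" || return 1
    # Step 2: reduce that noise across the whole recording.
    "${SOX:-sox}" "$src" "$dst" noisered "$profile" "$amount"
}

# Dry run: show the arguments each Sox call would receive.
SOX=echo denoise audio_recording.wav audio_cleaned.wav
```

Keeping the profile in a file makes it easy to retry the second step with a different amount without profiling again.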
Trim silent gaps from audio
This removes silent sections from the beginning, middle, and end. Useful for compressing long audio logs that may contain many long pauses.
sox audio_recording.wav silence_removed.wav silence 1 0.1 1% -1 0.5 1%
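The arguments to the silence effect are terse, so here is the same command as a commented sketch. The wrapper name trim_silence, its parameters, and the SOX variable are assumptions made for illustration. Setting SOX=echo prints the arguments instead of running Sox.

```shell
# Hypothetical wrapper (not part of Sox). Set SOX=echo for a dry run.
# Arguments: input, output, threshold (default 1%), and the length in seconds
# of silence that counts as a gap to remove (default 0.5).
trim_silence() {
    local src=$1 dst=$2 threshold=${3:-1%} gap=${4:-0.5}
    # "1 0.1 THRESHOLD": trim leading silence until 0.1 s of sound is detected.
    # "-1 GAP THRESHOLD": the negative count also removes gaps in the middle
    # of the file, not only silence at the end.
    "${SOX:-sox}" "$src" "$dst" silence 1 0.1 "$threshold" -1 "$gap" "$threshold"
}

# Dry run with the defaults, which reproduces the arguments used above.
SOX=echo trim_silence audio_recording.wav silence_removed.wav
```

A higher threshold such as 2% treats quiet background noise as silence too, which may help with noisy recordings.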
Create a spectrogram (sonogram, FFT, etc.) of an audio file
Keywords: spectrogram, spectrograph, sonogram, sonograph, spectral plot, spectrum, FFT, Fourier Transform
The rate 6k option will narrow the frequency range view to the band most sensitive for human hearing. This cuts off frequencies above 3 kHz (half the sample rate of 6k). If you want the full frequency range then leave off the rate 6k option.
The -n is the NULL file option. This simply tells Sox that we don't want to actually create a new sound file. We are just analyzing the input file.
sox audio_recording.wav -n rate 6k spectrogram -t "Spectrogram of audio_recording.wav" -o spectrogram_20150531.png
# For a white background use '-l' option:
sox audio_recording.wav -n rate 6k spectrogram -l -t "Spectrogram of audio_recording.wav" -o spectrogram_20150531.png
Record audio from the microphone
Sox is probably the most universal tool for recording, manipulating, and playing back sound. There is also ALSA on Linux.
ALSA
List input audio devices (capture devices).
# arecord --list-devices
**** List of CAPTURE Hardware Devices ****
card 1: C930e [Logitech Webcam C930e], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
List output audio devices (playback devices).
# aplay --list-devices
**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
  Subdevices: 7/7
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
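The card and device numbers in these listings are what go into ALSA device names such as hw:1,0 (card 1, device 0). The following is a minimal sketch that extracts those names from a listing. The function name alsa_hw_names is an assumption, and the pattern assumes the English output format shown above.

```shell
# Hypothetical helper: read an arecord/aplay device listing on stdin and
# print one hw:CARD,DEVICE name per "card N: ..., device M: ..." line.
alsa_hw_names() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Example using two lines copied from the capture listing.
printf '%s\n' \
    'card 1: C930e [Logitech Webcam C930e], device 0: USB Audio [USB Audio]' \
    'card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]' |
    alsa_hw_names
```

On a live system the listing would come straight from the tool, for example: arecord --list-devices | alsa_hw_names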
Record and playback audio.
arecord --format=S16_LE --rate=44100 --channels=1 --device=plughw:1,0 -V mono test.wav
aplay --device=plughw:0,0 test.wav
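Note plughw versus hw. The plughw device goes through ALSA's plug layer, which converts the sample format, rate, and channel count as needed. The hw device talks to the hardware directly, so playback only works if the file is already in a format the hardware supports.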
aplay --device=hw:0,0 test.wav
Play random data from a stream
# Play random data. Listen to random numbers.
aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0 /dev/urandom
# Play whatever data is piped in through stdin.
cat /dev/urandom | aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0
# This is the more explicit way to specify stdin.
cat /dev/urandom | aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0 -
# Listen to random data in a different format.
cat /dev/urandom | aplay --format=U8 --rate=8000 --channels=1 --device=plughw:0,0 -
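Since /dev/urandom never ends, these commands play until interrupted. One way to bound the noise is to cut the stream off after a fixed number of bytes. This is a minimal sketch of the arithmetic; the helper name pcm_bytes is an assumption.

```shell
# Hypothetical helper: bytes of raw PCM needed for a given duration.
# Arguments: seconds, sample rate (Hz), channels, bits per sample.
pcm_bytes() {
    echo $(( $1 * $2 * $3 * $4 / 8 ))
}

pcm_bytes 1 44100 1 16   # S16_LE, mono, 44.1 kHz: prints 88200
pcm_bytes 1 8000 1 8     # U8, mono, 8 kHz: prints 8000
```

For example, this should play about three seconds of noise and then stop: head -c "$(pcm_bytes 3 44100 1 16)" /dev/urandom | aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0 -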
Test audio with loopback monitor (beware of loud feedback!)
These examples may be used to directly listen to the audio source from a capture device. This is also a useful end-to-end test of the audio system.
In the following examples the capture device is hw:1,0 (card 1, device 0) and the playback device is hw:0,0.
Note that the -t 50000 option in the examples sets the latency in microseconds (50000 microseconds is 50 milliseconds). In my tests this option was needed: if it is left out or set much lower than 50000, the audio stream occasionally gets stuck or drops frames. I suspect this is due to sample rate drift between the capture and playback streams. Running without the option is harmless, but you will hear annoying drops and buzzing. The effect is even worse if you use the plughw devices instead of hw.
This will loop a capture device to a playback device.
alsaloop -v -c 1 -C hw:1,0 -P hw:0,0 -t 50000
You can turn this loop into a daemon so that the feedback loop continues in the background. Kill the process to stop the loop.
alsaloop --daemonize -c 1 -C hw:1,0 -P hw:0,0 -t 50000
BONUS! You can also loop capture to playback by hand, just by connecting arecord and aplay with a pipe.
arecord -v -V mono --format=S16_LE --rate=44100 --channels=1 --device=plughw:1,0 - | aplay -v --device=plughw:0,0 -
Setting format and rate is not strictly required. The following works, but at noticeably lower quality.
arecord -v -V mono --channels=1 --device=plughw:1,0 - | aplay -v --device=plughw:0,0 -
Buffering and latency
The default buffer time is 500000 microseconds (500 milliseconds, or half a second). That means the latency will be at least half a second, so you will hear the audio delayed by that much. You can lower the buffer time to reduce latency. In this example the buffer time is 50000 microseconds (50 milliseconds), which is barely perceptible compared to the 500 millisecond default.
arecord -v --buffer-time=50000 -V mono --format=S16_LE --rate=44100 --channels=1 --device=plughw:1,0 - | aplay -v --buffer-time=50000 --device=plughw:0,0 -
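To get a feel for what a buffer time means in samples, the microsecond value can be converted to a frame count at the chosen sample rate. This is a minimal sketch of that arithmetic; the helper name buffer_frames is an assumption.

```shell
# Hypothetical helper: number of frames held in a buffer of the given length.
# Arguments: buffer time in microseconds, sample rate in Hz.
buffer_frames() {
    echo $(( $1 * $2 / 1000000 ))
}

buffer_frames 500000 44100   # default buffer time: prints 22050
buffer_frames 50000 44100    # reduced buffer time: prints 2205
```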
Debugging: Check that the Mic Capture Switch is not turned off. Also check that the Mic Capture Volume is set high.
# amixer --card 1 contents
...
numid=7,iface=MIXER,name='Mic Capture Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
...
# amixer --card=1 cset numid=7 1
numid=7,iface=MIXER,name='Mic Capture Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
Record audio using Sox
Sox works on Linux and OS X (installed through Homebrew).
Simple stereo recording:
rec --channels 2 audio_recording.wav
sox audio_recording.wav -n spectrogram -z 100 -t "Spectrogram of audio_recording.wav" -c '' -o audio_recording.spectrogram.png
Record audio without silent gaps using sox. Recording does not start until sound is detected. The audio is split into separate files wherever there is 1 second of silence, and recording stops after 5 seconds of silence. You may change the 0:05 to 1:00 to stop after 1 minute of silence instead of 5 seconds. So, if you carefully count "1... 2... 3... 4... 5..." with 1 second of silence between each number, followed by 10 seconds of silence, you should end up with 5 different files.
# Only set AUDIODRIVER and AUDIODEV for Linux ALSA systems. OSX systems should not change these.
export AUDIODRIVER=alsa
export AUDIODEV=hw:1,0
rec -V3 -p | \
    sox -p -p silence 0 1 0:05 5% | \
    sox -p -r 44100 -e signed-integer -b 16 --endian little audio_recording.wav silence -l 0 0 0:01 3% : newfile : restart
Play back audio on OS X using Sox
Note that where an output filename is required you may substitute -d or -t coreaudio (for Mac OS X). These seem to be equivalent. The -d option seems to be the more general-purpose choice since it automatically picks the correct sound output on both Mac and Linux.
Both examples below play audio and both will automatically detect the audio stream type. The play command is the easier to remember version. You may have special reasons for wanting to use the sox command alternative.
play audio_recording.wav
cat audio_recording.wav | sox - -t coreaudio
Play noise
These are all equivalent using /dev/urandom.
# From a file or device file.
sox -t raw -r 44100 -b 16 -e unsigned-integer /dev/urandom -d
sox -t raw -r 44100 -b 16 -e unsigned-integer /dev/urandom -t coreaudio
# Using a pipe...
cat /dev/urandom | sox -t raw -r 44100 -b 16 -e unsigned-integer - -d
cat /dev/urandom | sox -t raw -r 44100 -b 16 -e unsigned-integer - -t coreaudio
This uses Sox's built-in noise generator:
play --null synth whitenoise
This sounds best:
play --channels 2 --null --show-progress synth 01:00 brownnoise band -n 400 499 tremolo 0.1 70 reverb 19 bass -11 treble -1 vol 12dB repeat 19