audio notes

From Noah.org

Latest revision as of 01:03, 15 April 2019


Make audio test files using Sox

The different synth types are sine, square, triangle, sawtooth, trapezium, exp, [white]noise, tpdfnoise, pinknoise, brownnoise, pluck.

Play a 3 second sine wave at a given frequency (440 Hz in this example).

play -n synth 3 sine 440

DTMF: mix some tones to dial a phone

Detecting DTMF tones is usually done with the Goertzel algorithm, although other methods such as NDFET (Normalized Direct Frequency Estimation Technique) and MUSIC (MUltiple SIgnal Classification) are also used. The FFT can, of course, also be used but is overkill for this problem. The MUSIC algorithm is the most accurate and will match with the fewest input samples.

DTMF frequencies (AUTOVON key labels are in parentheses)

        | 1209 Hz  1336 Hz  1477 Hz  1633 Hz
-------------------------------------------------
697 Hz  |  1        2        3        A (FO - Flash Override)
770 Hz  |  4        5        6        B (F - Flash)
852 Hz  |  7        8        9        C (I - Immediate)
941 Hz  |  *        0        # (A)    D (P - Priority)

Play DTMF numbers. Each is a mix of two sine waves.

# 0
play -n synth 0.5 sin 941 sin 1336
# 1
play -n synth 0.5 sin 697 sin 1209
# 2
play -n synth 0.5 sin 697 sin 1336
# 3
play -n synth 0.5 sin 697 sin 1477
# 4
play -n synth 0.5 sin 770 sin 1209
# 5
play -n synth 0.5 sin 770 sin 1336
# 6
play -n synth 0.5 sin 770 sin 1477
# 7
play -n synth 0.5 sin 852 sin 1209
# 8
play -n synth 0.5 sin 852 sin 1336
# 9
play -n synth 0.5 sin 852 sin 1477
# *
play -n synth 0.5 sin 941 sin 1209
# #
play -n synth 0.5 sin 941 sin 1477
# A
play -n synth 0.5 sin 697 sin 1633
# B
play -n synth 0.5 sin 770 sin 1633
# C
play -n synth 0.5 sin 852 sin 1633
# D
play -n synth 0.5 sin 941 sin 1633
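
The table above can also be written as a small lookup so that a whole number can be dialed in one loop. This is only a sketch: the function names dtmf_freqs and dial are made up for this example, and the 0.2 second tone length and 0.1 second pause are arbitrary choices, not values from any standard.

```shell
# Print the row and column frequency (in Hz) for one DTMF key.
dtmf_freqs() {
    case "$1" in
        1) echo "697 1209" ;;
        2) echo "697 1336" ;;
        3) echo "697 1477" ;;
        A) echo "697 1633" ;;
        4) echo "770 1209" ;;
        5) echo "770 1336" ;;
        6) echo "770 1477" ;;
        B) echo "770 1633" ;;
        7) echo "852 1209" ;;
        8) echo "852 1336" ;;
        9) echo "852 1477" ;;
        C) echo "852 1633" ;;
        '*') echo "941 1209" ;;
        0) echo "941 1336" ;;
        '#') echo "941 1477" ;;
        D) echo "941 1633" ;;
        *) echo "unknown DTMF key: $1" >&2; return 1 ;;
    esac
}

# Dial a string of keys one character at a time (needs Sox's play).
dial() {
    keys=$1
    while [ -n "$keys" ]; do
        rest=${keys#?}          # everything after the first character
        key=${keys%"$rest"}     # the first character
        keys=$rest
        freqs=$(dtmf_freqs "$key") || return 1
        set -- $freqs           # $1 = row frequency, $2 = column frequency
        play -n synth 0.2 sin "$1" sin "$2"
        sleep 0.1
    done
}
```

For example, dial 5551212 would play seven tone pairs in a row.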

Play each string of a 6-string guitar in standard tuning

for note in E2 A2 D3 G3 B3 E4; do
    play -n synth 3 pluck $note
done

Play some complex harmonics

play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1

Effect of Tremolo

play -n synth 1 sin 3200 sin 3500 sin 3700 sin 2900 sin 3900 tremolo 10 40
play -n synth 1 sin 3200 sin 3500 sin 3700 sin 2900 sin 3900 tremolo 10 90

Constant tone mixed with a swept tone

play --bits=16 -n synth 5 sine 1000 synth 4 sine mix 100-1000 channels 1 gain -3
play --bits=16 -n synth 9 sine 1000 synth 2 sine mix 1-2000 synth 2 sine mix 2000-1 channels 1 gain -3
# Same thing saved to a file:
sox --bits=16 -n test-sound.wav synth 9 sine 1000 synth 2 sine mix 1-2000 synth 2 sine mix 2000-1 channels 1 gain -3

Almost play a tune

for note in C3 F4 A4 F4 C3 F4 A4 F4 C3 F4 F4 F4 E4 D4 C3; do
    play -n synth 0.25 pluck $note
done

Play from a command here file

Need a better way to handle reading two tokens at a time from a single returned line.

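# Options to read must come before the variable names; the || test catches the last pair, which has no trailing comma.
while read -r -d "," note length || [ -n "${note}" ]; do
    play -n synth "${length}" pluck "${note}"
done < <( echo C3 0.25,F4 0.25,A4 0.25,F4 0.25,C3 0.25,F4 0.25,A4 0.25,F4 0.25,C3 0.25,F4 0.25,F4 0.25,E4 0.25,D4 0.25,C3 0.25)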
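
One possible way around the two-tokens-per-record problem is to put each note and length pair on its own line and feed them in with a here-document, so that a plain read splits the pair. This is a sketch only: play_score and DRY_RUN are names invented for this example.

```shell
# Play "note length" pairs, one pair per line, read from stdin.
# play_score and DRY_RUN are hypothetical names used only in this sketch.
# With DRY_RUN=1 the play commands are printed instead of run.
play_score() {
    while read -r note length; do
        [ -n "$note" ] || continue      # skip blank lines
        if [ -n "$DRY_RUN" ]; then
            printf '%s\n' "play -n synth $length pluck $note"
        else
            # Keep play away from the loop's stdin.
            play -n synth "$length" pluck "$note" </dev/null
        fi
    done
}

# Usage:
#   play_score <<EOF
#   C3 0.25
#   F4 0.25
#   A4 0.5
#   EOF
```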

Remove/reduce background noise and hiss from an audio file

This is a two-step process, although it can be run as a pipeline. First, analyze the audio to build a profile of the noise. Sample a range that contains only the background noise you want to remove; typically the first second of the file will do. Second, subtract that profile from the whole recording. This doesn't always work, but it mostly works.

sox audio_recording.wav -n trim 0 1 noiseprof | play audio_recording.wav noisered - 0.2
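
The two steps can also be kept separate by saving the noise profile to a file, which may be handy when the cleaned audio should be written to disk rather than played. The sketch below assumes that the first second really is noise only; denoise, run_or_print, DRY_RUN, and the .noise.prof file name are all invented for this example.

```shell
# run_or_print and denoise are hypothetical helpers for this sketch.
# With DRY_RUN=1 the sox commands are printed instead of executed.
run_or_print() {
    if [ -n "$DRY_RUN" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# denoise INPUT OUTPUT [AMOUNT]
denoise() {
    src=$1; dst=$2; amount=${3:-0.2}
    prof="${dst%.*}.noise.prof"
    # Step 1: profile the noise found in the first second of the recording.
    run_or_print sox "$src" -n trim 0 1 noiseprof "$prof" || return 1
    # Step 2: subtract that profile from the whole file and save the result.
    run_or_print sox "$src" "$dst" noisered "$prof" "$amount"
}
```

For example, denoise audio_recording.wav cleaned.wav 0.2 would leave the profile in cleaned.noise.prof next to the cleaned file.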

Trim silent gaps from audio

This removes silent sections from the beginning, middle, and end. Useful for compressing long audio logs that may contain many long pauses.

sox audio_recording.wav silence_removed.wav silence 1 0.1 1% -1 0.5 1%
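
The numbers passed to the silence effect are easy to mix up, so here is a sketch that gives them names. silence_args is a hypothetical helper, and its defaults simply reproduce the command above.

```shell
# Build the argument list for Sox's silence effect from named settings.
# silence_args is a hypothetical helper; the defaults match the command above.
silence_args() {
    min_sound=${1:-0.1}     # seconds of sound needed before audio counts as started
    min_gap=${2:-0.5}       # seconds of quiet that count as a gap to remove
    threshold=${3:-1%}      # level below which audio is treated as silence
    # "1"  = trim silence at the beginning until sound is found.
    # "-1" = also trim silent gaps in the middle and at the end.
    echo "silence 1 $min_sound $threshold -1 $min_gap $threshold"
}

# Usage (left unquoted on purpose so the shell splits it into arguments):
#   sox audio_recording.wav silence_removed.wav $(silence_args 0.1 1.0 2%)
```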

Create a spectrogram (sonogram, FFT, etc.) of an audio file

spectrogram spectrograph sonogram sonograph spectral plot spectrum FFT Fourier Transform

The rate 6k option will narrow the frequency range view to the band most sensitive for human hearing. This cuts off frequencies above 3 kHz (half the sample rate of 6k). If you want the full frequency range then leave off the rate 6k option.

The -n is the NULL file option. This simply tells Sox that we don't want to actually create a new sound file. We are just analyzing the input file.

sox audio_recording.wav -n rate 6k spectrogram -t "Spectrogram of audio_recording.wav" -o spectrogram_20150531.png
# For a white background use '-l' option:
sox audio_recording.wav -n rate 6k spectrogram -l -t "Spectrogram of audio_recording.wav" -o spectrogram_20150531.png
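
The rate value follows from the highest frequency you want to see: the sample rate has to be twice that frequency. A tiny sketch of the arithmetic, where rate_for_max_hz and the output file name are made up for this example:

```shell
# Sample rate (Hz) needed so the spectrogram tops out at the given frequency (Hz).
# rate_for_max_hz is a hypothetical helper: rate = 2 * highest frequency.
rate_for_max_hz() {
    echo $(( $1 * 2 ))
}

# Usage: show only the band up to 4 kHz.
#   sox audio_recording.wav -n rate "$(rate_for_max_hz 4000)" spectrogram -o spectrogram_4khz.png
```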

Record audio from the microphone

Sox is probably the most universal tool for recording, manipulating, and playing back sound. There is also ALSA on Linux.

ALSA

List input audio devices (capture devices).

# arecord --list-devices
**** List of CAPTURE Hardware Devices ****
card 1: C930e [Logitech Webcam C930e], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

List output audio devices (playback devices).

# aplay --list-devices
**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
  Subdevices: 7/7
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
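
The card and device numbers in these listings are what go into the hw:CARD,DEVICE names used in the commands below. As a rough sketch, the listing can be reduced to those names with sed; list_hw_names is a name invented for this example, and it assumes the English output format shown above.

```shell
# Turn `arecord --list-devices` or `aplay --list-devices` output (on stdin)
# into hw:CARD,DEVICE names, one per line.
# list_hw_names is a hypothetical helper for this sketch.
list_hw_names() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Usage:
#   arecord --list-devices | list_hw_names
#   aplay --list-devices | list_hw_names
```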

Record and play back audio.

arecord --format=S16_LE --rate=44100 --channels=1 --device=plughw:1,0 -V mono test.wav
aplay --device=plughw:0,0 test.wav

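Note plughw versus hw. The plughw device goes through ALSA's plug layer, which converts sample format, rate, and channel count as needed; hw addresses the hardware directly, so the file must already be in a format the device supports.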

aplay --device=hw:0,0 test.wav

Play random data from a stream

# Play random data. Listen to random numbers.
aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0 /dev/urandom
# Play whatever data is piped in through stdin.
cat /dev/urandom | aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0
# This is the more explicit way to specify stdin.
cat /dev/urandom | aplay --format=S16_LE --rate=44100 --channels=1 --device=plughw:0,0 -
# Listen to random data in a different format.
cat /dev/urandom | aplay --format=U8 --rate=8000 --channels=1 --device=plughw:0,0 -

Test audio with loopback monitor (beware of loud feedback!)

These examples may be used to directly listen to the audio source from a capture device. This is also a useful end-to-end test of the audio system.

In the following examples the capture device is hw:1,0 (card 1, device 0) and the playback device is hw:0,0.

Note that the -t 50000 option in the examples sets the latency in microseconds (50000 microseconds is 50 milliseconds). In my tests this option had to be included. If it is left out or set much lower than 50000, the audio stream seems to occasionally get stuck or drop frames. I suspect this is due to sample rate drift between the capture and playback streams. The effect with no latency set is harmless, but you will hear annoying drops and buzzing. The effect is even worse if you use the plughw devices instead of hw.

This will loop a capture device to a playback device.

alsaloop -v -c 1 -C hw:1,0 -P hw:0,0 -t 50000

You can turn this loop into a daemon so that the feedback loop continues in the background. Kill the process to stop the loop.

alsaloop --daemonize -c 1 -C hw:1,0 -P hw:0,0 -t 50000

BONUS! You can also manually feed the capture stream back to the playback stream just by connecting arecord and aplay with a pipe.

arecord -v -V mono --format=S16_LE --rate=44100 --channels=1 --device=plughw:1,0 - | aplay -v --device=plughw:0,0 -

Setting format and rate is not strictly required. The following works, but at noticeably lower quality.

arecord -v -V mono --channels=1 --device=plughw:1,0 - | aplay -v --device=plughw:0,0 -

Buffering and latency

For ALSA, the default buffer time is 500000 microseconds (500 milliseconds, or half a second). That means the latency will be at least half a second, so you will hear audio delayed by that much. You can lower the buffer time to reduce latency. In this example, 50000 microseconds is 50 milliseconds, which is barely perceptible.

arecord -v --buffer-time=50000 -V mono --format=S16_LE --rate=44100 --channels=1 --device=plughw:1,0 - | aplay -v --buffer-time=50000 --device=plughw:0,0 -
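
To get a feel for what a buffer time means in samples, the conversion is frames = rate * microseconds / 1000000. A sketch of the arithmetic follows; buffer_info is a name made up for this example, and it rounds the time down to whole milliseconds to keep the intermediate numbers small.

```shell
# Convert an ALSA buffer time (microseconds) to milliseconds and to frames
# at a given sample rate. buffer_info is a hypothetical helper.
# The time is rounded down to whole milliseconds before multiplying.
buffer_info() {
    us=$1; rate=$2
    ms=$(( us / 1000 ))
    echo "$ms ms, $(( rate * ms / 1000 )) frames"
}

# Usage:
#   buffer_info 50000 44100     # the buffer time used above
#   buffer_info 500000 44100    # the ALSA default mentioned above
```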

Debugging

Check that the Mic Capture Switch is on. Also check that Mic Capture Volume is set high (50% to 75%). Too high is better than too low.

# amixer --card 1 contents
...
numid=7,iface=MIXER,name='Mic Capture Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
...
# amixer --card=1 cset numid=7 1
numid=7,iface=MIXER,name='Mic Capture Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on

Record audio using Sox

Sox works on Linux and OS X (installed through Homebrew), so this is more portable than using ALSA.

Simple stereo recording with default sample rates and format:

rec --channels 2 audio_recording.wav

Splitting and Skipping Silence in Sox

Sox can be used to skip or trim silent sections from audio, and/or split into separate files between silent gaps.

Sox can detect silence in an audio stream. When silence is detected, two actions may be triggered: the output stream may be split into a new file, and the silence may be skipped. Either or both actions are possible.

Setup

For Linux you may need to set up environment variables so Sox knows which ALSA devices to use. This is not necessary on OS X.

# Only set AUDIODRIVER and AUDIODEV for Linux ALSA systems. OSX systems should not change these.
export AUDIODRIVER=alsa
# This is for a CAPTURE source (mic, line-in, etc.), not PLAYBACK.
export AUDIODEV=hw:1,0

This splits audio stream on silent gaps (silence is kept at the beginnings of splits)

This splits the stream into separate files at silent gaps. It does not start recording until it detects sound. Audio separated by 1 second of silence is split into separate files, and recording stops after 5 seconds of silence. You may change 0:05 to 1:00 to wait 1 minute instead of 5 seconds. So, if you carefully count "1... 2... 3... 4... 5..." with 1 second of silence between each number, followed by 10 seconds of silence, you should end up with 5 different files.

rec -V3 -p | \
sox -p -p silence 0 1 0:05 5% | \
sox -p -r 48000 -e signed-integer -b 16 --endian little audio_recording.wav silence -l 0 0 0:01 3% : newfile : restart

This splits and skips silence (silence is trimmed)

This splits a stream separated by silent gaps into separate files, but it also trims the silence gaps so that the output files won't include the gaps. For this example, count numbers but allow 3 seconds of silence between each number and 10 seconds of silence to end. Note that the silence floor percentages were increased slightly to prevent noise from breaking up a silent gap.

rec -V3 -p | sox -p -p silence 0 1 0:10 10% | sox -p -r 48000 -e signed-integer -b 16 --endian little audio_recording.wav silence 0 1 0:01 6% : newfile : restart

Use this to check that gaps are skipped.

aplay --device=hw:0,0 audio_recording*.wav 

This only skips silence (no splitting)

This trims all silence from the beginning, middle, and end of a stream. If the silence lasts more than 10 seconds then the processing terminates. So if you count slowly with long gaps between numbers (less than 10 seconds) then all those gaps will be removed from the output until a silent gap longer than 10 seconds is found. Counting, "....1..2...3.4.........5.....6....7......8...9...........", turns into ".1.2.3.4.5.6.7.8.9.".

Getting the percentages for the silence level right can be the trickiest part. This might be improved if some sort of filter were included in the stream, but a filter that would improve silence detection would also distort the signal if it were recorded. What is needed are parallel paths for the stream. One path would be filtered and used just for triggering actions on the other, unfiltered path, which would be the path that gets recorded. I am not sure if complex pipelines like this are possible with Sox. This is the sort of thing you can do with gstreamer and video.

rec -V3 -p | sox -p -p silence 1 0.1 6% 1 0:10 6% | sox -p -r 48000 -e signed-integer -b 16 --endian little audio_recording.wav silence 1 0.1 6% -1 0.25 6%

Generate a spectrogram with Sox

(spectrogram, spectrograph, spectrum, sonogram, sonograph, waterfall, spectral view)

sox audio_recording.wav -n spectrogram -z 100 -t "Spectrogram of audio_recording.wav" -c '' -o audio_recording.spectrogram.png

Play back audio on OS X using Sox

Note that where an output filename is required you may substitute -d or -t coreaudio (for Mac OS X). These seem to be equivalent. The -d option seems to be the more general-purpose style since it automatically picks the correct sound output on both Mac and Linux.

Both examples below play audio and both will automatically detect the audio stream type. The play command is the easier to remember version. You may have special reasons for wanting to use the sox command alternative.

play audio_recording.wav
cat audio_recording.wav | sox - -t coreaudio

Play noise

These are all equivalent using /dev/urandom.

# From a file or device file.
sox -t raw -r 44100 -b 16 -e unsigned-integer /dev/urandom -d
sox -t raw -r 44100 -b 16 -e unsigned-integer /dev/urandom -t coreaudio
# Using a pipe...
cat /dev/urandom | sox -t raw -r 44100 -b 16 -e unsigned-integer - -d
cat /dev/urandom | sox -t raw -r 44100 -b 16 -e unsigned-integer - -t coreaudio

This uses Sox's built-in noise generator:

play --null synth whitenoise

This sounds like ocean waves rolling in and out:

play --null --channels=2 --show-progress synth brownnoise band -n 400 499 tremolo 0.1 70 reverb 19 bass -11 treble -1 vol 12dB repeat 19

This sounds like rain:

play -n -c 2 synth pinknoise band -n 2500 4000 reverb 20

This sounds like fun:

play -n synth 2.5 sin 667 gain -5 bend .35,180,.25  .15,740,.53  0,-520,.3

This sounds like aliens:

play -n synth 10 sine mix 200-1000 synth 10 sine mix 700-1 synth 10 sine amod 10-1 synth 10 sine fmod 10-1


Jack

Use qjackctl to start and stop the Jack engine and to connect audio readable clients (capture) to writable clients (playback). For example, connect system/capture_1 (microphone) to system/playback_1 (left channel speaker). The example below sets different devices (cards) for capture (-C hw:2,0) and playback (-P hw:0,0) when starting jackd.

jackd -v -R -d alsa -C hw:2,0 -P hw:0,0 -r 48000 -H -z s
ecasound -f:32,1,48000 -i null -o jack_alsa,myport -b:1024 -el:sine_fcac,440,1
qjackctl

AWK Music

#!/usr/bin/env bash
#
# This plays random notes on a pentatonic scale.
# I think this was originally from Kyle Keen.
#

if uname -a | grep -iq darwin; then
	AUDIO_TARGET='| sox -t raw -r 64k -c 1 -e unsigned -b 8 - -d'
else
	# Assume Linux.
	if [ -e /dev/dsp ]; then
		AUDIO_TARGET='> /dev/dsp'
	else
		AUDIO_TARGET='| aplay -r 64000'
	fi
fi

# Note that 0.87055 is the fifth root of 1/2, so the octave is divided
# into five equal steps -- a pentatonic scale.
AWK_SCRIPT='awk '"'"'
    function wl() {
        rate=34000;
        return (rate/160)*(0.87055^(int(rand()*10)))};
    BEGIN {
        srand();
        wla=wl();
        while(1) {
            wlb=wla;
            wla=wl();
            if (wla==wlb)
                {wla*=2;};
            d=(rand()*10+5)*rate/4;
            a=b=0; c=128;
            ca=40/wla; cb=20/wlb;
            de=rate/10; di=0;
            for (i=0;i<d;i++) {
                a++; b++; di++; c+=ca+cb;
                if (a>wla)
                    {a=0; ca*=-1};
                if (b>wlb)
                    {b=0; cb*=-1};
                if (di>de)
                    {di=0; ca*=0.9; cb*=0.9};
                printf("%c",c)};
            c=int(c);
            while(c!=128) {
                c<128?c++:c--;
                printf("%c",c)};};}'"'"''
eval "${AWK_SCRIPT} ${AUDIO_TARGET}"
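
To see which notes the script can pick, the wavelength formula from wl() can be evaluated for each of the ten possible steps (int(rand()*10) gives 0 through 9). This is just a sketch that reuses the constants from the script above; scale_wavelengths is a name invented for this example.

```shell
# Print the ten wavelengths (in samples) that wl() in the script can return,
# one "step wavelength" pair per line.
# scale_wavelengths is a hypothetical helper; 34000, 160 and 0.87055 are the
# constants used in the script above.
scale_wavelengths() {
    awk 'BEGIN {
        rate = 34000
        for (k = 0; k < 10; k++)
            printf "%d %.2f\n", k, (rate / 160) * (0.87055 ^ k)
    }'
}
```

Step 5 comes out at roughly half the wavelength of step 0, which is one octave higher, so the ten steps cover about two octaves.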