
PDM Audio Sampling & Data Packing #86


Closed
justiceamoh opened this issue Nov 4, 2019 · 20 comments

Comments

@justiceamoh

Subject of the issue

I have implemented an application that streams audio from a PDM microphone to the computer via Serial/UART. While the audio data is transmitted as expected, it doesn't play back accurately. It sounds fast-forwarded, almost as if the audio was sampled at a much higher sampling rate than expected. However, changing the playback sampling rate doesn't make much difference at all. I suspect it has to do with how the PDM data is packed in the FIFO. The Apollo3 Datasheet (p360, v0.9.1) describes the different operating modes and how the data is packed in the PDM FIFO, but the table doesn't make clear how to correctly undo the interleaving of samples.

I've attached a sample audio recording:
recording_1104_1115.wav.zip

Your workbench

macOS Catalina
SparkFun RedBoard Artemis Nano, Arduino 1.8.10, Apollo3 Boards v1.0.17

Steps to reproduce

Here is a link to a gist of a simplified implementation.

@oclyke
Contributor

oclyke commented Nov 11, 2019

Thanks for the issue. I tried playing the provided recording but got errors in several different players. Maybe I need to try QuickTime on my Mac...

How did you convert your streamed data into an audio file? Does the bitrate (or frequency; I don't usually work with audio) in the .wav file match the frequency the recording was taken at?

@justiceamoh
Author

Thanks for looking into this, @oclyke!

The audio file is just a WAV file; it should open in Audacity. Note that it's zipped, so it needs to be unzipped first (GitHub won't accept WAV files).

I have a Python script to receive and store the transmitted audio data. I do save with the expected sampling rate, which the PDM controller determines based on the PDM configuration.

However, even if the file were saved with the wrong sampling frequency, you could change the playback rate (in Audacity, for instance) and it should sound fine, since the data is still there. Unfortunately, no playback rate seems to work, which makes me suspect the problem is not the sampling frequency but the data format, i.e. how the PDM controller interleaves the channels.

@oclyke
Contributor

oclyke commented Nov 11, 2019

I agree with your analysis; I just wanted to be sure, since you have more experience with the setup/trouble. @nseidle may know more details about the inner workings of the PDM peripheral. He is out of the office for a few weeks, but hopefully this tag will leave him a little bookmark to revisit.

@justiceamoh
Author

Awesome, thanks @oclyke. I look forward to hearing from @nseidle.

@nseidle
Member

nseidle commented Nov 18, 2019

Thanks for your work on this! Your gist got me started nicely.

I'm starting to scratch the surface on this. My initial guess is that it's something to do with the right and left channels.

sTransfer.ui32TargetAddr = (uint32_t ) PDMDataBuffer;
sTransfer.ui32TotalCount = BUFFSIZE * 2;

I think the data buffer is twice as big as it needs to be; perhaps that's to store both left and right channel data? If so, you might be reading interleaved left/right data when you should be reading every other sample for the right channel only. This is just my current working theory.

@justiceamoh
Copy link
Author

Hi @nseidle! Welcome back and thanks for looking into this.

I'm also not very confident in that line. However, whenever I make it just BUFFSIZE, the PDM ISR is never triggered (in debug, I can tell the flag AM_HAL_PDM_INT_DCMP is never set in the PDM registers).

It doesn't help that the 2x factor is also used in the AmbiqSDK PDM example. So I rationalized that ui32TotalCount expects a count of bytes rather than of 16-bit samples, or something weird like that involving a factor of two. I may be completely wrong though.

@justiceamoh
Copy link
Author

Btw, I did try reading every other sample in case the left/right data were interleaved. That didn't sound good either. In fact, I have tried many combinations of the PDM controller operating modes, i.e. the FIFO data formats determined by CHSET, PCMPACK and LRSWAP. None of those combos work.

@nseidle
Copy link
Member

nseidle commented Nov 18, 2019

I've got your python script running. Very cool.


But I'm having the same issue @oclyke had: the WAV files don't play or seem corrupt. I'm looking at the TensorFlow PDM config and their Python tools to see if I can make them work.

Out of curiosity, where did you get a decimation rate of 24 and a PDM clock of 750kHz?

@nseidle
Copy link
Member

nseidle commented Nov 18, 2019

Ah, I just figured out your 24 and 750kHz from the tensorflow comment:

    48,  // OSR = 1500/16 = 96 = 2*SINCRATE --> SINC_RATE = 48
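Reading that comment as OSR = PDM clock / sample rate and OSR = 2 × decimation (SINC) rate gives sample rate = PDM clock / (2 × decimation rate). Here is a small Python sketch of that arithmetic. It assumes the relationship in the comment holds exactly, and the helper name is hypothetical:

```python
def pdm_sample_rate_hz(pdm_clock_hz, decimation_rate):
    """PCM output rate implied by the TensorFlow comment:
    OSR = pdm_clock / fs and OSR = 2 * decimation_rate,
    therefore fs = pdm_clock / (2 * decimation_rate)."""
    return pdm_clock_hz / (2 * decimation_rate)

# The gist's settings: 750 kHz PDM clock, decimation rate 24
print(pdm_sample_rate_hz(750_000, 24))    # 15625.0
# The TensorFlow settings: 1.5 MHz PDM clock, decimation rate 48
print(pdm_sample_rate_hz(1_500_000, 48))  # 15625.0
```

Both configurations give 15,625 Hz rather than exactly 16 kHz. The comment's 1500/16 is really 93.75, which gets rounded to an OSR of 96.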

@nseidle
Copy link
Member

nseidle commented Nov 18, 2019

And I've got WAV files playing in VLC on windows. Yep, sounds awful.

Maybe I'm barking up the wrong tree, but here is one 512-entry chunk of FrameBuffer:

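//When frame is full, dump to serial
if (fidx == FRMESIZE) {
  //Serial.write((uint8_t*)FrameBuffer, sizeof(FrameBuffer));

  //Loop over elements, not bytes: sizeof(FrameBuffer) is a byte count,
  //so for a 16-bit array it would run past the end of the buffer
  const size_t count = sizeof(FrameBuffer) / sizeof(FrameBuffer[0]);
  for (size_t x = 0 ; x < count ; x++)
  {
    //if (x % 2 > 0) //Only output right channel
    {
      //Debug: 16 values per line, printed as 4-digit hex
      if (x % 16 == 0) Serial.println();
      Serial.printf("%04X ", (uint16_t)FrameBuffer[x]);
    }
  }
  while(1); //Halt after one dump so the output can be inspected
}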

[Screenshot: hex dump of one FrameBuffer chunk]

That's a lot of zeros. What's annoying is that I would have expected left and right outputs, with zeros on the channel that has no mic, but that's not the case. And switching the PDM to

.ePCMChannels = AM_HAL_PDM_CHANNEL_LEFT,

(a nonexistent channel) produces similar results. I need to wrap my head around how the PDM samples are being output.

@nseidle
Copy link
Member

nseidle commented Nov 18, 2019

Curiously, if we print pi16Buffer directly, the bytes align. I still don't see a difference between left/right/stereo, but there might be an issue loading the circular buffer.

[Screenshot: hex dump of pi16Buffer]

@justiceamoh
Copy link
Author

justiceamoh commented Nov 19, 2019

Oh! That looks interesting: the pi16Buffer. I also looked at it in the past and saw the zeros, but they didn't seem aligned. Yours looks exactly like what I'd expect with CHSET = 1 and PCMPACK = 0, i.e. when no data packing is used. And in my gist, .bDataPacking = 0, so this is actually good. See the snapshots below from the Apollo3 Datasheet (p360 of v0.9.1):

[Datasheet table: PDM Operating Modes, packed]

[Datasheet table: PDM Operating Modes, unpacked]

Unfortunately, the datasheet does a very poor job of explaining the data format (it doesn't really explain it at all), so I'm not sure how to read these tables.
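Here is a hypothetical Python helper that splits raw 32-bit FIFO words into 16-bit samples, to make the packed/unpacked distinction concrete. The layout it uses is an ASSUMPTION, not something the datasheet excerpt confirms:

- Words are little-endian, as on the Cortex-M4.
- Unpacked mode puts one sample in the low half of each word, with the upper half zero.
- Packed mode puts two samples in each word, low half first.

Under those assumptions, viewing unpacked data as an int16 array shows a zero after every sample.

```python
import struct

def unpack_fifo_words(raw, packed):
    """Split raw little-endian 32-bit FIFO words into signed 16-bit samples.

    ASSUMED layout (not confirmed by the datasheet table):
      packed=False -> one sample per word, in the low 16 bits
      packed=True  -> two samples per word, low 16 bits first
    """
    n_words = len(raw) // 4
    halves = struct.unpack("<%dh" % (2 * n_words), raw[:4 * n_words])
    if packed:
        return list(halves)       # every 16-bit half is a sample
    return list(halves[0::2])     # keep only the low half of each word

# Two unpacked words holding samples 0x0123 and -2 (upper halves zero):
raw = struct.pack("<2I", 0x00000123, 0x0000FFFE)
print(struct.unpack("<4h", raw))             # (291, 0, -2, 0): a zero every other int16
print(unpack_fifo_words(raw, packed=False))  # [291, -2]
print(unpack_fifo_words(raw, packed=True))   # [291, 0, -2, 0]
```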

As for the PDM left and right channel settings giving the same result, that is understandable. The setting basically just changes whether the PDM controller reads the data on the rising or falling edge of the clock, so that two mics can be driven by the same clock.

Let me see if I can get an output similar to yours. With that, it should be straightforward to undo the interleaving and obtain one-channel audio.

Btw, I use Audacity for the WAV file. It's tricky because it is not exactly 16 kHz audio; it's more like 15.625 kHz. In Audacity, you can easily change the playback sampling rate.

@nseidle
Copy link
Member

nseidle commented Nov 19, 2019

Awesome. Thank you for the PDM summary and tables.

Quick check, is this the right CircularBuffer library? I like to put a search link next to it so I know I get the same one:

 #include <CircularBuffer.h> //By AgileWare - Click here to get the library: http://librarymanager/All#Circular_LIFO

After sleeping on it, I imagine SciPy should correctly handle separating the right and left channels into the WAV file, so I shouldn't necessarily be trying to change or filter at the serial output step. But if we're corrupting the buffers in the ISR, that may be the source of one issue.

@justiceamoh
Copy link
Author

Hi @nseidle! Yes, that's the CircularBuffer library; the one by AgileWare.

You were right that it's about the buffering. I investigated further and found that basically, I was reading from the CircularBuffer faster than I was writing. Within the CircularBuffer itself (accessed by AudioQueue[i]), the data looked well aligned. Yet, it was corrupted within my FrameBuffer.

One problem was that the FrameBuffer was significantly smaller than the PDMDataBuffer itself. So at certain times, even though the correct data was in the buffer, it wouldn't be read fast enough. I fixed this and the alignment looks okay in debug. But the audio still sounds sped up :(.

I manually undo the channel interleaving because SciPy's wavfile module expects a 2D array for a stereo signal.
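Here is a stdlib-only Python sketch of undoing the interleaving. It splits an L, R, L, R, ... sample stream into per-channel lists. It also groups the stream into the (N, 2) row layout that SciPy's `wavfile.write` takes for stereo. Which index is left and which is right (and whether the data is interleaved at all) depends on CHSET/LRSWAP, so the channel order here is an assumption. The helper names are hypothetical.

```python
def deinterleave(samples, channels=2):
    """Split an interleaved stream (L, R, L, R, ...) into per-channel lists.

    Any trailing partial frame is dropped. The channel order is assumed.
    """
    usable = len(samples) - len(samples) % channels
    return [list(samples[ch:usable:channels]) for ch in range(channels)]

def to_frames(samples, channels=2):
    """Group an interleaved stream into rows of `channels` samples,
    i.e. the (N, channels) shape a stereo WAV writer expects."""
    usable = len(samples) - len(samples) % channels
    return [list(samples[i:i + channels]) for i in range(0, usable, channels)]

stream = [10, -10, 20, -20, 30, -30, 40]   # 3 full stereo frames + 1 stray sample
left, right = deinterleave(stream)
print(left)               # [10, 20, 30]
print(right)              # [-10, -20, -30]
print(to_frames(stream))  # [[10, -10], [20, -20], [30, -30]]
```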

@nseidle
Copy link
Member

nseidle commented Nov 19, 2019

Got it!

recording_1119_1159.zip

A few things: mainly, 115200bps was not fast enough and was dropping packets. I increased it to 500kbps and it works well. I also used the TensorFlow settings, which enable bDataPacking:

//Tensor settings
.eClkDivider = AM_HAL_PDM_MCLKDIV_1,
.eLeftGain = AM_HAL_PDM_GAIN_0DB,
.eRightGain = AM_HAL_PDM_GAIN_P90DB,
.ui32DecimationRate = 48,  // OSR = 1500/16 = 96 = 2*SINCRATE --> SINC_RATE = 48
.bHighPassEnable = 0,
.ui32HighPassCutoff = 0xB,
.ePDMClkSpeed = AM_HAL_PDM_CLK_1_5MHZ,
.bInvertI2SBCLK = 0,
.ePDMClkSource = AM_HAL_PDM_INTERNAL_CLK,
.bPDMSampleDelay = 0,
.bDataPacking = 1,
.ePCMChannels = AM_HAL_PDM_CHANNEL_RIGHT,
.bLRSwap = 0,
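Here is some rough arithmetic for why 115200bps drops data. It assumes one channel of 16-bit samples at the ~15.625 kHz rate these settings produce. It also assumes standard 8N1 UART framing (10 bits on the wire per data byte) and ignores any protocol overhead. The helper name is hypothetical.

```python
def required_baud(sample_rate_hz, bytes_per_sample=2, channels=1, wire_bits_per_byte=10):
    """Minimum UART baud rate needed to stream raw PCM without falling behind.

    wire_bits_per_byte=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return bytes_per_second * wire_bits_per_byte

need = required_baud(15625)
print(need)            # 312500
print(115200 >= need)  # False: the link cannot keep up, so data is dropped
print(500000 >= need)  # True: about 60% headroom
```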

I'm going to overhaul my code and then create a core example. Mind if I include your code and python script? It's really nice.

@nseidle
Copy link
Member

nseidle commented Nov 19, 2019

Here is my inelegant dual-buffer solution. I tried to get the circular buffer to work, but I was getting corrupt audio. Not sure what the overhead of that library is.
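The dual-buffer code itself isn't reproduced in this thread. The Python sketch below is a toy model of the general ping-pong idea, not nseidle's implementation. The capture interrupt fills one buffer while the main loop drains the other. An overrun is counted whenever a new block completes before the previous one was read. The class and method names are hypothetical. On the Artemis this would typically be C, with the DMA target swapped between two buffers in the PDM ISR.

```python
class PingPongBuffer:
    """Toy model of a two-buffer (ping-pong) capture scheme.

    The (simulated) DMA-complete interrupt writes into one buffer while the
    main loop drains the other; the roles swap after every completed block.
    """

    def __init__(self, size):
        self.size = size
        self.buffers = [[0] * size, [0] * size]
        self.fill_index = 0      # buffer the next DMA transfer writes into
        self.ready_index = None  # full buffer waiting for the main loop, if any
        self.overruns = 0        # blocks superseded before the main loop read them

    def on_dma_complete(self, samples):
        """Stand-in for the PDM ISR: store a finished block and swap buffers."""
        if len(samples) != self.size:
            raise ValueError("block size mismatch")
        self.buffers[self.fill_index][:] = samples
        if self.ready_index is not None:
            # The previous block was never drained; it is abandoned and will be
            # overwritten by the next transfer.
            self.overruns += 1
        self.ready_index = self.fill_index
        self.fill_index ^= 1     # next transfer targets the other buffer

    def take_ready(self):
        """Stand-in for the main loop: return the newest full block, or None."""
        if self.ready_index is None:
            return None
        block = list(self.buffers[self.ready_index])
        self.ready_index = None
        return block

pp = PingPongBuffer(4)
pp.on_dma_complete([1, 2, 3, 4])
print(pp.take_ready())   # [1, 2, 3, 4]
print(pp.take_ready())   # None: nothing new yet
```

The model shows the design choice at the heart of the thread: if the main loop (serial output included) cannot drain one block per capture period, blocks get dropped no matter how the buffering is structured.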

@justiceamoh
Copy link
Author

Awesome, Nathan! Thanks for sharing. And yes, please feel free to include my code and Python script.

It makes sense that the baud rate was a bottleneck as well. Perhaps that will fix my current CircularBuffer approach. Good catch!

@nseidle
Copy link
Member

nseidle commented Nov 20, 2019

PR submitted. I am seeing some very high bitrates in the output WAV files - 1024kbps:

[Screenshot: WAV file properties showing a 1024kbps bit rate]

Typical bit rates are 96 to 320kbps. I think this is causing most audio players to fail to play the files. Do you have any idea how to decrease the WAV bit rate?

@justiceamoh
Copy link
Author

Awesome, the PR looks great. Thanks! I also thought the PDM Library wasn't very useful for streaming applications in its current state. That's why I went back to the HAL. So I'm glad you pointed that out in your PR.

As for the unreasonably high bit rate, it's caused by the scipy.io.wavfile.write call. My Python script was converting the 16-bit samples to high-precision floating-point numbers with the line buf = buf/maxval, where maxval = 32768.0. When that is exported to a WAV via:

if do_save:
    # x holds float64 samples in [-1, 1), so SciPy writes a 64-bit float WAV
    wavfile.write(wavname, fsamp, np.array(x))
    print("Recording saved to file: %s" % wavname)

the bit rate ends up being far too high, because scipy.io.wavfile.write keeps the array's dtype: float64 samples produce a 64-bit floating-point WAV. A quick fix is to convert back to 16-bit samples before saving:

if do_save:
    # Scale back to 16-bit integers so SciPy writes a 16-bit PCM WAV
    xx = (np.array(x) * maxval).astype('int16')
    wavfile.write(wavname, fsamp, xx)
    print("Recording saved to file: %s" % wavname)

Btw, the only reason I'm converting to floats is to keep the plot in the [-1, 1] range. Alternatively, you can keep everything as 16-bit integers, and no conversions will be necessary.
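Python's standard-library `wave` module can stand in for SciPy here as a dependency-free alternative. It always writes integer PCM, so scaling the [-1, 1) floats back to int16 first yields a 16-bit WAV directly. The function name and the clipping step are illustrative additions, not part of the original script. `MAXVAL` mirrors the script's maxval.

```python
import sys
import wave
from array import array

MAXVAL = 32768.0  # same scale factor as maxval in the script above

def save_int16_wav(path, float_samples, fsamp):
    """Write mono float samples in [-1, 1) to a 16-bit PCM WAV file."""
    # Scale back to integers and clip to the int16 range.
    pcm = array("h", (
        max(-32768, min(32767, int(round(x * MAXVAL)))) for x in float_samples
    ))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)       # mono, e.g. the right channel only
        w.setsampwidth(2)       # 2 bytes = 16 bits per sample
        w.setframerate(fsamp)   # e.g. 15625
        w.writeframes(pcm.tobytes())
```

Clipping only matters if some processing step pushes a value to +1.0 or beyond. Plain int16 → float → int16 round trips stay in range, since 32767/32768 × 32768 = 32767.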

@justiceamoh
Copy link
Author

justiceamoh commented Nov 20, 2019

As you can see in the image below, from my Mac, the file exported with floats uses 64 bits per sample (double-precision floats), whereas the converted one uses 16 bits per sample (int16). You can also see the difference in file size: a 4x reduction. Yet both have the same duration (and the same sampling rate, even though one says 15.6).

[Screenshot: macOS file info comparing the float64 and int16 WAV exports]
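The 4x factor follows directly from the sample width, because uncompressed WAV bit rate = sample rate × bits per sample × channels. Here is a quick check, assuming mono audio at 15,625 Hz. The helper name is hypothetical.

```python
def wav_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed WAV sample data, in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

float64_rate = wav_bit_rate_kbps(15625, 64)  # float64 samples
int16_rate = wav_bit_rate_kbps(15625, 16)    # int16 samples
print(float64_rate, int16_rate, float64_rate / int16_rate)  # 1000.0 250.0 4.0
```

For comparison, 16000 Hz × 64 bits is exactly 1024 kbps, the figure reported earlier. That suggests (though the thread doesn't confirm it) that the file was written with fsamp = 16000.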


3 participants