PDM Audio Sampling & Data Packing #86
Thanks for the issue. I tried playing the provided recording but I got errors in several different players. Maybe I need to try QuickTime on my Mac... How did you convert your streamed data into an audio file? Does the bitrate (or frequency - I don't usually work with audio) in the .wav file match the frequency that the recording took place at? |
Thanks for looking into this @oclyke! The audio file is just a wav file; it should open with Audacity. Note that it's zipped, so it needs to be unzipped first (GitHub won't accept wav files). I have a Python script to receive and store the transmitted audio data. I do save with the expected sampling rate. The PDM controller, based on the PDM configuration, determines the sampling rate. However, even if the file were saved with the wrong sampling frequency, you could change the playback rate (in Audacity, for instance) and it should sound good, since the data is still there. Unfortunately, none of the playback rates seem to work, which makes me suspect it's not just the sampling frequency but rather the data format -- how the PDM controller interleaves the channels. |
I agree with your analysis - just wanted to be sure since you have more experience with the setup/trouble. @nseidle may know more details about the inner workings of the PDM peripheral. He is out of work for a few weeks but hopefully this tag will leave him a little bookmark to revisit. |
Thanks for your work on this! Your gist got me started nicely. I'm starting to scratch the surface on this. My initial guess is that it's something to do with the right and left channels.
I think the databuffer is twice as big as it needs to be, and perhaps that's to store both left and right channel data? If that's the case, you might be reading left/right data when you should be reading every other byte for right only. This is just my current working theory. |
Hi @nseidle! Welcome back and thanks for looking into this. I'm also not very confident in that line. However, whenever I make it just It doesn't help that the |
Btw, I did try reading every other sample in case that both left/right data were interleaved. That didn't sound good either. In fact, I have tried playing with multiple combinations of the PDM controller operating modes -- the FIFO Data formats as determined by CHSET, PCMPACK and LRSWAP. None of those combos work. |
I've got your python script running. Very cool. But I'm having the same issue @oclyke had, WAV files don't play/seem corrupt. I'm looking at the tensorflow config of the PDM and their python tools to see if I can make them work. OOC, where did you get a decimation rate of 24 and PDM clk of 750kHz? |
Ah, I just figured out your 24 and 750kHz from the tensorflow comment:
|
And I've got WAV files playing in VLC on windows. Yep, sounds awful. Maybe I'm barking up the wrong tree but here is one 512 chunk of FrameBuffer
That's a lot of zeros. What's annoying is I would have presumed a left and right output, with zero being the channel that has no mic, but that's not the case. And switching the PDM to .ePCMChannels = AM_HAL_PDM_CHANNEL_LEFT (a nonexistent channel) produces similar results. I need to wrap my head around how the PDM samples are being output. |
Oh! That looks interesting -- the pi16buffer. I also looked at it in the past and saw the zeros but they didn't seem aligned. Yours definitely looks like what I'd expect with Unfortunately, the datasheet does a very poor job explaining the data format -- it doesn't explain it at all, so I'm not sure how to read it. As for the PDM left and right channels being the same, that is understandable. Basically, the setting just changes whether the PDM controller reads the data on the rising edge or the falling edge of the clock. That way, if you had two mics, they could both be driven by the same clock. Let me see if I can get an output similar to yours. With that, it should be straightforward to undo the interleaving and obtain one-channel audio. Btw, I use Audacity for the wav file. It's tricky because it is not exactly 16 kHz audio -- more like 15.625 kHz. In Audacity, you can easily change the playback sampling rate. |
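For anyone following along, the 15.625 kHz figure is consistent with the 750 kHz PDM clock and decimation rate of 24 mentioned earlier, if the output rate is the PDM clock divided by twice the decimation rate. That relation is an assumption here (the thread does not spell it out), and `pdm_sample_rate` and `mclk_div` are hypothetical names used only for this sketch:

```python
def pdm_sample_rate(pdm_clk_hz, decimation_rate, mclk_div=1):
    """Effective PCM sample rate in Hz.

    Assumes fs = pdm_clk / (mclk_div * 2 * decimation_rate); this relation
    is an assumption for illustration, not taken from the thread.
    """
    return pdm_clk_hz / (mclk_div * 2 * decimation_rate)


fs = pdm_sample_rate(750_000, 24)
print(fs)  # 15625.0
```

If the relation holds, a file tagged as 16 kHz would play back about 2.4% fast, which is audible but would not explain badly garbled audio.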
Awesome. Thank you for the PDM summary and tables. Quick check, is this the right CircularBuffer library? I like to put a search link next to it so I know I get the same one:
After sleeping on it, I imagine SciPy should correctly handle separating the right and left channels into the WAV file. So I shouldn't necessarily be trying to change or filter at the serial output step. But if we're corrupting the buffers in the ISR, that may be the source of one issue. |
Hi @nseidle! Yes, that's the CircularBuffer library; the one by AgileWare. You were right that it's about the buffering. I investigated further and found that, basically, I was reading from the CircularBuffer faster than I was writing. Within the CircularBuffer itself (accessed by One problem was that the I manually undo the channel interleaving because SciPy's wavfile would expect a 2D array for a stereo signal. |
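A minimal sketch of undoing the interleaving on the receiving side, assuming the stream alternates left and right samples starting with left. The actual order depends on the CHSET, PCMPACK and LRSWAP settings, so treat the ordering as an assumption; `split_channels` and `to_frames` are hypothetical helper names:

```python
def split_channels(interleaved):
    """Split [L0, R0, L1, R1, ...] into (left, right) lists.

    Assumes the left sample comes first in each pair.
    """
    if len(interleaved) % 2:
        raise ValueError("expected an even number of samples")
    return interleaved[0::2], interleaved[1::2]


def to_frames(interleaved):
    """Return one [left, right] row per frame (the 2D stereo layout)."""
    left, right = split_channels(interleaved)
    return [list(pair) for pair in zip(left, right)]


samples = [10, -10, 20, -20, 30, -30]
left, right = split_channels(samples)
frames = to_frames(samples)
print(left)    # [10, 20, 30]
print(right)   # [-10, -20, -30]
print(frames)  # [[10, -10], [20, -20], [30, -30]]
```

With NumPy, `np.asarray(samples).reshape(-1, 2)` gives the same rows-of-frames layout in one call.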
Got it! A few things, mainly the 115200bps was not fast enough and was dropping packets. Increased to 500kbps and works well. Also, I used the TensorFlow settings which enabled bDataPacking
I'm going to overhaul my code and then create a core example. Mind if I include your code and python script? It's really nice. |
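A back-of-the-envelope check on why 115200 bps drops data while 500 kbps does not. This sketch assumes raw 16-bit mono samples at 15.625 kHz, 8N1 framing (10 bits on the wire per byte) and no extra protocol overhead; none of those details are confirmed in the thread, and `required_baud` is a hypothetical name:

```python
def required_baud(sample_rate_hz, bytes_per_sample=2, channels=1,
                  wire_bits_per_byte=10):
    """Minimum UART baud rate needed to stream raw PCM without falling behind.

    wire_bits_per_byte=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return sample_rate_hz * bytes_per_sample * channels * wire_bits_per_byte


needed = required_baud(15625)
print(needed)  # 312500

for baud in (115200, 500000):
    print(baud, "ok" if baud >= needed else "too slow")
```

Under the same assumptions, two 16-bit channels would need 625000 bps, so 500 kbps would only be enough for a single channel.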
Here is my inelegant dual buffer solution. I tried to get circular buffer to work but I was getting corrupt audio. Not sure what the overhead on that library is. |
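For readers unfamiliar with the idea, here is a rough Python model of a dual (ping-pong) buffer. It is not the Arduino code referred to above, only a sketch of the general technique: the producer fills one buffer while the consumer drains the other, and an overrun is counted when the consumer falls behind. All names are hypothetical:

```python
class PingPongBuffer:
    """Two fixed-size buffers: one is filled while the other is drained."""

    def __init__(self, size):
        self.size = size
        self.buffers = [[0] * size, [0] * size]
        self.fill_index = 0      # buffer the producer writes next
        self.ready_index = None  # buffer waiting to be sent, if any
        self.overruns = 0        # ready blocks that were superseded unsent

    def produce(self, samples):
        """Called by the (simulated) PDM interrupt with one full block."""
        if len(samples) != self.size:
            raise ValueError("block size mismatch")
        if self.ready_index is not None:
            self.overruns += 1   # consumer fell behind; older block is lost
        self.buffers[self.fill_index][:] = samples
        self.ready_index = self.fill_index
        self.fill_index ^= 1     # swap to the other buffer

    def consume(self):
        """Called by the main loop; returns a copy of a block, or None."""
        if self.ready_index is None:
            return None
        block = list(self.buffers[self.ready_index])
        self.ready_index = None
        return block


pp = PingPongBuffer(4)
pp.produce([1, 2, 3, 4])
first = pp.consume()
pp.produce([5, 6, 7, 8])
pp.produce([9, 10, 11, 12])  # consumer skipped a turn
second = pp.consume()
print(first, second, pp.overruns)
```

Counting overruns like this is one cheap way to tell whether corrupt audio comes from the link being too slow rather than from the sample format.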
Awesome Nathan! Thanks for sharing. And yes, please feel free to include my code and Python script. Makes sense that the baud rate was a bottleneck as well. Perhaps that will fix my current CircularBuffer approach. Good catch! |
PR submitted. I am seeing some very high bitrates in the output WAV files - 1024kbps: Most bit rates are 96 to 320kbps. I think this is causing most audio players to fail to play. Do you have any idea how to decrease the WAV bit rate? |
Awesome, the PR looks great. Thanks! I also thought the PDM library wasn't very useful for streaming applications in its current state. That's why I went back to the HAL. So I'm glad you pointed that out in your PR. As for the unreasonably high bit rate, it's because of the
    if do_save:
        wavfile.write(wavname, fsamp, np.array(x))
        print("Recording saved to file: %s" % wavname)
the bit rate ends up being way too high because scipy.io attempts to preserve the precision. A quick fix is to convert back to 16-bit samples before saving:
    if do_save:
        # scale the [-1, 1] floats back to the int16 range before writing
        xx = (np.array(x) * maxval).astype('int16')
        wavfile.write(wavname, fsamp, xx)
        print("Recording saved to file: %s" % wavname)
Btw, the only reason I'm converting to floats is to keep the plot within the [-1, 1] range. So alternatively, you can keep everything in 16-bit integers and no conversions will be necessary. |
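The same fix can be sketched without SciPy, using only the standard-library `wave` module in place of `scipy.io.wavfile`. The scale factor of 32767, the mono layout and the helper names are assumptions made for this example:

```python
import io
import struct
import wave


def floats_to_int16(samples, maxval=32767):
    """Scale floats in [-1, 1] to int16, clamping anything out of range."""
    out = []
    for s in samples:
        v = int(round(s * maxval))
        out.append(max(-32768, min(32767, v)))
    return out


def write_wav_int16(fileobj, samples, sample_rate):
    """Write mono 16-bit PCM to a path or a seekable file-like object."""
    pcm = floats_to_int16(samples)
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


buf = io.BytesIO()
write_wav_int16(buf, [0.0, 0.5, -1.0, 1.0], 15625)
buf.seek(0)
with wave.open(buf, "rb") as r:
    info = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())
print(info)  # (1, 2, 15625, 4)
```

Because the sample width is fixed at 2 bytes, the resulting file is plain 16-bit PCM, which most players handle.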
As you can see in the image below, from my Mac, the file exported with floats is 64 bits per sample (double-precision floats), whereas the converted one is 16 bits per sample (int16). You can also see the difference in the file sizes -- a 4x reduction factor. Yet, they have the same duration (and the same sampling rate, even though one says 15.6). |
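The numbers line up with a simple calculation: an uncompressed WAV bit rate is sample rate times bits per sample times channels. Assuming a mono file tagged at a nominal 16 kHz (an assumption; the true rate is closer to 15.625 kHz), 64-bit samples give exactly the 1024 kbps reported above. `wav_bit_rate_kbps` is a hypothetical helper:

```python
def wav_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


print(wav_bit_rate_kbps(16000, 64))  # 1024.0 (float64 samples)
print(wav_bit_rate_kbps(16000, 16))  # 256.0 (int16 samples)
```

The 64-to-16-bit change accounts for the 4x reduction in both bit rate and file size.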
Subject of the issue
I have implemented an application that streams audio from a PDM microphone to the computer via Serial/UART. While the audio data is transmitted as expected, it doesn't play back accurately. It sounds fast-forwarded -- almost as if the audio was sampled at a much higher sampling rate than expected. However, changing the playback sampling rate doesn't make much difference at all. I suspect it has to do with how the PDM data is packed in the FIFO. The Apollo3 Datasheet (on p360, v0.9.1) mentions the different operating modes and how the data is packed in the PDM FIFO. However, it's not clear from the table how to correctly undo the interleaving of samples.
I've attached a sample audio recording:
recording_1104_1115.wav.zip
Your workbench
macOS Catalina
SparkFun RedBoard Artemis Nano, Arduino 1.8.10, Apollo3 Boards v1.0.17
Steps to reproduce
Here is a link to a gist of a simplified implementation.