S3Client corrupts stack (or heap) on Ubuntu 24.04 #3038

Closed
torchedplatypi opened this issue Jul 15, 2024 · 8 comments
Labels
bug This issue is a bug. closed-for-staleness p3 This is a minor priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.

Comments

@torchedplatypi

Describe the bug

On an AWS c7i.large EC2 instance:

Attempting to initialize and use an instance of Aws::S3::S3Client in a non-trivial production program appears to corrupt the stack, resulting in SIGILL (illegal instruction).

The "Hello S3" sample program here is still functional. Calling ListBuckets() from our application environment also succeeds and does not cause stack corruption. So the bug may be localized to specific functions (the other functions called in my application are GetObject, PutObject, ListObjects, and HeadObject).

Expected Behavior

Continue functional operation of a production application built against the aws-sdk on Ubuntu 24.04 (migrating from Ubuntu 22.04)

Current Behavior

Some identifying information has been redacted.

Thread 1 "*****" received signal SIGILL, Illegal instruction.                                                                                                                       
0x00005555555ba215 in *********::******** (fh=@0x7fffffffdc18: 0x5555558fe1d0,                                                                                                    
    readback=..., offsets=std::vector of length 50, capacity 50 = {...}, headerSize=528)                                                                                                      
    at /home/ubuntu/*****/*****/include/*****.cpp:288                                                                                                          
288        assert(headerSize == (readOffsets*sizeof(long)+readHeaders*sizeof(*************)));        

Reproduction Steps

On an AWS c7i.large EC2 instance, install the AWS SDK per the documentation here:

    sudo apt update
    sudo apt upgrade
    sudo apt install build-essential
    sudo reboot
    sudo apt install build-essential
    sudo apt-get install libcurl4-openssl-dev libssl-dev uuid-dev zlib1g-dev libpulse-dev
    cmake
    sudo apt install cmake
    git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp
    ls
    mkdir aws-sdk-build
    cd aws-sdk-build/
    ls
    cmake ../aws-sdk-cpp -DCMAKE_BUILD_TYPE=Debug -DCMAKE_PREFIX_PATH=/usr/local/ -DCMAKE_INSTALL_PREFIX=/usr/local/ -DBUILD_ONLY="s3"
    make
    sudo make install
    sync
    cd
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo apt install unzip
    unzip awscliv2.zip
    cd aws
    ls
    sudo ./install
    aws --version
    cd
    ls
    mkdir .aws
    cd .aws
    ls
    vim config
    vim credentials

Now attempt to initialize and use an instance of Aws::S3::S3Client in a non-trivial production program. The S3Client appears to corrupt the stack, resulting in SIGILL (illegal instruction).

The following are snippets of how the S3Client is instantiated by my application to confirm that it was blowing up the stack. Previously, we instantiated the client on the heap, and it caused SIGILLs in a different heap location. I rewrote to the following to instantiate it on the stack since the sample S3 Hello program creates the client on the stack, and the SIGILL backtrace moved to stack memory (when the program continues on to attempt to read and parse the file it downloaded from S3).

AWSHelper.hpp

class AWSHelper
{
   public:
      static AWSHelper* getInstance();
      void shutdown();
      int S3Init();
   protected:
      AWSHelper();
   private:
      ~AWSHelper();

      static AWSHelper* m_instance;

      //Aws::S3::S3Client* m_s3client;
      static Aws::SDKOptions m_sdkoptions;
      Aws::Client::ClientConfiguration m_sdkClientConfig;
      Aws::S3::S3Client m_s3client;

AWSHelper.cpp

#include "AWSHelper.hpp"

using namespace Aws;

AWSHelper* AWSHelper::m_instance = NULL;
Aws::SDKOptions AWSHelper::m_sdkoptions;

AWSHelper* AWSHelper::getInstance(){
    if(m_instance == NULL){
        m_sdkoptions.loggingOptions.logLevel = Utils::Logging::LogLevel::Debug;
        Aws::InitAPI(m_sdkoptions);
        m_instance = new AWSHelper();
        m_instance->S3Init();
    }
    return m_instance;
}

AWSHelper::AWSHelper()
{

}

int AWSHelper::S3Init()
{
   Aws::S3::S3Client m_s3client(m_sdkClientConfig);
   //m_s3client = Aws::S3::S3Client();

   return 0;
}

void AWSHelper::shutdown()
{
   SDKOptions options;
   options.loggingOptions.logLevel = Utils::Logging::LogLevel::Debug;
   Aws::ShutdownAPI(options);
}  

Possible Solution

This was and still remains functional on Ubuntu 22.04 as well as RHEL9.

I suspect a bug in, or lack of support for, GCC 13 on Ubuntu 24.04.

Additional Information/Context

No response

AWS CPP SDK version used

1.11.367

Compiler and Version used

gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0

Operating System and version

Ubuntu 24.04 LTS -- Linux ip-172-31-15-168 6.8.0-1010-aws #10-Ubuntu SMP Thu Jun 13 17:36:15 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

@torchedplatypi torchedplatypi added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 15, 2024
@SergeyRyabinin
Contributor

Hi @torchedplatypi ,

Thank you for providing your build steps and some reproduction code.

Please refer to our basic-use doc:

The SDK for C++ and its dependencies use C++ static objects, and the order of static object destruction is not determined by the C++ standard. To avoid memory issues caused by the nondeterministic order of static variable destruction, do not wrap the calls to Aws::InitAPI and Aws::ShutdownAPI into another static object.

Unfortunately, the provided sample and the stack trace do not contain enough information to confirm that this is the case described in our documentation. It would be great if you could provide a complete reproduction sample.
The important part missing from the sample at the moment is when your shutdown method is called. For things to work without undefined behavior, both m_sdkClientConfig and m_s3client must be destructed before the shutdown is executed. This is due to our legacy API design, which permits usage that leads to undefined behavior. We are trying to improve it, but we can't change much without breaking existing API contracts.
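The ordering requirement described above can be sketched without the SDK. This is only an illustration under stated assumptions: `FakeInitAPI`, `FakeShutdownAPI`, and `FakeClient` are hypothetical stand-ins for `Aws::InitAPI`, `Aws::ShutdownAPI`, and `Aws::S3::S3Client`, used so the example compiles on its own. The point is the inner block: the client is destroyed at the closing brace, before shutdown runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch builds without aws-sdk-cpp.
// None of these names are SDK APIs.
static std::vector<std::string> g_events;
static bool g_api_alive = false;

void FakeInitAPI()     { g_api_alive = true;  g_events.push_back("init"); }
void FakeShutdownAPI() { g_api_alive = false; g_events.push_back("shutdown"); }

struct FakeClient {
    // Both ends of the client's lifetime expect the API to be initialized.
    FakeClient()  { assert(g_api_alive); g_events.push_back("client ctor"); }
    ~FakeClient() { assert(g_api_alive); g_events.push_back("client dtor"); }
};

// Runs one init/use/shutdown cycle and returns the recorded event order.
std::vector<std::string> RunScoped() {
    g_events.clear();
    FakeInitAPI();
    {
        FakeClient client;            // lives only inside this block
        g_events.push_back("use");
    }                                 // client is destroyed here
    FakeShutdownAPI();                // runs after the client is gone
    return g_events;
}
```

With the real SDK the same shape would apply to the client configuration object as well, since the comment above names both members.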

Additionally,

int AWSHelper::S3Init()

is not doing anything except creating and destroying a temporary local object m_s3client.
Your AWSHelper::m_s3client is default constructed and m_sdkClientConfig is not used; I guess you'd want to use your client config when creating the client.
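The shadowing described above can be shown in isolation. A minimal sketch, assuming hypothetical stand-in types `Config` and `Client` (they are not SDK types): `ShadowingHelper::Init()` declares a new local that happens to share the member's name, while `FixedHelper` constructs the member itself through the member initializer list.

```cpp
#include <string>

// Hypothetical stand-ins; not SDK types.
struct Config { std::string region = "default"; };

struct Client {
    std::string region;
    Client() : region("default") {}
    explicit Client(const Config& c) : region(c.region) {}
};

class ShadowingHelper {
public:
    explicit ShadowingHelper(const Config& cfg) : m_config(cfg) {}

    void Init() {
        // This declares a NEW local variable named m_client. The member is
        // untouched, and the local is destroyed when Init() returns.
        Client m_client(m_config);
        (void)m_client;
    }

    const std::string& Region() const { return m_client.region; }

private:
    Config m_config;
    Client m_client;  // default constructed, never reconfigured
};

class FixedHelper {
public:
    // The member initializer list constructs the member from the config.
    explicit FixedHelper(const Config& cfg) : m_config(cfg), m_client(m_config) {}

    const std::string& Region() const { return m_client.region; }

private:
    Config m_config;  // declared first, so it is initialized before m_client
    Client m_client;
};
```

Compiling with `-Wshadow` is one way to have GCC point out declarations like the one in `Init()`.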

Best regards,
Sergey

@jmklix jmklix added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jul 15, 2024
@torchedplatypi
Author

I do call the shutdown function from our main binary. Here are the snippets of invoking our AWSHelper (which is our wrapper object Singleton, around an Aws::S3::S3Client):

int main(int argc, char** argv):
   ... parse CLI args ...

   AWSHelper* awsHelper = AWSHelper::getInstance();

   // this call is functional
   awsHelper->S3ListBuckets();

   //continue using awsHelper to do things, which now fail (stack corrupted before we get to shutdown)

   //never get here
   awsHelper->shutdown();

Regarding int AWSHelper::S3Init() not doing anything, that was an artifact of me scrambling this morning to attempt to initialize our wrapper object AWSHelper as a Singleton with a member variable Aws::S3::S3Client m_s3client. I assumed that declaring the private member Aws::S3::S3Client m_s3client without invoking the constructor Aws::S3::S3Client m_s3client() would not successfully create a client with default config. That is actually OK. Our prior functional app used a default configuration.

Here is our original version, where the S3Client was initialized on the heap for easier management with our wrapper object:
AWSHelper.hpp

class AWSHelper
{
   public:

      static AWSHelper* getInstance()
      {
         static AWSHelper* m_instance;// = NULL;
         if (m_instance == NULL)
         {
            m_instance = new AWSHelper();
            m_instance->S3Init();
         }
         return m_instance;
      }

      void shutdown();

      // S3 utility fns
      int S3Init();
      int S3ListBuckets();
      std::vector<std::string> S3ListFiles(std::string bucketName);
      std::vector<std::string> S3ListFiles(std::string bucketName, std::string prefix);
      int S3DownloadFile(std::string bucketName, std::string fileName, std::string outDir);
      int S3DownloadFile(std::string bucketName, std::string fileName, std::string outDir, std::string outFilename);
      int S3UploadFile(std::string bucketName, std::string fileName, std::string dir);
      int S3UploadFileToDir(std::string bucketName, std::string bucketPath, std::string fileName, std::string localdir);
      int S3FileExists(std::string bucketName, std::string key);
      int S3DeleteObject(std::string objectKey, std::string fromBucket);

   private:
      AWSHelper();
      ~AWSHelper();

      Aws::S3::S3Client* m_s3client;


AWSHelper.cpp

#include "AWSHelper.hpp"

using namespace Aws;

AWSHelper::AWSHelper()
{
   SDKOptions options;
   options.loggingOptions.logLevel = Utils::Logging::LogLevel::Fatal;

   //The AWS SDK for C++ must be initialized by calling Aws::InitAPI.
   InitAPI(options);
   {
   }
}

int AWSHelper::S3Init()
{
   m_s3client = new Aws::S3::S3Client();
}

void AWSHelper::shutdown()
{
   SDKOptions options;
   options.loggingOptions.logLevel = Utils::Logging::LogLevel::Debug;
   Aws::ShutdownAPI(options);
}
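One detail in the listing above may be worth ruling out: `AWSHelper::S3Init()` is declared to return `int` but has no `return` statement, and flowing off the end of a non-void function is undefined behavior in C++. The GCC manual documents `-funreachable-traps`, which turns the compiler's implicit unreachable markers into trap instructions and is listed there as enabled by default at `-O0` and `-Og`. On x86-64 such a trap is reported as SIGILL. Whether this is what happens in this application is an assumption; nothing in this thread confirms it. Building with `-Wall` (which includes `-Wreturn-type`) flags functions of this kind. A minimal sketch of the corrected shape, using a hypothetical stand-in `FakeClient` in place of the SDK type:

```cpp
#include <memory>

struct FakeClient {};  // hypothetical stand-in for Aws::S3::S3Client

class Helper {
public:
    // Every path through a non-void function returns a value.
    int Init() {
        m_client = std::make_unique<FakeClient>();
        return m_client ? 0 : -1;
    }

    bool Ready() const { return m_client != nullptr; }

private:
    std::unique_ptr<FakeClient> m_client;
};
```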

I imagine I am running into an issue with wrapping into a static object, but in execution I am declaring S3Client as a static variable inside a non-static class, which I don't believe is the edge case your basic-use doc note actually covers.

I have been unable to find documentation on this. Is there a canonical way to initialize and keep a singleton Aws::InitAPI and S3Client?

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jul 16, 2024
@SergeyRyabinin
Contributor

Hi @torchedplatypi ,

I agree that it is likely not the case of using the AWS SDK from a static wrapper. (You still have the issue that S3Client is getting destructed after ShutdownAPI is called, but your app execution does not reach that point).

Unfortunately, there is still not enough information to triage the issue.
It would be great to have reproduction code or a fuller stack trace from the moment of the segfault.
According to the info provided, there is something within the SDK / app at

//continue using awsHelper to do things, which now fail (stack corrupted before we get to shutdown)

That results in a segfault.
I've tried searching for an assert(headerSize == (readOffsets*sizeof(long)+readHeaders*sizeof(*************))); in the SDK code base but could not find anything. It seems that you have a debug build (which has asserts enabled), so you could try launching the app with a debugger attached and breaking at the moment the assertion is triggered.
Another thing I could suggest is to enable lower level logging within the SDK (and then search for ERROR / FATAL messages in the SDK log).

I have been unable to find documentation, is there a canonical way to initialize and keep a singleton Aws::InitAPI and S3Client

So far we've only advised against such usage in our documentation (to avoid wrapping the SDK Init-Shutdown methods and a client into a static class), but this is a recurring ask, so I guess we will have to provide guidance or an "API handle" wrapper. I can't give any estimate of when that can be done, though.
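Until such guidance exists, one possible shape for a handle is sketched below. It is not an SDK-provided or endorsed pattern: `StubInit`, `StubShutdown`, `StubClient`, `ApiGuard`, and `Session` are all hypothetical names, and the stubs only record call order so the example compiles without the SDK. It relies on the C++ rule that members are destroyed in reverse declaration order, so the client always goes away before shutdown runs.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for Aws::InitAPI, Aws::ShutdownAPI and S3Client.
static std::vector<std::string> g_log;

void StubInit()     { g_log.push_back("init"); }
void StubShutdown() { g_log.push_back("shutdown"); }

struct StubClient {
    StubClient()  { g_log.push_back("client ctor"); }
    ~StubClient() { g_log.push_back("client dtor"); }
};

// Calls init on construction and shutdown on destruction.
class ApiGuard {
public:
    ApiGuard()  { StubInit(); }
    ~ApiGuard() { StubShutdown(); }
    ApiGuard(const ApiGuard&) = delete;
    ApiGuard& operator=(const ApiGuard&) = delete;
};

// Owns the guard and the client. Members are destroyed in reverse
// declaration order, so m_client is destroyed before m_guard.
class Session {
public:
    StubClient& Client() { return m_client; }

private:
    ApiGuard m_guard;     // constructed first, destroyed last
    StubClient m_client;  // constructed second, destroyed first
};

// Creates and destroys one Session, returning the recorded event order.
std::vector<std::string> RunSession() {
    g_log.clear();
    {
        Session session;  // an automatic object, not a static
        (void)session.Client();
    }
    return g_log;
}
```

In an application, a `Session`-like object would be created in `main` and passed by reference to the code that needs the client, which keeps it out of static storage as the documentation quoted earlier in this thread advises.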

Best regards,
Sergey

@torchedplatypi
Author

Hi Sergey,

Thank you for the responses.

I do not intend to abandon this, but currently, I cannot prioritize paring down our source base to a minimal example of just the SDK usage. My company can opt to pause the Ubuntu 22.04->24.04 migration as 22.04 is an LTS which is not EOL until 2027, so we can return to this issue at a less pressing time crunch.

"It's never a bug in the kernel" :) Our implementation is inherently more suspect than the SDK or any tools/library changes from the Ubuntu upgrade. So it can very plausibly still be an issue with our implementation. But our implementation has been working for some time.
If I solve via an implementation change (I will keep poking at it, with your comments here at hand for some extra info I didn't have previously) I will update here.

Otherwise I will try to come back with a useful full stack trace if I can demonstrate the problem in a form more minimally scoped to the SDK.
If that means closing this one and reopening when I have better debugging information, that's OK too. Please proceed.

@SergeyRyabinin
Contributor

A quick question:
can you confirm that it works on Ubuntu 22.04, but exactly the same code started to fail on 24.04?
That would give us some pointers to check regarding the updated OpenSSL and libcurl versions.

@torchedplatypi
Author

Yes I can confirm that.

In my debugging, I spun up two fresh c7i EC2 instances, one with 22.04 and one with 24.04, and ran the bash history steps I pasted above in the Reproduction Steps.
This installed utilities, AWS SDK, and AWS CLI to check and confirm our AWS S3 credentials worked.

I then checked out our private repository and built and ran. 22.04 was functional, 24.04 failed.

This was an attempt to isolate the problem to the OS release (as opposed to instance type [we'd previously been using burstable instances] or system configuration). I cannot think of any other variables that might be confounding.

@jmklix
Member

jmklix commented Jul 19, 2024

I tried running the hello_s3 sample in an Ubuntu 24.04 Docker container and it still runs fine. Here is the Dockerfile that I used:

FROM ubuntu:24.04

#install deps
RUN apt-get update
RUN apt-get install -y git cmake zlib1g-dev libssl-dev libcurl4-openssl-dev build-essential

#clone and build sdk
RUN git clone --depth 1 --recurse-submodules https://github.com/aws/aws-sdk-cpp && \
    cd aws-sdk-cpp && \
    mkdir build && \
    cd build && \
    cmake -DAUTORUN_UNIT_TESTS=OFF -DBUILD_ONLY="s3" .. && \
    cmake --build . && \
    cmake --install .

#copy code and build sample application
RUN mkdir sdk-example
COPY CMakeLists.txt sdk-example/CMakeLists.txt
COPY hello_s3.cpp sdk-example/hello_s3.cpp
RUN cd sdk-example && \
    mkdir build && \
    cd build && \
    cmake .. && \
    cmake --build .

Does this only fail when you run it on a c7i EC2 instance with Ubuntu 24.04? Can you try running the Docker container to see if you still get the crash?

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jul 19, 2024

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Jul 30, 2024
@github-actions github-actions bot closed this as completed Aug 4, 2024