
After Dockerizing, the Selenium web app is not opening the webpage that needs to be crawled #2976

Closed
@Orbiszeus

Description


Dear Michael,
I have built a full web app that scrapes a site and gathers some information. Locally, everything runs perfectly; however, after dockerizing the application, an exception I had never seen before appeared:

INFO:     127.0.0.1:44352 - "POST /crawl_menu HTTP/1.1" 200 OK
        Exception in Getir Crawler:  Message: 
 Element {button[aria-label='Tümünü Reddet']} was not present after 7 seconds!

I have never gotten this before. I think that inside Docker the container runs the crawler with headless=True and tries to reach the site within 6 seconds but cannot manage it. What should I do to work around this? I will provide my crawler.py and Dockerfile below.
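As a first idea, I am considering simply giving the page more time to load. This is only a sketch; 12 seconds is an arbitrary guess on my part, not a verified fix:

from seleniumbase import SB

url = "https://example.com"  # placeholder; the real Getir URL goes here

# Sketch: the same open call as in my crawler below, with a longer
# reconnect time for the slower containerized browser:
with SB(uc=True, headless=True) as sb:
    sb.driver.uc_open_with_reconnect(url, 12)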

crawler.py:

import pandas as pd
from seleniumbase import SB

def g_crawler(url, is_area):
    menu_items = []
    if not is_area: 
        with SB(uc=True, headless=True) as sb:
            sb.driver.uc_open_with_reconnect(url, 6)
            try:
                sb.uc_gui_handle_cf()
                sb.sleep(3)
                sb.click("button[aria-label='Tümünü Reddet']")  # "Reject All"
                sb.sleep(3)
                all_items = sb.find_elements("div[class='sc-be09943-2 gagwGV']")
                for item in all_items:
                    product_name = item.find_element("css selector", "h4[class='style__Title4-sc-__sc-1nwjacj-5 jrcmhy sc-be09943-0 bpfNyi']").text
                    sb.sleep(2)
                    try:
                        product_description = item.find_element("css selector", "p[contenteditable='false']").text
                    except Exception:
                        product_description = "No description for this product."
                    sb.sleep(2)
                    product_price = item.find_element("css selector", "span[class='style__Text-sc-__sc-1nwjacj-0 jbOUDC sc-be09943-5 kA-DgzG']").text
                    sb.sleep(2)
                    menu_item = {
                        "Menu Item": product_name,
                        "Menu Ingredients": product_description,
                        "Price": product_price
                    }
                    if product_name == "Poşet":  # "Poşet" = shopping bag
                        continue
                    menu_items.append(menu_item)
                df = pd.DataFrame(menu_items)
                # title = sb.get_title()
                # excel_file = f'{title}_getir_menu.xlsx'
                # df.to_excel(excel_file, index=False)  
                return df.to_json(orient='split')                  
            except Exception as e:
                print(f"Exception in Getir Crawler:  {e}")
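
Another direction I am considering, based on the UC Mode docs, is dropping headless=True and running Chrome under a virtual display instead, since uc_gui_handle_cf() drives the mouse with PyAutoGUI and needs a display to click on. This is only a sketch; the xvfb=True option, the longer timeouts, and the conditional cookie click are untested assumptions on my part:

from seleniumbase import SB

def g_crawler_xvfb(url):
    # Sketch: xvfb=True starts a virtual display (Linux only), so UC Mode
    # can run a "headed" Chrome inside the container (assumed, untested):
    with SB(uc=True, xvfb=True) as sb:
        sb.uc_open_with_reconnect(url, reconnect_time=10)
        sb.uc_gui_handle_cf()
        # Click the "Reject All" cookie banner only if it shows up,
        # instead of failing hard when it never appears:
        if sb.is_element_visible("button[aria-label='Tümünü Reddet']"):
            sb.click("button[aria-label='Tümünü Reddet']")
        # Wait explicitly for the product grid instead of fixed sleeps:
        sb.wait_for_element_visible("div[class='sc-be09943-2 gagwGV']",
                                    timeout=15)
        return sb.find_elements("div[class='sc-be09943-2 gagwGV']")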

My Dockerfile:

# Use a smaller base image
FROM python:3.10-slim

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV TZ=Europe/Istanbul
ENV LC_ALL=tr_TR.UTF-8
ENV LANG=tr_TR.UTF-8

# Set the working directory in the container
WORKDIR /app

# Install dependencies and Chrome in one layer to keep image size smaller
RUN apt-get update && apt-get install -y \
     wget \
     gnupg \
     unzip \
     curl \
     ca-certificates \
     fonts-liberation \
     libappindicator3-1 \
     libasound2 \
     libatk-bridge2.0-0 \
     libatk1.0-0 \
     libcups2 \
     libdbus-1-3 \
     libgdk-pixbuf2.0-0 \
     libnspr4 \
     libnss3 \
     libx11-xcb1 \
     libxcomposite1 \
     libxdamage1 \
     libxrandr2 \
     xdg-utils \
     locales \
     --no-install-recommends \
     && ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone \
     && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
     && apt-get install ./google-chrome-stable_current_amd64.deb --yes \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*

# Configure locale settings for Türkiye
RUN echo "LC_ALL=tr_TR.UTF-8" >> /etc/environment \
     && echo "LANG=tr_TR.UTF-8" >> /etc/environment \
     && locale-gen tr_TR.UTF-8

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Expose the ports for FastAPI and Streamlit
EXPOSE 8000 8501

# Command to run FastAPI and Streamlit
CMD ["sh", "-c", "uvicorn menu_crawler:app --host 0.0.0.0 --port 8000 & streamlit run Hotel_Analyst.py"]
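
If the virtual-display route works, I assume the image would also need the Xvfb binary itself. A sketch (the package names are my assumption and not yet tested in this image):

# Sketch: virtual-display packages for running UC Mode without headless
RUN apt-get update && apt-get install -y --no-install-recommends \
     xvfb \
     xauth \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*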

Metadata

Labels: UC Mode / CDP Mode (Undetected Chromedriver Mode / CDP Mode), invalid usage (You may need to change what you're doing)