Dear Micheal,
I have built a full web app that scrapes a site and gathers some information. Locally everything runs perfectly; after dockerizing the application, however, an exception I had never seen before started appearing:
INFO: 127.0.0.1:44352 - "POST /crawl_menu HTTP/1.1" 200 OK
Exception in Getir Crawler: Message:
Element {button[aria-label='Tümünü Reddet']} was not present after 7 seconds!
I have never gotten that before. My guess is that inside Docker the container runs the crawler with headless=True and tries to reach the site within 6 seconds but cannot make it. What should I do to work around that? I will provide my crawler.py and Dockerfile below.
crawler.py:
import pandas as pd
from seleniumbase import SB


def g_crawler(url, is_area):
    menu_items = []
    if not is_area:
        with SB(uc=True, headless=True) as sb:
            # Open with UC reconnect to get past bot detection.
            sb.driver.uc_open_with_reconnect(url, 6)
            try:
                sb.uc_gui_handle_cf()
                sb.sleep(3)
                # Dismiss the cookie banner ("Tümünü Reddet" = "Reject All").
                sb.click("button[aria-label='Tümünü Reddet']")
                sb.sleep(3)
                all_items = sb.find_elements("div[class='sc-be09943-2 gagwGV']")
                for item in all_items:
                    product_name = item.find_element(
                        "css selector",
                        "h4[class='style__Title4-sc-__sc-1nwjacj-5 jrcmhy sc-be09943-0 bpfNyi']",
                    ).text
                    sb.sleep(2)
                    try:
                        product_description = item.find_element(
                            "css selector", "p[contenteditable='false']"
                        ).text
                    except Exception:
                        product_description = "No description for this product."
                    sb.sleep(2)
                    product_price = item.find_element(
                        "css selector",
                        "span[class='style__Text-sc-__sc-1nwjacj-0 jbOUDC sc-be09943-5 kA-DgzG']",
                    ).text
                    sb.sleep(2)
                    menu_item = {
                        "Menu Item": product_name,
                        "Menu Ingredients": product_description,
                        "Price": product_price,
                    }
                    if product_name == "Poşet":  # skip the "bag" pseudo-item
                        continue
                    menu_items.append(menu_item)
                df = pd.DataFrame(menu_items)
                # title = sb.get_title()
                # excel_file = f'{title}_getir_menu.xlsx'
                # df.to_excel(excel_file, index=False)
                return df.to_json(orient='split')
            except Exception as e:
                print(f"Exception in Getir Crawler: {e}")
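Until the root cause is clear, one workaround I was considering is retrying the whole crawl with a growing delay, so a slow Cloudflare check inside the container gets more time on each attempt. This is just a sketch; retry_with_backoff and flaky are illustrative names of my own, not part of SeleniumBase:

```python
import time


def retry_with_backoff(fn, attempts=3, base_delay=2):
    """Call fn(); on failure, wait base_delay * attempt seconds and retry.

    fn would wrap the g_crawler() call, so a crawl that times out on the
    cookie banner gets another chance after a longer warm-up.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # the crawler raises on a missed element
            last_exc = exc
            if attempt < attempts:
                time.sleep(base_delay * attempt)
    raise last_exc


# Stand-in for the crawl: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not present yet")
    return "menu ok"

print(retry_with_backoff(flaky, attempts=3, base_delay=0))  # prints: menu ok
```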
My Dockerfile:
# Use a smaller base image
FROM python:3.10-slim
# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV TZ=Europe/Istanbul
ENV LC_ALL=tr_TR.UTF-8
ENV LANG=tr_TR.UTF-8
# Set the working directory in the container
WORKDIR /app
# Install dependencies and Chrome in one layer to keep image size smaller
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    unzip \
    curl \
    ca-certificates \
    fonts-liberation \
    libappindicator3-1 \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libcups2 \
    libdbus-1-3 \
    libgdk-pixbuf2.0-0 \
    libnspr4 \
    libnss3 \
    libx11-xcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    xdg-utils \
    locales \
    --no-install-recommends \
    && ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone \
    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
    && apt-get install ./google-chrome-stable_current_amd64.deb --yes \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Configure locale settings for Türkiye
RUN echo "LC_ALL=tr_TR.UTF-8" >> /etc/environment \
&& echo "LANG=tr_TR.UTF-8" >> /etc/environment \
&& locale-gen tr_TR.UTF-8
# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application
COPY . .
# Expose the ports for FastAPI and Streamlit
EXPOSE 8000 8501
# Command to run FastAPI and Streamlit
CMD ["sh", "-c", "uvicorn menu_crawler:app --host 0.0.0.0 --port 8000 & streamlit run Hotel_Analyst.py"]
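One more thing I was wondering about: uc_gui_handle_cf() reportedly drives the mouse via PyAutoGUI, which needs a display, so maybe a virtual display would help here, running the crawler with SB(uc=True, xvfb=True) instead of headless=True. A sketch of the extra Dockerfile layer (assuming SeleniumBase's xvfb option applies to my case):

```dockerfile
# Add a virtual display so UC mode's GUI-based Cloudflare handling can run.
# Sketch only: appending xvfb to the existing apt-get install list would
# avoid the extra layer if image size matters.
RUN apt-get update && apt-get install -y --no-install-recommends xvfb \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
```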