Gave this a try :-)
Feedback:
- If this library works as advertised it'd be huge!
- `mlscraper.html` is missing from the PyPI package.
- When no scraper can be found, the error message could be more helpful:
  `mlscraper.training.NoScraperFoundException: did not find scraper`
  Would be nice if the error message gave some guidance as to what fields couldn't be found in the HTML. Even with DEBUG log level it's not really helpful.
- See more notes in my script below.
- Training the script was really slow (gave up after 15 min).
import requests

from mlscraper.html import Page
from mlscraper.samples import Sample, TrainingSet
from mlscraper.training import train_scraper

jonas_url = "https://github.com/jonashaag"
resp = requests.get(jonas_url)
resp.raise_for_status()

page = Page(resp.content)
sample = Sample(
    page,
    {
        "name": "Jonas Haag",
        "followers": "329",  # Note that this doesn't work if 329 is passed as an int.
        # "company": "@QuantCo",  # Does not work.
        "twitter": "@_jonashaag",  # Does not work without the "@".
        "username": "jonashaag",
        "nrepos": "282",
    },
)

training_set = TrainingSet()
training_set.add_sample(sample)
scraper = train_scraper(training_set)

resp = requests.get("https://github.com/lorey")
result = scraper.get(Page(resp.content))
print(result)
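To illustrate the kind of guidance I mean for the "did not find scraper" case, here is a rough pre-check that runs before training and reports, per field, whether the expected value shows up in the page text at all. This is only a sketch using the standard library, not mlscraper's API: `_TextCollector`, `locate_fields`, and the `"exact"` / `"substring"` / `"missing"` labels are names I made up, and I'm assuming (without having checked the matching logic) that a value which is only a substring of a text node, or absent entirely, is a likely culprit.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def locate_fields(html, expected):
    """Classify each expected value as 'exact', 'substring', or 'missing'.

    'exact' means some text node equals the value after whitespace is
    normalized; 'substring' means it only occurs inside a longer run of text;
    'missing' means it does not occur in the visible text at all.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Normalize whitespace inside each text node and drop empty nodes.
    nodes = [" ".join(chunk.split()) for chunk in collector.chunks]
    nodes = [node for node in nodes if node]
    text = " ".join(nodes)

    report = {}
    for key, value in expected.items():
        needle = str(value)  # also accepts ints such as 329
        if needle in nodes:
            report[key] = "exact"
        elif needle in text:
            report[key] = "substring"
        else:
            report[key] = "missing"
    return report


if __name__ == "__main__":
    demo_html = "<h1>Jonas Haag</h1><a><span>329</span> followers</a>"
    print(locate_fields(demo_html, {"name": "Jonas Haag", "followers": 329, "company": "@QuantCo"}))
```

With the script above, I'd call `locate_fields(resp.text, {...})` with the same dict I pass to `Sample` before training. It can't explain every failure, but it would at least separate "value isn't on the page" from "value is there but the scraper still wasn't found".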