This repository was archived by the owner on Mar 22, 2022. It is now read-only.

Unsupervised #9

@brooksjessup

Explore the Data Using Pandas-
typo: "interpretation. <3 your data"

Why not apply some of the preprocessing techniques from the last lesson here on the music reviews data?

Creating the DTM using scikit-learn-
Explanation needed for why it's necessary to remove numbers.
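
One possible rationale, offered as a guess since the notebook itself does not say: numeric tokens such as years and track numbers each become their own column in the DTM without telling you much about the topic. A minimal sketch with made-up reviews (the `tokenize` helper and the sample text are hypothetical, not taken from the notebook):

```python
import re

def tokenize(text, drop_numbers=True):
    """Lowercase, split on non-alphanumerics, optionally drop tokens containing digits."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if drop_numbers:
        tokens = [t for t in tokens if not any(ch.isdigit() for ch in t)]
    return tokens

# Hypothetical reviews that differ only in their numbers.
reviews = [
    "Released in 1997, track 3 is the best of the 12 songs.",
    "Released in 2004, track 7 is the best of the 10 songs.",
]

# Vocabulary = the columns a document-term matrix would have.
vocab_with = sorted({t for r in reviews for t in tokenize(r, drop_numbers=False)})
vocab_without = sorted({t for r in reviews for t in tokenize(r)})

print(len(vocab_with), "columns with numbers kept")
print(len(vocab_without), "columns with numbers removed")
```

In this toy case, six of the columns are numbers that appear in only one review each, which is the kind of sparsity the notebook could point to when justifying the step.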

Topic Modeling-
typo: "what the ext is about" -> "text"
The paragraph on the "theory" behind LDA is very dense and difficult to parse.

It is unnecessary to fit-transform both tf-idf and CountVectorizer here; one or the other is fine.

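Fitting the LDA model raises an error because the n_topics argument has been renamed to n_components in newer scikit-learn releases:
"LatentDirichletAllocation(n_topics=10...)" -> "LatentDirichletAllocation(n_components=10...)"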

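For reference, a minimal sketch of what the corrected cell might look like, using a single CountVectorizer matrix and the n_components argument. The toy documents, variable names, and the choice of 2 topics are assumptions for illustration only; the notebook uses the music reviews data and 10 topics.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the music reviews.
docs = [
    "guitar riffs and loud drums drive this rock album",
    "soft piano and strings fill this classical record",
    "the rock band plays loud guitar solos",
    "the orchestra plays soft classical strings",
]

# One vectorizer is enough: raw term counts for the document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# n_components replaces the old n_topics argument.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)

print(doc_topics.shape)       # one row per document, one column per topic
print(lda.components_.shape)  # one row per topic, one column per vocabulary term
```

Each row of doc_topics is that document's topic mixture, so the rows sum to 1.
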
It might be nice to include an interpretation of the 10 topics identified by the model.
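
A common starting point for that interpretation is to list the highest-weighted words in each topic. A rough sketch, assuming a topic-by-term weight matrix like a fitted model's components_ attribute; the `top_words` helper, the weights, and the vocabulary below are made up for illustration:

```python
import numpy as np

def top_words(components, vocab, n_top=3):
    """Return the n_top highest-weighted words for each topic (row)."""
    vocab = np.asarray(vocab)
    # Sort each row by weight, descending, and keep the first n_top column indices.
    order = np.argsort(components, axis=1)[:, ::-1][:, :n_top]
    return [vocab[row].tolist() for row in order]

# Hypothetical weights: 2 topics over a 4-word vocabulary.
components = np.array([
    [5.0, 0.1, 4.0, 0.2],
    [0.1, 6.0, 0.3, 3.0],
])
vocab = ["guitar", "piano", "drums", "strings"]

topics = top_words(components, vocab, n_top=2)
for i, words in enumerate(topics):
    print(f"Topic {i}: {', '.join(words)}")
```

With a real model, the vocabulary would come from the vectorizer (get_feature_names_out() in recent scikit-learn versions), and a sentence or two per topic on what the word list suggests would round out the section.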

The cosine similarity example at the end of the notebook raises an error.
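
I have not tracked down the cause, so this is not a fix. As a reference point for checking a repaired cell, cosine similarity between two document vectors can be computed directly with NumPy. The `cosine_sim` helper and the vectors below are hypothetical:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(a @ b / denom)

# Hypothetical term-count rows for three documents.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]   # same direction as doc_a, different length
doc_c = [0, 0, 3, 1]   # no terms in common with doc_a

print(cosine_sim(doc_a, doc_b))
print(cosine_sim(doc_a, doc_c))
```

Documents with proportional counts score close to 1 and documents with no shared terms score 0, which makes for an easy sanity check on whatever the notebook's version returns.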

Further resources-
The link for the blog post is broken. Remove it?
