Search: Indexing Text for Both Effective Search and Accurate Analysis #433
Walkthrough
Adds a new "effective-search" documentation guide covering indexing, analyzers, tokenizers, filters, and pipeline recommendations; updates the FTS index page structure/navigation and card links; and makes small edits to the explain docs (rubric/tag additions and guidance on reporting flaws).
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Add article "Indexing Text for Both Effective Search and Accurate Analysis" by David Norton to "Explanation" section. Original source: https://web.archive.org/web/20250210021928/https://www.qualtrics.com/eng/indexing-text-for-both-effective-search-and-accurate-analysis/
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/feature/search/fts/effective-search.md (1)
88-88: Optional: Minor style refinements.
Static analysis suggests a few stylistic improvements (lines 88, 159, 233), but these are preferences rather than issues. The current phrasing is natural and idiomatic. If you wish to polish: consider alternatives to "first of all" for variety, and review whether "completely" and "as long as" could be replaced with more concise alternatives. These are entirely optional in a chill review.
Also applies to: 159-159, 233-233
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/explain/index.md (1 hunk)
docs/feature/search/fts/effective-search.md (1 hunk)
docs/feature/search/fts/index.md (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/explain/index.md
🧰 Additional context used
🪛 LanguageTool
docs/feature/search/fts/effective-search.md
[grammar] ~83-~83: Ensure spelling is correct
Context: ...Indexing If a client was to search for "wlking to work", they would probably hope to g...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~84-~84: Ensure spelling is correct
Context: ...back like: "I walked to work", "I enjoy walkng to work", and "I walk to work every day...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~88-~88: Often, this adverbial phrase is redundant. Consider using an alternative.
Context: ...ts without other negative consequences. First of all, “walking” is spelled wrong. Second, di...
(FIRST_OF_ALL)
[style] ~159-~159: To elevate your writing, try using an alternative expression here.
Context: ...nd that the actual content of the index does not matter as long as the search results are accur...
(MATTERS_RELEVANT)
[style] ~233-~233: Consider using a different adverb to strengthen your wording.
Context: ...ords (less than 4 characters) which are completely ignored by Lucene. Our spell correctio...
(COMPLETELY_ENTIRELY)
🔇 Additional comments (4)
docs/feature/search/fts/effective-search.md (2)
1-14: Excellent article header, metadata, and archival reference.
The article-info frontmatter is properly structured and the archive link to the original Qualtrics engineering blog article is correctly formatted with appropriate versioning.
Also applies to: 262-267
33-113: Strong technical depth and clear pedagogical structure.
The content progresses logically from business rationale (Why CrateDB?) through analyzer fundamentals to implementation techniques (character folding, lemmatization, spelling correction). The lemmatization comparison table and spell correction pseudocode effectively communicate complex concepts with concrete examples (e.g., Unicode apostrophes, German character folding rules, Morphy vs. stemmer accuracy).
Also applies to: 150-248
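For readers who have not seen character folding before, the idea can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function name `fold`, the `APOSTROPHES` table, and the choice to strip all combining marks are assumptions made here, and the article's own German folding rules may differ (for example, this sketch maps "ü" to "u" rather than "ue").

```python
import unicodedata

# Hypothetical mapping: typographic apostrophes folded to the ASCII apostrophe.
APOSTROPHES = {"\u2018": "'", "\u2019": "'", "\u02bc": "'"}


def fold(text: str) -> str:
    """Fold a string to a simplified form for indexing and querying."""
    # Replace typographic apostrophes so "don’t" and "don't" index identically.
    text = "".join(APOSTROPHES.get(ch, ch) for ch in text)
    # Decompose accented characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() lowercases aggressively and expands "ß" to "ss".
    return stripped.casefold()


if __name__ == "__main__":
    for word in ["Don\u2019t", "Müller", "Straße"]:
        print(word, "->", fold(word))
```

Applying the same folding to both the indexed text and the query is what lets variants of a word match each other.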
docs/feature/search/fts/index.md (2)
277-277: Semantic section renaming improves taxonomy consistency.
The updates from "Guides" → "Tutorials" and "Articles" → "Explanations" align with the broader documentation structure (as referenced in the PR context for docs/explain/index.md). This creates clearer semantic distinction: Tutorials are procedural/hands-on, Explanations are conceptual/deep-dive.
Also applies to: 301-301
341-360: New card entry is well-integrated with correct cross-references.
The card title, description, and link target correctly reference the new effective-search.md article. The reference label "effective-fulltext-search" at line 342 matches the file header label (verified at effective-search.md:1), and the toctree entry at line 370 correctly resolves to docs/feature/search/fts/effective-search.md. Tag assignments (Introduction, Analyzer, Tokenizer, Plugin) accurately reflect article content.
About
The excellent article Indexing Text for Both Effective Search and Accurate Analysis by David Norton (Home, LinkedIn, Substack) should not be left behind.
Preview
https://cratedb-guide--433.org.readthedocs.build/feature/search/fts/effective-search.html
/cc @surister