@colombod (Member) commented Oct 3, 2023

Adding

  • text chunking with overlapping size
  • text chunking by token count
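
For readers who have not seen overlapping chunking before, here is a minimal sketch of the idea. It is not the code added by this PR: the class and method names (`ChunkingSketch`, `ChunkWithOverlap`) and the parameters are hypothetical, and it windows over characters for simplicity. A token-count variant would presumably apply the same sliding window to a token sequence produced by a tokenizer, which is not shown here.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingSketch
{
    // Hypothetical helper for illustration only (not the PR's API).
    // Yields windows of at most chunkSize characters; each window starts
    // (chunkSize - overlapSize) characters after the previous one, so
    // consecutive windows share overlapSize characters.
    public static IEnumerable<string> ChunkWithOverlap(string text, int chunkSize, int overlapSize)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlapSize < 0 || overlapSize >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlapSize));

        return Iterate();

        IEnumerable<string> Iterate()
        {
            var step = chunkSize - overlapSize;
            for (var start = 0; start < text.Length; start += step)
            {
                var length = Math.Min(chunkSize, text.Length - start);
                yield return text.Substring(start, length);

                // Stop once a window has reached the end of the text,
                // so no trailing chunk made only of overlap is emitted.
                if (start + length >= text.Length) yield break;
            }
        }
    }
}
```

With a chunk size of 4 and an overlap of 2, the input `"abcdefghij"` yields `abcd`, `cdef`, `efgh`, `ghij`.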

@colombod colombod marked this pull request as draft October 3, 2023 22:29

namespace Microsoft.DotNet.Interactive.AIUtilities;

public static class TextChunkingExtensions
Contributor

Thoughts on a better name? What else is going to end up in the category of string operations?

Member Author

Hmm, probably template parameter parsing?

};

Func<string, float[]> stringToFloatArray = a => a.Select(c => (float)c).ToArray();
var search = collection.Search(
Contributor

I don't understand this signature very well. Let's work on simplifying.

public float Score(T a, TQuery query);
}

public class CosineSimilarityComparer<T> : ISimilarityComparer<T, float[]>
Contributor

Separate files please.
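
As background for the comparer under review, a minimal sketch of cosine similarity over two `float[]` vectors follows. This is not the PR's `CosineSimilarityComparer<T>`: the class name `CosineSketch` is hypothetical, and returning 0 for a zero-length vector is a convention chosen here, not something the thread specifies.

```csharp
using System;

public static class CosineSketch
{
    // Hypothetical helper for illustration only (not the PR's comparer).
    // Returns dot(a, b) / (|a| * |b|), a value in [-1, 1].
    public static float Score(float[] a, float[] b)
    {
        if (a is null) throw new ArgumentNullException(nameof(a));
        if (b is null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Assumed convention: a vector with zero magnitude scores 0.
        if (normA == 0 || normB == 0) return 0f;

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

With the toy embedding from the snippet above (`a.Select(c => (float)c).ToArray()`), this sketch would only compare strings of equal length, since the vectors must have the same dimension.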

};

var search = collection.OrderBySimilarity(
var search = collection.ScoreBySimilarity(
Contributor

Maybe ScoreBySimilarityTo("diego")?

@colombod colombod marked this pull request as ready for review October 5, 2023 00:03
@colombod colombod enabled auto-merge (rebase) October 5, 2023 00:04
@colombod colombod merged commit bae0646 into dotnet:main Oct 5, 2023
@colombod colombod deleted the cookcook_support branch October 5, 2023 02:37
@colombod colombod added the Area-Build & Infrastructure label (relating to this repo's build and infrastructure) Oct 14, 2023
