|
| 1 | +--- |
| 2 | +# These are optional elements. Feel free to remove any of them. |
| 3 | +status: proposed |
| 4 | +contact: dmytrostruk |
| 5 | +date: 2024-08-20 |
| 6 | +deciders: sergeymenshykh, markwallace, rbarreto, westey-m |
| 7 | +--- |
| 8 | + |
| 9 | +# Entity Framework as Vector Store Connector |
| 10 | + |
| 11 | +## Context and Problem Statement |
| 12 | + |
| 13 | +This ADR summarizes the results of an investigation into adding Entity Framework as a Vector Store connector to the Semantic Kernel codebase. |
| 14 | + |
| 15 | +Entity Framework is a modern object-relational mapper for building a clean, portable, high-level data access layer in .NET (C#) across a variety of databases, including SQL Database (on-premises and Azure), SQLite, MySQL, PostgreSQL, Azure Cosmos DB and more. It supports LINQ queries, change tracking, updates and schema migrations. |
| 16 | + |
| 17 | +A major benefit of Entity Framework for Semantic Kernel is its support for multiple databases. In theory, a single Entity Framework connector could act as a hub for several databases at once, which should simplify developing and maintaining integrations with them. |
| 18 | + |
| 19 | +However, several limitations prevent Entity Framework from fitting into the updated Vector Store design. |
| 20 | + |
| 21 | +### Collection Creation |
| 22 | + |
| 23 | +In the new Vector Store design, the `IVectorStoreRecordCollection<TKey, TRecord>` interface contains methods for managing database collections: |
| 24 | +- `CollectionExistsAsync` |
| 25 | +- `CreateCollectionAsync` |
| 26 | +- `CreateCollectionIfNotExistsAsync` |
| 27 | +- `DeleteCollectionAsync` |
| 28 | + |
| 29 | +In Entity Framework, creating a collection (also known as a schema/table) programmatically is not recommended in production scenarios. The recommended approach is to use Migrations (for the code-first approach) or Reverse Engineering (also known as scaffolding or the database-first approach). Programmatic schema creation is recommended only for testing and local scenarios. The collection creation process also differs between databases. For example, the MongoDB EF Core provider doesn't support schema migrations or the database-first/model-first approaches. Instead, a collection is created automatically when a document is first inserted, if the collection doesn't already exist. This complicates methods such as `CreateCollectionAsync` on the `IVectorStoreRecordCollection<TKey, TRecord>` interface, since EF has no collection management abstraction that works for most databases. In such cases, the recommended approach is to rely on automatic creation or to handle collection creation individually for each database. For example, for MongoDB it's recommended to use the MongoDB C# Driver directly. |
| 30 | + |
| 31 | +Sources: |
| 32 | +- https://learn.microsoft.com/en-us/ef/core/managing-schemas/ |
| 33 | +- https://learn.microsoft.com/en-us/ef/core/managing-schemas/ensure-created |
| 34 | +- https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations/applying?tabs=dotnet-core-cli#apply-migrations-at-runtime |
| 35 | +- https://github.com/mongodb/mongo-efcore-provider?tab=readme-ov-file#not-supported--out-of-scope-features |
| 36 | + |
| 37 | +### Key Management |
| 38 | + |
| 39 | +It won't be possible to define a single set of valid key types, since not all databases support all types as keys. In that case, only a standard key type such as `string` could be supported, with a conversion performed to satisfy the key restrictions of each specific database. This removes the advantage of a unified connector implementation, since key management would have to be handled for each database individually. |
| 40 | + |
| 41 | +Sources: |
| 42 | +- https://learn.microsoft.com/en-us/ef/core/modeling/keys?tabs=data-annotations |
| 43 | + |
| 44 | +### Vector Management |
| 45 | + |
| 46 | +The `ReadOnlyMemory<T>` type, which most SK connectors use today to hold embeddings, is not supported in Entity Framework out of the box. Trying to use this type produces the following error: |
| 47 | + |
| 48 | +``` |
| 49 | +The property '{Property Name}' could not be mapped because it is of type 'ReadOnlyMemory<float>?', which is not a supported primitive type or a valid entity type. Either explicitly map this property, or ignore it using the '[NotMapped]' attribute or by using 'EntityTypeBuilder.Ignore' in 'OnModelCreating'. |
| 50 | +``` |
| 51 | + |
| 52 | +However, it's possible to use the `byte[]` type or to create an explicit mapping that supports `ReadOnlyMemory<T>`. Such a mapping is already implemented in the `pgvector` package, but it's not clear whether it would work with other databases. |
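The `byte[]` option mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical helper class, `EmbeddingBytes`, which is not part of Semantic Kernel, EF Core, or `pgvector`. A pair of functions like these could in principle be supplied to EF Core's `HasConversion` when configuring the property; that wiring is left out so the sketch runs without the EF Core package.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (an assumption for illustration): converts an embedding
// to and from a byte[] representation, a type EF Core can map natively.
public static class EmbeddingBytes
{
    // Reinterprets the floats as raw bytes (native byte order) and copies them.
    public static byte[] ToBytes(ReadOnlyMemory<float> embedding) =>
        MemoryMarshal.AsBytes(embedding.Span).ToArray();

    // Reinterprets the stored bytes as floats and copies them into a new array.
    public static ReadOnlyMemory<float> FromBytes(byte[] bytes) =>
        MemoryMarshal.Cast<byte, float>(bytes.AsSpan()).ToArray();
}

public static class Program
{
    public static void Main()
    {
        ReadOnlyMemory<float> embedding = new float[] { 0.25f, -1.5f, 3f };

        byte[] stored = EmbeddingBytes.ToBytes(embedding);
        ReadOnlyMemory<float> restored = EmbeddingBytes.FromBytes(stored);

        // Three 4-byte floats are stored as 12 bytes.
        Console.WriteLine(stored.Length);
        // The round trip preserves every element exactly.
        Console.WriteLine(restored.Span.SequenceEqual(embedding.Span));
    }
}
```

Storing embeddings as opaque bytes makes the property mappable, but the database then sees a blob instead of a vector, so any vector search or indexing would still depend on provider-specific support.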
| 53 | + |
| 54 | +Sources: |
| 55 | +- https://github.com/pgvector/pgvector-dotnet/blob/master/README.md#entity-framework-core |
| 56 | +- https://github.com/pgvector/pgvector-dotnet/blob/master/src/Pgvector/Vector.cs |
| 57 | +- https://github.com/pgvector/pgvector-dotnet/blob/master/src/Pgvector.EntityFrameworkCore/VectorTypeMapping.cs |
| 58 | + |
| 59 | +### Testing |
| 60 | + |
| 61 | +Creating an Entity Framework connector and testing it against a SQLite database doesn't guarantee that the integration will work for other EF-supported databases. Each database provider implements its own set of Entity Framework features, so to ensure that an Entity Framework connector covers the main use cases with a specific database, unit/integration tests should be added for each database separately. |
| 62 | + |
| 63 | +Sources: |
| 64 | +- https://github.com/mongodb/mongo-efcore-provider?tab=readme-ov-file#supported-features |
| 65 | + |
| 66 | +### Compatibility |
| 67 | + |
| 68 | +It's not possible to use the latest Entity Framework Core package and target .NET Standard. The last version of EF Core that supported .NET Standard was 5.0 (the latest EF Core version is 8.0). This means that an Entity Framework connector could target .NET 8.0 only, unlike the other SK connectors available today, which target both net8.0 and netstandard2.0. |
| 69 | + |
| 70 | +An alternative would be to use Entity Framework 6, which can target both net8.0 and netstandard2.0, but that version is no longer actively developed. Entity Framework Core offers new features that won't be implemented in EF6. |
| 71 | + |
| 72 | +Sources: |
| 73 | +- https://learn.microsoft.com/en-us/ef/core/miscellaneous/platforms |
| 74 | +- https://learn.microsoft.com/en-us/ef/efcore-and-ef6/ |
| 75 | + |
| 76 | +### Existence of current SK connectors |
| 77 | + |
| 78 | +Given that Semantic Kernel already integrates with some databases that are also supported by Entity Framework, there are several options for how to proceed: |
| 79 | +- Support both an Entity Framework connector and a DB connector (e.g. `Microsoft.SemanticKernel.Connectors.EntityFramework` and `Microsoft.SemanticKernel.Connectors.MongoDB`). In this case both connectors should produce exactly the same outcome, so additional work will be required to ensure this (such as implementing the same set of unit/integration tests). Also, any modification to the logic would have to be applied in both connectors. |
| 80 | +- Support only the Entity Framework connector (e.g. `Microsoft.SemanticKernel.Connectors.EntityFramework`). In this case the existing DB connector should be removed, which may be a breaking change for existing customers. Additional work will be required to ensure that the Entity Framework connector covers exactly the same set of features as the previous DB connector. |
| 81 | +- Support only the DB connector (e.g. `Microsoft.SemanticKernel.Connectors.MongoDB`). In this case, if the connector already exists, no additional work is required. If the connector doesn't exist and it's important to add it, additional work is required to implement it. |
| 82 | + |
| 83 | + |
| 84 | +Entity Framework and Semantic Kernel support by database (only databases that support vector search are listed): |
| 85 | + |
| 86 | +|Database Engine|Maintainer / Vendor|Supported in EF|Supported in SK|Updated to SK memory v2 design| |
| 87 | +|-|-|-|-|-| |
| 88 | +|Azure Cosmos|Microsoft|Yes|Yes|Yes| |
| 89 | +|Azure SQL and SQL Server|Microsoft|Yes|Yes|No| |
| 90 | +|SQLite|Microsoft|Yes|Yes|No| |
| 91 | +|PostgreSQL|Npgsql Development Team|Yes|Yes|No| |
| 92 | +|MongoDB|MongoDB|Yes|Yes|No| |
| 93 | +|MySQL|Oracle|Yes|No|No| |
| 94 | +|Oracle DB|Oracle|Yes|No|No| |
| 95 | +|Google Cloud Spanner|Cloud Spanner Ecosystem|Yes|No|No| |
| 96 | + |
| 97 | +**Note**: |
| 98 | +One database engine can have multiple Entity Framework integrations, which can be maintained by different vendors (e.g. there are 2 MySQL EF NuGet packages - one is maintained by Oracle and another one is maintained by Pomelo Foundation Project). |
| 99 | + |
| 100 | +Vector DB connectors which are additionally supported in Semantic Kernel: |
| 101 | +- Azure AI Search |
| 102 | +- Chroma |
| 103 | +- Milvus |
| 104 | +- Pinecone |
| 105 | +- Qdrant |
| 106 | +- Redis |
| 107 | +- Weaviate |
| 108 | + |
| 109 | +Sources: |
| 110 | +- https://learn.microsoft.com/en-us/ef/core/providers/?tabs=dotnet-core-cli#current-providers |
| 111 | + |
| 112 | +## Considered Options |
| 113 | + |
| 114 | +- Add new `Microsoft.SemanticKernel.Connectors.EntityFramework` connector. |
| 115 | +- Do not add the `Microsoft.SemanticKernel.Connectors.EntityFramework` connector, but add a new connector for an individual database when needed. |
| 116 | + |
| 117 | +## Decision Outcome |
| 118 | + |
| 119 | +Based on the above investigation, the decision is not to add an Entity Framework connector, but to add a new connector for an individual database when needed. The reason for this decision is that Entity Framework providers do not uniformly support collection management operations and would require database-specific code for key handling and object mapping. These factors would make an Entity Framework connector unreliable, and it would not abstract away the underlying database. Additionally, the number of vector databases that Entity Framework supports but for which Semantic Kernel has no memory connector is very small. |