semantic_types key in curie_to_bl_type_db includes duplicates #235

@gaurav

Something has gone wrong with how the semantic_types key in curie_to_bl_type_db (also known as semantic-count) is set. It is supposed to be a unique list of the semantic types stored in this NodeNorm instance, but as of 2023nov5 it contains a list of 3,331 Biolink types. Here is an arbitrary excerpt (note that biolink:GeneOrGeneProduct and biolink:Entity each appear twice, at 2493/2554 and 2494/2555):

2493) "biolink:GeneOrGeneProduct"
2494) "biolink:Entity"
2495) "biolink:NamedThing"
2496) "biolink:BiologicalEntity"
2497) "biolink:GeneFamily"
2498) "biolink:GeneGroupingMixin"
2499) "biolink:Human"
2500) "biolink:NucleicAcidEntity"
2501) "biolink:ClinicalAttribute"
2502) "biolink:Food"
2503) "biolink:OrganismAttribute"
2504) "biolink:Attribute"
2505) "biolink:MolecularActivity"
2506) "biolink:PhysiologicalProcess"
2507) "biolink:Event"
2508) "biolink:Device"
2509) "biolink:GeographicLocation"
2510) "biolink:PlanetaryEntity"
2511) "biolink:Phenomenon"
2512) "biolink:Behavior"
2513) "biolink:Activity"
2514) "biolink:Procedure"
2515) "biolink:ActivityAndBehavior"
2516) "biolink:Agent"
2517) "biolink:AdministrativeEntity"
2518) "biolink:Cohort"
2519) "biolink:PopulationOfIndividualOrganisms"
2520) "biolink:StudyPopulation"
2521) "biolink:Drug"
2522) "biolink:MolecularMixture"
2523) "biolink:Publication"
2524) "biolink:InformationContentEntity"
2525) "biolink:PhysicalEntity"
2526) "biolink:BiologicalProcess"
2527) "biolink:Occurrent"
2528) "biolink:BiologicalProcessOrActivity"
2529) "biolink:Disease"
2530) "biolink:DiseaseOrPhenotypicFeature"
2531) "biolink:CellularComponent"
2532) "biolink:Cell"
2533) "biolink:SubjectOfInvestigation"
2534) "biolink:OrganismalEntity"
2535) "biolink:AnatomicalEntity"
2536) "biolink:ComplexMolecularMixture"
2537) "biolink:ChemicalMixture"
2538) "biolink:SmallMolecule"
2539) "biolink:ChemicalOrDrugOrTreatment"
2540) "biolink:ChemicalEntity"
2541) "biolink:MolecularEntity"
2542) "biolink:Protein"
2543) "biolink:ChemicalEntityOrProteinOrPolypeptide"
2544) "biolink:GeneProductMixin"
2545) "biolink:Polypeptide"
2546) "biolink:Gene"
2547) "biolink:MacromolecularMachineMixin"
2548) "biolink:PhysicalEssenceOrOccurrent"
2549) "biolink:ThingWithTaxon"
2550) "biolink:OntologyClass"
2551) "biolink:PhysicalEssence"
2552) "biolink:ChemicalEntityOrGeneOrGeneProduct"
2553) "biolink:GenomicEntity"
2554) "biolink:GeneOrGeneProduct"
2555) "biolink:Entity"

This bug is presumably in the loader: each Biolink type may be getting appended along with all of its ancestors, so that shared ancestors (such as biolink:Entity) would be added once per type.
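To illustrate the suspected cause, here is a minimal sketch that does not use the actual loader code. The ANCESTORS table and both function names are hypothetical, and the real ancestor lists would come from the Biolink Model. It shows that appending each type plus its ancestors to a list repeats shared ancestors, while accumulating into a set does not:

```python
# Hypothetical, abbreviated ancestor table (an assumption for illustration only).
ANCESTORS = {
    "biolink:Gene": [
        "biolink:GeneOrGeneProduct",
        "biolink:BiologicalEntity",
        "biolink:NamedThing",
        "biolink:Entity",
    ],
    "biolink:Protein": [
        "biolink:GeneOrGeneProduct",
        "biolink:BiologicalEntity",
        "biolink:NamedThing",
        "biolink:Entity",
    ],
}


def load_as_list(leaf_types):
    """Suspected buggy behaviour: append every type and all of its ancestors."""
    semantic_types = []
    for biolink_type in leaf_types:
        semantic_types.append(biolink_type)
        semantic_types.extend(ANCESTORS[biolink_type])
    return semantic_types


def load_as_set(leaf_types):
    """Possible fix: accumulate into a set so shared ancestors are stored once."""
    semantic_types = set()
    for biolink_type in leaf_types:
        semantic_types.add(biolink_type)
        semantic_types.update(ANCESTORS[biolink_type])
    return semantic_types


leaf_types = ["biolink:Gene", "biolink:Protein"]
as_list = load_as_list(leaf_types)
as_set = load_as_set(leaf_types)
print(len(as_list), len(as_set))  # prints: 10 6
```

If the key is stored in Redis, the same idea would correspond to writing to a set rather than a list, but whether that matches the loader's current storage is an assumption that needs checking against the loader code.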

I've fixed this at the endpoint by deduplicating the result (PR #232), but it would be good to figure out what is going wrong in the loader and fix it there.
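For reference, an endpoint-side workaround of this kind can be sketched as follows. This is not necessarily how PR #232 does it; the function name unique_in_order and the sample data are hypothetical. The sketch removes duplicates while keeping the first-seen order:

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each.

    dict preserves insertion order (Python 3.7+), so dict.fromkeys
    gives an order-preserving de-duplication.
    """
    return list(dict.fromkeys(items))


# Sample data modelled on the excerpt above (entries 2493-2495, 2554-2555).
stored = [
    "biolink:GeneOrGeneProduct",
    "biolink:Entity",
    "biolink:NamedThing",
    "biolink:GeneOrGeneProduct",
    "biolink:Entity",
]
deduplicated = unique_in_order(stored)
print(deduplicated)
```

A plain set(stored) would also remove the duplicates, but it would not keep a stable order between calls, which may matter if clients compare responses.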
