TASK-5564 - Update data sources for CellBase 6.x #696
Open: jtarraga wants to merge 305 commits into release-6.x.x from TASK-5564
+14,923
−9,116
Commits
e18506b
lib: update CellBase downloaders, #TASK-5775, #TASK-5564
jtarraga 69a58bf
core: update CellBase configuration file, #TASK-5775, #TASK-5564
jtarraga d4e0cd6
lib: update MANE Select downloader, #TASK-5775, #TASK-5564
jtarraga 6ee2f78
lib: update LRG, HGNC, Cancer HotSpot, DGIDB, Gene Uniprot Xref, Gene…
jtarraga d794ceb
lib: update RefSeq downloader, #TASK-5775, #TASK-5564
jtarraga 1b751de
lib: update missense scores (REVEL) downloader, #TASK-5775, #TASK-5564
jtarraga b635333
lib: update CADD and clinical variant downloaders, #TASK-5775, #TASK-…
jtarraga 106b96d
lib: update protein downloaders, #TASK-5775, #TASK-5564
jtarraga 55afe6b
lib: update gene downloader (especially for Ensembl data), and improve…
jtarraga d81b68f
Merge branch 'TASK-5564' into TASK-5387
jtarraga 88c2b17
core: add Ensembl primary fasta URL into the configuration file for t…
jtarraga eee13e3
lib: update genome download manager by declaring and using constants …
jtarraga cd367b9
app: update genome builder by using constants from the class EtlCommo…
jtarraga ce6f8d5
app: fix Sonar issues in BuildCommandExecutor, #TASK-5564
jtarraga 3566e01
app: improve log/exception messages in DownloadCommandExecutor, #TASK…
jtarraga cd94452
app: update repeats builder, and improve log/exception messages, #TAS…
jtarraga 148814f
lib: update the repeats builder by removing the hardcoded filenames a…
jtarraga 30a4c87
lib: update conservation builder by removing the hardcoded filenames …
jtarraga 85e17db
lib: call bigWigToBedGraph to convert the GERP bigwig to bed graph fi…
jtarraga 0223cb5
lib: include log messages, #TASK-5564
jtarraga 833c337
lib: improve ProteinBuilder by removing hardcoded file names, adding …
jtarraga 01deb0c
lib: move DataSource reader from ConservationBuilder to the parent Ce…
jtarraga 9416894
lib: move the function to split UniProt into chunks from the protein…
jtarraga 909c0b2
core: fix regulation URLs in the configuration file, #TASK-5775, #TAS…
jtarraga 71d8056
lib: launch a CellBase exception if executing a command (wget, gunzip…
jtarraga 1544824
lib: fix Sonar issues, #TASK-5775, #TASK-5564
jtarraga 3e43874
lib: move the function to parse and build PFMs from the regulation do…
jtarraga 959e423
core: update ontology section of the CellBase configuration since ont…
jtarraga 158c259
lib: update ontology download since ontology versions will be taken f…
jtarraga 0b83831
app: update the build command executor to check/copy the ontology ver…
jtarraga 39f0f41
lib: improve the ontology builder by removing hardcoded filenames, ad…
jtarraga 5c3dae0
lib: improve the PharmGKB downloader by moving the function to unzip …
jtarraga 971235e
lib: improve the PharmGKB builder by adding checks and log messages; …
jtarraga cd444b0
lib: improve the PubMed downloader by adding log messages and fixing …
jtarraga e19fe73
lib: create maps to get the names, categories and version filenames f…
jtarraga a29afe3
lib: update according to the EtlCommons changes, #TASK-5775, #TASK-5564
jtarraga 377ee9c
lib: improve PubMed builder by adding checks, log messages and fixing…
jtarraga 997c8ec
lib: update CADD downloader according to last changes, #TASK-5775, #T…
jtarraga 96078b7
lib: improve the CADD builder by adding checks, log messages, cleanin…
jtarraga 3163a90
lib: update the REVEL downloader according to the last changes, and a…
jtarraga bc22fad
lib: add log messages, #TASK-5776, #TASK-5564
jtarraga 0c9a299
lib: improve the REVEL builder by fixing Sonar issues and adding che…
jtarraga 4f9e39a
lib: update CellBase downloaders according to the last changes, #TASK…
jtarraga 1586a77
app: update load command executor according to the EtlCommons changes…
jtarraga c7c398a
lib: update CellBase builders according to the EtlCommons changes, #T…
jtarraga 754384a
lib: fix revel builder, #TASK-5776, #TASK-5564
jtarraga 24eb091
configuration: update versions
imedina fc09da4
app: add bash script to fix the downloaded MirTarBase file, #TASK-577…
jtarraga 09d33a0
core: add some comments to the configuration file, #TASK-5775, #TASK-…
jtarraga 303585d
lib: update Ensembl/RefSeq indexers and builders (include major impro…
jtarraga 68c47ef
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga 312c654
Merge branch 'TASK-5564' into TASK-5387
jtarraga 5665648
Merge branch 'TASK-5387' into TASK-5388
jtarraga a25b9c1
core: fix PGS section in the configuration file, #TASK-5406, #TASK-5387
jtarraga df05c91
app: add PGS_DATA (polygenic scores) as valid data in the CellBase bu…
jtarraga e7c2385
lib: update clinical variant downloader by moving the split ClinVar f…
jtarraga f5b7c34
lib: update clinical variant builder by including the split ClinVar f…
jtarraga a4fca6b
lib: update code to the last changes, #TASK-5564
jtarraga 8598f08
Merge branch 'TASK-5564' into TASK-5387
jtarraga dec89f8
Merge branch 'TASK-5387' into TASK-5388
jtarraga 57c6f6f
lib: include SpliceAI/MMSplice in the configuration file, and create …
jtarraga c131459
lib: remove deprecated functions, #TASK-5575, #TASK-5564
jtarraga 3ef70b1
lib: improve PGS Catalog downloader, #TASK-5406, #TASK-5387
jtarraga 31bf3a2
lib: improve PGS Catalog builder, #TASK-5407, #TASK-5387
jtarraga c8d416e
lib: update CellBase loader for PGS Catalog data, #TASK-5410, #TASK-5387
jtarraga a8a047c
lib: improve gene downloader by taking into account the manually down…
jtarraga 100d6f3
lib: update gene builder (Ensembl/RefSeq) according to last changes, …
jtarraga 1852f3e
Merge branch 'TASK-5564' into TASK-5387
jtarraga 910ffd3
lib: refactor PGS builder to solve the RocksDB issue, #TASK-5407, #TA…
jtarraga 3939ac3
lib: improve PGS builder by speeding-up RocksDB, #TASK-5407, #TASK-5387
jtarraga 0cd4b80
lib: update Ensembl/RefSeq gene builder to gunzip FASTA files before b…
jtarraga e42cd7e
Merge branch 'develop' into TASK-5564
jtarraga 6eac380
Merge branch 'TASK-5564' into TASK-5387
jtarraga 4d965d7
Merge branch 'develop' into TASK-5564
imedina fc65d14
lib: add hpo filter to GeneQuery
imedina 84ad97b
Many improvements and fixes:
imedina aaec065
* Add new ensembl_canonical.pl
imedina dad180d
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga 694b81d
lib: use DockerUtils to execute Perl script from docker image, #TASK-…
jtarraga c8e719a
test: update JUnit tests, #TASK-5564
jtarraga 19efdf4
cicd: update task.yml to deploy cellbase-builder docker, #TASK-5564
jtarraga fcbb680
build: create the MiRTarBase parser for .xlsx files, #TASK-5576, #TAS…
jtarraga 10a579a
Builder improvements and several data cleaning
imedina 87d95e8
Merge branch 'TASK-5564' of github.com:opencb/cellbase into TASK-5564
imedina c6bcbdd
Gene downloader fixes
imedina 0a6a84f
Add VariationDownloader
imedina 3dcad47
Add VariationDownloader
imedina 5eb33ae
app: update Dockerfile for cellbase-builder in order to allow the scr…
jtarraga f44dcc5
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga 2b226fe
lib: add variation to the EtlCommons dataVersionFilenamesMap, #TASK-5…
jtarraga 510819c
Merge branch 'develop' into TASK-5564
jtarraga 5514177
lib: remove unused variables, #TASK-5575, #TASK-5564
jtarraga ae9a817
core: add the field 'id' in DataSource model, #TASK-5575, #TASK-5564
jtarraga 20c554b
core: update DGIdb in the configuration file, #TASK-5575, #TASK-5564
jtarraga 9d2d4fe
lib: check if genome data is already downloaded before downloading to…
jtarraga 299003b
lib: add the parameter 'assembly' to command line when calling the sc…
jtarraga 1d171d5
lib: update GeneDownloadManager to call the script gene_extra_info.pl…
jtarraga d10931d
lib: improve genome and conservation downloaders by checking if data …
jtarraga b422f3a
lib: improve repeats downloaders by checking if data is already downl…
jtarraga d0c0ba3
lib: improve regulation downloader by checking if data is already dow…
jtarraga 1dc504f
lib: fix motif features folder for regulation downloader, #TASK-5575,…
jtarraga 4ba788d
lib: fix minor Sonar issue, #TASK-5575, #TASK-5564
jtarraga 6fc7129
lib: improve protein downloader by checking if data is already downlo…
jtarraga 8ed0e0d
lib: improve variation downloader by checking if data is already down…
jtarraga 1442766
lib: fix variation folder in downloader, #TASK-5575, #TASK-5564
jtarraga e48d27d
core: remove DISGENET, #TASK-5575, #TASK-5564
jtarraga 642935a
lib: improve gene downloader, removing DISGENET, fixing Sonar issues…
jtarraga 8030b02
lib: fix command line to execute Perl script, #TASK-5575, #TASK-5564
jtarraga e17e51d
lib: add files generated by scripts in the version JSON files, #TASK-…
jtarraga 733cade
lib: improve genome builder by checking files, and fixing Sonar issu…
jtarraga ddc1056
lib: take into account the parameter --keep when gunzip, #TASK-5576, …
jtarraga 8c6dc78
lib: improve conservation builder by adding checks, log messages and …
jtarraga 847f835
lib: add support for multi-species, checks and log messages in the re…
jtarraga b0d1c67
lib: add support for multi-species, checks and log messages in regula…
jtarraga 039aa81
lib: fix protein builder, #TASK-5576, #TASK-5564
jtarraga 7f77dec
lib: fix gene downloader for RefSeq files, #TASK-5575, #TASK-5564
jtarraga 0eb898e
lib: improve gene (Ensembl/RefSeq) builder by supporting multi-specie…
jtarraga 1d47fd9
lib: fix Sonar issues, #TASK-5576, #TASK-5564
jtarraga 7fbc054
lib: add variant and variant_structural_variations in the configurati…
jtarraga d483dcf
app: improve CellBase loader by creating a new function to be reused …
jtarraga 7f62ce7
lib: improve genome sequence and info loader, #TASK-6142, #TASK-5564
jtarraga 0602bba
app: update CellBase loader for conservation data, #TASK-6142, #TASK-…
jtarraga 2b4fbeb
app: update CellBase loader for genes and proteins according to the p…
jtarraga d693f57
lib: add VariantBuilder to generate the variation JSON files from VCF…
jtarraga 38400c1
app: update the CellBase loader for variation data according to the l…
jtarraga 3117337
app: add check before building variation data, #TASK-5776, #TASK-5564
jtarraga 9c810e7
lib: skip API-KEY param when parsing variant query, #TASK-5564
jtarraga ec5f21a
server: update RESTful server to take into account multi-species, #TA…
jtarraga 36c3609
lib: extract the FutureSpliceScoreAnnotator in a file to reduce the V…
jtarraga efa4824
lib: update the VariantAnnotationCalculator to support multi-species,…
jtarraga 4326fa3
lib: add log messages in protein builder, #TASK-5776, #TASK-5564
jtarraga 2c7ddfb
lib: set variant ID in VariantBuilder, #TASK-5576, #TASK-5564
jtarraga 78211d0
lib: remove System.exit, #TASK-5576, #TASK-5564
jtarraga e0c6a13
lib: fix VariationBuilder by converting SV values from Ensembl to sta…
jtarraga 81e4cb1
lib: add new command 'data-list' to display the list of data supporte…
jtarraga 280fd67
app: update build options and fix Sonar issues, #TASK-5576, #TASK-5564
jtarraga 2235e5c
app: update CLI option descriptions for loading, exporting, indexing.…
jtarraga 08b0e1d
Prepare Port Patch Cellbase 5.8.3 -> 6.3.0 #TASK-6647
juanfeSanahuja dee7972
Merge branch 'develop' into TASK-6647-dev
juanfeSanahuja 6a4c16a
test: update JUnit tests according to the latest changes, #TASK-5564
jtarraga 68c9f43
lib: improve variation builder by setting xref and annotation, and re…
jtarraga 914b9c1
lib: remove break for testing, #TASK-5576, #TASK-5564
jtarraga 3538e14
core: add ontology data into configuration file for "mus musculus" an…
jtarraga d0d92a3
lib: update ontology downloader and take into account multi-species s…
jtarraga d51114b
lib: update ontology builder and take into account multi-species supp…
jtarraga 24450d3
app: update load command executor for ontology data according to the …
jtarraga 132382d
app: check data according to the species before loading data, #TASK-6…
jtarraga d556c4c
app: fix Sonar issues, #TASK-6142, #TASK-5564
jtarraga 60860ed
Merge pull request #706 from opencb/TASK-6647-dev
juanfeSanahuja 1f3572c
lib: fix the function to save status and message of the downloaded fi…
jtarraga 6056655
Merge branch 'develop' into TASK-5564
jtarraga a8d6368
core: add dbSNP in config file (removed after merging), #TASK-5564
jtarraga 2950c0e
Merge branch 'TASK-5564' into TASK-5387
jtarraga 162f34d
add: improve species and assembly parameter descriptions, #TASK-5575,…
jtarraga 344e92e
test: fix JUnit tests by updating configuration files, #TASK-5564
jtarraga 22770de
server: fix meta/health for multiple species, #TASK-6426, #TASK-5564
jtarraga 4cffc55
server: improve MetaWSServer for multiple species support, #TASK-6426…
jtarraga 2889192
lib: limit WriteBatch for number of items, #TASK-5407, #TASK-5387
jtarraga 8c8a4ea
lib: improve PGS builder, #TASK-5407, #TASK-5387
jtarraga 28a57ba
lib: fix Sonar issues, #TASK-5407, #TASK-5387
jtarraga a49783e
config: update version
imedina 76d849a
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga 2a0eb28
lib: update MiRTarBase indexer to take into account the new format (C…
jtarraga f5edb1f
lib: fix Checkstyle and Sonar issues, #TASK-5564
jtarraga e65068d
lib: add _chunkIds in the collection conservation when loading data, …
jtarraga f45c2fc
config: update all data sources
imedina 32a1323
Merge branch 'release-6.x.x' into TASK-5564
jtarraga 1952fd4
core: fix UniProt version to 2025-02, #TASK-5564
jtarraga 3146041
lib: update UniProt builder to support release 2025-02, #TASK-5576, #…
jtarraga 13258bc
core: update the section PubMed of the configuration file, #TASK-5575…
jtarraga a7fd402
core: update configuration file for regulatory and motif features, #T…
jtarraga 247a135
lib: update gene builder for gnomAD 4.1 for Ensembl, and include it f…
jtarraga 7e53142
lib: update regulatory feature builder, #TASK-5576, #TASK-5564
jtarraga e3f80b5
lib: add more gnomAD 4.1 constraint scores, #TASK-5576, #TASK-5564
jtarraga cf425d1
lib: add imprinted gene data from geneimprint.com, #TASK-7745, #TASK-…
jtarraga a824d7a
app: fix gene build path, #TASK-5576, #TASK-5564
jtarraga eec00af
lib: set category gene_annotation for geneimprint, and add to common …
jtarraga 694fd38
core: fix geneimprint URL in configuration file, #TASK-7745, #TASK-5564
jtarraga 504cb82
core: update configuration file, #TASK-5575, #TASK-5564
jtarraga 5cd942b
lib: fix HPO file parser (gene builder indexer), #TASK-5576, #TASK-5564
jtarraga 7e1ecb6
lib: update COSMIC builder to support v101 and above, #TASK-5576, #TA…
jtarraga b7f47a1
lib: fix gnomAD constraint indexer, #TASK-5576, #TASK-5564
jtarraga 8932c7f
lib: update COSMIC indexer, #TASK-5576, #TASK-5564
jtarraga 1c5a5da
lib: improve geneimprint indexer, #TASK-5576, #TASK-5564
jtarraga a5590e9
Merge branch 'release-6.x.x' into TASK-5564
jtarraga 85c1dc6
Merge branch 'TASK-5564' into TASK-5387
jtarraga f981f22
Merge branch 'TASK-5387' into TASK-5388
jtarraga 18dddab
pom: upgrade biodata and java-commons-lib dependencies, #TASK-5564
jtarraga e8a4b55
lib: add ChimerDB data (gene fusion) to gene annotation, #TASK-7830, …
jtarraga 7757e70
lib: update according to biodata changes, #TASK-7830, #TASK-5564
jtarraga 12b699a
lib: update according to biodata changes, #TASK-7745, #TASK-7830, #TA…
jtarraga 1165ee7
lib: update variant annotation calculator to take into account imprin…
jtarraga 129051a
pom: add the profile default-config-test-local, #TASK-5564
jtarraga 71e1352
lib: download ChimerKB, ChimerPub and ChimerSeq from ChimerDB, #TASK-…
jtarraga f9c6e7e
lib: download and build ChimerPub and ChimerSeq data from ChimerDB fo…
jtarraga b147775
lib: update according to biodata changes, #TASK-7745, #TASK-7830, #TA…
jtarraga 0d82f3b
lib: update according to biodata changes, and add mongodb-indexes, #T…
jtarraga 4858472
lib: download CIViC data, #TASK-7903, #TASK-5564
jtarraga 1b26b90
lib: implement CivicIndexer, #TASK-7903, #TASK-5564
jtarraga 5e6d98c
lib: improve CIViC indexer, #TASK-7903, #TASK-5564
jtarraga 3dd6361
lib: add CIViC additional properties in evidence entries, #TASK-7903,…
jtarraga ba0f293
server: improve the API key check before executing the CellBase endpo…
jtarraga 926b3d2
lib: update PharmGKB to ClinPGx, #TASK-7911, #TASK-5564
jtarraga b951b3c
lib: add rating and evidence_level in additional properties, #TASK-79…
jtarraga e78048d
config: fix some version dates
imedina 7536db4
core: update the configuration file, #TASK-5564
jtarraga a5961bb
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga 538212c
lib: fix clinical variant builder, #TASK-5564
jtarraga 7a17ed5
lib: fix GWAS download path, #TASK-5564
jtarraga a8b6de7
lib: improve clinical variants download/build paths, #TASK-5564
jtarraga 1e88be5
lib: improve CIViC indexer, #TASK-7903, #TASK-5564
jtarraga 0b33170
app: fix CADD loading, #TASK-6142, #TASK-5564
jtarraga 26e08f0
app: fix ClinPGx loader, #TASK-7911, #TASK-5564
jtarraga 0a6c2d8
app: fix CIViC loader (clinical variants), #TASK-7903, #TASK-5564
jtarraga 2ccfaa0
lib: add indexes for PGS collections, #TASK-5410, TASK-5564
jtarraga edf4515
lib: reduce batch size for PubMed data when loading, #TASK-6142, #TAS…
jtarraga 7d1f297
lib: fix Sonar issues, #TASK-6142, #TASK-5564
jtarraga 46de716
Merge branch 'release-6.x.x' into TASK-5564
jtarraga 604e42e
core: fix configuration file for JUnit test, #TASK-5564
jtarraga 28afbdd
lib: fix loader (PubMed), #TASK-6142, #TASK-5564
jtarraga abf94a0
server: improve default data releases for multiple species, #TASK-5564
jtarraga 5addb64
server: improve messages in endpoint /meta/about, #TASK-5564
jtarraga c135e05
cicd: added to test-analysis -DCELLBASE.WAR.NAME=cellbase #TASK-5564
juanfeSanahuja b6e7363
cicd: added to test-analysis -DCELLBASE.WAR.NAME=cellbase #TASK-5564
juanfeSanahuja 4c983d5
pom: updated policy #TASK-5564
juanfeSanahuja 6f907ff
lib: use estimatedCount to speedup count queries, #TASK-5564
jtarraga 1a60716
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga 9b97afc
core: use the env. variable CELLBASE_SECRET_KEY, #TASK-8046, #TASK-5564
jtarraga 0750f55
lib: add more indexes for the collection 'pubmed', #TASK-5564
jtarraga f9129c4
server: add some admin/endpoints to API key management, TASK-7912, #T…
jtarraga 167dcf1
lib: add DbSnpDownloader, and remove building and loading dbSNP data …
jtarraga 31286ce
lib: backwards compatibility, #TASK-5564
jtarraga 0a2d312
lib: fix typo, #TASK-5564
jtarraga 210a4d7
lib: catch exceptions in the different consequence type calculators, …
jtarraga 3873120
lib: fix PGS include, and remove some System.out, #TASK-5564
jtarraga 9fe6dec
app: implement a Python script to compare performances, #TASK-5564
jtarraga 45b2c44
app: add Python script to get metrics for a given variant annotation …
jtarraga 8f2da9c
lib: disable polygenic scores and miRNA targets, to be enabled in fut…
jtarraga 6df8576
lib: fix NPE for conservation scores in breakends, #TASK-5564
jtarraga 93e86fd
Merge branch 'release-6.x.x' into TASK-5564
jtarraga b7c0adc
client: update Meta endpoint to backward compatibility, #TASK-5564
jtarraga 1e0e24e
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
juanfeSanahuja 619a069
cicd: Added deploy in task yml #TASK-5564
juanfeSanahuja 26a2d78
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja d159998
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja 7c5a09e
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja d0d8c39
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja 99a36a9
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja 87d1c5f
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja df99b71
cicd: Fix pull request approved #TASK-5564
juanfeSanahuja
61 changes: 61 additions & 0 deletions
cellbase-app/app/scripts/ensembl-scripts/ensembl_canonical.pl
@@ -0,0 +1,61 @@
```perl
#!/usr/bin/env perl

use strict;
use Getopt::Long;
use Data::Dumper;
use JSON;
use DB_CONFIG;

use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

## Default values
my $species = 'hsapiens';
my $outdir = "./";

## Parsing command line
GetOptions ('species=s' => \$species, 'outdir=s' => \$outdir);


my $confFile = "/opt/cellbase/scripts/ensembl-scripts/martURLLocation.xml";

# NB: change action to 'clean' if you wish to start a fresh configuration
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
my $action='clean';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

$query->setDataset($species."_gene_ensembl");

$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("transcript_is_canonical");

$query->formatter("TSV");

# Open the file for writing
open(my $fh, '>', "$outdir/ensembl_canonical.txt") or die "Cannot open ensembl_canonical.txt file: $!";

# Save a duplicate of the original stdout (a plain glob copy would follow the redirect)
open(my $original_stdout, '>&', \*STDOUT) or die "Can't duplicate STDOUT: $!";
open(STDOUT, '>&', $fh) or die "Can't redirect STDOUT: $!";

my $query_runner = BioMart::QueryRunner->new();

# to obtain unique rows only
$query_runner->uniqueRowsOnly(1);
$query_runner->execute($query);
#$query_runner->printHeader();
#print ENSEMBL_CANONICAL $query_runner->printResults();
# Call printResults which prints to STDOUT (now redirected to the file)
$query_runner->printResults();
#$query_runner->printFooter();

# Restore the original stdout
open(STDOUT, '>&', $original_stdout) or die "Can't restore STDOUT: $!";

# Close the filehandle
close($fh) or die "Failed to close file: $!";
```
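For anyone consuming this script's output downstream, the following is a minimal Python sketch of how `ensembl_canonical.txt` might be read. It rests on several assumptions that the diff does not confirm: that the TSV columns follow the attribute order of the query (gene ID, transcript ID, canonical flag), that no header row is written (the `printHeader` call is commented out), and that BioMart marks canonical transcripts with `1` and leaves the flag empty otherwise. The function name `load_canonical_transcripts` and the sample IDs are hypothetical placeholders, not part of CellBase or Ensembl.

```python
import csv
import io
from typing import Dict, TextIO


def load_canonical_transcripts(handle: TextIO) -> Dict[str, str]:
    """Map each Ensembl gene ID to its canonical transcript ID.

    Assumes three tab-separated columns (gene ID, transcript ID, canonical
    flag), a flag of "1" for canonical transcripts, and no header row.
    """
    canonical: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or truncated rows instead of failing on them.
        if len(row) < 3:
            continue
        gene_id, transcript_id, flag = row[0], row[1], row[2].strip()
        if flag == "1":
            canonical[gene_id] = transcript_id
    return canonical


# Made-up placeholder rows; real input would be the file written by
# ensembl_canonical.pl.
sample = io.StringIO(
    "ENSG00000000001\tENST00000000001\t1\n"
    "ENSG00000000001\tENST00000000002\t\n"
    "ENSG00000000002\tENST00000000003\t1\n"
)
canonical_by_gene = load_canonical_transcripts(sample)
print(canonical_by_gene)
```

In real use the `io.StringIO` sample would be replaced by `open(path, newline="")` on the generated file. If the flag encoding differs from the assumption above, only the `flag == "1"` comparison needs to change.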
19 changes: 19 additions & 0 deletions
cellbase-app/app/scripts/ensembl-scripts/martURLLocation.xml
@@ -0,0 +1,19 @@
```xml
<!--
  ~ Copyright 2015-2020 OpenCB
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~     http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<MartRegistry>
    <MartURLLocation database="ensembl_mart_111" default="1" displayName="Ensembl Genes 111" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
</MartRegistry>
```
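Because the registry pins a specific Ensembl mart (`ensembl_mart_111`), it can be useful to inspect it programmatically, for example to check it against the Ensembl release configured elsewhere. The sketch below uses only Python's standard `xml.etree.ElementTree`. The function name `list_marts` is hypothetical, and the `http://` scheme is an assumption inferred from `port="80"`, not something the registry states.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Registry content copied from martURLLocation.xml (license header omitted).
REGISTRY_XML = """
<MartRegistry>
    <MartURLLocation database="ensembl_mart_111" default="1" displayName="Ensembl Genes 111" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
</MartRegistry>
"""


def list_marts(xml_text: str) -> List[Dict[str, str]]:
    """Return name, database and service URL for each MartURLLocation entry."""
    root = ET.fromstring(xml_text.strip())
    marts = []
    for loc in root.iter("MartURLLocation"):
        host = loc.get("host", "")
        port = loc.get("port", "80")
        path = loc.get("path", "")
        marts.append({
            "name": loc.get("name", ""),
            "database": loc.get("database", ""),
            # Assumption: plain HTTP, since the registry declares port 80.
            "url": f"http://{host}:{port}{path}",
        })
    return marts


marts = list_marts(REGISTRY_XML)
for mart in marts:
    print(mart["name"], mart["database"], mart["url"])
```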
Check failure: Code scanning / SonarCloud
GitHub Actions should not be vulnerable to script injections (High)