Post-processing for OncoMX and Glygen
Processing done for integration of BioXpress data into OncoMX and Glygen.
Processing for OncoMX
The final output from BioXpress v-5.0 is available on the OncoMX-tst server at the path: /software/pipeline/integrator/downloads/bioxpress/v-5.0/`
For OncoMX, the de_per_tissue.csv is used to report gene expression per tissue, however data.oncomx.org hosts both per tissue and per study datasets. The files are processed with the recipe pipeline. The recipes filter for all genes that are successfully mapped to uniprotkb accession IDs.
Recipes
human_cancer_mRNA_expression_per_study.json
human_cancer_mRNA_expression_per_tissue.json
The output is available on the OncoMX-tst server at the path: /software/pipeline/integrator/unreviewed
Final output files
human_cancer_mRNA_expression_per_study.csv
human_cancer_mRNA_expression_per_tissue.csv
Processing for Glygen
The final output from BioXpress v-5.0 was modified to align with the previous input for cancer gene expression, and now includes the following columns:
pmid
sample_name
same as DOID and name
parent_doid
same as DOID
All DOIDs in v-5.0 are parent terms
parent_doname
same as DOID and name
All DOIDs in v-5.0 are parent terms
sample_id
Taken from previous version, unclear on the origin of these numbers
The following mapping for the column sample_id was recovered from the previous version and mapped to DOIDs present in v-5.0
sample_name |
sample_id |
---|---|
DOID:10283 / Prostate cancer [PCa] |
42 |
DOID:10534 / Stomach cancer [Stoca] |
19 |
DOID:11054 / Urinary bladder cancer [UBC] |
34 |
DOID:11934 / Head and neck cancer [H&NC] |
46 |
DOID:1612 / Breast cancer [BRCA] |
70 |
DOID:1781 / Thyroid cancer [Thyca] |
16 |
DOID:234 / Colon adenocarcinoma |
3 |
DOID:263 / Kidney cancer [Kidca] & Kidney renal cl … |
61 |
DOID:3571 / Liver cancer [Livca] |
60 |
DOID:3907 / Lung squamous cell carcinoma |
33 |
DOID:3910 / Lung adenocarcinoma |
53 |
DOID:4465 / Papillary renal cell carcinoma |
57 |
DOID:4471 / Chromophobe adenocarcinoma |
23 |
DOID:5041 / Esophageal cancer [EC] |
32 |
The processed file for Glygen is available on the glygen-vm-dev server at /software/pipeline/integrator/downloads/bioxpress/August_2021/human_cancer_mRNA_expression_per_tissue_glygen.csv