Post-processing for OncoMX and Glygen

Processing done for integration of BioXpress data into OncoMX and Glygen.

Processing for OncoMX

The final output from BioXpress v-5.0 is available on the OncoMX-tst server at the path: /software/pipeline/integrator/downloads/bioxpress/v-5.0/

For OncoMX, de_per_tissue.csv is used to report gene expression per tissue; however, data.oncomx.org hosts both the per-tissue and the per-study datasets. The files are processed with the recipe pipeline, and the recipes keep only genes that were successfully mapped to UniProtKB accessions.
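The recipe files are not reproduced on this page, so their exact filtering logic is not shown. As a rough sketch of the idea only, assuming a hypothetical accession column named `uniprotkb_canonical_ac` that is left empty when a gene could not be mapped, the filter keeps the rows that have a non-empty accession:

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column name; the real recipes may use a different one.
ACCESSION_COLUMN = "uniprotkb_canonical_ac"


def filter_mapped_rows(rows: Iterable[Dict[str, str]],
                       column: str = ACCESSION_COLUMN) -> List[Dict[str, str]]:
    """Keep only rows whose UniProtKB accession field is present and non-empty."""
    return [row for row in rows if (row.get(column) or "").strip()]


def filter_mapped_csv(text: str, column: str = ACCESSION_COLUMN) -> List[Dict[str, str]]:
    """Parse CSV text and return only the rows mapped to a UniProtKB accession."""
    reader = csv.DictReader(io.StringIO(text))
    return filter_mapped_rows(reader, column)


if __name__ == "__main__":
    # Toy input for illustration only; not real BioXpress data.
    sample = (
        "gene_symbol,uniprotkb_canonical_ac,doid\n"
        "GENE_A,ACC0001,DOID:1612\n"
        "GENE_B,,DOID:1612\n"
    )
    for row in filter_mapped_csv(sample):
        print(row["gene_symbol"])  # only GENE_A has an accession
```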

Recipes

  • human_cancer_mRNA_expression_per_study.json

  • human_cancer_mRNA_expression_per_tissue.json

The output is available on the OncoMX-tst server at the path: /software/pipeline/integrator/unreviewed

Final output files

  • human_cancer_mRNA_expression_per_study.csv

  • human_cancer_mRNA_expression_per_tissue.csv

Processing for Glygen

The final output from BioXpress v-5.0 was modified to match the format of the previous cancer gene expression input, and now includes the following columns:

  • pmid

  • sample_name: the DOID and disease name, e.g. DOID:1612 / Breast cancer [BRCA]

  • parent_doid: same as the DOID, since all DOIDs in v-5.0 are parent terms

  • parent_doname: same as sample_name (DOID and name), since all DOIDs in v-5.0 are parent terms

  • sample_id: taken from the previous version; the origin of these numbers is unclear
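A minimal sketch of how the added columns relate to each other, assuming hypothetical input columns `doid` and `do_name` in the v-5.0 output (the real column names may differ). Because every DOID in v-5.0 is a parent term, the parent fields simply repeat the term itself:

```python
from typing import Dict


def add_glygen_columns(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of row with sample_name, parent_doid and parent_doname added.

    Assumes hypothetical input columns 'doid' (e.g. 'DOID:1612') and
    'do_name' (e.g. 'Breast cancer [BRCA]').
    """
    out = dict(row)
    sample_name = f"{row['doid']} / {row['do_name']}"
    out["sample_name"] = sample_name
    # All DOIDs in v-5.0 are parent terms, so the parent columns repeat the term.
    out["parent_doid"] = row["doid"]
    out["parent_doname"] = sample_name
    return out


if __name__ == "__main__":
    example = {"doid": "DOID:1612", "do_name": "Breast cancer [BRCA]"}
    print(add_glygen_columns(example)["sample_name"])
```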

The following sample_id mapping was recovered from the previous version and matched to the DOIDs present in v-5.0:

sample_name                                           sample_id
DOID:10283 / Prostate cancer [PCa]                    42
DOID:10534 / Stomach cancer [Stoca]                   19
DOID:11054 / Urinary bladder cancer [UBC]             34
DOID:11934 / Head and neck cancer [H&NC]              46
DOID:1612 / Breast cancer [BRCA]                      70
DOID:1781 / Thyroid cancer [Thyca]                    16
DOID:234 / Colon adenocarcinoma                       3
DOID:263 / Kidney cancer [Kidca] & Kidney renal cl …  61
DOID:3571 / Liver cancer [Livca]                      60
DOID:3907 / Lung squamous cell carcinoma              33
DOID:3910 / Lung adenocarcinoma                       53
DOID:4465 / Papillary renal cell carcinoma            57
DOID:4471 / Chromophobe adenocarcinoma                23
DOID:5041 / Esophageal cancer [EC]                    32
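The sample_id mapping can be applied as a simple lookup. This sketch keys on the DOID alone, because one disease name in the table is truncated, and it assumes sample_name values follow the `<DOID> / <name>` pattern used in the table; the names `SAMPLE_ID_BY_DOID` and `sample_id_for` are hypothetical:

```python
from typing import Dict, Optional

# sample_id values recovered from the previous version, keyed by DOID
# (transcribed from the sample_name / sample_id table).
SAMPLE_ID_BY_DOID: Dict[str, int] = {
    "DOID:10283": 42,  # Prostate cancer [PCa]
    "DOID:10534": 19,  # Stomach cancer [Stoca]
    "DOID:11054": 34,  # Urinary bladder cancer [UBC]
    "DOID:11934": 46,  # Head and neck cancer [H&NC]
    "DOID:1612": 70,   # Breast cancer [BRCA]
    "DOID:1781": 16,   # Thyroid cancer [Thyca]
    "DOID:234": 3,     # Colon adenocarcinoma
    "DOID:263": 61,    # Kidney cancer [Kidca] (name truncated in the table)
    "DOID:3571": 60,   # Liver cancer [Livca]
    "DOID:3907": 33,   # Lung squamous cell carcinoma
    "DOID:3910": 53,   # Lung adenocarcinoma
    "DOID:4465": 57,   # Papillary renal cell carcinoma
    "DOID:4471": 23,   # Chromophobe adenocarcinoma
    "DOID:5041": 32,   # Esophageal cancer [EC]
}


def sample_id_for(sample_name: str) -> Optional[int]:
    """Look up sample_id from a sample_name such as 'DOID:1612 / Breast cancer [BRCA]'.

    Only the DOID prefix (the text before ' / ') is used, so truncated or
    reworded disease names still resolve. Returns None for unmapped DOIDs.
    """
    doid = sample_name.split(" / ", 1)[0].strip()
    return SAMPLE_ID_BY_DOID.get(doid)


if __name__ == "__main__":
    print(sample_id_for("DOID:1612 / Breast cancer [BRCA]"))
```

Keying on the DOID rather than the full sample_name keeps the lookup stable if disease labels change between BioXpress versions.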

The processed file for Glygen is available on the glygen-vm-dev server at /software/pipeline/integrator/downloads/bioxpress/August_2021/human_cancer_mRNA_expression_per_tissue_glygen.csv