BioXpress Publisher Step

Step 4 of the BioXpress pipeline.

General Flow of Scripts

de-publish-per-study.py -> de-publish-per-tissue.py

Procedure

Publisher Step 1 : Run the script de-publish-per-study.py

Summary

The python script de-publish-per-study.py takes the output from running DESeq in the previous step for each TCGA study and combines into one master file.

Method

Edit the hard-coded paths in the script de-publish-per-study.py

  • Specify the in_file for the disease ontology mapping file (line ~26)

  • Specify the in_file for the uniprot accession id (protein id) mapping file (line ~40)

  • Specify the in_file for the refseq mapping file (line ~51)

  • Specify the in_file for the list of TCGA studies to include in the final output (line ~72)

  • Specify the deseq_dir for the folder containing all deseq output (line ~80)

  • Specify the path to write the output (line ~135)

Run the python script python de-publish-per-study.py

Output

A csv file with the DEseq output for all TCGA studies, mapped to DO IDs, uniprot accession ids, and refseq ids. The path is specified in the script as one of the hard-coded lines edited during the method.

Publisher Step 2 : Run the script de-publish-per-tissue.py

Summary

The python script de-publish-per-tissue.py takes the output from running DESeq in the previous step for each tissue and combines into one master file.

Method

Edit the hard-coded paths in the script de-publish-per-study.py

  • Specify the in_file for the disease ontology mapping file (line ~26)

  • Specify the in_file for the uniprot accession id (protein id) mapping file (line ~40)

  • Specify the in_file for the refseq mapping file (line ~51)

  • Specify the in_file for the list of tissues to include in the final output (line ~72)

  • Specify the deseq_dir for the folder containing all deseq output (line ~80)

  • Specify the path to write the output (line ~135)

Output

A csv file with the DEseq output for all tissues, mapped to DO IDs, uniprot accession ids, and refseq ids. The path is specified in the script as one of the hard-coded lines edited during the method.