Last updated: 2020-04-17
Checks: 6 1
Knit directory: bigquery-covid19/
This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has staged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish
to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200415)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 2199019. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Staged changes:
Modified: analysis/query-phenomexcan.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/query-phenomexcan.Rmd
) and HTML (docs/query-phenomexcan.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 2199019 | Hae Kyung Im | 2020-04-17 | built |
html | 2199019 | Hae Kyung Im | 2020-04-17 | built |
Rmd | 19c2011 | Hae Kyung Im | 2020-04-17 | rmd querying bigquery and postgres |
Rmd | 7fb3256 | Hae Kyung Im | 2020-04-16 | bigqueries |
Rmd | 3bb3017 | Hae Kyung Im | 2020-04-16 | edits |
html | 3bb3017 | Hae Kyung Im | 2020-04-16 | edits |
Rmd | b96fa1c | Hae Kyung Im | 2020-04-16 | query tables |
## install.packages("bigrquery")
## suppressPackageStartupMessages(library(bigrquery))
library(tidyverse)
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.0 ✓ purrr 0.3.3
✓ tibble 3.0.0 ✓ dplyr 0.8.5
✓ tidyr 1.0.2 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
## connect tp phenomexcan postgres database, define querying functions
suppressPackageStartupMessages(source("code/helpers.R", chdir = TRUE))
phenomexcan_con <- get_db()
dbListTables(phenomexcan_con)
[1] "clinvar_phenotypes" "genes" "phenotypes"
[4] "smultixcan" "spredixcan" "tissues"
[7] "ukb_clinvar"
## connect to dapg dataset
dapg_dataset = "GTEx_V8_DAPG"
dapg_con <- DBI::dbConnect(
bigrquery::bigquery(),
project = "gtex-awg-im",
dataset = dapg_dataset,
billing = "gtex-awg-im")
##
dbListTables(dapg_con)
Using an auto-discovered, cached token.
To suppress this message, modify your code or options to clearly consent to the use of a cached token.
See gargle's "Non-interactive auth" vignette for more details:
https://gargle.r-lib.org/articles/non-interactive-auth.html
The bigrquery package is using a cached token for imlab2015@gmail.com.
Auto-refreshing stale OAuth token.
[1] "clusters_eqtl" "clusters_eqtl_eur"
[3] "clusters_sqtl" "model_summary_eqtl_eur"
[5] "model_variants_eqtl_eur" "models_eqtl"
[7] "models_eqtl_eur" "variants_pip_eqtl"
[9] "variants_pip_eqtl_eur" "variants_pip_sqtl"
## connect to gtex-awg-im:annotations.gencode_v26_all
gencode_con <- DBI::dbConnect(
bigrquery::bigquery(),
project = "gtex-awg-im",
dataset = "annotations",
billing = "gtex-awg-im")
gencode_tbl <- tbl(gencode_con, "gencode_v26_all")
gencode_df <- gencode_tbl %>% collect()
## connect to prediction weights gtex-awg-im:GTEx_V8_PF_MASHR_EUR_v1.weights_eqtl
mashrpred_con <- DBI::dbConnect(
bigrquery::bigquery(),
project = "gtex-awg-im",
dataset = "GTEx_V8_PF_MASHR_EUR_v1",
billing = "gtex-awg-im")
##
dbListTables(mashrpred_con)
[1] "extra_eqtl" "extra_sqtl"
[3] "smultixcan_eqtl" "smultixcan_eqtl_count"
[5] "smultixcan_sqtl" "smultixcan_sqtl_count"
[7] "spredixcan_eqtl" "spredixcan_eqtl_count"
[9] "spredixcan_eqtl_hn" "spredixcan_eqtl_hn_count"
[11] "spredixcan_eqtl_hq" "spredixcan_eqtl_hq_count"
[13] "spredixcan_sqtl" "spredixcan_sqtl_count"
[15] "weights_eqtl" "weights_sqtl"
weights_tbl = tbl(mashrpred_con,"weights_eqtl")
enslist = c("ENSG00000184012", "ENSG00000132842","ENSG00000130234")
## using the table from phenomexcan (sex chr missing)
genes_tbl <- tbl(phenomexcan_con,"genes")
colnames(genes_tbl)
[1] "id" "gene_id" "gene_id_version" "gene_name"
[5] "gene_type" "band"
genes_tbl %>% filter(gene_id %in% enslist) %>% collect() %>% .[["gene_name"]]
[1] "AP3B1" "TMPRSS2"
## using gencode table
colnames(gencode_tbl)
[1] "chromosome" "start_location" "end_location" "feature_type"
[5] "strand" "gene_id" "gene_name" "gene_type"
gencode_tbl %>% filter(substr(gene_id,1,15) %in% enslist) %>% collect() %>% .[["gene_name"]]
[1] "AP3B1" "TMPRSS2" "ACE2"
input = list()
input$gene_name = c("TMPRSS2", "AP3B1")
input$limit = 100
tempo <- get_results_from_data_db(input)
results:
SELECT
p.unique_description as phenotype, p.source as phenotype_source, g.gene_name, g.band as gene_band,
sm.pvalue, sm.rcp,
(case when sm.dir_effect_most_signif = 1 then '+'
when sm.dir_effect_most_signif = -1 then '-'
else '' end) as best_sign,
(case when sm.dir_effect_consensus = 1 then '+'
when sm.dir_effect_consensus = -1 then '-'
else '' end) as consensus_sign,
sm.n as n_tissues, sm.n_indep
FROM smultixcan sm inner join phenotypes p on (sm.pheno_id = p.id) inner join genes g on (sm.gene_num_id = g.id)
WHERE 1=1
AND g.gene_name IN (values ('TMPRSS2'), ('AP3B1'))
ORDER BY sm.pvalue
LIMIT 100
Query completed
head(tempo)
phenotype phenotype_source gene_name
1 Sitting height UK Biobank AP3B1
2 Leg fat-free mass (left) UK Biobank AP3B1
3 Standing height UK Biobank AP3B1
4 Leg predicted mass (left) UK Biobank AP3B1
5 Forced vital capacity (FVC), Best measure UK Biobank AP3B1
6 Forced vital capacity (FVC) UK Biobank AP3B1
gene_band pvalue rcp best_sign consensus_sign n_tissues n_indep
1 5q14.1 5.3877e-37 0.0891 - - 19 5
2 5q14.1 8.0496e-26 0.1424 - - 19 5
3 5q14.1 9.8699e-26 0.2496 - - 19 5
4 5q14.1 1.0095e-25 0.1397 - - 19 5
5 5q14.1 1.9335e-25 0.3197 - - 19 5
6 5q14.1 3.2199e-25 0.3423 - - 19 5
input = list()
input$sp_gene_name = "TMPRSS2"
input$sp_limit = 100
if(T) tempop <- get_sp_results_from_data_db(input)
results:
SELECT
p.unique_description as phenotype, p.source as phenotype_source, t.name as tissue_name, g.gene_name, g.band as gene_band,
sp.pvalue, sp.zscore, sp.effect_size
FROM spredixcan sp inner join phenotypes p on (sp.pheno_id = p.id)
inner join tissues t on (sp.tissue_id = t.id)
inner join genes g on (sp.gene_num_id = g.id)
WHERE 1=1
AND g.gene_name IN (values ('TMPRSS2'))
ORDER BY sp.pvalue
LIMIT 100
Query completed
head(tempop)
phenotype phenotype_source
1 C_MALE_GENITAL UK Biobank
2 malignant neoplasm of male genital organs UK Biobank
3 Malignant neoplasm of prostate UK Biobank
4 Diagnoses - main ICD10: J43 Emphysema UK Biobank
5 Diagnoses - main ICD10: C61 Malignant neoplasm of prostate UK Biobank
6 Diagnoses - main ICD10: C61 Malignant neoplasm of prostate UK Biobank
tissue_name gene_name gene_band pvalue zscore effect_size
1 Prostate TMPRSS2 21q22.3 8.6656e-08 5.3527 0.0138880
2 Prostate TMPRSS2 21q22.3 8.6656e-08 5.3527 0.0138880
3 Prostate TMPRSS2 21q22.3 1.3051e-07 5.2781 0.0132060
4 Lung TMPRSS2 21q22.3 5.6411e-07 5.0031 0.0022190
5 Prostate TMPRSS2 21q22.3 4.0875e-05 4.1025 0.0088149
6 Esophagus_Mucosa TMPRSS2 21q22.3 5.0152e-05 -4.0549 -0.3784200
tbl(phenomexcan_con,"genes") %>% tally()
# Source: lazy query [?? x 1]
# Database: postgres [metaxcan_ro@35.202.243.187:5432/phenomexcan]
n
<int64>
1 22515
## TODO write function genename2ensid and ensid2genename
genelist = c("TMPRSS2", "AP3B1","ACE2")
ensverlist = gencode_df %>% filter(gene_name %in% genelist) %>% collect() %>% .[["gene_id"]]
dbListTables(dapg_con)
[1] "clusters_eqtl" "clusters_eqtl_eur"
[3] "clusters_sqtl" "model_summary_eqtl_eur"
[5] "model_variants_eqtl_eur" "models_eqtl"
[7] "models_eqtl_eur" "variants_pip_eqtl"
[9] "variants_pip_eqtl_eur" "variants_pip_sqtl"
dapg_pip_tbl <- tbl(dapg_con,"variants_pip_eqtl")
tempof <- dapg_pip_tbl %>% filter(gene %in% ensverlist) %>% collect()
tempof <- tempof %>% left_join(gencode_df %>% select(gene_id,gene_name),by=c("gene"="gene_id"))
head(tempof)
# A tibble: 6 x 8
tissue gene rank variant_id pip log10_abvf cluster_id gene_name
<chr> <chr> <int> <chr> <dbl> <dbl> <int> <chr>
1 Lung ENSG00000… 95 chr5_78141447… 2.04e-3 3.23 1 AP3B1
2 Lung ENSG00000… 2 chr5_78089857… 3.83e-2 4.50 1 AP3B1
3 Lung ENSG00000… 79 chr5_78050844… 3.01e-3 3.40 1 AP3B1
4 Lung ENSG00000… 39 chr5_78242771… 6.25e-3 2.94 1 AP3B1
5 Lung ENSG00000… 107 chr5_78053912… 1.94e-3 3.22 1 AP3B1
6 Lung ENSG00000… 155 chr5_78192860… 7.68e-4 2.81 1 AP3B1
genelist = c("TMPRSS2", "AP3B1")
ensidverlist = gencode_df %>% filter(gene_name %in% genelist) %>% collect() %>% .[["gene_id"]]
tempow <- weights_tbl %>% filter(gene %in% ensverlist) %>% collect()
tempow <- tempow %>% left_join(gencode_df %>% select(gene_id,gene_name),by=c("gene"="gene_id"))
head(tempow)
# A tibble: 6 x 8
gene rsid varID ref_allele eff_allele weight tissue gene_name
<chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 ENSG0000… rs3507… chr21_414… AC A 0.115 Lung TMPRSS2
2 ENSG0000… rs7881… chr5_7828… A C -0.0278 Brain_C… AP3B1
3 ENSG0000… rs7881… chr5_7828… A C -0.0631 Muscle_… AP3B1
4 ENSG0000… rs9293… chr5_7820… A C 0.116 Whole_B… AP3B1
5 ENSG0000… rs7390… chr21_415… A G -0.00317 Esophag… TMPRSS2
6 ENSG0000… rs7390… chr21_415… A G 0.00957 Spleen TMPRSS2
# server <- function(input, output, session) {
# updateSelectizeInput(session = session, inputId = 'gene_name', choices = c(genes_), server = TRUE)
# updateSelectizeInput(session = session, inputId = 'sp_gene_name', choices = c(genes_), server = TRUE)
# updateSelectizeInput(session = session, inputId = 'pheno', choices = c(phenotypes_), server = TRUE)
# updateSelectizeInput(session = session, inputId = 'sp_pheno', choices = c(phenotypes_), server = TRUE)
# updateSelectizeInput(session = session, inputId = 'sp_tissue', choices = c(tissues_), server = TRUE)
# updateSelectizeInput(session = session, inputId = 'uc_ukb_trait', choices = c(phenotypes_), server = TRUE)
# updateSelectizeInput(session = session, inputId = 'uc_clinvar_trait', choices = c(clinvar_phenotypes_), server = TRUE)
#
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RPostgres_1.2.0 forcats_0.5.0 stringr_1.4.0 dplyr_0.8.5
[5] purrr_0.3.3 readr_1.3.1 tidyr_1.0.2 tibble_3.0.0
[9] ggplot2_3.3.0 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 lubridate_1.7.8 lattice_0.20-41 utf8_1.1.4
[5] assertthat_0.2.1 rprojroot_1.3-2 digest_0.6.25 R6_2.4.1
[9] cellranger_1.1.0 backports_1.1.6 reprex_0.3.0 evaluate_0.14
[13] httr_1.4.1 pillar_1.4.3 rlang_0.4.5 curl_4.3
[17] readxl_1.3.1 rstudioapi_0.11 whisker_0.4 blob_1.2.1
[21] rmarkdown_2.1 bit_1.1-15.2 munsell_0.5.0 broom_0.5.5
[25] compiler_3.6.3 httpuv_1.5.2 modelr_0.1.6 xfun_0.13
[29] askpass_1.1 pkgconfig_2.0.3 htmltools_0.4.0 openssl_1.4.1
[33] tidyselect_1.0.0 workflowr_1.6.1 fansi_0.4.1 crayon_1.3.4
[37] dbplyr_1.4.2 withr_2.1.2 later_1.0.0 grid_3.6.3
[41] bigrquery_1.2.0 nlme_3.1-147 jsonlite_1.6.1 gtable_0.3.0
[45] lifecycle_0.2.0 DBI_1.1.0 git2r_0.26.1 magrittr_1.5
[49] scales_1.1.0 cli_2.0.2 stringi_1.4.6 fs_1.4.1
[53] promises_1.1.0 xml2_1.3.1 ellipsis_0.3.0 generics_0.0.2
[57] vctrs_0.2.4 tools_3.6.3 bit64_0.9-7 glue_1.4.0
[61] hms_0.5.3 yaml_2.2.1 gargle_0.4.0 colorspace_1.4-1
[65] rvest_0.3.5 knitr_1.28 haven_2.2.0