Tech Log
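March 17, 2026 Research Log
Today's topic: reading the paper "Integration of eQTL and a Single-Cell Atlas in the Human Eye Identifies Causal Genes for Age-Related Macular Degeneration"
Today's focus: understand the aim of the study and its overall experimental approach, and explore the data types and data structure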
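Personal notes: This study combines bulk RNA sequencing with single-nucleus sequencing, has a fairly large sample size, and describes its methods in detail, so it is worth a close read. Single-nucleus sequencing is a variant of single-cell sequencing in which only the cell nuclei are isolated and sequenced. Its main advantage is that it works on frozen samples. Because it profiles nuclear RNA, it also captures more unspliced pre-mRNA and nuclear-retained transcripts such as some lncRNAs.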
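Data availability: The study performed bulk RNA sequencing and single-nucleus sequencing and provides both the analysis results and the processed sequencing data. The two sequencing methods are stored as two separate GEO datasets:
GSE135092 (bulk sequencing) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135092
GSE135133 (single-nucleus sequencing) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135133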
Analysis code archive:
↓ GSE135092 (merge the per-sample tables into a gene-by-sample nRPKM matrix)
# ----Library----
library(here)
library(ggplot2)
library(ggrepel)
library(readr)
library(dplyr)
library(limma)
library(purrr)
# ----Data----
# List all per-sample .tsv.gz files
files <- list.files(here("Dataset","GSE135092","GSE135092_RAW"), pattern = "\\.tsv\\.gz$", full.names = TRUE)
# Read one per-sample table with fixed column names
read_one <- function(path) {
  read_tsv(
    path,
    skip = 4, # skip annotation and original names
    col_names = c("ID_REF", "VALUE", "count"), # force use same name
    show_col_types = FALSE
  )
}
df_list <- map(files, read_one)
#df_list <- map(files, ~ read_tsv(.x, skip = 3, show_col_types = FALSE))
# Table annotation:
# ID_REF=ID in platform
# VALUE=Size-factor-adjusted RPKM (nRPKM)
# count=Number of reads uniquely aligning to gene model
# Sample names = file names without the extension (dots escaped, anchored at the end)
sample_names <- sub("\\.tsv\\.gz$", "", basename(files))
# Keep Gene & nRPKM
df_list <- map2(df_list, sample_names, ~ .x %>%
  dplyr::select(ID_REF, VALUE) %>%
  dplyr::rename(Gene = ID_REF, !!.y := VALUE))
# Full join on Gene: one row per gene, one nRPKM column per sample
merged <- reduce(df_list, full_join, by = "Gene")
merged <- as.data.frame(merged)
rownames(merged) <- merged$Gene
merged$Gene <- NULL
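Before using the merged table it may be worth checking what the full join produced. The following is a minimal sketch in base R on toy data, not the real GEO files: the sample names (GSM_A, GSM_B), gene IDs, and values are all made up for illustration. It checks for duplicated gene IDs, counts the NA entries that a full join creates for genes missing from a sample, and applies a log2(x + 1) transform, which is a common choice for RPKM-like values rather than something the paper prescribes.

```r
# Toy per-sample tables standing in for the GEO files (all values are made up)
s1 <- data.frame(Gene = c("g1", "g2", "g3"), GSM_A = c(0, 5.2, 13.1))
s2 <- data.frame(Gene = c("g1", "g2", "g4"), GSM_B = c(1.5, 4.8, 2.0))

# Full outer join on Gene, the base-R counterpart of reduce(df_list, full_join)
toy <- Reduce(function(x, y) merge(x, y, by = "Gene", all = TRUE), list(s1, s2))

# Duplicated gene IDs would make the rownames<- assignment fail, so check first
has_dup_genes <- anyDuplicated(toy$Gene) > 0

rownames(toy) <- toy$Gene
toy$Gene <- NULL

# Genes absent from a sample become NA after a full join
n_missing <- colSums(is.na(toy))

# log2(x + 1) keeps zeros at zero and leaves NA as NA
toy_log <- log2(as.matrix(toy) + 1)
```

The same three checks can be run on merged directly by replacing toy with merged. Whether the NA entries should be kept, set to zero, or the affected genes dropped depends on the downstream analysis.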
↓ GSE135133 (read the single-nucleus data and run the single-cell processing workflow with the Seurat package)
# ----Library----
library(here)
library(ggplot2)
library(ggrepel)
library(readr)
library(dplyr)
library(Seurat)
library(purrr)
# ---- Data ----
# Read one per-sample matrix into a Seurat object
# (CreateSeuratObject expects genes in rows and cells in columns)
read_one_gsm <- function(file) {
  mat <- read.delim(file, row.names = 1, check.names = FALSE)
  # Name the project after the file, without the .txt.gz extension
  CreateSeuratObject(counts = mat, project = sub("\\.txt\\.gz$", "", basename(file)))
}
files <- list.files(here("Dataset","GSE135133","GSE135133_RAW"), pattern = "\\.txt\\.gz$", full.names = TRUE)
objs <- lapply(files, read_one_gsm)
# Merge all per-sample objects into one
obj <- merge(objs[[1]], objs[-1])
# Published cluster assignments (read here, not yet attached to obj)
clusters <- read.table(here("Dataset","GSE135133","GSE135133_clusterAssignments.txt.gz"),
                       header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# ---- Single cell data processing workflow ----
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
obj <- RunUMAP(obj, dims = 1:30)
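The script reads the cluster assignments into clusters but does not yet use them. One possible next step is to attach them to the cells by name. The sketch below shows the idea in base R on toy data. The column names barcode and cluster, the cell names, and the cluster labels are hypothetical. The real layout of GSE135133_clusterAssignments.txt.gz should be checked with head(clusters) first, and the cell names in obj may not match the file exactly after merging.

```r
# Toy stand-in for the per-cell metadata of a Seurat object (values are made up)
meta <- data.frame(row.names = c("cellA", "cellB", "cellC"),
                   nCount_RNA = c(1200, 800, 950))

# Toy stand-in for the cluster assignment table (hypothetical column names)
clusters_toy <- data.frame(
  barcode = c("cellC", "cellA", "cellX"),
  cluster = c("cluster_1", "cluster_2", "cluster_3"),
  stringsAsFactors = FALSE
)

# Align by cell name, never by row position: the two tables can differ in order
idx <- match(rownames(meta), clusters_toy$barcode)
meta$cell_type <- clusters_toy$cluster[idx]

# Cells with no published assignment get NA; count them before going further
n_unmatched <- sum(is.na(idx))
```

With a real Seurat object, a vector built this way from colnames(obj) could be stored with obj$cell_type <- ... or with AddMetaData(). A large n_unmatched would suggest that the cell names need to be harmonised before the labels can be trusted.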
Reference
Orozco, L. D., Chen, H. H., Cox, C., Katschke, K. J., Arceo, R., Espiritu, C., Caplazi, P., Nghiem, S. S., Chen, Y. J., Modrusan, Z., Dressen, A., Goldstein, L. D., Clarke, C., Bhangale, T., Yaspan, B., Jeanne, M., Townsend, M. J., van Lookeren Campagne, M., & Hackney, J. A. (2020). Integration of eQTL and a Single-Cell Atlas in the Human Eye Identifies Causal Genes for Age-Related Macular Degeneration. Cell Reports, 30(4), 1246-1259.e6. https://doi.org/10.1016/j.celrep.2019.12.082
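Remarks: As expected of a paper published in Cell Reports, the content is rich, the methodology is comprehensive, and the descriptions are very detailed. One day is nowhere near enough to understand and reproduce it, so I will keep working on it.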