注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

Bioinformatics home

 
 
 

日志

 
 

GEO2R Login  

2012-11-01 08:53:28|  分类: 默认分类 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

http://www.ncbi.nlm.nih.gov/geo/geo2r/

Background

GEO2R is an interactive web tool that allows users to compare two or more groups of Samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions. Results are presented as a table of genes ordered by significance.

GEO2R performs comparisons on original submitter-supplied processed data tables using the GEOquery and limma R packages from the Bioconductor project. Bioconductor is an open source software project based on the R programming language that provides tools for the analysis of high-throughput genomic data. The GEOquery R package parses GEO data into R data structures that can be used by other R packages. The limma (Linear Models for Microarray Analysis) R package has emerged as one of the most widely used statistical tests for identifying differentially expressed genes. It handles a wide range of experimental designs and data types and applies multiple-testing corrections on P-values to help correct for the occurrence of false positives. Thus, GEO2R provides a simple interface that allows users to perform R statistical analysis without command line expertise.

Unlike GEO's other DataSet analysis tools, GEO2R does not rely on curated DataSets and interrogates the original Series Matrix data file directly. This allows a greater proportion of GEO data to be analyzed in a timely manner. However, it is important to realize that this tool can access and analyze almost any GEO Series, regardless of data type and quality, so the user must be aware of GEO2R Limitations and caveats.


How to use Back to top

Enter a Series accession number

If you followed a link from a Series record, the GEO accession box will already be populated. Otherwise, enter a Series accession number in the box, e.g., GSE25724. If the Series is associated with multiple Platforms, you will be asked to select the Platform of interest.


Define Sample groups

In the Samples panel, click 'Define groups' and enter names for the groups of Samples you plan to compare, e.g., test and control. Up to 10 groups can be defined. At least two groups must be defined in order to perform the test. Groups can be removed using the [X] feature next to the group name.


Assign Samples to each group
Screenshot of GEO2R samples table

To assign Samples to a group, highlight relevant Sample rows. Multiple rows may be highlighted either by dragging the cursor over contiguous Samples, or using Ctrl or Shift keys. When relevant Samples are highlighted, click the group name to assign those Samples to the group. Repeat for each group. Not all Samples in a Series need to be selected for the test to work.

Use the Sample metadata columns to help determine which Samples belong to which group. The table is populated with Accession, Title, Source name and individual Characteristics fields from the Sample records. You can change which fields are displayed using the Columns box at the upper right corner of the table, and the columns can be sorted by clicking the table headers.


Perform the test

After Samples have been assigned to groups, click [Top 250] to run the test with default parameters.

Alternatively, you can use features in the other tabs to first assess the Sample value distributions, or edit default test parameters. For example, you can select an alternative P-value adjustment method in the Options tab then go back to the GEO2R tab and click [Top 250] to run the test with revised parameters. Details regarding each edit option are provided in the Edit options and features section below.


Interpret the results table
Screenshot of GEO2R results table

Results are presented in the browser as a table of the top 250 genes ranked by P-value. Genes with the smallest P-value are the most significant. Click on a row to reveal the gene expression profile graph for that gene. Each red bar in the graph represents the expression measurement extracted from the value column of the original submitter-supplied Sample record. The Sample accession numbers and group names are listed along the bottom of the chart.

Use the Select columns feature to modify which data and annotation columns are included in the table. Information about the meaning of the data columns is provided in the Summary statistics section.

If you want to edit the test parameters, you can do so in the Options tab, then go back to the GEO2R tab and click Recalculate to apply the edits.

To see more than the top 250 results, or if you want to save the results, the complete results table may be downloaded using the Save all results button. The downloaded files are tab-delimited and suitable for opening in a spreadsheet application such as Excel.


Tutorial Video


Edit options and features Back to top

Value distribution

This feature allows you to calculate and view the distribution of the values for the Samples you have selected. Values are the original submitter-supplied data upon which GEO2R calculations are performed. Viewing the distribution is important for determining if your selected Samples are suitable for comparison; see Limitations and caveats for more information. Generally, median-centered values are indicative that the data are normalized and cross-comparable.

Value distributions may be viewed graphically as a box plot. The graphic can be saved by right-clicking on the image. Alternatively, the distribution can be exported as a tab-delimited number summary table.


Options

Apply adjustment to the P-values: Limma provides several P-value adjustment options. These adjustments, also called multiple-testing corrections, attempt to correct for the occurrence of false positive results. The Benjamini & Hochberg false discovery rate method is selected by default because it is the most commonly used adjustment for microarray data and provides a good balance between discovery of statistically significant genes and limitation of false positives. If you want to change the adjustment method, go to the Options tab and select another method. References for each method are provided below. The adjusted P-values are listed in the Adj P-value column of the results table.

Apply log transformation to the data: The GEO database accepts a variety of data value types, including logged and unlogged data. Limma expects data values to be in log space. To address this, GEO2R has an auto-detect feature that checks the values of selected Samples and automatically performs a log2 transformation on values determined not to be in log space. Alternatively, the user can select Yes to force log2 transformation, or No to override the auto-detect feature. The auto-detect feature only considers Sample values that have been assigned to a group, and applies the transformation in an all-or-none fashion.

Category of Platform annotation to display on results: Select which category of annotation to display on results. Gene annotations are derived from the corresponding Platform record. Two types of annotation are possible:

NCBI generated annotation is available for many records. These annotations are derived by extracting stable sequence identification information from the Platform and periodically querying against the Entrez Gene and UniGene databases to generate consistent and up-to-date annotation. Gene symbol and Gene title annotations are selected by default. Other categories of NCBI generated annotation include GO terms and chromosomal location information.

Submitter supplied annotation is available for all records. These represent the original Platform annotations provided by the submitter. Note that there is a lot of diversity in the style and content of submitter supplied annotations and they may not have been updated since the time of submission.


Profile graph

This tab allows you to view a specific gene expression profile graph by entering the corresponding identifier from the ID column of the Platform record. This feature does not perform any calculations; it merely displays the expression values of the gene across Samples. Sample groups do not need to be defined for this feature to work.


R script

This tab prints the R script used to perform the calculation. This information can be saved and used as a reference for how results were calculated.

 

 

# Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
# R scripts generated  Wed Oct 31 20:47:18 EDT 2012

################################################################
#   Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO

gset <- getGEO("GSE18388", GSEMatrix =TRUE)
if (length(gset) > 1) idx <- grep("GPL6246", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
sml <- c("G0","G0","G0","G0","G1","G1","G1","G1");

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx[5] > 100) ||
          (qx[6]-qx[1] > 50 && qx[2] > 0) ||
          (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) { ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

# load NCBI platform annotation
gpl <- annotation(gset)
platf <- getGEO(gpl, AnnotGPL=TRUE)
ncbifd <- data.frame(attr(dataTable(platf), "table"))

# replace original platform annotation
tT <- tT[setdiff(colnames(tT), setdiff(fvarLabels(gset), "ID"))]
tT <- merge(tT, ncbifd, by="ID")
tT <- tT[order(tT$P.Value), ]  # restore correct order

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title","GO.Process"))
write.table(tT, file=stdout(), row.names=F, sep="\t")

################################################################
#   Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)

# load series and platform data from GEO

gset <- getGEO("GSE18388", GSEMatrix =TRUE)
if (length(gset) > 1) idx <- grep("GPL6246", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# group names for all samples in a series
sml <- c("G0","G0","G0","G0","G1","G1","G1","G1")

# order samples by group
ex <- exprs(gset)[ , order(sml)]
sml <- sml[order(sml)]
fl <- as.factor(sml)
labels <- c("space-flown","control")

# set parameters and draw the plot
palette(c("#DFEAF4","#f4dfdf", "#AABBCC"))
dev.new(width=4+dim(gset)[[2]]/5, height=6)
par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
title <- paste ("GSE18388", '/', annotation(gset), " selected samples", sep ='')
boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE, las=2, col=fl)
legend("topleft", labels, fill=palette(), bty="n")

  评论这张
 
阅读(2093)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017