注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

Bioinformatics home

 
 
 

日志

 
 

R recipes  

2012-12-01 00:45:08|  分类: R |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |
Fold
Table of Contents
1. Converting the Gene Ontology graph into igraph
2. Centrality(-zation) function
3. Exporting graphs to LaTeX, using igraph and TikZ
3.1 The function
3.2 The code
3.3 An example
4. Bonacich's power centrality for big graphs, using sparse matrices
5. Assortativity coefficient in R
6. Bipartite Networks measures
7. Weighted degree distribution
8. Leverage centrality
9. Animating temporal networks
1. Converting the Gene Ontology graph into igraph

Rajarshi Guha wrote a nice Python script that converts the Gene Ontology graph into an igraph graph. Here is an R version that is much shorter, because we take advantage of the Bioconductor project, which has the data bundled into an R package.

Note that you only need the first two lines if you don't yet have the GO.db package installed. This package is updated twice a year, just like Bioconductor itself.

source("http://bioconductor.org/biocLite.R")
biocLite("GO.db")
library(GO.db)
library(igraph)
BP <- toTable(GOBPPARENTS)
CC <- toTable(GOCCPARENTS)
MF <- toTable(GOMFPARENTS)
g <- graph.data.frame( rbind(BP,CC,MF) )
Here is another version that keeps (some of) the metadata of the vertices (=terms) as well: the name of the term, the ontology and the textual definition. These will be added as vertex attributes.

source("http://bioconductor.org/biocLite.R")
biocLite("GO.db")
library(GO.db)
library(igraph)
BP <- toTable(GOBPPARENTS)
CC <- toTable(GOCCPARENTS)
MF <- toTable(GOMFPARENTS)
terms <- toTable(GOTERM)[,2:5]
terms <- terms[ !duplicated(terms[,1]), ]
g <- graph.data.frame( rbind(BP,CC,MF), vertices=terms )
For the terms, the table in the GO.db package is a one-to-many mapping, because it contains the (potentially many) synonyms and secondary IDS of the category. But we throw away these columns anyway, so we throw away the rows that appear many time in the table as well.

2. Centrality(-zation) function

This is quite basic, but may help cut corners. All the three 'classic' centrality measures are normalized.

centrality.norm<-function(graph,type=c("degree","closeness","betweenness"),centralization=FALSE)
{    
    result<-NA
    g<-graph
    cent<-centralization
    if (!is.igraph(g)) {stop("Not a graph object")}
    if (type[[1]] == "degree") {
        if (!cent) result <- degree(g)/(vcount(g)-1)
        else result <- (sum(max(degree(g))-degree(g)))/((vcount(g)-1)*(vcount(g)-2))}
    else if (type[[1]] == "betweenness") {
        temp <- 2*betweenness(g)/((vcount(g)-1)*(vcount(g)-2))
        if (!cent) result <- temp
        else  result <- sum(max(temp)-temp)/(vcount(g)-1)}
    else if (type[[1]] == "closeness") {
        if (!cent) result <- closeness(g)
        else result <- (2*vcount(g)-3)*(sum(max(closeness(g))-closeness(g)))/((vcount(g)-1)*(vcount(g)-2))}
    else {stop("this type is unavailable or mispelled")}
    return(result)
}
For example, if you want to calculate the closeness centrality of the vertices of a graph:

g<-graph.atlas(1000);
centrality.norm(g,type="closeness",centralization=FALSE)
To obtain the closeness centralization:

centrality.norm(g,type="closeness",centralization=TRUE)
3. Exporting graphs to LaTeX, using igraph and TikZ

Jérémie Knüsel wrote a very useful function to export graphs in LaTeX in order to use the powerful TikZ package, allowing then to use its options. You'll be able to create very large graphs, which was hard and time-consuming to do in LaTeX before.

3.1 The function

The two "igraph" parameters are 1. the graph (igraph format) 2. the layout, which can be either a matrix (dim: vcount(g) x vcount(g)) or a function (e.g. layout.random, …)
The "TikZ" parameters, that are situated inside the "igraph.to.tikz" function may by modified according to the pgf/TikZ manual.
node is related to the nodes' aspects
nodirectional is used in the case of undirected graph
unidirectional and bidirectional are used with directed graphs. tbidirectional is used in the case of bidectional (!) edges and draws them one next to the other (and not superimposed like with the dafault R output).

3.2 The code

tikz <- function (graph, layout) {
  ## Here we get the matrix layout
  if (class(layout) == "function")
    layout <- layout(graph)

  layout <- layout / max(abs(layout))
  clos <- closeness(graph)

  ##TikZ initialisation and default options (see pgf/TikZ manual)
  cat("\\tikzset{\n")
  cat("\tnode/.style={circle,inner sep=1mm,minimum size=0.8cm,draw,very thick,black,fill=red!20,text=black},\n")
  cat("\tnondirectional/.style={very thick,black},\n")
  cat("\tunidirectional/.style={nondirectional,shorten >=2pt,-stealth},\n")
  cat("\tbidirectional/.style={unidirectional,bend right=10}\n")
  cat("}\n")
  cat("\n")

  ##Size
  cat("\\begin{tikzpicture}[scale=5]\n")

  for (vertex in V(graph)) {
    label <- V(graph)[vertex]$label
    if (is.null(label))
      label <- ""

  ##drawing vertices
    cat (sprintf ("\t\\node [node] (v%d) at (%f, %f)\t{%s};\n", vertex, layout[vertex+1,1], layout[vertex+1,2], label))
  }
  cat("\n")

  adj = get.adjacency(graph)
  bidirectional = adj & t(adj)

  if (!is.directed(graph)) ##undirected case
      for (line in 1:nrow(adj)) {
      for (col in line:ncol(adj)) {
           if (adj[line,col]&col>line) {
            cat (sprintf ("\t\\path [nondirectional] (v%d) edge (v%d);\n", line-1, col-1)) ##edges drawing
          }
        }
      }
  else ##directed case
      for (line in 1:nrow(adj)) {
      for (col in 1:ncol(adj)) {
          if (bidirectional[line,col]&line > col)
            cat (sprintf ("\t\\path [bidirectional] (v%d) edge (v%d);\n", line-1, col-1),
            sprintf ("\t\\path [bidirectional] (v%d) edge (v%d);\n", col-1, line-1)) ##edges drawing
        else if (!bidirectional[line,col]&adj[line,col]) 
            cat (sprintf ("\t\\path [unidirectional] (v%d) edge (v%d);\n", line-1, col-1)) ##edges drawing
      }
    }

  cat("\\end{tikzpicture}\n")
}
3.3 An example

After the installation of the pgf/TikZ package, create a LaTeX document with this template:

\documentclass{article}

\usepackage{tikz} 

\usetikzlibrary{arrows,decorations.pathmorphing,backgrounds,positioning,fit} 

\begin{document} 

\end{document}
Then evaluate the function "igraph.to.tikz" in R. At that point, create a graph and translate it:

g<-as.directed(graph.atlas(400))
igraph.to.tikz(g,layout.fruchterman.reingold)
Copy the result in the LaTeX document and compose it!

4. Bonacich's power centrality for big graphs, using sparse matrices

bonpow.sparse <- function(graph, nodes=V(graph), loops=FALSE,
                          exponent=1, rescale=FALSE, tol=1e-07) {

  ## remove loops if requested
  if (!loops) {
    graph <- simplify(graph, remove.multiple=FALSE, remove.loops=TRUE)
  }

  vg <- vcount(graph)

  ## sparse adjacency matrix
  d <- get.adjacency(graph, sparse=TRUE)

  ## sparse identity matrix
  id <- Diagonal(vg)

  ## solve it
  ev <- solve(id - exponent * d, degree(graph, mode="out"), tol=tol)

  if (rescale) {
    ev <- ev/sum(ev)
  } else {
    ev <- ev * sqrt(vcount(graph)/sum((ev)^2))
  }

  ev[as.numeric(nodes) + 1]
}

## test graph
test.g <- simplify(ba.game(1000,m=2))

## test run
system.time(bp1 <- bonpow(test.g))
system.time(bp2 <- bonpow.sparse(test.g))

## check that they are the same
max(abs(bp1-bp2))
5. Assortativity coefficient in R

Code translated from the python recipes. References? Same coefficient as Newman's ?

assortativity <- function(graph){
    deg <- degree(graph)
    deg.sq <- deg^2
    m <- ecount(graph)
    num1 <- 0; num2 <- 0; den <- 0
    edges <- get.edgelist(graph, names=FALSE)+1

    num1 <- sum(deg[edges[,1]] * deg[edges[,2]]) / m
    num2 <- (sum(deg[edges[,1]] + deg[edges[,2]]) / (2 * m))^2
    den <- sum(deg.sq[edges[,1]] + deg.sq[edges[,2]]) / (2 * m)

    return((num1-num2)/(den-num2))
    }
6. Bipartite Networks measures

Reference: Matthieu Latapy, Clemence Magnien, Nathalie Del Vecchio, Basic notions for the analysis of large two-mode networks, Social Networks 30 (2008) 31–48

# Number of top and bottom nodes
top<-length(V(g)[type==FALSE])
bottom<-length(V(g)[type==TRUE])
# Number of edges
m<-ecount(g)
# Mean degree for top and bottom nodes
ktop<-m/top
kbottom<-m/bottom
# Density for bipartite network
bidens<-m/(top*bottom)
# Largest connected component for top and bottom nodes:
gclust<-clusters(g, mode='weak')
lcc<-induced.subgraph(g, V(g)[gclust$membership==1])
lcctop<-length(V(lcc)[type==FALSE])
lccbottom<-length(V(lcc)[type==TRUE])
# Mean distance for top and bottom nodes
distop<-mean(shortest.paths(lcc, v=V(lcc)[type==FALSE], to=V(lcc)[type==FALSE], mode = 'all'))
disbottom<-mean(shortest.paths(lcc, v=V(lcc)[type==TRUE], to=V(lcc)[type==TRUE], mode = 'all'))
#######################################################
# Network transitivity
trtarget<-graph.motifs(target, 4)
ccglobal<-(2*trtarget[20])/trtarget[14]
# Clustering coefficients (cc, cclowdot, cctopdot) for top and bottom nodes
ccPointTarg<-ccBip(g)
cctop<-mean(ccPointTarg$proj1, na.rm=TRUE)
ccbottom<-mean(ccPointTarg$proj2, na.rm=TRUE)
ccLowTarg<-ccLowDot(g)
cclowdottop<-mean(ccLowTarg$proj1, na.rm=TRUE)
cclowdotbottom<-mean(ccLowTarg$proj2, na.rm=TRUE)
ccTopTarg<-ccTopDot(g)
cctopdottop<-mean(ccTopTarg$proj1, na.rm=TRUE)
cctopdobottom<-mean(ccTopTarg$proj2, na.rm=TRUE)
# Those last clustering measures relies on the functions ccBip wrote by Gabor Csardi, while ccLowDot and ccTopDot are 
# essentially the same function with only a minor change
ccBip <- function(g) {
if (! "name" %in% list.vertex.attributes(g)) {
  V(g)$name <- seq_len(vcount(g))
}
neib <- get.adjlist(g)
names(neib) <- V(g)$name
proj <- bipartite.projection(g)
lapply(proj, function(x) {
  el <- get.edgelist(x)
  sapply(V(x)$name, function(v) {
    subs <- el[,1]==v | el[,2]==v
    f <- function(un, vn) length(union(un, vn))
    vals <- E(x)[subs]$weight / unlist(mapply(f, neib[el[subs,1]], neib[el[subs,2]]))
    mean(vals)
  })
})
}

ccLowDot <- function(g) {
if (! "name" %in% list.vertex.attributes(g)) {
  V(g)$name <- seq_len(vcount(g))
}
neib <- get.adjlist(g)
names(neib) <- V(g)$name
proj <- bipartite.projection(g)
lapply(proj, function(x) {
  el <- get.edgelist(x)
  sapply(V(x)$name, function(v) {
    subs <- el[,1]==v | el[,2]==v
    f <- function(un, vn) min(length(un), length(vn))
    vals <- E(x)[subs]$weight /
      unlist(mapply(f, neib[el[subs,1]], neib[el[subs,2]]))
    mean(vals)
  })
})
}

ccTopDot <- function(g) {
if (! "name" %in% list.vertex.attributes(g)) {
  V(g)$name <- seq_len(vcount(g))
}
neib <- get.adjlist(g)
names(neib) <- V(g)$name
proj <- bipartite.projection(g)
lapply(proj, function(x) {
  el <- get.edgelist(x)
  sapply(V(x)$name, function(v) {
    subs <- el[,1]==v | el[,2]==v
    f <- function(un, vn) max(length(un), length(vn))
    vals <- E(x)[subs]$weight /
      unlist(mapply(f, neib[el[subs,1]], neib[el[subs,2]]))
    mean(vals)
  })
})
}
7. Weighted degree distribution

The below code simply takes the code for degree.distribution and modifies it to use the weighted degree in the calculation (obtained via graph.strength)

graph.strength.distribution <- function (graph, cumulative = FALSE, ...)
{
  if (!is.igraph(graph)) {
    stop("Not a graph object")
  }
  # graph.strength() instead of degree()
  cs <- graph.strength(graph, ...)
  hi <- hist(cs, -1:max(cs), plot = FALSE)$intensities
  if (!cumulative) {
    res <- hi
  }
  else {
    res <- rev(cumsum(rev(hi)))
  }
  res
}
8. Leverage centrality

The following function calculates the leverage centrality of each vertex in a graph according to the definition found in Joyce et al: A New Measure of Centrality for Brain Networks (PLoS ONE 5(8):e12200, 2010). Thanks to Alex Upton for the implementation.

lvcent <- function(graph){
    k <- degree(graph)
    n <- vcount(graph)
    sapply(1:n, function(v) { mean((k[v]-k[neighbors(graph,v)]) / (k[v]+k[neighbors(graph,v)])) })
}
9. Animating temporal networks

Esteban Moro Egido has created some nice visualizations of temporal networks (see this one and this one) using igraph and R. He has also written a short tutorial about how to make such visualizations using igraph and only 20 lines of R code.
  评论这张
 
阅读(1429)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017