Monday, September 27, 2010
Lecture 8 - Communities
There are four community detection methods; Node-centric community, group-centric community, network-centric community, hierarchy-centric community.
Each node in a group satisfies certain properties. We make node-centric community detection based on these properties. This method is commonly used in traditional social network analysis. In this approach nodes satisfy different properties. One is complete mutuality. A clique is a maximal complete subgraph of three or more nodes all of which are adjacent to each other. Reachability of members is another issue considering nodes. Any node in a group should be reachable in k hops. A k-clique is a maximal subgraph in which the largest geodesic distance between any nodes is smaller than or equal to k. A k-club graph is a substructure of diameter smaller than or equal to k. In node-centric community each node should have a certain number of connections to nodes within the group. In a k-core graph each node connects to at least k members within the group.
To be able to make group-centric community detection we consider the connections within a group as whole.
To form a group we need to consider the connections of the nodes globally in network-centric community detection. We partition the network into disjoint sets. While partitioning we consider some features like node similarity,latent space model, block model approximation, cut minimization and modularity maximization. Node similarity is defined by how similar their interaction patterns are. Two nodes are structurally equivalent if they connect to the same set of actors. For huge networks we apply k-node clustering algorithms to seperate the network.
Finally the goal in hierarchy-centric community detection is to build a hierarchical structure of communities based on network topology.
Thursday, September 23, 2010
Lecture 9: Measures and Metrics
- Equal Similarity: Nodes have exactly same neighbors.
- Structural Similarity: Nodes have many common neighbors. Used for measuring nodes' structural closing. Cosine similarity is used to calculate.
- Regular Equivalence: Nodes have different nodes, however structure of neighbors are similar.
Thursday, September 16, 2010
Project ideas
Lecture 7- Centrality continued
The lecture started with an outline of previous lecture. Then EigenVector Centrality was talked about. In the above concept if a node is connected to more central node then the node also assumes some importance. So, in case of Eigen Vector centrality a vertex is given a score proportional to the sum of its neighbor’s scores. Eigen Vector centrality can be calculated in case of directed graphs as well.
Next, Katz centrality is talked about. In this concept each vertex is given a certain amount of centrality without considering its neighbors and their centrality. The equation for Katz centrality is
X= b(I - aA)-1.1
Where I is an identity matrix, 1 is a matrix of all 1’s, a is the scaling vector and b is the extent to which we weigh the centrality of people ego is tied to.
Next concept is that of page rank. The gist behind this concept is that if numerous pages on the web are referring to a particular web page then the later is said to have high importance, but, we cannot consider all the referring pages to be equal. Therefore, page rank uses the concept called as tracking a drunk. If a drunk person is walking through a network he would be spending sometime at each node which is proportional to the importance of that node. The drunk person must be allowed to teleport to some other node with some probability. This is similar to a websurfer visiting various web pages spending some time viewing each webpage and then clicking on a link in that web page and then after a while the person can move to some random webpage with some probability (teleporting). Using this concept the formulation for page rank algorithm is written.
The next topic discussed was Hubs and Authorities. Hubs are nodes that refer to good webpages and Authorities are webpages referred to by good Hubs. In the Hyperlink Induced Topic Search algorithm (HITS), we can start with a set of pages matching a particular query. We expand this set by following all the links on the initial set of web pages. We construct a transition matrix E where in Eij=1/ni, where there is a link between I and j and ni is the number of links from i. The authority and the hub scores can be calculated as a’=ETh and h’= Ea.
Friday, September 10, 2010
Lecture 6 - Network Centrality
Degree centrality is the measure of how many edges are connected to a node. It is good for determining how important a node is based on its connections to other nodes. For instance, how many people you can access for information, the influence of an article in terms of citations, or the probability a computer will catch a virus based on its network connections. Degree centrality can be determined by Freeman's general formula for centralization. It is also important to normalize the centrality so that meaningful comparisons can be made between networks of different sizes.
Betweenness centrality is the measure of how many shortest paths involve a specific vertex. It is good for determining how important a node is in controlling the "flow" of a graph. Betweenness centrality can be determined by summing the number of geodesic paths connecting two vertices (j and k) that pass through the vertex in question (i) divided by all paths between j and k. If there is more than one node on a geodesic path between j and k, they will have to "share credit."
Closeness centrality is the measure of how many shortest paths a vertex has to all other vertices in the graph. It is good for determining how "close to the center" of a graph a vertex is. Closeness centrality can be determined by taking the inverse of the average geodesic distance from the vertex in question to all other vertices. Values of closeness tend to span a small range and there is little chance that a change in the graph will affect closeness to a great degree.
Wednesday, September 8, 2010
Lecture 5 - Mathematics of Networks
Another metric of a network is components. In weakly connected components every node can be reached from every other node. In strongly connected components each node must be reached from every other node within the component.
Final thing that is covered is the eigenvalues and eigenvectors. We calculated the eigenvalues and eigenvectors of a matrix from the given formulas.
Thursday, September 2, 2010
Lecture 4 - Mathematics of Networks
Networks are equivalent to graphs and are graphed in many different ways depending on the type on network such as Planar graphs, tree graphs, and Hyper graphs.
Planar graphs are graphs that can be drawn without any of the edges crossing.
Tree graphs are undirected and contain no cycles.
Hyper graphs are graphs that join more than two nodes at a time (hyperEdge).