Monday, September 27, 2010

Lecture 8 - Communities

We talked about the communities this week. Community is a set of actors interacting with each other. If a there is a set of people without interaction we cannot say they form a community.
There are four community detection methods; Node-centric community, group-centric community, network-centric community, hierarchy-centric community.
Each node in a group satisfies certain properties. We make node-centric community detection based on these properties. This method is commonly used in traditional social network analysis. In this approach nodes satisfy different properties. One is complete mutuality. A clique is a maximal complete subgraph of three or more nodes all of which are adjacent to each other. Reachability of members is another issue considering nodes. Any node in a group should be reachable in k hops. A k-clique is a maximal subgraph in which the largest geodesic distance between any nodes is smaller than or equal to k. A k-club graph is a substructure of diameter smaller than or equal to k. In node-centric community each node should have a certain number of connections to nodes within the group. In a k-core graph each node connects to at least k members within the group.
To be able to make group-centric community detection we consider the connections within a group as whole.
To form a group we need to consider the connections of the nodes globally in network-centric community detection. We partition the network into disjoint sets. While partitioning we consider some features like node similarity,latent space model, block model approximation, cut minimization and modularity maximization. Node similarity is defined by how similar their interaction patterns are. Two nodes are structurally equivalent if they connect to the same set of actors. For huge networks we apply k-node clustering algorithms to seperate the network.
Finally the goal in hierarchy-centric community detection is to build a hierarchical structure of communities based on network topology.

Thursday, September 23, 2010

Lecture 9: Measures and Metrics

Transitivity is way of deriving path between two nodes using existent paths. Sure, there is a rule for this derivation. Rule can be visualized as follows: There exist 3 nodes A,B and C in our network. If there exists path from A to B and B to C , then we can conclude that path from A to C also exists.

Clustering coefficient can be defined as
c= (# of triangles x 3)/# of connected triples
For a clique(every node is connected to each other) , c equals 1. Clustering coefficient is used to understand network's structure such that if c is greater than 0.001 (this is an approximate number) that network can be defined as dense network.
For a node in network, complexity coefficient can also be calculated using following formula
C(i)= (#of neighbors of i that are connected)/(#of pairs of neighbors of i)

Reciprocity in networks is defined as mutual edges between two nodes. That is, if there exists directed edge from node A to Node B, then edge from node B to node A must exist.

Similarity is another issue between nodes in network that we mentioned. There are 3 types of similarity.
  1. Equal Similarity: Nodes have exactly same neighbors.
  2. Structural Similarity: Nodes have many common neighbors. Used for measuring nodes' structural closing. Cosine similarity is used to calculate.
  3. Regular Equivalence: Nodes have different nodes, however structure of neighbors are similar.
There are also other theorems that measure similarity using different parameters such as Euclidean Similarity and Pearson Similarity

Thursday, September 16, 2010

Project ideas

You may look at recent conferences such as NetSci to get idea of what the researchers are doing in the complex networks area.

Lecture 7- Centrality continued

The lecture started with an outline of previous lecture. Then EigenVector Centrality was talked about. In the above concept if a node is connected to more central node then the node also assumes some importance. So, in case of Eigen Vector centrality a vertex is given a score proportional to the sum of its neighbor’s scores. Eigen Vector centrality can be calculated in case of directed graphs as well.

Next, Katz centrality is talked about. In this concept each vertex is given a certain amount of centrality without considering its neighbors and their centrality. The equation for Katz centrality is

X= b(I - aA)-1.1

Where I is an identity matrix, 1 is a matrix of all 1’s, a is the scaling vector and b is the extent to which we weigh the centrality of people ego is tied to.

Next concept is that of page rank. The gist behind this concept is that if numerous pages on the web are referring to a particular web page then the later is said to have high importance, but, we cannot consider all the referring pages to be equal. Therefore, page rank uses the concept called as tracking a drunk. If a drunk person is walking through a network he would be spending sometime at each node which is proportional to the importance of that node. The drunk person must be allowed to teleport to some other node with some probability. This is similar to a websurfer visiting various web pages spending some time viewing each webpage and then clicking on a link in that web page and then after a while the person can move to some random webpage with some probability (teleporting). Using this concept the formulation for page rank algorithm is written.

The next topic discussed was Hubs and Authorities. Hubs are nodes that refer to good webpages and Authorities are webpages referred to by good Hubs. In the Hyperlink Induced Topic Search algorithm (HITS), we can start with a set of pages matching a particular query. We expand this set by following all the links on the initial set of web pages. We construct a transition matrix E where in Eij=1/ni, where there is a link between I and j and ni is the number of links from i. The authority and the hub scores can be calculated as a’=ETh and h’= Ea.

Friday, September 10, 2010

Lecture 6 - Network Centrality

In today's lecture we learned about network centrality. Centrality is a broad term that has a few different meanings based on the context in which it is used. If we know the structure of a network, then we can determine certain quantities or measures of a vertex's importance within the graph. We covered the first three of seven measures: degree centrality, betweenness centrality, and closeness centrality.

Degree centrality is the measure of how many edges are connected to a node. It is good for determining how important a node is based on its connections to other nodes. For instance, how many people you can access for information, the influence of an article in terms of citations, or the probability a computer will catch a virus based on its network connections. Degree centrality can be determined by Freeman's general formula for centralization. It is also important to normalize the centrality so that meaningful comparisons can be made between networks of different sizes.

Betweenness centrality is the measure of how many shortest paths involve a specific vertex. It is good for determining how important a node is in controlling the "flow" of a graph. Betweenness centrality can be determined by summing the number of geodesic paths connecting two vertices (j and k) that pass through the vertex in question (i) divided by all paths between j and k. If there is more than one node on a geodesic path between j and k, they will have to "share credit."

Closeness centrality is the measure of how many shortest paths a vertex has to all other vertices in the graph. It is good for determining how "close to the center" of a graph a vertex is. Closeness centrality can be determined by taking the inverse of the average geodesic distance from the vertex in question to all other vertices. Values of closeness tend to span a small range and there is little chance that a change in the graph will affect closeness to a great degree.

Wednesday, September 8, 2010

Lecture 5 - Mathematics of Networks

In today's lecture we continued learning about the Mathematics of Networks. We first learnt what a path is and continued with different types of paths. Self-avoiding paths do not intersect themselves. While tracing a path we must be careful if the edges are directed. Shortest path is the shortest sequence of links connecting two nodes. We also learnt how to calculate the path length (L), average path length (N) and diameter (D) of a network. According to formulas, we can get the relation between them as 1< L<= D< N. Two known types of paths Eulerian and Hamiltonian differ in one way. Hamiltonian path is always self-avoiding because it visits each vertex exactly once. However being self-avoiding of a Eulerian path depends on the degrees of vertices.
Another metric of a network is components. In weakly connected components every node can be reached from every other node. In strongly connected components each node must be reached from every other node within the component.
Final thing that is covered is the eigenvalues and eigenvectors. We calculated the eigenvalues and eigenvectors of a matrix from the given formulas.

Thursday, September 2, 2010

Lecture 4 - Mathematics of Networks

In today's lecture we learned about the Mathematics of Networks. Networks are composed of points (vertices, nodes, sites, etc), lines (edges, links, bonds, etc), and a Domain (math, coputer science, physics, etc). We learned about the relations of nodes to each other in directed and undirected networks and how the edges can have different meaning such as weight, ranking, type, etc. Edges can also be negative and it can be confusing deciding what they mean. Talked about Adjacency matrices and how to create them from a directed or undirected graph, and went over adjacency lists, and matrix notation. Talked about finding the degree of nodes and doing matrix multiplication on Bipartite networks and how to collapse a two-mode network to a one mode-network.

Networks are equivalent to graphs and are graphed in many different ways depending on the type on network such as Planar graphs, tree graphs, and Hyper graphs.

Planar graphs are graphs that can be drawn without any of the edges crossing.
Tree graphs are undirected and contain no cycles.
Hyper graphs are graphs that join more than two nodes at a time (hyperEdge).

Wednesday, September 1, 2010

Lecture 3-Project Descriptions and Internet measurements

08.31.2010-
First part of the lecture was about the final projects which are expected to be a professional paper at the end of the semester. We learned what it should be and not be in terms of network model, analysis and theory development. Project studies should include not only a literature review or just a measurement of network characteristics, but also a well interpretation of measurement results, visualization and comparison with other networks.

In the second part of the lecture we dealt with the internet measurements. Although complex networks cover many wide areas in addition to computer networks, internet is a very characteristic example of complex networks. For a better understanding of itself and due to many social, economic and technical issues, internet topology measurements is of crucial importance. Traceroute program is a basic way of discovering internet topology. By incrementing the TTL value at each node, it tries to figure out all the intermediate nodes between a source and a destination. By probing many destinations with traceroute, interconnection of routers can be detected. However, there are many problems with traceroute, like unresponsive routers and alias resolution problem. Such problems may cause incorrect mapping of the internet topology.
Traffic measurement is also an important part of internet measurements. Finally, spread characteristics of well-known computer viruses has been demonstrated in order to explain the importance of internet monitoring and traffic measurements. We saw that within several minutes they can cover most of the vulnerable hosts on the world.