Monday, December 13, 2010
Lecture #27: Network Dynamics
Tuesday, December 7, 2010
Lecture #26: Percolation and Network Resilience
The percolation threshold in an Erdos-Renyi graph is the point at which the giant component emerges.
Another fact about resilience is that scale-free networks are resilient to random failures but not to targeted attacks. Removing a small number of carefully chosen nodes may leave many more nodes (their neighbors) disconnected from the network.
In structural networks such as the power grid, each node has a load and a capacity that limits how much load it can tolerate. When a node is removed from the network, its load is redistributed to the remaining nodes; if a node's load then exceeds its capacity, that node fails as well. Hence we can conclude that network resilience depends on the topology of the network.
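As a rough illustration of the difference between random failures and targeted attacks, here is a sketch using the networkx library; the graph model, sizes, and removal fraction are arbitrary illustrative choices, not values from the lecture.

```python
import random
import networkx as nx

def giant_component_fraction(G):
    """Fraction of nodes in the largest connected component."""
    if G.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(G), key=len)
    return len(largest) / G.number_of_nodes()

def remove_nodes(G, fraction, targeted=False):
    """Remove a fraction of nodes, either at random or highest-degree first."""
    H = G.copy()
    k = int(fraction * H.number_of_nodes())
    if targeted:
        by_degree = sorted(H.degree, key=lambda kv: kv[1], reverse=True)[:k]
        victims = [node for node, _ in by_degree]
    else:
        victims = random.sample(list(H.nodes), k)
    H.remove_nodes_from(victims)
    return H

# Scale-free network (Barabasi-Albert model); parameters are illustrative.
G = nx.barabasi_albert_graph(10000, 2)
print("random failures :", giant_component_fraction(remove_nodes(G, 0.05)))
print("targeted attack :", giant_component_fraction(remove_nodes(G, 0.05, targeted=True)))
```

Typically the giant component barely shrinks under random removal but collapses much faster when the highest-degree nodes are removed first.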
Colloquium: Epidemic Spreading in Social Networks by Dr. Kitsak, December 2, 2010
Dr. Kitsak also described three key infection models: SI, SIS, and SIR. SIR stands for Susceptible -> Infected -> Removed/Recovered, and each node in the network is in one of these states. A susceptible node can be infected; an infected node is self-explanatory; a removed node was infected and has been removed from the network, or has recovered and built a natural immunity. The SI model is simply Susceptible -> Infected: the definitions stay the same, but a node cannot recover and therefore remains infected. The SIS model is Susceptible -> Infected -> Susceptible. One important network structure over which disease can spread is the worldwide airport network. Dr. Kitsak also discussed network immunization: instead of vaccinating everyone, one strategy is to vaccinate a critical fraction of the population so that the community breaks into isolated clusters and the virus cannot jump between them. He also noted that such social networks are scale-free.
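A minimal discrete-time SIR simulation on a network might look like the following sketch (assuming the networkx library; the infection and recovery probabilities are made-up illustrative values, not numbers from the talk).

```python
import random
import networkx as nx

def sir_simulation(G, beta=0.05, gamma=0.1, seeds=1, steps=100):
    """Discrete-time SIR: each step, infected nodes infect susceptible
    neighbors with probability beta and recover with probability gamma."""
    status = {v: "S" for v in G}              # S = susceptible
    for v in random.sample(list(G.nodes), seeds):
        status[v] = "I"                       # I = infected
    for _ in range(steps):
        infected = [v for v, s in status.items() if s == "I"]
        if not infected:
            break
        for v in infected:
            for u in G.neighbors(v):
                if status[u] == "S" and random.random() < beta:
                    status[u] = "I"
            if random.random() < gamma:
                status[v] = "R"               # R = removed/recovered
    return sum(1 for s in status.values() if s == "R")

G = nx.barabasi_albert_graph(5000, 3)         # scale-free contact network
print("final epidemic size:", sir_simulation(G))
```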
So who are the most influential spreaders in the network? The k-shell turns out to be the most robust indicator of spreading efficiency, so if we wanted a disease to spread quickly we would seed it in high k-shells at different places in the world. As a conclusion, Dr. Kitsak pointed out these factors:
1 - Almost no epidemic threshold in scale-free networks.
2 - Efficient immunization strategy: immunize at least a critical fraction fc of nodes so that only isolated clusters of susceptible individuals remain.
3 - The immunization strategy is not reciprocal to the spreading strategy.
4 - Influential spreaders (not necessarily hubs) occupy the innermost k-cores.
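The k-shell (core number) of every node can be computed directly; a small sketch with networkx, where the choice of graph is only a stand-in for a real contact network:

```python
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3)   # stand-in for a real contact network
core = nx.core_number(G)                # k-shell index of every node
k_max = max(core.values())

# Nodes in the innermost k-core are candidate influential spreaders,
# even though they are not necessarily the highest-degree hubs.
innermost = [v for v, k in core.items() if k == k_max]
print("innermost shell index:", k_max, "| nodes in it:", len(innermost))
```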
Monday, December 6, 2010
Metric computation tools
One of the tools computes only betweenness centrality; the other computes a set of metrics and also plots the results for some of them.
Monday, November 29, 2010
Lecture #24: Search in Structured Networks
The lecture discussed searching through structured networks. Experiments showed that a network following a power-law degree distribution is more searchable. A Poisson random graph is less searchable because all nodes have nearly the same degree and the links are distributed randomly. The most effective way to search a Gnutella network is to forward the query through the highest-degree neighbor of each node.
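A sketch of this high-degree-seeking strategy (always forwarding to the unvisited neighbor with the largest degree, assuming networkx); this illustrates the idea rather than the exact Gnutella protocol, and the graph and node IDs are arbitrary.

```python
import networkx as nx

def high_degree_search(G, source, target, max_hops=50):
    """Forward a query along the highest-degree unvisited neighbor until
    the target is a neighbor of the current node (or we give up)."""
    current, visited = source, {source}
    for hops in range(1, max_hops + 1):
        if G.has_edge(current, target):
            return hops
        candidates = [n for n in G.neighbors(current) if n not in visited]
        if not candidates:
            return None                        # dead end
        current = max(candidates, key=G.degree)
        visited.add(current)
    return None

G = nx.barabasi_albert_graph(2000, 2)          # power-law-like, hence searchable
print("hops to target:", high_degree_search(G, 0, 1500))
```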
The next topic in the lecture was how people find short paths. The usual strategy is a simple greedy algorithm in which each participant picks the correspondent who seems closest to the target. In research carried out to test the small-world hypothesis, it was found that participants are not very good at routing messages along shortest paths because they use local information only. [slides 14-17]
The next topic was testing a few search strategies on social networks. To perform these tests, email correspondence across HP Labs was monitored for 3.5 months and a network was constructed in which edges connect individuals who sent emails to each other. The degree distribution of email senders followed a power law. In the filtered network, where only participants who had sent at least 6 messages in each direction were kept, the degree distribution was closer to Poisson and search took 40 steps on average to reach the desired target. The next strategy considered the geographical location of participants' offices: 87% of the 4000 email links were between participants on the same floor. When the organizational hierarchy was considered, hierarchical search was found to be faster than geographical search.
Some research was also conducted on the virtual community LiveJournal. Its degree distribution followed a log-normal distribution rather than a power law. With a slight modification to the basic greedy search algorithm, 80% of the chains were completed. Looking at the geographical basis of friendships, the average user has approximately 2.5 non-geographic friends and approximately 5.5 friends whose likelihood follows a 1/distance relationship. It was also found that the probability of one person knowing another depends not on the absolute distance between them but on the relative distance. A social network is considered searchable if a certain fraction r of the messages reaches the target.
Hence, it can be concluded that search in a network is more effective when weak ties are also considered and when sophisticated strategies are used.
Wednesday, November 24, 2010
Lecture #25: Information Diffusion
Friday, October 29, 2010
How to give a talk
http://cel.archives-ouvertes.fr/cel-00529505/en/
http://www.cs.utexas.edu/users/dahlin/professional/badTalk.pdf
http://www.crhc.illinois.edu/wireless/talks/howto.ppt
http://www.cs.ucr.edu/~michalis/TECHWRITING/presentation-20.html
Lecture #17: Presentation by Anusha & Random Networks
The professor talked about random networks.
G(n,m) is a random graph with n nodes and m edges. How do we generate one?
Randomly create n nodes.
Randomly select two of those nodes and connect them with an edge; repeat this step until there are m edges.
This may end up directed or undirected, and it is purely random. Another way is to create nodes one at a time and connect each node after the first to a randomly chosen existing node. This ends up with n-1 edges and a directed graph, so it is not purely random.
P(G) = p^m (1 - p)^(C(n,2) - m), where C(n,2) = n(n-1)/2 is the number of possible node pairs.
The expected number of edges is m = C(n,2) p, so
p = m / C(n,2) is the probability of having an edge (no self-loops or double edges).
C = p = <k>/(n-1) is the clustering coefficient of a random graph, which is very small for large sparse graphs.
Random graphs lack the high clustering and heavy-tailed degree distributions seen in real networks.
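A quick check of these formulas, as a sketch with networkx; n and p are arbitrary illustrative values.

```python
import networkx as nx

n, p = 2000, 0.005
G = nx.erdos_renyi_graph(n, p)                     # G(n, p) random graph

print("expected edges C(n,2)*p :", p * n * (n - 1) / 2)
print("actual edges            :", G.number_of_edges())
print("clustering coefficient  :", nx.average_clustering(G))
print("p (expected clustering) :", p)
```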
Key Points: Generation Model
Wednesday, October 27, 2010
Lecture 18: Small World Networks
Lecture #14 Presentations by Chris Zachor and Daniel Ramos
The title of Daniel Ramos's lecture was The Rise of Smart Devices on Mobility in Wireless Networks. The lecture started with definitions of wired and wireless networks and their main characteristics: the structure of a wired network is fixed, while the structure of a wireless network is dynamic and changes often. Wireless networks have lower bandwidth and quality of service than wired networks, and they are easily affected by the environment. This was followed by an introduction to mobile smart devices. Mobility studies are important because of quality of service, bandwidth utilization, and roaming problems. Previous studies have focused on simulated mobility models and on real data, including work on modeling roaming in large-scale wireless networks using real measurements and on the access and mobility of wireless Personal Digital Assistant (PDA) users. The evolutionary topology model and the campus waypoint model were used in those studies. In conclusion, studying mobility is important for network protocol evolution, highly mobile devices are already changing both computer and cellular network usage, and more actual data and realistic models are needed.
Tuesday, October 19, 2010
Lecture 15 - Presentations by Engin and Esra
Next, Esra presented her project on gene expression networks. She explained the term stress, a factor in the natural environment that affects plant growth. Every organism responds differently to the stresses it is subjected to and has to adapt to changes in its environment and to other stress conditions. In general, plants are affected adversely by these conditions, and as a result human life is also affected by the stresses that plants encounter. The main goal of the project is to analyze the stress tolerance of crops (rice, wheat, and maize) under drought, high salinity, and low temperature, and to study the sets of genes that respond to each of these stresses separately. For every stress condition there will be an undirected graph whose vertices correspond to genes, with two genes connected by an edge if their expressions are correlated. Once this network is constructed she will apply the metrics of complex network theory to analyze it.
Monday, October 18, 2010
Lecture 16 - Presentations by Austin and Guoxun
Next, Guoxun presented his project on aerosol networks. Aerosols are an important subject of study because many pollutants are aerosols and they can have an effect on climate change, health, and the economy. Aerosols are a global problem because they can travel on trade winds. Guoxun will focus on California and how ozone is transported across the state. Previous work includes a study of Clark County air quality in March 2006 by Robert A. Baxter of T&B Systems, which found that ozone concentration increased during commuter hours and was spread widely across the state by winds.
Thursday, October 14, 2010
Lecture #13: Presentations by Akgun and Sevincer
Abdullah Sevincer's project is about funding networks. In a funding network the nodes are government agencies on the funding side and authors and their universities on the funded side. Funding networks have been studied before, but such work is not sufficiently detailed. Sevincer will use NSF as the primary funding institution because it publishes information about its awards. He will use NSF data to look for relations between authors and funding, universities and funding, and so on.
Akgun, on the other hand, will work on a subnet-level Internet topology generator. He mentioned that subnet-level topology generation has not been addressed yet. First, he explained the need for Internet topology generators: network researchers need realistic Internet topologies to run performance studies of their newly designed protocols, and since mapping real Internet topologies is difficult and time-consuming, synthetic but realistic topology generators have a lot of use in this area. Second, he presented a detailed related-work section covering previous router-level and AS-level Internet topology generators. Finally, he talked about his design objectives: a realistic and highly configurable subnet-level topology generator will be designed.
Wednesday, October 6, 2010
Lecture #12
The lecture started with Jeff giving his related-work presentation on predicting eventual friendships. Through the use of cultural algorithms, Katz centrality, and social theory, he wants to predict who you will become friends with. He finds that one of the key problems is that current methods only account for a 1-to-2-hop distance in the network; by that account, you could only become friends with a friend of a friend. This, of course, is just a brief overview of what Jeff covered in class. For more information, check out his slides on the main CS 790g website.
After that, Dr. Gunes gave a lecture on graph data mining. As mentioned in class, graph pattern mining can be used for a number of things, including mining biochemical structures, program control flow analysis, and mining XML structures or web communities. A few of the graph mining algorithms mentioned in the lecture are incomplete beam search, inductive logic programming, and graph-theory-based approaches (including the Apriori-based approach and the pattern-growth approach). We covered the two graph-theory-based approaches in class.
Two important definitions to remember from this lecture are frequent and support. The support of a graph g is how often g occurs as a subgraph within the graphs of a dataset. A graph (or subgraph) is considered frequent if its support in the dataset is no less than a minimum support threshold.
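Support can be illustrated with a brute-force check using subgraph isomorphism; a sketch with networkx (real graph miners such as the Apriori-based and pattern-growth approaches are far more efficient, and the toy dataset and threshold below are made up).

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def support(pattern, dataset):
    """Fraction of graphs in the dataset containing the pattern as a subgraph."""
    hits = sum(
        1 for G in dataset
        if GraphMatcher(G, pattern).subgraph_is_isomorphic()
    )
    return hits / len(dataset)

# Toy dataset of three graphs and a triangle pattern.
dataset = [nx.cycle_graph(3), nx.path_graph(4), nx.complete_graph(4)]
triangle = nx.cycle_graph(3)

min_support = 0.5
s = support(triangle, dataset)
print("support:", s, "| frequent:", s >= min_support)
```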
Tuesday, October 5, 2010
Lecture 11: Large-scale structure of networks
Another discussion in this lecture concerned shortest paths and the small-world effect. Shortest paths are among the most commonly computed network measurements, and when considering them we must also consider the total number of nodes in the network. Essentially, as the number of nodes in a random network grows, the average path length l between nodes does not change dramatically: for most random networks the average path length grows on a logarithmic scale, l ~ log(n). It is because l changes so little even when the number of nodes grows dramatically that we speak of the small-world effect. For example, in the current Internet, nodes can typically reach other nodes within 10 to 12 hops; similarly, any human on Earth should be able to reach any other human within about 6 acquaintances. The small-world effect essentially says we are all more closely connected than we think. Furthermore, if a network is a power-law network, we get l ~ log(log(n)).
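A small sketch (networkx) that illustrates the l ~ log(n) scaling for random graphs; the sizes and average degree are arbitrary choices.

```python
import math
import networkx as nx

avg_k = 6                                          # keep average degree fixed
for n in [500, 1000, 2000, 4000]:
    G = nx.erdos_renyi_graph(n, avg_k / (n - 1))
    if nx.is_connected(G):
        l = nx.average_shortest_path_length(G)
        est = math.log(n) / math.log(avg_k)        # rough log(n)/log(<k>) estimate
        print(f"n={n:5d}  l={l:.2f}  log(n)/log(<k>)={est:.2f}")
```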
Lastly, our lecture concluded with a discussion of the hop distribution, i.e., how many nodes can be reached in n hops. Most centrality measures follow a distribution similar to a power law, except betweenness centrality at the AS level. Examining that graph, we notice a straight vertical line at the beginning of the plot, corresponding to the many ASes with very low betweenness centrality. These nodes are stub networks, such as universities, that are connected to the network but do not have traffic passing through them and thus do not serve as a shortest path for anyone; they are on the edge of the network. Further, we observe that the Internet lacks redundancy, as there are not many triangles, and the clustering coefficient is actually less than its expected value.
Lecture 10: Power laws
ln p(x) = c - a ln(x)
where p(x) is the probability of observing an item of size x, and c and a are two constants.
Power-law distributions are everywhere: word frequency, web hits, books sold, telephone calls received, and earthquake magnitudes all follow power laws. When analyzing a power-law distribution, the most common but not very accurate method is to fit a straight line over the whole range on a log-log scale to get the values of a and c. Because of boundary conditions or noise in the heavy tail, this method can cause large errors. There are several ways to address this problem.
The first solution is logarithmic binning: bin the data into exponentially wider bins and normalize by the width of each bin. This gives a good fit line, but some information is lost. The second solution is the cumulative distribution: the cumulative probability of a power-law distribution is also a power law, but with exponent a - 1, and no information is lost. A third way is to choose a good starting point for the fit; for example, in the distribution of citations to papers, the power law is evident only in the tail. The fourth way is maximum likelihood fitting, which is the best method, but to use it you have to be sure that you actually have a power-law distribution.
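For continuous data above a cutoff x_min, the maximum likelihood estimate of the exponent has a simple closed form, a = 1 + n / sum(ln(x_i / x_min)) (the standard Clauset-Shalizi-Newman estimator); a sketch assuming numpy, with synthetic data generated only to sanity-check the estimator.

```python
import numpy as np

def mle_power_law_exponent(x, x_min):
    """Maximum likelihood estimate of the power-law exponent a for x >= x_min."""
    x = np.asarray([v for v in x if v >= x_min], dtype=float)
    return 1.0 + len(x) / np.sum(np.log(x / x_min))

# Synthetic power-law samples with exponent 2.5 (inverse-transform sampling).
rng = np.random.default_rng(0)
x_min, a_true = 1.0, 2.5
samples = x_min * (1 - rng.random(100000)) ** (-1 / (a_true - 1))
print("estimated exponent:", mle_power_law_exponent(samples, x_min))
```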
A power law looks the same no matter what scale we look at it on; we call this scale-free.
At the end of class, to understand power-law distributions further, we went through many examples of power-law and related distributions, such as the cumulative distribution, the power law with an exponential cutoff, Zipf's law, and the Pareto principle.
Sunday, October 3, 2010
Monday, September 27, 2010
Lecture 8 - Communities
There are four community detection approaches: node-centric, group-centric, network-centric, and hierarchy-centric.
In node-centric community detection, each node in a group must satisfy certain properties; this approach is commonly used in traditional social network analysis. One property is complete mutuality: a clique is a maximal complete subgraph of three or more nodes, all of which are adjacent to each other. Another is reachability of members: any node in a group should be reachable within k hops. A k-clique is a maximal subgraph in which the largest geodesic distance between any two nodes is at most k, and a k-club is a substructure of diameter at most k. A third property is degree: each node should have a certain number of connections to nodes within the group; in a k-core, each node connects to at least k other members of the group.
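These node-centric definitions map directly onto library routines; a sketch with networkx (the karate club graph and the value of k are arbitrary example choices).

```python
import networkx as nx

G = nx.karate_club_graph()                      # classic small social network

cliques = [c for c in nx.find_cliques(G) if len(c) >= 3]   # maximal cliques
print("maximal cliques of size >= 3:", len(cliques))

k = 3
core = nx.k_core(G, k)   # subgraph where every node has >= k neighbors inside it
print(f"{k}-core has {core.number_of_nodes()} nodes")
```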
In group-centric community detection we consider the connections within the group as a whole.
In network-centric community detection we consider the connections of nodes across the whole network and partition the network into disjoint sets. The partitioning can be based on features such as node similarity, a latent space model, block model approximation, cut minimization, or modularity maximization. Node similarity is defined by how similar two nodes' interaction patterns are; two nodes are structurally equivalent if they connect to the same set of actors. For huge networks, clustering algorithms such as k-means are applied to partition the nodes.
Finally, the goal of hierarchy-centric community detection is to build a hierarchical structure of communities based on the network topology.
Thursday, September 23, 2010
Lecture 9: Measures and Metrics
- Equal similarity: nodes have exactly the same neighbors.
- Structural similarity: nodes have many common neighbors; it measures how structurally close two nodes are and is typically computed with cosine similarity (see the sketch after this list).
- Regular equivalence: nodes do not necessarily share neighbors, but the structures of their neighborhoods are similar.
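A sketch of structural (cosine) similarity between two nodes' neighborhoods, assuming networkx; the example graph and node pair are arbitrary.

```python
import math
import networkx as nx

def cosine_similarity(G, u, v):
    """Number of common neighbors of u and v, normalized by sqrt(k_u * k_v)."""
    common = len(set(G.neighbors(u)) & set(G.neighbors(v)))
    return common / math.sqrt(G.degree(u) * G.degree(v))

G = nx.karate_club_graph()
print("cosine similarity of nodes 0 and 1:", cosine_similarity(G, 0, 1))
```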
Thursday, September 16, 2010
Project ideas
Lecture 7- Centrality continued
The lecture started with an outline of the previous lecture. Then eigenvector centrality was discussed: if a node is connected to a more central node, it gains some importance itself, so each vertex is given a score proportional to the sum of its neighbors' scores. Eigenvector centrality can also be calculated for directed graphs.
Next, Katz centrality was discussed. In this concept each vertex is given a certain baseline amount of centrality, regardless of its neighbors and their centrality. The equation for Katz centrality is
x = b (I - aA)^(-1) 1
where I is the identity matrix, 1 is the vector of all ones, a is the scaling factor that weighs the centrality of the vertices ego is tied to, and b is the baseline amount of centrality given to each vertex.
The next concept is PageRank. The gist is that if numerous pages on the web refer to a particular web page, then that page has high importance; however, we cannot treat all referring pages as equal. PageRank therefore uses the idea of tracking a drunk: if a drunk person walks randomly through the network, the time he spends at each node is proportional to the importance of that node. The drunk must also be allowed to teleport to some other node with some probability. This is like a web surfer who visits pages, spends some time viewing each one, clicks a link on the page, and occasionally jumps to some random page (teleporting). Using this idea, the PageRank formulation is written.
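A sketch of the random-surfer idea as a power iteration, assuming numpy; the damping factor 0.85 is the conventional choice and the tiny link matrix is made up, neither comes from the lecture (dangling pages are handled in a simplified way).

```python
import numpy as np

def pagerank(A, damping=0.85, iters=100):
    """Power iteration on the random-surfer transition matrix.
    A[i, j] = 1 if page i links to page j."""
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1                       # simplified dangling handling
    T = A / out_deg                                 # row-stochastic transitions
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        # with prob. damping follow a link, otherwise teleport anywhere
        r = damping * (r @ T) + (1 - damping) / n
    return r

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(pagerank(A))
```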
The next topic discussed was hubs and authorities. Hubs are pages that link to good web pages, and authorities are web pages linked to by good hubs. In the Hyperlink-Induced Topic Search (HITS) algorithm, we start with a set of pages matching a particular query and expand this set by following all the links on the initial set of web pages. We construct a transition matrix E where E_ij = 1/n_i if there is a link from i to j, with n_i the number of links out of i. The authority and hub scores can then be calculated as a' = E^T h and h' = E a.
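Those two update rules can be iterated directly; a sketch with numpy, normalizing after each step so the scores stay bounded (the small example adjacency matrix is made up).

```python
import numpy as np

def hits(E, iters=50):
    """Iterate a' = E^T h and h' = E a, normalizing each time."""
    n = E.shape[0]
    a, h = np.ones(n), np.ones(n)
    for _ in range(iters):
        a = E.T @ h
        h = E @ a
        a /= np.linalg.norm(a)
        h /= np.linalg.norm(h)
    return a, h

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
E = A / A.sum(axis=1, keepdims=True)      # E_ij = 1/n_i for each link i -> j
authority, hub = hits(E)
print("authorities:", authority, "| hubs:", hub)
```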
Friday, September 10, 2010
Lecture 6 - Network Centrality
Degree centrality is the measure of how many edges are connected to a node. It is good for determining how important a node is based on its connections to other nodes: for instance, how many people you can access for information, the influence of an article in terms of citations, or the probability a computer will catch a virus based on its network connections. Network-level degree centralization can be computed with Freeman's general formula. It is also important to normalize the centrality so that meaningful comparisons can be made between networks of different sizes.
Betweenness centrality is the measure of how many shortest paths pass through a specific vertex. It is good for determining how important a node is in controlling the "flow" of a graph. Betweenness centrality can be determined by summing, over all pairs of vertices j and k, the number of geodesic paths between j and k that pass through the vertex in question (i), divided by the total number of geodesic paths between j and k. If more than one node lies on a geodesic path between j and k, they have to "share credit."
Closeness centrality measures how short a vertex's shortest paths to all other vertices in the graph are. It is good for determining how "close to the center" of a graph a vertex is. Closeness centrality is the inverse of the average geodesic distance from the vertex in question to all other vertices. Closeness values tend to span a small range, and there is little chance that a change in the graph will affect closeness to a great degree.
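All three measures are available directly in networkx; a quick sketch on an arbitrary example graph.

```python
import networkx as nx

G = nx.karate_club_graph()

degree      = nx.degree_centrality(G)       # normalized by n - 1
betweenness = nx.betweenness_centrality(G)  # fraction of shortest paths through a node
closeness   = nx.closeness_centrality(G)    # inverse of average geodesic distance

top = lambda d: max(d, key=d.get)
print("highest degree centrality     :", top(degree))
print("highest betweenness centrality:", top(betweenness))
print("highest closeness centrality  :", top(closeness))
```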
Wednesday, September 8, 2010
Lecture 5 - Mathematics of Networks
Another property of a network is its components. In a weakly connected component every node can be reached from every other node when edge directions are ignored; in a strongly connected component every node can be reached from every other node following the edge directions.
The final topic covered was eigenvalues and eigenvectors. We calculated the eigenvalues and eigenvectors of a matrix using the given formulas.
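The same calculation can be checked numerically with numpy (the matrix below is an arbitrary example, not one from the lecture).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
values, vectors = np.linalg.eig(A)   # columns of `vectors` are the eigenvectors
print("eigenvalues :", values)
print("eigenvectors:\n", vectors)
```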
Thursday, September 2, 2010
Lecture 4 - Mathematics of Networks
Networks are equivalent to graphs and can be drawn in many different ways depending on the type of network, such as planar graphs, tree graphs, and hypergraphs.
Planar graphs are graphs that can be drawn without any of the edges crossing.
Tree graphs are undirected and contain no cycles.
Hypergraphs are graphs whose edges (hyperedges) can join more than two nodes at a time.
Wednesday, September 1, 2010
Lecture 3-Project Descriptions and Internet measurements
Tuesday, August 31, 2010
Lecture 2-Introduction to Complex Networks
nodes: a collection of entities (not necessarily computers) whose properties are somehow related to each other.
links: connections between nodes.
As a computer science student I had been thinking that complex networks focused only on computer networks! We first discussed the basic structure of networks (not only computer networks but also other networks that form connections inside themselves). So a complex network does not consist only of computers or machines; it can consist of people, forks in rivers, proteins, web pages, organisms, etc. It is not simple and straightforward; it is made of multiple parts. Complex networks are large and sparse, and they are usually dynamic and evolving.
From this information, it looks like graph theory, but the emphasis is on data and on mechanistic explanations. So, from the graph theory side, a complex network is a graph that can be undirected or directed (looking at the interaction between two neighboring nodes), weighted, and connected or disconnected.
In summary: a complex system consists of a large number of components interacting with each other, where the components and interactions may all differ from each other. Who interacts with whom in such a system can be visualized as a network.
A question may come to mind: why study such complex systems, and what benefit do we get?
Answer: they have lots of easily available data, and large networks may contain information about the basic design principles and the history of the complex system. So by studying one complex network we can understand other networks that share the same structure.