Monday, December 13, 2010

Lecture #27: Network Dynamics


This lecture was about network dynamics: briefly, the analysis of the changes that occur
in a network over time. These changes include the appearance and disappearance of nodes
and links, as well as changes in aggregate metrics of the network. In addition to the analysis
of observed changes, network dynamics deals with predictions about future changes
in the network, such as the formation of new edges and the dropping of existing ones.

The first study mentioned in class was "Empirical Analysis of an Evolving Social Network".
The paper analyzes a university e-mail network, where ties are weighted by the number of messages
exchanged between pairs within the last 60 days. One major result is that individuals sharing at
least one class are about three times more likely to start emailing each other if they also have an
email contact in common, and the probability increases significantly as the number of shared contacts or classes increases.
The next part was about team assembly mechanisms and the evolution of affiliation networks
related to social networks. Example studies in this field are the Broadway musical industry and
collaboration networks in different sciences. Interesting findings can be summarized as follows:
If your friends join, so will you.
Cliquish communities grow more slowly.

The second part of the lecture was about the consequences of the effects described in the first:
as nodes and links change over time, how do the degree distribution,
clustering coefficient and average shortest path change? Graph densification is analyzed
for physics citations, patent citations, autonomous systems and affiliation networks,
and a mathematical model is given. The main result is that the number of edges grows faster
than the number of nodes, so the average degree is increasing. Similar studies are carried out for the
change in the diameter.
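
The densification result can be summarized as the number of edges growing roughly as a power a > 1 of the number of nodes. A minimal sketch of estimating such an exponent from snapshot counts by a log-log linear fit is given below; the snapshot numbers are made up for illustration, not data from the lecture.

# A minimal sketch (with made-up snapshot counts) of estimating a densification
# exponent a from E(t) ~ N(t)^a via a log-log linear fit.
import numpy as np

# hypothetical numbers of nodes and edges at successive snapshots
nodes = np.array([1_000, 2_000, 4_000, 8_000, 16_000])
edges = np.array([3_000, 7_200, 17_500, 42_000, 101_000])

a, log_c = np.polyfit(np.log(nodes), np.log(edges), 1)
print(f"densification exponent a = {a:.2f}")     # a > 1 means edges grow faster than nodes
print("average degree per snapshot:", 2 * edges / nodes)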

In the final part, possible explanations of densification were discussed: community structure together with the difficulty of cross linking, and the forest fire model.

Tuesday, December 7, 2010

Lecture #26: Percolation and Network Resilience

This lecture’s topic was percolation and network resilience. We want to answer questions such as: if a given fraction of nodes or edges is removed from a network, how large are the connected components, and what is the average distance between nodes within them? While removing nodes or edges we should consider a few points. In bond percolation, each edge is removed with probability (1-p). A targeted attack aims to cause the most damage to the network by removing the fewest edges; usually edges with high betweenness are the targets.
The percolation threshold in an Erdos-Renyi graph is the point at which the giant component emerges.
Another fact about resilience is that scale-free networks are resilient to random failures, but they are not resilient to targeted attacks. Removal of a small number of carefully chosen nodes may cause many more nodes (their neighbors) to be disconnected from the network.
In power grid networks, each node has a load and a capacity that says how much load it can tolerate. When a node is removed from the network, its load is redistributed to the remaining nodes; if the load of a node exceeds its capacity, then that node fails as well. Hence we can conclude that network resilience depends on the topology of the network.
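
As a rough illustration of these ideas, here is a small Python sketch (networkx assumed; the Barabasi-Albert graph and the removal fractions are arbitrary choices) comparing random failures with degree-targeted attacks by the size of the giant component:

# Compare random node failures with degree-targeted attacks on a scale-free graph.
import random
import networkx as nx

def giant_component_size(G):
    return max((len(c) for c in nx.connected_components(G)), default=0)

def remove_fraction(G, fraction, targeted=False):
    H = G.copy()
    k = int(fraction * H.number_of_nodes())
    if targeted:   # highest-degree nodes first
        victims = [n for n, _ in sorted(H.degree, key=lambda nd: nd[1], reverse=True)[:k]]
    else:          # uniformly random failures
        victims = random.sample(list(H.nodes), k)
    H.remove_nodes_from(victims)
    return H

G = nx.barabasi_albert_graph(5000, 2, seed=1)
for f in (0.05, 0.20):
    print(f,
          giant_component_size(remove_fraction(G, f)),                 # random removal
          giant_component_size(remove_fraction(G, f, targeted=True)))  # targeted removal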

Colloquium: Epidemic Spreading in Social Networks by Dr Kitsak/December 2nd 2010

The colloquium topic was information diffusion in networks. Dr. Kitsak mentioned diseases that spread very fast and killed many people, both in the past and in recent years. The most important point is to understand how information (or disease) spreads based on network structure and what strategies can be applied to contain diseases quickly. There are many things that can influence the spread of information within a network. The three main factors in diffusion are network structure (degree distribution, clustering, connected components, etc.), strength of ties (frequency of communication and strength of influence), and spreading agents. A strong tie can be defined by frequent contact, affinity, or having many mutual contacts.

Dr. Kitsak also mentioned three key models for infection: SI, SIS and SIR. SIR stands for Susceptible -> Infected -> Removed/Recovered. Each node in the network can be classified by one of these terms in the SIR model. If a node is susceptible, then it can be infected. An infected node is self-explanatory. A removed node means it was infected and removed from the network, or it recovered and built a natural immunity. The SI model is simply Susceptible -> Infected: the definitions remain the same, but a node is unable to recover and therefore remains infected. The SIS model is Susceptible -> Infected -> Susceptible. One important network structure over which disease can spread is the worldwide airport network. Dr. Kitsak also discussed network immunization: one vaccination strategy is, instead of vaccinating everyone, to vaccinate a critical fraction of the population so that the community is divided into clusters and the virus cannot jump between them. He also noted that our community is a scale-free network.
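
For concreteness, here is a minimal discrete-time SIR simulation sketch on a synthetic scale-free contact network; the parameters beta and gamma, and the network itself, are illustrative assumptions rather than values from the colloquium:

# Minimal discrete-time SIR simulation on a synthetic contact network.
import random
import networkx as nx

def sir(G, beta=0.05, gamma=0.2, seed_node=0, steps=50):
    status = {n: "S" for n in G}          # Susceptible / Infected / Removed
    status[seed_node] = "I"
    history = []
    for _ in range(steps):
        new_status = dict(status)
        for n in G:
            if status[n] == "I":
                for nb in G.neighbors(n):
                    if status[nb] == "S" and random.random() < beta:
                        new_status[nb] = "I"      # infect a susceptible neighbor
                if random.random() < gamma:
                    new_status[n] = "R"           # recover / remove
        status = new_status
        history.append(sum(1 for s in status.values() if s == "I"))
    return history

G = nx.barabasi_albert_graph(2000, 3, seed=2)
print(sir(G)[:10])   # number of infected nodes over the first 10 steps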

So who are the most influential spreaders in the network? The k-shell is the most robust indicator of spreading efficiency, so if we want a disease to spread quickly we should seed it in high k-shells at different places in the world. As a conclusion, Dr. Kitsak pointed out these factors:

1. Almost no epidemic threshold in scale-free networks.
2. Efficient immunization strategy: immunize at least a critical fraction fc of nodes so that only isolated clusters of susceptible individuals remain.
3. The immunization strategy is not reciprocal to the spreading strategy.
4. Influential spreaders (not necessarily hubs) occupy the innermost k-cores (see the sketch below).
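
One possible way to locate the innermost k-core with networkx is sketched here; the graph is synthetic, so which nodes end up in the innermost core depends entirely on the data:

# Locate the innermost k-core (candidate influential spreaders) of a graph.
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3, seed=3)
core = nx.core_number(G)                      # k-shell index of every node
k_max = max(core.values())
innermost = [n for n, k in core.items() if k == k_max]
print(f"innermost core k = {k_max}, contains {len(innermost)} candidate spreaders")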

Monday, December 6, 2010

Metric computation tools

I have uploaded two tools to compute basic network metrics under Course Content. These are useful if you have very large networks that Pajek cannot process.

One of the tools is just for betweenness centrality; the other computes a set of metrics and also plots the results of some of them.

Monday, November 29, 2010

Lecture #24: Search in Structured Networks

The lecture talks about searching through structured networks. It was found through experimentation that a network with a power-law degree distribution is more searchable. If the network follows a Poisson degree distribution, it is not as searchable, because in that case all nodes have almost the same degree and the links are distributed randomly. The most effective way to search a Gnutella network is to forward the query to the highest-degree neighbor at each step.
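
A sketch of that high-degree-neighbor strategy in Python (networkx assumed; the graph, endpoints and step limit are made up for illustration):

# Greedy search that always forwards to the highest-degree unvisited neighbor.
import networkx as nx

def high_degree_search(G, source, target, max_steps=1000):
    current, visited, steps = source, {source}, 0
    while current != target and steps < max_steps:
        if target in G[current]:              # target is a direct neighbor: done
            return steps + 1
        candidates = [n for n in G[current] if n not in visited] or list(G[current])
        current = max(candidates, key=G.degree)   # follow the highest-degree neighbor
        visited.add(current)
        steps += 1
    return steps

G = nx.barabasi_albert_graph(5000, 2, seed=4)
print(high_degree_search(G, 10, 4321))        # number of steps taken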

The next topic in the lecture is how people find short paths. The strategy is usually a simple greedy algorithm wherein each participant picks a correspondent based on how close they are to the target. In research carried out to test the accuracy of the small-world experiment, it was found that participants are not very good at routing messages along shortest paths, since they use local information only. [slides 14-17]

The next topic is testing a few search strategies on social networks. To perform these tests, the email correspondence at HP Labs was monitored over 3.5 months and a network was constructed in which edges connect individuals who sent emails to each other. It was found that the degree distribution of email senders followed a power law. In the filtered network, where only participants who sent at least 6 messages each way were considered, the degree distribution followed a Poisson distribution and it took 40 steps on average to reach the desired target. In the next strategy, the geographical location of the participants' offices was considered; it was found that 87% of the 4000 email links were between participants on the same floor. When the organizational hierarchy was considered, it was found that hierarchy-based search was faster than geography-based search.

Some research was conducted on the virtual community LiveJournal. The observed degree distribution followed a log-normal distribution rather than a power law. When a simple greedy algorithm was run, it was found that 80% of the chains were completed with a slight modification to the basic greedy search. When the geographical basis of friendships was considered, the average user had approximately 2.5 non-geographic friends and approximately 5.5 friends following a 1/distance relationship. It was also found that the probability of a person knowing another person depends not on the absolute distance between the two people but on the relative distance. A social network is considered searchable if a certain fraction r of the messages reaches the target.

Hence, it can be concluded that search can be performed more effectively in a network if weak ties are also considered in the process and if sophisticated strategies are used.

Wednesday, November 24, 2010

Lecture #25: Information Diffusion

Today's topic was information diffusion in networks. There are many things that can influence the spread of information within a network. The three main factors in diffusion are network structure (degree distribution, clustering, connected components, etc.), strength of ties (frequency of communication and strength of influence), and spreading agents. A strong tie can be defined by frequent contact, affinity, or having many mutual contacts. One important study we looked at was about the strength of weak ties. Despite the name, weak ties can be very useful in spreading information: in the study, the author found that over 55 percent of participants found a job through a weak tie, where a weak tie was defined as having contact with the person less than twice a week but more than once a year.

In the lecture, we also discussed two key models for infection. These are SIR and SI. SIR stands for Susceptible -> Infected -> Removed/Recovered. Each node in the network can be classified by one of these terms in the SIR model. If a node is susceptible, then it can be infected. An infected node is self-explanatory. A removed node means it was infected and removed from the network. Or, a node can recover and build a natural immunity. The SI model is simply Susceptible -> Infected. The definitions remain the same, but a node is unable to recover and therefore remains infected.

Friday, October 29, 2010

How to give a talk

The following slides include very good points about how to give good research talks.

http://cel.archives-ouvertes.fr/cel-00529505/en/

http://www.cs.utexas.edu/users/dahlin/professional/badTalk.pdf

http://www.crhc.illinois.edu/wireless/talks/howto.ppt

http://www.cs.ucr.edu/~michalis/TECHWRITING/presentation-20.html

Lecture #17: Presentation by Anusha & Random Networks

Anusha gave her presentation on food webs. Food webs are complex networks formed by one species feeding on another. She is exploring 16 food webs to identify target species that, when they go extinct, can cause a large number of other species to go extinct or nearly extinct. Food webs are more robust to random species removal than to removal of species with many links to other species. Food webs with high connectance display high sensitivity from the beginning, and the loss of any species causes a reduction in robustness irrespective of its connections.


The professor then talked about random networks.
G(n,m) is a random graph with n nodes and m edges. How do we generate one?
Randomly create the n nodes.
Randomly select 2 of those nodes and connect them with an edge; repeat this step m times. The result may be directed or undirected, and it is purely random.
Another way is to randomly create a node and, for each node after the first, connect it to a randomly chosen existing node. We end up with n-1 edges and a directed graph, so this is not purely random.

P(G) = p^m (1-p)^(n(n-1)/2 - m)
<m> = [n(n-1)/2] p
p = <m> / [n(n-1)/2] is the probability of having an edge (no double edges).
<k> = 2<m>/n = 2[n(n-1)/2]p/n = (n-1)p
C = <k>/(n-1) = p, which goes to zero as the size of the network increases.

Random graphs lack high clustering and high-degree hubs.
Key Points: Generation Model
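
The generation procedures and the relations above can be checked numerically. The sketch below (networkx assumed, with arbitrary n and p) compares the measured mean degree and clustering coefficient with (n-1)p and p:

# Check <k> = (n-1)p and C = p on an Erdos-Renyi G(n, p) graph.
import networkx as nx

n, p = 2000, 0.005
G = nx.gnp_random_graph(n, p, seed=5)          # each possible edge kept with probability p
m = G.number_of_edges()
print("mean degree 2m/n =", 2 * m / n, " (n-1)p =", (n - 1) * p)
print("clustering C =", nx.average_clustering(G), " p =", p)

# the G(n, m) variant: place exactly m edges at random, as in the first procedure above
H = nx.gnm_random_graph(n, m, seed=5)
print("G(n,m) edges:", H.number_of_edges())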

Wednesday, October 27, 2010

Lecture 18: Small World Networks

This lecture presented an overview of issues pertaining to small world networks. In particular, the small world phenomenon, small world network models, and the formation and occurrence of small world networks were discussed. As an introduction, we were presented with Milgram's experiment, which was designed to demonstrate the small world phenomenon. Essentially, although most individuals are not directly connected with one another, each individual can be reached from another through a relatively small number of hops over the network. That is, we are all closer than we think. Milgram concluded that we are connected to any other person on Earth through roughly 6 hops; it is because of this that the term "six degrees of separation" was coined.

Another criterion for defining small world networks is the natural occurrence of clustering. As examples, a short list of social networking sites was presented, including Facebook, LinkedIn, and VisiblePath. The idea of clustering from a social networking perspective comes from people being drawn towards each other if they are relatively close to each other in the network graph: if people share more common interdependencies, they tend to cluster together. Small world networks can thus be simply characterized as having high clustering and low average shortest path.

Next, two network models were presented: the Watts-Strogatz model and the Kleinberg model. These models were made in an attempt to capture the characteristics of social networks. The Watts-Strogatz model is based on the observation that your friends' friends tend to be your friends as well: starting from a highly clustered ring lattice, a small fraction of edges is randomly rewired, which keeps clustering high while drastically shortening average path lengths.
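
A small sketch of the Watts-Strogatz construction (networkx assumed; sizes and rewiring probabilities are arbitrary) shows how a little rewiring keeps clustering high while collapsing the average path length:

# Clustering and average shortest path as a function of the rewiring probability p.
import networkx as nx

for p in (0.0, 0.01, 0.1, 1.0):
    G = nx.connected_watts_strogatz_graph(n=1000, k=10, p=p, seed=6)
    print(p,
          round(nx.average_clustering(G), 3),
          round(nx.average_shortest_path_length(G), 2))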

Lastly, a short discussion on the formation of social networks was made. The general idea stems from social theory: people each have their own set of weighted criteria used in determining friendships or an interest in pursuing a friendship. Social networks are very heterogeneous in that people are always changing their minds about what it takes for someone to be their friend. Finally, it was suggested that weak ties are actually very useful in connecting an individual with a larger set of individuals. For example, recent research suggests that those you don't know very well typically play an important role in broadening a social circle, e.g., jobs, new friends, new interests, etc.

In summary, small world networks may evolve from different constraints such as navigation, constraint optimization, and group affiliation.

Paper review

Methodology papers are posted on WebCt.

Lecture #14 Presentations by Chris Zachor and Daniel Ramos

Chris Zachor’s presentation is about network analysis of the SourceForge community. At the beginning, he gave an introduction to open source software and the SourceForge community, a website that helps promote collaboration between developers of OS projects and serves as a repository for projects, developers and users. Multiple networks can be formed from the SourceForge community; the project-developer network, the developer network and the project network are given as examples in this lecture. Then came an introduction to related work, including co-authorship between departments at UCD (University College Dublin) and the network analysis of SourceForge done by Gao and Madey. While previous studies focused on growth and why the process is a success, this study will focus on how key developers and groups play a part in creating popular software.

The title of Daniel Ramos’ presentation is The Rise of Smart Devices on Mobility in Wireless Networks. The lecture started with the definitions of wired and wireless networks and their main characteristics: the structure is fixed for a wired network, while it is dynamic and changes often for a wireless network. Wireless networks have lower bandwidth and quality of service than wired networks and are easily affected by the environment. Next came an introduction to mobile smart devices. Mobility studies are important because of quality of service, bandwidth utilization and roaming problems. Previous studies focused on simulated mobility models and real data; studies on modeling roaming in large-scale wireless networks using real measurements and on access and mobility of wireless Personal Digital Assistant (PDA) users have been done, using an evolutionary topology model and a campus waypoint model. In conclusion, studying mobility is important to network protocol evolution, highly mobile devices are already changing both computer and cellular network usage, and more actual data and realistic models are needed.

Tuesday, October 19, 2010

Lecture 15 - Presentations by Engin and Esra

This lecture started with Engin's presentation about network interaction and animation. He first explained the term geo-localization, which is mapping network objects to their physical locations. Geo-localization is important in terms of providing many new location-aware services, and it helps to manage and diagnose networks. After explaining the geo-localization term and its properties, he focused on the main goal of his project: building a network interaction and animation tool that will obtain the locations of routers and visualize them. It will be able to display properties of the network objects and generate random traffic flows, and it will allow the user to change network parameters and observe how these parameters affect network performance. He talked about related works and what is missing in them (Dimes, Internet 2, NAm). In conclusion, this work will be a kind of game of network structure that allows the user to play with the parameters of the network and see its performance. The project can also be used for training in companies, ISPs and educational institutions.

Next, Esra presented her project about gene expression networks. She explained the term stress, which is a factor that affects plant growth in the natural environment. Every organism responds differently to the different stresses it is subjected to, and plants have to adapt to changes in the environment and to other stress conditions. In general we can say that plants are affected adversely by these conditions, and as a result human life will also be affected by the stresses that plants encounter. The main goal of this project is to analyze the stress tolerance of crops (rice, wheat and maize) under drought, high salinity and low temperature, and to study the sets of genes that respond to each of these stresses separately. For every stress condition there will be an undirected graph whose vertices correspond to genes, and two genes are connected by an edge if their expressions are correlated. Once this network is constructed, she will apply the metrics of complex network theory to analyze it.

Monday, October 18, 2010

Lecture 16 - Presentations by Austin and Guoxun

In this lecture, Austin presented his project about image tagging networks. Tags are keywords that link to a resource like an image, video, web page, or blog. They are usually entered by users without using a controlled vocabulary. They help to improve search, organization, spam detection, and reputation systems. Austin will focus on Flickr, using its API to see if there is any correlation between image popularity and tags. Previous work comes from Yahoo Research and the UC Berkeley School of Information, which found that most tags were unclassified, but those that were classified contained location or "other" keywords.

Next, Guoxun presented his project on aerosol networks. Aerosols are an important subject of study because many pollutants are aerosols and they can have an effect on climate change, health, and the economy. Aerosols are an important global problem because they can travel on trade winds. Guoxun will focus on California and how ozone is transported across the state. Previous work includes a study of Clark County air quality in March 2006 by Robert A. Baxter of T&B Systems, which found that the concentration of ozone increased during commuter hours and was spread widely across the state by winds.

Thursday, October 14, 2010

Lecture #13: Presentations by Akgun and Sevincer

In this lecture two of our friends presented their projects.
Abdullah Sevincer's project is about funding networks. In a funding network, the nodes are funding agencies (such as government agencies) on one side and the funded authors and their universities on the other. Funding networks have been studied before, but such works are not sufficiently detailed. Sevincer will use the NSF as the basic funding institution because it publishes information about its awards. He will use NSF data and try to find relations between authors and funding, universities and funding, and so on.


Akgun, on the other hand, will work on a subnet-level Internet topology generator. He mentioned that subnet-level topology generation has not been addressed yet. Firstly, he explained the need for internet topology generators: network researchers need realistic internet topologies to run performance studies on their newly designed protocols, and mapping real internet topologies is difficult and time consuming, so synthetic but realistic topology generators have a lot of use in this area. Secondly, he presented a detailed related-work section covering previous implementations of router-level and AS-level internet topology generators. Finally, he talked about his design objectives: a realistic and highly configurable subnet-level topology generator will be designed.

Wednesday, October 6, 2010

Lecture #12

The lecture started with Jeff giving his related-works presentation on predicting eventual friendships. Through the use of cultural algorithms, Katz centrality, and social theory, he wants to predict who you will become friends with. He finds that one of the key problems is that current methods only account for a 1 to 2 hop distance in the network; by this account, you could only become friends with a friend of a friend. This, of course, is just a brief overview of some of what Jeff covered in class. For more information, check out his slides on the main CS 790g website.


After that, Dr. Gunes gave a lecture on graph data mining. As mentioned in class, graph pattern mining can be used for a number of things, including but not limited to mining biochemical structures, program control flow analysis, and mining XML structures or web communities. A few of the graph mining approaches mentioned in the lecture are incomplete beam search, inductive logic programming, and graph theory-based approaches (including the Apriori-based approach and the pattern-growth approach). We covered the two graph theory-based approaches in class.


Two important definitions to remember from this lecture were frequent and support. The support of a graph g is how often g occurs as a subgraph in the dataset (or larger structure) being mined. A graph (or subgraph) is considered frequent if its support in a given dataset is no less than a minimum support threshold.
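
As a toy illustration of support, the sketch below counts in how many graphs of a small made-up dataset a candidate pattern (here a triangle) occurs as a subgraph, using networkx's subgraph isomorphism matcher:

# Support of a candidate subgraph over a small dataset of graphs.
import networkx as nx
from networkx.algorithms import isomorphism

dataset = [nx.cycle_graph(4), nx.path_graph(5), nx.complete_graph(4)]
pattern = nx.complete_graph(3)                  # candidate subgraph g: a triangle

support = sum(
    isomorphism.GraphMatcher(G, pattern).subgraph_is_isomorphic()
    for G in dataset
) / len(dataset)
print("support of g:", support)                 # g is frequent if support >= min_sup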

Tuesday, October 5, 2010

Lecture 11: Large-scale structure of networks

In this lecture, various aspects and natural phenomena of large-scale networks were observed and discussed. In particular, we were interested in components within networks, the relationships among nodes, and shortest paths. One question that arose was "Why would there be a giant component in a network?" We can answer this by considering the purpose of a network: to connect to others. Networks exist to provide communication between all inhabitants residing in them. Further, we examined the possibility of a network containing two giant components. We deduced this to be essentially impossible: with randomly added edges, the probability that some edge eventually joins the two large components is far too high, so a network contains either one giant component or none.

Another discussion in this lecture was the consideration of shortest paths and the small world effect. Shortest paths are a commonly computed measurement of networks. When considering shortest paths, we must also discuss the total number of nodes within a network and the small world effect. Essentially, as the number of nodes grows within random networks, the average path length l between nodes does not undergo a dramatic change. That is, most random networks have an average path length on a logarithmic scale: l ~ log(n). It is because l does not change very much with respect to a dramatic increase in the number of nodes that we speak of the small world effect. For example, in the current state of the internet, nodes can typically reach other nodes within 10 to 12 hops. Similarly, any human on Earth should be able to reach another human within 6 acquaintances. The small world effect essentially says we are all more closely connected than we think. Further, if a network is a power-law network, we get l ~ log(log(n)).
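
A quick numerical check of l ~ log(n) on Erdos-Renyi graphs with a fixed mean degree (sizes and seed chosen arbitrarily) might look like this:

# Average shortest path of the giant component versus log(n).
import math
import networkx as nx

for n in (500, 1000, 2000, 4000):
    G = nx.gnp_random_graph(n, 10 / n, seed=7)              # mean degree about 10
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    print(n,
          round(nx.average_shortest_path_length(giant), 2),
          round(math.log(n), 2))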

Lastly, our lecture concluded with a discussion on hop distribution, i.e., how many nodes can be reached in n hops. When considering distributions, most centrality measures are similar to the power law except betweenness centrality (on the AS level). If we examine the graph, we will notice a straight, vertical line at the beginning of the plot; this line corresponds to the many nodes (ASes) that share the lowest betweenness centrality. These nodes are stub networks, such as universities, that are connected to the network but do not necessarily have anyone passing through them and thus do not serve as a shortest path for anyone; they are on the edge of the network. Further, we observe the internet to lack redundancy, as there are not many triangles and the clustering coefficient is actually less than its expected value.

Lecture 10: Power laws

Last Tuesday, we learned about the power-law distribution. To understand its character, we compared it with the normal distribution. A normal distribution has a peak and does not have a heavy tail; for example, the heights of human males follow a normal distribution with a peak around 180 cm. The power-law distribution is a straight line with negative slope on a log-log scale, and it has a heavy tail. For example, city population size follows a power-law distribution: the number of cities with a huge population is very small, but there are many small towns. The equation of the power-law distribution is:

ln(p(x)) = c - a ln(x)

where p(x) is the probability of observing an item of size x, and c and a are two constants.
Power-law distributions are everywhere: word frequency, web hits, books sold, telephone calls received and earthquake magnitude all follow power-law distributions. When analyzing a power-law distribution, the most common (and not very accurate) method is to do a linear fit over the whole range on a log-log scale to get the values of a and c. Sometimes, because of boundary conditions or noise in the heavy tail, this method causes a big error. There are several ways to solve this problem.

The first solution is logarithmic binning: bin the data into exponentially wider bins and normalize by the width of the bin. This method gives a good fit line, but with some information lost. The second solution is the cumulative distribution: the cumulative probability of a power-law distribution is also a power law, but with exponent a-1, and there is no loss of information with this method. A third way is to choose a good starting point for the fit; for example, in the distribution of citations to papers, the power law is evident only in the tail. The fourth way is maximum likelihood fitting, which is the best way, but to use this method you have to make sure that you actually have a power-law distribution.
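
For the maximum likelihood approach, the standard estimator for a continuous power law is a = 1 + n / sum(ln(x_i / x_min)). The sketch below applies it to synthetic samples; x_min and the true exponent are made-up values used only to test the estimator:

# Maximum-likelihood estimate of a continuous power-law exponent.
import numpy as np

rng = np.random.default_rng(0)
x_min, a_true = 1.0, 2.5
x = x_min * (1 - rng.random(10_000)) ** (-1 / (a_true - 1))   # power-law samples

a_hat = 1 + len(x) / np.sum(np.log(x / x_min))
print("estimated exponent:", round(a_hat, 2))                 # should be close to 2.5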

A power law looks the same no matter what scale we look at it on. We call this scale-free.

At the end of class, to understand power-law distributions further, we looked at many examples involving power laws and related distributions, such as the cumulative distribution, the power law with an exponential cutoff, Zipf's law and the Pareto principle.

Monday, September 27, 2010

Lecture 8 - Communities

We talked about communities this week. A community is a set of actors interacting with each other; if there is a set of people without interaction, we cannot say they form a community.
There are four community detection approaches: node-centric, group-centric, network-centric, and hierarchy-centric.
In node-centric community detection, each node in a group must satisfy certain properties. This method is commonly used in traditional social network analysis. One property is complete mutuality: a clique is a maximal complete subgraph of three or more nodes, all of which are adjacent to each other. Reachability of members is another criterion: any node in a group should be reachable in k hops. A k-clique is a maximal subgraph in which the largest geodesic distance between any two nodes is smaller than or equal to k, and a k-club is a substructure of diameter smaller than or equal to k. Nodal degree is a third criterion: each node should have a certain number of connections to nodes within the group; in a k-core, each node connects to at least k members within the group.
For group-centric community detection, we consider the connections within a group as a whole.
In network-centric community detection, we consider the connections of the nodes globally and partition the network into disjoint sets. While partitioning we consider criteria such as node similarity, the latent space model, block model approximation, cut minimization and modularity maximization. Node similarity is defined by how similar the nodes' interaction patterns are; two nodes are structurally equivalent if they connect to the same set of actors. For huge networks we apply clustering algorithms to separate the network into k groups.
Finally the goal in hierarchy-centric community detection is to build a hierarchical structure of communities based on network topology.
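
Two of the approaches above can be tried directly with networkx on a standard test graph (the karate club network, chosen only for illustration): communities built from overlapping 3-cliques (clique percolation, related to the node-centric clique criteria) and greedy modularity maximization (network-centric):

# Clique-percolation and modularity-based communities on a small test graph.
import networkx as nx
from networkx.algorithms.community import k_clique_communities, greedy_modularity_communities

G = nx.karate_club_graph()
print("3-clique communities:", [sorted(c) for c in k_clique_communities(G, 3)])
print("modularity communities:", [sorted(c) for c in greedy_modularity_communities(G)])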

Thursday, September 23, 2010

Lecture 9: Measures and Metrics

Transitivity is a way of inferring a relation between two nodes from existing ones. There is a rule for this inference, which can be visualized as follows: there are 3 nodes A, B and C in our network. If there is a tie from A to B and from B to C, then (under perfect transitivity) we can conclude that a tie from A to C also exists.

Clustering coefficient can be defined as
c= (# of triangles x 3)/# of connected triples
For a clique (where every node is connected to every other), c equals 1. The clustering coefficient is used to understand a network's structure; roughly, if c is greater than about 0.001 (an approximate threshold given in class), the network can be described as a dense, clustered network.
For a node i in the network, the local clustering coefficient can also be calculated using the following formula:
C(i) = (# of pairs of neighbors of i that are connected) / (# of pairs of neighbors of i)
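
Both formulas can be checked with networkx; the karate club graph below is only a convenient example:

# Global transitivity (3 x triangles / connected triples) and local clustering C(i).
import networkx as nx

G = nx.karate_club_graph()
print("global clustering (transitivity):", round(nx.transitivity(G), 3))
print("local C(i) for node 0:", round(nx.clustering(G, 0), 3))
print("C = 1 for a clique:", nx.transitivity(nx.complete_graph(5)))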

Reciprocity in directed networks concerns mutual edges between two nodes: if there is a directed edge from node A to node B, reciprocity measures how often the edge from node B back to node A also exists.

Similarity between nodes in a network is another issue we mentioned. There are 3 types of similarity:
  1. Exact equivalence (structural equivalence): nodes have exactly the same neighbors.
  2. Structural similarity: nodes have many common neighbors; it is used for measuring how structurally close nodes are, and cosine similarity is used to calculate it.
  3. Regular equivalence: nodes have different neighbors, but the structure of their neighborhoods is similar.
There are also other measures of similarity using different parameters, such as Euclidean distance and Pearson correlation.
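
A small sketch of structural (cosine) similarity computed from adjacency rows; the toy graph is made up, and in it nodes 0 and 3 share all their neighbors:

# Cosine similarity between the adjacency rows of two nodes.
import numpy as np
import networkx as nx

G = nx.Graph([(0, 1), (0, 2), (3, 1), (3, 2), (1, 2)])
A = nx.to_numpy_array(G)

def cosine_sim(i, j):
    # common neighbors divided by the geometric mean of the two degrees
    return A[i] @ A[j] / np.sqrt(A[i].sum() * A[j].sum())

print(cosine_sim(0, 3))   # 0 and 3 are not linked but share neighbors 1 and 2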

Thursday, September 16, 2010

Project ideas

You may look at recent conferences such as NetSci to get an idea of what researchers are doing in the complex networks area.

Lecture 7- Centrality continued

The lecture started with an outline of the previous lecture. Then eigenvector centrality was discussed. In this concept, if a node is connected to a more central node, then the node also assumes some importance: a vertex is given a score proportional to the sum of its neighbors' scores. Eigenvector centrality can be calculated for directed graphs as well.

Next, Katz centrality was discussed. In this concept, each vertex is given a small amount of centrality regardless of its neighbors and their centrality, in addition to the contribution coming from its neighbors. The equation for Katz centrality is

x = b (I - aA)^(-1) · 1

where I is the identity matrix, 1 is a vector of all 1's, A is the adjacency matrix, a is a scaling (attenuation) factor, and b is the extent to which we weight the centrality of the people ego is tied to.
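
The formula can be evaluated directly with a linear solve. In the sketch below the graph (the karate club network) and the constants a and b are arbitrary choices, with a kept below the reciprocal of the largest eigenvalue of A:

# Katz centrality from x = b (I - aA)^(-1) 1.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
a, b = 0.05, 1.0
x = b * np.linalg.solve(np.eye(len(A)) - a * A, np.ones(len(A)))
print("most central node:", int(np.argmax(x)))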

The next concept is PageRank. The gist is that if numerous pages on the web refer to a particular web page, then the latter is said to have high importance; but we cannot consider all referring pages to be equal. Therefore, PageRank uses the concept called "tracking a drunk": if a drunk person walks through a network, he spends time at each node proportional to the importance of that node, and he must be allowed to teleport to some other node with some probability. This is similar to a web surfer visiting various web pages, spending some time viewing each page, clicking on a link on that page, and once in a while jumping to some random web page with some probability (teleporting). Using this concept, the formulation of the PageRank algorithm is written.

The next topic discussed was hubs and authorities. Hubs are nodes that refer to good web pages, and authorities are web pages referred to by good hubs. In the Hyperlink-Induced Topic Search (HITS) algorithm, we start with a set of pages matching a particular query and expand this set by following all the links on the initial set of web pages. We construct a transition matrix E with E_ij = 1/n_i if there is a link from i to j, where n_i is the number of links from i. The authority and hub scores can then be updated as a' = E^T h and h' = E a.
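
A minimal power-iteration sketch of the a' = E^T h, h' = E a updates on a tiny made-up directed graph; here E is taken to be the plain adjacency matrix (a simplifying assumption) and the scores are normalized each round to keep them from growing:

# HITS-style hub and authority scores by power iteration.
import numpy as np
import networkx as nx

D = nx.DiGraph([(0, 2), (1, 2), (2, 3), (0, 3)])
E = nx.to_numpy_array(D)              # rows/columns follow D.nodes() order
h = np.ones(len(E))
for _ in range(50):
    a = E.T @ h; a /= np.linalg.norm(a)   # authority update a' = E^T h
    h = E @ a;  h /= np.linalg.norm(h)    # hub update h' = E a
print("authorities:", a.round(2), "hubs:", h.round(2))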

Friday, September 10, 2010

Lecture 6 - Network Centrality

In today's lecture we learned about network centrality. Centrality is a broad term that has a few different meanings based on the context in which it is used. If we know the structure of a network, then we can determine certain quantities or measures of a vertex's importance within the graph. We covered the first three of seven measures: degree centrality, betweenness centrality, and closeness centrality.

Degree centrality is the measure of how many edges are connected to a node. It is good for determining how important a node is based on its connections to other nodes. For instance, how many people you can access for information, the influence of an article in terms of citations, or the probability a computer will catch a virus based on its network connections. The degree centralization of a whole network can be determined by Freeman's general formula. It is also important to normalize the centrality so that meaningful comparisons can be made between networks of different sizes.

Betweenness centrality is the measure of how many shortest paths pass through a specific vertex. It is good for determining how important a node is in controlling the "flow" of a graph. Betweenness centrality can be determined by summing, over all pairs j and k, the number of geodesic paths connecting j and k that pass through the vertex in question (i), divided by the total number of geodesic paths between j and k. If there is more than one geodesic path between j and k, the vertices on them have to "share credit."

Closeness centrality measures how short the shortest paths from a vertex to all other vertices in the graph are. It is good for determining how "close to the center" of a graph a vertex is. Closeness centrality can be determined by taking the inverse of the average geodesic distance from the vertex in question to all other vertices. Values of closeness tend to span a small range, and there is little chance that a change in the graph will affect closeness to a great degree.
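
All three measures are available in networkx; a minimal sketch on the karate club graph (chosen arbitrarily, values shown for node 0):

# Degree, betweenness and closeness centrality with networkx.
import networkx as nx

G = nx.karate_club_graph()
print("degree:",      nx.degree_centrality(G)[0])
print("betweenness:", nx.betweenness_centrality(G, normalized=True)[0])
print("closeness:",   nx.closeness_centrality(G)[0])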

Wednesday, September 8, 2010

Lecture 5 - Mathematics of Networks

In today's lecture we continued learning about the mathematics of networks. We first learnt what a path is and continued with different types of paths. Self-avoiding paths do not intersect themselves, and while tracing a path we must be careful about whether the edges are directed. The shortest path is the shortest sequence of links connecting two nodes. We also learnt how to calculate the average path length (L), the number of nodes (N) and the diameter (D) of a network; from the formulas, the relation between them is 1 < L <= D < N. The two well-known types of paths, Eulerian and Hamiltonian, differ in one way: a Hamiltonian path is always self-avoiding because it visits each vertex exactly once, whereas whether an Eulerian path is self-avoiding depends on the degrees of the vertices.
Another property of a network is its components. In an undirected graph, every node in a component can be reached from every other node in that component. In a directed graph, a weakly connected component is connected only when edge directions are ignored, whereas in a strongly connected component each node can be reached from every other node along directed paths.
The final thing that was covered is eigenvalues and eigenvectors; we calculated the eigenvalues and eigenvectors of a matrix from the given formulas.
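
A short sketch computing these quantities (average path length, diameter, components, eigenvalues) on a small random graph; networkx and numpy are assumed, and the graph itself is arbitrary:

# Path length, diameter, components and adjacency eigenvalues of a small graph.
import numpy as np
import networkx as nx

G = nx.gnp_random_graph(50, 0.1, seed=8)
giant = G.subgraph(max(nx.connected_components(G), key=len))
print("L =", round(nx.average_shortest_path_length(giant), 2),
      "D =", nx.diameter(giant),
      "N =", G.number_of_nodes(),
      "#components =", nx.number_connected_components(G))
print("largest eigenvalue of A:",
      round(max(np.linalg.eigvals(nx.to_numpy_array(G)).real), 2))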

Thursday, September 2, 2010

Lecture 4 - Mathematics of Networks

In today's lecture we learned about the mathematics of networks. Networks are composed of points (vertices, nodes, sites, etc.), lines (edges, links, bonds, etc.), and a domain (math, computer science, physics, etc.). We learned about the relations of nodes to each other in directed and undirected networks and how edges can carry different meanings such as weight, ranking, type, etc. Edges can also be negative, and it can be confusing to decide what that means. We talked about adjacency matrices and how to create them from a directed or undirected graph, went over adjacency lists and matrix notation, and talked about finding the degree of nodes, doing matrix multiplication on bipartite networks, and how to collapse a two-mode network to a one-mode network.
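
The two-mode collapse mentioned above amounts to multiplying the incidence matrix by its transpose; a toy sketch follows, where the person-by-group matrix B is made up:

# Collapse a two-mode (bipartite) network to a one-mode person-person network.
import numpy as np

B = np.array([[1, 1, 0],      # person 0 belongs to groups 0 and 1
              [1, 0, 1],      # person 1 belongs to groups 0 and 2
              [0, 1, 1]])     # person 2 belongs to groups 1 and 2

P = B @ B.T                   # person-person projection
np.fill_diagonal(P, 0)        # the diagonal would just count each person's groups
print(P)                      # entry (i, j) = number of shared groups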

Networks are equivalent to graphs and are drawn in many different ways depending on the type of network, such as planar graphs, tree graphs, and hyper graphs.

Planar graphs are graphs that can be drawn without any of the edges crossing.
Tree graphs are undirected and contain no cycles.
Hyper graphs are graphs that join more than two nodes at a time (hyperEdge).

Wednesday, September 1, 2010

Lecture 3-Project Descriptions and Internet measurements

08.31.2010-
The first part of the lecture was about the final projects, which are expected to result in a professional paper at the end of the semester. We learned what such a project should and should not be in terms of network modeling, analysis and theory development. Project studies should include not only a literature review or just a measurement of network characteristics, but also a careful interpretation of the measurement results, visualization, and comparison with other networks.

In the second part of the lecture we dealt with internet measurements. Although complex networks cover many areas beyond computer networks, the internet is a very characteristic example of a complex network. For a better understanding of the internet itself, and because of many social, economic and technical issues, internet topology measurement is of crucial importance. The traceroute program is a basic way of discovering the internet topology: by sending probes with successively larger TTL values, it tries to figure out all the intermediate routers between a source and a destination. By probing many destinations with traceroute, the interconnections of routers can be detected. However, there are many problems with traceroute, such as unresponsive routers and the alias resolution problem; such problems may cause incorrect maps of the internet topology.
Traffic measurement is also an important part of internet measurements. Finally, the spreading characteristics of well-known computer viruses were demonstrated in order to explain the importance of internet monitoring and traffic measurements: within several minutes they can cover most of the vulnerable hosts in the world.
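
To make the traceroute idea above concrete, here is a minimal sketch using scapy (assumptions: scapy is installed and the script runs with root privileges); a real traceroute also sends multiple probes per hop and has to deal with unresponsive routers and alias resolution:

# Minimal traceroute sketch: send ICMP probes with increasing TTL values.
from scapy.all import IP, ICMP, sr1

def simple_traceroute(dst, max_hops=30):
    hops = []
    for ttl in range(1, max_hops + 1):
        reply = sr1(IP(dst=dst, ttl=ttl) / ICMP(), timeout=2, verbose=0)
        if reply is None:
            hops.append((ttl, "*"))               # unresponsive router
            continue
        hops.append((ttl, reply.src))             # address of the hop that answered
        if reply.getlayer(ICMP).type == 0:        # echo-reply: destination reached
            break
    return hops

if __name__ == "__main__":
    for ttl, addr in simple_traceroute("8.8.8.8"):
        print(ttl, addr)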

Tuesday, August 31, 2010

Lecture 2-Introduction to Complex Networks

Today's lecture focused primarily on the fundamentals and basic structure (nodes and links) of complex networks.
Nodes: a collection of entities (not just computers) which have properties that somehow relate them to each other.
Links: connections between nodes.

As a computer science student I used to think that complex networks were only about computer networks! We began by discussing the basic structure of networks (not only computer networks but also other networks that form connections within themselves). A complex network does not have to consist of computers or machines; it can consist of people, forks in rivers, proteins, web pages, organisms, etc. It is not simple and straightforward; it is made of multiple parts. Complex networks are large and sparse, and they are usually dynamic and evolving.

Given this information, it looks like graph theory, but the emphasis is on data and mechanistic explanations. From the graph theory side, a complex network's structure consists of graphs which can be undirected or directed (looking at the interaction between two neighboring nodes), weighted, connected or disconnected.

As a summary: a complex system consists of a large number of components interacting with each other, where the components and interactions may all differ from each other. In such a system, who interacts with whom can be visualized as a network.

A question may come to mind: why study such complex systems, and what benefit do we get from it?

Answer: they have lots of easily available data, and large networks may contain information about basic design principles and the history of the complex system. So by looking at one complex network we can understand other networks that form from the same structure.