Friday, October 29, 2010

How to give a talk

The following slide decks make very good points about how to give a good research talk.

http://cel.archives-ouvertes.fr/cel-00529505/en/

http://www.cs.utexas.edu/users/dahlin/professional/badTalk.pdf

http://www.crhc.illinois.edu/wireless/talks/howto.ppt

http://www.cs.ucr.edu/~michalis/TECHWRITING/presentation-20.html

Lecture #17: Presentation by Anusha & Random Networks

Anusha gave her presentation on food webs. Food webs are complex networks formed by one species of animal feeding on other species. She is exploring 16 species, looking for target species whose extinction would drive a large number of other species extinct or near extinction. Food webs are more robust to random species removal than to removal of species with many links to other species. Food webs with high connectance display high sensitivity from the beginning, and the loss of any species reduces robustness regardless of how well connected that species is.
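
As a rough illustration of that robustness claim (my own sketch, not from the talk), here is a comparison of random removal against targeted, highest-degree-first removal; the function name is hypothetical, and a Barabasi-Albert graph stands in for a real food web:

```python
import random
import networkx as nx

def giant_fraction_after_removal(g, frac, targeted):
    """Remove a fraction of nodes (random or highest-degree-first)
    and report the fraction of survivors in the giant component."""
    g = g.copy()
    k = int(frac * g.number_of_nodes())
    if targeted:
        victims = [n for n, _ in sorted(g.degree, key=lambda kv: -kv[1])[:k]]
    else:
        victims = random.sample(list(g.nodes), k)
    g.remove_nodes_from(victims)
    return len(max(nx.connected_components(g), key=len)) / g.number_of_nodes()

web = nx.barabasi_albert_graph(500, 2, seed=3)   # toy heavy-tailed graph
print(giant_fraction_after_removal(web, 0.2, targeted=False))  # stays high
print(giant_fraction_after_removal(web, 0.2, targeted=True))   # noticeably smaller
```

Targeted removal fragments the heavy-tailed graph much faster, mirroring the food-web result.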


The professor then talked about random networks.
G(n,m) is a random graph with n nodes and m edges. How do we generate one?
Create n nodes.
Randomly select two of those nodes and connect them with an edge. Repeat this step until there are m edges.
This may end up directed or undirected, so it is purely random. Another way is to create the nodes one at a time and, for each node after the first, connect it to a randomly chosen existing node. That ends with n-1 edges and a directed graph, so it is not purely random.
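
For concreteness, here is a minimal sketch of the first, purely random procedure (drawing distinct pairs so no double edges occur); networkx also ships this as nx.gnm_random_graph:

```python
import itertools
import random

def gnm_random_graph(n, m):
    """G(n, m): choose m distinct undirected edges uniformly at random
    from all n*(n-1)/2 possible pairs of the n nodes."""
    possible_edges = list(itertools.combinations(range(n), 2))
    return random.sample(possible_edges, m)

print(gnm_random_graph(5, 4))   # e.g. [(0, 3), (1, 4), (0, 1), (2, 3)]
```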

P(G) = p^m (1-p)^(C(n,2) - m), where C(n,2) = n(n-1)/2 is the number of possible edges
⟨m⟩ = C(n,2) p
p = ⟨m⟩ / C(n,2) is the probability of having an edge. Includes self-loops, but not double edges.
⟨k⟩ = 2⟨m⟩/n = 2 C(n,2) p / n = (n-1) p is the mean degree
C = ⟨k⟩/(n-1) = p, which goes to zero as the size of the network increases.
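
As a quick numerical check of C = p (a sketch, not from the lecture):

```python
import networkx as nx

# In G(n, p) the expected clustering coefficient equals p = <k>/(n-1),
# so at fixed mean degree it vanishes as the network grows.
g = nx.gnp_random_graph(2000, p=0.005, seed=1)
print(nx.average_clustering(g))   # should land near 0.005
```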

Random graphs lack the high clustering and heterogeneous degree distributions seen in real networks.
Key Points: Generation Model

Wednesday, October 27, 2010

Lecture 18: Small World Networks

This lecture presented an overview of issues pertaining to small world networks. In particular, small world phenomenon, small world network models, and the formation and occurrences of small world networks were discussed. As an introduction, we were presented with Milgram's experiment which was designed to describe the small world phenomenon. Essentially, although most individuals are not directly connected with one another, each individual can be reached from another through a relatively small number of hops over a network. That is, we are all closer than we think. It was concluded by Milgram that we are connected to any other person on Earth through roughly 6 hops. It is because of this that the term "6 degrees of separation" was coined.

Another criterion for defining small-world networks is the natural occurrence of clustering. As an example, a short list of social networking sites was presented, including Facebook, LinkedIn, and VisiblePath. The idea of clustering from a social-networking perspective comes from people being drawn towards each other when they are relatively close to each other in the network graph. That is, people who share more common interdependencies tend to cluster together. Small-world networks can thus be simply characterized as having high clustering and low average shortest path.

Next, two social network models were presented: the Watts-Strogatz model and the Kleinberg model. These models were made in an attempt to address the characterization of social networks. The Watts-Strogatz model is based on the observation that your friends' friends tend to be your friends as well.
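
A minimal sketch of the Watts-Strogatz construction (networkx provides nx.watts_strogatz_graph; this hand-rolled version is just illustrative):

```python
import random

def watts_strogatz(n, k, p):
    """Ring lattice where each node links to its k nearest neighbors,
    then each edge is rewired with probability p."""
    graph = set()
    for i in range(n):
        for j in range(1, k // 2 + 1):
            graph.add((i, (i + j) % n))
    for (u, v) in list(graph):
        if random.random() < p:
            w = random.randrange(n)
            while w == u or (u, w) in graph or (w, u) in graph:
                w = random.randrange(n)   # avoid self-loops and duplicates
            graph.discard((u, v))
            graph.add((u, w))
    return graph

edges = watts_strogatz(20, 4, 0.1)
```

Even a small rewiring probability creates shortcuts that collapse the average path length while the lattice keeps its high clustering.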

Lastly, a short discussion on the formation of social networks was given. The general idea stems from social theory: each person has their own set of weighted criteria used in determining friendships or an interest in pursuing a friendship. Social networks are very heterogeneous in that people are always changing their minds about what it takes for someone to be their friend. It was also suggested that weak ties are actually very useful in connecting an individual with a larger set of individuals. For example, recent research suggests that those you don't know very well typically play an important role in broadening a social circle, e.g., jobs, new friends, new interests, etc.

In summary, small world networks may evolve from different constraints such as navigation, constraint optimization, and group affiliation.

Paper review

Methodology papers are posted on WebCT.

Lecture #14 Presentations by Chris Zachor and Daniel Ramos

Chris Zachor's lecture was about network analysis of the SourceForge community. He began with an introduction to open source software and the SourceForge community, a website that promotes collaboration between developers of OS projects and serves as a repository for projects, developers, and users. Multiple networks can be formed from the SourceForge community; the project-developer network, the developer network, and the project network were given as examples in this lecture. He then reviewed related work, including co-authorship between departments at UCD (University College Dublin) and the network analysis of SourceForge done by Gao and Madey. While previous studies focused on growth and why the process is a success, this study will focus on how key developers and groups play a part in creating popular software.

The title of Daniel Ramos's lecture was The Rise of Smart Devices on Mobility in Wireless Networks. The lecture started with the definitions of wired and wireless networks and the main characteristics of each: the structure of a wired network is fixed, while the structure of a wireless network is dynamic and changes often. Wireless networks have lower bandwidth and quality of service than wired networks and are easily affected by the environment. This was followed by an introduction to mobile smart devices. Mobility studies are important because of quality of service, bandwidth utilization, and roaming problems. Previous studies focused on simulated mobility models and real data, including work on modeling roaming in large-scale wireless networks using real measurements and on the access and mobility of wireless Personal Digital Assistant (PDA) users; these studies used an evolutionary topology model and a campus waypoint model. In conclusion, studying mobility is important to network protocol evolution, highly mobile devices are already changing both computer and cellular network usage, and more actual data and realistic models are needed.

Tuesday, October 19, 2010

Lecture 15 - Presentations by Engin and Esra

This lecture started with Engin's presentation on network interaction and animation. He first explained the term geo-localization, which is mapping network objects to their physical locations. Geo-localization is important for providing many new location-aware services, and it helps to manage and diagnose networks. After explaining geo-localization and its properties, he focused on the main goal of his project: building a network interaction and animation tool that will obtain the locations of routers and visualize them. The tool will be able to display properties of the network objects and generate random traffic flows. It will allow the user to change network parameters and observe how these parameters affect network performance. He talked about related work and what is missing from it (DIMES, Internet2, NAM). In conclusion, this work will be a game-like view of network structure that allows the user to play with network parameters and see the resulting performance. The project can also be used for training in companies, ISPs, and educational institutions.

Next, Esra presented her project on gene expression networks. She explained the term stress, a factor that affects plant growth in the natural environment. Every organism responds differently to the different stresses it is subjected to, and organisms have to adapt to changes in the environment and to other stress conditions. In general, plants are affected adversely by these conditions, and as a result human life will also be affected by the stresses that plants encounter. The main goal of this project is to analyze the stress tolerance of three crops (rice, wheat, and maize) under drought, high salinity, and low temperature, and to study the sets of genes that respond to each of these stresses separately. For every stress condition there will be an undirected graph whose vertices correspond to genes, with two genes connected by an edge if their expressions are correlated. Once this network is constructed, she will apply the metrics of complex network theory to analyze it.
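
A minimal sketch of that construction (my own; the Pearson correlation measure and the threshold value are assumptions, since the talk only says "correlated"):

```python
import numpy as np

def coexpression_network(expr, threshold=0.8):
    """Connect two genes when the absolute Pearson correlation of
    their expression profiles (rows of expr) exceeds the threshold."""
    corr = np.corrcoef(expr)               # gene-by-gene correlation matrix
    n = expr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold]

expr = np.random.rand(5, 20)               # toy data: 5 genes, 20 conditions
print(coexpression_network(expr, threshold=0.6))
```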

Monday, October 18, 2010

Lecture 16 - Presentations by Austin and Guoxun

In this lecture, Austin presented his project about image tagging networks. Tags are keywords that link to a resource like an image, video, web page, or blog. They are usually entered by users without a controlled vocabulary. They help to improve search, organization, spam detection, and reputation systems. Austin will focus on Flickr, using its API to see if there is any correlation between image popularity and tags. Previous work comes from Yahoo Research and the UC Berkeley School of Information, who found that most tags were unclassified, but those that were classified contained location or "other" keywords.

Next, Guoxun presented his project on aerosol networks. Aerosols are important to study because many pollutants are aerosols, and they can have an effect on climate change, health, and the economy. Aerosols are a global problem because they can travel on trade winds. Guoxun will focus on California and how ozone is transported across the state. Previous work consists of a study of Clark County air quality in March of 2006 by Robert A. Baxter of T&B Systems. The concentration of ozone increased during commuter hours and was spread widely across the state by winds.

Thursday, October 14, 2010

Lecture #13: Presentations by Akgun and Sevincer

In this lecture, two of our friends presented their projects.
Abdullah Sevincer's project is about funding networks. In a funding network, the nodes are government agencies on the funding side and authors and their universities on the funded side. Funding networks have been studied before, but such work is not sufficiently detailed. Sevincer will use the NSF as his basic funding institution because it publishes information about its awards. He will use NSF data to look for relations between authors and funding, universities and funding, and so on.


Akgun, on the other hand, will work on a subnet-level internet topology generator. He mentioned that a subnet-level topology generator has not been built yet. Firstly, he explained the need for internet topology generators: network researchers need realistic internet topologies to run performance studies on their newly designed protocols, and since mapping real internet topologies is difficult and time consuming, synthetic but realistic topology generators have a lot of use in this area. Secondly, he presented a detailed related-work section covering previous implementations of router-level and AS-level internet topology generators. Finally, he talked about his design objectives: a realistic and highly configurable subnet-level topology generator.

Wednesday, October 6, 2010

Lecture #12

The lecture started with Jeff giving his related-work presentation on predicting eventual friendships. Through the use of cultural algorithms, Katz centrality, and social theory, he wants to predict who you will become friends with. He finds that one of the key problems is that current methods only account for a 1-to-2-hop distance in the network; by this account, you could only become friends with a friend of a friend. This, of course, is just a brief overview of some of what Jeff covered in class. For more information, check out his slides on the main CS 790g website.
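
For reference, Katz centrality scores nodes by counting walks of every length between them, damping each extra hop by a factor alpha. A quick sketch using networkx (the example graph and alpha value are just placeholders):

```python
import networkx as nx

g = nx.karate_club_graph()                 # stand-in social network
# alpha must stay below 1/lambda_max of the adjacency matrix to converge
scores = nx.katz_centrality(g, alpha=0.05)
print(max(scores, key=scores.get))         # most central node
```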


After that, Dr. Gunes gave a lecture on graph data mining. As mentioned in class, graph pattern mining can be used to do a number of things, including mining biochemical structures, program control flow analysis, and mining XML structures or web communities, to name a few. A few of the graph mining algorithms mentioned in the lecture are incomplete beam search, inductive logic programming, and graph-theory-based approaches (the Apriori-based approach and the pattern-growth approach). We covered the two graph-theory-based approaches in class.


Two important definitions to remember from this lecture were frequent and support. The support of a graph g is how often g occurs as a subgraph within the graphs of a dataset. A graph (or subgraph) is considered frequent if its support in a given dataset is no less than a minimum support threshold.
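
A sketch of computing support directly with subgraph-isomorphism tests (fine for small graphs; real miners like the Apriori-based and pattern-growth approaches avoid this brute force):

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def support(pattern, dataset):
    """Fraction of dataset graphs containing the pattern as a subgraph."""
    hits = sum(1 for g in dataset
               if GraphMatcher(g, pattern).subgraph_is_isomorphic())
    return hits / len(dataset)

dataset = [nx.path_graph(4), nx.cycle_graph(5), nx.complete_graph(3)]
triangle = nx.cycle_graph(3)
print(support(triangle, dataset))   # 0.33...; frequent if >= min threshold
```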

Tuesday, October 5, 2010

Lecture 11: Large-scale structure of networks

In this lecture, various aspects and natural phenomena of large-scale networks were observed and discussed. In particular, we were interested in components within networks, their relationships among nodes, and shortest paths. One question that arose was, "Why would there be a giant component in a network?" We can answer this by considering the purpose of a network: to connect to others. Networks exist to provide communication among all the inhabitants residing in them. Further, we examined the possibility of a network containing two giant components. We deduced this to be practically impossible: as edges are added at random, the probability that some edge eventually joins the two components is far too high, so a network contains either one giant component or none.
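
This is easy to see numerically. A quick sketch (my own, not from the lecture) of how a single giant component emerges in G(n, p) once the mean degree passes 1:

```python
import networkx as nx

n = 10000
for c in [0.5, 1.0, 1.5, 2.0]:               # mean degree c = p * (n - 1)
    g = nx.gnp_random_graph(n, c / n, seed=7)
    giant = max(nx.connected_components(g), key=len)
    print(c, len(giant) / n)                 # fraction of nodes in the giant
```

Below c = 1 the largest component holds a vanishing fraction of the network; above it, exactly one component takes over.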

Another discussion in this lecture was the consideration of shortest paths and the small-world effect. The shortest path is a commonly computed measurement of networks. When considering shortest paths, we must also discuss the total number of nodes within a network and the small-world effect. Essentially, as the number of nodes grows within a random network, the average path length l between nodes does not undergo a dramatic change. That is, most random networks have their average path length on a logarithmic scale: l ~ log(n). It is because l changes so little under a dramatic increase in the number of nodes that we speak of the small-world effect. For example, in the current state of the internet, nodes can typically reach other nodes within 10 to 12 hops. Similarly, any human on Earth should be able to reach another human within 6 acquaintances. The small-world effect essentially says we are all more closely connected than we think. Further, if a network is a power-law network, we get l ~ log(log(n)).
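
A quick numeric sketch of the l ~ log(n) claim (my own, using networkx):

```python
import networkx as nx

for n in [200, 800, 3200]:
    g = nx.gnp_random_graph(n, 8 / n, seed=42)    # mean degree ~ 8
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    print(n, round(nx.average_shortest_path_length(giant), 2))
```

Quadrupling the network size adds only a small constant to the average path length, just as the logarithm predicts.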

Lastly, our lecture concluded with a discussion on hop distribution, i.e., how many nodes can be reached in n hops. When considering distributions, most centrality measures are similar to the power law except betweenness centrality (on the AS level). If we examine the graph, we will notice a straight, vertical line at the beginning of the plot, indicating all the nodes (ASes) with the lowest betweenness centrality. These nodes are stub networks, such as universities, that are connected to the network but do not necessarily have anyone passing through them and thus do not serve as a shortest path for anyone; they are on the edge of the network. Further, we observe that the internet lacks redundancy, as there are not many triangles, and the clustering coefficient is actually less than its expected value.
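
The stub-network observation is easy to reproduce: degree-one nodes lie on no shortest path between other nodes, so their betweenness is exactly zero. A toy sketch:

```python
import networkx as nx

g = nx.star_graph(10)                  # hub node 0 with 10 leaf "stubs"
bc = nx.betweenness_centrality(g)
print(bc[0], bc[1])                    # hub: 1.0, leaf: 0.0
```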

Lecture 10: Power laws

Last Tuesday, we learned about the power-law distribution. To understand its character, we compared it with the normal distribution. A normal distribution has a peak and does not have a heavy tail; for example, the heights of human males follow a normal distribution with a peak around 180 cm. A power-law distribution is a straight line with negative slope on a log-log scale, and it has a heavy tail. For example, city population size follows a power-law distribution: the number of cities with huge populations is very small, but there are many small towns. The equation of the power-law distribution is:

ln p(x) = c - a ln(x)

where p(x) is the probability of observing an item of size x, and c and a are two constants.
Power-law distributions are everywhere. Word frequency, web hits, books sold, telephone calls received, and earthquake magnitude all follow power-law distributions. When analyzing a power-law distribution, the most common (and not very accurate) method is to do a linear fit over the whole range on a log-log scale to get the values of a and c. Sometimes, because of boundary conditions or noise in the heavy tail, this method causes a big error. There are several ways to solve this problem.

The first solution is logarithmic binning: bin the data into exponentially wider bins and normalize by the width of each bin. This method gives a good fit line, but information is lost. The second solution is the cumulative distribution: the cumulative probability of a power-law distribution is also a power law, but with exponent a-1, and there is no loss of information with this method. A third way is to choose a good starting point for the fit; for example, in the distribution of citations to papers, the power law is evident only in the tail. The fourth way is maximum likelihood fitting, which is the best way, but to use it you have to make sure that you actually have a power-law distribution.
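
For the maximum-likelihood route, the standard estimator for a continuous power law p(x) ~ x^(-a) fitted to the tail x >= x_min is a = 1 + n / sum(ln(x_i / x_min)). A minimal sketch:

```python
import math

def ml_powerlaw_exponent(data, xmin):
    """MLE for the exponent of a continuous power law p(x) ~ x^(-a),
    fitted to the tail x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)

sample = [1.2, 1.5, 2.0, 3.1, 4.8, 7.5, 12.0, 30.0]   # toy data
print(ml_powerlaw_exponent(sample, xmin=1.0))
```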

A power law looks the same no matter what scale we look at it on: rescaling x to bx gives p(bx) = (bx)^(-a) = b^(-a) p(x), the original distribution up to a constant. We call this scale-free.

At the end of class, to understand the power-law distribution further, we looked at many examples involving power laws and related distributions, such as the cumulative distribution, the power law with an exponential cutoff, Zipf's law, and the Pareto principle.