Monday, November 29, 2010

Lecture #24: Search in Structured Networks

The lecture talks about searching through structured networks.  It was found through some experimentation that if the network followed a power law it was more searchable. If the network followed a Poisson graph then the network was not as searchable because in this case all the nodes have almost the same degree and all the links are distributed randomly. The most effective way to search a Gnutella network is to search through the highest degree neighbor of the nodes.

The next topic in the lecture is about how people find shortest paths. The strategy to accomplish this is usually a simple greedy algorithm wherein each participants picks a correspondent based on how close they are to the target. In a research carried out by a few researchers to test the accuracy of small world it was found that participants are not very good in routing messages by using the shortest path method as they use local information only. [slide 14-17]

The next topic is about testing a few search strategies on social networks.  In order to perform these tests the email correspondence over labs at HP was monitored over 3.5 months and a network was constructed with nodes such that edges were constructed between individuals who sent emails to each other. It was found that the degree distribution of email senders followed power law. If we considered the filtered network wherein only the participants who sent 6 messages each way were considered, degree distribution followed Poisson distribution and it took 40 steps on average to reach the desired target. In the next strategy, the geographical location of the offices of the participants was considered. It was found that 87% of the 4000 email links were between participants of the same floor. When the organizational hierarchy was considered it was found that hierarchy search was faster than geographical search.

Some research was conducted on virtual community called LiveJournal.  When the degree distribution was observed it followed a log normal distribution rather than power law. The result of a simple greedy algorithm was observed and it was found that 80% of the chains were completed with a slight modification to the basic greedy algorithm search. When the geographical basis of friendships was considered the average user has approximately 2.5 non geographic friends and approximately 5.5 friends based on 1/distance relationship. It was also found that the probability of a person knowing another person doesn’t depend on the absolute distance between the two people but on the relative distance. It is considered that a social network is searchable if a certain fraction r of the messages reaches the target.

Hence, it can be concluded a search can be performed more effectively in a network if the weak ties were also considered in the process and if sophisticated strategies are used.

Wednesday, November 24, 2010

Lecture #25: Information Diffusion

Today's topic was information diffusion in networks. There are many things that can influence the spread of information within a network. The three main factors in diffusion are network structure (degree distribution, clustering, connected components, etc.), strength of ties (frequency of communication and strength of influence), and spreading agents. A strong tie can be defined by frequent contact, affinity, or having many mutual contacts. One important study we looked at was about the strength of weak ties. Despite the name, weak ties can be very useful in spreading information. In the study, the author found that over 55 percent of people in the study found a job through a weak tie. They defined this weak tie as having contact with the person less than two times a week, but more than once a year.

In the lecture, we also discussed two key models for infection. These are SIR and SI. SIR stands for Susceptible -> Infected -> Removed/Recovered. Each node in the network can be classified by one of these terms in the SIR model. If a node is susceptible, then it can be infected. An infected node is self-explanatory. A removed node means it was infected and removed from the network. Or, a node can recover and build a natural immunity. The SI model is simply Susceptible -> Infected. The definitions remain the same, but a node is unable to recover and therefore remains infected.