Search Engines and Social Networks

Outline

Introduction
Social networks, information networks, citation networks, actor-networks, transportation networks, biological networks.
Search Engines, Recommender Systems

Web content: words, images and hyperlinks
words, images, hyperlinks
Information Retrivel: Query and search,
Stop words, normalization, standardization
Efficient query processing, inverted index files
Bag of words model/ vector space model, Language model
Word occurrence distribution: multinomial distribution
Classification /categorization
kNN, Naive Bayes, Centroid method
*Bipartite graph/association graph
*Tripartite graph: users, items, tags

Graphs and Basic network properties
Nodes, edges, paths, graph representation, connected components
directed graphs, hyperlinks
centrality

Markov Chains and Random Walk on a graph
Random walks, transition probability, stationary distribution
Albert Einstein show the existance of molecules by random walks (Browning Motion) before microscope was invented to see them
Random walks on directed graphs, sinks and sources
PageRank: random walk on directed graph with random jumps.
Reading:
    "Random walks and electric networks (1984)" [PDF]

World Wide Web: hyperlink structure
Web is a directed graph. Diameter of Web is 19.
[In contrast, for a undirected graph, the diameter is usually 6].
Power law distributions of in-degrees.
Webpage ranking --- a key technology
co-citation, co-reference, hubs and authorites, HITS ranking
Web graph as an expected degree sequence random graph
Reading:
  • "Towards A Qualitive Search Engine: Hyperlink Vector Voting(1997)" [PDF]
  • "The diameter of the world wide web (1999)" [Link]
  • "PageRank" [Link]
  • "Graph structure in the web (1999)" [Link]
  • "The Web as a graph: measurements, models, and methods (1999)" PDF
  • "Link Analysis: Hubs and Authorities on the World Wide Web (2001)" [PDF]
  • "PageRank, HITS and a Unified Framework for Link Analysis (2002)" [PDF]

Social network strutures: Strength of Weak Ties
connection strength, depth of friendship, reciprocity, homophily
Transitivity, Transitive closure, Transitive reduction, measure of transitivity,
Strong links, weak links, local and global bridges.
Weak ties are the key links between tightly-knit communities
"Who talks to Who" experiments
Delete strong edges cause communities to shrink;
Delete weak edges cause communities to split;
Strengths between Facebook users: (1) Reciprocal, (2) one-way communication, (3) maintained relationship
Twitter: short messages, (1) All friends, (2) targeted friends
Structural holes: organization cohesion, social conformity
Fringe nodes: access to different information, innovation fusion, information gate keeper.

Positive and negative relations
Structual balance and compatibilty. Basic partition theorem.
Weaker structual balance and compatibilty. Relation to clutering.

Review of network centrality, clutering coefficents
Degree centrality, betweenness centrality, closeness centrality, eigenvector centrality.
How to compute edge betweeness

Models for social networks: Erdos-Renyi Random Graph
Degree distribution,
random variables, computing 2-chains and triangles
connectivity: chain formation, giant component, clique formation thresholding functions
Branching process: 1st neighbor, 2nd neighbor, etc.
condition for giant component
Algorithm to generate ER graphs

Models for social networks: Fixed/expected Degree Sequence Random Graph
A flexible model for networks
probabilty model: P(x_ij = 1) = d_i d_j / 2E.
compute many network quantities such as number of triangles, common neighbors, etc.
Chenoff bounds on degrees
neighborhood growth, condition for giant component
Algorithm to generate FDS random graphs
Reading:
  • "Connected components in random graphs with given expected degree sequences (2002)" [PDF]

Models for social networks: Small World graphs
Milgram's experiments.
Six degree of separation. From students to President.
Networks with small diameter.
Local connections and global connections.
Random rewiring of local connections to reduce network diameter.
Navigating the small world networks: Global optimal path vs. local information-based path.
Algorithm to generate Small World random graphs: 1D and 2D graphs

Information Cascade and Herding
Game theory: irrational behavior could be in fact rational
Private information and public decisions
Details of the experiment on red-majority vs blue-majority determination. Bayes rule.
Cascade can be wrong; Cascade can happen based on little true information; Cascade are fragile.
Sequencial decision making vs indepdent decision making. Discret decision vs continuous/probabilis decision.
Problems in discussion and decision making in meetings

Power law and Rich-get-richer Phenomena
Norma distribution vs Zipf distribution
Power law, Pareto distribution, A Website devoted to Zipf Distribution
Reading:
  • "Random texts exhibit Zipf's-law-like word frequency distribution", W. Li (1992), IEEE Transactions on Information Theory.
  • "Power laws, Pareto distributions and Zipf's law", M. E. J. Newman, Contemporary Physics 46, 323-351 (2005)).
  • "Universality of Rank-Ordering Distributions in the Arts and Sciences", PLoS ONE. 2009; 4(3): e4791 PDF

Network growth models: preferential attachment and copy model
Unpredictability of rich-get-richer effects. Effects of showing sales ranking.
Long tails: positive effects on internet economy

Diffusion and Propagation on Networks
Diffusion in networks. Game theory model.
Clusters block diffusion.
Percolation
Disease Propagation on the network: SIR and SIS Models

Finding communities on a network
Graph clustering

Recommender Systems
Content-based Filtering
Collaborative Filtering