Big Data and Social Analytics
(Presented at Short Course on Humanitarian Engineering in Asunción, Paraguay)
More information about the short course is available at: http://www.ing.una.py/?p=20074#.V7R2vOTH5KA.twitter
Model-based Fraud Detection in Growing Networks
(Presented at the IEEE Conference on Decision and Control in Los Angeles, CA)
People share opinions, exchange information, and trade services on large, interconnected platforms. Because of their size, these platforms are common targets for fraudsters who try to deceive randomly selected users. To monitor such behavior, the proposed algorithm evaluates anomalies in the network structure that results from local interactions between users. In particular, the algorithm evaluates the degree of membership in well-defined communities of users and the formation of close-knit groups in their neighborhoods. We identify a set of suspects using a first-order approximation of the evolution of the eigenpairs associated with the network. Within the set of suspects, we locate fraudsters based on deviations from the expected local clustering coefficients. Simulations illustrate how incorporating structural properties (their asymptotic behavior) into the design of the algorithm allows us to differentiate between the aggregate dynamics of fraudsters and regular users.
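The second stage described above (locating fraudsters through deviations in local clustering) can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the "expected" coefficient is assumed here to be the network-wide mean, and `build_adjacency`, `flag_low_clustering`, the threshold `z_threshold`, and the toy graph are all hypothetical. The intuition is that a node linked to randomly selected users has neighbors that rarely know each other, so its clustering coefficient is unusually low.

```python
from itertools import combinations
from statistics import mean, pstdev


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def flag_low_clustering(adj, suspects, z_threshold=1.5):
    """Hypothetical rule: flag suspects far below the mean clustering."""
    coeffs = {n: local_clustering(adj, n) for n in adj}
    mu = mean(coeffs.values())
    sigma = pstdev(coeffs.values())
    if sigma == 0:
        return []
    return [n for n in suspects if (mu - coeffs[n]) / sigma > z_threshold]


# Toy network: two close-knit groups of four users each, plus a node "f"
# that links to one user in each group (an assumed fraudster pattern).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         ("f", 0), ("f", 5)]
adj = build_adjacency(edges)
flagged = flag_low_clustering(adj, suspects=list(adj))
print(flagged)
```

In this sketch every node is treated as a suspect; in the approach described above, the suspect set would instead come from the eigenpair-based first stage.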
Crime hotspots in Chicago
The left plot below illustrates the total number of crime incidents that occurred in the city of Chicago on January 1st, 2013. The right plot shows the locations of police cameras.
The video below illustrates the areas in Chicago that had a high crime rate during the first half of 2013. Each time step captures the distribution of all types of crimes within the past 7 days. Visualizing the evolution of crime shows that hotspots are concentrated in specific areas (left plot). The right plot represents the hotspots generated by the locations of the police cameras. Hotspots are created using the optimal bandwidth for standard normal data.
Arrest patterns are shown in the video below. The left plot illustrates the distribution of crimes that led to an arrest; the right plot again shows the locations of police cameras.
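As a rough illustration of the hotspot computation, the Python sketch below assumes that "optimal bandwidth for standard normal data" refers to the normal-reference rule for a two-dimensional Gaussian kernel density estimate, where each axis uses h = sigma * n^(-1/6). That reading, the function names, and the coordinates are assumptions; the original notebooks may differ.

```python
import math
from statistics import stdev


def normal_reference_bandwidths(points):
    """Per-axis bandwidths h = sigma * n**(-1/6) (normal-reference rule, 2-D)."""
    n = len(points)
    factor = n ** (-1.0 / 6.0)
    hx = stdev(p[0] for p in points) * factor
    hy = stdev(p[1] for p in points) * factor
    return hx, hy


def kde_2d(points, query, bandwidths):
    """Gaussian product-kernel density estimate at `query`."""
    hx, hy = bandwidths
    qx, qy = query
    total = 0.0
    for x, y in points:
        u = (qx - x) / hx
        v = (qy - y) / hy
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(points) * 2.0 * math.pi * hx * hy)


# Hypothetical (latitude, longitude) pairs: four incidents close together
# and one isolated incident.
incidents = [(41.880, -87.630), (41.881, -87.631), (41.879, -87.629),
             (41.882, -87.632), (41.950, -87.700)]
h = normal_reference_bandwidths(incidents)
near = kde_2d(incidents, (41.880, -87.630), h)  # inside the cluster
far = kde_2d(incidents, (41.700, -87.500), h)   # away from all incidents
print(near > far)
```

To mimic the 7-day time steps, the same functions could be applied to the subset of incidents that fall inside each trailing 7-day window.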
A network of the types of crimes in San Francisco, CA
Relationships between different types of crimes can be visualized as a network, with a node representing each type of crime (e.g., bribery, stolen property, or drug-dealing) and a link quantifying the co-occurrence of two types of criminal activities in the same neighborhood (within a time window). This notebook illustrates the relationships between various types of crimes in San Francisco, CA (data available at https://data.sfgov.org). Two types of crimes are related if most instances associated with both types happened in the same neighborhood within the same day.
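The linking rule can be sketched in Python under one possible reading: two crime types are linked when more than half of their combined instances fall in (neighborhood, day) cells where both types occur. The function name `cooccurrence_links`, the `min_fraction` parameter, the neighborhood labels, and the records are hypothetical, and the notebook may define "most instances" differently.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_links(records, min_fraction=0.5):
    """Return {(type_a, type_b): fraction} for pairs of related crime types.

    `records` is an iterable of (crime_type, neighborhood, day) tuples.
    """
    # counts[crime_type][(neighborhood, day)] = number of instances
    counts = defaultdict(Counter)
    for crime_type, neighborhood, day in records:
        counts[crime_type][(neighborhood, day)] += 1

    links = {}
    for a, b in combinations(sorted(counts), 2):
        shared = counts[a].keys() & counts[b].keys()
        in_shared = sum(counts[a][c] + counts[b][c] for c in shared)
        total = sum(counts[a].values()) + sum(counts[b].values())
        fraction = in_shared / total
        if fraction > min_fraction:
            links[(a, b)] = fraction
    return links


# Hypothetical records; neighborhood labels are placeholders.
records = [
    ("bribery", "N1", 1),
    ("drug-dealing", "N1", 1),
    ("drug-dealing", "N1", 1),
    ("drug-dealing", "N2", 2),
    ("stolen property", "N2", 3),
]
links = cooccurrence_links(records)
print(links)  # {('bribery', 'drug-dealing'): 0.75}
```

The returned fraction can serve as the link weight when drawing the network.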
The video below highlights the types of crime with the two highest eigenvector centralities, which correspond to nodes that are connected to many other well-connected nodes.
The video below highlights the types of crime with the two highest eccentricity centralities, which correspond to the nodes whose maximum distance to any other node is short.
The video below highlights the types of crime with the two highest degree centralities, which correspond to the nodes that have high vertex degrees.
The video below highlights the types of crime with the two highest betweenness centralities, which correspond to the nodes that lie on many shortest paths between other pairs of nodes.
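The four centrality measures above can be computed on a small undirected, connected graph with the plain-Python sketch below. Some choices are assumptions rather than the notebook's definitions: eccentricity centrality is taken as the reciprocal of a node's eccentricity, eigenvector centrality uses power iteration on A + I, betweenness uses Brandes' algorithm for unweighted graphs, and the node labels are placeholders.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    return {n: len(adj[n]) for n in adj}


def eccentricity_centrality(adj):
    """Assumed definition: 1 / (largest distance to any other node).
    Requires a connected graph with at least two nodes."""
    return {n: 1.0 / max(bfs_distances(adj, n).values()) for n in adj}


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I (same eigenvectors as A, no oscillation)."""
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = max(nxt.values())
        x = {n: value / norm for n, value in nxt.items()}
    return x


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        order, preds = [], {n: [] for n in adj}
        sigma = {n: 0 for n in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {n: value / 2.0 for n, value in score.items()}


# Placeholder graph: a - b - c, with d and e both attached to c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "e"},
       "d": {"c"}, "e": {"c"}}
deg = degree_centrality(adj)
ecc = eccentricity_centrality(adj)
eig = eigenvector_centrality(adj)
btw = betweenness_centrality(adj)
print(max(deg, key=deg.get), max(eig, key=eig.get), max(btw, key=btw.get))
```

In this toy graph, node "c" ranks first on degree, eigenvector, and betweenness centrality, while "b" and "c" tie on eccentricity centrality.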
Next, the plots below show the degree distribution (for different days) during a 90-day period. Simulations are based on uniform neighborhoods with radii of 0.004 degrees (centered around each instance).
To download this notebook click here. Similar plots of the degree distribution for the City of Chicago (for a 10-day period) are shown below. The notebook can be downloaded here.
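A degree distribution like the ones plotted above can be computed from an adjacency map as in the short Python sketch below. The helper `within_radius` shows one assumed reading of a "uniform neighborhood": two instances share a neighborhood when their Euclidean distance in latitude/longitude degrees is at most 0.004. Both function names and the toy graph are hypothetical.

```python
import math
from collections import Counter


def within_radius(p, q, radius=0.004):
    """Assumed neighborhood test on (latitude, longitude) pairs, in degrees."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have degree k."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


# Placeholder graph: a - b - c, with d and e both attached to c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "e"},
       "d": {"c"}, "e": {"c"}}
dist = degree_distribution(adj)
print(dist)  # {1: 0.6, 2: 0.2, 3: 0.2}
```

Computing one such distribution per day, over the network built for that day, would yield the kind of series shown in the plots.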
On the Formation of Community Structures
(Presented at the American Control Conference in Montréal, Canada)
Many real-world networks consist of numerous interconnected groups which, as communities, display distinctive collective behavior. The division of a network into communities – groups of nodes with a high density of ties within but a low density of ties between groups – underlies the structure of social and technological networks. In human communities, for instance, individuals may group together according to special interest, occupation, intent, or belief, with a tendency to establish stronger ties with individuals who are similar to themselves. Here, we introduce a formal framework for the formation of community structures from homophilic relationships between individuals. Stochastic modeling of local relationships allows us to identify a wide class of agent interactions that lead to the formation of communities and to quantify the extent to which group size affects the resulting structure.
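The link between homophily and community structure can be illustrated with a minimal Python sketch. It uses a planted-partition (stochastic block) model as a stand-in, not the framework introduced in the paper: ties form independently with probability `p_same` between similar individuals and `p_diff` otherwise. All names, probabilities, and group sizes are assumptions chosen for illustration.

```python
import random


def homophilic_network(groups, p_same=0.6, p_diff=0.05, seed=0):
    """Sample undirected ties; similar individuals connect more often."""
    rng = random.Random(seed)
    n = len(groups)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_same if groups[i] == groups[j] else p_diff
            if rng.random() < p:
                edges.add((i, j))
    return edges


def tie_densities(groups, edges):
    """Return (density of ties within groups, density of ties between groups)."""
    n = len(groups)
    within_pairs = sum(1 for i in range(n) for j in range(i + 1, n)
                       if groups[i] == groups[j])
    between_pairs = n * (n - 1) // 2 - within_pairs
    within_edges = sum(1 for i, j in edges if groups[i] == groups[j])
    between_edges = len(edges) - within_edges
    return within_edges / within_pairs, between_edges / between_pairs


# Two hypothetical groups of 20 individuals each.
groups = [0] * 20 + [1] * 20
edges = homophilic_network(groups)
within, between = tie_densities(groups, edges)
print(within > between)
```

Varying the group sizes in `groups` gives a simple way to explore how group size changes the resulting densities.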
What Data Needs
(Presentation at Universidad Javeriana, Santiago de Cali, Colombia)
This presentation examines the effects of democratizing information technology and the importance of data analysis (in particular, graph theory) as a decision-support system.
Large Networks: Theory and Applications
(Presentation at the Universidad Antonio Nariño, Santiago de Cali, Colombia)
This presentation gives an overview of the field of complex systems. Part 1 focuses on the evolution of technology in the past decades. Parts 2 and 3 present various applications of network theory to social systems.
Complex Networks of Corruption
(Presentation at the Economics of Corruption 2010 Workshop, Passau, Germany)
This presentation introduces an argument for theorizing about corruption. In particular, we argue that mathematical models of processes taking place on social networks can provide insight into the incentives that lead to corrupt behavior (when the phenomenon of corruption emerges from systemic causes). We propose the implementation of an agent-based simulation model, which aims to emulate the dynamics of corruption. The proposed model sheds some light on potential factors that influence honest human behavior. It takes into account subjective individual factors and the possible effect of environmental variables, captured by a social contact network. We focus on the effects of network transitivity and average path length on corruption.
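The two network statistics named above have standard definitions that can be computed directly. The Python sketch below is only an illustration of those definitions on a placeholder contact network; it does not reproduce the agent-based model, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = triples = 0
    for nbrs in adj.values():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Placeholder contact network: a triangle a-b-c plus d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
t = transitivity(adj)
apl = average_path_length(adj)
print(t, apl)
```

For this network, three of the five connected triples are closed, so the transitivity is 0.6, and the six node pairs have distances summing to 8, so the average path length is 4/3.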
Bridging Across Technological, Biological and Social Systems
(Plenary Address, IEEE Colombian Workshop on Robotics and Automation, Cali, Colombia)
Control theory originated in an effort to understand and manipulate the behavior of technological systems. Contemporary trends in control theory are based on the recognition that, despite their apparent differences, systems found in formal and empirical sciences share common underlying principles. Beyond metaphorical connotations, I will argue here that we need to exploit structural similarities to transfer methods of modeling, analysis, and understanding from one academic field to another. This talk should help us elucidate the general structure and behavior of dynamical systems, and move us towards a deeper appreciation of the general nature of technological, biological, and social systems.
Ideal Free Distributions in Growing Networks
(Presented at the American Control Conference in Seattle, WA)
There is growing interest in understanding the emergence of a class of real-world networks called “scale-free networks” (e.g., computer networks such as the World Wide Web, some protein-protein interaction networks, and networks created by the formation of sexual partnerships). In this context, the number of edges (connections) that is most commonly found in a network (graph) indicates the scale of its connectivity distribution (e.g., the peak in a Poisson or bell-shaped distribution). Broadly speaking, the most notable feature of a scale-free network is its heavy-tailed (power-law), rather than Poisson or bell-shaped, connectivity distribution. In particular, power-law distributions indicate that the probability P(k) that a node connects to k other nodes is proportional to k^(-β) for some positive constant β, implying that the number of edges (the degree) of the nodes of the network spans different orders of magnitude (i.e., a few nodes have a high degree, many have only a low one, and the distribution has no peak). We present a class of network optimization processes that account for the emergence of scale-free network structures. We introduce a mathematical framework that captures the connectivity and growth dynamics of a network with an arbitrary initial topology. We show how selection via differential node fitness affects the proportion of connections a node makes to other nodes, and how heavy-tailed connectivity behavior manifests itself from consecutive achievements of ideal free distributions (IFDs). Finally, we present simulation results that show how this class of networks may emerge even when consecutive IFDs are not perfectly reached.
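The effect of node fitness on a heavy-tailed degree distribution can be illustrated with a small growth simulation in Python. This sketch swaps in a well-known fitness-weighted preferential attachment rule (each new node links to existing nodes with probability proportional to fitness times degree); it is not the IFD-based optimization process described above. The function name, fitness range, and network size are assumptions.

```python
import random
from statistics import median


def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a network by fitness-weighted preferential attachment.

    Returns the list of node degrees.
    """
    rng = random.Random(seed)
    fitness = [rng.uniform(0.5, 1.5) for _ in range(n_nodes)]
    degree = [0] * n_nodes

    # Start from a small fully connected core.
    core = links_per_node + 1
    for i in range(core):
        degree[i] = core - 1

    for new in range(core, n_nodes):
        # Attachment weight of each existing node: fitness * degree.
        weights = [fitness[i] * degree[i] for i in range(new)]
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in targets:
            degree[t] += 1
        degree[new] = links_per_node
    return degree


degrees = grow_network(1000)
print(max(degrees), median(degrees))
```

Under this rule a few nodes typically accumulate degrees far above the median, which is the qualitative signature of a heavy-tailed distribution; estimating β would require fitting the tail of the resulting degree histogram.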