My research aims to model, design, and optimize different types of networks - social, communication, and economic - through data science. With the convergence of Internet activities and new trends in distributed computing such as Fog Networks and the Internet of Things, networks are now generating substantial data on users as they interact with device applications. The plethora of data available presents unique opportunities for new approaches to network science through innovations in data science, optimization, and machine learning.

In developing methodologies for network data science, my research generally pursues a four-pronged approach:

  1. Data acquisition: Collecting fine-granular user behavioral data, which comes from large-scale system deployments.
  2. Feature engineering: Deconstructing high-dimensional network data into low-dimensional feature sets for modeling.
  3. Optimization modeling: Building short-timescale optimization models of network topologies and functionalities.
  4. System deployment: Implementing the models and features in large-scale networking systems to validate or falsify assumptions.

The following summarizes a few key thrusts of my research, with selected publications in each case. More of my publications can be found here.

Network-Aware Distributed Machine Learning

There are two fundamental challenges to distributing the training/inference of machine learning models over contemporary networks: (i) edge devices have heterogeneous computation and communication capabilities, and (ii) when data is shared, privacy must be considered. To address these challenges, we are developing methodologies for network-aware machine learning, i.e., to enable intelligence at the edge.

More specifically, to address the first challenge, we have been developing distributed learning optimization methodologies that jointly consider (a) the objectives of machine learning algorithms and (b) the costs of computing at a device versus offloading between devices when deciding where and when data should be processed in a network. Our methodology has yielded theoretical bounds on the model accuracy that can be achieved at the edge, and our proof-of-concept experimental evaluation has shown improvements in cost of up to 50% in practice over existing methods for distributing ML. To address the second challenge, we have been developing a novel deep learning architecture that generates encodings of source data that are protected against potential adversarial attacks (i.e., they obfuscate sensitive attributes) yet remain informative for the desired ML tasks (i.e., they retain information useful to prediction models). Our theoretical and experimental results have shown that this new architecture obtains significantly better tradeoffs between predictivity and privacy objectives than state-of-the-art ML techniques.
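As a rough illustration of the compute-versus-offload tradeoff described above, the following sketch compares, for each device, the cost of processing a data sample locally with the cost of sending it to a neighbor and processing it there. It is a toy greedy rule, not the optimization methodology itself: it ignores the learning objective, device capacities, and timing. The function name `plan_offloading` and the cost inputs are hypothetical.

```python
def plan_offloading(compute_cost, link_cost):
    """Pick, per device, the cheapest place to process one data sample.

    compute_cost[i]     : assumed per-sample processing cost at device i
    link_cost[(i, j)]   : assumed per-sample transfer cost from device i to j
    Returns {device: (target_device, total_cost)}.
    """
    plan = {}
    for i, local in enumerate(compute_cost):
        # Default choice: process the sample where it was collected.
        best_target, best_cost = i, local
        for (src, dst), transfer in link_cost.items():
            if src != i:
                continue
            # Offloading pays the transfer cost plus the neighbor's compute cost.
            total = transfer + compute_cost[dst]
            if total < best_cost:
                best_target, best_cost = dst, total
        plan[i] = (best_target, best_cost)
    return plan


# Hypothetical three-device network with made-up costs.
costs = [5.0, 1.0, 3.0]
links = {(0, 1): 2.0, (2, 1): 3.0, (1, 0): 0.5}
plan = plan_offloading(costs, links)
```

With these made-up numbers, device 0 offloads to device 1 because the transfer plus remote compute cost (3.0) is below its local cost (5.0), while devices 1 and 2 process locally.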

Recent Publications

Data-Driven Network Efficiency Optimization

A key theme in network science research is optimizing network functionalities (i.e., processes running on networks) to maximize performance subject to constraints on the underlying topologies (i.e., the connections between nodes). Contemporary networks pose challenges to employing such optimization techniques given how rapidly topologies evolve due to mobility, connection quality, and other factors.

To address this, we have been developing data-driven efficiency optimization methodologies for joint optimization of topologies and functionalities in social and communication networks. Our methods quantify efficiency as the ratio of node benefit in the existing network to the maximum utility achievable through optimization, where "benefit" is an assessment of how well the functionality matches the current topology. We infer these network attributes at each time step by analyzing behavioral data, e.g., the sequence of messages passed between users in a social network, or time series of device data demands in a communication network. Our evaluation on several social network datasets has shown that our methodologies can obtain improvements in efficiency of up to 30% while preserving fairness in individual user utilities. For communication networks, we have derived conditions under which a virtual Internet Service Provider (ISP) such as Google Fi can maximize its profit by optimizing the set of ISPs it partners with.
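The efficiency ratio described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published method: benefits are taken to be nonnegative pairwise values, and the optimization is replaced by a simple stand-in in which each node may keep at most `max_links` links and the existing topology respects that cap. The names `network_efficiency`, `benefit`, `topology`, and `max_links` are hypothetical.

```python
def network_efficiency(benefit, topology, max_links):
    """Ratio of realized benefit to the best achievable under a link budget.

    benefit[i][j] : assumed nonnegative benefit node i derives from linking to j
    topology[i]   : set of nodes that i is currently linked to
    max_links     : assumed cap on links per node (stand-in constraint)
    """
    realized = 0.0
    optimal = 0.0
    for i, row in enumerate(benefit):
        # Benefit obtained from the links node i actually has.
        realized += sum(row[j] for j in topology[i])
        # Best case for node i: keep its max_links most beneficial links.
        candidates = sorted((b for j, b in enumerate(row) if j != i), reverse=True)
        optimal += sum(candidates[:max_links])
    return realized / optimal if optimal > 0 else 0.0


# Hypothetical three-node network, one link allowed per node.
benefit = [[0, 3, 1],
           [2, 0, 4],
           [1, 5, 0]]
topology = [{2}, {0}, {1}]
efficiency = network_efficiency(benefit, topology, max_links=1)
```

Here the realized benefit is 1 + 2 + 5 = 8 and the best achievable is 3 + 4 + 5 = 12, so the efficiency is 8/12. A value near 1 would indicate that the current topology is already close to the best one for the functionality running on it.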

Recent Publications

AI-Based Learning Personalization

Contemporary social and communication networks are changing when and how human learning takes place. The fine-granular behavioral data (e.g., video-watching clickstream measurements) generated in networks as users interact with online content and with one another in their social learning networks (SLNs) presents an opportunity to optimize human learning with machine learning intelligence embedded in user devices. In doing so, we must also preserve the privacy of sensitive educational records.

We have developed the first content delivery system that performs fully automated, fine-granular, behavior-based individualization. Traditional adaptive educational systems suffer from two limitations: (i) they require substantial upfront input from course authors to establish individualization paths, and (ii) they rely on sparse quiz performance data as adaptation signals. Our system overcomes the first limitation through topic-based modeling of course materials and SLN discussions to automate content tagging and remediation content generation. For the second limitation, we have developed behavior-based prediction algorithms that estimate a user's knowledge state from signals extracted as they interact with the course content. Trials of our system have shown statistically significant improvements in engagement and knowledge transfer compared with traditional adaptation methods. Our system has been delivered to more than one million employees at Fortune 500 companies worldwide.

Selected Publications