
Debiasing Network Embeddings in Social Media


Social media has become one of the main ways we connect with one another. The World Economic Forum estimates that one in three people globally, and more than two thirds of all internet users, use social media. However, fairness on social media remains an open problem, especially around connection recommendations. This article explains how we can address it by debiasing network embeddings.


1. What is Network Embedding?


Nino Arsov and Georgina Mirceva assert that networks are one of the most robust structures for modelling real-world problems, and many such problems can be solved through downstream machine learning tasks defined on networks. According to Haochen Chen, Bryan Perozzi, Rami Al-Rfou and Steven Skiena, networks give us an omnipresent way to organize diverse information, and we often want machine learning on a network to predict missing information about each node in the graph. Ian Robinson, Jim Webber and Emil Eifrem define a graph as a collection of vertices and edges: a set of nodes and the relationships that connect them. For example, we can represent Twitter, LinkedIn or Facebook's data as a graph.
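
To make this concrete, here is a minimal sketch of how such a social graph could be built in code with the networkx library; the users, the gender attribute and the friendship edges are hypothetical and purely for illustration.

import networkx as nx

# A tiny, made-up social graph: nodes are users (with a sensitive
# attribute attached as node data), edges are friendship relationships.
G = nx.Graph()
G.add_node("alice", gender="F")
G.add_node("bob", gender="M")
G.add_node("carol", gender="F")
G.add_edges_from([("alice", "bob"), ("alice", "carol")])

print(G.number_of_nodes(), G.number_of_edges())  # 3 nodes, 2 edges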

We can use a technique called network embedding to perform complicated inference procedures on the entire network. Network embedding refers to learning latent, low-dimensional feature representations for the nodes or links in a network. The main idea is to find a mapping function that converts each node in the network into a low-dimensional latent representation, such that similarity in the embedding space reflects similarity in the network. Embedding methods apply to many different graph types, and a key advantage of node embedding is that it does not require manual feature engineering by domain experts.
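
As a rough illustration of this idea (and not of the specific algorithm discussed in the next section), the sketch below learns node embeddings in the DeepWalk style: short random walks over the graph are treated as sentences and fed to a skip-gram model via gensim's Word2Vec. The graph and all hyperparameters are illustrative assumptions, not tuned choices.

import random

import networkx as nx
from gensim.models import Word2Vec  # assumes gensim >= 4

def random_walks(G, walks_per_node=10, walk_length=20):
    """Generate short uniform random walks, one list of node ids per walk."""
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(node) for node in walk])
    return walks

G = nx.karate_club_graph()  # a small, built-in social network
model = Word2Vec(random_walks(G), vector_size=32, window=5,
                 min_count=0, sg=1, epochs=5)
embedding = {node: model.wv[str(node)] for node in G.nodes()}  # node -> 32-d vector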

Network embedding encompasses various methods for unsupervised, and sometimes supervised, learning of feature representations of nodes and links in a network. These methods rest on the assumption that the similarity between nodes in the network should be reflected in the learned feature representations.


2. Debiasing Network Embeddings


Maarten Buyl and Tijl De Bie explain that fairness has received little attention in the network embedding field. They provide a method to increase fairness both at the network embedding level and in its main downstream task: link prediction. In network theory, link prediction is the problem of predicting the existence of a link between two entities; on a social network, for example, it can predict whether two people will become friends.
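
As a generic illustration of link prediction from embeddings (not the CNE or DeBayes scoring function itself), one simple approach is to score each candidate pair by the cosine similarity of its embedding vectors and recommend the highest-scoring pairs that are not yet connected. The sketch below reuses the G and embedding objects from the earlier DeepWalk-style example.

import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(embedding, G, user, top_k=5):
    """Rank not-yet-connected nodes by embedding similarity to `user`."""
    candidates = [n for n in G.nodes() if n != user and not G.has_edge(user, n)]
    scored = [(n, cosine(embedding[user], embedding[n])) for n in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# e.g. recommend(embedding, G, user=0) suggests new connections for node 0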

An algorithm that naturally allows us to do so is Conditional Network Embedding (CNE): a Bayesian network embedding algorithm that uses a prior distribution to model structural properties of the network, such as the node degrees or any block structure. SAS describes a prior distribution as a probability distribution over a parameter that represents our uncertainty about that parameter before we examine the current data. CNE then searches for an embedding that optimally describes the relationships between nodes insofar as they are not already captured by the prior.
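
To give a feel for what a degree-based prior can look like, the sketch below uses a rough, configuration-model-style stand-in in which the prior probability of a link grows with the product of the endpoint degrees; the actual CNE prior is a distribution fitted to such structural constraints rather than this simple closed-form guess.

import networkx as nx

def degree_prior(G, i, j):
    """Configuration-model-style prior: hubs are a priori more likely to link."""
    m = G.number_of_edges()
    return min(1.0, G.degree(i) * G.degree(j) / (2.0 * m))

G = nx.karate_club_graph()
print(degree_prior(G, 0, 33))   # two hubs -> high prior link probability
print(degree_prior(G, 11, 12))  # two low-degree nodes -> low prior probability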

DeBayes is an adaptation of CNE that models the sensitive information as strongly as possible in the prior distribution, so that the learned embedding itself ends up debiased. DeBayes has a few conceptual advantages. It is simple to choose whether or not to apply fairness at evaluation time: it suffices to pick the required evaluation prior. The prior is flexible enough to model several sensitive attributes. And imposing fairness implies no additional computational cost.

DeBayes belongs to the field of algorithmic fairness. Methods in this category usually try to prevent bias from affecting the learned model, commonly by modifying the loss function or imposing fairness constraints throughout the learning process. Learning latent, non-discriminatory embeddings is known as fair representation learning. DeBayes attempts to represent sensitive and non-sensitive information separately during training, avoiding bias through the Bayesian formulation of the objective function rather than by reducing bias through an additional loss term.
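
As an example of such a fairness notion, demographic parity (one of the measures mentioned in the conclusion) asks that the rate of positively predicted links not depend on the sensitive attributes of the nodes involved. The sketch below computes a simple demographic-parity gap over candidate pairs; the pair-level definitions used to evaluate DeBayes differ in detail, and the data here is made up.

from collections import defaultdict

def demographic_parity_gap(pairs, predictions, sensitive):
    """pairs: list of (i, j) node pairs; predictions: 0/1 link decisions;
    sensitive: dict mapping each node to its sensitive attribute value."""
    rates = defaultdict(list)
    for (i, j), pred in zip(pairs, predictions):
        group = tuple(sorted((sensitive[i], sensitive[j])))
        rates[group].append(pred)
    group_rates = {g: sum(v) / len(v) for g, v in rates.items()}
    return max(group_rates.values()) - min(group_rates.values())

# Made-up toy data: four candidate pairs, two recommended links
pairs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
preds = [1, 0, 1, 0]
attrs = {"a": "F", "b": "M", "c": "F", "d": "M"}
print(demographic_parity_gap(pairs, preds, attrs))  # largest gap in positive rates across groups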


3. Conclusion


DeBayes focuses on removing biased information while still learning informative embeddings. It is a conceptually simple method, based on CNE, that debiases node embeddings substantially in almost all cases. The resulting link predictions score well on three crucial high-level fairness measures: demographic parity, equalized opportunity and acceptance rate parity, while the cost in prediction quality is limited. The method is flexible enough to incorporate various types of sensitive attributes and does not depend on minimizing a fairness-related loss term, although it does rely on the sensitive information being easy to model through a prior distribution. The principles underlying DeBayes could be generalized and applied to other areas of fair machine learning, including content recommendation. If we debias network embeddings to increase fairness and equity, future social media can promote recommendations that rest less on inequitable biases.


References

Chen, H., Perozzi, B., Al-Rfou, R. and Skiena, S. A Tutorial on Network Embeddings. https://arxiv.org/pdf/1808.02590.pdf

Buyl, M. and De Bie, T. DeBayes: a Bayesian Method for Debiasing Network Embeddings. https://arxiv.org/pdf/2002.11442.pdf
