How Location and Skills Affect User Uniqueness on LinkedIn

30 May 2024

Authors:

(1) Ángel Merino, Department of Telematic Engineering, Universidad Carlos III de Madrid {angel.merino@uc3m.es};

(2) José González-Cabañas, UC3M-Santander Big Data Institute {jose.gonzalez.cabanas@uc3m.es};

(3) Ángel Cuevas, Department of Telematic Engineering, Universidad Carlos III de Madrid & UC3M-Santander Big Data Institute {acrumin@it.uc3m.es};

(4) Rubén Cuevas, Department of Telematic Engineering, Universidad Carlos III de Madrid & UC3M-Santander Big Data Institute {rcuevas@it.uc3m.es}.

Abstract and Introduction

LinkedIn Advertising Platform Background

Dataset

Methodology

User’s Uniqueness on LinkedIn

Nanotargeting proof of concept

Discussion

Related work

Ethics and legal considerations

Conclusions, Acknowledgments, and References

Appendix

5 User’s Uniqueness on LinkedIn

The results in the table show that using the location significantly reduces the number of skills required to uniquely identify users on LinkedIn. The reduction is roughly 2× for random skill selection (i.e., Sk_R vs. Lo_R) and 3× to 4× for least popular skill selection (i.e., Sk_LP vs. Lo_LP).

In the Lo_R scenario, 8, 14, and 23 skills are enough to make a user unique on LinkedIn with a probability of 50%, 75%, and 90%, respectively. For Lo_LP, the number of skills that make a user unique with those same probabilities is 3, 6, and 18, respectively.

For the two scenarios that do not consider the user location, at least 28 skills are required to make a user unique with a probability ≥ 75%. If we look again at Figure 1, we observe that only around 30% of the users report 25 or more skills in their profile. This means that relying on skills alone reduces the nanotargetable users on LinkedIn to roughly 1/3 of its user base. It is worth noting that this still represents a privacy risk for ∼250M LinkedIn users.
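The reasoning above turns a skill threshold into a share of exposed users by counting how many profiles report at least that many skills. A minimal sketch of that calculation follows; the function name and the sample list are hypothetical, and the sample is invented purely for illustration (it is not the paper's dataset and does not reproduce the 30% figure).

```python
def share_with_at_least(skill_counts, threshold):
    """Return the fraction of profiles reporting at least `threshold` skills."""
    if not skill_counts:
        raise ValueError("skill_counts must not be empty")
    return sum(1 for c in skill_counts if c >= threshold) / len(skill_counts)


# Toy sample of skills-per-profile, invented for illustration only.
toy_counts = [3, 7, 12, 18, 25, 26, 31, 40, 9, 15]

share = share_with_at_least(toy_counts, 25)
print(f"{share:.0%} of the toy profiles report 25 or more skills")
```

Multiplying such a share by the size of the user base gives the absolute number of users who report enough skills to be nanotargeted without location.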

Our data sample shows that 99% of the users on LinkedIn publicly share their location. In practice, an advertiser (attacker) who wants to target an individual will be able to use the location in the vast majority of cases, which greatly reduces the number of skills required to successfully nanotarget the user.

In summary, our results show that the combination of the location and 6 rare skills reported by a user in their profile is enough to uniquely identify 3 out of 4 LinkedIn users. Increasing the number of skills to roughly 20 (rare or random) would uniquely identify 9 out of 10 users.

5.1 Success Probability of Nanotargeting Campaigns

Figure 5 shows the expected success probability (y-axis) of a nanotargeting campaign based on the combination of the location and N skills (x-axis). The figure depicts an upper bound for the success probability (red line), computed for the case where the least popular skills are selected, and a lower bound (blue line), which corresponds to the random selection approach. The results suggest that an advertiser would need roughly 5 to 8 more skills to achieve the same success probability with random skill selection than with least popular skill selection. This holds except for very high success probabilities, such as 95%, where both strategies require a very similar number of skills.
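The gap between the two strategies can be checked against the skill counts quoted earlier in this section for the Lo_R and Lo_LP scenarios. A minimal sketch, assuming only the three probability levels given in the text (50%, 75%, and 90%); the function name is an illustrative choice and not part of the paper.

```python
# Skills needed (with location) per uniqueness probability, as quoted in the text.
LO_R = {0.50: 8, 0.75: 14, 0.90: 23}   # random skill selection
LO_LP = {0.50: 3, 0.75: 6, 0.90: 18}   # least popular skill selection


def extra_skills_for_random(random_needed, least_popular_needed):
    """Extra skills random selection needs over least popular selection, per probability."""
    return {p: random_needed[p] - least_popular_needed[p] for p in random_needed}


gaps = extra_skills_for_random(LO_R, LO_LP)
for probability, gap in gaps.items():
    print(f"{probability:.0%}: random selection needs {gap} more skills")
```

At the three quoted levels the differences are 5, 8, and 5 skills, which matches the range stated above.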

From a practical point of view, an advertiser willing to nanotarget an individual would use all the skills the individual reports in their profile, because the cost of retrieving 1 or 40 skills is the same. If an advertiser selects all the skills, random selection and least popular selection become the same strategy. Therefore, the success probability of a nanotargeting campaign is actually bounded by the number of skills users report in their profiles. The more skills users report, the more vulnerable they are to nanotargeting attacks.
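The point that both strategies coincide once every skill is used can be illustrated with a short sketch. The helper names, the profile, and the audience sizes below are hypothetical and invented for illustration; they are not taken from the paper or from LinkedIn's advertising platform.

```python
import random


def select_least_popular(skills, popularity, n):
    """Pick the n skills with the smallest audience size."""
    return sorted(skills, key=lambda s: popularity[s])[:n]


def select_random(skills, n, seed=None):
    """Pick n skills uniformly at random, without replacement."""
    return random.Random(seed).sample(list(skills), n)


# Hypothetical audience sizes per skill (made-up numbers).
popularity = {
    "Python": 9_000_000,
    "Project Management": 40_000_000,
    "Verilog": 300_000,
    "Grant Writing": 1_200_000,
}
profile = list(popularity)

# With a small N, the strategies can pick different skills.
few_lp = select_least_popular(profile, popularity, 2)

# With N equal to the full profile, both strategies return the same set.
all_lp = select_least_popular(profile, popularity, len(profile))
all_rnd = select_random(profile, len(profile), seed=0)

print(few_lp)
print(set(all_lp) == set(all_rnd))
```

Because the full profile leaves no choice to make, the only remaining variable is how many skills the user has reported.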

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.