Using Machine Learning to Code Immigrant-Serving Organizations

Berkeley Computational Social Science Forum

CSS Training Program

September 21, 2021
4:00pm to 5:00pm
Virtual Participation


Computational Social Science Forum
Date: Tuesday, September 21, 2021
Time: 4:00-5:00 PM Pacific Time
Location: Virtual Participation – Register to receive the schedule and access links


Speakers: Cheng Ren and Irene Bloemraad, University of California, Berkeley 

Abstract: Many migrants are vulnerable because of their noncitizenship, linguistic or cultural barriers, and inadequate safety-net infrastructure. Immigrant nonprofits can play an important role in improving immigrant well-being. However, efforts to systematically evaluate the role of nonprofits have been hampered by the difficulty of accurately identifying immigrant-serving nonprofits in large administrative datasets. We tackle this challenge using natural language processing (NLP) and machine learning. We train supervised machine learning models using five NLP algorithms. BERT performs best, with an accuracy of 0.89. It also outperforms two methods that do not use machine learning but are often used by researchers: identifying organizations by their National Taxonomy of Exempt Entities (NTEE) codes, or by keyword searches of organization names. We thus demonstrate the viability of computer-based identification of immigrant-serving nonprofits from organizational name data, a technique that social scientists can apply to other research requiring categorization based on short labels. We also highlight limitations and areas for improvement.
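One of the baselines named in the abstract is a keyword search of organization names. A minimal sketch of such a baseline, together with how accuracy against hand-coded labels is computed, might look like the following. The keyword list, organization names, and labels are hypothetical and are not drawn from the speakers' data, and the BERT model itself is not reproduced here.

```python
# Hypothetical keyword list; the speakers' actual search terms are not given in the abstract.
KEYWORDS = ("immigrant", "immigration", "refugee", "migrant", "newcomer")


def keyword_label(name: str) -> int:
    """Return 1 if any keyword appears in the lowercased organization name, else 0."""
    lowered = name.lower()
    return int(any(k in lowered for k in KEYWORDS))


def accuracy(predicted, actual):
    """Share of organizations whose predicted label matches the hand-coded label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy, made-up organization names with hand-coded labels (1 = immigrant-serving).
names = [
    "Refugee Resettlement Alliance",
    "Chinese Community Center",
    "Eastside Youth Soccer League",
    "Migrant Farmworker Legal Aid",
]
labels = [1, 1, 0, 1]

preds = [keyword_label(n) for n in names]
print(preds)                    # [1, 0, 0, 1]
print(accuracy(preds, labels))  # 0.75
```

In this toy example the keyword rule misses "Chinese Community Center" because its name contains none of the listed terms, one illustration of why a classifier trained on labeled names might do better on short labels.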

The Computational Social Science Forum is an informal setting for the interdisciplinary exchange of ideas and scholarship at the intersection of social science and data science. Participants engage in a variety of activities, such as presentations of work in progress, discussions and critiques of recent papers, introductions to new tools and methods, discussions of ethics, fairness, inequality, and the responsible conduct of research, and professional development. The Forum is organized as part of the Computational Social Science Training Program, and weekly meetings are hosted by researchers from BIDS and D-Lab. The group welcomes social scientists and researchers with interests in data science methods and tools, as well as data scientists with applications or interests in public policy and the social, behavioral, and health sciences. Participants include graduate students, postdocs, staff, and faculty, and members are encouraged to attend regularly to build community around improving computational social science research, support the development and research of group members, and foster new collaborations. Interested UC Berkeley community members are invited to use this registration form to receive the schedule and access links. Please contact for more information or if you are interested in presenting current research at an upcoming session.


Cheng Ren

Social Welfare, UC Berkeley

Irene Bloemraad

Professor of Sociology and Director of the Berkeley Interdisciplinary Migration Initiative