Semi-Automated Processing of Interconnected Dyads Using Entity Resolution

The Situation

Tracing the spread of infections is a critical task; every interaction may matter.

During the past two decades, there has been a surge in the number of studies applying social network analysis (SNA) to the study of infectious disease transmission. The mapping of "risk potential networks," in which individuals are connected by ties that spread infection, has yielded especially valuable insights. Principles of network analysis are essential to these efforts, including the following:

  • Contact tracing by disease intervention specialists
  • Network-based research identifying connections among people and/or animals as pathways for potential disease transmission
  • Research on social support networks
  • Quality improvement and research efforts to identify connections among and within health service organizations

Unfortunately, the process of identifying connections between people and organizations can be challenging, time-consuming, and manually intensive. For example, in research and outbreak investigations exploring direct and indirect relationships among participants or clients and their partners in infectious disease risk networks, participants’ names and demographic information are matched with those of named partners. Practitioners often must identify and resolve duplicate entries by following a laborious, manual cross-referencing procedure. This process is described in this article.

The Charles River Analytics Solution

Charles River Analytics and our partner, Dr. April Young of the University of Kentucky, developed SPIDER to revolutionize the cross-referencing process and enable more rapid network construction. Essential for fast-moving public health crises, SPIDER is a semi-automated, market-leading outbreak management tool for understanding disease networks. Our solution combines powerful text and variable cross-referencing tools, dramatically improving the speed and accuracy of risk network construction.

SPIDER helps health professionals and researchers on the front lines of communicable disease prevention and containment. It is a Java-based desktop application with a modular user interface (UI) that includes an assortment of text cross-referencing tools, such as Fuzzy Soundex, Phonex, and Q-gram. We also incorporated customizable entity resolution criteria for matching on continuous and categorical variables using arithmetic functions and equality operators. Data files can be uploaded, stored, and exported as comma-separated values (CSV) files. Cross-referencing requires two kinds of data files: edge array or relational data, such as a list of the network member names reported by each participant or client and their corresponding attributes, and participant or client data, such as names and attributes.
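To make the matching idea concrete, here is a minimal Python sketch of Q-gram name comparison combined with attribute criteria. It is an illustration only, not SPIDER's implementation (SPIDER itself is a Java application): the function names, the record fields (`name`, `age`, `gender`), the Dice-style scoring, and the thresholds are all assumptions chosen for the example.

```python
from collections import Counter


def qgrams(text, q=2):
    """Split a name into overlapping q-grams, padded so edges count too."""
    padded = "#" * (q - 1) + text.lower().strip() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def qgram_similarity(a, b, q=2):
    """Dice-style overlap of q-gram multisets, from 0.0 to 1.0."""
    grams_a, grams_b = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    shared = sum((grams_a & grams_b).values())
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * shared / total if total else 1.0


def is_match(partner, participant, name_threshold=0.7, max_age_gap=2):
    """Hypothetical criteria: fuzzy name, near-equal age, equal gender."""
    name_ok = qgram_similarity(partner["name"], participant["name"]) >= name_threshold
    # Arithmetic criterion on a continuous variable.
    age_ok = abs(partner["age"] - participant["age"]) <= max_age_gap
    # Equality criterion on a categorical variable.
    gender_ok = partner["gender"] == participant["gender"]
    return name_ok and age_ok and gender_ok


named_partner = {"name": "Jon Smith", "age": 34, "gender": "M"}
participant = {"name": "John Smith", "age": 35, "gender": "M"}
candidate_is_duplicate = is_match(named_partner, participant)
```

In a semi-automated workflow, a candidate flagged this way would still be presented to a person for confirmation rather than merged automatically.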

Through its promotion of contact tracing and proactive monitoring, SPIDER could greatly improve infectious disease research and practice.


With SPIDER, health care practitioners and researchers can better process their data, make better-informed decisions, and conduct more rigorous examinations of disease-spreading networks.

SPIDER is easily configurable to meet changing needs; users can customize the criteria for cross-referencing via a point-and-click interface. SPIDER then displays match results in the UI as a customizable and sortable table. Match results can be exported as a de-identified CSV file in edge array format for use in network analysis packages. To date, SPIDER has been used successfully in two NIH-funded network studies to build initial versions of risk networks.
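As a rough illustration of what a de-identified edge array export might look like, the Python sketch below replaces names with arbitrary node IDs and writes one edge per row. The column names (`source`, `target`), the ID scheme (`N1`, `N2`, ...), and the function names are assumptions for this example; SPIDER's actual export layout is not specified here.

```python
import csv
import io


def deidentified_edges(matches):
    """Map each resolved name to an arbitrary ID and return ID pairs.

    `matches` is a list of (participant_name, partner_name) pairs that
    have already been through entity resolution.
    """
    ids = {}
    edges = []
    for source, target in matches:
        for name in (source, target):
            # Assign the next ID only the first time a name is seen.
            ids.setdefault(name, f"N{len(ids) + 1}")
        edges.append((ids[source], ids[target]))
    return edges


def edges_to_csv(edges):
    """Serialize the edge array as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buffer.getvalue()


resolved = [("Ann Lee", "Bob Ray"), ("Bob Ray", "Cy Poe")]
edge_csv = edges_to_csv(deidentified_edges(resolved))
```

Because the same person always maps to the same ID, the network structure survives the export while the names do not, which is what network analysis packages need from an edge list.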

Learn More

To learn more about SPIDER, contact us.

Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under Award Number R43MH106361. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
