This post was authored by Jeenu Thomas, Data Systems Engineer at IDinsight and SurveyCTO super user. As a member of IDinsight’s data engineering team, Jeenu built a Python library for extracting data from SurveyCTO. She introduces the library here and shares how IDinsight uses the combination of SurveyCTO and Python to build automated data pipelines.
This library is based on a set of example Python code snippets created by Eric Dodge, Associate Director at IDinsight, and previously made available to the SurveyCTO community. While both resources achieve the same aim, the new library simplifies the process of extracting SurveyCTO data for Python users.
IDinsight is a global advisory, data analytics, and research organization that helps development leaders maximize their social impact using rigorous evidence. Rigorous evidence requires rigorous data collection, processing, and analysis. We achieve this rigor by using SurveyCTO, whose wide range of features allows us to automate and decentralize data management operations. You can read more about how we are using SurveyCTO’s integrations to create automated workflows here. Our data collection activities are expanding, notably through our Data on Demand service, which makes data collection radically faster and cheaper. As we expand our capabilities, we are focused on building better data management systems that integrate with SurveyCTO and handle larger datasets with greater speed, minimal human intervention, and a high level of data integrity. As part of this work, IDinsight has published a new Python library, called pysurveycto, that streamlines the process of extracting SurveyCTO data for Python users.
The first and most important step in building automated systems is pulling the data collected on SurveyCTO. In the past, we’ve taken a wide range of approaches to extracting SurveyCTO data: downloading data from the SurveyCTO web console, using SurveyCTO Desktop, publishing server datasets to Google Sheets, and publishing data using the Webhook API tool. For many of our projects, we now rely on Python and SurveyCTO’s REST API feature. This involves writing custom Python code that sends requests to the SurveyCTO REST API and receives the survey data in the API response (you need a SurveyCTO account to access the REST API).
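As a rough illustration of what such custom code involves, here is a minimal sketch using only the Python standard library. It is not the code from our projects: the helper names are hypothetical, and the URL pattern, the date query parameter, and the use of HTTP Basic authentication are assumptions based on the SurveyCTO REST API documentation as we understand it, so check them against the current documentation before relying on them.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_form_data_request(server_name, username, password, form_id, since_epoch=0):
    """Build (but do not send) a GET request for a form's submissions as JSON.

    Hypothetical helper. The URL pattern and the `date` parameter are
    assumptions about the SurveyCTO REST API, not guaranteed details.
    """
    query = urllib.parse.urlencode({"date": since_epoch})
    url = (
        f"https://{server_name}.surveycto.com"
        f"/api/v2/forms/data/wide/json/{urllib.parse.quote(form_id)}?{query}"
    )
    # HTTP Basic authentication: base64-encode "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_form_data(request, timeout=60):
    """Send the request and parse the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and headers without contacting a server. Even this small sketch has to deal with URL layout, query parameters, and authentication headers before any survey data arrives.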
Why Python and SurveyCTO REST API?
We chose Python as the programming language for extraction, and for most other tasks in our data management pipeline, because it is open source, has robust community support, and gives us access to Python-based pipeline frameworks such as Apache Airflow. Python is also easy to run in the cloud, which is a requirement for our automated pipelines. We chose the SurveyCTO REST API feature because it is faster than manual data downloads, works well with large datasets, and can be coupled easily with Python code. The data extracted via the SurveyCTO REST API can then be processed, archived, and cleaned in Python, and used to generate metrics for survey monitoring dashboards and other reports.
Realizing there could be a similar shift to Python-based systems across projects and across organizations, and inspired by the open source ethos of the Python ecosystem, we are making our SurveyCTO data extraction code public by releasing it as a Python library that anyone can download and use.
Why is this new Python library helpful?
SurveyCTO’s REST API feature provides several options, such as downloading data in JSON or CSV format, downloading only data collected after a specific submission date, downloading server datasets, and more. The pysurveycto library packages these options as off-the-shelf Python functions that are easy to use and that abstract away implementation details that can be time-consuming to configure.
To illustrate the simplicity of working with pysurveycto, basic form data extraction can be completed with only two lines of Python code once the library is imported:
scto = pysurveycto.SurveyCTOObject(server_name, username, password)
my_form_data = scto.get_form_data(form_id)
Rather than spending your time figuring out implementation details like HTTP request headers and authentication protocols, you just need to gather a handful of inputs like your SurveyCTO server name, login credentials, and form ID. pysurveycto takes care of the rest behind the scenes so you can start working with your survey data as quickly as possible. The library supports all current functionality of the SurveyCTO REST API.
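The options mentioned above, such as JSON output and filtering by date, are passed as keyword arguments. The following is a minimal sketch, not a definitive recipe: get_recent_submissions is a hypothetical helper, and the format and oldest_completion_date parameter names are taken from the pysurveycto README as we understand it, so verify them against the version you have installed.

```python
import datetime


def get_recent_submissions(scto, form_id, days=7, now=None):
    """Fetch submissions completed in the last `days` days as JSON.

    Hypothetical helper. `scto` is expected to be a
    pysurveycto.SurveyCTOObject (or any object exposing a compatible
    get_form_data method). The keyword names `format` and
    `oldest_completion_date` are assumptions based on the library README.
    """
    if now is None:
        now = datetime.datetime.now()
    since = now - datetime.timedelta(days=days)
    return scto.get_form_data(form_id, format="json", oldest_completion_date=since)


# Usage, assuming pysurveycto is installed and the credentials are valid:
#   import pysurveycto
#   scto = pysurveycto.SurveyCTOObject(server_name, username, password)
#   recent = get_recent_submissions(scto, form_id, days=7)
```

Passing the connection object in as an argument keeps the helper easy to test with a stand-in object and easy to reuse across forms in a scheduled pipeline.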
The source code for the library is available on GitHub here. You can refer to the examples provided here to learn more about installing and using the available functions. As we explore more ways of using Python to leverage SurveyCTO features, we hope to add more functionality to this library. Feel free to reach out to us with suggestions and feedback at it@idinsight.org or by opening an issue on the pysurveycto GitHub repository.