Improving the quality of nutritional survey data worldwide: Putting Child Kwashiorkor on the Map Initiative

Published:

23 May 2016

By Lauren Browne

Lauren Browne is the Data Manager for the Kwashiorkor Mapping project and joined the team after interning with ACF-UK and Save the Children UK. She completed her Master of Science in Nutrition for Global Health at the London School of Hygiene and Tropical Medicine and previously served as a Peace Corps Volunteer for the US Government.

Location: Global

What we know: Kwashiorkor, or oedematous malnutrition, is overlooked in scientific and public health fora. The burden of kwashiorkor is unknown; metadata analysis has the potential to fill this information gap.

What this article adds: A CMAM Forum/ACF-UK/UNICEF/WHO collaboration undertook an updated mapping to estimate the numbers and location of kwashiorkor and identify high burden countries/areas. A total of 2,277 datasets from various UN/NGO/government sources were included. Significant limitations to the meta-analysis included barriers to data access, lack of standard formats (file names, datasheet types, varied coding and classifications between surveys), poor data quality, missing variables, no access to raw data, and duplicate surveys shared. Only one out of 36 MICS4 surveys included MUAC; all DHS surveys were missing both MUAC and oedema variables. There is a clear need for defined standards for all nutritional survey data, better overview of data quality and improved storage of raw original datasets. A minimal set of data, including MUAC and oedema, should be included in DHS, MICS and SMART surveys. Going forward, an expert inter-agency group should determine standard definitions, labels, codes and units for all indicators deemed to be of importance for inclusion in nutritional surveys. This group could facilitate management of an open access or licence-accessed central data repository for professionals and researchers.

Summary of the Putting Child Kwashiorkor on the Map initiative

Putting Child Kwashiorkor on the Map was a collaborative effort between the CMAM Forum, ACF-UK, UNICEF and WHO. The current phase of the project (Phase 2) was launched in late 2014 to help improve and strengthen the data used for the map produced in the first phase of mapping conducted in 2013 (Alvarez et al, 2013). The aims of Phase 2 were:

1. To refine and update the initial kwashiorkor map, provide a broad estimate of the numbers and location of kwashiorkor and identify high burden countries/areas; and

2. To strengthen the evidence base and support advocacy for inclusion of kwashiorkor in relevant methodology discussions at global level.

Non-governmental organisations (NGOs), United Nations (UN) agencies and governments involved with nutrition programmes were asked to share nutritional surveys. Requests were accompanied by a project information sheet and a data-sharing letter of agreement. A Technical Advisory Group (representatives from Centers for Disease Control and Prevention (CDC); CRED/University Uclouvain; Jimma University Ethiopia; Kenya Medical Research Institute (KEMRI); Mwanamugimu Nutrition Unit, Uganda; Médecins Sans Frontières (MSF); Washington University in St. Louis; University of Tampere and Valid International) guided the type of information to be collected, the database construction, the analyses and the final report.

Any nutritional survey that used the SMART methodology (or a similar methodology predating the development of SMART), with probability proportional to size (PPS) sampling, exhaustive sampling, simple random sampling or systematic sampling, and that included the variables age, sex, weight, height, mid-upper arm circumference (MUAC) and presence or absence of bilateral pitting oedema for children aged 6-59 months, was deemed eligible for inclusion in a central database.

Project outcomes

The initial map from Phase 1 (557 surveys held by Brixton Health) was updated during Phase 2 with more robust estimates of the prevalence of kwashiorkor based on a total of 2,277 surveys collected from 11 NGOs (ACF, Concern Worldwide, GOAL, IMC, IRC, MSF, Plan International, Save the Children, Terre des Hommes, World Vision and Zerca y Lejos), 15 national governments/UNICEF, FEWS NET¹, FSNAU² and UNHCR for 55 countries. The eligible surveys were conducted from 1992 to 2015 and included the data of over 1.7 million children. Outcomes in terms of prevalence are included in an accompanying article in this edition of Field Exchange.

Findings and implications

One of the findings from the project was the “…need for systematic collection, storage, and standardisation of nutritional survey data, software and definitions… Inconsistencies were found across surveys, including lack of a standard format, varying codes for some indicators, loss of original files (often with past employees who left or through corrupted files), no clear contact person, etc. Variation was found in the type of software used, coding/labelling and units…” This article aims to expand on this finding and provide a more detailed description of the data issues encountered; specifically the barriers to data access, lack of standardisation, poor data quality, missing variables and receipt of raw and cleaned data.

Barriers to data access

Obtaining data permission was often a very lengthy process and some countries did not provide permission for use of nutritional surveys outside the country of origin. Furthermore, data agreements specified restrictions on use of the data and were time-bound. These problems are often encountered by researchers and have previously been discussed in Field Exchange (Guerrero, 2015).

Lack of standardisation

Surveys were received in five different formats (ENA³ for SMART, EpiInfo/EpiData (REC), STATA, SPSS and Excel), which required time-consuming file conversions to the common CSV format needed to aggregate all the data within the analytical software (R Analytic Flow was the statistical programme utilised for the project). Some files received were corrupted, most likely due to ineffective conversions, while others were received in unfamiliar formats that could not be converted.

Twenty-nine (18%) of the ineligible datasets were excluded because file labelling was poor and inadequate descriptive information was provided about the survey, such as location.

The metadata provided for surveys varied widely, was not standardised and was often either not present, coded opaquely, or classified differently. For instance, in those datasets that identified the population type, the definitions used by organisations to describe the surveyed population varied. Some surveys used general classifications (e.g. rural or urban) for the variable, while others disaggregated it into sub-groups (e.g. agrarian or pastoralist, instead of rural). Unknown codes utilised for variables were a problem for 11% (n=18) of the excluded datasets; some indicators were coded differently by different agencies and even in surveys conducted by the same agency, specifically oedema and sex.
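The coding problem described above can be illustrated with a minimal Python sketch. It is not the project's code (the analysis was run in R Analytic Flow), and the agency names, raw codes and target codes (m/f for sex, y/n for oedema) are all hypothetical, since the article does not list the codes actually encountered. The point is that the same raw value can mean different things in different sources, so each source needs its own codebook and any unknown code should be reported rather than guessed.

```python
from typing import Optional

# Hypothetical per-source codebooks. Note that "1" means "oedema present"
# for agency_a but "oedema absent" for agency_b, so a single global
# lookup table would silently misclassify children.
CODEBOOKS = {
    "agency_a": {
        "sex": {"1": "m", "2": "f"},
        "oedema": {"1": "y", "2": "n"},
    },
    "agency_b": {
        "sex": {"m": "m", "f": "f", "male": "m", "female": "f"},
        "oedema": {"1": "n", "2": "y"},
    },
}


def harmonise(source: str, variable: str, raw: object) -> Optional[str]:
    """Map a raw code to the standard code, or None if the code is unknown."""
    key = str(raw).strip().lower()
    return CODEBOOKS.get(source, {}).get(variable, {}).get(key)


print(harmonise("agency_a", "oedema", 1))   # y
print(harmonise("agency_b", "oedema", 1))   # n
print(harmonise("agency_a", "sex", "9"))    # None (unknown code)
```

Returning None for an unrecognised code, instead of a default, keeps unknown codes visible so that a dataset can be queried with its owner or excluded.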

Poor data quality

Data entry errors were extremely common in the received datasets, with values often typed into the wrong columns or typed incorrectly. The MUAC variable was most often recorded incorrectly and was sometimes recorded in both millimetres and centimetres within the same dataset. Very extreme values came up frequently for MUAC but also occurred for weight and height.
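As an illustration of the mixed-unit problem, the sketch below shows one possible way to bring MUAC values to a single unit and to flag extreme values. It is a minimal Python example under stated assumptions, not the cleaning rule used by the project: the cut-off of 30 for deciding that a value is in centimetres, and the plausible range of 60 to 300 mm, are hypothetical thresholds chosen only for demonstration.

```python
from typing import Optional

# Assumed thresholds, for illustration only (not taken from the project):
# values below CM_CUTOFF are read as centimetres, and anything outside
# PLAUSIBLE_MM after conversion is treated as implausible.
CM_CUTOFF = 30.0
PLAUSIBLE_MM = (60.0, 300.0)


def muac_to_mm(value: float) -> Optional[float]:
    """Return MUAC in millimetres, or None if the value looks implausible."""
    mm = value * 10 if value < CM_CUTOFF else value
    low, high = PLAUSIBLE_MM
    return mm if low <= mm <= high else None


print(muac_to_mm(12.5))   # 125.0 (entered in cm)
print(muac_to_mm(125))    # 125   (entered in mm)
print(muac_to_mm(1250))   # None  (likely a typing error)
```

A rule of this kind can only guess at the unit intended by the data entry person; recording the unit in the dataset itself would remove the need for it.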

Missing variables

A total of 2,515 datasets were received, with nearly 7% (n=165) not eligible for inclusion in the database since they were missing one or more of the needed key variables (age, sex, weight, height, MUAC and/or oedema). No Demographic and Health Survey (DHS) datasets had all the required variables (all were missing both the MUAC and oedema variables). Only Multiple Indicator Cluster Survey (MICS) 4 databases were sourced, since only MICS4 could potentially have all the variables needed. Of 36 MICS4 databases received, 35 were missing the MUAC variable and so were ineligible for inclusion. Overall, 64% (n=105) of the excluded datasets were missing MUAC; fewer were missing oedema or other variables.

A total of 114 children with oedema had incomplete case records, meaning they did not have one or more of the accompanying variables recorded (age, sex, weight, height or MUAC) and were therefore not included in the database. Of these, 83% (n=95) were missing MUAC, with the majority of the rest missing weight and/or height.
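The two checks described in this section, one at dataset level and one at record level, can be sketched in Python as follows. This is an illustrative example only: the column names are assumed to be already standardised to the six labels below, which was often not the case in the datasets received.

```python
import csv
import io

# Assumed standard variable names; real datasets labelled these
# columns in many different ways.
REQUIRED = ("age", "sex", "weight", "height", "muac", "oedema")


def missing_variables(csv_text: str) -> set:
    """Return the required variables absent from a CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip().lower() for name in header}
    return {name for name in REQUIRED if name not in present}


def is_complete(record: dict) -> bool:
    """True if every required variable has a non-empty value."""
    for name in REQUIRED:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            return False
    return True


# A dataset without a MUAC column fails the dataset-level check.
print(missing_variables("AGE,SEX,WEIGHT,HEIGHT,OEDEMA\n24,m,10.2,82.0,n\n"))  # {'muac'}

# A child with oedema but no MUAC value fails the record-level check.
child = {"age": "24", "sex": "f", "weight": "9.1",
         "height": "80.0", "muac": "", "oedema": "y"}
print(is_complete(child))  # False
```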

Receipt of raw and cleaned data

Raw data was specifically requested, but agencies found it difficult to locate all the original raw datasets, especially from older surveys. Many agencies had lost the data and could only provide narrative reports.

It was unclear whether datasets had already been cleaned prior to receipt, so included surveys were cleaned either to the contributing organisation’s standards or to the project’s standards, in unknown proportions, resulting in variability. Furthermore, agencies may have used WHO and/or SMART flagging criteria, either deleting flagged records or leaving them in, which was not evident from the datasets received.

Of the 2,350 eligible datasets, over 3% (n=73) were identified as duplicates, a result of inter-agency collaboration during surveys and shared ownership of the data. Potential duplicate datasets were identified by calculating file-level checksums. However, the duplicate-detection code could not account for differences in how data entry staff had cleaned the data, so some duplicate surveys may have gone undetected. For example, if the same dataset had been cleaned by one collaborating organisation but not the other prior to sharing, the code used for the analyses would not have picked up the duplicate. Given the size of the database, it was not possible to check systematically by eye for additional duplicates missed by the code. The provision of raw original data by all organisations involved would have avoided these difficulties and minimised the number of undetected duplicates.
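To show why checksum-based detection misses cleaned copies, here is a minimal Python sketch. It is not the project's code; the choice of SHA-256 as the checksum, the file names and the example contents are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def checksum(data: bytes) -> str:
    """File-level checksum (SHA-256 is an assumed choice of algorithm)."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: dict) -> list:
    """Group file names whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[checksum(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical example: the same survey shared by three agencies,
# one of which corrected a MUAC typing error before sharing.
raw = b"age,sex,muac,oedema\n24,m,1250,n\n"
cleaned = b"age,sex,muac,oedema\n24,m,125,n\n"
shared = {"agency_a.csv": raw, "agency_b.csv": raw, "agency_c.csv": cleaned}

print(find_duplicates(shared))  # [['agency_a.csv', 'agency_b.csv']]
```

The cleaned copy differs by a single character, so its checksum is entirely different and it is not grouped with the other two. Detecting such cases would need a comparison of survey content or metadata rather than of file bytes.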

Recommendations for the improvement of survey quality

It is recommended that in the future, a minimal set of data (including especially MUAC and oedema, since these are admission criteria for services managing acute malnutrition) be collected across all nutritional surveys, including standard national surveys like SMART, MICS and DHS.

Systematic storage of raw datasets should be prioritised, preferably in a common format (e.g. CSV) of the kind often used in large international research projects. Datasets should be stored at headquarters or country level, together with the accompanying narrative reports.

It is important that nutritional survey datasets are properly standardised. It is recommended that an international, inter-agency technical advisory group determine standard definitions, labels, codes and units for all variables to be automatically included in nutrition surveys, including definitions for a minimal set of metadata. In addition, basic information must be integrated into each dataset, ideally in the file name.
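As an illustration of what basic information in a file name could look like, the Python sketch below parses one possible naming pattern. The pattern itself (three-letter country code, survey area, year and month, agency) is entirely hypothetical; defining an agreed convention would be a task for the advisory group proposed above.

```python
import re
from typing import Optional

# Hypothetical convention: COUNTRY_AREA_YYYY-MM_AGENCY.csv
PATTERN = re.compile(
    r"^(?P<country>[A-Z]{3})_(?P<area>[A-Za-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2})_(?P<agency>[A-Za-z0-9-]+)\.csv$"
)


def parse_name(filename: str) -> Optional[dict]:
    """Return the metadata encoded in a file name, or None if it does not fit."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


# Made-up file names, for demonstration only.
print(parse_name("XXX_District1_2014-06_AgencyA.csv"))
print(parse_name("survey_final_v2.csv"))  # None
```

A file that fits the pattern yields its country, area, date and agency without the file being opened; one that does not is flagged immediately for follow-up with the contributing organisation.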

Conclusions and the way forward

There is a clear need for defined standards for all nutritional survey data (especially surrounding file type and labels, codes, variables and metadata), better overview of data quality and improved storage of raw original datasets.

Going forward, an expert inter-agency group should determine standard definitions, labels, codes and units for all indicators deemed to be of importance for inclusion in nutritional surveys. In addition, if widely agreed, this group could facilitate management of an open access or licence-accessed central data repository for professionals and researchers.

For more information, email: Lauren Browne

References

¹ Famine Early Warning Systems Network.

² Food Security and Nutrition Analysis Unit, Somalia.

³ ENA (Emergency Nutrition Assessment) software is an analytical programme recommended by SMART.

Alvarez JL, Dent N, Browne L, Myatt M, & Briend A. Putting Child Kwashiorkor on the Map. CMAM Forum Technical brief. London, March 2016.

Guerrero S. Strength in Numbers. Field Exchange 50. August 2015. p76. ENN.

About This Article

Issue:

Field Exchange 52 (en)

Article type:

Original articles
