Mission Statement
A global farming technology company had conducted hundreds of crop trials over the years. Each trial had been run by different scientists, capturing results in varying formats. Consolidating this data manually into a standard structure for reporting was slow, error-prone, and nearly impossible at scale.
We developed a proof-of-concept, repeatable process using Alteryx to ingest trial results, handle variations in format, and convert them into a consistent standard.
Tools Used
Alteryx
Detailed Solution
Identification of Repeated Trial Data Structures
Analysis showed that, across the various trial results, scientists had used a small number of typical tabular formats to capture results, although with slight variations depending on the scientist performing the trial and the factors relating to the trial.
Assessment of Trial Data Sets
Using Alteryx, we developed a workflow that interrogated the files available for each trial and measured how far they deviated from the typical formats expected. Each trial data set was scored on that deviation: trials that matched the expected formats exactly were given a high score, and those that used non-standard formats to record results were given a low score.
This gave us a clearer view of which trials were ready for ingestion, and also helped identify further repeated data formats used by scientists.
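The scoring idea can be illustrated outside Alteryx with a short Python sketch. It is not the workflow itself: the format names, the column names, and the use of Jaccard similarity as the deviation measure are all assumptions made for illustration, since the case study does not say how deviation was calculated.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical expected formats: format name -> set of expected column names.
EXPECTED_FORMATS: Dict[str, Set[str]] = {
    "plot_yield": {"trial_id", "plot", "variety", "yield"},
    "treatment_summary": {"trial_id", "treatment", "rate", "response"},
}


def score_against_format(columns: Iterable[str], expected: Set[str]) -> float:
    """Return a 0..1 score: shared columns divided by all columns seen.

    This is Jaccard similarity, used here as an assumed stand-in for
    "how much a file deviates from the expected format".
    """
    found = {c.strip().lower() for c in columns}
    union = found | expected
    if not union:
        return 1.0  # nothing expected and nothing found: trivially identical
    return len(found & expected) / len(union)


def best_format_score(columns: Iterable[str]) -> Tuple[str, float]:
    """Score a file's columns against every expected format; keep the best."""
    cols = list(columns)
    scores = {
        name: score_against_format(cols, expected)
        for name, expected in EXPECTED_FORMATS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Under these assumptions, a file whose headers exactly match one format scores 1.0, while a file with a renamed column and an extra notes column scores lower and is flagged for review.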
Before:
Hundreds of crop trials were recorded in different formats by multiple scientists, making manual consolidation slow, error-prone, and unreliable.
After:
Using Alteryx, we created an automated workflow to standardise trial data, handle naming variations, and prioritise datasets for conversion.
Iterative Identification of Further Data Structures / Conventions
Based on the results of the initial assessment, we were then able to amend the workflow to recognise further repeated data structures. Trials using these formats then achieved a higher score for conversion.
We also amended the workflow to handle slight differences in the field naming conventions used by different scientists. This also led to trials achieving a higher score for conversion.
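Handling naming variations can be sketched as a lookup that maps each scientist's field name to one standard name before the file is scored. The alias table below is entirely hypothetical, as the case study does not list the field names involved, and the real logic lived in the Alteryx workflow rather than in Python.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table: cleaned variant spelling -> standard field name.
FIELD_ALIASES: Dict[str, str] = {
    "yld": "yield",
    "yield_t_ha": "yield",
    "cultivar": "variety",
    "plot_no": "plot",
}


def normalise_field(name: str) -> str:
    """Clean a header, then map it to its standard name if an alias is known."""
    key = name.strip().lower().replace(" ", "_").replace("-", "_")
    return FIELD_ALIASES.get(key, key)  # unknown fields pass through cleaned


def normalise_columns(columns: Iterable[str]) -> List[str]:
    """Apply normalise_field to every header, preserving column order."""
    return [normalise_field(c) for c in columns]
```

Each alias added to the table lets more files match an expected format, which is one way a trial's conversion score could rise without the source data being touched.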
Impact
The scoring approach made it possible to:
Prioritise resources effectively by focusing on medium-to-high scoring trials that could be converted with minor workflow adjustments or targeted data cleanup.
Avoid wasted effort on very low-scoring trials, which were shown to be unsuitable for automated conversion due to their unique or one-off formats.
Support decision-making with quantifiable measures of data readiness, reducing uncertainty and enabling more strategic planning for data standardisation.
Encourage process improvement by highlighting workflow or data entry practices that, if adjusted, would increase future conversion success rates.
Accelerate adoption of the standard format by creating a clear roadmap for which datasets to convert immediately, which to remediate, and which to exclude.
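The convert, remediate, or exclude roadmap amounts to banding each trial by its score. A minimal sketch follows, assuming scores on a 0 to 1 scale and two cut-off values that are purely illustrative; the thresholds actually used are not stated in this case study.

```python
def triage(score: float, convert_at: float = 0.8, remediate_at: float = 0.4) -> str:
    """Place a trial in a roadmap band based on its format score.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= convert_at:
        return "convert"    # close enough to a known format to ingest now
    if score >= remediate_at:
        return "remediate"  # needs workflow tweaks or targeted cleanup first
    return "exclude"        # one-off format, not worth automating
```

Keeping the thresholds as parameters means the bands can be moved as the workflow learns more formats, without changing how individual trials are scored.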

