Mission Statement

A global farming technology company had conducted hundreds of crop-growing trials since its inception. The experiments had been performed by different scientists and examined a variety of factors. Because the format of the results differed depending on the team that performed each trial, the company wished to consolidate all past trial results into a standard format. Doing this manually would have been very difficult and time-consuming.

The mission was to develop a proof-of-concept, repeatable process to ingest the trial results data, handle the variations in format and specific experimental factors, and convert the data to the agreed standard format. This would allow the business to report across the entire data set.

Tools Used

  • Alteryx

 

Detailed Solution

Identification of Repeated Trial Data Structures

Analysis determined that, across the various trial results, the scientists used a few typical tabular formats to capture the results, although there were slight variations depending on the scientist performing the trial and the factors relating to the trial.

Assessment of Trial Data Sets

Using Alteryx, we were able to develop a workflow to interrogate the files available for each trial and determine how much they deviated from the typical formats expected. This was used to score each trial data set: trials that matched the expected formats exactly were given a high score, and those that used non-standard formats to record results were given a low score.
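The scoring idea can be sketched in a few lines of Python. This is only an illustration, not the Alteryx workflow itself: the format names, the column names, and the choice of Jaccard similarity as the measure of deviation are all assumptions made for the example.

```python
# Hypothetical expected formats: each maps a layout name to the set of
# column names that layout is assumed to use.
EXPECTED_FORMATS = {
    "plot_yield": {"trial_id", "plot", "treatment", "yield"},
    "growth_series": {"trial_id", "plot", "date", "height"},
}


def format_score(columns, expected):
    """Jaccard similarity between observed and expected column sets (0.0 to 1.0)."""
    observed = set(columns)
    union = observed | expected
    if not union:
        return 0.0
    return len(observed & expected) / len(union)


def score_trial(columns):
    """Return (best_matching_format, score) for one trial file's columns."""
    return max(
        ((name, format_score(columns, expected))
         for name, expected in EXPECTED_FORMATS.items()),
        key=lambda pair: pair[1],
    )
```

An exact match scores 1.0, and each extra or missing column lowers the score, so a file with one unexpected column still ranks well above a one-off layout.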

This gave us a clearer view of which trials were ready for ingestion, and also helped identify further repeated data formats used by scientists.

Iterative Identification of Further Data Structures / Conventions

Based on the results of the initial assessment, we were then able to amend the workflow to recognise further repeated data structures. Trials using these formats then achieved a higher score for conversion.
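One way to spot further repeated structures is to count how often the same header layout recurs among trials that did not match a known format. The Python sketch below assumes a hypothetical mapping of trial IDs to header rows and treats column order and letter case as irrelevant; the original workflow may have used different criteria.

```python
from collections import Counter


def header_signature(columns):
    """Order-insensitive, case-insensitive signature for a file's header row."""
    return tuple(sorted(c.strip().lower() for c in columns))


def recurring_signatures(headers_by_trial, min_count=2):
    """Return (signature, count) pairs shared by at least min_count trials,
    most common first."""
    counts = Counter(header_signature(cols) for cols in headers_by_trial.values())
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]
```

Any signature that appears in several trials is a candidate for a new expected format; signatures that appear once are likely one-off layouts.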

We were also able to amend the workflow to handle slight differences in the field naming conventions used by different scientists. This also led to trials achieving a higher score for conversion.
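Handling naming differences usually comes down to mapping each variant to one standard field name before comparing formats. The following Python sketch assumes a hypothetical alias table; the field names and variants are invented for illustration and are not taken from the trial data.

```python
import re

# Hypothetical alias table: standard field name -> variants seen in trial files.
FIELD_ALIASES = {
    "yield": {"yield", "yld", "yield_kg_ha"},
    "treatment": {"treatment", "trt", "treat"},
    "plot": {"plot", "plot_no", "plot_id"},
}

# Reverse lookup: variant -> standard field name.
_LOOKUP = {
    alias: standard
    for standard, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def normalise_field(name):
    """Lower-case the name, collapse punctuation and spaces to underscores,
    then map known variants to the standard field name."""
    key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    return _LOOKUP.get(key, key)
```

Unknown names are returned in their cleaned form rather than rejected, so they still show up when reviewing which variants to add to the table next.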

Impact

The scoring enabled the business to:

  1. Prioritize resources effectively by focusing on medium-to-high scoring trials that could be converted with minor workflow adjustments or targeted data cleanup.

  2. Avoid wasted effort on very low-scoring trials, which were shown to be unsuitable for automated conversion due to their unique or one-off formats.

  3. Support decision-making with quantifiable measures of data readiness, reducing uncertainty and enabling more strategic planning for data standardization.

  4. Encourage process improvement by highlighting workflow or data entry practices that, if adjusted, would increase future conversion success rates.

  5. Accelerate adoption of the standard format by creating a clear roadmap for which datasets to convert immediately, which to remediate, and which to exclude.
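The convert, remediate, or exclude roadmap in the list above can be expressed as a simple banding of scores. In this Python sketch the thresholds (0.9 and 0.5) and the 0-to-1 score scale are assumptions chosen for illustration; the source does not state the actual cut-offs.

```python
def triage(score, convert_at=0.9, remediate_at=0.5):
    """Assign a trial to a band based on its readiness score.
    The default thresholds are hypothetical."""
    if score >= convert_at:
        return "convert"
    if score >= remediate_at:
        return "remediate"
    return "exclude"


def build_roadmap(scores):
    """Group trial IDs by band, highest-scoring trials first within each band."""
    roadmap = {"convert": [], "remediate": [], "exclude": []}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for trial_id, score in ranked:
        roadmap[triage(score)].append(trial_id)
    return roadmap
```

Keeping the thresholds as parameters makes it easy to see how many trials move between bands as the workflow learns new formats and scores rise.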

Conclusion

The solution gave the business a detailed view of the readiness of its data for conversion to the standard format. The business could then make informed decisions about how to process the remaining trial data: either extend the workflow to cover other repeated variations in data structure, or manually fix the data to fit the structures already handled.
