Recently, the soilless cultivation team at the Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, integrated deep learning and machine learning to analyze the shoot and root phenotypic traits of 263 cucumber accessions. The team constructed a high-precision yield prediction model, designed an ideal high-yield cucumber plant architecture suited to greenhouse environments, and systematically elucidated how shoot and root phenotypes jointly regulate yield. The work provides a theoretical basis for high-yield cucumber breeding and offers new insights for phenotype prediction and structural design in smart agriculture. The findings have been published in the Plant Biotechnology Journal.

Cucumber is a major greenhouse vegetable crop in China, and its yield is jointly influenced by the photosynthetic efficiency of the shoot and the nutrient absorption capacity of the root system. Traditional breeding, which often relies on empirical selection, struggles to systematically quantify the interactive effects of multiple phenotypic traits. To address this, the research team collected shoot-related phenotypes (including leaf characteristics, stem features, and the node position of the first female flower), seedling root phenotypes (such as length, angle, and diameter), and final yield data from 263 cucumber accessions. The team applied a U-Net model for automated segmentation and feature extraction from root images, used multiple machine learning algorithms to construct a yield prediction model, and ran large-scale phenotypic combination simulations (over 150,000 virtual combinations) to identify optimal structural configurations. The study thereby enabled, for the first time in cucumber, the prediction and optimization of high-yield plant architectures based on early-stage phenotypic data. The key findings are summarized below:
1. Deep Learning Enables Precise Root Phenotype Extraction, Overcoming High-Throughput Identification Challenges
The root system is the primary organ for water and nutrient uptake, and its phenotypic traits are closely linked to yield. The research team developed a semantic segmentation algorithm based on the U-Net deep learning model to process cucumber root images. The model performed well across diverse backgrounds and imaging conditions, with a mean Intersection over Union (mIoU) of 0.885, a precision of 0.9601, and an R² of 0.9605 between predicted and actual root surface area. Using this approach, the team efficiently extracted 29 core root traits (such as root length, diameter, angle, and depth) from seedlings of the 263 cucumber accessions and computed a further 35 derived phenotypic indicators. These results established a robust data foundation for subsequent yield correlation analysis.

Figure 1 Root segmentation results of the U-Net model
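
The segmentation metrics quoted above can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists (1 = root pixel, 0 = background) and takes mIoU as the average of the root-class and background-class IoU. The article does not state how the team averaged IoU, so that choice, the function names, and the toy masks are all assumptions for illustration and not the authors' code.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives
    over two same-shaped binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over Union for the positive (value 1) class."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0


def precision(pred, truth):
    """Share of predicted root pixels that are root in the ground truth."""
    tp, fp, _ = confusion_counts(pred, truth)
    return tp / (tp + fp) if (tp + fp) else 1.0


def invert(mask):
    """Swap root and background labels."""
    return [[1 - v for v in row] for row in mask]


def mean_iou(pred, truth):
    """Assumed definition: average IoU over root and background classes."""
    return (iou(pred, truth) + iou(invert(pred), invert(truth))) / 2


# Toy 3x4 masks (hypothetical data, not from the study)
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0]]
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 0]]

print(round(iou(pred, truth), 4), precision(pred, truth), round(mean_iou(pred, truth), 4))
```

On these toy masks one root pixel is missed and one background pixel is wrongly marked as root, which is why IoU penalizes both errors while precision only reflects the false positive.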
2. Machine Learning Constructs Yield Prediction Model, Identifying Key High-Yield Traits
The team integrated cucumber shoot phenotypes (e.g., first female flower node, leaf width, stem diameter) and root phenotype data from different growth stages, employing machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT) to construct yield prediction models. The study found that models relying solely on either root or shoot traits had limited prediction accuracy, while models integrating shoot and multi-stage root traits showed significant performance improvement. Among them, the GBDT+SVM combined model performed best, achieving a prediction R² of 0.6155 and a low Root Mean Square Error (RMSE) of 0.2601 using the "shoot + seedling root" trait combination. Feature importance analysis further pinpointed key high-yield traits such as the first female flower node, leaf width at 4 weeks, stem diameter, and frequency of shallow root angles, confirming these as core factors determining cucumber yield.

Figure 2 Analysis of combined machine learning models and frequency statistics of important features
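
As a rough illustration of how such models are scored, the sketch below computes R² and RMSE for two sets of predictions and for their blend. The article does not say how the GBDT and SVM outputs were combined, so the simple weighted average used here is an assumption, and the yield values are made-up numbers rather than data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def blend(pred_a, pred_b, weight_a=0.5):
    """Assumed combination rule: weighted average of two models' predictions."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical observed yields and per-model predictions (illustration only)
observed = [1.0, 2.0, 3.0, 4.0]
gbdt_pred = [1.2, 1.8, 3.4, 3.6]
svm_pred = [0.8, 2.2, 2.8, 4.0]
combined = blend(gbdt_pred, svm_pred)

for name, pred in [("GBDT", gbdt_pred), ("SVM", svm_pred), ("blend", combined)]:
    print(name, round(r_squared(observed, pred), 4), round(rmse(observed, pred), 4))
```

In this toy case the two models err in opposite directions on several samples, so averaging cancels part of the error. That is one common reason a combined model can outperform its components, though it may not be the mechanism at work in the published model.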
3. Simulated Phenotype Combinations Reveal Interaction Patterns, Designing Ideal High-Yield Plant Architecture for Greenhouses
To identify optimal plant architecture configurations, the team selected 12 core high-yield traits and constructed 157,464 virtual phenotypic combinations. Using the best-performing prediction model for yield simulation, the study ultimately proposed a reference range for high-yield cucumber plant architecture in greenhouse environments. The results indicated that a high-yield cucumber architecture is characterized by a "compact and robust shoot combined with a narrow, thick, shallow root system." Yield gains in high-yielding phenotypic combinations were found to arise mainly from additive trait effects, rather than synergistic interactions. While shoot architecture determines the theoretical upper limit of yield, root architecture governs the extent to which this shoot potential is realized. Pairing a robust shoot with a slender root system created an antagonistic effect that reduced yield. In contrast, combining a weak shoot with a broad, thick root system partially compensated for yield loss through synergistic effects.

Figure 3 Analysis of interactions among simulated cucumber phenotype combinations
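
The count of 157,464 combinations is consistent with a full factorial grid over the 12 traits. Since 157,464 = 3^9 × 2^3, a grid in which every trait takes at least two levels would need nine traits at three levels and three traits at two levels. The article does not list the levels, so this is an inference. The trait names, level labels, and scoring function in the sketch below are hypothetical placeholders, and the placeholder scorer stands in for the trained yield model.

```python
import heapq
import itertools
import math

# Hypothetical trait names and level labels; the source does not list them.
three_level = {f"trait_{i}": ("low", "mid", "high") for i in range(1, 10)}
two_level = {f"trait_{i}": ("low", "high") for i in range(10, 13)}
grid = {**three_level, **two_level}


def count_combinations(grid):
    """Size of the full factorial grid: product of level counts."""
    return math.prod(len(levels) for levels in grid.values())


def iter_combinations(grid):
    """Lazily yield every virtual phenotype as a {trait: level} dict."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def top_combinations(grid, predict, k=5):
    """Rank all virtual phenotypes by a yield predictor and keep the best k."""
    return heapq.nlargest(k, iter_combinations(grid), key=predict)


# Placeholder scorer (an assumption): a purely additive toy score,
# NOT the study's trained yield prediction model.
LEVEL_SCORE = {"low": 0, "mid": 1, "high": 2}


def toy_predict(combo):
    return sum(LEVEL_SCORE[level] for level in combo.values())


total = count_combinations(grid)
best = top_combinations(grid, toy_predict, k=1)[0]
print(total)
```

Generating the combinations lazily avoids holding the whole grid in memory, and swapping `toy_predict` for a real model's prediction function would give a simulation in the same spirit as the one described.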
This study pioneers the integrated application of deep learning and machine learning in analyzing shoot-root phenotypes and predicting yield in cucumber, overcoming the “empirical selection” limitations inherent in traditional breeding. It provides a quantifiable and predictable new approach for optimizing crop plant architecture. The proposed reference range for high-yield greenhouse cucumber architecture offers breeders a directional selection tool, accelerating the development of superior cultivars. It also establishes a theoretical basis for precise cultivation practices in protected cucumber production, such as water and fertilizer regulation and plant architecture management, and thereby contributes to the green agricultural objectives of “resource efficiency and high-quality yield.”

Figure 4 Schematic diagram of machine learning facilitating high-yield cucumber plant architecture construction
Dr. Zhu Cuifang (a graduate, now at the Shanghai Academy of Agricultural Sciences) and Prof. Yu Hongjun from the Institute of Vegetables and Flowers, CAAS, are the co-first authors of the paper. Prof. Jiang Weijie and Associate Prof. Li Qiang are the co-corresponding authors. This research was supported by the National Key R&D Program of China, the China Agricultural Research System for Major Vegetables, and the Shanghai Academy of Agricultural Sciences Smart Agriculture Research Center.
Article Link: https://onlinelibrary.wiley.com/doi/10.1111/pbi.70539