Data Modeling for Inter- and Intra-individual Stability of Young Swimmers' Performance: A Longitudinal Cluster Analysis.

Purpose: The aims of this study were to classify, identify and follow up young swimmers' performance and its biomechanical determinants during two competitive seasons (at seven different moments of assessment, M), and to analyze the individual variations of each swimmer. Method: Thirty young swimmers (14 boys: 12.70 ± 0.63 years-old; 16 girls: 11.72 ± 0.71 years-old) were recruited. A set of anthropometric, kinematic, efficiency, hydrodynamic and mechanical power variables was assessed. Results: The cluster solution (i.e., the ideal number of clusters for this sample) resulted in three clusters, which were named: cluster 1 ("talented"), cluster 2 ("proficient"), and cluster 3 ("non-proficient"). The performance improved between moments of assessment in all clusters (cluster 1, M1: 68.07 ± 6.62 s vs M7: 61.46 ± 3.43 s; cluster 2, M1: 73.14 ± 4.87 s vs M7: 65.33 ± 2.97 s; cluster 3, M1: 82.60 ± 4.18 s vs M7: 70.09 ± 3.48 s). Anthropometric features also increased between moments of assessment, and the remaining biomechanical variables (kinematic, efficiency, hydrodynamic and mechanical power) also increased between M1 and M7 in all clusters. Cluster 1 increased its membership between M1 and M7 (4 to 11 swimmers), cluster 2 decreased (12 to 5), and cluster 3 was maintained (14). Conclusion: It can be concluded that the cluster formation depends on different determinant factors during two competitive seasons, and that young swimmers are prone to change from one cluster to another over this period of time.

The identification and development of sports talent is one of the main topics to consider in sports performance. Swimming is a multifactorial sport, in which interactions between various scientific domains occur. Whereas intrinsic (genetic) factors are self-determined, extrinsic factors (related to environmental conditions, such as the training and development program) can be enhanced by monitoring their implementation over time (Morais, Silva, Marinho, Lopes, & Barbosa, 2017; Zacca et al., 2018). A well-delineated training plan can focus on improving physiological and/or biomechanical variables with a positive effect on performance (Morais, Marques, Marinho, Silva, & Barbosa, 2014; Zacca et al., 2018).
In the search for the identification and prediction of the main determinants of young swimmers' performance, relationships have been reported between performance and anthropometric factors (Latt et al., 2009; Sammoud et al., 2019), kinematics/efficiency (Figueiredo, Silva, Sampaio, Vilas-Boas, & Fernandes, 2016; Morais et al., 2016), strength and power (Amaro, Marinho, Marques, Batalha, & Morouço, 2017; Garrido et al., 2012) and hydrodynamic aspects (Barbosa, Morais, Marques, [...]). Given the existence of this variability, other studies have verified that it is still possible to group young swimmers according to their anthropometric and/or biomechanical similarities at a given moment (Figueiredo et al., 2016), or at various moments throughout a competitive season (Morais, Silva, Marinho, Seifert, & Barbosa, 2015). Nevertheless, the information on this topic is still scarce, since the morphological and technical changes that young swimmers undergo during growth and maturation may interfere with their performance enhancement. One study analyzed the group membership (variations between sub-groups) of an age-group of young swimmers during a single competitive season (Morais et al., 2015). However, young swimmers keep growing, and the transition between competitive seasons may trigger other determinants that were previously not responsible for performance (Latt et al., 2009; Morais et al., 2015). Moreover, other main performance determinants, such as variables related to mechanical power (Barbosa et al., 2015a), are not yet commonly assessed in longitudinal research designs. Nonetheless, a recent study indicated that the fastest swimmers are more likely to produce higher values of mechanical power than swimmers in lower tiers (Barbosa, Bartolomeu, Morais, & Costa, 2019).
In this sense, this study aimed to: (i) classify, identify and follow-up young swimmers into sub-groups (clusters), according to the performance and its biomechanical determinants, during two competitive seasons; (ii) analyze the individual variations (hypothetical changes between sub-groups) of each swimmer, at each moment of assessment. It was hypothesized that the determinant factors responsible for the sub-group formation vary from one moment of assessment to another, and that swimmers may change between clusters over time.

Method
Participants
Thirty young swimmers (14 boys: 12.70 ± 0.63 years-old, FINA points: 234.86 ± 69.76 points; 16 girls: 11.72 ± 0.71 years-old, FINA points: 288.75 ± 67.01 points, at the beginning of the study) were recruited for analysis. Swimmers were in a pre-pubertal stage (Tanner stages 1-2), and a test for a sex effect (one-way ANOVA, significance set at p ≤ 0.05) was non-significant. Therefore, boys and girls were pooled together, as reported elsewhere (Barbosa et al., 2015b). The sample included national record holders, age-group national champions and other swimmers under a talent identification program. Parents or guardians and the swimmers themselves signed an informed consent form. All procedures were in accordance with the Declaration of Helsinki regarding human research, and the University Ethics Board approved the research design.

Study design
This framework was based on a longitudinal follow-up design. Participants were evaluated at seven different moments (M_i) over two consecutive seasons. The swimming seasons were planned on a traditional three-peak performance model, and each moment of assessment corresponded to one of these peaks (end of each macrocycle). Figure 1 depicts the scheme of the moments of assessment during the two seasons. Such moments differed between seasons according to the coaches' advice (i.e., they were defined according to the training program and the competitive calendar of each season). The level of evidence was classified according to the United States Preventive Services Task Force (USPSTF) as Level II-3: evidence obtained from multiple data groups over time with or without intervention.

Performance
The official 100 m freestyle event (short-course swimming pool, i.e., 25 m length) was selected as the performance outcome. It was chosen because it is the most popular event among swimmers of this age-group. The time gap between data collection and the performance was no longer than 15 days.

Anthropometrics
The body mass (BM) was measured on a digital scale (TANITA, Amsterdam, Netherlands). Height (H) was measured as the distance from the vertex to the floor (with the swimmers in the orthostatic position) with a digital stadiometer.
Trunk transverse surface area (TTSA) was measured by digital photogrammetry. The swimmers were photographed by a digital camera (Alpha 6000, Sony, Tokyo, Japan) in the transverse plane (downwards view, i.e., from above) on land simulating the streamlined position (Morais et al., 2011). This position is characterized by the upper limbs being fully extended above the head, one hand over the other, fingers also extended close together and head in neutral position. They wore their regular textile swimsuit, cap and goggles. In the camera shooting field, there was also a calibration pole with 0.945m length at the height of the xiphoid process (Morais et al., 2011). Afterward, the TTSA was measured from each swimmer's digital photo on dedicated software (Udruler, AVPSoft, USA).
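The photogrammetric scaling step can be illustrated with a minimal sketch. It assumes the software reports the calibration pole length and the trunk outline in pixels; the function names and pixel readings below are hypothetical and are not taken from the study, only the 0.945 m pole length is.

```python
def metres_per_pixel(pole_length_m: float, pole_length_px: float) -> float:
    """Metres represented by one pixel, from a calibration object of known length."""
    if pole_length_px <= 0:
        raise ValueError("pole length in pixels must be positive")
    return pole_length_m / pole_length_px


def area_px_to_m2(area_px: float, scale_m_per_px: float) -> float:
    """Convert an area counted in pixels to square metres.

    The linear scale applies along both image axes, so it is squared.
    """
    return area_px * scale_m_per_px ** 2


# Hypothetical readings: the 0.945 m pole spans 630 px in the photo
# and the traced trunk outline encloses 32000 px.
scale = metres_per_pixel(0.945, 630.0)
ttsa_m2 = area_px_to_m2(32000.0, scale)
print(f"scale = {scale:.4f} m/px, TTSA = {ttsa_m2:.3f} m^2")
```

The same conversion would apply to the HSA and FSA scans, with the 2D calibration frame providing the known length.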
For the hand surface area (HSA) and feet surface area (FSA), swimmers placed their dominant hand and foot on the scan surface of a copy machine, and the file was exported to a PC. The scan surface was also fitted with a 2D calibration frame (Morais et al., 2012). The HSA and FSA were measured using the same procedure described for TTSA. The arm span (AS) was also measured by digital photogrammetry, with the swimmers placed in an orthostatic position and both arms in lateral abduction at a 90° angle with the trunk. Both arms and fingers were fully extended. The distance between the tips of the third fingers was measured.

Hydrodynamics
The active drag (D_a) and the active drag coefficient (C_Da) were computed based on the velocity perturbation method (Kolmogorov & Duplishcheva, 1992). Swimmers performed two all-out 25 m front-crawl trials with a push-off start. One trial was carried out towing a hydrodynamic body (i.e., a perturbation device) and the other without towing it (Kolmogorov & Duplishcheva, 1992). A camera (Sony x3000, Tokyo, Japan) was used to record the swimmer's displacement time between the 11th and 24th meter. The swimming velocity was calculated as: v = d/t. The D_a was computed as:
$$D_a = \frac{D_b \, v_b \, v^2}{v^3 - v_b^3}$$
where D_a is the swimmers' active drag at maximal velocity (N), D_b is the resistance of the hydrodynamic body computed from the manufacturer's calibration of the buoy-drag characteristics and its velocity (N), and v_b and v are the swimming velocities with and without the perturbation device (m·s⁻¹). The C_Da was computed as:
$$C_{Da} = \frac{2 D_a}{\rho \cdot TTSA \cdot v^2}$$
where C_Da is the active drag coefficient (dimensionless), D_a is the active drag (N), ρ is the density of the water (1000 kg·m⁻³), TTSA is the trunk transverse surface area (m²), and v is the swimming velocity (m·s⁻¹).
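As an illustration of the velocity perturbation calculation, the following is a minimal sketch assuming the standard form of the method, D_a = D_b·v_b·v² / (v³ − v_b³), and C_Da = 2·D_a / (ρ·TTSA·v²). The function names and the example readings are hypothetical; only the 13 m timing window (11th to 24th meter) and the water density come from the text.

```python
def mean_velocity(distance_m: float, time_s: float) -> float:
    """Mean swimming velocity over the timed section, v = d / t."""
    return distance_m / time_s


def active_drag(d_b: float, v_b: float, v: float) -> float:
    """Active drag (N) from the velocity perturbation method.

    d_b: resistance of the towed hydrodynamic body (N)
    v_b: velocity while towing the body (m/s)
    v:   velocity in free swimming (m/s)
    """
    if v_b >= v:
        raise ValueError("towing velocity must be lower than free velocity")
    return d_b * v_b * v ** 2 / (v ** 3 - v_b ** 3)


def active_drag_coefficient(d_a: float, ttsa_m2: float, v: float,
                            rho: float = 1000.0) -> float:
    """Dimensionless active drag coefficient, 2*D_a / (rho * TTSA * v^2)."""
    return 2.0 * d_a / (rho * ttsa_m2 * v ** 2)


# Hypothetical trial: 13 m timed section, with and without the towed body.
v_free = mean_velocity(13.0, 8.1)
v_towed = mean_velocity(13.0, 8.7)
d_a = active_drag(d_b=8.0, v_b=v_towed, v=v_free)   # d_b is an assumed value
c_da = active_drag_coefficient(d_a, ttsa_m2=0.072, v=v_free)
print(f"D_a = {d_a:.1f} N, C_Da = {c_da:.3f}")
```

Because the denominator v³ − v_b³ is a small difference between two similar numbers, small timing errors can change D_a noticeably, which is one reason to time both trials over the same section.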

Mechanical power
The power to overcome drag (P_d) was computed as (Kolmogorov & Duplishcheva, 1992):
$$P_d = D_a \cdot v$$
where P_d is the power to overcome drag (W), D_a is the swimmers' active drag at maximal velocity (N), and v is the swimming velocity (m·s⁻¹). The external mechanical power (P_ext) and the mechanical power to transfer kinetic energy to water (P_k) were computed respectively as (Zamparo, Pendergast, Mollendorf, Termin, & Minetti, 2005):
$$P_{ext} = \frac{P_d}{\eta_F}$$
where P_ext is the external mechanical power (W), P_d is the power to overcome drag (W), and η_F is the Froude efficiency (dimensionless, described in the kinematics/efficiency section).
$$P_k = P_{ext} - P_d$$
where P_k is the mechanical power to transfer kinetic energy to water (W), P_ext is the external mechanical power (W), and P_d is the power to overcome drag (W).
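The three power terms follow from one another, which a short sketch makes explicit. It assumes P_d = D_a·v, P_ext = P_d/η_F and P_k = P_ext − P_d, as implied by the symbol definitions above; the function name and input values are hypothetical.

```python
def mechanical_power(d_a: float, v: float, eta_f: float):
    """Return (P_d, P_ext, P_k) in watts.

    d_a:   active drag (N)
    v:     swimming velocity (m/s)
    eta_f: Froude efficiency (dimensionless, between 0 and 1)
    """
    if not 0.0 < eta_f <= 1.0:
        raise ValueError("Froude efficiency must lie in (0, 1]")
    p_d = d_a * v          # power to overcome drag
    p_ext = p_d / eta_f    # external mechanical power
    p_k = p_ext - p_d      # power transferred to the water as kinetic energy
    return p_d, p_ext, p_k


# Hypothetical swimmer: 40 N of active drag at 1.5 m/s with eta_F = 0.30.
p_d, p_ext, p_k = mechanical_power(40.0, 1.5, 0.30)
print(f"P_d = {p_d:.0f} W, P_ext = {p_ext:.0f} W, P_k = {p_k:.0f} W")
```

Under these definitions P_k = P_d·(1/η_F − 1), so for a fixed P_d a lower Froude efficiency means more power is spent accelerating water rather than overcoming drag.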

Kinematics/efficiency
The kinematic and efficiency variables were collected during a 25 m maximal front-crawl trial. Swimmers were attached to a speedometer cable (Swim Speedometer, Swimsportec, Hildesheim, Germany) (Barbosa et al., 2015b). A customized software interface in LabVIEW (National Instruments, v. 2009, Austin, TX, USA) was used to acquire (f = 50 Hz), display, and process speed-time data online during the trial.
A 12-bit resolution acquisition card (USB-6008; National Instruments, Austin, TX, USA) was used to transfer the data from the mechanical apparatus to the software application. Thereafter, data were exported to signal processing software (AcqKnowledge v. 3.9.0, Biopac Systems, Santa Barbara, CA, USA) and filtered with a fourth-order low-pass Butterworth filter (5 Hz cutoff). The intra-cyclic variation of the swimming velocity (dv) was computed as:
$$dv = \frac{\sqrt{\sum_i (v_i - \bar{v})^2 F_i / n}}{\sum_i v_i F_i / n}$$
where dv is the intra-cyclic variation of the swimming velocity (dimensionless), v̄ is the mean swimming velocity (m·s⁻¹), v_i is the instantaneous swimming velocity (m·s⁻¹), F_i is the acquisition frequency, and n is the number of observations. The mean value of three consecutive full strokes during the middle 15 m was considered for analysis. The stroke frequency (SF) was measured as the number of cycles per unit of time, from the time taken to complete one full cycle (f = 1/t; the mean of three consecutive full stroke cycles was used for analysis), and afterward converted to Hz. The stroke length (SL) was computed as (Craig & Pendergast, 1979): SL = v/SF, where SL is the stroke length (m), v is the swimming velocity (m·s⁻¹), and SF is the stroke frequency (Hz). The stroke index (SI) was computed as (Costill et al., 1985): SI = v·SL, where SI is the stroke index (m²·s⁻¹), v is the swimming velocity (m·s⁻¹), and SL is the stroke length (m). The Froude efficiency (η_F) was computed as (Zamparo et al., 2005):
$$\eta_F = \left(\frac{v \cdot 0.9}{2\pi \cdot SF \cdot l}\right) \cdot \frac{2}{\pi}$$
where η_F is the Froude efficiency (dimensionless), v is the swimming velocity (m·s⁻¹), SF is the stroke frequency (Hz), and l is the average shoulder-to-hand distance (m).
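The kinematic and efficiency variables can be sketched as follows. This is a minimal illustration, not the study's processing pipeline: it assumes every speed sample carries equal weight (F_i = 1), so that dv reduces to the coefficient of variation of the speed samples, and it omits the Butterworth filtering step. All function names and sample values are hypothetical.

```python
import math


def intra_cyclic_variation(speeds):
    """dv as the coefficient of variation of instantaneous speed.

    Assumes equal weighting of samples (F_i = 1 for every observation).
    """
    n = len(speeds)
    mean_v = sum(speeds) / n
    sd = math.sqrt(sum((s - mean_v) ** 2 for s in speeds) / n)
    return sd / mean_v


def stroke_frequency(cycle_time_s: float) -> float:
    """Stroke frequency in Hz from the duration of one full cycle, f = 1/t."""
    return 1.0 / cycle_time_s


def stroke_length(v: float, sf_hz: float) -> float:
    """Stroke length in metres, SL = v / SF."""
    return v / sf_hz


def stroke_index(v: float, sl_m: float) -> float:
    """Stroke index in m^2/s, SI = v * SL."""
    return v * sl_m


def froude_efficiency(v: float, sf_hz: float, arm_length_m: float) -> float:
    """Froude efficiency, (0.9 * v / (2 * pi * SF * l)) * (2 / pi)."""
    return (0.9 * v / (2.0 * math.pi * sf_hz * arm_length_m)) * (2.0 / math.pi)


# Hypothetical swimmer: 1.5 m/s, one cycle every 1.25 s, l = 0.55 m.
v = 1.5
sf = stroke_frequency(1.25)
sl = stroke_length(v, sf)
si = stroke_index(v, sl)
eta_f = froude_efficiency(v, sf, 0.55)
dv = intra_cyclic_variation([1.3, 1.5, 1.7, 1.6, 1.4])
print(f"SF={sf:.2f} Hz SL={sl:.2f} m SI={si:.2f} m^2/s eta_F={eta_f:.2f} dv={dv:.3f}")
```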

Data analysis
The normality and homoscedasticity assumptions were analyzed with the Kolmogorov-Smirnov and Levene tests, respectively. The mean ± one standard deviation and the 95% confidence interval (95CI) were calculated as descriptive statistics. The cluster modeling was based on the k-means approach (non-hierarchical), which permits the number of clusters to be defined in advance. The k-means approach defines a centroid (i.e., the mean of a group of points/subjects) for each cluster, and assigns subjects to clusters based on their similarities (Rein, Button, Davids, & Summers, 2010). To ensure a coherent comparison of data sets with different magnitudes and/or units, standardized z-scores of all variables were computed. A one-way ANOVA was used to identify the main determinants responsible for the cluster formation at each moment of assessment (p ≤ 0.05). A discriminant analysis was performed to validate the cluster formation. The total eta squared (η²) was selected as the effect size index, and deemed as: (i) without effect if 0 < η² ≤ 0.04; (ii) minimum if 0.04 < η² ≤ 0.25; (iii) moderate if 0.25 < η² ≤ 0.64; and (iv) strong if η² > 0.64 (Ferguson, 2009). The swimmers' changes between clusters were analyzed by visual inspection.
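The standardization and clustering steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the statistical software used in the study: z-scores use the sample standard deviation, distances are Euclidean, and the centroids are initialized with a deterministic farthest-first rule (the source does not state how the clusters were initialized). The data rows are hypothetical.

```python
import math


def z_scores(column):
    """Standardize one variable to mean 0 and (sample) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def standardise(rows):
    """Apply z_scores column by column to a list of per-swimmer rows."""
    columns = [z_scores(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def k_means(points, k, n_iter=100):
    """Lloyd's k-means with farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each swimmer joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical rows: (100 m time in s, height in cm, P_ext in W) per swimmer.
rows = [
    (62.0, 165.0, 120.0), (63.5, 163.0, 115.0),
    (72.0, 158.0, 80.0), (73.0, 157.0, 85.0),
    (81.0, 150.0, 55.0), (82.5, 149.0, 50.0),
]
labels, centroids = k_means(standardise(rows), k=3)
print(labels)
```

The cluster numbers returned by such a routine are arbitrary; naming them ("talented", "proficient", "non-proficient") would require ranking the clusters afterward, for instance by mean performance.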
Results
All anthropometric variables increased between the first (M1) and last (M7) moment of assessment (Table 1). Overall, the kinematic variables (SF, SL, v, and dv), mechanical power (P_d, P_ext and P_k) and efficiency (η_F and SI) improved between the first (M1) and last (M7) moment of assessment. The hydrodynamic variables (D_a and C_Da) increased (i.e., a negative effect on performance), but this may be related to the increase in swimming velocity, which is directly related to the drag (Table 1).
The determinant factors that better discriminated the cluster solutions (based on higher F-ratio and standardized η² effect) differed between moments of assessment (M1: CP and BM; M2: P_d and P_ext; M3: H and AS; M4: BM and CP; M5: H and P_k; M6: CP and BM; M7: FSA and BM) (Table 1).
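How a variable's discriminating power could be quantified is sketched below: a one-way ANOVA across clusters gives the F-ratio, and η² is the between-cluster share of the total sum of squares, labeled with the thresholds stated in the data analysis section (Ferguson, 2009). The function names and example values are hypothetical, and the sketch does not compute p-values.

```python
def one_way_anova(groups):
    """Return (F-ratio, eta squared) for a list of groups of observations."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_ratio, eta_sq


def effect_label(eta_sq: float) -> str:
    """Interpret eta squared with the thresholds used in this study."""
    if eta_sq <= 0.04:
        return "without effect"
    if eta_sq <= 0.25:
        return "minimum"
    if eta_sq <= 0.64:
        return "moderate"
    return "strong"


# Hypothetical body mass (kg) of the swimmers assigned to three clusters.
bm_by_cluster = [[52.0, 55.0, 54.0], [47.0, 46.0, 49.0], [40.0, 42.0, 41.0]]
f_ratio, eta_sq = one_way_anova(bm_by_cluster)
print(f"F = {f_ratio:.1f}, eta^2 = {eta_sq:.2f} ({effect_label(eta_sq)})")
```

Repeating this for every variable at one moment of assessment and sorting by F-ratio would give a ranking of the kind reported per moment in Table 1.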
The determinant factors that characterized each cluster (responsible for the swimmers' similarities) at each moment of assessment are presented in Table 1 and summarized in Table 2. For example, at the first moment of assessment (M1), cluster 1 was characterized by high P_d, P_ext and P_k (power), cluster 2 by large BM, H and CP (anthropometrics), and cluster 3 by [...]

Table 1. Descriptive statistics (mean ± one standard deviation) and 95CI for all parameters analyzed in each cluster, at each moment of assessment. The F-ratios are also presented, showing the determinant factors responsible for the cluster formation (and corresponding effect size: eta squared, η²).

Between the first (M1) and last (M7) moment of assessment, cluster 1 increased its membership (from four to 11 swimmers), cluster 2 decreased (from 12 to five swimmers), and cluster 3 was maintained (14 swimmers). A deeper insight into the swimmers' shifts is portrayed in Figure 3 (panel b). The initial four swimmers included in cluster 1 at M1 maintained their membership at M7. Of the 12 swimmers included in cluster 2, only four kept their membership at M7; thus, the remaining eight shifted to other clusters. On the other hand, of the 14 swimmers assigned to cluster 3 at M1, 11 kept the same membership at M7 (only three changed to another cluster). Altogether, out of 30 swimmers, only eight did not shift cluster membership at any moment. The remaining 22 changed their cluster membership at least once over the two consecutive seasons.

Discussion
The main aims of this study were to classify, identify and follow up young swimmers' performance and determinant factors, gathered into sub-groups during two competitive seasons, and to analyze the variations in cluster membership over time. It was verified that the selected variables increased and/or improved between the first and the last moment of assessment. At each moment of assessment, the determinant factors responsible for the characterization of the clusters were different. It was found that most swimmers shift clusters over time.
The performance improved over the two seasons, and the determinant factors presented an overall trend to increase/enhance. Previous studies monitoring young swimmers have shown an improvement in performance and its determinant factors between moments of assessment (Latt et al., 2009; Morais et al., 2014; Zacca et al., 2019a). In the present study, performance showed the same trend (except between the end of the first season and the beginning of the second).
However, if more moments of assessment are included (as in this study), some of the determinant factors (kinematic, efficiency, hydrodynamic and mechanical power variables) may show slight and circumstantial decreases/increases (i.e., a sinusoidal profile) between moments of assessment (Table 1). These increases and/or decreases do not occur concurrently in all determinant factors (e.g., between M1 and M2, v and SL decreased and D_a increased). Moreover, the non-linear trend may not occur simultaneously in all clusters. For example, between M5 and M6 the SF increased in the swimmers included in clusters 1 and 2, but decreased in cluster 3.
Most studies with young swimmers seek to determine/predict the performance determinants, but rely on cross-sectional designs (Saavedra, Escalante, & Rodriguez, 2010). These cross-sectional studies do not give insight into the variations that occur in the performance determinants over time. The data of the present study show that young swimmers are prone to change (increase/decrease) their stroke mechanics (kinematics and efficiency), hydrodynamics and mechanical power at least three times in each competitive season. As sub-groups were formed, a deeper insight shows that such variations may not occur at the same time in all clusters.
The best solution to classify and identify the swimmers during two competitive seasons consisted of three clusters. It was decided to use the labels already found in the literature (Morais et al., 2015). This cluster formation is due to the similarities/differences between swimmers. Throughout the seven moments of assessment, different determinant factors explained the clusters' formation, indicating that young swimmers' performance is a holistic phenomenon (Morais et al., 2017) (Table 1). For swimmers of a similar age-group and competing at the same level, the main determinant factors that contributed to the grouping of swimmers (the main factors of differentiation) varied between moments of assessment (e.g., at M1 they were CP and BM, and at M2 P_d and P_ext) (Table 1). This information is relevant for coaches and swimmers, since it shows that within the same age-group and/or competitive level it is possible to group swimmers according to their anthropometric and/or technical characteristics. This suggests that coaches should not put all swimmers of the same age-group and/or competitive level under the same training regime, as if one size fits all. Coaches should design the training and development programs in tandem with the anthropometric and biomechanical characteristics of each sub-group of swimmers (Table 2).
Overall, it has been suggested that the evolution of an age-group of swimmers follows a process of natural selection. For example, during a competitive season, the number of "talented" swimmers will decrease, and likewise the number of swimmers assigned to the "non-proficient" cluster will increase (Morais et al., 2015). However, this is just an overall analysis of the swimmers' stability in each cluster. Indeed, such an analysis does not allow a deeper understanding of the factors behind the number of swimmers who change cluster.
For example, a cluster can keep the same number of swimmers from one moment of assessment to another even if new swimmers are assigned to this cluster and others leave. The cluster size then appears stable, yet this gives no insight into individual variation (Morais et al., 2015). This issue of individuality seems to be extremely important, since some studies suggested that each athlete should be seen as "unique" and that there are several ways to improve performance (Durand-Bush & Salmela, 2002; Morais et al., 2014). Thus, a visual inspection was used to understand the individual variation of each swimmer (Figure 3b). It is possible to note that the natural selection argument may not be clear. Data from this study show that in the first season the number of "talented" swimmers was the same at M1 and M3 (4 swimmers), the number of "proficient" swimmers increased (from 12 to 16), and the number of "non-proficient" swimmers decreased (from 14 to 10). However, immediately at the beginning of the second competitive season (M4), the "talented" cluster increased its membership (six swimmers).
Interestingly, at M4 (baseline of the second season) the determinant factors responsible for the cluster formation were BM, CP, and H (anthropometric features). Swimmers keep growing over the break when they are not undergoing training, which may promote significant changes in their technical profile, enhancing their performance (Moreira et al., 2014; Zacca et al., 2019b). Thus, assessing young swimmers over a longer timeframe (including the non-training periods) will permit a deeper understanding of the swimmers' inter-individual changes. Moreover, it is also possible to note that of the four swimmers initially assigned to the "talented" cluster, three changed (all dropped one level, from cluster 1 to cluster 2), and hence only one kept his/her membership during the entire two seasons (S7; Figure 3b). On the other hand, five swimmers that were included in cluster 2 ("proficient") were promoted to the "talented" cluster (S1, S10, S11, S14, and S24), and two swimmers included in the "non-proficient" cluster at M1 were able to change their membership to the "talented" one at M7 (S21 and S30). In this sense, it seems that training does have a major effect on performance enhancement at young ages, simultaneously with the anthropometric features.
The main limitations of this study are: (i) the determinant factors responsible for the cluster formation at each moment of assessment may only be reliable for short-distance events (i.e., 100 m freestyle); (ii) the relationship between the external workload and the main determinants of each cluster was not assessed and could be checked in further studies.
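The membership counts reported for M1 and M7 can be cross-checked with a small transition table. The sketch below uses only counts stated in the text (cluster sizes at M1 and M7, swimmers who kept their cluster, and the promotions to cluster 1); the two remaining cells are inferred by subtraction, which is an assumption of this illustration rather than a reported result. The function name is hypothetical.

```python
# Clusters: 1 = "talented", 2 = "proficient", 3 = "non-proficient".
M1_SIZES = {1: 4, 2: 12, 3: 14}
M7_SIZES = {1: 11, 2: 5, 3: 14}

# (cluster at M1, cluster at M7) -> number of swimmers, as stated in the text.
reported = {
    (1, 1): 4,    # all four initial cluster-1 swimmers are in cluster 1 at M7
    (2, 2): 4,    # cluster-2 swimmers who kept their membership
    (3, 3): 11,   # cluster-3 swimmers who kept their membership
    (2, 1): 5,    # promoted from "proficient" to "talented"
    (3, 1): 2,    # promoted from "non-proficient" to "talented"
}


def fill_missing(cells, row_totals):
    """Infer unreported cells so that each M1 row sums to its cluster size."""
    table = dict(cells)
    clusters = sorted(row_totals)
    for r in clusters:
        missing = [c for c in clusters if (r, c) not in table]
        remainder = row_totals[r] - sum(table.get((r, c), 0) for c in clusters)
        if remainder == 0:
            for c in missing:
                table[(r, c)] = 0
        elif len(missing) == 1:
            table[(r, missing[0])] = remainder
        else:
            raise ValueError(f"row {r} cannot be completed unambiguously")
    return table


table = fill_missing(reported, M1_SIZES)
column_totals = {c: sum(table[(r, c)] for r in M1_SIZES) for c in M1_SIZES}
same_cluster_at_both_ends = sum(table[(c, c)] for c in M1_SIZES)

print(table)
print(column_totals == M7_SIZES, same_cluster_at_both_ends)
```

Under this reading, the completed table reproduces the M7 cluster sizes, which suggests the reported counts are internally consistent. The number of swimmers in the same cluster at M1 and M7 is larger than the eight who never shifted at any moment, because a swimmer can leave a cluster mid-study and return to it by M7.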

What does this article add?
Since the determination/formation of each cluster depended on different factors at each of the moments of assessment, swimming organizations should be aware of the existence of sub-groups among swimmers in the same age bracket or competitive level. Moreover, those sub-groups might not be determined by the same factors. Coaches should therefore monitor and evaluate their swimmers in order to understand in which sub-group a given swimmer is clustered and which are the main determinants responsible for the performance. This will help coaches to apply a specific training plan according to such swimmer characteristics, and to understand which determinants the swimmer can improve. Additionally, coaches should be aware that each swimmer can shift his/her membership over time. Therefore, as much as possible, coaches should design training and development programs catering to the needs of each sub-group of swimmers and re-assign swimmers to the most effective program depending on their state of development (i.e., their cluster membership) at any given time. Since the determinant factors that are responsible for each cluster's performance are different and can change in a short timeframe, training should be specific for each cluster (group of swimmers with similarities). Coaches should monitor their swimmers regularly in order to understand when they are ready to shift to a better (i.e., faster) cluster, or when they shift to a worse (i.e., slower) cluster, so that they can understand the reasons and change/adapt the training plan.
In conclusion, it is possible to group a set of swimmers of the same age-group and/or competition level into three different sub-groups (clusters). The factors responsible for the cluster formation vary over time, and the rate of change in cluster membership is also considerable. Coaches should pay attention to the swimmers' intra-individual variability, i.e., the swimmers' changes between clusters. Swimmers included in the fastest cluster may not maintain their membership over a given timeframe, and may shift to a slower cluster. The inverse may also occur, where slower swimmers shift to faster clusters. In summary, this research supports the understanding that performance at such early ages is a non-linear, dynamic, complex and holistic phenomenon.