Abstract
Background. In Poland, there are limited validated outcome measures to evaluate upper extremity function in stroke patients for clinical and research use. The Action Research Arm Test (ARAT) aims to assess functional performance of the upper extremities.
Objectives. To translate and culturally adapt the original version of ARAT into Polish, and to determine its reliability and validity.
Materials and methods. A Polish version of ARAT (ARAT-PL) was developed using a forward-backward translation. The study then examined 60 patients with subacute stroke. Internal consistency (α), test–retest and inter-rater reliability (intra-class correlation (ICC), κ), standard error of measurement (SEM), minimal detectable change (MDC), and floor and ceiling effects were determined. The construct validity was evaluated using the method of hypothesis testing based on the results of correlations (rho) between subscale and total scores of the ARAT-PL and the upper and lower extremity section of the Fugl–Meyer Assessment (FMA-UE and FMA-LE).
Results. The internal consistency of the total and subscale scores was excellent (α = 0.97–0.99). Test–retest and inter-rater agreement for individual items was almost perfect (κ = 0.85–1.0), and reliability of the total and subscale scores was excellent (ICC = 0.99–1). The SEM and MDC were 0.479 and 1.327 points, respectively, for test–retest reliability, and 0.335 and 0.930 points for inter-rater reliability. The ceiling effect amounted to 48%. The validity levels with respect to FMA-UE and FMA-LE were found to be high (rho ranging from 0.70 to 0.83) and moderate (rho ranging from 0.53 to 0.68), respectively.
Conclusions. A Polish version of ARAT is a reliable and valid tool for assessing upper extremity function in subacute stroke patients in Poland. However, it appears to have a ceiling effect that limits differentiation of patients with mild upper limb impairment.
Key words: stroke, outcome measure, ARAT, FMA, upper extremity function
Background
Stroke is a leading cause of disability in modern societies.1, 2 In a large number of patients, motor and functional deficits are observed after a stroke.3 Upper extremity dysfunction is present in approx. 30–66% of stroke survivors4; it manifests in limitations in reaching and grasping movements, resulting in serious deterioration in the ability to perform daily living activities.5, 6
There are various outcome measures to assess the level of upper limb functional capacity after stroke. Examples frequently seen in the literature and practice include the Fugl–Meyer Assessment for Upper Extremity (FMA-UE), the Jebsen Hand Function Test and the Chedoke Arm and Hand Activity Inventory.7 One of the commonly used upper extremity assessment measures for post-stroke patients is the Action Research Arm Test (ARAT).8 This test was described by Lyle in 1981,9 and was based on Carroll’s Upper Extremity Function Test.10, 11, 12 It was designed for observation of the arm and hand during grasping, gripping, pinching and gross movements in people with cortical damage.13 Previous studies have shown good psychometric properties of this instrument in stroke patients.13, 14, 15 The ARAT has shown excellent internal consistency in stroke patients with mild-to-moderate hemiparesis (α = 0.98).16 The test–retest and inter-rater reliability, as measured by the intra-class correlation coefficient (ICC = 0.92–0.99) for the total and all subscale scores, was similarly excellent when tested in patients with subacute stroke.7, 16, 17, 18 Studies examining the convergent validity of ARAT have reported moderate, good or excellent correlations between the absolute (rho = 0.77–0.94) and subscale scores (rho = 0.67–0.74) of ARAT and FMA.17, 18, 19 To date, the original version of the ARAT has been translated into Swedish,20 Chinese21 and Spanish (in Chile).22 No studies have reported cultural adaptation of ARAT or assessed its reliability and validity in Polish stroke survivors; thus, there is a significant need to develop a Polish version of ARAT.
Objectives
The principal aim of the present study was to estimate the reliability and construct validity of a translated and culturally adapted Polish version of ARAT in the population of subacute patients with stroke. Construct validity was estimated by the correlation of total and subscale ARAT scores with scores for the translated upper and lower extremity sections of the FMA.23
Materials and methods
Translation and cultural adaptation
Forward, backward and final translation
The ARAT was translated into Polish by following the international guidelines.24, 25, 26, 27 Permission for the outcome measure translation was obtained from the author, Ronald Lyle (Wolters Kluwer Health rights). In the 1st and 2nd stages, the English version of ARAT was independently translated using a process that encompassed semantic, idiomatic, experiential, and conceptual meaning. The translation was performed by 2 bilingual Polish translators fluent in English (1 specialized in the physiotherapy field). The 2 Polish versions were compared; the differences between the translations were discussed and corrected, and the draft of the common version was jointly established.
In the 3rd stage, this Polish draft was back translated into English independently by 2 certified English translators. A common retranslated English version was then created. This was compared to the original English version by 2 native-speaking translators specialized in health sciences and rehabilitation. Where necessary, corrections were made to the retranslated version.
In the 4th stage, a panel of judges consisting of a neurologist, a psychologist, 2 neurological physiotherapists, 1 clinical neurophysiology specialist and 1 orthopedic physiotherapist, and all the translators compared and discussed the differences between the translated and original versions of the ARAT. Based on the detected differences, the emerging Polish version of ARAT was corrected to obtain a satisfactory harmony between cultural language requirements and the original English instrument. Lastly, the linguistic consistency between the final Polish and original English versions was carefully verified to ascertain the equivalence of concepts. The final version of ARAT-PL was thereby established (stages 5 and 6).
Study design, participants and initial evaluation
This was a cross-sectional study that lasted for 7 months. We recruited 60 stroke patients from the Bonifraterskie Medical Center hospital in Piaski, Poland. The inclusion criteria were: 1) a diagnosis of stroke, as indicated by computed tomography (CT) scans or magnetic resonance imaging (MRI), 2) hemiparesis, and 3) no additional orthopedic or neurological disabling deficits. The exclusion criteria were: 1) total hemiparesis in the upper extremity (i.e., score = 4 on the Modified Ashworth Scale), 2) serious visual and hearing disorders, 3) cognitive decline that limited administration of the tests, 4) disorders of speech and language, and 5) a native language other than Polish.
During the initial evaluation, we collected demographic data such as age, gender, weight, height, and upper limb lateralization. We also collected clinical data such as duration of illness, type of lesion, location of lesion, involved side, presence of comorbidities, and duration of rehabilitation in the hospital.
The study was approved by the Bioethical Committee of Poznan University of Medical Sciences (approval No. 187/19) and was carried out in accordance with the Declaration of Helsinki. Informed consent was obtained from all participants at the time of their enrolment in the study.
Procedure of assessment
The ARAT-PL and FMA were carried out by 2 experienced neurological physiotherapists trained in administration of each measure. Reproducibility, i.e., the degree to which the score is free from random error, was assessed with test–retest and inter-rater procedures.28 To determine inter-rater reliability, 2 raters independently examined patients at the same time in a quiet hospital room.29 Test–retest reliability was obtained by 1 observer examining the patients twice on the same day with a 2-h gap between assessments.29 The results were collected for the total and subscales of ARAT and FMA.
Outcome measures
Action Research Arm Test
This clinical scale is an evaluative measure to assess dexterity and object-handling ability. It was initially designed for individuals who sustained stroke resulting in hemiplegia. The original ARAT consists of 4 subtests: Grasp, Grip, Pinch, and Gross Movement. Every item within the subtest is assessed on a 4-point ordinal scale and arranged with the most difficult task 1st and the easiest 2nd.17
Fugl–Meyer assessment
The FMA is a recommended clinical assessment of sensorimotor function of the upper and lower extremities; it has mostly been used after stroke.30 The FMA has been translated into Polish but has not yet been cross-culturally adapted.23 The present study administered only the motor domain of the (as yet unpublished) Polish version of FMA for the upper extremity (FMA-UE) and lower extremity (FMA-LE).23 The maximum score for the total motor scale is 100 points (66 for FMA-UE and 34 for FMA-LE).18, 31, 32
Statistical analyses
The statistical analysis was performed using Statistica v. 13 (Tibco Software Inc Polska, Cracow, Poland) and RStudio (the psych package, v. 2.4.3).33
Internal consistency
Internal consistency evaluates the homogeneity of the scale items.28 This study used Cronbach’s α to assess internal consistency for the subscales and total scale.34, 35
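As an illustration of the statistic named above, the following minimal sketch computes Cronbach’s α from a patients-by-items table of scores. It is not the code used in this study (the analyses were run in Statistica and R); the function name, the use of sample variances and the demonstration ratings are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per patient, one column per item."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 patients")
    # Sample variance of each item (column) across patients.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each patient's summed score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings for 5 patients on 3 items scored 0-3 (not study data).
demo_scores = [
    [3, 3, 2],
    [2, 2, 2],
    [0, 1, 0],
    [3, 2, 3],
    [1, 1, 0],
]
demo_alpha = cronbach_alpha(demo_scores)
print(round(demo_alpha, 3))
```

Values close to 1 indicate that the items vary together across patients, which is the sense in which the scale is described as homogeneous.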
Reliability
The test–retest and inter-rater reliability of ARAT were determined using kappa coefficients, the ICC (ICC 2,k, absolute agreement; computed in RStudio with the psych package using the commands library(psych) and ICC(dane[,c(1,2)])$results[5,], where dane is the data frame holding the 2 sets of ratings), and percentage of agreement (PA).8, 36, 37 Item reliability was established when more than 80% agreement was observed.8 The standard error of measurement (SEM) and minimal detectable change (MDC) were calculated for all scale items according to Equations 1 and 2:
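$$\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \mathrm{ICC}} \tag{1}$$
$$\mathrm{MDC} = 1.96 \times \sqrt{2} \times \mathrm{SEM} \tag{2}$$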
where ICC is the reliability of the test and SD is the standard deviation of all scores.
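The chain from ratings to ICC, SEM and MDC can be sketched as follows. This is an illustrative reimplementation, not the study’s R code. It assumes the Shrout–Fleiss two-way random-effects formula for ICC(2,k), SEM = SD × √(1 − ICC), and MDC at the 95% confidence level = 1.96 × √2 × SEM. That MDC multiplier is an assumption, chosen because it matches the ratio of the MDC and SEM values reported in this paper. The function names and demonstration scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    `ratings` holds one row per patient and one column per rater or session.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    # Two-way ANOVA sums of squares: patients (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def sem(sd, icc):
    """Standard error of measurement; clamps tiny negative rounding errors."""
    return sd * sqrt(max(0.0, 1.0 - icc))


def mdc95(sem_value):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * sqrt(2) * sem_value


# Hypothetical total scores (0-57) from 2 raters for 6 patients (not study data).
demo_ratings = [[57, 57], [40, 41], [12, 12], [30, 29], [55, 55], [3, 4]]
demo_icc = icc_2k(demo_ratings)
# Assumption: SD is the sample SD pooled over all ratings.
demo_sd = stdev([x for row in demo_ratings for x in row])
demo_sem = sem(demo_sd, demo_icc)
print(round(demo_icc, 4), round(demo_sem, 3), round(mdc95(demo_sem), 3))
```

Applying mdc95 to the rounded SEM values reported in this paper (0.479 and 0.335) gives approx. 1.328 and 0.929, which agrees with the reported MDC values of 1.327 and 0.930 to within rounding.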
Validity
Construct validity was evaluated using hypotheses testing according to the guidelines of the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN).27 A total of 10 independent hypotheses were formed. For each of them, we defined the anticipated Spearman’s rank correlation direction, correlation strength, and rationale; upon these, we based the hypothesis (Table 1, Table 2).38 We assessed the relationships of ARAT-PL scores with scores in FMA-UE (5 hypotheses) and FMA-LE (5 hypotheses) to determine the degree to which they were consistent with the formulated hypotheses. The construct validity rating for ARAT-PL was assessed according to the total number of confirmed hypotheses: 8–10 (≥75%) indicated high construct validity, while 5–7 (≥50%) indicated a moderate level.27 The threshold values for the correlations determined in the present study (Table 1, Table 2) were based on those indicated by Prinsen et al.27
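The rating rule described above can be written as a short sketch. The 75% and 50% cut-offs are those stated in the text. The function name and the label "low" for fewer than 50% confirmed hypotheses are assumptions, as the text does not name that category.

```python
def rate_construct_validity(confirmed, total):
    """Rate construct validity from the share of confirmed hypotheses."""
    if total <= 0 or not 0 <= confirmed <= total:
        raise ValueError("confirmed must lie between 0 and total")
    share = confirmed / total
    if share >= 0.75:
        return "high"
    if share >= 0.50:
        return "moderate"
    return "low"  # assumed label; not defined in the text


# 10 hypotheses in total: 5 for FMA-UE and 5 for FMA-LE.
print(rate_construct_validity(8, 10))  # meets the >=75% cut-off
print(rate_construct_validity(5, 10))  # meets the >=50% cut-off only
```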
Spearman’s rank correlation is computationally identical to Pearson’s product-moment coefficient calculated on the ranked data. Therefore, we estimated the required sample size for Spearman’s correlation using G*Power (v. 3.1.9.2; Kiel University, Germany)39 with the procedure for Pearson’s correlation (bivariate normal model). We assumed a correlation ρ H1 of 0.70 (we expected moderate correlation), an alpha of 0.05, a power of 0.95, and a correlation ρ H0 of 0.0 (we expected low correlation) for a 1-tailed test (we expected the correlation between both measures to be positive). The calculation estimated that at least 44 participants were necessary; our study had a sample of 60.
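The relationship between the 2 coefficients can be illustrated with a minimal sketch: Spearman’s rho is obtained by replacing each score with its rank (tied scores receive the average of their ranks) and then applying Pearson’s formula to the ranks. The function names and example values are hypothetical, and this is not the software used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relation: rho is 1 although Pearson's r is below 1.
x_demo = [1, 2, 3, 4, 5]
y_demo = [1, 4, 9, 16, 25]
print(round(spearman(x_demo, y_demo), 3), round(pearson(x_demo, y_demo), 3))
```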
Floor and ceiling effects
Floor and ceiling effects were determined as the proportion of answers scoring beyond the lower (floor) and upper (ceiling) boundaries of the total ARAT score (0–57 points). The cutoff points for these boundaries were established at 5%, so that scores under 3 were considered as the floor and those above 54 as the ceiling. Floor and ceiling effects were established if more than 20% of patients fell outside either the set lower or upper boundaries.21 The level of significance selected throughout was p < 0.05.
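The rule above can be sketched as follows, using the cut-offs stated in the text (scores under 3 as the floor, scores above 54 as the ceiling, and an effect declared when more than 20% of patients fall beyond a boundary). The function name and the example scores are hypothetical.

```python
def floor_ceiling(scores, floor_below=3, ceiling_above=54, threshold=0.20):
    """Share of patients at the floor and ceiling, and whether each exceeds the threshold."""
    n = len(scores)
    floor_share = sum(s < floor_below for s in scores) / n
    ceiling_share = sum(s > ceiling_above for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical total ARAT scores (0-57) for 10 patients (not study data).
demo_totals = [0, 5, 10, 30, 40, 55, 56, 57, 57, 57]
demo_result = floor_ceiling(demo_totals)
print(demo_result)
```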
Results
Clinical characteristics of the patients
A total of 60 subjects in the subacute stage of stroke participated in the examination, of whom 63.3% had left hemiplegia and 36.7% right hemiplegia. Of the patients, 31.6% were women and 68.4% were men. The mean age was 64 years (range: 31–85 years). The median length of time since stroke was 47 days (range: 22–138 days). Most of the patients were right-handed (93.3%); only 6.7% were ambidextrous.
Translation and cultural adaptation
Multiple linguistic changes were required in the forward, backward, and final versions of the translation to obtain an ARAT-PL that was as consistent as possible with the original English version (Table 3).
Reliability
The ARAT-PL Grip, Grasp, Pinch, and Gross Movement Test subscale items exhibited almost perfect agreement: the calculated test–retest kappa values ranged from 0.95 to 1.00 (Table 4). The ICC coefficients for the subscales and total instrument score were in a range of 0.99–1.00, indicating excellent reliability (Table 4). For the test–retest measurements, the standard error of measurement and minimal detectable change calculated for the subscales and total ARAT-PL score ranged from 0 to 0.479 and 0 to 1.327 points, respectively (Table 5).
Inter-rater kappa values for the ARAT-PL subscale items ranged from 0.85 to 1.00 (Table 4); they exhibited almost perfect agreement. The ICC (2,k) coefficient values calculated for each subscale and the total score were above 0.99, showing excellent reliability (Table 5). The inter-observer standard error of measurement and minimal detectable change calculated for the subscales and total ARAT-PL score ranged from 0.112 to 0.335 and 0.310 to 0.930, respectively (Table 5).
Internal consistency
The total ARAT-PL score exhibited excellent internal consistency with a Cronbach’s α value of 0.99. Similarly, the Cronbach’s α values for the grasp, grip, pinch, and gross movement items amounted to 0.99, 0.98, 0.97, and 0.99, respectively. The Cronbach’s α value for the FMA-UE was 0.93, also indicating excellent internal consistency.
Validity
Scatter plots of the correlations between the ARAT-PL and FMA scores are shown in Figure 1. There were high correlations between ARAT-PL and FMA-UE absolute scores, and between all ARAT-PL subscale scores and FMA-UE absolute scores (Table 1). There were moderate correlations between the ARAT-PL and FMA-LE absolute scores, and between all ARAT-PL subscales’ scores and FMA-LE absolute scores (Table 2). Results for the hypotheses testing correlations are shown in Table 1 for the associations with FMA-UE and Table 2 for the associations with FMA-LE. Based on the absolute scoring method, ARAT-PL has 5 out of 10 hypotheses confirmed (50%), indicating moderate construct validity (Table 1, Table 2).
Floor and ceiling effects
The Polish version of ARAT had a significant ceiling effect, spanning 48% of tested patients, but no floor effect (12% of patients). Both FMA-UE and FMA-LE also showed significant ceiling effects (50% and 30% of patients, respectively) but no floor effect (0% of patients).
Discussion
This is probably the first reported cross-cultural translation and adaptation of ARAT into Polish based on rigorous methodology and strict regulation of this process. This study assessed the reliability and construct validity of a Polish version of ARAT. ARAT-PL showed excellent reliability, and the hypotheses tested indicated moderate construct validity.
Therefore, this result provides an official, transculturally validated ARAT for wide and consistent clinical use across Poland, and for research across the world.
Reliability
The total scores and sub-scores of ARAT-PL showed excellent inter-rater and test–retest reliability. This agrees with the results of previous studies, which have reported ICC coefficients of 0.98 and 0.99 for inter-rater reliability13, 40 and test–retest reliability21 in poststroke hemiparetic patients. Moreover, the agreement for individual ARAT-PL items assessed with Cohen’s kappa coefficients was almost perfect, and the interobserver agreement measured via the percentage agreement was ≥90%. The latter result is even higher than reported in another study, which found percentage agreement ≥70%.20 Therefore, our study has shown that ARAT-PL has excellent reliability, comparable to the original scale.
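For reference, the 2 item-level agreement statistics discussed above can be computed as in the following minimal sketch. It assumes unweighted Cohen’s kappa (the text does not state whether a weighted variant was used) and hypothetical item scores on the 0–3 scale. The function names are illustrative and this is not the study’s code.

```python
from collections import Counter


def percentage_agreement(rater_a, rater_b):
    """Percentage of patients given the same item score by both raters."""
    same = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * same / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal score frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical scores of 10 patients on 1 item (not study data).
scores_a = [3, 3, 2, 0, 1, 3, 2, 0, 1, 3]
scores_b = [3, 3, 2, 0, 1, 3, 2, 0, 2, 3]
pa_demo = percentage_agreement(scores_a, scores_b)
kappa_demo = cohen_kappa(scores_a, scores_b)
# The study treated an item as reliable when agreement exceeded 80%.
print(pa_demo, round(kappa_demo, 3), pa_demo > 80)
```

Kappa is lower than the raw percentage agreement because it discounts the agreement that 2 raters would reach by chance given how often each uses every score.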
Minimum detectable change and measurement error
The values for standard error of measurement and minimal detectable change were 0.34 and 0.93 for inter-rater, and 0.48 and 1.33 for test–retest measurements. Similar comparisons in past studies have shown higher values. One example produced standard error of measurement and minimal detectable change values for the test–retest assessment of post-stroke patients with ARAT of 1.3 and 3.5, respectively.40 Another study reported minimal detectable change values of 13.1 and 3.5 for inter-rater and test–retest measurements performed with ARAT.13 The minimal detectable change captures the amount of change that must be observed in order to exceed measurement error, for assessments administered by the same or by different observers. The results suggest that ARAT-PL can produce very reliable data in subacute stroke patients, both across multiple sessions by the same experienced rater and for measurements performed by 2 different experienced raters.
Internal consistency
The Polish version of ARAT showed excellent internal consistency for both the total and subscale scores (α = 0.97–0.99). These results are consistent with previous studies, which have reported excellent internal consistency for the original ARAT (α ≥ 0.98)39, 16 and for the Chinese version (α = 0.98)21 in subacute and chronic stroke patients. Our results show that the particular items of ARAT-PL have been well translated into Polish; this version is highly consistent with the original and other foreign adaptations.
Validity
This study found high correlation (rho = 0.71–0.83) between the total and subtest scores of ARAT-PL and the total score of FMA-UE-PL in subacute post-stroke patients. These results agree with other studies, one of which indicated coefficients of 0.77 within 72 h of patient admission to the rehabilitation unit, and 0.87 in the 24 h before discharge.19 Another reported coefficients in the range of 0.71–0.74 for correlations between ARAT and FMA-UE in chronic patients with stroke.40, 41 However, a further study found slightly higher correlation coefficients of 0.91 at 2 weeks and 0.94 at 8 weeks after stroke onset18, 42 for the original ARAT and FMA-UE. Higher coefficient values of 0.90, 0.90, 0.82, and 0.92 have also been demonstrated for correlations between ARAT and FMA-UE performed 14, 30, 90, and 180 days after stroke, respectively.40 However, the latter study had a smaller sample. Lastly, Wei et al.43 found a somewhat higher coefficient value of 0.93. However, they evaluated chronic stroke subjects before and after upper-extremity rehabilitation robotic training. It seems that the strength of interdependence between ARAT and FMA-UE may be affected by many different factors, including 1) the size of the study sample, 2) the time of the administration of outcome measures after stroke, 3) the type of rehabilitation therapy that the studied subjects receive, and 4) translation-related differences between versions of the same instrument. Both ARAT and FMA-UE evaluate the degree of impairment of the upper limbs in patients with stroke. However, ARAT assesses the functioning of upper extremities using observational methods, while the FMA measures motor impairment. Therefore, collectively, these studies show that the ARAT score may effectively assess not only function, but also indirectly some motor impairment of the upper extremity.
Compared to the FMA, ARAT has an efficient scoring system. Subjects with either severe or minor upper limb dysfunction may obtain the minimum or maximum score early in a subtest, after which no further items need to be administered for them to receive a score for that subtest. This shortens the total time of evaluation. The advantage of ARAT is that it can very precisely evaluate hand movements and indicate the specific functional problem of the extremity, even if the patient seems to be in generally good functional shape. Our results show that ARAT is an appropriate tool for assessing people with moderate-to-severe stroke.
Floor and ceiling effects
We observed a significant ceiling effect of ARAT-PL. The studied patients were in a range of 22–138 days after stroke. It was perhaps possible for many patients who had had minor strokes and longer histories of recovery, and had reached high functional status, to gain the highest scores in the ARAT-PL. Therefore, it seems that ARAT is a less useful outcome measure for people who substantially recover from stroke. For example, in cases of mild stroke we did not observe difficulties with completing the specific tasks; the only exception was the ability to pinch a marble with the 3rd finger and thumb. Therefore, a relatively large number of patients with mild stroke achieved maximum points. This may suggest that the scoring system of ARAT is not well designed for people with mild upper limb dysfunctions. In parallel, we observed a significant ceiling effect for FMA-UE; 50% of patients had total scores ≥64. However, no floor effect was demonstrated. Hence, as with ARAT-PL, half the patients had near the maximal FMA-UE score. The FMA assesses some additional skills, such as movement coordination or reflex activity, and requires greater mobility skills than ARAT. Again, this shows that many of the studied patients had recovered well from stroke; for such patients, FMA-UE is not a challenging evaluation. The consistency between the results with ARAT and FMA-UE also suggests that recovery in movement coordination and muscle reflex activity is paralleled by upper extremity functional independence in subacute patients with stroke.17, 44
Limitations
The main limitations of the study were differences in rehabilitation protocol and in time of recovery after stroke; these might have affected the sample homogeneity. However, at the time of the study, we had limited access to a more homogenous group of stroke survivors. Future research with ARAT-PL and FMA-UE should separately analyze patients in the acute or chronic stage of stroke to improve the conditions of observational studies aimed at determining the interdependence of particular outcome measures. To show the construct validity of ARAT, we examined correlations with FMA-LE, finding a significant but lower correlation coefficient (0.59) as compared to the FMA-UE (0.83) for the total score relationship. This may falsely indicate that the level of upper extremity function was moderately related to the level of lower extremity function.
Conclusions
It can be concluded that ARAT-PL is a reliable and valid tool for assessing upper extremity function in subacute stroke survivors. Its only drawback is that it appears to have a ceiling effect, limiting the differentiation of patients with mild upper limb impairment after stroke. Despite this, our results support the clinical and research use of ARAT-PL in the Polish population of patients with stroke.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Consent for publication
Not applicable.




