Introduction

Graves’ orbitopathy (GO), also known as thyroid eye disease, is closely linked to hyperthyroidism, although it can also occur in euthyroid and hypothyroid patients [1, 2]. It is the most common extrathyroidal manifestation of autoimmune hyperthyroidism, affecting ~25–50% of patients with Graves’ disease, with severe cases comprising 5–6% [3]. Incidence varies across populations, at around 16 cases per 100,000 women and about 2.9 cases per 100,000 men [4]. In Europe, the prevalence is about 10 cases per 10,000 individuals, with unilateral cases occurring at a rate of 0.50–1.50 per 10,000 [5]. As designated by the European Group on Graves’ Orbitopathy (EUGOGO) guidelines, GO progresses through distinct phases: the active inflammatory phase, marked by inflammation, pain, redness, and increasing proptosis; the plateau phase, in which disease activity stabilizes but some symptoms may persist; and the fibrotic or inactive phase, characterized by fibrosis and scarring of orbital tissues, leading to symptoms such as diplopia and eyelid retraction [6].

Thyrotropin receptor antibodies (TRAb) play a significant role in the development of GO. These antibodies act on the thyrotropin receptor (TSHR) either as stimulators (TSAb) or as blockers (TBAb) [7, 8]. Activation of this receptor leads to the release of thyroid hormones, contributing to hyperthyroidism and the proliferation of thyrocytes [9]. TSAb also stimulates orbital and periorbital tissues, leading to the tissue remodeling and inflammation associated with GO [8, 10]. In addition, TRAb and insulin-like growth factor-1 (IGF-1) influence GO development by targeting orbital fibroblasts, causing tissue expansion, optic nerve compression, and exophthalmos [1, 11]. The interaction between TRAb and IGF-1 receptors in fibroblasts further suggests a complex cross-talk that sustains the autoimmune process in GO [12, 13]. IGF-1 also upregulates proinflammatory cytokines and hyaluronan production, maintaining the inflammatory environment in GO [14]. IL-6, another proinflammatory cytokine, plays a critical role in the inflammatory process by promoting tissue remodeling and the accumulation of adipose tissue, which are hallmarks of the disease [15, 16].

Conventional treatment options for GO primarily involve high-dose intravenous glucocorticoids, recommended as the first-line treatment for moderate-to-severe and active GO according to the EUGOGO guidelines 2016 [5]. Similarly, in cases involving optic neuropathy or corneal ulceration, intravenous glucocorticoids remain the recommended first-line treatment [17]. Despite their effectiveness, with success rates ranging from 60% to 85%, up to 20% of patients may be unresponsive, another 10–20% may experience relapses following steroid withdrawal, and about 5% progress to dysthyroid optic neuropathy despite treatment [18, 19]. Surgical intervention becomes crucial when conservative treatments such as glucocorticoids or orbital radiotherapy prove ineffective. Surgeries like orbital decompression aim to alleviate optic nerve pressure, reduce proptosis, and improve visual function in severe cases [20]. Furthermore, procedures targeting extraocular muscles help correct strabismus and restore proper eye alignment and are generally recommended during the inactive phase to minimize complications and optimize outcomes [21].

Multiple new therapies have emerged to address cases that respond poorly to glucocorticoids and to potentially minimize residual deformities of GO. Rituximab (RTX), a monoclonal antibody targeting the CD20 antigen on B cells, has been suggested as a therapeutic option for GO resistant to corticosteroids or when steroid therapy is contraindicated [22, 23], and recent guidelines recommend RTX as a second-line treatment for GO [24]. Another promising treatment, tocilizumab (TCZ), an anti-IL-6 receptor antibody, has been shown to reduce clinical activity, TRAb titers, proptosis, eyelid retraction, and diplopia and to improve bulbar motility in patients with GO [1]. Recent studies have affirmed the efficacy and safety of TCZ, especially in refractory cases [25, 26]. In addition, teprotumumab (TPM), a human monoclonal antibody that inhibits the IGF-1 receptor, has demonstrated efficacy in reducing proptosis, diplopia, and clinical activity while improving quality of life in patients with GO. Multiple studies have confirmed TPM as a valuable second-line treatment for active moderate-to-severe GO [27].

Novel monoclonal antibodies show promise in treating recalcitrant GO; however, their use carries considerable adverse effects. For instance, TPM treatment has been associated with inflammatory bowel disease, hyperglycemia, and hearing loss [28], and TCZ use has been associated with reports of bronchospasm and anaphylactic shock [29]. Apart from a small review of six studies [30], the current literature includes no reviews comparing the efficacy and safety of these drugs in GO. In the current review, we examine the efficacy of these agents in GO and the safety concerns associated with their use. Moreover, we aim to comprehensively review their potential use cases and the challenges that arise when using them.

Methods

The current review was conducted in accordance with the tenets of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the Cochrane Handbook for Systematic Reviews of Interventions [31, 32].

Eligibility criteria

Inclusion criteria

  • Articles with cases diagnosed with Graves’ Orbitopathy (GO).

  • Observational or interventional studies where the treatment is rituximab, tocilizumab, or teprotumumab.

  • Articles written in English.

  • Articles available in full text.

Exclusion criteria

  • Abstracts, letters to editors, comments, opinions, case reports, and reviews.

  • Articles written in languages other than English.

Information sources and search strategy

Record retrieval involved several steps. First, we conducted a broad search using generic terms in databases such as PubMed, Scopus, and Embase to identify relevant articles. Based on this initial search, we developed a comprehensive strategy that included all relevant terms and their MeSH equivalents. We then refined the search by applying these terms in PubMed, Scopus, the Cochrane Library, and Web of Science. The search query combined GO-related terms with drug names as follows: (“Graves’ ophthalmopathy” OR “Graves ophthalmopathy” OR “Graves’ orbitopathy” OR “Graves orbitopathy” OR “thyroid eye” OR “thyroid eyes” OR “thyroid-associated ophthalmopathy” OR “thyroid ophthalmopathy” OR “Graves’ eye”) AND (rituximab OR tocilizumab OR teprotumumab OR Tepezza OR Actemra OR RoActemra OR Tofidence OR MabThera OR Riabni OR Rituxan OR Ruxience OR Truxima). The final search was conducted in July 2024 using this detailed query. Finally, we manually searched the references and citations of the retrieved records for additional relevant studies.
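The Boolean query above can be assembled programmatically, which makes it easier to rerun or adapt for each database. The sketch below is a hypothetical helper, not the authors' tooling; it uses straight apostrophes (databases may treat typographic and straight apostrophes differently) and omits database-specific field tags such as MeSH qualifiers.

```python
# Hypothetical helper that rebuilds the review's Boolean search query.
# Straight apostrophes are used; adapt quoting per database if needed.
GO_TERMS = [
    "Graves' ophthalmopathy", "Graves ophthalmopathy", "Graves' orbitopathy",
    "Graves orbitopathy", "thyroid eye", "thyroid eyes",
    "thyroid-associated ophthalmopathy", "thyroid ophthalmopathy", "Graves' eye",
]
DRUG_TERMS = [
    "rituximab", "tocilizumab", "teprotumumab", "Tepezza", "Actemra",
    "RoActemra", "Tofidence", "MabThera", "Riabni", "Rituxan",
    "Ruxience", "Truxima",
]


def or_block(terms: list[str], quote: bool) -> str:
    """Join terms with OR inside parentheses, optionally wrapping each in double quotes."""
    items = [f'"{t}"' if quote else t for t in terms]
    return "(" + " OR ".join(items) + ")"


def build_query() -> str:
    """Combine the GO phrase block and the drug-name block with AND."""
    return f"{or_block(GO_TERMS, quote=True)} AND {or_block(DRUG_TERMS, quote=False)}"


if __name__ == "__main__":
    print(build_query())
```

Keeping the term lists separate from the query syntax means a synonym can be added in one place without hand-editing a long string.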

Selection process

Two independent reviewers screened the titles and abstracts of all available records, having removed identifying data such as author names and affiliations to minimize bias. They followed a checklist of eligibility criteria to guide their screening process. A third reviewer was on hand to resolve any conflicts. Following the initial screening, the same two reviewers assessed the full texts of the selected articles for eligibility, comparing their findings to resolve any remaining disagreements.

Data collection

Two independent reviewers extracted data from full-text articles into a shared Google spreadsheet, following a two-part process using a predetermined set of variables. In the first part, they gathered baseline information, including the author’s last name, study design, study arms, country of origin, follow-up duration, and outcomes. They also collected details such as the number of patients, their age, gender distribution, body mass index (BMI), severity of GO, treatment regimen, and response to previous steroid therapy. In the second part, for analysis, they extracted data on clinical activity score (CAS), exophthalmometric measurements, antibody levels, Gorman diplopia score (GDS), and intraocular pressure (IOP).

Outcome measures

All variables were expressed as mean ± standard deviation (SD). Change scores were calculated as the difference between the value at the last follow-up and the baseline value. Where studies reported data in forms other than mean ± SD, we used the conversion formulas in the Cochrane Handbook to estimate means and SDs.
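The specific conversions are not spelled out in the text. As a minimal sketch, assuming the commonly used Cochrane Handbook approximations (SD from the 95% CI of a mean, SD ≈ IQR/1.35 for roughly normal data, and the change-score SD from baseline and final SDs with an imputed correlation), the calculations could look as follows. The default correlation of 0.5 is a hypothetical placeholder; whether and which value the authors imputed is not reported.

```python
import math


def sd_from_ci95(lower: float, upper: float, n: int) -> float:
    """SD of a group mean from its 95% CI (large-sample approximation, z = 1.96)."""
    return math.sqrt(n) * (upper - lower) / 3.92


def sd_from_iqr(q1: float, q3: float) -> float:
    """Approximate SD from the interquartile range for roughly normal data."""
    return (q3 - q1) / 1.35


def change_score(mean_base: float, sd_base: float,
                 mean_final: float, sd_final: float,
                 corr: float = 0.5) -> tuple[float, float]:
    """Mean and SD of (final - baseline).

    `corr` is an ASSUMED baseline-final correlation (hypothetical default 0.5);
    it is rarely reported and usually has to be imputed.
    """
    mean_change = mean_final - mean_base
    sd_change = math.sqrt(sd_base**2 + sd_final**2 - 2 * corr * sd_base * sd_final)
    return mean_change, sd_change


if __name__ == "__main__":
    # Hypothetical CAS data: baseline 5.0 ± 1.0, last follow-up 2.0 ± 1.0
    print(change_score(5.0, 1.0, 2.0, 1.0))  # mean change -3.0, SD 1.0
```

A higher assumed correlation shrinks the change-score SD, so sensitivity analyses over a range of values are a common safeguard.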

Clinical activity score (CAS)

CAS is an important tool in the management of GO; according to EUGOGO guidelines, it classifies the disease as active (CAS ≥ 3) or inactive (CAS < 3) [33]. Some studies used the original 10-item scoring system, in which the baseline evaluation uses the first seven items and follow-up assessments add the last three items, which capture changes since the previous visit [34]. Other studies used the seven-item score throughout follow-up. We therefore analyzed each method separately for consistency, labeling the outcomes CAS-7 or CAS-10.
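To make the two scoring variants concrete, here is a minimal sketch that scores CAS-7 and CAS-10 from a set of positive findings and applies the CAS ≥ 3 activity cut-off. The item identifiers are hypothetical labels for the standard EUGOGO items (the text does not list them), and the three CAS-10 items are treated as changes already assessed against the previous visit.

```python
# Hypothetical identifiers for the seven baseline CAS items.
CAS7_ITEMS = (
    "spontaneous_retrobulbar_pain",
    "pain_on_attempted_eye_movement",
    "eyelid_redness",
    "conjunctival_redness",
    "eyelid_swelling",
    "caruncle_or_plica_inflammation",
    "chemosis",
)
# The three additional follow-up items of the 10-item CAS (changes since last visit).
CAS10_EXTRA_ITEMS = (
    "proptosis_increase_2mm_or_more",
    "eye_movement_decrease_8deg_or_more",
    "visual_acuity_decrease_1_line_or_more",
)


def clinical_activity_score(findings: set[str], ten_item: bool = False) -> int:
    """Count positive items; the extra items are scored only when ten_item is True."""
    unknown = findings - set(CAS7_ITEMS) - set(CAS10_EXTRA_ITEMS)
    if unknown:
        raise ValueError(f"unknown CAS items: {sorted(unknown)}")
    items = CAS7_ITEMS + (CAS10_EXTRA_ITEMS if ten_item else ())
    return sum(1 for item in items if item in findings)


def is_active(cas7: int) -> bool:
    """Activity cut-off on the seven-item score: CAS >= 3 indicates active GO."""
    return cas7 >= 3


if __name__ == "__main__":
    visit = {"eyelid_redness", "chemosis", "pain_on_attempted_eye_movement",
             "proptosis_increase_2mm_or_more"}
    cas7 = clinical_activity_score(visit)
    print(cas7, clinical_activity_score(visit, ten_item=True), is_active(cas7))
```

The same visit yields different totals under the two systems (3 versus 4 here), which is why CAS-7 and CAS-10 were analyzed separately.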

Proptosis

Proptosis was measured with a Hertel exophthalmometer. Whenever studies reported separate measurements for each eye, we averaged the two readings, given the small inter-eye differences, so that these studies could be combined with those reporting a single composite value.
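As a minimal sketch of the per-eye averaging, assuming study-level mean Hertel readings for the right and left eyes at baseline and at the last follow-up (how per-eye SDs were combined is not described in the text, so it is omitted here; function names are hypothetical):

```python
def two_eye_mean(right_mm: float, left_mm: float) -> float:
    """Average of right- and left-eye Hertel readings (mm)."""
    return (right_mm + left_mm) / 2


def proptosis_change(baseline: tuple[float, float],
                     follow_up: tuple[float, float]) -> float:
    """Change in the two-eye average; negative values mean reduced proptosis."""
    return two_eye_mean(*follow_up) - two_eye_mean(*baseline)


if __name__ == "__main__":
    # Hypothetical study-level means (right, left) in mm
    print(proptosis_change(baseline=(22.0, 21.0), follow_up=(19.5, 19.0)))  # -2.25
```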

Thyrotropin receptor antibodies (TRAb)

Because the TRAb assay does not differentiate between stimulating and blocking antibodies, we did not analyze it separately from the thyrotropin receptor-stimulating antibody (TSAb) assay; results from both assays were combined into a single antibody outcome. Although TSAb is the more modern and accurate alternative, most studies used the conventional TRAb assay.

Gorman diplopia score (GDS)

GDS is a subjective grading system used to assess the severity of diplopia in patients with GO. This scoring system typically ranges from 0 to 3, with different levels indicating varying degrees of diplopia. For instance, a score of 0 represents no diplopia, 1 for intermittent diplopia, 2 for inconstant diplopia (e.g., gaze-evoked), and 3 for constant diplopia in the primary gaze or while reading [35]. Similar to proptosis, we combined the values for both eyes to make the outcome compatible with studies providing a single average value.

Intraocular pressure (IOP)

IOP monitoring can aid the early detection of compressive optic neuropathy, and IOP correlates well with the severity of the underlying GO, helping to guide the choice of treatment [36].

Risk of bias

Two reviewers assessed the quality of all full-text articles, and conflicts were resolved by a third reviewer. The reviewers evaluated the methodological quality of all cohort studies using the National Institutes of Health (NIH) Quality Assessment Tool for Observational Cohort, Case-Control, and Case Series Studies [37, 38]. Similarly, they used Cochrane's Risk of Bias 2 (RoB 2) tool to evaluate all randomized controlled trials (RCTs) included in the review [39]. For non-randomized trials (NRCTs), the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool was used [40].
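The assignment of appraisal tools by study design, together with third-reviewer adjudication, can be sketched as a simple lookup. The design labels and helper names are hypothetical; mapping case-control studies and case series to the NIH tool is inferred from the tool's name, and the rule that the third reviewer's judgment applies only when the first two disagree is an assumption about how conflicts were resolved.

```python
# Appraisal tool per study design; design labels are hypothetical.
APPRAISAL_TOOL = {
    "cohort": "NIH Quality Assessment Tool",
    "case-control": "NIH Quality Assessment Tool",
    "case series": "NIH Quality Assessment Tool",
    "randomized controlled trial": "RoB 2",
    "non-randomized trial": "ROBINS-I",
}


def tool_for(design: str) -> str:
    """Return the appraisal tool for a study design, failing loudly if unmapped."""
    try:
        return APPRAISAL_TOOL[design]
    except KeyError:
        raise ValueError(f"no appraisal tool mapped for design: {design!r}") from None


def final_rating(reviewer_1: str, reviewer_2: str, reviewer_3: str) -> str:
    """Keep the shared rating when the two reviewers agree; otherwise use the third reviewer's."""
    return reviewer_1 if reviewer_1 == reviewer_2 else reviewer_3


if __name__ == "__main__":
    print(tool_for("randomized controlled trial"), final_rating("good", "fair", "fair"))
```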

Statistical analysis

All analyses were performed in RevMan v5.4.1 to estimate the overall change scores for CAS-7, CAS-10, proptosis, antibody levels, IOP, and GDS, and to compare RTX, TCZ, and TPM where eligible. Estimates were pooled for each intervention and presented as subgroups in RevMan, which allowed all interventions to be compared in a single forest plot per parameter. When more than two subgroups were present for an outcome, we tested for a significant subgroup difference and, if one was found, performed pairwise analyses to examine the differences. Pooled means and corresponding 95% confidence intervals (CIs) were reported. Statistical significance was set at P < 0.05. Heterogeneity among included studies was assessed using the I² statistic, and a random-effects model was used to account for differences between populations and inclusion criteria. Results were displayed in forest plots showing individual study estimates and the heterogeneity of the effect measure. Funnel plots were generated and inspected visually for asymmetry suggestive of publication bias.
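For readers who want to reproduce the pooling outside RevMan, the following is a minimal sketch of inverse-variance random-effects pooling with the DerSimonian–Laird estimator of between-study variance (the random-effects method in RevMan 5), the I² statistic, and the chi-squared test for subgroup differences. Inputs are per-study mean changes and their standard errors; function names are hypothetical, and the closed-form chi-squared tail probability assumes integer degrees of freedom.

```python
import math
from dataclasses import dataclass


@dataclass
class Pooled:
    mean: float   # pooled mean change
    se: float     # standard error of the pooled mean
    tau2: float   # between-study variance (DerSimonian-Laird)
    i2: float     # heterogeneity, in percent

    def ci95(self) -> tuple[float, float]:
        return self.mean - 1.96 * self.se, self.mean + 1.96 * self.se


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df (closed form)."""
    y = x / 2
    if df % 2 == 0:
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        total += y ** (j + 0.5) * math.exp(-y) / math.gamma(j + 1.5)
    return total


def dersimonian_laird(means: list[float], ses: list[float]) -> Pooled:
    """Inverse-variance random-effects pooling of study mean changes."""
    w = [1 / s**2 for s in ses]
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))  # Cochran's Q
    df = len(means) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (s**2 + tau2) for s in ses]  # random-effects weights
    mean = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    return Pooled(mean, math.sqrt(1 / sum(w_re)), tau2, i2)


def subgroup_difference(groups: list[Pooled]) -> tuple[float, int, float]:
    """Chi-squared test for differences between pooled subgroup estimates."""
    w = [1 / g.se**2 for g in groups]
    overall = sum(wi * g.mean for wi, g in zip(w, groups)) / sum(w)
    q_between = sum(wi * (g.mean - overall) ** 2 for wi, g in zip(w, groups))
    df = len(groups) - 1
    return q_between, df, chi2_sf(q_between, df)
```

Each drug would be pooled with `dersimonian_laird`, and the resulting subgroup estimates passed to `subgroup_difference`; with three drugs the test has two degrees of freedom.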

Results

Study selection

After searching four databases, we collected 1152 records. Removing 635 duplicates left us with 917 unique records. We then screened the titles and abstracts, excluding 815 entries. We retrieved the full text of the remaining 102 records and evaluated them against our eligibility criteria. During this evaluation, we excluded three review articles, five non-English articles, 11 articles that were reviews, and nine articles that did not meet the specified population criteria. Ultimately, 77 articles were included in the review, with 58 providing sufficient data for analysis. The flow diagram for study selection is shown in Fig. 1.

Fig. 1: PRISMA Flow chart.

PRISMA flow chart illustrating the selection process of studies, including the number of studies (n) identified, screened, and included in the study.

Characteristics of the included studies

This review comprised 77 studies: 39 cohort studies, 23 case series, 10 randomized clinical trials, three case-control studies, and two non-randomized trials [26, 41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116]. The total number of subjects was 6356: 434 treated with RTX, 212 with TCZ, and 5710 with TPM. The TPM-treated cases comprised 1381 from primary studies and 4329 from postmarketing pharmacovigilance studies.

Thirty-eight articles investigated TPM [41,42,43, 48, 49, 52, 54,55,56, 63,64,65,66,67, 69,70,71, 73, 80,81,82, 90, 92, 94,95,96, 100,101,102,103,104,105,106,107, 110, 112, 114, 115]; the mean age was 53.94 years, and 72.15% of cases were female. The mean follow-up was 9.9 months (range: 2–20.4), and the most common regimen was a single 10 mg/kg dose of TPM followed by seven doses of 20 mg/kg, given at three-week intervals. Patients had active disease in 19 studies, inactive disease in five, and mixed activity in 11. Most studies did not explicitly state glucocorticoid resistance in their inclusion criteria.
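As an arithmetic illustration of the most common TPM regimen described above, the sketch below lists the infusion week and dose for a hypothetical 70 kg patient. It only shows the schedule's arithmetic (eight infusions and a cumulative 150 mg/kg over 21 weeks) and is not dosing guidance; the function name and the convention that the first infusion falls at week 0 are assumptions.

```python
def tpm_schedule(weight_kg: float, interval_weeks: int = 3) -> list[tuple[int, float]]:
    """(week, dose in mg) for 10 mg/kg followed by seven 20 mg/kg infusions.

    Illustrative arithmetic only; week 0 is assumed to be the first infusion.
    """
    doses_mg_per_kg = [10] + [20] * 7
    return [(i * interval_weeks, dose * weight_kg)
            for i, dose in enumerate(doses_mg_per_kg)]


if __name__ == "__main__":
    schedule = tpm_schedule(70)  # hypothetical 70 kg patient
    for week, dose_mg in schedule:
        print(f"week {week:2d}: {dose_mg:.0f} mg")
    print("cumulative:", sum(d for _, d in schedule), "mg")  # 10500 mg
```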

As for RTX, 27 articles examined its use in GO [44, 46, 51, 57,58,59,60, 62, 68, 72, 74,75,76,77,78, 85,86,87,88,89, 91, 97, 98, 108, 109, 111, 113, 116]; the mean age was 51.12 years, and females constituted 68.8% of cases. The mean follow-up was 17.42 months (range: 3–67). Patients had active disease in 25 articles, inactive disease in one, and mixed activity in one. TCZ was investigated in 14 studies [26, 44, 45, 47, 50, 53, 61, 75, 79, 83, 84, 93, 99, 103]; 68.8% of patients were female, and the mean age was 51.89 years. The mean follow-up was 14.48 months (range: 4–24), and disease was reported as active in 13 articles and of mixed activity in one. All TCZ studies exclusively included patients resistant to glucocorticoid treatment. A complete depiction of baseline characteristics is presented in Table 1.

Table 1 Summary and baseline characteristics of the included studies.

Risk of bias

The reviewers categorized cohort studies as “good,” “fair,” or “poor” based on NIH criteria. A “good” study has the lowest risk of bias and produces valid results, a “fair” study may have some bias but not enough to invalidate its findings, and a “poor” rating indicates significant bias, leading to exclusion from the evidence body. Of the 39 cohort studies, four were rated fair and 35 good quality. Of the 23 case series, 19 were rated good and four fair quality. Using the RoB 2 tool for RCTs, assessors labeled studies as having “low risk,” “some concerns,” or “high risk” of bias: five trials had a low risk, one had some concerns, and four had a high risk of bias. Of the two NRCTs, one had a moderate and the other a low risk of bias (Supplementary Tables 1–3 and Supplementary Figs. 1 and 2).

Results of syntheses

CAS-7

The seven-item clinical activity score was reported in 26 studies: 10 investigating TPM, 10 on RTX, and six evaluating TCZ. All three drugs significantly reduced CAS-7 after completion of their respective treatment regimens. TCZ reduced CAS-7 by a mean of 3.51 points (95% CI: −4.25, −2.78), TPM by 3.1 points (95% CI: −3.71, −2.49), and RTX by 2.81 points (95% CI: −3.57, −2.06). Although TCZ and TPM achieved greater reductions in CAS-7 than RTX, the differences between the three drugs were not statistically significant (Fig. 2).

Fig. 2: Forest plot of CAS-7.

Forest plot illustrating the mean reduction in CAS-7 achieved by the three drugs and the statistical difference between them.

CAS-10

The ten-item clinical activity score was reported in 19 studies, and one study compared TCZ and RTX: 10 studies investigated RTX, six investigated TCZ, and three evaluated TPM. All three monoclonal antibodies significantly reduced CAS-10 after completion of their respective treatment courses. TCZ reduced CAS-10 by a mean of 5.12 points (95% CI: −6.26, −3.98), TPM by 4.08 points (95% CI: −5.77, −2.39), and RTX by 3.02 points (95% CI: −3.58, −2.46). TCZ achieved a greater reduction in CAS-10 than TPM, but the difference was not statistically significant; likewise, TPM and RTX did not differ significantly. In contrast, TCZ achieved a significantly greater reduction than RTX (P = 0.0006). Reductions achieved by TPM and RTX measured with CAS-7 did not differ significantly from those measured with CAS-10, whereas the reductions achieved by TCZ differed significantly between CAS-7 and CAS-10 (P = 0.0119). When data from CAS-7 and CAS-10 were combined, the findings matched those for CAS-10 alone, with TCZ performing significantly better than RTX (P = 0.0025) (Fig. 3).

Fig. 3: Forest plot of CAS-10.

Forest plot illustrating the mean reduction in CAS-10 achieved by the three drugs and the statistical difference between them.

Proptosis

Proptosis was reported in a form suitable for analysis in 40 studies and one article comparing TCZ and RTX: 21 TPM studies, 11 RTX studies, and eight TCZ studies. Each of the three monoclonal antibodies significantly reduced proptosis after its respective course. TCZ reduced proptosis by 1.99 mm (95% CI: −2.52, −1.45), TPM by 2.95 mm (95% CI: −3.21, −2.68), and RTX by 0.79 mm (95% CI: −1.20, −0.38). The differences between all three drugs were statistically significant: TCZ vs. RTX (P = 0.0010), TPM vs. TCZ (P = 0.0019), and TPM vs. RTX (P < 0.0001) (Fig. 4).

Fig. 4: Forest plot of Proptosis.

Forest plot illustrating the mean reduction in proptosis achieved by the three drugs and the statistical difference between them.

TRAb

Total TRAb levels were explored in 18 studies, and one study compared TCZ with RTX: 15 articles on RTX and four on TCZ. Both drugs significantly reduced TRAb levels: TCZ by 8.29 U/L (95% CI: −10.48, −6.09) and RTX by 5.22 U/L (95% CI: −6.83, −3.61). TCZ produced a significantly greater reduction than RTX (P = 0.03) (Fig. 5).

Fig. 5: Forest plot of TRAb.

Forest plot illustrating the mean reduction in TRAb achieved by RTX and TCZ and the statistical difference between them.
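A pairwise comparison of two pooled subgroup estimates can be approximated from their reported 95% CIs, assuming normality: each SE is recovered as (upper − lower)/3.92, and the difference is tested with a two-sided z-test, which for two subgroups is equivalent to the one-degree-of-freedom chi-squared test for subgroup differences. The sketch below applies this to the rounded TRAb estimates reported above; because the inputs are rounded, it approximates rather than reproduces the RevMan output, and the function names are hypothetical.

```python
import math


def se_from_ci95(lower: float, upper: float) -> float:
    """Standard error recovered from a symmetric 95% CI."""
    return (upper - lower) / (2 * 1.96)


def compare_pooled(mean_1: float, ci_1: tuple[float, float],
                   mean_2: float, ci_2: tuple[float, float]) -> tuple[float, float]:
    """Two-sided z-test for the difference between two independent pooled estimates."""
    se_1 = se_from_ci95(*ci_1)
    se_2 = se_from_ci95(*ci_2)
    z = (mean_1 - mean_2) / math.sqrt(se_1**2 + se_2**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Rounded TRAb estimates (U/L) from the text: TCZ vs. RTX
    z, p = compare_pooled(-8.29, (-10.48, -6.09), -5.22, (-6.83, -3.61))
    print(f"z = {z:.2f}, P = {p:.3f}")
```

With these inputs, z ≈ −2.21 and P ≈ 0.027, consistent with the reported P = 0.03.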

GDS and IOP

TPM significantly reduced diplopia scores (MD: −0.96 points; 95% CI: −1.50, −0.43), whereas the effect of RTX was not significant (P = 0.30). All three drugs achieved comparable reductions in IOP (Figs. 6 and 7).

Fig. 6: Forest plot of Gorman diplopia score.

Forest plot illustrating the mean reduction in Gorman diplopia score achieved by the three drugs and the statistical difference between them.

Fig. 7: Forest plot of Intraocular pressure.

Forest plot illustrating the mean reduction in Intraocular pressure achieved by the three drugs and the statistical difference between them.

Complications

An exhaustive analysis of complication severity according to the Common Terminology Criteria for Adverse Events (CTCAE) is beyond the scope of this review; however, we provide a comprehensive summary of the adverse events that arose during treatment with, and follow-up after, each drug.

Rituximab (RTX)

Adverse effects associated with RTX use occurred mainly during infusions and were self-limited. Mild (Grade 1) events included throat itching, nasal stuffiness, mild hypotension, and rhinorrhea [46, 88], minor infusion-related rashes (4, [68]), headache (2, [78]), mild temperature elevation [89], mild urticaria [109], fatigue (1, [113]), and minor infections during follow-up (2, [59]). Moderate (Grade 2) events included hypotension (4, [59]), nausea (4, [58, 59, 89]), chills (2, [59, 78]), tachycardia (1, [59]), heart palpitations (1, [44]), exacerbation of hypertension [74], and serum sickness-like reaction (2, [59]). Severe (Grade 3) reactions included cytokine release syndrome with vision decline and chemosis, controlled with hydrocortisone (7, [51, 85, 88, 113]), and exacerbation of arthritis (1, [58]).

Tocilizumab (TCZ)

Adverse effects associated with TCZ were mainly hematologic, metabolic, and infection-related. Mild (Grade 1) events included minor neutropenia (1, [44]), mild neutropenia (3, [61]), mild liver enzyme increases (1, [44]), deranged lipid profiles (2, [61]), cutaneous rash (1, [44]), hypercholesterolemia (16, [45, 84]), pruritus/urticaria (3, [84]), weight increase (4, [45]), and severe but self-limiting back pain (1, [61]). Moderate (Grade 2) events included neutropenia (7, [79, 84]), hypertransaminasemia (2, [84]), upper respiratory tract infection (1, [83]), respiratory tract infection (1, [79]), otitis (2, [26]), and asthenia (6, [84]). Severe (Grade 3) reactions included thrombocytopenia (2, [84]), elevated transaminases (2, [83]), cellulitis (3, [84]), autoimmune hepatitis (1, [83]), and delayed injection-site reactions (1, [79]). Life-threatening (Grade 4) events comprised acute pyelonephritis (1, [83]) and anaphylactic shock with bronchospasm (1, [84]). One Grade 5 event (death from metastatic cancer) was reported [79].

Teprotumumab (TPM)

The most prominent adverse events during TPM therapy were hyperglycemia and ototoxicity. Graded according to CTCAE, mild (Grade 1) events included autophony and patulous eustachian tube [55], dry skin (5, [66]), nail changes (3, [66]), paresthesia (3, [96]), alopecia (3, [96]), hearing impairment (2, [56]), upper abdominal pain (1, [73]), and mild nausea (9, [66]). Moderate (Grade 2) events included hypoacusis/hearing impairment (5, [55]; 5, [64]; 5, [73]; 5, [66]), ototoxicity (5, [73]), hyperglycemia (2, [55]; 5, [96]; 10, [64]), muscle spasms (8, [96]; 1, [73]), fatigue (10, [66]), diarrhea (6, [96]; 10, [66]), hypertension (2, [66]), and menstrual abnormalities (9, [101]). Severe (Grade 3) reactions included muscle cramps (20, [64]; 31, [66]), new hearing symptoms (8, [73]), sensorineural hearing changes (5, [66]), and dysgeusia (7, [66]). Life-threatening (Grade 4) events included intracerebral hemorrhage (1, [55]), serious pneumothorax and infusion reactions (2, [56]), serious diarrhea (1, [96]), inflammatory bowel disease (1, [96]), Escherichia sepsis (1, [96]), Hashimoto’s encephalopathy (1, [96]), and urinary retention (1, [96]).

Publication bias

Only CAS-7, CAS-10, and proptosis had enough studies to justify inspection for publication bias. Visual inspection of the funnel plots revealed no asymmetry in any subgroup for these three outcomes (Supplementary Figs. 3–5).

Discussion

Graves’ orbitopathy (GO) is the most common extrathyroidal manifestation of Graves’ disease, with presentations that affect vision, appearance, and quality of life [3]. Around a fifth of patients with active disease fail to respond to conventional glucocorticoid therapy [18, 19]. This review aimed to analyze the efficacy of rituximab (RTX), tocilizumab (TCZ), and teprotumumab (TPM) in improving clinical parameters of GO. We compared the three drugs in terms of reductions in the clinical activity score (CAS), proptosis, thyrotropin receptor antibodies (TRAb), intraocular pressure (IOP), and Gorman diplopia score (GDS).

In our analysis of the clinical efficacy of TPM, TCZ, and RTX, all three drugs significantly reduced both the seven-item (CAS-7) and ten-item (CAS-10) clinical activity scores after treatment. TCZ achieved the largest reductions in CAS-7 and CAS-10, although its differences from TPM were not statistically significant. However, TCZ performed significantly better than RTX in reducing CAS-10 (P = 0.0006), and this superiority was confirmed when data from CAS-7 and CAS-10 were combined (P = 0.0025). Proptosis also decreased significantly with all three treatments, with TPM showing the greatest reduction, followed by TCZ and then RTX; all pairwise differences were statistically significant. TCZ also outperformed RTX in reducing TRAb levels (P = 0.03). For diplopia, TPM significantly reduced scores, whereas RTX had no significant effect. All three drugs demonstrated comparable efficacy in lowering intraocular pressure (IOP).

Comparing our results with previous meta-analyses of monoclonal therapy for GO showed broad agreement on the main disease parameters. Fatani et al. reported that monoclonal antibodies, including TPM and RTX, produced significant improvements in CAS, proptosis, and diplopia compared with glucocorticoids or placebo [30]. Similarly, our analysis found that all three drugs significantly reduced CAS and proptosis, with TCZ significantly outperforming RTX with regard to CAS, and both TCZ and TPM outperforming RTX with regard to proptosis. In contrast to Shen et al., who found a non-significant improvement in proptosis with RTX [23], our results showed a significant reduction in proptosis with all three drugs, although TPM had the greatest effect. Whereas Li et al. reported a modest effect of TPM, but not RTX, on CAS and proptosis [117], our analysis showed that RTX, although outperformed by TCZ on CAS and by both TCZ and TPM on proptosis, still had a smaller but significant effect. Thus, although RTX underperforms the other agents with regard to CAS and proptosis, it remains an effective treatment option.

Categorizing the complications reported in the included studies showed that TPM had the highest number of adverse events across several categories: 48 mild (Grade 1), 62 moderate (Grade 2), 50 severe (Grade 3), and 11 life-threatening (Grade 4) events. TCZ had the highest number of mild (Grade 1) events (54) but fewer moderate (31) and severe (12) events, with only two life-threatening events. RTX had the fewest adverse events overall, with 34 mild, 22 moderate, and 14 severe events, and notably no life-threatening (Grade 4) or fatal (Grade 5) events. TPM therefore stands out for its higher incidence of severe and life-threatening complications, while RTX caused the fewest complications overall.
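The grade totals above can be tallied directly. The sketch below uses the counts stated in this paragraph (Grades 1–4 only; the single Grade 5 TCZ event reported in the Results is not included). These are raw event tallies and are not normalized by the number of treated patients in each group; the function names are hypothetical.

```python
# Adverse-event counts by CTCAE grade (1-4) as stated in the text.
AE_COUNTS = {
    "TPM": {1: 48, 2: 62, 3: 50, 4: 11},
    "TCZ": {1: 54, 2: 31, 3: 12, 4: 2},
    "RTX": {1: 34, 2: 22, 3: 14, 4: 0},
}


def total_events(counts: dict[str, dict[int, int]]) -> dict[str, int]:
    """Sum events over all grades for each drug."""
    return {drug: sum(by_grade.values()) for drug, by_grade in counts.items()}


def events_at_or_above(counts: dict[str, dict[int, int]], grade: int) -> dict[str, int]:
    """Sum events at or above a given CTCAE grade for each drug."""
    return {drug: sum(n for g, n in by_grade.items() if g >= grade)
            for drug, by_grade in counts.items()}


if __name__ == "__main__":
    print(total_events(AE_COUNTS))           # {'TPM': 171, 'TCZ': 99, 'RTX': 70}
    print(events_at_or_above(AE_COUNTS, 3))  # {'TPM': 61, 'TCZ': 14, 'RTX': 14}
```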

Notably, our results match the findings of the postmarketing pharmacovigilance study by Huang et al. [65], who found that the most commonly reported adverse effects of TPM were hyperglycemia, muscle spasms, fatigue, and hearing disorders. The effect of TPM on blood glucose levels was investigated by Amarikwa et al. [43], who noted that 40.9% were already taking antihyperglycemic medications at baseline. Management strategies included adding new medications (36.4% of cases) or adjusting existing ones (31.8%), with insulin modifications in 18.2% of cases. At the last follow-up, ~50 weeks after the first infusion, 36.4% of patients with hyperglycemia had returned to their baseline glycemic status, while 63.6% still had persistent hyperglycemia above baseline. These results highlight the high rate of complications during TPM use, some of which may stem from pre-existing conditions; thorough baseline investigations are therefore needed to identify conditions that increase the risk of complications.

The safety profile of TCZ was less severe than that of TPM; however, the report by Moi et al. of three cases of cancer developing during follow-up is concerning [79]. Multiple studies of patients with rheumatoid arthritis (RA) treated with TCZ are available for comparison. Rubbert-Roth et al. found that the overall malignancy risk in TCZ-treated patients was comparable to that of RA patients treated with anti-TNF agents [118]. This finding was corroborated by Wadström et al., who reported no increase in overall cancer risk with TCZ compared with other biologic disease-modifying antirheumatic drugs (bDMARDs) such as abatacept and rituximab [119]. Furthermore, Cho et al. noted that postmarketing surveillance and large studies have not shown a significant increase in malignancy risk among TCZ-treated RA patients [120]. Collectively, these studies suggest that TCZ does not significantly elevate cancer risk and that the cases reported by Moi et al. may have occurred sporadically. Nevertheless, trials with longer follow-up are needed to settle these concerns.

RTX had the lowest incidence of complications in the current review, and this incidence could potentially be reduced further by lowering the dose. Studies examining different RTX dosing regimens reveal a complex balance between efficacy and adverse effects. Lower doses (100 mg) generally produced fewer adverse effects, with Bennedjaï et al., Insull et al., and Karasek et al. reporting only minor issues such as heart palpitations and infusion-related rashes [44, 68, 72]. However, these lower doses were associated with higher treatment failure rates, as seen in [44] (5 of 14 patients) and in Karasek et al. [72] (3 of 10 patients). Many of these failures were due to the development of optic neuropathy requiring urgent decompression surgery. Conversely, higher doses (two 1000 mg infusions given two weeks apart) showed improved efficacy but more frequent adverse effects: Salvi et al. reported 13/15 infusion reactions, including two cases of cytokine release syndrome, while Stan et al. noted 11/13 adverse events in the RTX group, five of which were considered moderate or severe [88, 97]. Careful selection of patients unlikely to progress to optic neuropathy for low-dose RTX regimens might improve the safety profile while maintaining efficacy.

This study is the first to incorporate all available evidence on the efficacy of monoclonal therapy for GO using RTX, TPM, and TCZ. In addition, we compared the safety profiles of the three drugs after categorizing adverse events according to the Common Terminology Criteria for Adverse Events. We also highlighted key problems that arise during treatment with the different monoclonal antibodies and suggested how to address them. Nonetheless, the underlying evidence was heterogeneous with regard to thyroid status, disease activity, and resistance to glucocorticoid therapy, and only two double-armed studies directly compared more than one monoclonal antibody. Furthermore, we could not assess whether complications arose from the monoclonal therapy itself or from underlying conditions or faulty procedures. Future controlled trials should investigate the efficacy and safety of RTX, TCZ, and TPM to confirm our findings.

Conclusion

In conclusion, this review provides a comprehensive analysis of the efficacy and safety of rituximab (RTX), tocilizumab (TCZ), and teprotumumab (TPM) in the treatment of Graves’ orbitopathy (GO). Our findings show that all three drugs effectively reduce clinical activity scores and proptosis, and that RTX and TCZ reduce thyrotropin receptor antibodies; however, TCZ and TPM generally outperform RTX on these parameters. RTX was associated with the fewest adverse events overall, especially at lower doses, but this came at the cost of higher treatment failure rates, often due to the development of optic neuropathy. TPM, while highly effective, had the highest incidence of severe and life-threatening complications, necessitating careful patient selection and monitoring. The safety profile of TCZ was more favorable than that of TPM, but concerns about malignancy risk, although questionable and not definitively established, warrant further investigation. Future controlled trials are essential to validate these findings, further analyze adverse effects, and refine therapeutic approaches for GO.