Introduction

The microbial rhodopsin Archaerhodopsin-3 (Arch-3) from Halorubrum Sodomense1,2, has gained popularity as optogenetic fluorescent reporter for the visualization of neuronal action potentials3. However, Arch-3 has a 556 nm maximum absorption wavelength (\({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\)) that is far from the red light necessary for low phototoxicity and high tissue penetration4,5. For this reason, scientists have been looking for mutants with redder \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\)6,7,8,9, but their success has been limited. This and other bioengineering issues are the consequence of an incomplete understanding of how the protein environment tunes the \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) of an embedded chromophore and, in turn, the lack of a route to variants with red-shifted \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\).

\({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\)-tuning (also called color-tuning) of rhodopsins has been studied for a long time, and different factors have been found to play a role10. Here we focus on the electrostatic interaction between protein (opsin) amino acidic residues and all-trans retinal protonated Schiff base (rPSBAT) chromophore of microbial rhodopsins (see Fig. 1A). In the past, such an interaction has been investigated experimentally by using chromophore analogs11,12,13, site-directed mutagenesis14,15,16,17,18,19,20 and, more generally, via structural analysis21,22. Nakanishi and coworkers23 proposed an external point charge mechanism to explain the \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) changes in different sets of chemically modified rPSBATs. According to such a mechanism, frequently discussed in rhodopsin applications24, the “migration” of a localized negative counterion charge from a position close to the -C15=N- Schiff base moiety to a position close to the β-ionone ring induces a \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) increase23,25. More recently, to investigate electrostatic effects, Borhan, Geiger, and coworkers used the retinol-binding protein II (hCRBPII)25,26 to prepare a set of microbial rhodopsin mimics. The x-ray crystallographic analysis of mimics displaying \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) from 425 to 644 nm pointed, in contrast to the Nakanishi mechanism, to an external delocalized charge mechanism where the reddest \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) are produced by the “delocalization” of the counterion charge along the opsin cavity resulting in a more homogeneous electrostatic potential (ESP) acting on the rPSBAT chromophore. The same authors reported that such an ESP would induce a maximal delocalization of the chromophore positive charge along its conjugated backbone (i.e., cavity and chromophore charges would be negative replicas of each other)25. Indeed, a positive charge spread along a conjugated backbone with equal double-bond and single-bond lengths leads to minimal energy gap between the ground (S0) and spectroscopic (S1) states and, consequently, largest \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\); a behavior related to reaching the so-called, cyanine limit27.

Fig. 1: Structure and scope of Arch-3zero.
figure 1

A Schematic structure of a microbial rhodopsin with an all-trans chromophore and a counterion located near its Schiff base. B Left. Subsystems of Arch-3zero before counterion configuration and geometry optimizations. Center and right. Illustration of the negative charge optimization process. The blue (i.e., negative) color representing the D222 charge is transferred to other residues during the optimization and then affects geometrical relaxation. C ESP generated by shielding a central negative charge (charges and bond length parameters are from acetone, described at the HF/6-31G* level of theory). The initial ESP given in red is diffuse (top blue line) or localized (bottom orange line) after shortening and elongating the dipole bond length, respectively. D The effect seen in C is qualitatively replicated (green lines) by delocalizing of a single central negative charge spanning the same blue and orange line limits by changing the charge fraction (X).

Computational studies should, in principle, provide a complete understanding of color-tuning. This is urgent after the recent discovery of Neorhodopsin (NeoR), a microbial rhodopsin with a near-infrared absorption of 690 nm28 and a ~ 0.2 fluorescence quantum yield (FQY), the highest fluorescence documented so far in rhodopsins. However, most of the reported computations have focused on steric and electrostatic effects of specific residues in specific rhodopsins20,29,30,31,32,33,34,35,36, while the effect of the position and distribution of the rPSBAT counterion (from now on collectively called “counterion configuration”) has not been systematically investigated.

In this work, we show that it is possible to capture the precise relationship between counterion configuration and \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) values by taking Arch-3 as a “lab model”. To do so, we design and construct a quantum mechanics / molecular mechanics (QM/MM) model of Arch-3 called Arch-3zero capable to determine how “diffuse” a charge residing in the opsin cavity must be to achieve a desired \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\). Arch-3zero features a counterion charge free to migrate from residue to residue and delocalize on different residues, ultimately producing different charge configurations and, thus, \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) values. By using Arch-3zero, we demonstrate that the contrasting Nakanishi and Borhan-Geiger color-tuning mechanisms can equivalently cause a \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) red shift along a compact (MiDeCO) or extended (MiDeEX) charge migration-delocalization path, respectively. The validity of this conclusion is demonstrated by reproducing the observed proportionality between \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) and fluorescence quantum yields (FQYs) not only in a set of Arch-3 variants6 but, remarkably, in several bacterial6,37,38 and eukaryotic37,38,39 rhodopsins.

Results

Counterion configuration optimization using Arch-3zero

Arch-3zero is constructed starting from an Arch-3 conventional QM/MM model generated from the available dark-adapted crystallographic structure (PDB ID: 6GUX)1 using our automatic rhodopsin modeling protocol40,41,42 (see Supplementary Note 1 and Figs. S1S2 in the Supporting Information for details). This has a QM part corresponding to the rPSBAT chromophore treated at the 2-root state average CASSCF(12,12)/6-31 G* level of theory43. The rest of the protein, including the cavity residues, is included in the MM part described by the AMBER94 force field44. The QM and MM parts interact via electrostatic embedding45. The standard model is validated by computing, after CASPT2 correction, the \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) in terms of vertical excitation energy \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}0-{{{\rm{S}}}}1}^{{{{\rm{a}}}}}\) that is compared with the corresponding experimental value. \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}0-{{{\rm{S}}}}1}^{{{{\rm{a}}}}}\) is defined as the difference between the S1 and S0 energies (\({{{{\rm{E}}}}}_{{{{\rm{S}}}}1}\,{{\rm{and}}}\,{{{{\rm{E}}}}}_{{{{\rm{S}}}}0}\), respectively) at the Franck-Condon (FC) point (see Supplementary Note 1 in the Supporting Information). Note that, in order to best reproduce the Arch-3 absorption band, and thus the opsin electrostatics, our model has a D222 ionized (the counterion) and D95 neutral residue. Such a model has been previously shown to also reproduce trends of \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) values as well as fluorescence intensity for a set of Arch-3 variants46.

Arch-3zero models are “electrostatically perturbed QM/MM models where the charge of each cavity residue is allowed to become more negative while the selected D222 counterion becomes less negative to ensure a global zero charge. As illustrated in Fig. 1B, Arch-3zero is partitioned in 3 subsystems: (i) a geometrically flexible set formed by the rPSBAT QM atoms plus the MM atoms of the covalently linked lysine sidechain, (ii) a second geometrically flexible set including all MM atoms of the cavity residues bearing variable MM atomic point charges and (iii) a geometrically fixed set of the remaining MM atoms with fixed MM point charges. The key property of Arch-3zero is that the atomic charges of subsystem II can be optimized (see Supplementary Note 2, Tables S1S2 and Figs. S3S10 in the Supporting Information) to yield a delocalized counterion configuration producing a target \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\) value (* indicates the value of a target \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\)). The counterion configuration is numerically defined by vector q =  q1, q2, …, qN where N is the total number of residues in the opsin cavity and \({{{{\rm{q}}}}}_{{{{\rm{i}}}}}\) represents a fractional charge transferred from D222 to the ith cavity residue. Thus, in the unperturbed Arch-3zero model (see left part of Fig. 1B), \({{{\bf{q}}}}\) is a vector which has N-1 elements equal to 0 and the element corresponding to D222, equal to −1. The optimization47 (see right part of Fig. 1B) then yields a \({{{\bf{q}}}}\) vector with non-zero elements and a D222 charge between 0 and −1. The problem of finding the \({{{\bf{q}}}}\) yielding the desired \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},\,*}\) value (indicated as q*), is formulated in terms of a constrained optimization problem as follows:

$$\min \left({\left(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}({{{\bf{q}}}})-\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\right)}^{2}\right){{{\rm{with}}}}\; {{{\rm{constraints}}}}\,{{{\bf{q}}}}={\sum }_{{{{\rm{i}}}}}^{{{{\rm{N}}}}}{{{{\rm{q}}}}}_{{{{\rm{i}}}}}=\,-1\,\le {{{{\rm{q}}}}}_{1},{{{{\rm{q}}}}}_{2},\,\ldots,\,{{{{\rm{q}}}}}_{{{{\rm{N}}}}}\,\le \,0$$
(1)

Where, again, the charge of the ith residue is perturbed by a fractional charge \({{{{\rm{q}}}}}_{{{{\rm{i}}}}}\) equally spread on all its atoms (i.e., excluding the counterion, the original MM charges of the ith residue are updated by equally distributing the negative fractional charge \({{{{\rm{q}}}}}_{i}\) on all its atoms). Since no analytical CASSCF(12,12)/6-31G*/AMBER gradient with respect to point charges is available, the optimization of \({{{\bf{q}}}}\) is guided by numerical gradient computed using the following two-point formula:

$$\frac{{{\left(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}({{{\bf{q}}}})-\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\right)}^{2}-\left(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\left({{{\bf{q}}}}+\Delta {{{\bf{q}}}}\right)-\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\right)}^{2}}{{{{\boldsymbol{||}}}}\Delta {{{\bf{q}}}}{{{\boldsymbol{||}}}}}$$
(2)

where ||∆q|| is set to 0.001. In practice, the approach relies on a Python interface and exploiting the SciPy48 built-in trust-region algorithm (based on a conjugated gradient method) to minimize the quantity \({\left(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}({{{\bf{q}}}})-\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\right)}^{2}\) calculated at the CASSCF(12,12)/6-31G*/AMBER9445 level using the OpenMolcas/Tinker49,50,51,52 interface (a schematization is provided in Supplementary Note 2 of the Supporting Information). Note that the conjugated gradient method is a locally convergent method sensible to the initial (guess) \({{{\bf{q}}}}\). For this reason, for each \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\), different guess vectors have been used with both localized and delocalized counterion configurations. We found that, for each target, the initial \({{{\bf{q}}}}\) vector converged to either a compact (\({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\boldsymbol{*}}}}}\)) or an extended (\({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\boldsymbol{*}}}}}\)) delocalization of the guess, respectively. In other words, we find that two different “solutions” may exist for the charge optimization problem. It cannot be excluded that more solutions exist, but only one or two solutions have been found for the set of target values considered below. Note that in the present study, we assume that the migrating and delocalizing counterion charge does not change the cavity water molecule charges.

The Arch-3zero models allow to study the relationship \({{{\bf{q}}}}{{{\boldsymbol{*}}}}\to {{{\rm{ESP}}}}\to {{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\), in a realistic way since the perturbing charge resides on the residues defining the protein cavity/structure. This is very different from other models reported in the literature, where, for instance, regular grids of point charges are placed on smooth surfaces surrounding the chromophore53,54,55. Basic mathematical modeling (see Fig. 1C) shows that such relationship provides information useful for the design of effective color-tuning mutations based on changes in magnitude and/or direction in electric dipole of the replaced residue. In fact, in a model of the opsin cavity represented by a negative counterion charge shielded by two surrounding dipoles and a chromophore backbone placed below it, the ESP acting on the chromophore, and computed using the classic Coulomb expression, is effectively modulated by symmetrically elongating or shortening the dipole lengths. In Fig. 1D, a parallel model featuring a diffusive counterion charge (i.e., representing a change in its counterion configuration q) replicates, qualitatively, the ESP change induced by dipole variations, supporting the use of Arch3zero models to study the effect of residue replacements.

The ESP change generated by \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) must also impact the Arch3zero geometry. To account for this factor, we optimize the geometrical parameters R of subsystems i and ii (ii includes the cavity residues) by minimizing \({{{{\rm{E}}}}}_{{{{\rm{S}}}}0}\left({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}},{{{\boldsymbol{R}}}}\right)\) where \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) is now a constant parameter. Such minimization is carried out using the rational functional optimization (RFO) method available in OpenMolcas56. Presently, to save time, we limit the geometry optimization to a single step (in principle, this could be converged iteratively, alternating sequential \({{{\bf{q}}}}\) and R optimizations). We stress that, due to such geometrical relaxation, each optimal counterion configuration has distinct cavity and chromophore geometries. Below, the optimized counterion configuration \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) and relaxed geometry \({{{{\boldsymbol{R}}}}}^{*}\) define an Arch3zero(\({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\)) model and its \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) value. Note that \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) is numerically close, but not identical, to the corresponding \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},\,*}\) (again see Supplementary Note 2 of the Supporting Information for details). While Arch3zero(\({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\)) and \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) will be used interchangeably, it has to be stressed that in the case of two solutions (i.e., \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\boldsymbol{*}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\boldsymbol{*}}}}}\)) of the optimization problem, one gets two Arch3zero models for the same \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\).

Compact and extended color-tuning paths

We investigate the color-tuning mechanism by selecting eight \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},\,*}\) values from 73.9 kcal mol−1 to 56.2 kcal mol−1 (e.g., a range of ca. 18 kcal mol−1 with spacing of ~2.5 kcal mol−1) and, for each target, construct the Arch3zero models. Inspection of Arch-3zero(\({{{{\bf{q}}}}}^{{{{\boldsymbol{8}}}}}\)) shows that the more red-shifting target generates a single model featuring a pyramidalized (sp3) Schiff base nitrogen and not the expected planar (sp2) nitrogen. Such pyramidalization is explained in terms of an intense charge transfer pushing the π-electron density of rPSBAT away from its β-ionone ring. Since the pyramidalization is responsible for a non-linear trend (see Supplementary Note 2 in the Supporting Information), we limit our discussion to the models corresponding to \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},1}\) to \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},7}\) and yielding \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) values from 72.7 to 52.7 kcal mol−1 (from ca. 400–550 nm).

In Fig. 2A we compare the optimized counterion configuration \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\), \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) solutions corresponding to the most blue-shifted and the most red-shifted cases. It is apparent that \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\), that is the only solution found (i.e., the same \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}{{{\boldsymbol{*}}}}}\) is obtained starting the optimization from localized or diffuse guesses) for the 73.9 kcal mol−1 target, displays a localized charge at the D222 position and small charge fractions in nearby residues, consistently with a limited charge delocalization. In contrast, the redder targets \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{2}\) to \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{7}\), display a gradually increased delocalization and migration towards the β-ionone ring. Note that for \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},2\,}\) and \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},7}\), two solutions are found; \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{2}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{2}\) for the first value and \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{7}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{7}\) for the second value. The \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\) solution is only slightly more delocalized than \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\), while \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\) appears extensively delocalized with charge fractions residing in a set of residues surrounding the Schiff base moiety. In parallel, \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) is extensively delocalized with respect to the more compact \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) delocalization with charge fractions transferred to the β-ionone region.

Fig. 2: Optimized Arch3zero(q*) models and their effect on rPSBAT charge distribution.
figure 2

All source data (including geometrical and charge values for parts AC and electrostatic potential values for parts DF) are provided as a Source Data file. A Counterion configuration (blue circles, radii are proportional to negative charges) and rPSBAT S0 charge distribution (red circles, radii are proportional to positive charges) for \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\) values from 73.9 to 56.2 kcal/mol. The corresponding RVC positions are marked by arrows and red crosses. B Norm of \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) along the MiDeCO and MiDeEX paths. Notice that for q1 and q5 we have found the solution corresponding to an extensively delocalized counterion configuration. C Position of RVC relative to the rPSBAT backbone. D Change in the ESP for Arch-3zero(q1), Arch-3zero(q2), and Arch-3zero(q7) projected on a 2D cross section defined by the pseudo-plane (see Supplementary Note 3 in the Supporting Information) hosting the rPSBAT backbone. The q7EX panel must be compared with the q7CO panel. E ESP maps for the variants Arch-3 and Arch-7 obtained with the conventional QM/MM models from ref. 46 F ESP map of NeoR generated using, again, a conventional QM/MM model (see refs. 58,59). Notice that the retinal chromophore of our NeoR model fully reproduces the observed RR spectrum (see Broser, M. et al. Experimental Assessment of the Electronic and Geometrical Structure of a Near-Infrared Absorbing and Highly Fluorescent Microbial Rhodopsin58.

Compact and extended delocalization can be quantitatively assigned by taking, as an index, the norm of \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\). In Fig. 2B we report such an index for the entire set of \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) to show that, starting from \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\) two different “paths” to redder, \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) can be followed. The figure shows that, starting from \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\), the \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\boldsymbol{*}}}}}\) delocalization increases fast up to \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\boldsymbol{4}}}}}\) and more slowly up to \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) defining the MiDeCO path and somehow pointing to a mechanism dominated by a Nakanishi color-tuning. A trend is also seen from \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\) to \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{6}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) that is consistent with an extended delocalization defining the MiDeEX path, where progressively more delocalized counterion charges lead to redder and redder \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) values (i.e., to decreasing \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) values) supporting the color-tuning mechanism hypothesized by Geiger, Borhan, and coworkers25.

A quantity that accompanies the \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) delocalization progression is the position of the charge centroid considered as the “virtual counterion” (RVC). The RVC coordinates are determined by the following expression:

$${{{{\rm{R}}}}}_{{{{\rm{VC}}}}} \ast=\frac{{\sum}_{{{{\rm{i}}}}=1},\ldots,{{{{\rm{Nqr}}}}}_{{{{\rm{i}}}}}^{*}}{{\sum}_{{{{\rm{i}}}}=1},\ldots,{{{{\rm{Nq}}}}}_{{{{\rm{i}}}}}^{*}};{{{{\rm{qr}}}}}_{{{{\rm{i}}}}}^{*}={\sum}_{{{{\rm{j}}}}=1},\ldots,{{{{\rm{M}}}}{{{\rm{\chi }}}}}_{{{{\rm{i}}}}}^{*}{{{{\rm{r}}}}}_{{{{\rm{i}}}},{{{\rm{j}}}}}$$
(3)

where \({{{{\rm{\chi }}}}}_{{{{\rm{i}}}}}^{*}\) is the value of the component of the fractional charge \({{{{\rm{q}}}}}_{{{{\rm{i}}}}}\) on each of the j = 1…, M atoms of residue i, while \({{{{\bf{r}}}}}_{{{{\bf{i}}}},{{{\bf{j}}}}}\) is the position vector of atom j of residue i. Figure 2A shows that, despite their difference, \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\) produce RVC close to D222 while \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) produce RVC shifted towards the β-ionone. This behavior is detailed in the left panel of Fig. 2C, showing that along both MiDCO and MiDEX paths, RVC moves towards the β-ionone as expected in the Nakanishi color-tuning57. In other words, when moving towards increased \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) values, RVC always moves along the chromophore cavity towards the β-ionone ring.

The counterion configuration \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) is paralleled by changes in the rPSBAT positive charge distribution calculated in terms of Mulliken charges. In fact, as shown in Fig. 2A, this is more localized near the Schiff base moiety in \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\) and more displaced along the backbone in \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\). Interestingly, no qualitative changes are observed when comparing the compact and extended solutions for the same \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{TARGET}}}},*}\) values. In Nakanishi's view, this is interpreted in terms of the similar RVC positions polarizing the π-electron density of rPSBAT.

The MiDCO and MiDEX paths are associated to changes in the ESP topography acting on rPSBAT and responsible for the change in its positive charge distribution. As expected, a comparison of the ESP projected by the protein on the rPSBAT pseudo-plane shows that \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) projects, relative to \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\), a significantly more negative and more positive ESP on the β-ionone ring and Schiff base regions, respectively. This is documented by the change in the green and blue areas of Fig. 2D following the \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) path to lower \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) values. Therefore, while \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) does not show a fully localized negative charge near the β-ionone ring as implied by the Nakanishi mechanism, it must push the π-electron density of rPSBAT towards its -C=N moiety. It is also shown that, while compact and extended counterion configurations may be found for a given target value, these produce close ESP topographies (\({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) ESPs are compared in Fig. 2D. For a comparison of the \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\) ESPs, see Supplementary Note 2 in the Supporting Information for details.

In the left diagram of Fig. 3A we show that the change in π-electron density (Δρabs) of rPSBAT upon a S0 → S1 transition is also a function of ESP and, therefore, of \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\). In fact, Arch-3zero(\({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\)) shows a Δρabs change consistent with a decreased positive charge (right diagram) of the -C=N moiety. This reflects a covalent (COV) character of S0 characterized by a positive charge mainly located on the -C=N moiety, and a charge-transfer (CT) character of S1 characterized by a positive charge spread along the chromophore backbone. In contrast (see Fig. 3B), the Δρabs of red-shifted \({{{{\rm{Arch}}}}-3}^{{{{\rm{Zero}}}}}({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}})\) and \({{{{\rm{Arch}}}}-3}^{{{{\rm{Zero}}}}}({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}})\) reflects a weaker change, consistently with S0 and S1 states featuring similar delocalized charge distributions. These electronic effects are accompanied by geometrical changes in bond-length alternation (BLA) of the S0 and S1 equilibrium structures (the parameter BLA is the difference between the rPSBAT average single-bond length and the average double-bond length). The BLA of \({{{{\rm{Arch}}}}-3}^{{{{\rm{Zero}}}}}({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}})\) and \({{{{\rm{Arch}}}}-3}^{{{{\rm{Zero}}}}}({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}})\) are the smallest and, thus, closer to the cyanine limit 27 (as also mentioned above, a conjugated linear system where, at equilibrium, the single and double bond lengths are equal and a positive charge is delocalized along the conjugated framework).

Fig. 3: rPSBAT change in electronic structure as a function of the counterion configuration.
figure 3

All source data (including geometrical and charge values) is provided as a Source Data file. A Top-left. Δρabs upon vertical S0 → S1 transition for q1 (blue and red indicate increase and decrease in electron (negative charge) density, respectively. Isovalues set at 0.002 a.u.). Bottom-left. Bond lengths and BLA (in Å) of the FC structure. Right. Positive (red circles) and negative (blue circles) rPSBAT charges in the S0 and S1 state characterizing the same vertical S0 → S1 transition (only values > 0.05 are displayed. The radius is proportional to the charge magnitude). B Same data for q7EX ≈ q7CO. Note that, on the bases of the positive charge distribution and bond lengths, the S0 electronic structure is interpreted as a mixture of CT and COV diabatic states indicated by the “CT+aCOV” symbol and determined by the coefficient a \(\lesssim 1.0\). Consistently, the S1 is also interpreted as a mixture indicated by the “aCT-COV” symbol reminiscent of an opposite linear combination.

3D representation

Figure 2A provides a 2D view of the cavity residues perturbed by the counterion configuration. However, since it displays the qi charges projected on the chromophore pseudo-plane, it cannot answer questions such as: (i) Are the mutations of residues located near the pseudo-plane less red-shifting than those of residues located above or below it? (ii) Is the charge diffusion described by the MiDeCO and MiDeEX paths running exclusively above or below the pseudo-plane or along both sides of it? (iii) What is the relationship between the cavity residues sequentially perturbed along the MiDeCO and MiDeEX paths? The answers to these questions require a 3D mapping of the Arch-3 cavity.

In Fig. 4A we define our 3D map. This corresponds, roughly, to an ellipsoidal shell running along the cavity residues. The rPSBAT pseudo-plane seen in the left part of the figure defines the ellipsoid “equator” and the North and South poles. The West and East axis and the M and P axis are orthogonal and parallel to the chromophore backbone, respectively. In the right part of the figure, we display the same moiety seen along the M to P axis from the P side. Note that the pseudo-plane and West and East directions are consistent with those of Fig. 2A. In Fig. 4B we display the 3D progression towards lower \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},*}\) values along the MiDeCO and MiDeEX paths. In contrast with the 2D map, the map shows that during the \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\) process, the counterion charge flows along the North “hemisphere” without getting too far from the -C=N moiety. In contrast, the South hemisphere only displays the residual D222 charge (−0.2). However, during the following \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{4}}}}}\) process, the delocalized negative charge diffuses further and forms a ring orthogonal to the equator plane and parallel to the plane defined by the South-to-North and West-to-East directions. During the successive \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{4}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) diffusion, such a ring moves toward the β-ionone moiety and partially contracts (becomes more localized) while the original D222 counterion loses its charge completely. The MiDeCO path visits very different counterion configurations. Indeed, while the first qualitative changes are only seen during the \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{2}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{4}}}}}\) process corresponding to a charge migration along the North hemisphere, in the \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{4}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) process, the -C=N moiety retains a −0.3 counterion charge counterbalanced by four −0.1 fractional charges located in the β-ionone and both hemispheres.

Fig. 4: Chromophore cavity (ca. ellipsoidal) representation of the counterion configuration of Arch-3zero(q*) models.
figure 4

All source data (including geometrical and charge values for part B) are provided as a Source Data file. A Left. Structure of the 3D map. The atom positions of all cavity residues are shown in light gray, with the only exception of the atoms of D222 that are shown in blue, and the atoms of the chromophore that are shown in black. The chromophore pseudo-plane corresponds to the regular x (M to P) and y (East to West) plane and defines the ellipsoid equator. The North pole is defined by the z-axis that is orthogonal to the pseudo-plane. Right. Front view of the ellipsoid along the x-axis. B Change of the counterion configuration as a function of the \(\triangle {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},{{{\rm{OPT}}}},\,*}\) decrease along the MiDeCO and MiDeEX paths defined in Fig. 2B. The color scale on the right refers to the charge of single cavity residues (not single atoms) in each specific q*. The numerical values refer to the counterion residual charge (underline value), and the few residues with the largest charge variation (≤ −0.1) are also provided. The dominating counterion charge diffusion processes are marked with full red arrows. The dashed-line arrows mark the charge diffusion seen in the previous step for comparison.

Figure 4B also provides information on the Arch-3 residues that need to be mutated to effectively decrease \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}},\,*}\). Taking a Nakanishi view, such residues would be the one with a −0.1 fractional charge located along the M direction in the \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) North and South hemispheres and producing, together with the −0.3 residual counterion charge in the South hemisphere, a RVC (see crosses (left panel) and circles (right panel) in Fig. 2C) located close to the =C8-C9= moiety. Such RVC position is reminiscent of the unusual counterion position found in NeoR models58,59. As far as \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) is concerned, it is apparent that the original full counterion charge is diffuse consistently with the Borhan-Geiger mechanism.

Validation of the two red-shifting charge diffusion mechanisms and relationship with Arch-3 fluorescence

In order to document the significance of the MiDeCO and MiDeEX paths for protein engineering, we show that the observed relationships between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and other S1 properties are replicated by the Arch-3zero(q*) models of six Arch-3 variants (Arch-2, QuasAr-1, QuasAr-2, Archon-2, Arch-5, and Arch-7)56 and NeoR28,39. By “observed relationships” we refer to experimentally established proportionalities between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and Stokes shift (\({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\)), emission wavelength (\(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{f}}}}}\)), and FQY6,8,52,58,59 that are replicated by conventional QM/MM of the same Arch-3 variants46,59. Such proportionalities are explained on the basis of the structure of the S1 potential energy surface (PES) defining \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\), \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\), \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{f}}}}}\) and the quantity \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\) associated to the FQY (see Fig. 5A). Such PES includes the S1 fluorescent state (FS), S1 twisted diradical intermediate (TIDIR) and the S1/S0 conical intersection (CoIn) driving the photoisomerization of the C13=C14 double bond of rPSBAT (see the curly red arrow). The figure shows that FC is connected to FS via a BLA change46. Similarly, TIDIR and CoIn differ in their BLA values (see Supplementary Note 4 and Figs. S11S12 in the Supporting Information). In contrast, FS is connected to TIDIR along the isomerization coordinate α defined by the C12-C13-C14-C15 dihedral angle. A change in \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) induces a change in the stability of FS, TIDIR, and CoIn and, consequently, in the related \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\), \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{f}}}}}\) and \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\) values46.

Fig. 5: Relationship between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and other excited state properties.
figure 5

All source data for parts B and C is provided as a Source Data file. A Schematic structure of the S1 potential energy surface driving the geometrical relaxation of Arch-3 variants and Arch-3zero,* models (e.g., Arch-3zero,* and Arch-3zero,*,S1 corresponds to FC and FS, respectively). As documented in ref. 46 and illustrated in the three-dimensional plots on the right, TIDIR resides (approximately) on the two-dimensional branching plane of CoIn dominated by BLA and α (see also Section S4). The dashed curves labeled COV and CT describe the changes in diabatic potential energies. The label “barrier” refers to the S1 energy barrier separating FS and TIDIR. B Relationship between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and the quantities \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\) (left panel), \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{f}}}}}\) (center panel), \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\,{{{\rm{and}}}}\,{\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\) (right panel). The green to purple color scale goes from \({{{{\bf{q}}}}}^{1}\) to \({{{{\bf{q}}}}}^{8}\). Full circles refer to quantity computed with the Arch-3zero,* models, open circles with the conventional QM/MM models of ref. 46. Reported correlation coefficients are related to Theil-Shen regression estimates. Note that the change in Stokes shift is dominated by the destabilization of FS. Such destabilization is found to be directly proportional to the decrease in vertical excitation energy at FC (i.e., to the red shift in \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\)). This behavior justifies the computed change in excitation energy of the emission process that is found to be inversely proportional to the excitation energy of the absorption process along the series, even if its change is limited (ca. 8 kcal/mole). C Left panel. Relationship between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\) for experimentally investigated rhodopsins including Arch-3 variants (open circles), NeoR variants (open triangles), and various microbial rhodopsins (open squares) compared with the corresponding relationship calculated using our Arch-3zero models. Right panel. Comparison between the relationship between measured \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and log(FQY) for the same experimentally tested sets and the calculated relationship between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\,{{{\rm{and}}}}\,{\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\).

We now show that Arch-3zero(q*) calculations reproduce the behavior of conventional models, implying that the changes in counterion configuration capture the ESP changes induced by variations in the amino acid sequence. The geometrical relaxation of the obtained Arch-3zero(q*) models on S1 with fixed \({{{{\bf{q}}}}}^{{{{\bf{1}}}}}\,{{{\rm{to}}}}\,{{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\,{{{\rm{to}}}}\,{{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) charge configurations, yields the corresponding FS models (from now on Arch-3zero,FS(\({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\boldsymbol{*}}}}}\)) and Arch-3zero,FS(\({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\boldsymbol{*}}}}}\))) and the corresponding \({{{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{FS}}}}}\) and \({{{{\rm{E}}}}}_{{{{\rm{S}}}}0}^{{{{\rm{FS}}}}}\) energies. This allows to calculate the Stokes shift as \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}={{{{\rm{E}}}}}_{{{{\rm{S}}}}1}-{{{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{FS}}}}}\) (\({{{{\rm{E}}}}}_{{{{\rm{S}}}}1}\) is the S1 energy at the S0 energy minima, namely, at the FC point). In the left panel of Fig. 5B we show that the obtained \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\) are proportional to the \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) values and that such proportionality is consistent with the one generated using the conventional QM/MM models of the produced lab mutants. In other words, the counterion configurations yielding the most red-shifted absorptions (e.g., \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{6},{{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{7}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{6},{{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{7}\)) display the smallest \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\) value, similar to what is seen when comparing quantities computed with the Arch-5, QuasAr2, and NeoR models with those of Arch-3. The decrease in \({\Delta {{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{Stokes}}}}}\) is accompanied by a decrease in the geometrical difference between Arch-3zero,* and Arch-3zero,FS,*. Indeed, \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) yield the smallest BLA difference between FC and FS, with BLA being close to a zero value (see Supplementary Note 2 in the Supporting Information) again in line with the hypothesis that these counterion configurations generate models approaching the cyanine limit and, consistently with the plots in Fig. 3, similar S1 and S0 rPSBAT charge distributions. In the central panel of Fig. 5B we see that an inverse proportionality between \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{f}}}}}\) and \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) previously reported for, for instance, Arch-3, QuasAr2, and NeoR models, is also found when using the Arch-3zero,FS (\({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\boldsymbol{*}}}}}\)) and Arch-3zero,FS(\({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\boldsymbol{*}}}}}\)) models. However, this corresponds to a rather limited change in \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{f}}}}}\) as also observed experimentally46.

The change in \({{{{\bf{q}}}}}^{{{{\boldsymbol{*}}}}}\) also reproduces the proportionality between the \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) and \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\) and\(\,{\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\) values documented using conventional QM/MM models and reflecting changes in experimentally observed FQYs46. In fact, in rhodopsins, the S1 lifetime and, in turn, FQY depends primarily on the chromophore S1 double-bond isomerization rate24,60. Thus, the faster is the propagation of the rPSBAT along α, the lower is FQY. When adopting the Hammond-Leffler postulate61,62, the S1 isomerization rate must be proportional to the stability of TIDIR or CoIn, calculated as \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\,=\,{{{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{TIDIR}}}}}-{{{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{FS}}}}}\) and \({\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\,={{{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{CoIn}}}}}-{{{{\rm{E}}}}}_{{{{\rm{S}}}}1}^{{{{\rm{FS}}}}}\) respectively. In the right panel of Fig. 5B we show that the Arch-3zero,TIDIR(q*) and Arch-3zero,CoIn(q*) models, computed via S1 geometrical relaxation and CoIn optimization, respectively, yield \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\) and \({\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\) values inversely proportional to \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}0-{{{\rm{S}}}}1}^{{{{\rm{a}}}}}\). In the same panel, we also show that such proportionalities parallel, again qualitatively, the ones obtained by employing conventional QM/MM modeling of the Arch-3 variants46 and NeoR28,39.

The proportionalities generated by the Arch-3zero,FS(q*), Arch-3zero,TIDIR(q*), and Arch-3zero,CoIn(q*) models indicate that the changes in \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) (i.e., \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\)) \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}{{{\rm{or}}}}{\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\) are functions of the counterion configuration and, therefore, must be interdependent. Recent studies confirm that the extremely red-shifted and highly fluorescent NeoR features a large \({\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\) value along its C13=C14 isomerization path56. In contrast to Arch-3, NeoR does not incorporate a conventional carboxylate counterion near the Schiff base moiety (e.g., like in \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\), but a “dipolar counterion” located in the middle of the rPSBAT backbone58,59).

The correlations displayed in Fig. 5B are limited to a set of models of Arch-3 variants spanning a limited \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) range. Thus, the “noise” in the detected correlation, due to the unavoidable “disorder” in the ESP generated by actual residue and in the approximations used to build the model, is not negligible (e.g., see the R2 = 0.32 in the right panel). We therefore looked at significantly larger rhodopsin sets. In Fig. 5C we show that the correlation is also valid when contrasting the behavior of the \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\)-\({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\)-\({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) models with that of three experimentally determined sets of data comprising 36 different rhodopsins (including variants) from 16 different organisms. More specifically, one set is an expanded set of Arch-3 variants (open circles) including the one investigated computationally (see above)46, a second more red-shifted set (open Delta) comprise different NeoR related variants39, and, finally, a third blue-shifted set includes various microbial rhodopsins (open squares)6,37,38. Taken together, the three sets include Archaea, Eucharia, and Eubacteria opsins. From inspection of the figure, it is apparent that the Arch-3zero models are capable to reproduce the observed trends. In the case of the Stock-shift (see the left panel), the experimentally observed and \({{{{\bf{q}}}}}^{{{{\bf{1}}}}}\) to \({{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) and \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\) to \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) theoretical slopes are very close. Most remarkably, the theoretical slope of \({\Delta {{{\rm{E}}}}}^{{{{\rm{TIDIR}}}}-{{{\rm{FS}}}}}\) with respect to \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) appears to have the same sign of log(FQY); a quantity that must be proportional to an Arrhenius-type excited state barrier.

The interdependences discussed above are explained by the distinct positive charge distributions of the CT and COV electronic configurations (see Fig. 5A) of rPSBAT that would cross “diabatically” along α46. When looking at the left scheme of Fig. 6A the localized counterion configuration stabilizing COV must also stabilize S1 near TIDIR/CoIn and S1 relative to S0 in the TIDIR/CoIn region, consistently with barrierless isomerization and absence of fluorescence emission. The opposite applies to a counterion configuration stabilizing CT (see right scheme). In this case, the partially or largely delocalized counterion stabilizes S1 in the FC and FS regions relative to the TIDIR/CoIn region, thus leading to a large S1 energy barrier and an increase in emission. A diffuse counterion charge (see central scheme) would instead stabilize both regions, thus generating an intermediate S1 energy profile.

Fig. 6: Mechanism explaining the relationship between stabilization of the FS region and destabilization of the TIDIR/CoIn region as a function of the counterion configuration.
figure 6

All source data for parts B and C are provided as a Source Data file. A Schematic illustration of the effect of the counterion migration-delocalization along the \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{4}}}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) path. B S1 isomerization energy profiles (S1 energies relative to FC) for the \({{{{\bf{q}}}}}^{{{{\boldsymbol{1}}}}}\to {{{\boldsymbol{\to }}}}\to {{{{\bf{q}}}}}_{{{{\bf{CO}}}}}^{{{{\bf{7}}}}}\) transition. C Same data computed for the \({{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{2}}}}}\to {{{\boldsymbol{\to }}}}\to {{{{\bf{q}}}}}_{{{{\bf{EX}}}}}^{{{{\bf{7}}}}}\) transition.

The validity of a mechanism associating a negligible isomerization energy barrier (\({\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}} < 0\)) with a counterion localized near the Schiff base region, an average barrier (\({\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\gtrsim 0\)) with a diffuse counterion, and a large barrier (\({\Delta {{{\rm{E}}}}}^{{{{\rm{CoIn}}}}-{{{\rm{FS}}}}}\gg 0\)) with a counterion semi-localized near the β-ionone ring is supported by isomerization path computations. The resulting isomerization energy profiles are shown in Fig. 6B, C for the MiDeCO and MiDeEX paths, respectively (see Supplementary Note 5 in the Supporting Information for details). As expected, for both mechanisms, q7CO and q7EX display a very large barrier and a TIDIR and CoIn regions at high energy. In contrast, q1 and q2CO and q2EX display negligible or small isomerization barriers. An intermediate behavior is seen for q4CO and q4EX. The high similarity of the computed energy profiles supports the conclusion that MiDeCO and MiDeEX produce close ESP changes, supporting the existence of contrasting color-tuning mechanisms that can be associated to the Nakanishi and Borhan-Geiger views.

Color-switching hotspots

In retinal protein studies, “color switches” have been defined as pairs of residues causing a spectral blue shift or red shift when they are swapped at a specific position along the protein sequence. One residue is usually the wild-type (WT) residue, while the other residue is a more polar or less polar residue with respect to the WT36. The discovery of color switches in Arch-3 variants would allow to build more efficient optogenetic tools with the desired \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) value. While presently, there is no general rule for locating potential color-switching residues, the Arch-3zero,* models could be used to track which amino acid would maximally affect the \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) after a change in its polarity. To test such concept, we focus on the q7CO counterion configuration featuring at least five residues with −0.1 perturbing charge and located in the northern and southern hemisphere area close to the β-ionone (see Fig. 4B). The elementary models of Fig. 2C, D suggest that the replacement of one of these residues with a residue of different polarity, would lead to a significant \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) change. Accordingly, we have generated conventional QM/MM models of few Arch-3 variants (see Supplementary Note 6, Tables S3S4 and Figs. S13S16 in the Supporting Information) where the M128 and P196 residues are renplaced with isosteric groups of increased polarity (Q128 and S196, respectively). In the hope to get a cooperative effect, we have also replaced the D222 counterion with E222 featuring a carboxylate group located at a larger distance from the positively charged Schiff-base moiety of the chromophore.

The results are reported in Fig. 7A in terms of \(\Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\) differences (\(\Delta \Delta {{{{\rm{E}}}}}_{{{{\rm{S}}}}1-{{{\rm{S}}}}0}^{{{{\rm{a}}}}}\)) relative to the WT Arch-3 model. The most redshifted variant is the triple mutant M128Q-P196S-D222E and double mutant M128Q-P196S (33 nm and 31 nm red-shifted with respect to the WT, respectively). Similar effects are seen when the different isosteric dipolar chain of protonated E replaces M128. Interestingly, a simple replacement of the amide Q with the shorter chain N in both the triple and double mutant leads to an inversion of the effect (29 nm blue-shifted with respect to WT). In Fig. 7B we show that, as expected, the red shift is due to an increase in negative ESP in the β-ionone region in the red-shifted mutant, while in the blue-shifted mutant, one observes an ESP decrease.

Fig. 7: Mutant models suggested by model q7CO.
figure 7

All source data, including the energies of part A, the geometries of parts C and D, and the electrostatic potentials of part B, are provided as a Source Data file. A Change in excitation energies with respect to WT Arch-3 for six different mutants. B Side-by-side comparison of the ESP of a red-shifted and a blue-shifted mutant (the electrostatic potential in the β-ionone region of WT Arch-3 is given in the inset. C 3D representation of the M128Q-P196S-D-222E red-shifted mutant. The dipole arrows and numerical values represent the dipole moment of the entire residue (not only that of the side chain). D Same for the M128Q-P196S-D-222E blue-shifted mutant.

In Fig. 7C, D, we provide an atomistic representation of the computed mutant cavities (see Supplementary Note 6 in the Supporting Information for the structures of the mutants in Fig. 7A and not discussed here). The red-shifting and blue-shifting effects are easily interpreted in terms of oriented dipoles (i.e., roughly, using Nakanishi’s interpretation). In fact, the Q128 and N128 side chains have opposite orientations and dipoles that project a negative and positive electrostatic potential in the β-ionone region, respectively. Interestingly, the same change in orientation is seen in the S196 dipole that, similar to Q128, is stabilizing S1 in the mutant of Fig. 7C and, similar to N128, is destabilizing it in the mutant of Fig. 7D. In conclusion, it appears that q7CO can be used for locating color-switching positions.

Discussion

In the past, the investigation of the key factors determining how to push the \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) of rhodopsins closer to the infrared has focused on interactions between specific amino acid residues and the chromophore. However, given the complexity of protein environments, the principles underlying such a modulation have been difficult to determine. Above we have presented the results of an approach based on Arch-3zero: a computational tool capable of connecting \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) changes with the migration and delocalization of a single negative charge. We have shown that these models lead to qualitatively different red-shifting charge diffusion paths that can be directly associated with Nakanishi’s and Geiger-Borhan’s color-tuning mechanisms. Both MiDeCO and MiDeEX paths reproduce and explain the experimentally observed proportionality between \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) fluorescence and in a set of Arch-3 variants.

The physics explaining the MiDeCO and MiDeEX paths becomes apparent when, for instance, one compares the EPS equatorial cross section of Arch-3zero,2 and Arch-3zero,7. As shown in Fig. 2D for the compact case (q1/q2CO and q7CO), the highly negative region “a” and slightly negative region “b” in Arch-3zero,2 are almost swapped in Arch-3zero,7, where the a’ region is now less negative than the b’ region. The same effect is seen when comparing the ESP cross-section of the conventional QM/MM model of Arch-3 (close to Arch-3zero,1) with, for instance, that of its red-shifted variant Arch-7 (compare the regions a, a’ and b, b’ in Fig. 2E). However, consistently with the data reported in Fig. 5B, the magnitude of the Arch-3 to Arch-7 change is significantly less since Arch-3zero,7 features an energy gap larger than the Arch-7 gap and closer to the gap of NeoR. For this reason, we report in Fig. 2F the ESP equatorial cross-section for NeoR. While such an ESP appears to be, in general, less negative with respect to Arch-3, it can be seen that, like Arch-7, the “a” region is less negative than the “b” region, pointing to a large destabilization of S0 with respect to S1.

From an engineering perspective, Arch-3zero can be used to construct a detailed 3D map of the charge configuration (and ESP) producing a specific \({{{{\rm{\lambda }}}}}_{\max }^{{{{\rm{a}}}}}\) value. The map indicates in which sub-area of the 3D ellipsoidal shell of Arch-3 one should increase the fractional negative charges (or decrease the fractional positive charges) to achieve a large red-shifting or blue-shifting effect. This aids the discovery of novel color-switches where the replacement with isosteric polar residues would “re-create” the perturbed ESP. However, while this represents a valid working direction, it must be realized that the electrostatics generated by even single residues depends critically on their side-chain orientation. It is therefore apparent that the results of the present research must be complemented by future studies expanding the scope of the Arch-3zero tool. A first possibility is to allow one positive and two negative fractional charges (equivalent to a single negative charge) to migrate and delocalize during the optimization process, leading to maps with both positive and negative perturbations indicating, more explicitly, the type of side-chain replacement necessary to re-create the predicted EPS. Another improvement would be that incorporate into the model “diffusing dipoles” that could also orient rather than simply migrate and delocalize, to better mimic the properties of amino acid side chains. These and other developments are currently pursued in our lab.

Methods

All calculations were performed at the CASSCF(12,12)/6-31G*/AMBER9445 level using the OpenMolcas/Tinker49,50,51,52 interface. The optimization of the charge distributions was carried out by a Python-based algorithm performing a constrained trust region minimization. The mutated QM/MM models were generated leveraging the established a-ARM protocol, and the excitation energies were computed at the CASPT2/SA3-CASSCF(12,12)/AMBER level. See the Supporting Information for details.