Gil Wiseglass, Rotem Rubinstein, Following the Evolutionary Paths of Dscam1 Proteins toward Highly Specific Homophilic Interactions, Molecular Biology and Evolution, Volume 41, Issue 7, July 2024, msae141, https://doi.org/10.1093/molbev/msae141
Abstract
Many adhesion proteins, evolutionarily related through gene duplication, exhibit distinct and precise interaction preferences and affinities crucial for cell patterning. Yet, the evolutionary paths by which these proteins acquire new specificities and prevent cross-interactions within their family members remain unknown. To bridge this gap, this study focuses on Drosophila Down syndrome cell adhesion molecule-1 (Dscam1) proteins, which are cell adhesion proteins that have undergone extensive gene duplication. Dscam1 evolved under strong selective pressure to achieve strict homophilic recognition, essential for neuronal self-avoidance and patterning. Through a combination of phylogenetic analyses, ancestral sequence reconstruction, and cell aggregation assays, we studied the evolutionary trajectory of Dscam1 exon 4 across various insect lineages. We demonstrated that recent Dscam1 duplications in the mosquito lineage bind with strict homophilic specificities without any cross-interactions. We found that ancestral and intermediate Dscam1 isoforms maintained their homophilic binding capabilities, with some intermediate isoforms also engaging in promiscuous interactions with other paralogs. Our results highlight the robust selective pressure for homophilic specificity integral to the Dscam1 function within the process of neuronal self-avoidance. Importantly, our study suggests that the path to achieving such selective specificity does not introduce disruptive mutations that prevent self-binding but includes evolutionary intermediates that demonstrate promiscuous heterophilic interactions. Overall, these results offer insights into evolutionary strategies that underlie adhesion protein interaction specificities.
Introduction
Adhesion proteins play a central role in cellular organization and communication within multicellular organisms (Dalva et al. 2007; Makrilia et al. 2009; Lele and Hindges 2023). These are often members of large protein families that have expanded through gene duplication and subsequent evolutionary divergence (i.e. paralog proteins). Due to their ancestral link, paralog proteins generally have similar sequences and nearly identical structures with a tendency for intrafamilial interactions (Ispolatov et al. 2005; Lukatsky et al. 2007; Pereira-Leal et al. 2007). Yet, adhesion proteins within the same family often display distinct binding specificities and affinities. These differences are central to their roles in cell patterning and organization. Some such roles include neural tube formation mediated by N- and E-cadherins (Taneyhill and Schiffmacher 2017), as well as the function of nectins within the inner ear cell patterning (Togashi et al. 2011), and clustered protocadherin and Dscam1 proteins in dendritic arborization (Zipursky and Sanes 2010; Honig and Shapiro 2020).
New protein–protein interaction specificities can evolve through mutations that impose negative constraints and prevent cross-interactions among family members (Zarrinpar et al. 2003; Reinke et al. 2013; Peleg et al. 2014; Cheng et al. 2019; Honig and Shapiro 2020; Sergeeva et al. 2020). There are two contrasting models that outline the evolutionary pathways proteins undergo from being identical duplicated copies to becoming paralogs with distinct binding specificities. The first model states that over extensive periods of time, proteins lose their functionality due to mutations that cause incompatible interacting interfaces until additional mutations lead to new interacting interfaces and binding specificities (Ohno 1970; Siddiq et al. 2017; McClune and Laub 2020). The second model suggests a continuous evolutionary shift, with intermediate proteins exhibiting promiscuous, nonspecific interactions (Sayou et al. 2014; Aakre et al. 2015). While the evolution of specific protein–protein interactions has been extensively studied within the context of enzyme–substrate specificities (Aharoni et al. 2005; Weinreich et al. 2006; Khersonsky and Tawfik 2010) and receptor–ligand systems (Chockalingam et al. 2005; Ortlund et al. 2007; Eick et al. 2012; Koehbach et al. 2013), it remains underexplored from the perspective of cell adhesion proteins.
Drosophila Dscam1 serves as an extraordinary example of a large paralogous protein family with highly precise cell surface adhesion interactions. The Dscam1 gene consists of 24 exons, 3 of which, exons 4, 6, and 9, have undergone extensive duplications (Fig. 1a). During alternative splicing, a single exon from each of the three clusters is stochastically selected, with the potential to encode an astounding array of 19,008 unique extracellular regions (Fig. 1b). Each extracellular region is characterized by a distinct combination of three alternative immunoglobulin (Ig) domains: Ig2, Ig3, and Ig7, encoded by exons 4, 6, and 9, respectively (Schmucker et al. 2000). The first four Ig domains (Ig1 to Ig4) form a horseshoe-like structure that positions the alternative Ig2 and Ig3 domains of two membrane-apposing Dscam1 proteins for homophilic interactions (Meijers et al. 2007; Sawaya et al. 2008). Dscam1 isoforms are also unique because of their strict homophilic binding specificity. Homodimerization occurs in a symmetric antiparallel fashion only when all three alternative Ig domains match up with each other (i.e. Ig2 binds to Ig2, Ig3 to Ig3, and Ig7 to Ig7; Fig. 1c; Wojtowicz et al. 2004, 2007; Sawaya et al. 2008). This is in contrast to many cell adhesion protein families that typically exhibit both homophilic and heterophilic binding between members of the same family (Katsamba et al. 2009; Vendome et al. 2014; Mosca 2015; Zinn and Özkan 2017; Brasch et al. 2018; Honig and Shapiro 2020; Sergeeva et al. 2020). The strict homophilic binding exhibited by the enormous Dscam1 isoform repertoire is key to its ability to differentiate self from nonself cell–cell interactions. This ability is required for neural patterning within the developing Drosophila nervous system (Wang et al. 2002; Zhan et al. 2004; Zhu et al. 2006; Hughes et al. 2007; Matthews et al. 2007; Soba et al. 2007; Wojtowicz et al. 2007; Wu et al. 2012; Miura et al. 2013; Wilhelm et al. 2022).
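The figure of 19,008 extracellular regions follows from multiplying the number of alternative exons in each cluster. The short sketch below reproduces that arithmetic. The count of 12 exon 4 alternatives is stated later in this article; the counts of 48 (exon 6) and 33 (exon 9) are the commonly cited D. melanogaster values and are assumptions here, since this section does not list them. The dictionary keys and the function name are illustrative only.

```python
# Alternative exons per cluster in D. melanogaster Dscam1.
# 12 (exon 4) appears in this article; 48 (exon 6) and 33 (exon 9) are
# assumed from the wider literature and are not stated in this section.
ALTERNATIVE_EXONS = {"exon4_Ig2": 12, "exon6_Ig3": 48, "exon9_Ig7": 33}


def ectodomain_isoform_count(counts):
    """Multiply the per-cluster counts: one exon is chosen from each cluster."""
    total = 1
    for n in counts.values():
        total *= n
    return total


print(ectodomain_isoform_count(ALTERNATIVE_EXONS))  # 19008
```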
Earlier investigations of Dscam1 evolution traced the extensive exon expansion to the last common ancestor (LCA) of the Pancrustacea (Lee et al. 2010; Armitage et al. 2012), which diverged into insects and crustaceans approximately 500 million years ago (Misof et al. 2014). These studies used a sample of representative genomes and identified a significant variability in the count of exon duplications among different species, indicating duplication events also occurred in subsequent evolutionary lineages (Lee et al. 2010; Armitage et al. 2012). However, these studies did not experimentally test whether newly duplicated Dscam1 isoforms, aside from Drosophila, bind strictly homophilically. The functionality of Dscam1 ancestral proteins also remains unknown, resulting in a knowledge gap in the evolution of adhesion specificity for this unique protein family.
Here, we extensively investigate the evolutionary expansion of Dscam1 exon 4 in insects. We predict the mutational pathways that connect ancestral to recent Dscam1 duplications. Our analysis reveals a clear pattern: as Dscam1 paralogous exons diverge from identical duplicates, some initially exhibit heterophilic interactions, which diminish with additional mutations, ultimately resulting in exclusive homophilic binding. We did not observe a “nonfunctional” ancestral isoform lacking homophilic binding. By demonstrating this evolutionary progression, we shed light on the key process of increasing the number of highly specific adhesion molecules, deepening our understanding of this critical aspect of molecular evolution.
Results
Phylogenetic Analysis of Dscam1 Exon 4 in Insects
In this study, we investigated the evolutionary trajectory of the Dscam1 gene, focusing specifically on exon 4, which encodes a segment of the Ig2 domain that is involved in homophilic dimerization. A tblastn search was performed against the RefSeq genome database (Johnson et al. 2008; NCBI Resource Coordinators 2016; O’Leary et al. 2016) for sequences with homology to Drosophila melanogaster exon 4.7 (Fig. 2a). This comprehensive search covered 83 insect species, resulting in the identification of 962 homologous sequences. A nonredundant set of these sequences, including a Dscam2 sequence from Chelicerata Ixodes scapularis as an outgroup, was aligned and used to construct a phylogenetic tree (for additional details, see Materials and Methods). The resulting tree revealed nine distinct clusters of exons, each representing orthologous sequences from various species (Fig. 2b). These orthologous sequences maintain high levels of sequence conservation, particularly in the Ig2 dimer interface, with an average similarity of over 90% within each ortholog cluster. Notably, we found that all clusters contain representative sequences from most species, indicating paralogous relationships between clusters (Fig. 2b). For example, the beetle exon 4 sequences can be found in all nine clusters. While previous work suggests that the LCA of insects possessed nine exon 4 paralogs (Lee et al. 2010; Armitage et al. 2012), our current findings provide strong support for this notion, mainly due to the increased availability of genomic data.
Next, using the 12 exon 4 paralogs of D. melanogaster (4.1 to 4.12) as a reference, we focused on more recent exon 4 duplication events occurring in specific insect lineages. These analyses uncovered the absence of Drosophila exons 4.1, 4.2, and 4.6 in most insect species. These exons are unique to the Diptera lineage, which encompasses flies and mosquitoes and diverged from other insects approximately 260 million years ago (Wiegmann et al. 2011). Sequence homology and branch support values strongly indicate recent duplications and divergence of exons 4.1 and 4.2 from exon 4.3 (Fig. 2c), as well as the divergence of exon 4.6 from exon 4.7 (Fig. 2b). We also discovered more recent lineage-specific duplications, including duplications of exons 4.8 and 4.10 found in mosquitoes (Figs. 2c and 3) and duplications of exons 4.10 and 4.12 in the Lepidoptera (i.e. moths and butterflies; Fig. 2b).
Recent Duplications in Mosquito Dscam1 Exhibit Strict Homophilic Binding
To date, the highly specific homophilic dimerization of the Dscam1 protein has been observed exclusively between D. melanogaster isoforms. This study aimed to investigate whether the binding specificities of recent exon 4 duplications, not present in Drosophila, would also maintain homophilic specificity. We focused on mosquito exons 4.8 and 4.10, along with their respective duplications, referred to here as 4.81, 4.82, 4.83, 4.101, 4.102, and 4.103. To assess whether the newly duplicated isoforms evolved new homophilic binding specificities, we used site-directed mutagenesis to swap the interface residues of Drosophila Dscam1 to match those found in the mosquito isoforms (Fig. 3). We implemented this strategy based on a previous study showing that isoform specificity could be altered by the substitution of residues in positions 107 to 114 on the Ig2 domain dimer interface (Wojtowicz et al. 2007).
We assessed the binding preferences of the mosquito interfaces via cell aggregation assays using HEK293F-suspended cells. Cell aggregation is a well-established method used to determine the binding specificity of adhesion proteins (Matthews et al. 2007; Schreiner and Weiner 2010; Boucard et al. 2014; Thu et al. 2014; Rubinstein et al. 2015; Bisogni et al. 2018; Zhou et al. 2020; Hou et al. 2022; Cheng et al. 2023; Wiseglass et al. 2024). Each protein is tagged with either red or green fluorescent markers and transfected into separate cell populations. The two cell populations are then mixed and allowed to aggregate based on the binding specificities of the adhesion proteins they express. If the two proteins are strictly homophilic, the cells will form separate red or green aggregates. In contrast, if the two proteins are heterophilic, the cells will form mixed red and green aggregates. To quantify the extent of aggregate mixing or separation, we employed a customized Python script (Wiseglass et al. 2024), calculating the ratio of proximate red and green cells (see Materials and Methods). The resulting ratio is displayed in the corner of each image, with a value exceeding 0.1 indicating visibly mixed aggregates (Fig. 4).
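To make the idea of a mixing ratio concrete, the following is a minimal sketch of one way such a score could be computed from cell centroids. It is not the script used in this study (Wiseglass et al. 2024), whose exact definition is given in its own methods. The function name `mixing_ratio`, the neighbor `radius`, and the definition used here (the fraction of cells that have at least one cell of the other color within `radius`) are all assumptions made for illustration. Only the 0.1 cutoff is taken from the text.

```python
from math import dist

MIXING_THRESHOLD = 0.1  # from the text: ratios above 0.1 indicate visibly mixed aggregates


def mixing_ratio(red_cells, green_cells, radius):
    """Hypothetical mixing score: fraction of all cells that have at least
    one cell of the other color within `radius` (same units as the coordinates)."""
    if not red_cells or not green_cells:
        return 0.0

    def has_neighbor(cell, others):
        return any(dist(cell, other) <= radius for other in others)

    mixed = sum(has_neighbor(c, green_cells) for c in red_cells)
    mixed += sum(has_neighbor(c, red_cells) for c in green_cells)
    return mixed / (len(red_cells) + len(green_cells))


# Toy coordinates, invented for illustration only.
separate = mixing_ratio([(0, 0), (1, 0)], [(100, 100), (101, 100)], radius=5)
mixed = mixing_ratio([(0, 0), (10, 0)], [(1, 0), (11, 0)], radius=2)
print(separate > MIXING_THRESHOLD, mixed > MIXING_THRESHOLD)  # False True
```

Under this assumed definition, two well-separated aggregates score 0 and fully interleaved cells score 1, so a cutoff such as 0.1 separates the two regimes.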
Cell aggregation assays were performed in pairwise combinations of the yellow fever mosquito Aedes aegypti isoforms 4.81 to 4.83 and 4.101 to 4.103 and the D. melanogaster isoforms 4.1 to 4.3. We observed that only cells expressing identical isoforms formed mixed aggregates, while all combinations of nonidentical isoform pairs resulted in separate aggregates (Fig. 4). These results demonstrate a strict homophilic binding preference for each isoform. Importantly, our findings indicate that the recent mosquito exons evolved to encode adhesion receptors with highly specific homophilic cell recognition. This suggests mosquito Dscam1 has a similar function to Drosophila Dscam1 in mediating the distinction between self and nonself in neurons.
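The pairwise logic described here can be expressed as a simple check over a table of mixing ratios: a set of isoforms is strictly homophilic when every identical pair mixes and no nonidentical pair does. The sketch below assumes the 0.1 cutoff quoted earlier in the text. The function name and the ratio values are hypothetical placeholders invented for illustration, not measurements from Fig. 4.

```python
def is_strictly_homophilic(pair_ratios, threshold=0.1):
    """Return True when identical pairs mix (ratio > threshold)
    and every nonidentical pair stays separate (ratio <= threshold)."""
    for (isoform_a, isoform_b), ratio in pair_ratios.items():
        mixed = ratio > threshold
        if mixed != (isoform_a == isoform_b):
            return False
    return True


# Placeholder ratios, invented for illustration only.
toy_ratios = {
    ("4.81", "4.81"): 0.45,
    ("4.82", "4.82"): 0.40,
    ("4.81", "4.82"): 0.02,
}
print(is_strictly_homophilic(toy_ratios))  # True

# A single mixed nonidentical pair is enough to fail the check.
toy_ratios[("4.81", "4.82")] = 0.30
print(is_strictly_homophilic(toy_ratios))  # False
```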
Tracing Evolutionary Paths of Binary Specificities in Dscam1
The evolution of Dscam1 provides a unique opportunity to explore the challenges associated with diversifying homophilic interfaces. Following exon duplication, alterations in the dimer interface can ultimately lead to the establishment of a new homophilic specificity. However, during this evolutionary process, intermediate changes may result in a nonspecific binding or potential loss of binding altogether. To gain deeper insights into the evolutionary trajectory of Dscam1 and to predict intermediate isoforms, we performed an ancestral sequence reconstruction (ASR). Utilizing two ASR programs, PaML (Yang 2007; Xu and Yang 2013) and GRASP (Ross et al. 2022), we predicted the LCA proteins prior to the recent mosquito duplication of exons 4.8 and 4.10 and the duplication in Drosophila exon 4.3, referred to here as 4.8LCA, 4.10LCA, and 4.1-3LCA. The reconstruction of the three ancestor proteins achieved high confidence of 0.85, 0.91, and 0.84 (for 4.1-3LCA, 4.8LCA, and 4.10LCA, respectively) with predicted ancestral interface residues similar, but not identical, to current sequences (Fig. 5a; supplementary file S4, Supplementary Material online).
Resurrected Ancestral Proteins Bind Homophilically
We resurrected these ancestral interfaces by mutagenesis of extant interface residues and examined their ability to self-bind using the cell aggregation assay. We found that all three ancestors effectively mediated cell aggregation, demonstrating their ability to function as homophilic adhesion receptors (Fig. 5b). Next, we compared the binding preferences of these ancestor sequences with their extant descendants. We observed that cells expressing the ancestral 4.8LCA formed mixed aggregates exclusively with cells expressing the extant 4.81 isoform, but not with 4.82 and 4.83. Similarly, 4.10LCA was observed to interact solely with one of its current descendants, 4.101, and not with the remaining two isoforms (Fig. 5c). These results imply that, postduplication, the ancestral 4.8 and 4.10 exons evolved into their respective current isoforms (4.81 and 4.101, respectively), while the other duplicates diverged and developed new binding specificities. Interestingly, cells expressing the 4.1-3LCA ancestor recognized cells expressing either 4.3 or 4.1 extant isoforms, but not 4.2-expressing cells (Fig. 5c, left). These results indicate that these extant proteins diverged via a subfunctionalization mechanism (McClune and Laub 2020), by which the ancestor protein binds to a wider range of partners (in this case, two distinct isoforms) compared to its descendants (which here bind strictly homophilically).
Next, we examined whether divergence of the Ig2 domain interface occurred through intermediate proteins that maintain homophilic cell adhesion or whether mutations in intermediate proteins disrupt self-binding. We observed that in all three studied examples, the ancestor sequence differs from two extant exons by a single residue and by two residues from the third extant exon. For example, the 4.1-3LCA interface is composed of the following five residues: “EDNKY.” A single substitution from tyrosine at position 114 to histidine is sufficient to reach the interface of exon 4.1 (EDNKH). Similarly, a single substitution of the 4.1-3LCA interface at position 107 from glutamate to aspartate would generate an exon 4.3 interface (DDNKY; Fig. 5a). Editing the 4.1-3LCA interface to the interface of exon 4.2 (EDHKF) requires at least two mutations, N111H and Y114F, with two possible intermediate interfaces, EDHKY and EDNKF, depending on the mutation order. Using similar logic, we identified interface intermediates from the 4.10LCA and 4.8LCA to 4.103 and 4.83, respectively (Fig. 5a).
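The enumeration of intermediates described above can be reproduced with a few lines of code. The sketch below lists the substitutions separating two five-residue interfaces and every interface that lies strictly between them on a shortest mutational path. The interface positions (107, 109, 111, 112, and 114) and the sequences come from the text; the function names are illustrative and are not part of the study's pipeline.

```python
from itertools import permutations

# Ig2 dimer interface positions, as listed in Materials and Methods.
INTERFACE_POSITIONS = (107, 109, 111, 112, 114)


def mutation_labels(ancestor, descendant):
    """Substitutions separating two interfaces, e.g. 'N111H'."""
    return [
        f"{a}{pos}{d}"
        for pos, a, d in zip(INTERFACE_POSITIONS, ancestor, descendant)
        if a != d
    ]


def intermediates(ancestor, descendant):
    """All interfaces strictly between ancestor and descendant, taking
    one substitution at a time in every possible order."""
    differing = [i for i, (a, d) in enumerate(zip(ancestor, descendant)) if a != d]
    found = set()
    for order in permutations(differing):
        current = list(ancestor)
        for i in order[:-1]:  # the final step would reach the descendant itself
            current[i] = descendant[i]
            found.add("".join(current))
    return sorted(found)


print(mutation_labels("EDNKY", "EDHKF"))  # ['N111H', 'Y114F']
print(intermediates("EDNKY", "EDHKF"))    # ['EDHKY', 'EDNKF']
print(intermediates("EDNKY", "EDNKH"))    # [] (a single substitution has no intermediate)
```

Because only shortest paths are enumerated, this matches the assumption in the text that no additional bridging mutations occur along the way.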
We then examined the self-aggregation abilities of cells expressing each intermediate isoform with the aim of testing the self-binding capabilities of intermediate states. Surprisingly, we observed that all intermediate proteins engaged in homophilic interactions despite having interfaces that appeared incompatible for such interactions (Fig. 5b). For example, within the homodimer interface of one 4.8 intermediate isoform, two negatively charged aspartate residues are positioned in proximity upon dimerization, where they could potentially lead to electrostatic repulsion (Fig. 6). These types of incompatibilities are thought to be central in preventing unwanted cross-talk between different Dscam1 isoforms (Fig. 4; Sawaya et al. 2008). To assess changes in the binding affinity (ΔΔGbind) of this intermediate, we used two computational methods, FoldX and SSIPe (Schymkowitz et al. 2005; Huang et al. 2020). Both methods predicted that the N112D mutation would significantly destabilize homophilic interactions, with ΔΔGbind exceeding 2 kcal/mol. These predictions indicate that negative constraints weaken the self-interaction of the intermediate isoforms. Yet, our cell aggregation results show these constraints do not entirely disrupt adhesive functionality.
Resurrected Ancestral Proteins Bind Heterophilically with Extant Isoforms
We then tested whether nonspecific binding could occur between ancestral, intermediate, and extant isoforms. We found that both intermediate proteins that may lead to exon 4.2 formed mixed aggregates with extant isoforms 4.1, 4.3, and their ancestor, demonstrating heterophilic binding specificities (Fig. 5c, left). Thus, both mutations leading to exon 4.2 are necessary to prevent cross-interactions with closely related paralogs, generating strict homophilic binding. One of the intermediate proteins leading to isoform 4.83 recognized both the ancestor and one paralog extant sequence (4.81). Finally, both intermediate proteins leading to 4.103 promiscuously bound to the ancestor, with one intermediate binding weakly to one of the extant paralogs (4.102; Fig. 5c, right). Overall, of the six intermediate proteins we tested, five exhibit promiscuous cross-interactions, demonstrating a gradual transition in specificity. Our experimental observations also explain the continued evolution of intermediate Dscam1 isoforms reconstructed here, as they generally exhibit nonspecific cross-interactions with other isoforms.
To address uncertainties in the ancestral protein reconstruction process, we focused on three cases where the reconstruction's posterior probability for a particular interface residue was below 0.85. In these instances, we generated an alternative ancestor by incorporating the second most likely residue. These alternative ancestors were then tested for their binding preferences. All alternative ancestors displayed homophilic binding, as evidenced by their ability to mediate cell aggregation (supplementary fig. S2, Supplementary Material online). One of the alternative ancestors (the LCA of 4.82 and 4.83) has the same dimer interface as extant isoform 4.82 and binds only to 4.82, thus differing from the binding preferences of the primary ancestor, which has a 4.81-like interface (supplementary fig. S2, Supplementary Material online, left). Nevertheless, the transition of this alternative ancestor to extant isoform 4.83 exhibits the same promiscuous binding preferences as the primary ancestor (supplementary fig. S2, Supplementary Material online, middle). The two additional alternative ancestors exhibited the same promiscuous binding as the primary ancestors (supplementary fig. S2, Supplementary Material online). These findings show that the alternative ancestors produced similar binding preferences to the primary ancestors, thereby further supporting our conclusions.
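The rule for generating alternative ancestors can be sketched as follows: take the most probable residue at each interface position as the primary ancestor, and wherever that residue's posterior probability falls below 0.85, build an alternative carrying the second most likely residue at that position. The 0.85 cutoff comes from the text. The function name, the input layout (one dictionary of residue posteriors per position), and the example values are assumptions invented for illustration; they are not taken from supplementary file S4.

```python
def primary_and_alternative_ancestors(site_posteriors, cutoff=0.85):
    """site_posteriors: one {residue: posterior probability} dict per position.
    Returns the primary ancestor and one alternative per uncertain position."""
    ranked = [
        sorted(site.items(), key=lambda item: item[1], reverse=True)
        for site in site_posteriors
    ]
    primary = "".join(site[0][0] for site in ranked)

    alternatives = []
    for i, site in enumerate(ranked):
        top_probability = site[0][1]
        if top_probability < cutoff and len(site) > 1:
            second_residue = site[1][0]
            alternatives.append(primary[:i] + second_residue + primary[i + 1:])
    return primary, alternatives


# Toy posteriors for a made-up five-residue interface (not real data).
toy_sites = [
    {"A": 0.99, "S": 0.01},
    {"K": 0.70, "R": 0.30},  # below the 0.85 cutoff, so an alternative is built
    {"T": 0.95, "S": 0.05},
    {"L": 0.90, "I": 0.10},
    {"V": 0.99, "I": 0.01},
]
print(primary_and_alternative_ancestors(toy_sites))  # ('AKTLV', ['ARTLV'])
```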
Discussion
This study examined the evolution of Dscam1 isoform binding specificities, with a focus on the exon 4 cluster. This exon encodes the second Ig domain (Ig2), one of three domains involved in Dscam1 homophilic binding. Using phylogenetic analysis, ASR, and cell aggregation experiments, we obtained several key insights. We identified relatively recent Dscam1 duplications in mosquitoes that evolved strict homophilic binding. We also observed that Dscam1 proteins have maintained their fundamental functionality throughout their evolutionary trajectory, as both ancestral and evolutionary intermediate proteins mediate homophilic cell recognition. Finally, we discovered that in contrast to extant Dscam1 proteins that interact only homophilically, ancestral and intermediate proteins exhibit promiscuous interactions and are able to engage in both homophilic and heterophilic binding.
Conservation of Dscam1 Exon 4 Cluster
With the goal of tracking evolutionary trajectories of the Ig2 dimer interface, our initial step involved a comprehensive search for current exon 4 sequences. Similar to past findings, we observed that exon 4 duplications are relatively conserved (Graveley et al. 2004; Lee et al. 2010; Armitage et al. 2012), comprising nine invariant exons (Lee et al. 2010). In addition to the nine conserved exons, we identified other exon 4 duplications across various insect lineages (Fig. 2). These duplications expand the Dscam1 isoform repertoire by diversifying the Ig2 dimer interface sequences, albeit to a lesser extent than the significant expansions observed in exons 6 and 9. In all these instances, a single ortholog can consistently be identified through the conserved Ig2 dimer interface, while the other duplicated exons undergo nonsynonymous substitutions, consistent with Ohno’s (1970) evolutionary model.
The conservation of most exon 4 variants could be attributed to an unknown functional role of these exon variants. Alternatively, this conservation could reflect inherent constraints imposed by the small Ig2 dimer interface, which comprises only five residues.
Strict Homophilic Binding in Dscam1 Isoforms
To our knowledge, studies of the binding preferences of insect Dscam1 had previously been confined to D. melanogaster isoforms. However, both the current and previous studies identified duplication events outside of the Drosophila lineage whose binding preferences had yet to be investigated experimentally (Graveley et al. 2004; Lee et al. 2010; Armitage et al. 2012). Our findings reveal that even relatively recent duplications, occurring subsequent to the divergence of mosquitoes from flies, have evolved into isoforms exhibiting strict homophilic binding preferences (Fig. 4b and c). Overall, these findings underscore the remarkable ability of Dscam1 proteins to generate self-binding domains via exon duplication and sequence divergence. These results also highlight the evolutionary pressure to generate isoforms that can accurately differentiate self from nonself interactions, which is central to neuronal patterning.
A previous study using enzyme-linked immunosorbent assay (ELISA) showed that a small subset of exons, including exons 4.1 and 4.3, engage in both homophilic and significantly lower-affinity promiscuous heterophilic interactions (Wojtowicz et al. 2007), in contrast to our findings. Interestingly, the authors themselves have observed such discrepancies in results between ELISA and cell aggregation assays. This difference in findings likely stems from the fundamental methodological differences between the assays. ELISA assays are more quantitative and sensitive to slight variations in protein binding affinities. On the other hand, cell aggregation assays offer an advantage by facilitating binding interactions within the context of native cellular membranes, potentially providing a more physiologically relevant measure of adhesion specificity.
Ancient Dscam1 Proteins Bind Promiscuously
Understanding the emergence of new binding specificities is a central question in biochemistry and evolution. Our study investigates the evolutionary trajectories of insect Dscam1 exon 4 by focusing on recent duplications in flies and mosquitoes to reconstruct ancestral isoforms. When reconstructing the evolutionary paths from ancestral proteins to current isoforms, we found that as few as one or two residue changes were sufficient to alter the ancestral binding specificity. We reconstructed intermediate isoforms by examining the shortest mutational paths and specifically testing the impact of the two mutations that altered binding specificity. While the reconstruction of intermediate isoforms did not rely on statistical data, our conclusion that these intermediate isoforms retain self-binding capability is likely to be robust to this uncertainty. This is because both identified mutations are likely to be responsible for insulating the isoform from heterophilic cross-interactions with its closely related paralogs. By introducing one mutation at a time, we test the maximal impact of these alterations on interface stability without allowing for additional bridging mutations. Therefore, although the actual evolutionary trajectory may have been more complex, our analysis of the most extreme mutations that could impact binding specificity on this path strongly supports the validity of our conclusion.
We discovered that across various evolutionary trajectories, Dscam1 isoforms consistently maintained their self-binding capabilities. In addition to self-binding, several evolutionary intermediate isoforms demonstrated promiscuous heterophilic interactions. These results were particularly surprising considering the necessity for Dscam1 paralogs to achieve highly precise, strict homophilic interaction specificities for their role in neuronal self-avoidance. Previous studies have shown that Dscam1 isoforms evolve interfaces compatible with only self-interactions, while interfaces generated by heterophilic interactions contain noncomplementary electrostatic charges and shapes, thereby preventing heterophilic interactions (Sawaya et al. 2008).
Given these insights, one might expect intermediate isoforms to possess a noncomplementary interface that would hinder homophilic binding rather than the interface that allows for cross-interactions (Fig. 6, middle). While an isoform with abolished homophilic binding would not mediate neuronal self-recognition and patterning, the potential fitness penalty for a dysfunctional isoform may be mitigated by the presence of up to 50 randomly expressed isoforms in each neuron (Neves et al. 2004; Zhan et al. 2004). Our study challenges these expectations by demonstrating that, in fact, intermediate Dscam1 isoforms retain self-binding through their evolutionary development.
A possible explanation for this unexpected observation lies in the high degree of conservation of the exon 4 cluster, which suggests a specialized functional role that may impose an additional evolutionary constraint. It would therefore be interesting to study the evolution of the exon 6 or exon 9 clusters, which have been shown to be nonconserved and could therefore potentially exhibit functionally disruptive mutations.
In summary, our study of the evolutionary history of Dscam1 exon 4 across various insect lineages provides insights into the mechanisms by which cell adhesion proteins evolve distinct binding specificities crucial for the regulation of complex cellular processes. Mutations leading to promiscuous binding have previously been documented in other protein systems that evolved under selection against cross-talk, including toxin–antitoxin systems, hormone receptors, and enzymes (Voordeckers et al. 2012; Aakre et al. 2015; Devamani et al. 2016; Siddiq et al. 2017; Lite et al. 2020; Ghose et al. 2023). Our findings suggest a parallel phenomenon in adhesion proteins, where promiscuous intermediates constitute a likely evolutionary step toward achieving highly precise binding specificity.
Materials and Methods
Dscam1 Exon 4 Tree Construction
The translated sequence encoded by exon 4 of D. melanogaster Dscam1 transcript variant BE (isoform 7.9.30, NCBI annotation NM001043041) was used as the query for identifying exon 4 duplications in other insect genomes. The tblastn algorithm was used with the default parameters against the RefSeq Genome Database in the NCBI portal (Johnson et al. 2008; NCBI Resource Coordinators 2016; O’Leary et al. 2016). The search set was limited to Coleoptera (taxid:7041), Diptera (taxid:7147), Hymenoptera (taxid:7399), and Lepidoptera (taxid:7088). Each order was searched separately because the data set is biased toward Drosophila species sequences, which caused sequences from other species to be omitted from the search results. Notably, the query sequence did not limit the results to any specific Dscam1 isoform, as the blast results showed a consistently high sequence identity for exon 4 across all isoforms. Additionally, a second tblastn search using a different D. melanogaster Dscam1 exon 4 isoform (isoform 1.10.3, NCBI annotation NP_001036494.1) yielded identical results.
The search resulted in 962 sequences from 83 species (see supplementary file S1, Supplementary Material online). Of the 83 species, 37 are Drosophilidae species with exon 4 duplications highly similar (>95% identity) to those of D. melanogaster. To avoid overrepresentation biases, we included only the 12 D. melanogaster exon 4 duplications in further analyses (see supplementary file S1, Supplementary Material online, for sequence list). The 518 high-confidence exon 4 sequences retrieved in this search were clustered by cd-hit-v4.8.1-2019-0228 (Li and Godzik 2006; Fu et al. 2012) using a 95% threshold to remove identical sequences. The 266 representative sequences were aligned with an outgroup sequence, a translated exon 4 of I. scapularis (deer tick) Dscam2 (NCBI accession XP_042144217). Ixodes scapularis was previously used as an outgroup organism for Dscam1 insect alignments (Armitage et al. 2012). Alignment was performed with the MAFFT v7.490 plugin in Geneious Prime (Katoh et al. 2002; Katoh and Standley 2013), using the progressive method FFT-NS-2 algorithm, legacy gap penalty, and default settings (see supplementary file S2, Supplementary Material online). A phylogenetic tree was constructed using the FastTree 2.1.11 plugin in Geneious Prime (Price et al. 2009) with default parameters and the Whelan and Goldman (WAG) 2001 amino acid substitution model with 20 rate categories. The FastTree local support values were computed using the Shimodaira–Hasegawa test. The FastTree tree was rooted using the Geneious Prime branch rooting feature (see supplementary file S3, Supplementary Material online). Sequence similarity of the five Ig2 dimer interface positions (107, 109, 111, 112, and 114) was calculated using the BLOSUM62 matrix.
Ancestral Sequence Reconstruction
Based on the rooted FastTree tree, ancestral sequences were predicted using two independent ASR programs: PaMLX 1.3.1 CodeML (Yang 2007; Xu and Yang 2013) and GRASP 2020.05.05 (Ross et al. 2022). With PaMLX, default parameters were used and two alternative models were applied: (i) the Poisson model, which assumes equal rates for all amino acid substitutions, with ncatG = 5, and (ii) the WAG model with ncatG = 20. In the GRASP analyses, the Jones–Taylor–Thornton (JTT) and WAG evolutionary models were used. Both GRASP models corroborated the PaMLX predictions, with all analyses predicting the same sequences for the ancestors at the focus of this study. For detailed accuracy estimations using both methods, refer to supplementary file S4, Supplementary Material online.
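For readers who run CodeML from the command line rather than through PaMLX, the two model settings above could be written to control files roughly as in the sketch below. This is an assumed setup, not the configuration used in this study: the file names are hypothetical, and every option other than the substitution model and ncatG (for example `fix_alpha`, `alpha`, and `RateAncestor`) is our assumption about a typical ancestral reconstruction run.

```python
def codeml_control(seqfile, treefile, outfile, model, ncat_g):
    """Render a codeml control file for amino acid ancestral reconstruction.

    model is "poisson" or "wag"; all file names are supplied by the caller.
    """
    # codeml amino acid models: 0 = Poisson, 2 = empirical (rate matrix file).
    models = {"poisson": (0, None), "wag": (2, "wag.dat")}
    model_code, rate_file = models[model]

    settings = [
        ("seqfile", seqfile),
        ("treefile", treefile),
        ("outfile", outfile),
        ("runmode", 0),        # use the user-supplied (rooted FastTree) tree
        ("seqtype", 2),        # amino acid sequences
        ("model", model_code),
    ]
    if rate_file is not None:
        settings.append(("aaRatefile", rate_file))
    settings += [
        ("fix_alpha", 0),      # assumption: estimate the gamma shape parameter
        ("alpha", 0.5),        # assumption: initial value for alpha
        ("ncatG", ncat_g),     # number of discrete gamma rate categories
        ("RateAncestor", 1),   # request ancestral state reconstruction
    ]
    return "\n".join(f"{key:>12} = {value}" for key, value in settings) + "\n"


# Hypothetical file names; one control file per model described in the text.
poisson_ctl = codeml_control("exon4.phy", "rooted.nwk", "poisson.out", "poisson", 5)
wag_ctl = codeml_control("exon4.phy", "rooted.nwk", "wag.out", "wag", 20)
print(wag_ctl)
```

Generating both files from one function keeps the two runs identical except for the model and the number of rate categories, which makes their ancestral predictions easier to compare.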
Cloning
The plasmid pCMVi-Dscam1_7.27.25-AP (Addgene 72062) encoding Ig1-9 and the first FNIII domain of Dscam1 was purchased. To construct pCMVi-Dscam1_2.27.25-AP, bases 375 to 681 from clone IP15321 (DGRC Stock 1602902; https://dgrc.bio.indiana.edu//stock/1602902 from the Drosophila Genomics Resource Center, NIH Grant 2P40OD010949) were incorporated into pCMVi-Dscam1_7.27.25-AP using Gibson assembly (NEBuilder HiFi DNA Assembly Cloning Kit E5520S). The lentivirus (pLV) transfer plasmid was modified using Gibson assembly to include full-length Dscam1 isoform 7.27.25, combining the first ten domains from pCMVi-Dscam1_7.27.25-AP with the remaining ectodomain, transmembrane, and cytoplasmic domains from clone RE54695 (DGRC Stock 9407; https://dgrc.bio.indiana.edu//stock/9407 from the Drosophila Genomics Resource Center, NIH Grant 2P40OD010949).
Mutations were introduced into the Ig2 dimer interface sequence of Dscam1_7.27.25 in the pCMVi construct, creating extant and ancestor isoforms 4.8 and 4.10. Similarly, mutations in Dscam1_2.27.25 generated extant and ancestor isoforms 4.1 to 4.3. Mutagenesis was performed using the QuikChange Lightning Site-Directed Mutagenesis Kit 210518. Mutagenesis primers were designed using the QuikChange primer design program.
To express the mutants as membrane-attached proteins, positions 1 to 3011 of the mutant pCMVi constructs were amplified and cloned into plv-Dscam1_7.27.25(Δ1-3011)-GFP and plv-Dscam1_7.27.25(Δ1-3011)-mCherry amplicons using Gibson assembly.
Cell Aggregation Assay
FreeStyle 293-F cells (Thermo Fisher R79007) were separately transfected using PEI MAX (linear polyethylenimine hydrochloride, MW 40,000, 49553-93-7) as follows: cells were grown at 1 million cells per milliliter in FreeStyle 293 Expression Medium in six-well nontreated plates (SPL #32006), 2 mL per well, at 37 °C, 8% CO2, 135 rpm. A total of 1.25 µg plasmid DNA was mixed with Opti-MEM I Reduced Serum Medium (Thermo Fisher 11058021) to a final volume of 62.5 µL. A total of 3.12 µL PEI was mixed with Opti-MEM I medium to a final volume of 62.5 µL. Both mixtures were incubated at room temperature for 15 min. The PEI mix was then added to the DNA mix, immediately vortexed, incubated at room temperature for an additional 15 min, and then added to the cells. Three hours later, 500 µL FreeStyle 293 Expression Medium was added to the cells. One milliliter of each reaction was mixed with 1 mL of the complementary reaction to test the binding preferences of the desired isoforms. Forty-eight hours post transfection, eight images were acquired per well using an Eclipse Ts2 inverted microscope with a 10× objective. Three replicates were performed for each isoform combination.
Aggregate Mixing Quantification
Figures 4 and 5 illustrate the aggregation or separation of cells, quantified using a custom Python script available at https://github.com/Rubinstein-Lab/Mixing-Score. This script evaluates the ratio of red and green cells in close proximity. Each image is captured in a red and a green channel, and each channel image is segmented into black and white using Otsu thresholding. The script first divides the segmented images into squares of a specified size (24.32 µm², equivalent to 1.5-2 HEK293F cells). It then scans each square for white pixels in the red and green channel images and counts the squares that contain white pixels in both. The mixing index score is calculated as the ratio of squares with white pixels in both images to the total number of squares parsed. This score reflects the degree of overlap or proximity between red and green cells. Each image receives a score between 0 (no red cells in the proximity of green cells) and 1 (all red cells are proximate to green cells). Images with a score higher than 0.1 typically display mixed aggregation, and images with a score lower than 0.1 show visibly separate red or green aggregates.
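The square-counting step described above can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the script from the repository: the masks are assumed to be already thresholded (1 for white, 0 for black), the function names are hypothetical, and the square edge is given in pixels because the conversion from micrometers depends on the microscope pixel size.

```python
def has_white(mask, top, left, size):
    """Return True if the square starting at (top, left) has a white pixel."""
    for row in mask[top:top + size]:
        if any(row[left:left + size]):
            return True
    return False


def mixing_index(red_mask, green_mask, square_px):
    """Fraction of grid squares containing white pixels in both channels.

    red_mask and green_mask are equally sized 2D lists of 0/1 values
    (assumed output of Otsu thresholding); square_px is the square edge
    length in pixels (hypothetical parameter).
    """
    height = len(red_mask)
    width = len(red_mask[0]) if height else 0
    total = 0
    both = 0
    for top in range(0, height, square_px):
        for left in range(0, width, square_px):
            total += 1
            if (has_white(red_mask, top, left, square_px)
                    and has_white(green_mask, top, left, square_px)):
                both += 1
    return both / total if total else 0.0


# Toy 4 x 4 masks split into four 2 x 2 squares.
red = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
green = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
score = mixing_index(red, green, 2)
is_mixed = score > 0.1  # threshold taken from the text
print(score, is_mixed)
```

Here the denominator counts every square scanned, following the wording of the text. Restricting it to squares that contain at least one cell would be a different normalization and would change the scale of the scores.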
The images shown in Figs. 4 and 5 are cropped to depict representative aggregates. However, mixing was quantified across the entire image area for all acquired images. The proportion displayed represents the average score obtained from three replicates, with eight images per replicate. For further statistical details, refer to supplementary fig. S1, Supplementary Material online.
Computational Evaluation of Binding Affinities
A structural model of Dscam1 8LCA Ig1 to Ig4 dimers was prepared using AlphaFold2 (Jumper et al. 2021; Mirdita et al. 2022). The structure was refined using at least five rounds of the “RepairPDB” utility in FoldX (Schymkowitz et al. 2005). Mutations were generated using the “BuildModel” utility and analyzed using the “AnalyseComplex” utility. Additionally, we used the SSIPe server with default parameters to calculate the difference in binding energy upon mutation (Huang et al. 2020).
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Acknowledgments
We thank Prof. Dinorah Friedmann-Morvinski from the Department of Biochemistry and Molecular Biology at Tel Aviv University for providing the lentivirus (pLV) transfer plasmid. We thank Dr. Karin Smorodinsky-Atias for her assistance during the experimental stages of this work.
Author Contributions
G.W. and R.R. designed the research, analyzed the data, and wrote the paper. G.W. performed the research.
Funding
This work was supported by the Israel Science Foundation (1463/19 to R.R.).
Data Availability
The data underlying this article are available in the article and in its online supplementary material.