De novo establishment of wild-type song culture in the zebra finch
Abstract
What sort of culture would evolve in an island colony of naive founders? This question cannot be studied experimentally in humans. We performed the analogous experiment using socially learned birdsong. Culture is typically viewed as consisting of traits inherited epigenetically, via social learning. However, cultural diversity has species-typical constraints1, presumably of genetic origin. A celebrated, if contentious, example is whether a universal grammar constrains syntactic diversity in human languages2. Oscine songbirds exhibit song learning and provide biologically tractable models of culture: members of a species show individual variation in song3 and geographically separated groups have local song dialects 4,5. Different species exhibit distinct song cultures6,7, suggestive of genetic constraints8,9. Absent such constraints, innovations and copying errors should cause unbounded variation over multiple generations or geographical distance, contrary to observations9. We asked if wild-type song culture might emerge over multiple generations in an isolated colony founded by isolates, and if so, how this might happen and what type of social environment is required10. Zebra finch isolates, unexposed to singing males during development, produce song with characteristics that differ from the wild-type song found in laboratory11 or natural colonies. In tutoring lineages starting from isolate founders, we quantified alterations in song across tutoring generations in two social environments: tutor-pupil pairs in sound-isolated chambers and an isolated semi-natural colony. In both settings, juveniles imitated the isolate tutors, but changed certain characteristics of the songs. These alterations accumulated over learning generations. Consequently, songs evolved toward the wild-type in 3–4 generations. Thus, species-typical song culture can appear de novo. Our study has parallels with language change and evolution12,13. 
In analogy to models in quantitative genetics14,15, we model song culture as a multi-generational phenotype, partly encoded genetically in an isolate founding population, influenced by environmental variables, and taking multiple generations to emerge.
Young male zebra finches develop individually distinct song by imitating adult males16. The adult wild-type (WT) song includes stereotyped syllables repeated in fixed order (song motifs, Fig. 1a) in both wild and domesticated zebra finch colonies. Birds deprived of song during vocal development develop a less structured isolate (ISO) song with more noisy, broadband notes and high pitch upsweeps11 (Fig. 1b). ISO syllables are often prolonged, monotonic or stuttered, and the songs appear to have an irregular rhythm. Despite these anomalies, young zebra finches readily imitate songs of adult isolates17 even in the presence of WT adults11.
We quantified the differences between WT and ISO songs over three time-scales. At the 10 ms time-scale, we used spectral frame features (e.g., frequency modulation; Supplementary 4a). Over the 10–100 ms time-scale, we used the correlation time of the spectral shape, termed Duration of Acoustic State (DAS, Supplementary 4b). At even longer (200–1000 ms) time-scales, we used measures of song rhythm (Supplementary 4d)18. Feature probability distributions across birds differed between ISO and WT (Fig. 1c–e). ISO songs had lower frequency modulation, longer durations of acoustic state, and less structured rhythms.
These distributions provide a high-dimensional song phenotype for each bird. We reduced the dimensionality by applying Principal Component Analysis (PCA) to the collection of feature distributions of all birds (WT & ISO), and retained the first two principal components (PCs) to obtain two-dimensional song phenotype values (Supplementary 4e). PCs at all three time-scales show separable clusters for ISO and WT songs along a continuum (Fig 2a–c). The mean values of the first PC were significantly different between ISO and WT at all time-scales of song structure (p<0.001, t-tests, nWT=52 birds, niso=17 birds, FDR adjusted, Supplementary 5). We found that these differences are largely an outcome of tutoring deprivation and not of social isolation (Supplementary 3f).
To examine the imitation of isolate songs, we trained 13 juvenile birds (pupils) by isolate tutors one-to-one in a sound-isolated chamber. This allowed us to control genetic relatedness, and to minimize social effects, e.g., to eliminate feedback from female listeners. Four isolate tutors, with songs stable over the course of tutoring, were used 2–4 times to train unrelated pupils. We projected the feature distributions of the pupils on the PCs derived earlier from the WT/ISO data (Fig. 2a–c), and displayed vectors connecting each ISO tutor to his pupils (Fig. 2d–f). As shown, most of these vectors point in the direction of the WT cluster, indicating a shift toward WT features in pupils of ISO tutors. The mean values of the first PC for the first generation pupils differed significantly from both ISO and WT means for the spectral-frame features and for DAS (p=0.018–0.001, n=13), but not for rhythm. Feature distributions of most individual pupil songs were closer to WT songs than were their tutor’s songs (12/13 for at least one time-scale, 10/13 for all time-scales, FDR significance=0.01, binomial test, n=52, Supplementary 5d).
Although pupils typically imitated all of the tutor syllables20 and did not invent new syllables (Supplementary 2), pupil songs deviated consistently from tutor songs. Fig. 2g presents an example where a long ISO syllable (red bar, mean duration=367ms, s.d.=29ms) was copied by a pupil, but was shortened by about 30% (mean=243ms, s.d.=7.6ms). Across all the syllables and all pupils, the durations of pupil syllables accurately matched those of the corresponding ISO tutor syllables for syllables shorter than 230ms (Fig. 2h, r2=0.98, slope=0.97, n=20 syllables). Copies of longer ISO syllables, however, were shorter than the originals (r2=0.84, slope=0.56, n=11 syllables). Across birds, the ratio between the longest and shortest syllable within a bout was significantly smaller in pupils compared to their ISO tutors (p<0.01, n=13, Wilcoxon sign test, Supplementary 4c). Overall, the range where durations of ISO syllables were accurately copied is similar to the range of WT syllable durations (25–75 percentile range = 67–180ms, n=52 WT birds). In addition, pupils only copied the abundance (relative frequency) of syllables when it was within the WT range (up to about 30%). In cases where one syllable dominated the ISO song (Fig. 2i), pupils decreased its abundance to 20–30% (Supplementary Fig. 5), thereby creating more structured song motifs.
Imitation of spectral features, as judged by the first PC of the feature distribution, was also biased: linear regression analysis of pupil versus tutor yielded a nonzero intercept and a slope slightly less than one (Fig 2j). The equality line, corresponding to faithful copying (pupil=tutor, dashed blue line), was rejected in favor of the alternative hypothesis represented by the linear fit shown in red (P<0.001, likelihood ratio test, n=13). Note that imitation that was inaccurate but unbiased would have only increased the spread around the equality line.
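As an illustration of this kind of comparison, the following minimal sketch fits a line to pupil versus tutor values and tests it against the equality line with a likelihood ratio statistic. It assumes Gaussian errors with a variance estimated by maximum likelihood, so the statistic is n·ln(RSS_equality/RSS_fit) and is referred to a chi-square distribution with 2 degrees of freedom (intercept and slope), an asymptotic approximation that is rough for small n. The data values and all function names are hypothetical, chosen only to make the example run; they are not the measurements or the exact procedure of this study.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rss(x, y, intercept, slope):
    """Residual sum of squares of y around the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def lrt_equality_line(tutor, pupil):
    """Likelihood ratio test of pupil = tutor against a free linear fit.

    Assumes Gaussian residuals (an assumption of this sketch).
    Returns (intercept, slope, statistic, approximate p-value).
    """
    n = len(tutor)
    intercept, slope = fit_line(tutor, pupil)
    rss_fit = rss(tutor, pupil, intercept, slope)
    rss_eq = rss(tutor, pupil, 0.0, 1.0)  # faithful copying: pupil = tutor
    stat = n * math.log(rss_eq / rss_fit)
    # A chi-square variable with 2 degrees of freedom has survival
    # function exp(-x / 2), so no statistics library is needed here.
    p_value = math.exp(-stat / 2.0)
    return intercept, slope, stat, p_value

# Hypothetical first-PC values for eight tutor-pupil pairs.
tutor_pc1 = [0.2, 0.6, 0.9, 1.1, 1.3, 1.5, 1.8, 2.0]
noise = [0.05, -0.04, 0.03, -0.05, 0.04, -0.03, 0.02, -0.02]
pupil_pc1 = [-0.5 + 0.86 * t + e for t, e in zip(tutor_pc1, noise)]

intercept, slope, stat, p_value = lrt_equality_line(tutor_pc1, pupil_pc1)
print(f"intercept={intercept:.3f} slope={slope:.3f} LR={stat:.1f} p={p_value:.2g}")
```

In this toy data set the pupils are shifted by a constant offset, so the equality line is rejected even though the slope stays close to one. Scatter that is unbiased around the equality line would instead leave the two residual sums of squares similar and the statistic small.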
Because the songs of ISO-tutored birds differed significantly from both their respective ISO tutors and WT, we examined whether recursive tutoring would cause further progression toward WT over multiple generations. We used four of the first-generation pupils as tutors of a second generation of unrelated pupils, and continued recursively over 2–5 generations (Fig. 3a). Similarity to WT songs increased over 3–4 generations, as can be appreciated from the audio in Supplementary 1 and the three examples of multiple generations of recursive tutoring in Fig. 3b. In the first example, both ISO syllables become shorter in the songs of the first and second generation pupils (blue and red rectangles), but the second syllable is also differentiated into three distinct notes. The middle panel shows spectral and temporal differentiation of syllables, and omission by the 3rd generation pupil. In the right lineage, the duration of the final syllable (red rectangle) decreased over two generations and then stabilized. The spectral structure, however, continued to change in the 3rd and 4th generations.
To judge if the imitation of ISO songs progressed toward WT song over multiple generations, we displayed vectors in the PC space (as in Fig. 2d–f) with each tutoring lineage labeled by a different color (Fig. 3c–e). As shown, the multi-generational trajectories penetrate more deeply into the WT cluster (purple shading). Direct comparisons across first and later generation pupils reach significance only for DAS (p=0.02), but multi-generational comparisons suggest further progression toward WT for all song traits. For spectral frame features, we found that the first principal component of song features changes monotonically toward WT over generations. Its mean values for ISO, first generation, later generations, and WT songs were 1.3, 0.3, 0.03, −0.4 respectively. First PC values for later generation songs were significantly different from ISO song (p<0.005, t-test, n=8 for later generations) but not from WT songs (p=0.17). For DAS, first PC values also decreased monotonically with generations: 1.1, 0.3, 0.02, −0.3. Higher generation songs were significantly different (p<0.01) from both WT and ISO, suggesting that WT approximation was not complete. For rhythm, first PC values also decreased monotonically with generations: 4.1, 2.2, 1.4, −2, and differences from WT and ISO were marginally significant (p=0.02, 0.056 respectively).
Although the one-to-one training provided a well defined learning environment, the multi-generational changes that would occur in a complex social setting may be more representative of natural evolutionary processes. Therefore, we established a semi-natural island colony (Supplementary 3d) starting with one of our isolate tutors and three unrelated females in a large sound chamber (Supplementary Fig. 1).
In this social situation, too, the isolate colony approached the WT cluster over a few generations (Fig. 4). To judge the transition toward WT clusters, we examined PC projections with the isolate tutor song marked as a red dot. Comparing the trajectory shown in Fig. 4e to that of Fig. 3b, right panel (originating from the same tutor), we see that the outcome in the colony is similar to that observed in one-to-one tutoring. Even though the outcome of the colony experiment can only be judged qualitatively, we find it remarkable that despite intense social interactions, female presence and mating competition, there were only mild differences between birds in the two conditions. In the colony, juveniles also imitated sibling syllables and female long calls, leading to more complex songs (Supplementary 1c). In contrast to one-to-one tutoring, the best progress toward WT song occurred in rhythm, perhaps because birds incorporated additional syllable types into their song motifs.
Our findings resemble the well-known case of deaf children in Managua, Nicaragua, spontaneously developing sign language21, as well as linguistic phenomena such as creolization. Models of language change and evolution12–14, which contain a developmental account of the language acquisition process, are germane to our study (Supplementary Model 3).
We further discuss our findings using a simple recursive model which motivated this study. PCs of feature distributions (Fig. 2) give us phenotypic measures of song. Consider the distribution of a quantitative phenotype P in the ISO population. Since some of the variation in ISO songs is heritable, we partition P into a genotypic and an environmental value P = G + E, assuming an additive model for genetic variance22 VP=VG+VE.
We consider an Isolated Lineages Model, in which the environmental component of the pupil phenotype P(n+1) in the n+1’th generation is further divided into a portion E0(n+1) independent of the tutor, and a portion proportional to the tutor song phenotype c0P(n). We therefore have the recursion P(n+1) = G(n+1) + c0 P(n) + E0(n+1) [Eq. 1]. The partitioning of the phenotypic variance is analogous to the parental effects model in quantitative genetics1,23. In the one-to-one study, tutor and pupil genotypic values are approximately uncorrelated, and c0 may be estimated by regressing the pupil against the tutor (cf. Fig. 2j, c0 = 0.86, s.d. = 0.15). The literature on cultural transmission24,25 also contains models analogous to Eq. 1 and has similar implications. Half-sib or cross-fostering experimental designs26 should be useful for separating the genetic27 and learning-related components of song transmission in future studies28.
Our one-to-one experimental design may be modeled using Eq. 1 by initializing P(1)=G(1)+E(1) for the ISO generation. The recursion then causes the distribution of phenotypic values to exponentially relax to an asymptotic “WT” distribution, the relaxation being rapid if c0 is close to 0. The largest changes occur in the first generation (consistent with our results). The case c0=1 corresponds to a simple random walk, in which the variance grows linearly with generation number, V[P(n)]~n, and the song phenotype would drift indefinitely (unbiased song copying with errors). The “copying bias” (1−c0) plays the role of a spring constant, confining the walker to a parabolic potential well. Notably, the WT variance in the model is a combination of the ISO variance and the learning parameter, emphasizing how ISO song and learning ability combine to produce WT song. Extensions of the model predict that both genetic relatedness between tutor and pupil and horizontal transmission alter the asymptotic “WT” distributions (Supplementary Model). Therefore we would expect our two designs to yield slightly different song cultures.
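The behaviour of Eq. 1 can be illustrated with a minimal sketch that iterates the mean and variance of the phenotype across generations. It assumes that the tutor-independent input G(n+1)+E0(n+1) has the same mean and variance in every generation and is uncorrelated with the tutor phenotype P(n), which is only an approximation of the one-to-one design with unrelated pupils. Under these assumptions the mean obeys m(n+1) = c0·m(n) + m_in and the variance obeys V(n+1) = c0²·V(n) + V_in, with asymptotes m_in/(1−c0) and V_in/(1−c0²). Only c0 = 0.86 is taken from the regression above; every other number, and all function names, are hypothetical.

```python
def iterate_moments(c0, mean0, var0, mean_in, var_in, generations):
    """Iterate the mean and variance of P(n+1) = c0 * P(n) + input(n+1).

    `mean_in` and `var_in` describe the tutor-independent input
    G(n+1) + E0(n+1), assumed uncorrelated with P(n) (sketch assumption).
    Returns two lists of length generations + 1.
    """
    means, variances = [mean0], [var0]
    for _ in range(generations):
        means.append(c0 * means[-1] + mean_in)
        variances.append(c0 ** 2 * variances[-1] + var_in)
    return means, variances

def asymptote(c0, mean_in, var_in):
    """Stationary mean and variance of the recursion, valid for |c0| < 1."""
    return mean_in / (1.0 - c0), var_in / (1.0 - c0 ** 2)

c0 = 0.86                       # regression estimate quoted in the text
mean_in, var_in = -0.07, 0.05   # hypothetical input mean and variance
means, variances = iterate_moments(c0, 1.0, 0.4, mean_in, var_in, 200)
mean_inf, var_inf = asymptote(c0, mean_in, var_in)

# The distance to the asymptotic mean shrinks by a factor c0 per generation,
# so the largest absolute change happens in the first generation.
steps = [abs(means[n + 1] - means[n]) for n in range(5)]
print("asymptotic mean/variance:", round(mean_inf, 3), round(var_inf, 3))
print("first five mean changes:", [round(s, 3) for s in steps])

# With c0 = 1 there is no asymptote: the variance grows by var_in each
# generation, as in an unbiased random walk.
_, walk_var = iterate_moments(1.0, 0.0, 0.0, 0.0, var_in, 5)
print("random-walk variances:", [round(v, 2) for v in walk_var])
```

The sketch makes the role of the copying bias explicit: any c0 below one yields a finite asymptotic variance that depends on both the input variance and c0, whereas c0 = 1 lets the variance grow without bound.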
In a sense, the results of our study show that song culture is the result of an extended developmental process, a ‘multi-generational’ phenotype partly genetically encoded in a founding population and partly in environmental variables, but taking multiple generations to emerge. The functional significance of our findings remains open, i.e. whether WT females prefer the songs of multi-generation pupils to those of ISO tutors. Since our findings suggest that song culture is the result of an extended developmental process, it would be interesting to examine if changes in gene expression, neuronal reorganization or neurogenesis associated with song development show orderly multi-generational progression during the evolution of song culture.
METHODS SUMMARY
Animal care
All experiments were performed in accordance with guidelines of the National Institutes of Health and have been reviewed and approved by the IACUC of CCNY.
Experimental design
We used zebra finches (Taeniopygia guttata) from the CCNY breeding colony. Colony management and isolation procedures have been described previously29. Except for the colony experiment, all birds were kept either singly (isolates) or pair-wise (one-to-one tutored) in sound attenuation chambers (Supplementary 3e) from day 30 to 120 post-hatch. Wild-type songs (n=52) were obtained from birds raised in two well-established colonies. Isolates (n=17) were raised by their mothers from day 7–29 post-hatch and were kept in complete isolation from day 30 until day 120 or later. One-to-one tutored birds (n=13 and 8, for first and later generations, respectively), were randomly selected from 40 breeding pairs, and paired with one of 6 isolate tutors on day 30. For the colony setting, we made a sound isolation chamber from an old 20 cubic ft refrigerator (Supplementary Fig. 1). All birds in the colony (except for the 3 female founders) were the descendants of the founder male.
Data analysis
All the analysis was performed using Matlab 7, except for spectral feature calculations, which were done using Sound Analysis Pro 2. Isolate song syllables are often prolonged and monotonic. To quantify this notion, we estimated the time interval where acoustic features remain highly correlated and named this feature duration of acoustic state (Supplementary 4b). Rhythm spectrum18 was used to detect periodicity in song features at the syllabic and the song-motif levels (Supplementary 4d). We constructed song feature PCs by first computing cumulative frequency distributions (CDF) for each feature time-series (Supplementary Fig. 8). These CDFs were the input vectors for the Principal Component Analysis (Fig. 2a–c). Statistical tests are described in Supplementary 5.
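The CDF-to-PCA step can be sketched as follows. This is an illustration in Python rather than the Matlab code used for the study, and it makes several assumptions: each bird is summarized by a single feature sampled on a common grid, the leading principal axes are found by power iteration with deflation (swapped in here to avoid external libraries; a standard PCA routine would serve equally well), and new songs are projected onto axes derived from a reference set, analogous to projecting pupils on the PCs obtained from the WT/ISO data. All sample values, bird labels and function names are hypothetical.

```python
import bisect
import math

def empirical_cdf(samples, grid):
    """Fraction of samples less than or equal to each grid value."""
    ordered = sorted(samples)
    return [bisect.bisect_right(ordered, g) / len(ordered) for g in grid]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leading_pcs(rows, k=2, iters=500):
    """Mean vector and first k principal axes of the row vectors.

    Uses power iteration on X^T X with deflation against axes already
    found; k must not exceed the rank of the centered data.
    """
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    pcs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(dim)] * dim
        for _ in range(iters):
            scores = [dot(row, v) for row in centered]
            w = [sum(s * row[j] for s, row in zip(scores, centered))
                 for j in range(dim)]
            for p in pcs:  # keep the new axis orthogonal to earlier ones
                d = dot(w, p)
                w = [wi - d * pi for wi, pi in zip(w, p)]
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        pcs.append(v)
    return mean, pcs

def project(cdf, mean, pcs):
    """Coordinates of one CDF vector on the given principal axes."""
    centered = [c - m for c, m in zip(cdf, mean)]
    return [dot(centered, p) for p in pcs]

# Hypothetical syllable durations (ms): two WT-like and two ISO-like birds.
grid = list(range(50, 501, 50))
birds = [
    [60, 80, 100, 120, 150, 180],
    [70, 90, 110, 130, 160, 170],
    [100, 200, 300, 350, 400, 450],
    [120, 220, 280, 360, 380, 500],
]
cdfs = [empirical_cdf(b, grid) for b in birds]
mean, pcs = leading_pcs(cdfs)
scores = [project(c, mean, pcs) for c in cdfs]
for label, s in zip(["WT-like 1", "WT-like 2", "ISO-like 1", "ISO-like 2"], scores):
    print(label, [round(x, 3) for x in s])
```

Because the sign of a principal axis is arbitrary, only the relative positions of the birds along each axis are meaningful. A pupil's CDF can be passed to `project` with the stored `mean` and `pcs` to place it in the same space without refitting the axes.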
Acknowledgments
We thank J. Wallman and H. Williams for critical reading of the manuscript and consultation, and C. Harding and N. Leader for recordings of WT songs. The study was supported by NIH grants to OT and PPM, an RCMI grant to CCNY and by the Crick-Clay Professorship to PPM.
Footnotes
Supplementary Information is available online at www.nature.com/nature. All methods and statistical analyses are included in the supplementary material, as well as details on the theoretical model and an audio file illustrating multi-generational song evolution.
Author Contributions: The idea for the study originated with PPM, with important modifications by OT and OF. The experiments were carried out by OF and OT. The model was developed by PPM with help from HW. All authors participated in the data analysis, with major efforts by HW and OF.