Department of Cancer Biology (S.R.M., J.L.H., C.M.D., S.E.M., E.A.K., S.I.H., J.D.C., G.K.B., L.A.C.); Department of Cell and Developmental Biology (L.A.C.); Department of Medicine (L.A.C.), Division of Endocrinology, Diabetes and Metabolism, and Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6160
Address all correspondence and requests for reprints to: Dr. Lewis Chodosh, Department of Cancer Biology, University of Pennsylvania School of Medicine, 612 Biomedical Research Building II/III, 421 Curie Boulevard, Philadelphia, Pennsylvania 19104-6160. E-mail: chodosh{at}mail.med.upenn.edu.
| ABSTRACT |
|---|
, Ucp1, and genes involved in fatty acid oxidation are spatially and temporally coregulated during development; that the mammary gland plays a functional role in adaptive thermogenesis; and that the transcriptional control of this adaptive response to cold is itself developmentally regulated.
| INTRODUCTION |
|---|
The use of microarray hybridization to study Drosophila metamorphosis (8) suggested the potential applicability of this technique to the analysis of vertebrate development. However, analysis of microarray expression data from an entire organism or organ over its developmental course brings an additional level of complexity because gene expression profiles reflect both the alteration of genetic programs within discrete subsets of cells in the organ as well as dynamic shifts in the relative populations of various cell types. Furthermore, in contrast to studies in which changes in gene expression patterns are monitored over intervals that approximate expected time scales for transcriptional regulation (minutes to hours), studying developmental processes such as mammalian organogenesis requires the ability to reproducibly detect changes in gene expression patterns that occur over weeks to months. As such, whether microarray approaches can be used successfully to develop a coherent picture of the biological programs used during vertebrate development remains an open question.
To address this question, we undertook a gene expression analysis of murine mammary gland development using DNA oligonucleotide microarrays. The mammary gland possesses several unique characteristics that make it an ideal experimental system for this analysis. First, mammary gland organogenesis occurs via many programs of general developmental interest including branching morphogenesis, inductive stromal-epithelial interactions, programmed cell death, extracellular matrix remodeling, and hormone action (9). Second, because much of mammary ontogeny occurs after birth, these developmental events can be conveniently manipulated and analyzed. Finally, the striking extent to which the timing of normal developmental events influences breast cancer risk in humans suggests that a thorough understanding of mammary gland development will be of clinical utility.
Our findings confirm the ability of this nondirected analytical approach to identify cellular processes and pathways of known importance in mammary development as well as shifts in the relative abundance of different cell types within the gland. In addition, this approach permitted the identification of novel aspects of mammary development, particularly with regard to our findings that the murine mammary gland participates in the adaptive response to cold and that this response occurs in a developmentally regulated manner.
| RESULTS |
|---|
Global Analysis
To first determine whether global patterns of gene expression during mammary gland development are consistent with previous biological observations, pairwise "distances" between developmental points were calculated using the Pearson correlation coefficient and were visualized in three-dimensional space using multidimensional scaling (Fig. 1B). Similar methods have previously been used to represent global relationships between individual tumors at the level of gene expression (11). Our analysis of mammary gland gene expression revealed that the four points of nulliparous development occupy a discrete region of gene expression space that is separable from all other surveyed points (Fig. 1B). In addition, global expression profiles display a monotonic progression through gene expression space as the mammary gland progresses from the nulliparous state through pregnancy toward lactation, with the greatest differences in gene expression existing between the nulliparous and lactating mammary gland. Conversely, as the mammary gland undergoes postlactational involution the pattern of gene expression returns toward a state most similar to that of the nulliparous gland. However, the trajectory in gene expression space that the mammary gland follows from lactation through involution is distinct from that which describes its path from the nulliparous state to that of lactation. Presumably, this reflects the fact that the cellular processes by which the mammary gland progresses from nulliparity to lactation (e.g. proliferation and differentiation) are not simply the inverse of those by which the gland returns to the involuted state (e.g. apoptosis and remodeling). Notably, the global gene expression pattern found after 28 d of involution is similar, but not identical, to the age-matched adult nulliparous gland, consistent with the existence of morphological differences between the nulliparous and parous involuted gland, as well as functional differences with respect to cancer susceptibility. In aggregate, these data are consistent with previous morphological descriptions of postnatal mammary development, suggesting that these gene expression data represent a reasonable measure of underlying developmental processes.
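The pairwise "distance" calculation described above can be illustrated with a minimal sketch. It assumes the plain Pearson correlation coefficient and a distance of 1 - r; the study itself reports a weighted modification whose exact form is not reproduced here. The time-point names and expression values are hypothetical toy data, and the resulting matrix is the kind of input a multidimensional scaling routine would take.

```python
import math

def pearson(x, y):
    """Plain (unweighted) Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def distance_matrix(profiles):
    """Map each pair of time points to 1 - r (assumed distance definition)."""
    names = list(profiles)
    return {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}

# Hypothetical toy data: one expression value per gene at each time point.
profiles = {
    "nulliparous": [10.0, 20.0, 30.0, 40.0],
    "pregnant":    [12.0, 18.0, 33.0, 41.0],
    "lactating":   [40.0, 30.0, 20.0, 10.0],
}
dist = distance_matrix(profiles)
```

With these toy values the nulliparous and lactating profiles are perfectly anticorrelated, so they sit at the maximum distance of 2, while the pregnant profile lies close to the nulliparous one.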
While pairwise correlation coefficients provide a means to estimate similarities between global gene expression profiles, other analytical approaches are required to identify coordinate regulatory patterns among subsets of genes. Accordingly, profiles of genes whose expression exceeded a minimal threshold of regulation during development were normalized and clustered using a two-dimensional self-organizing map (SOM). As expected, these clusters include a variety of physiologically suggestive patterns of developmental expression indicative of regulation during puberty, pregnancy, lactation, and postlactational involution (Fig. 2). Complete lists of the genes contained within each cluster and their expression values are published as supporting information at www.abramsoninstitute.org/chodoshdata.html.
Thirty independent clustering runs were tested for significant associations between expression clusters and gene categories (see Materials and Methods). Overlapping sets of genes within individual categories that exceeded the significance threshold in any individual clustering run were pooled. To confirm the statistical significance of associations identified by this approach, we used data sets in which gene identifications had been randomly permuted with expression data to generate 100 data sets for statistical characterization. This analysis demonstrated that each individual category appearing in the original compiled list has a probability of chance appearance less than 0.03. This approach resulted in the identification of 49 gene categories that are associated in a statistically significant manner with one or more temporal expression profiles during mammary gland development (Table 1). Conversely, 20 clustered developmental gene expression profiles were statistically associated with one or more functional gene categories.
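The Materials and Methods name the cumulative hypergeometric distribution as the test for chance association between a cluster and a gene category. A minimal sketch of that upper-tail probability follows. The function name and argument names are hypothetical, and the example counts other than the 1,312 clustered genes are invented for illustration; the multiple-testing threshold used in the study is not reproduced here.

```python
from math import comb

def hypergeom_tail(population, category_size, cluster_size, overlap):
    """P(X >= overlap) when cluster_size genes are drawn without replacement
    from a population that contains category_size category members."""
    total = comb(population, cluster_size)
    upper = min(category_size, cluster_size)
    favorable = sum(
        comb(category_size, i) * comb(population - category_size, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total

# 1,312 clustered genes comes from the text; the other counts are hypothetical.
p_value = hypergeom_tail(population=1312, category_size=20, cluster_size=40, overlap=6)
print(f"chance of at least 6 category members in the cluster: {p_value:.3g}")
```

A small tail probability means the category is over-represented in the cluster relative to random assignment of genes to clusters.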
, and four components of the MCM complex, which is required to license one round of DNA replication (13). Further examination of cluster 17 reveals the presence of additional genes associated with proliferation and cell cycle progression (Fig. 3A
, β,
,
, and
-casein along with whey acidic protein, α-lactalbumin, and milk fat globule-EGF factor 8. "Protein biosynthesis" genes include those encoding eIF-4B, eIF-5, EF-Tu, EF-1α, and EF-2 and a number of aminoacyl-tRNA synthetases, consistent with the requirement for large-scale production of milk proteins. "Protein transport" genes include sequences encoding each of the components (α, β, γ) of the heterotrimeric SEC61 complex responsible for protein transport into the endoplasmic reticulum (16) as well as homologs of other genes involved in protein secretion (SEC13, SEC23). Further inspection of clusters with peak expression during lactation reveals genes up-regulated during de novo fatty acid synthesis (ATP citrate-lyase, cluster 36), mammary epithelial differentiation (H-FABP, cluster 36), and the stimulation of adipocyte lipid degradation (zinc alpha-2 glycoprotein, cluster 37) (17, 18, 19).
, that have been previously associated with apoptosis during early involution (21, 22). Consistent with the widespread degradation and remodeling of the extracellular matrix that occurs during the second phase of mammary gland involution, transcripts encoding "Thiol protease" and "Zymogen" proteins with previously described roles in involution, such as stromelysin 1 (Mmp3), stromelysin 3 (Mmp11), and MT-MMP (Mmp14), are statistically associated (P < 0.03) with this stage of development (cluster 14; see Table 1
Identification of Shifts in Cellular Compartments during Organogenesis
In addition to reflecting changes in transcriptional regulatory patterns within individual cells, changes in gene expression observed during organogenesis may also reflect shifts in the relative abundance of specific cell types. For example, a number of adipocyte markers, including C/EBPα, aP2, Fsp27, β3-adrenergic receptor, Ob, and PPARγ, were noted to share a similar pattern of expression during puberty and pregnancy (Fig. 3D) (25, 26). This finding could reflect changes in the level of expression of these genes within each adipocyte or changes in the relative abundance of adipocytes within the gland. To distinguish between these possibilities, the expression of Ob in mammary adipocytes was determined using in situ hybridization. This analysis demonstrated that, in contrast with the marked decrease in Ob expression detected by microarray analysis during pregnancy, Ob expression does not change substantially on a per-cell basis during puberty and pregnancy (Fig. 3E). This suggests that the progressive reduction in adipocyte-specific mRNAs as a fraction of total mRNA during pregnancy is more likely to be due to dilution by the large increase in epithelial cell abundance and/or levels of epithelial RNA synthesis that occurs during this developmental transition. Indeed, the marked increase in epithelial cells during puberty and pregnancy can be directly visualized using cytokeratin 14 (a marker of myoepithelial cells) as well as other cytokeratins (Fig. 3F and data not shown). Furthermore, the statistical association (P < 0.01) between members of the "Keratin" gene category and cluster 22 suggests a proportional increase of other epithelial cell subtypes, including subsets of luminal epithelial cells (cytokeratin 19) (27) and putative stem cells (cytokeratin 6) (28). These data suggest that the use of selected cell type-specific genes whose expression across development is relatively constant on a per-cell basis will facilitate the rapid identification of genes expressed in the same cell type and the establishment of canonical cell type-specific expression profiles against which regulatory perturbations may be recognized and measured.
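The dilution argument above can be made concrete with a small arithmetic sketch. It assumes a two-compartment gland in which each adipocyte contributes a fixed amount of RNA, so that any drop in the adipocyte share of total RNA comes only from epithelial expansion. The function name, cell counts, and per-cell RNA amounts are all hypothetical.

```python
def adipocyte_fraction(n_adipocytes, n_epithelial,
                       rna_per_adipocyte=1.0, rna_per_epithelial=1.0):
    """Share of total RNA contributed by adipocytes in a two-compartment model."""
    adipocyte_rna = n_adipocytes * rna_per_adipocyte
    epithelial_rna = n_epithelial * rna_per_epithelial
    return adipocyte_rna / (adipocyte_rna + epithelial_rna)

# Hypothetical cell counts: adipocyte number and per-cell expression stay fixed
# while the epithelial compartment expands.
for n_epithelial in (0, 100, 900):
    share = adipocyte_fraction(100, n_epithelial)
    print(f"epithelial cells = {n_epithelial:4d}, adipocyte RNA share = {share:.2f}")
```

In this toy model a ninefold excess of epithelial cells reduces the adipocyte signal to one tenth of total RNA even though no adipocyte has changed its expression.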
Beyond identifying shifts among abundant cell types, this approach can also readily detect changes in cell types that constitute only a small fraction of the mammary gland. For example, the presence of macrophages in the mammary gland during involution is indicated by the dramatic up-regulation of macrophage-specific genes such as Mac-2, Nramp1, F4/80, macrophage metalloelastase, and MPS-1 at d 7 of postlactational involution (clusters 13 and 14) (29, 30, 31, 32, 33, 34), as well as by a number of genes whose products have been postulated to mediate the detection and engulfment of apoptotic cells and debris, including C1q (chains A, B, and C), CR3β, Gas-6, Axl, Cd68, and Abc1, consistent with a role for macrophages in clearing apoptotic epithelial cells after lactation (35). Together, our data indicate that microarray analysis can detect altered cell numbers as well as suggest likely functional roles within the context of a whole organ.
Related to this, endothelial-specific genes, including Pecam, VE-cadherin, and endoglin, were found to exhibit a sharp peak of expression in glands harvested from 2-wk-old animals (Fig. 2, cluster 10) (36), as were a number of genes involved in the coordination of angiogenesis, such as Flk-1, Tie1, Tie2/Tek, and Vegf (Fig. 2, clusters 10 and 15) (37). These data suggest active formation of the microvasculature during early postnatal development. Further examination reveals that genes encoding basement membrane proteins, including the α-1 and α-2 chains of type IV collagen, heparan sulfate proteoglycan 2, and the laminin B2 chain, also demonstrate peak expression in the mammary glands of 2-wk-old animals (Fig. 2, clusters 10 and 11). This observation is intriguing in light of in vitro work demonstrating that endothelial-specific markers can be detected in cultured mammary stromal cells capable of differentiating into adipocytes or forming capillary-like structures in a Vegf-dependent manner when plated on basement membrane components (38). Our results suggest that basement membrane protein expression may provide a similar signal for endothelial differentiation of mammary stromal cells during early postnatal development. Alternatively, the coordinate expression of endothelial markers and basement membrane proteins may reflect the simultaneous formation of an extensive capillary network characteristic of white adipose tissue along with deposition of basement membrane associated with each adipocyte. While further work is required to test these hypotheses, the ability to conceptually link disparate pathways on the basis of gene expression patterns during organogenesis demonstrates the utility of this approach for suggesting novel hypotheses regarding vertebrate ontogeny.
The Mammary Gland Contains Brown Adipose Tissue
Having successfully used a statistical approach to identify known developmental processes in the mammary gland, we next examined our data set for groups of genes with previously unsuspected patterns of coordinate developmental regulation. For example, we found that genes encoding "Fatty acid metabolism" enzymes were statistically associated (P < 0.02) with expression clusters manifesting peak expression in the mammary glands of 2-wk-old animals (clusters 5 and 10). This pattern was confirmed by Northern hybridization analysis (Fig. 4A, Dci and Acadvl). The eight enzymes in the "Fatty acid metabolism" category that map to clusters 5 and 10 catalyze sequential steps in the β-oxidation of both saturated and unsaturated fatty acids, including the transport of fatty acyl-coenzyme A (CoA) across the inner mitochondrial membrane via the carnitine shuttle, the breakdown of very-long-/long-/medium-/short-chain fatty acyl-CoA, and the isomerization of cis-Δ3-enoyl CoA (Fig. 4B) (39, 40). Eight of nine clustered genes in this category exhibit peak expression in the mammary gland at 2 wk of age, emphasizing the striking coordinate regulation of the majority of enzymes in this metabolic pathway.
Because brown adipose tissue plays a central role in adaptive thermogenesis, we further considered the possibility that the mammary gland may play a role in this process. Moreover, because adaptive thermogenesis requires the β-oxidation of fatty acids within brown adipose tissue (47), a role for the mammary gland in nonshivering thermogenesis might account for the developmental coregulation of "Fatty acid metabolism" genes and Ucp1. To begin to test this hypothesis, in situ hybridization for Ucp1 expression was performed on serial sections from 2-wk-old mammary gland. This analysis demonstrated that Ucp1 is expressed at high levels in defined, lobular regions of eosinophilic, multilocular adipose tissue located within the mammary gland that are histologically characteristic of brown fat (Fig. 4, C–E). Notably, these Ucp1-expressing regions encompass only a subset of adipocytes within the mammary gland because the adipose-specific marker, aP2, is expressed in additional regions of the gland that do not express Ucp1 (Fig. 4E). Similarly, aP2 expression can be seen in both interscapular brown fat and the surrounding white adipose tissue, whereas Ucp1 expression in the interscapular fat pad is restricted to the regions of brown fat.
Further examination of serial sections from this analysis revealed that genes (Acadvl, Dci) encoding enzymes that catalyze the β-oxidation of fatty acids are expressed in a spatially restricted pattern that is identical to that of Ucp1 (Fig. 5). Moreover, expression of the nuclear hormone receptor, PPARα, which has been shown to regulate the expression of both Ucp1 (48) and fatty acid metabolic genes (49; for review, see Ref. 50), was found to exhibit developmental regulation (cluster 15) and spatial expression strikingly similar to that of Ucp1, Acadvl, and Dci (Figs. 4 and 5). As such, the spatial and temporal coregulation of Ucp1, PPARα, and genes whose products are involved in the β-oxidation of fatty acids further supports our hypothesis that the statistical association between "Fatty acid metabolism" genes and developmental expression profiles with peak expression in the neonatal mammary gland may be related to the PPARα-mediated coordinate activation of gene regulatory networks involved in adaptive thermogenesis.
coactivator, PGC-1, is induced in response to cold and activates adaptive thermogenesis by up-regulating Ucp1 (51). Furthermore, PPARα and PGC-1 can interact to up-regulate the mitochondrial transcription of fatty acid metabolic genes (52). Therefore, to determine if a cold-inducible signaling pathway is present within the mammary gland, 2- or 10-wk-old female FVB mice were exposed to cold (4 C) for 3 h before tissue harvest. Northern hybridization analysis demonstrated a 6-fold up-regulation of PGC-1 in the mammary glands of 2-wk-old mice exposed to cold, compared with animals housed at room temperature (Fig. 6
, and Ucp1 expression by in situ hybridization in mammary glands of 2-wk-old and 10-wk-old mice after cold exposure. Analysis of Ucp1-expressing cells in the mammary glands of 2-wk-old mice confirmed that brown adipose tissue comprises a significant fraction of the neonatal gland (Fig. 6
, and Ucp1 expression colocalize and that are each up-regulated in response to cold exposure in restricted regions of the adult (10-wk-old) gland to levels that are comparable to those seen in 2-wk-old mice (Fig. 6C
were not observed to increase beyond their already elevated levels of expression (Fig. 6C
Finally, having demonstrated that the adult gland remains capable of inducing pathways required for adaptive thermogenesis, we extended our study to 10-wk-old animals housed at 4 C for increasing periods of time before harvest. This analysis demonstrated that, whereas Ucp1 is expressed at high levels throughout the mammary glands of 2-wk-old mice, only a small fraction of adipocytes within the mammary glands of 10-wk-old animals are capable of expressing Ucp1 in response to acute cold exposure. Furthermore, although increasing periods of cold exposure progressively induce Ucp1 expression in the 10-wk gland, the absolute expression of Ucp1 on a per-cell basis is comparable to that seen within the 2-wk-old gland either at room temperature or following cold exposure (Fig. 6D). These data demonstrate that both the amount of brown adipose tissue and the transcriptional regulation of the mammary gland's adaptive response to cold are developmentally regulated within the murine mammary gland.
| DISCUSSION |
|---|
, Ucp1, and genes involved in the β-oxidation of fatty acids colocalize within this compartment and are coordinately regulated during development; that the mammary gland plays a functional role in adaptive thermogenesis; and that transcriptional control of the adaptive response to cold within mammary brown adipose tissue is developmentally regulated. These findings suggest a previously unrecognized, developmentally regulated role for the murine mammary gland in adaptive thermogenesis. Notably, despite the fact that current wisdom regarding mammary development considers the neonatal period to be relatively quiescent, our studies clearly demonstrate that novel and interesting events occur in the mammary gland during this stage of development.
A central feature of the analytical approach described in this paper is the identification of statistically significant relationships between prospectively annotated functional gene categories and clustered gene expression profiles. This approach significantly decreases the analytical burden imposed by examination of large numbers of genes and avoids several potential sources of selection bias in pathway identification. In the present study, we have demonstrated the validity of this approach for studying vertebrate development by confirming its ability to identify known biological processes during mammary gland development. The automated identification of statistical relationships between functional gene categories (e.g. cell cycle, cell division, milk protein, protein biosynthesis, protein transport, thiol protease, and zymogen genes) and specific stages of mammary gland development indicates that this approach can reliably identify biologically relevant pathways within a complex mixture of cell types during organogenesis in the absence of prior information about the developmental process analyzed. These data confirm the utility of this technique for studying mammary development and suggest its broader applicability to other developmental systems.
We have further demonstrated the ability of microarray approaches to detect shifts in the relative abundance of adipocytes and epithelial cells within the mammary gland, as well as cell types that constitute only a small percentage of the gland such as macrophages, endothelial cells, and lymphocytes. While our data show that it is clearly possible to detect coordinate pathway regulation in complex mixtures of cell types during organogenesis, interpretation of expression profiles from intact organs will undoubtedly be aided by the identification of genes that mark the contribution of specific cell types. As such, establishing baseline expression profiles for diverse cell types should permit a more detailed description of changes in tissue composition as well as facilitate the identification of developmentally regulated genes by identifying gene expression patterns within a compartment that depart from that compartment's canonical baseline expression profile.
Adaptive Thermogenesis in the Mammary Gland
Perhaps the greatest value in pursuing a nondirected analytical approach lies in the ability to detect unexpected patterns of developmental regulation. In this regard, it is noteworthy that the statistical approach employed in this study was able to identify the presence of an abundant cellular compartment within the mammary gland that direct examination over several decades failed to reveal. Our identification of the coordinate up-regulation of genes involved in fatty acid oxidation, coupled with the recognition that this pattern reflects the presence of brown adipose tissue within the mammary gland, demonstrates that this methodology can identify novel and unpredicted features of vertebrate development.
One potential explanation for the observed elevation in fatty acid metabolism gene expression in the mammary glands of 2-wk-old mice is that it reflects the increased use of β-oxidative pathways to provide the energy requirements of the neonatal gland. This explanation seems unlikely, however, because cells within the neonatal gland are relatively quiescent when compared with the rapid epithelial proliferation that occurs during puberty and pregnancy, or with the large-scale milk production that occurs during lactation. A second hypothesis to explain this pattern of gene expression is that it represents a metabolic response to changes in dietary composition. Inasmuch as 2-wk-old mouse pups are nursing on a diet of high-fat rodent milk, it is likely that the mammary gland adapts to utilize the predominant source of available energy. In fact, regulation of β-oxidative pathways during the neonatal period has been described in other tissues (54). Thus, this developmental expression pattern may represent an adaptation of the organism to its environment rather than a local phenomenon in the mammary gland.
A third hypothesis is suggested by our observation that Ucp1 is expressed in the mammary gland and is developmentally regulated in a manner similar to that observed for genes involved in fatty acid oxidation. Although the mammary fat pad is composed of white adipose tissue, the expression of a gene whose product dissipates the proton gradient across the mitochondrial inner membrane suggested the presence of cells that perform thermogenic functions characteristic of brown fat. Because thermogenesis also requires the β-oxidation of fatty acids (47), it is possible that this pathway is up-regulated in the mammary gland in response to a need to maintain body temperature during the neonatal period. This interpretation is supported by the spatial coregulation of Ucp1 and genes encoding enzymes involved in fatty acid metabolism. The increased rate of heat loss in neonatal mice due to their comparatively high surface:volume ratio and relative lack of fur suggests the potential advantage of utilizing the mammary fat pad for thermogenesis during early postnatal development. As would be predicted by this model, the mammary glands of male neonatal mice show a similar pattern of Ucp1 expression (data not shown). This model is also consistent with the described developmental regulation of the amount of brown fat in the interscapular fat pad in rodents, with the induction of brown-fat-like regions within white adipose depots, and with the emerging recognition that brown fat and white fat may be interconvertible under some experimental conditions (53, 55, 56). This phenomenon has not been described in the mammary gland, however, validating the ability of this approach to identify unexpected aspects of mammary development.
In addition to demonstrating that the murine mammary gland contains functional depots of brown adipose tissue that respond appropriately to cold, our studies further demonstrate that this response is developmentally regulated. Specifically, the mammary glands of 10-wk-old mice do not express significant levels of Ucp1, PGC-1, or PPARα when housed at room temperature, but coordinately induce each of these genes in response to cold exposure. In contrast, Ucp1 and PPARα are each maximally up-regulated in neonatal mice housed at room temperature. These findings indicate that the stimuli that trigger this coordinate transcriptional response differ between neonatal and adult animals. Consistent with this, cold exposure caused an up-regulation of the transcriptional coactivator PGC-1 in the mammary glands of 2-wk-old mice, yet this increase was not accompanied by an increase in the expression of its transcriptional target, Ucp1. This suggests that the up-regulation of PGC-1 is not rate-limiting for Ucp1 expression at this developmental stage. These findings both highlight the developmental regulation of this system and raise the question of what signal is responsible for the up-regulation of Ucp1 expression in neonatal mice housed at room temperature. Potentially, the difficulty that neonatal mice have in maintaining body temperature may itself provide this signal. That is, even when housed at room temperature neonatal mice may exist in what is essentially a cold-induced state. Previous work in rat models supports this hypothesis (57). The elevated expression of Vegf observed in 2-wk-old mice (data not shown) is also consistent with this model because brown adipose tissue has been shown to produce Vegf in response to adrenergic stimulation (58). To our knowledge, this aspect of the developmental regulation of the transcriptional response to cold exposure has not been previously described.
Further consideration of the potential mechanisms underlying this effect suggests alternative developmental signals for mammary brown fat activity. Given that PGC-1 has been shown to act as a transcriptional coactivator for PPARα (52, 48), and that high levels of PPARα mRNA are found in the 2-wk-old mammary gland, it is possible that basal levels of PGC-1 and PPARα expression are sufficient for high levels of expression of Ucp1 in the mammary glands of neonatal mice housed at room temperature. Notably, a high-fat diet has been shown to induce PPARα expression in the liver (59), raising the possibility that maternal milk provides a developmental signal that induces a protective, thermogenic response in suckling pups. It is also possible, however, that other factors may contribute to the up-regulation of Ucp1 at this developmental stage. Further experiments will be required to dissect the developmental and environmental stimuli responsible for these observations.
In aggregate, our findings demonstrate that statistical methods for detecting associations between functional gene categories and patterns of developmental gene expression allow the rapid, automated identification of key pathways and processes used during mammary development. Additionally, our observation that genes involved in the β-oxidation of fatty acids are developmentally coregulated with Ucp1 and PPARα underscores the ability of this technique to suggest novel hypotheses regarding developmental processes. We anticipate that further characterization of clustered gene expression profiles during normal mammary development will provide a framework for identifying and classifying a broad array of pathways that are altered in transgenic and knockout animals exhibiting mammary developmental phenotypes. When combined with studies to determine the spatial localization of gene expression, the parallel analysis of developmental gene expression patterns should facilitate the elucidation of regulatory networks guiding vertebrate development.
| MATERIALS AND METHODS |
|---|
Oligonucleotide Microarray Hybridization
Approximately 30 µg total RNA was used at each developmental time point. Biotinylated cRNA was generated essentially as described (61). Hybridization to a set of four Affymetrix Mu6500 GeneChips was performed overnight per the manufacturer's instructions. After staining and washing, chips were scanned using a Hewlett-Packard Co. GeneArray scanner. Grid alignment and raw data generation were performed using Affymetrix GeneChip 3.1 Software. Raw expression values, representing the average difference in hybridization intensity between oligonucleotides that perfectly match the transcript sequence and oligonucleotides containing single base pair mismatches, were measured. To scale between chips, these expression values were rank ordered, and the median 96% were averaged. Chips were scaled relative to each other to equalize this average value. A noise value (Q) based on the variance of low-intensity probe cells was used to calculate a minimum expression threshold (2.1*Q) for each chip, and the highest value across development for each of the four Mu6500 subarrays was used to set the minimum value for that subarray at all time points. To estimate the total number of genes being assayed, all probe set accession numbers present in GenBank were matched against Unigene. A total of 5165 probe sets matched 4097 different Unigene clusters, and the remaining 1182 probe sets, which represent 1175 distinct accession numbers, did not match any Unigene entry.
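The chip-scaling and thresholding steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the "median 96%" is read as a trimmed mean that discards the lowest and highest 2% of values, the minimum expression threshold is applied as a floor, and a single floor is used per chip rather than the per-subarray maximum across development. All function names and the toy values are hypothetical.

```python
def trimmed_mean(values, keep=0.96):
    """Average the central `keep` fraction of the rank-ordered values."""
    ordered = sorted(values)
    drop = int(round(len(ordered) * (1 - keep) / 2))  # values cut from each tail
    core = ordered[drop:len(ordered) - drop] if drop else ordered
    return sum(core) / len(core)

def scale_chip(values, target, keep=0.96):
    """Rescale one chip so that its trimmed mean equals `target`."""
    factor = target / trimmed_mean(values, keep)
    return [v * factor for v in values]

def apply_floor(values, q, multiplier=2.1):
    """Raise values below multiplier * Q to that threshold (assumed behavior)."""
    floor = multiplier * q
    return [max(v, floor) for v in values]

# Hypothetical chip with 100 expression values, rescaled to a target of 101.
chip = [float(v) for v in range(1, 101)]
scaled = scale_chip(chip, target=101.0)
floored = apply_floor(scaled, q=10.0)
```

Trimming the tails before averaging keeps a handful of saturated or near-zero probe sets from dominating the scaling factor.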
Multidimensional Scaling
Scaled distances between developmental time points were calculated using a weighted modification of the Pearson correlation coefficient.
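[Equation not recovered from the source.]
Two terms of this equation (symbols not recovered) are the scaled mean values, e.g.
[Equation not recovered from the source.]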
Multidimensional scaling coordinates were calculated using the "R" software package, and the resulting projection into three-dimensional space was visualized using Blender (Not a Number B.V., Amsterdam, The Netherlands).
Cluster Analysis
Expression profiles with a relative difference greater than 3-fold and an absolute difference greater than 100 between lowest and highest expression during development were selected for clustering. For comparison, the average expression value of all probe sets at a given time point ranged from 93.6 to 134.4. A total of 1,312 genes met both the relative and absolute difference criteria and were selected for cluster analysis. Individual developmental profiles were normalized such that mean expression = 0 and variance = 1. The resulting data were clustered by a two-dimensional SOM using GeneCluster software (2). An 8-by-5 grid was chosen empirically to balance the goals of minimizing the variance of individual clusters while minimizing the risk of splitting closely associated profiles into multiple clusters. A list of genes assigned to each of the 40 clusters is published in the supporting data.
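The selection and normalization steps above can be illustrated with a short sketch. The function names are hypothetical, and two details are assumptions rather than statements from the text: that all values are positive after the minimum-threshold step, and that the variance is the population variance.

```python
import math


def passes_filter(profile, min_fold=3.0, min_abs_diff=100.0):
    """Keep profiles whose max/min ratio exceeds 3 and whose range exceeds 100."""
    lo, hi = min(profile), max(profile)
    # Assumption: values are positive because a minimum threshold was applied.
    return lo > 0 and (hi - lo) > min_abs_diff and (hi / lo) > min_fold


def standardize(profile):
    """Rescale one developmental profile to mean 0 and variance 1."""
    n = len(profile)
    mean = sum(profile) / n
    # Assumption: population variance (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in profile) / n
    sd = math.sqrt(variance)
    return [(x - mean) / sd for x in profile]


def prepare_for_clustering(profiles):
    """Filter profiles, then standardize the survivors for SOM input."""
    return {gene: standardize(values)
            for gene, values in profiles.items()
            if passes_filter(values)}
```

Standardizing each profile means the clustering groups genes by the shape of their developmental time course rather than by absolute expression level.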
Identification and Statistical Analysis of Gene Categories
SWISS-PROT version 38 and TrEMBL (including updates before 20-Aug-1999) were used for the purposes of this analysis (62). For 2,665 expressed sequence tags, Affymetrix provided the nearest SWISS-PROT accession number, and this was used to extract keyword annotations from SWISS-PROT 38. For the remaining sequences, the accession number was searched against SWISS-PROT to find an explicit match. If no such match existed, the nearest SWISS-PROT/TrEMBL sequences were determined by alignment using BLAST 2.0 (63). If the closest match had greater than 95% identity, its keywords were extracted. If the closest match was an unannotated TrEMBL entry and an annotated SWISS-PROT entry also showed greater than 95% identity, the SWISS-PROT keywords were used. If the highest degree of identity was 85–95%, the closest match was used regardless of the number of keyword annotations. The resulting gene categories were then tested for statistically significant association with SOM clusters (64). The cumulative hypergeometric distribution was used to determine the probability of chance association between a given cluster and a given gene category. To account for the independent testing of 394 categories that appear more than one time in the set of clustered genes, this value was required to be
[Threshold expression not recovered from the source.]
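For illustration, the cumulative hypergeometric probability used in this test can be computed as in the following sketch. The function and parameter names are hypothetical, and the significance threshold applied in the study is not reproduced here.

```python
from math import comb


def hypergeom_tail(total, in_category, cluster_size, observed):
    """P(X >= observed) for X ~ Hypergeometric(total, in_category, cluster_size).

    total        -- number of clustered genes
    in_category  -- genes carrying the keyword among all clustered genes
    cluster_size -- genes assigned to the cluster being tested
    observed     -- genes in the cluster that carry the keyword
    """
    upper = min(in_category, cluster_size)
    denominator = comb(total, cluster_size)
    numerator = sum(
        comb(in_category, k) * comb(total - in_category, cluster_size - k)
        for k in range(observed, upper + 1)
    )
    return numerator / denominator
```

A call such as `hypergeom_tail(1312, 20, 33, 6)` would give the chance probability of finding at least 6 of 20 category members in a 33-gene cluster drawn from 1,312 genes; the category and cluster sizes in that call are invented for illustration.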
Visual inspection of the 8-by-5 SOM revealed that several adjacent clusters have similar expression profiles (Fig. 2). As such, we reasoned that some gene categories might not reach the threshold of statistical significance due to the partitioning of genes with similar expression profiles into adjacent, related clusters. Furthermore, because the precise location and organization of SOM clusters depend on the random seeding of points used in the analysis, subtle shifts in cluster boundaries among independent runs could theoretically impair the ability of this analytical approach to detect biologically relevant associations. Therefore, more sensitive detection of associations was achieved by pooling overlapping sets of significant genes from 30 independent clustering runs. Briefly, for each individual run with a statistically significant (as defined above) association between some cluster and a specific gene category, a list of genes within that cluster and category was generated. Overlapping contigs of the resulting gene lists were generated, and these final groups of genes represent connected regions of expression space associated with a given gene category. A single SOM was chosen for reference purposes, and a cluster within this map that contains 40% or more of the grouped genes was designated the primary cluster for this association. Similarly, any cluster containing 25–40% of the genes was designated as a secondary cluster. To confirm the statistical significance of associations identified by this approach, gene identifications were randomly permuted with expression data to generate 100 data sets for statistical characterization. Each permutation was analyzed as described for the original data set, and categories that exceeded the statistical significance threshold were tabulated. The resulting data were fit to binomial distributions and indicated that a conservative estimate for the probability of an individual keyword associating by chance with one or more clusters of gene expression was less than 0.03. For gene categories containing fewer than 10 members, the probability of association was individually estimated.
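The pooling step across independent runs can be sketched as below. This is one interpretation of "overlapping contigs", in which gene lists sharing at least one gene are merged into a single group; the function names and the `reference_map` structure (gene to cluster position in the reference SOM) are hypothetical.

```python
def merge_overlapping(gene_lists):
    """Merge gene lists that share at least one gene into connected groups."""
    groups = []
    for genes in gene_lists:
        current = set(genes)
        remaining = []
        for group in groups:
            if group & current:
                current |= group  # absorb every existing group that overlaps
            else:
                remaining.append(group)
        remaining.append(current)
        groups = remaining
    return groups


def classify_clusters(group, reference_map, primary=0.40, secondary=0.25):
    """Label reference-map clusters by the fraction of the group's genes they hold."""
    counts = {}
    for gene in group:
        cluster = reference_map[gene]
        counts[cluster] = counts.get(cluster, 0) + 1
    labels = {}
    for cluster, count in counts.items():
        fraction = count / len(group)
        if fraction >= primary:
            labels[cluster] = "primary"
        elif fraction >= secondary:
            labels[cluster] = "secondary"
    return labels
```

Merging by shared membership makes the final groups independent of the order in which the 30 runs are processed.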
Statistically significant associations were individually examined for redundant gene representation, and significance was recalculated where appropriate. Categories that no longer exceeded the defined statistical threshold after removal of duplicate entries were removed from further consideration. Additionally, significant associations between clusters and muscle-specific gene categories were considered to represent a variable admixture of mammary gland and skeletal muscle across development and were therefore eliminated from further consideration.
Morphological Analysis
For mammary gland whole mounts and in situ hybridization, no. 4 mammary glands were harvested from FVB mice. Whole mounts were generated by mounting the gland and staining with carmine alum as previously described (12). In situ hybridization of sections from paraffin-embedded glands was performed as previously described (60).
Experimental Animals
All animal experimentation described was conducted in accord with accepted standards of humane animal care. Protocols for animal work were approved by the University of Pennsylvania institutional committee on animal care.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Abbreviations: CoA, Coenzyme A; Ob, leptin; PGC-1, peroxisome proliferator-activated receptor γ coactivator 1; PPAR, peroxisome proliferator-activated receptor; SOM, self-organizing map; Ucp1, uncoupling protein 1; Vegf, vascular endothelial growth factor.
Received for publication February 12, 2002. Accepted for publication March 18, 2002.
| REFERENCES |
|---|
-inducible protein in murine macrophages. Biochem J 325:779–786