Diffusion MRI (dMRI) is the only non-invasive method that can map the connections of the living human brain, and it is critical for understanding mental disorders. Several large studies, such as the Human Connectome Project (HCP) and the Adolescent Brain Cognitive Development (ABCD) study, have collected or are poised to collect dMRI data from over 30,000 subjects. However, an important challenge is that datasets acquired on different scanners cannot be pooled for joint analysis because of large inter-scanner (inter-site) differences. These arise from vendor-specific data reconstruction software, differences in head-coil sensitivity, and other factors. Such scanner differences are often larger than the group effect sizes observed in psychiatric disorders. A second challenge for large-scale analysis is the lack of a single, consistent, ontology-based definition and automated extraction of white matter connections across the lifespan, including neonates and children. A third challenge is the sheer size of the combined dMRI datasets (several terabytes): processing, storing, and visualizing such volumes of data requires expertise and substantial computational resources, which limits researchers’ ability to test hypotheses. In this grant, we propose to address these challenges to enable large-scale, data-intensive analysis of dMRI data. Specifically, in Aim 1, we propose to develop novel mathematical algorithms to remove scanner-specific differences from data acquired at multiple sites. We will harmonize data from 10,000 subjects in the ABCD study acquired at 21 different sites, another 10,000 subjects from the HCP initiative spanning the entire lifespan and numerous disease indications, and 10,000 subjects from the Healthy Brain Network. All harmonized datasets (30,000 subjects) will be shared with the community through the NIMH Data Archive (NDA). 
In Aim 2, we will develop a formal ontology-based system for defining 189 white matter fascicles using neuroanatomical landmarks established in the human and monkey literature on brain connectivity. Our main focus will be to develop novel algorithms for the automated and consistent clustering and extraction of these fiber bundles across the entire human lifespan, including neonates. To enable widespread use without demanding computational resources or technical expertise, in Aim 3 we will develop a web-based system, integrated with the NIMH Data Archive infrastructure, for real-time 3D viewing and querying of the harmonized data and fascicles for any user-defined selection of subjects from the full cohort across diagnostic categories. Overall, this framework has significant potential impact: it will, for the first time, allow large-scale, data-intensive analysis of dMRI data to study neurodevelopment as well as mental disorders cutting across diagnostic boundaries.