
The Design and Management of Human Variation and Protein Folding through Precision Medicine

Variation and Proteostasis Dynamics Laboratory of William E. Balch

Overview of human variation and protein folding: the matrix problem

The central goal of the Balch laboratory is to provide an integrated view of human variation and protein folding dynamics. We use modern evolutionary, biochemical (mass spectrometry), molecular, biophysical, structural and bioinformatics approaches, based on the fundamental laws of physics and on emergent approaches in systems biology, to reach a new level of understanding of human health and disease: how we got here, how we survive, and how complexity (Figure 1. The Matrix problem: biology is complex) is fundamental to the genotype-to-phenotype transformation that drives our individuality and health in the context of the population.

The Balch laboratory has four integrated goals: (1) define computationally the basis for variation in inherited and somatic disease through its impact on biology; (2) define, from molecular, biochemical, cell biological and structural perspectives, how the protein fold and its stability impact human health through the action of the protein homeostasis, or ‘proteostasis’, program (Balch et al. (2008) Science); (3) understand how endomembrane trafficking pathways fail to integrate with variation and the proteostasis program in ‘mis’folding disorders; and (4) learn how therapeutic management of proteostasis biology can be used to modify variation from a precision medicine perspective, restoring misfolded protein to normal function in order to reduce frailty and increase the resiliency of the individual in response to disease and aging. Through this multidisciplinary approach we hope to gain critical insight into the fundamental principles that integrate the cellular programs directing and managing human variation in protein folding and endomembrane trafficking pathways. Our aim is to arrive at a new language base that explains biology in the context of the principles driving natural selection, that is, who we are.

The challenge of human variation (variomics) in health and disease

Ongoing sequencing efforts are producing an overload of human genotypic information, and we now appreciate that each of us is unique in genomic composition. Such big data remains a daunting challenge to interpret in the context of biology (Figure 2. Variation in Precision Medicine). We also appreciate that variation contributes to the diversity of traits defining the human population, and that variation is strongly influenced by both ancestry and the environment, that is, by natural selection. Importantly, variation is beginning to change the way we think about protein structure: from snapshots captured by structural techniques to biologically flexible folding dynamics that operate in as-yet-undetermined ways. This is particularly evident in inherited (both rare and complex) disease that impacts anabolic/catabolic features of biology, and in somatic (acquired)/epigenetic disease, where the aberrant rapid evolution of variation in the genome over a lifetime is a driver of proliferation leading to cancer and is responsible for aging. To understand the ‘variomics’ found in rare diseases such as Cystic Fibrosis (ion channel pathobiology) (Pankow et al. (2015) Nature), Niemann-Pick C (cholesterol metabolism disruption) (Pipalia et al. (2017) J Lipid Res) and alpha-1-antitrypsin deficiency (AATD; liver and lung disease) (Chao and Balch (2015) Respir Med, Chao and Balch (2017) In Press), and in prominent oncogenes (EGFR, p53), we are developing screening approaches to assess the impact of variation recorded in the ExAC database to reinterpret the ‘body’ language of protein folding and dynamics in complex cell and tissue environments. Given that each of us differs by 10,000 to 20,000 SNPs that reflect our ancestral roots, we now need to understand the fold in the context of this surprising level of variation.

Protein folding management of human disease by the adaptive proteostasis network (APN)

A major challenge is to understand how proper folding keeps us healthy and what goes wrong in disease. Protein folding is managed by the 'protein homeostasis' or 'proteostasis' program (Balch et al. (2008) Science; Evans and Balch (2013) Nature Rev Mol Cell Biol; Pankow et al. (2015) Nature; Roth et al. (2014) PLoS Biol; Amaral and Balch (2015) J Cys Fibros; Thannickal et al. (2015) Am J Respir Crit Care Med; Veit et al. (2016) Mol Cell Biol; Wang and Balch (2017) submitted). This ‘adaptive’ proteostasis network (APN), which responds to environmental cues, is essential to maintain the germline, manage the stem cell niche, drive development, and protect us from the environmental and pathological challenges that occur daily and during aging. It is now well established that the proteostasis program serves as a buffer for human evolution and plays a prominent role in unveiling the variation responsible for the genotype-to-phenotype traits of each individual in health and disease (see Siegal (2017) Nature) (Figure 3. Adaptive Proteostasis Network (APN)).

Inherited disease poses unique challenges to the APN. All protein folding diseases arise as a consequence of an imbalance between the need for protein function, the energetics and kinetics of the protein fold, and the properties of the local proteostasis-managed folding environment, which has both genetic and epigenetic foundations. Because all proteins encoded by the genome are likely at some point in their life cycle to reside outside the cell, protein folding management is intrinsically linked to membrane trafficking pathways and external challenges to the cell (Wiseman et al. (2007) Cell; Hutt and Balch (2010) Science; Roth et al. (2014) PLoS Biol; Wang and Balch (2017) submitted). These trafficking pathways involve the extensive endomembrane system found in all eukaryotes, including the exocytic and endocytic compartments that link to autophagic and exosome expulsion pathways. The APN harbors numerous chaperone systems that are specialized for the cytosol and for the compartments of the endomembrane system mediating trafficking, forming a 'cloud' around each protein to manage its daily function (Figure 3. The APN Cloud). Moreover, these chaperone systems are evolutionarily specialized for each cell type, tissue and organismal environment (Powers et al. (2009) Ann Review Biochem, Balch (2013) Nature Rev Mol Cell Biol, Pankow et al. (2015) Nature, Amaral et al. (2015) J Cys Fibros, Veit et al. (2016) Mol Cell Biol, Pipalia et al. (2017) J Lipid Res).

The dynamic APN matrix (Figure 1) includes the ubiquitous and highly abundant (5-10% of cell protein) Hsp70 and Hsp90 chaperone/co-chaperone systems that direct folding, as we have shown for CF (Coppinger et al. (2012) PLoS One, Roth et al. (2014) PLoS Biol, Pankow et al. (2015) Nature, Wang and Balch (2017) submitted), and protect the fold from the genetic and/or physiological/pathological stresses that assault human physiology. Loss of proteostasis can lead not only to inherited disease but also to environmentally triggered complex diseases including, among others of current interest to the lab, type 1/2 diabetes (T1/2D) (Pottekat et al. (2013) Cell Rep) and AATD/COPD (Roth et al. (2014) PLoS Biol). Human genome sequencing efforts have revealed >10,000 diseases triggered by >200,000 variants world-wide (ExAC database), reflecting the ongoing evolution of the genome responsible for natural selection and fitness as well as newly emerging epigenetic forces that tune folding capacity through histone acetylation and methylation pathways.

Integrating variation and folding with endomembrane trafficking

Eukaryotic cells are highly compartmentalized. Each compartment of the exocytic and endocytic pathways harbors a unique chemical and biological environment in which protein folding and function can be modulated to maintain cellular, tissue and organismal homeostasis. During export from the first compartment of the exocytic pathway, the endoplasmic reticulum (ER), where folding is initiated, nearly one-third of the protein cargo encoded by the human genome is mobilized to the rest of the cell by the activity of vesicle budding machines that utilize tethering/fusion and coat components to direct endomembrane traffic. We and others have found that the selection of cargo clients is subject to variation-sensitive recruitment by the COPII vesicle formation pathway (Wang et al. (2006) Cell; Stagg et al. (2008) Cell; Wang and Balch (2017) submitted). The components involved in the assembly and disassembly of coat and tethering systems are likely biologically regulated by the activity of the Hsp90 family of chaperone/co-chaperone components (Roth et al. (2014)), an area of current focus in the laboratory. Thus, the extensive machineries that modulate the dynamics of the endomembrane system to accommodate and adjust the protein fold transiting to downstream compartments are likely integral components of an underappreciated variation-sensitive, proteostasis-based trafficking matrix (Hutt and Balch (2013) Cold Spring Harb Perspect Biol) (Figure 1) that manages the fold for function in diverse environments and in response to many challenges to human biology. By using machine learning tools to understand folding and trafficking design (Wang and Balch (2017) submitted, Subramanian et al. (2017) submitted), and by implementing quantitative approaches for pathway dissection using mass spectrometry (Pankow et al. (2015); Rauniyar et al. (2015)) in conjunction with human genome sequence information, we hope to define how proteostasis integrates variation and folding with trafficking to achieve function in a cell- and tissue-specific manner.

Variation, proteostasis and epigenetics in misfolding disease

We have discovered that manipulation of the chromatin environment (the epigenome) through modulation of histone deacetylases (HDACs), using HDAC inhibitors (HDACi) or siRNA silencing of HDAC enzymes, modulates a global network of interacting factors that contribute to restoration of CFTR and NPC1 function in disease (Pankow et al. (2015) Nature, Pipalia et al. (2017) J Lipid Res, Wang and Balch (2017) submitted, Subramanian et al. (2017) submitted). These results suggest that transcriptional circuits controlled by open and closed states of chromatin through histone-regulated acetylation/methylation pathways, and/or post-translational modifications of non-histone proteostasis regulators (including Hsp90), may allow us to reprogram disease to a more permissive but healthy state, utilizing evolutionarily conserved epigenetic pathways that normally buffer change for fitness and survival in response to stress. In essence, epigenetics is a problem in proteostasis: modifications impact the body language of the fold, and the language of the fold manages the genome. Understanding the entanglement of genetics, epigenetics and proteostasis in response to variation remains an unmet challenge.

Pharmacological management of variation in misfolding disease

Taking advantage of our understanding of the informatics, molecular and biochemical relationships between variation and proteostasis, we are generating high-throughput screening (HTS) approaches that monitor the contribution of APN components in order to identify small-molecule regulators that precisely regulate key steps of critical APN pathways involved in human disease. One area of significant progress in the laboratory is our recent discovery of small molecules that interfere with the function of the co-regulators of the Hsp90 chaperone system (co-chaperones). These small molecules appear to have an important impact on the onset of misfolding disease (Singh et al. (2017) submitted). Thus, HTS approaches that target the APN are beginning to provide us with novel chemical and biological tools to understand and manage proteostasis biology (Figure 4. Landscape in population biology).

Building variation landscapes to describe folding in complex biological states   

To put our integrated approach in a single package (that is, to understand the many interactions governing the function of the APN in the folding and trafficking management of the variation that drives biology), we are applying informatics, mathematical and statistical tools (Wiseman et al. (2007) Cell; Wang and Balch (2017) submitted) to databases acquired through in-house efforts and human genome sequencing efforts. The goal is to generate population-based landscapes that define healthy and unhealthy protein relationships in response to variation and the environment, and that show how small molecules can adjust those pathways to correct function. The bottom line is that human variation is central to the central dogma, yet our understanding of it is in its infancy, and much work is necessary to create a language base describing how a single protein operates in the context of variation to maximize each individual’s function, particularly in response to disease.