DBP8: Scalable approaches to modeling using large sets of rules and images


A. Name of Collaborating Investigator(s): Peter K. Sorger,1 James Faeder,2 Robert F. Murphy3

B. Institutions: 1Harvard Medical School, 2Pitt, 3Carnegie Mellon University

C. Funding Status of Projects: NIH U54HL127365-02 (Sorger) 09/10/2014-05/31/2020; DARPA W911NF-14-1-0397 (Sorger) 7/15/2014-1/14/2018

D. Driving relationship between DBP8 and TR&Ds. The Sorger lab is playing a leading role in the DARPA “Big Mechanism” program119 by compiling and curating the program's text mining results on the signaling pathways involving the protein Ras (whose mutations drive the development of many forms of cancer) in a comprehensive 'library of rules' called the Ras Executable Model (REM), similar in spirit to previous efforts to develop comprehensive models120-125 but at larger scale. The rapidly growing model, extensively documented and freely available at http://rasmodel.org, is an ideal driver for the development of rule-based modeling capabilities in TR&D3, and, in turn, TR&D3 will facilitate the development of the REM. As the model is being developed using the pySB framework, this DBP will also provide a testbed for the software library and API being developed in TR&D3 Aim 3 and will further synergize with the integration efforts in collaboration with the Lopez lab at Vanderbilt (C&SP33).

At the same time, the Sorger lab is a participating site in the NIH LINCS project, which uses imaging, multiplex biochemical assays and measurement of cell state to develop multi-factorial signatures of cellular responses to drugs and growth factors. The datasets generated by this project could enable calibration of large-scale models such as the REM by providing extensive characterization of cell processes, but several challenges exist: 1) the data derived from images must be formulated in a way that enables comparison with model outputs and between different sets of experiments; 2) the underlying signaling processes must be modeled with spatial resolution to describe many of the observed features; and 3) the effects of cellular heterogeneity must be accounted for. The first challenge will drive the development of point process models in TR&D4 Aim 1, which will enable determination of the spatial relationships of proteins to each other and to common landmarks such as the nuclear and cell boundaries. The resulting models can then be combined with the coarse-grained (CG) spatial simulations being developed in Aim 1 of TR&D3, which can be used in combination with technologies developed in Aims 2 and 3 of TR&D3 to calibrate specific models of signaling processes. The generative models will also enable characterization of subpopulations. Finally, the computationally intensive calibration process will benefit from the use of WE-based sampling methods being developed in Aim 3 of TR&D1.

E. Innovation: This DBP will result in (i) development of integrated tools for model development, visualization, calibration, and analysis, which are much needed for any modeling project, especially one on the scale of the REM. Previous efforts to develop large-scale models have generally employed generic software components such as wikis and proprietary drawing programs,125,126 and have either calibrated models based on qualitative information124 or avoided the issue of model calibration altogether.120,127 This DBP will also result in (ii) improvement of generative modeling tools that capture biophysical relationships independent of the details of image acquisition and can be used in conjunction with model calibration to study mechanism.

 

F. Approach: Aim 1. Use RuleBender to assist in developing, visualizing and managing the REM and to establish a section of the RuleHub repository dedicated to the REM and its submodels. This aim will drive TR&D3 Aim 2. The REM is composed of molecular components, encoded as structured molecules, and interactions, encoded as rules, in the pySB framework.128 These model elements are extensively documented. Specific submodels can be selected and simulated using different scenarios that mimic experimental protocols. pySB generates these submodels in BioNetGen language (BNGL) format and uses BioNetGen to generate the complete reaction network or to perform network-free simulations. Thus, the visualization tools to be developed in TR&D3 will also be used on pySB models to:

1. Provide zoomable visualizations of the components and interactions included in the REM and in any derived submodel using a combination of contact maps, regulatory graphs, compact rule visualization, and state transition diagrams (see TR&D3 for more details on these methods).

2. Enable interactive browsing of model elements and documentation using network visualizations.

3. Enable comparison of submodels based on composition, network structure, and parameterization.

4. Track and compare the development of specific submodels.

5. Compare submodels' dynamics under different conditions, e.g. mutants in the presence of drugs.

Items 1-3 will be supported by the enhancements proposed for RuleBender and the RuleHub repository (TR&D3 Aims 2.1 and 2.2, respectively). The latter will help compare the REM with other models in the literature. Item 4 will be enabled through a RuleHub section dedicated to the REM. Item 5 will provide a major testbed for the model analyzer capabilities to be added to RuleBender in Aim 2.3. We anticipate a close and fruitful collaboration with members of the Sorger lab working on the REM, many of whom have experience with BioNetGen and rule-based modeling. For example, Dr. Lily Chylek co-authored three papers on rule-based modeling with Dr. Faeder prior to joining the Sorger lab, and Robert Sheehan, a PhD student of Dr. Faeder, will join the Sorger lab following his expected graduation in August 2016.
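As an illustration of item 3 only, the following minimal Python sketch compares two submodels by composition and parameterization. It is not RuleBender or RuleHub functionality: the Submodel container, the compare_submodels function, and all rule and parameter names are hypothetical, and a submodel is assumed to be reducible to a set of rule identifiers plus a dictionary of parameter values.

```python
from dataclasses import dataclass, field


@dataclass
class Submodel:
    """Hypothetical container: a named set of rule identifiers plus parameter values."""
    name: str
    rules: frozenset
    parameters: dict = field(default_factory=dict)


def compare_submodels(a, b):
    """Summarize how two submodels differ in composition and parameterization."""
    shared = a.rules & b.rules
    union = a.rules | b.rules
    # Jaccard index: 1.0 for identical rule sets, 0.0 for disjoint ones.
    jaccard = len(shared) / len(union) if union else 1.0
    # Parameters present in both submodels but carrying different values.
    changed = {
        key: (a.parameters[key], b.parameters[key])
        for key in a.parameters.keys() & b.parameters.keys()
        if a.parameters[key] != b.parameters[key]
    }
    return {
        "shared_rules": shared,
        "only_in_a": a.rules - b.rules,
        "only_in_b": b.rules - a.rules,
        "jaccard": jaccard,
        "changed_parameters": changed,
    }


# Illustrative rule and parameter names; none are taken from the REM itself.
upstream = Submodel(
    "upstream",
    frozenset({"Ras_binds_GTP", "Ras_binds_Raf", "Raf_phos_MEK"}),
    {"kf_ras_raf": 1e-3},
)
downstream = Submodel(
    "downstream",
    frozenset({"Ras_binds_GTP", "Ras_binds_Raf", "MEK_phos_ERK"}),
    {"kf_ras_raf": 2e-3},
)
report = compare_submodels(upstream, downstream)
```

A comparison of network structure would additionally require the generated reaction network, which this sketch does not attempt.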

Aim 2. Create and distribute generative models from multiplexed images collected by the HMS LINCS project. This aim will drive Aim 1 of TR&D4. The Sorger lab has developed an inexpensive approach to sequential imaging of antibodies against many proteins,129 and has begun to apply this approach to small-molecule screens in the LINCS project. Current analysis of the images consists of calculating morphometric features from each channel and clustering to identify subpopulations of cells. We will take advantage of the information in these highly multiplexed images by constructing point process models130 (Aim 1 of TR&D4), which will add value to the LINCS data in two ways. First, they will reveal cell subpopulations that might be seen only when relationships between all proteins are considered. Second, they will serve as a universal exchange medium with other LINCS sites, since they capture the biochemical/structural relationships underlying the images rather than relying on visual features, which depend on the details of image acquisition (such as microscope camera and objective magnification). We note that the OHSU LINCS team has also expressed interest in adopting such models.
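To make the acquisition-independence argument concrete, the following minimal Python sketch computes two landmark-relative quantities of the kind a point process model could be built on. It is not the TR&D4 method: it assumes an idealized circular cell with a concentric circular nucleus, and the function names and coordinates are hypothetical.

```python
import math


def radial_fraction(point, center, nuclear_radius, cell_radius):
    """Position of a point between the nuclear boundary (0.0) and the cell boundary (1.0).

    Assumes an idealized circular cell with a concentric circular nucleus.
    """
    distance = math.dist(point, center)
    return (distance - nuclear_radius) / (cell_radius - nuclear_radius)


def mean_nearest_neighbor(points_a, points_b):
    """Mean distance from each point of protein A to its nearest point of protein B."""
    nearest = [min(math.dist(a, b) for b in points_b) for a in points_a]
    return sum(nearest) / len(nearest)


# The same cell imaged at two magnifications (pixel coordinates scaled by 2.5).
low_mag = radial_fraction((6.0, 0.0), (0.0, 0.0), nuclear_radius=4.0, cell_radius=8.0)
high_mag = radial_fraction((15.0, 0.0), (0.0, 0.0), nuclear_radius=10.0, cell_radius=20.0)

# Hypothetical detected positions for two proteins within one cell.
spacing = mean_nearest_neighbor([(0.0, 0.0), (3.0, 0.0)], [(0.0, 1.0), (3.0, 2.0)])
```

The radial fraction is unchanged when every coordinate is rescaled, which is the sense in which landmark-relative descriptions do not depend on objective magnification. The nearest-neighbor distance still carries pixel units and would need to be normalized, for example by cell size, before comparison across sites.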

Aim 3. Calibrate submodels using a combination of generative models and parameter estimation of CG spatial models of signaling processes. This aim will drive the development of advanced methods for simulation of rule-based models in Aim 1 of TR&D3, as well as the development of efficient parallel parameter estimation capabilities in Aim 2.3. In addition, it will drive Aim 2 of TR&D4 to infer effective protein-protein interaction parameters by combining generative models of spatial distributions with spatially resolved rule-based models. We will calibrate REM submodels to specific LINCS data sets that contain measurements, both spatial and non-spatial, of Ras-related signaling events. The model calibration tools in TR&D3 Aim 2.3 will be used, as well as the web portal used in Aim 3, and these efforts will take advantage of accelerated simulation capabilities developed in Aim 1.1, CG spatial simulation capabilities developed in Aims 1.2 and 1.3, and ENM- and WE-based methods developed in TR&D1 Aims 1 and 3.
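The basic shape of a calibration step can be sketched in Python as follows. This is a toy illustration under strong assumptions, not the TR&D3 calibration tools: the "model" is a single first-order activation curve rather than a rule-based or spatial simulation, the data are synthetic and noise-free, and all function names are hypothetical. Each candidate parameter value is scored independently, which is the property that makes this kind of search straightforward to run in parallel.

```python
import math


def model(t, k):
    """Toy stand-in for a simulation: fraction activated at time t for rate constant k."""
    return 1.0 - math.exp(-k * t)


def sum_squared_error(k, times, observed):
    """Mismatch between model output and data for one candidate rate constant."""
    return sum((model(t, k) - y) ** 2 for t, y in zip(times, observed))


def log_grid(low_exp=-3, high_exp=1, steps_per_decade=20):
    """Candidate rate constants spaced evenly on a log10 scale."""
    n = (high_exp - low_exp) * steps_per_decade
    return [10 ** (low_exp + i / steps_per_decade) for i in range(n + 1)]


def calibrate(times, observed, candidates):
    """Return the candidate with the smallest error; each evaluation is independent."""
    return min(candidates, key=lambda k: sum_squared_error(k, times, observed))


# Synthetic, noise-free "measurements" generated with a known rate constant of 0.5.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
observed = [model(t, 0.5) for t in times]
best_k = calibrate(times, observed, log_grid())
```

With real data, the error function would compare simulated observables against measured ones, and the grid would typically be replaced by an optimizer or sampler suited to many parameters.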

 

Copyright © 2020 National Center for Multiscale Modeling of Biological Systems. All Rights Reserved.