IPM Papers

The Inductive Process Modeling (IPM) research effort has been in progress for more than a decade, generating dozens of papers. This page provides abstracts, citation information, and links to readable copies for many of these papers. Suggestions (e.g., additions, corrections) are welcome.

2002

Inducing Process Models from Continuous Data

Abstract

In this paper, we pose a novel research problem for machine learning that involves constructing a process model from continuous data. We claim that casting learned knowledge in terms of processes with associated equations is desirable for scientific and engineering domains, where such notations are commonly used. We also argue that existing induction methods are not well suited to this task, although some techniques hold partial solutions.

In response, we describe an approach to learning process models from time-series data and illustrate its behavior in a population dynamics domain. In closing, we describe open issues in process model induction and encourage other researchers to tackle this important problem.

Inducing Process Models from Continuous Data
Langley, P., Sanchez, J., Todorovski, L., & Dzeroski, S. (2002)
Proceedings of the Nineteenth International Conference on Machine Learning (pp. 347-354).
Sydney, Australia: Morgan Kaufmann. (PDF, PS)

2003

Robust Induction of Process Models from Time-Series Data

Abstract

In this paper, we revisit the problem of inducing a process model from time-series data. We illustrate this task with a realistic ecosystem model, review an initial method for its induction, then identify three challenges that require extension of this method.

These include dealing with unobservable variables, finding numeric conditions on processes, and preventing the creation of models that overfit the training data. We describe responses to these challenges and present experimental evidence that they have the desired effects.

After this, we show that this extended approach to inductive process modeling can explain and predict time-series data from batteries on the International Space Station. In closing, we discuss related work and consider directions for future research.

Robust Induction of Process Models from Time-Series Data
Langley, P., George, D., Bay, S., & Saito, K. (2003)
Proceedings of the Twentieth International Conference on Machine Learning (pp. 432-439).
(PDF, PS)

An Interactive Environment for Scientific Model Construction

Abstract

Most AI research on scientific model construction aims to automate this process using discovery techniques. In contrast, we describe an interactive environment for model construction that lets the user construct, edit, and visualize scientific models, use them to make predictions, and call on discovery methods to revise them in ways that better fit the available data.

The environment relies on a new formalism that embeds mathematical equations, which are familiar to many scientists, within distinct processes, which can encode background knowledge used to constrain model revision.

We report initial studies on ecosystem modeling that suggest this environment is more effective than earlier approaches and more transparent to users. In closing, we discuss related work on modeling environments and model revision, then suggest directions for future research.

An Interactive Environment for Scientific Model Construction
Sanchez, J. N., & Langley, P. (2003)
Proceedings of the Second International Conference on Knowledge Capture (pp. 138-145).
Sanibel Island, FL, USA: ACM Press. (PDF, PS)

Discovering Ecosystem Models from Time-Series Data

Abstract

Ecosystem models are used to interpret and predict the interactions of species and their environment. In this paper, we address the task of inducing ecosystem models from background knowledge and time-series data, and we review IPM, an algorithm that addresses this problem.

We demonstrate the system's ability to construct ecosystem models on two different Earth science data sets. We also compare its behavior with that produced by a more conventional autoregression method. In closing, we discuss related work on model induction and suggest directions for further research on this topic.

Discovering Ecosystem Models from Time-Series Data
George, D., Saito, K., Langley, P., Bay, S., & Arrigo, K. (2003)
Proceedings of the Sixth International Conference on Discovery Science (pp. 141-152).
Sapporo, Japan: Springer. (PDF, PS)

2004

Introduction: Lessons Learned from Data Mining Applications and Collaborative Problem Solving

Abstract

This introductory paper to the special issue on Data Mining Lessons Learned presents lessons from data mining applications, including experience from science, business, and knowledge management in a collaborative data mining setting.

Introduction: Lessons Learned from Data Mining Applications and Collaborative Problem Solving
Lavrac, N., Motoda, H., Fawcett, T., Holte, R., Langley, P., & Adriaans, P. (2004)
Machine Learning, 57, 13-34.
(PDF, PS)

Inducing Explanatory Process Models from Biological Time Series

Abstract

We address the task of inducing explanatory models from observations and knowledge about candidate biological processes, using the illustrative problem of modeling photosynthesis regulation. We cast both models and background knowledge in terms of processes that interact to account for behavior.

We also describe IPM, an algorithm for inducing quantitative process models from such input, and we demonstrate its use on the photosynthesis domain. In closing, we consider the generality of our approach, discuss related research on biological modeling, and suggest directions for future work.

Inducing Explanatory Process Models from Biological Time Series
Langley, P., Shrager, J., Asgharbeygi, N., Bay, S., & Pohorille, A. (2004)
Proceedings of the Ninth Workshop on Intelligent Data Analysis and Data Mining (pp. 85-90). Stanford, CA, USA
(PDF)

Computational Revision of Ecological Process Models

Most ecological models are developed manually by scientists, who decide on their basic structure, tune their parameters, compare them against available data, and refine them in response. In contrast, most work on computational scientific discovery has emphasized the automated generation of models from data and background knowledge.

We believe that computational tools for model revision offer great practical value to scientists by decreasing the time required to search for models while letting them retain control over the search space. ...

Computational Revision of Ecological Process Models
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2004)
Proceedings of the Fourth International Workshop on Environmental Applications of Machine Learning (pp. 13-14). Bled, Slovenia
(PDF)

2005

Inducing Hierarchical Process Models in Dynamic Domains

Abstract

Research on inductive process modeling combines background knowledge with time-series data to construct explanatory models, but previous work has placed few constraints on search through the model space.

We present an extended formalism that organizes process knowledge in a hierarchical manner, and we describe HIPM, a system that carries out constrained search for hierarchical process models. We report experiments that suggest this approach produces more accurate and plausible models with less effort. We conclude by discussing related research and directions for future work.

Inducing Hierarchical Process Models in Dynamic Domains
Todorovski, L., Bridewell, W., Shiran, O., & Langley, P. (2005)
Proceedings of the Twentieth National Conference on Artificial Intelligence (pp. 892-897). Pittsburgh, PA, USA: AAAI Press.
(PDF)

Reducing Overfitting in Process Model Induction

Abstract

In this paper, we review the paradigm of inductive process modeling, which uses background knowledge about possible component processes to construct quantitative models of dynamical systems. We note that previous methods for this task tend to overfit the training data, which suggests ensemble learning as a likely response. However, such techniques combine models in ways that reduce comprehensibility, making their output much less accessible to domain scientists.

As an alternative, we introduce a new approach that induces a set of process models from different samples of the training data and uses them to guide a final search through the space of model structures. Experiments with synthetic and natural data suggest this method reduces error and decreases the chance of including unnecessary processes in the model. We conclude by discussing related work and suggesting directions for additional research.

Reducing Overfitting in Process Model Induction
Bridewell, W., Bani Asadi, N., Langley, P., & Todorovski, L. (2005)
Proceedings of the Twenty-Second International Conference on Machine Learning (pp. 81-88). Bonn, Germany.
(PDF)

2006

Inductive Revision of Quantitative Process Models

Abstract

Most research on computational scientific discovery has focused on developing an initial model, but an equally important task involves revising a model in response to new data.

In this paper, we present an approach that represents candidate models as sets of quantitative processes and that treats revision as search through a model space which is guided by time-series observations and constrained by background knowledge cast as generic processes that serve as templates for the specific processes used in models.

We demonstrate our system's ability on three different scientific domains and associated data sets. We also discuss its relation to other work on model revision and consider directions for additional research.

Inductive Revision of Quantitative Process Models
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2006)
Ecological Modelling, 194, 70-79
(PDF)

Constructing Explanatory Process Models from Biological Data and Knowledge

Abstract

We address the task of inducing explanatory models from observations and knowledge about candidate biological processes, using the illustrative problem of modeling photosynthesis regulation. We cast both models and background knowledge in terms of processes that interact to account for behavior.

We also describe IPM, an algorithm for inducing quantitative process models from such input, and we demonstrate its use both on photosynthesis and on a second domain, biochemical kinetics. In closing, we consider the generality of our approach, discuss related research on biological modeling, and suggest directions for future work.

Constructing Explanatory Process Models from Biological Data and Knowledge
Langley, P., Shiran, O., Shrager, J., Todorovski, L., & Pohorille, A. (2006)
AI in Medicine, 37, 191-201
(PDF)

Learning Process Models with Missing Data

Abstract

In this paper, we review the task of inductive process modeling, which uses domain knowledge to compose explanatory models of continuous dynamic systems. Next we discuss approaches to learning with missing values in time series, noting that these efforts are typically applied for descriptive modeling tasks that use little background knowledge. We also point out that these methods assume that data are missing at random -- a condition that may not hold in scientific domains.

Using experiments with synthetic and natural data, we compare an expectation maximization approach with one that simply ignores the missing data. Results indicate that expectation maximization leads to more accurate models in most cases, even though its basic assumptions are unmet. We conclude by discussing the implications of our findings along with directions for future work.

Learning Process Models with Missing Data
Bridewell, W., Langley, P., Racunas, S., & Borrett, S. R. (2006)
Proceedings of the Seventeenth European Conference on Machine Learning (pp. 557-565)
Berlin, Germany: Springer (PDF)

An Interactive Environment for the Modeling and Discovery of Scientific Knowledge

Abstract

Existing tools for scientific modeling offer little support for improving models in response to data, whereas computational methods for scientific knowledge discovery provide few opportunities for user input.

In this paper, we present a language for stating process models and background knowledge in terms familiar to scientists, along with an interactive environment for knowledge discovery that lets the user construct, edit, and visualize scientific models, use them to make predictions, and revise them to better fit available data.

We report initial studies in three domains that illustrate the operation of this environment and the results of a user study carried out with domain scientists. Finally, we discuss related research on modeling formalisms and model revision, and we suggest priorities for additional research.

An Interactive Environment for the Modeling and Discovery of Scientific Knowledge
Bridewell, W., Sanchez, J. N., Langley, P., & Billman, D. (2006)
International Journal of Human-Computer Studies, 64, 1099-1114
(PDF)

2007

A Constraint Language for Process Model Induction

Abstract

We define the inductive process modeling task as the automated construction of quantitative process models from time series and background knowledge. In this task, the background knowledge comprises generic processes that along with a given set of entities define the space of candidate model structures. Typically this space grows exponentially with the size of the library, so past research introduced a hierarchical organization on the processes to constrain that space to a limited set of plausible configurations.

However, organizing the processes into a hierarchy takes considerable effort, leads to implicit constraints, and creates a complex relationship between the knowledge of what processes exist and the knowledge of how one can combine them. To address these problems, we developed SC-IPM, an inductive process modeler that uses declarative constraints to reduce the size of the model structure space. In this paper, we describe the constraint formalism and how it guides SC-IPM's search.

A Constraint Language for Process Model Induction
Bravo, M., Bridewell, W., & Todorovski, L. (2007)
(PDF)

A Method for Representing and Developing Process Models

Abstract

Scientists investigate the dynamics of complex systems with quantitative models, employing them to synthesize knowledge, to explain observations, and to forecast future system behavior. Complete specification of systems is impossible, so models must be simplified abstractions. Thus, the art of modeling involves deciding which system elements to include and determining how they should be represented. We view modeling as search through a space of candidate models that is guided by model objectives, theoretical knowledge, and empirical data.

In this contribution, we introduce a method for representing process-based models that facilitates the discovery of models that explain observed behavior. This representation casts dynamic systems as interacting sets of processes that act on entities. Using this approach, a modeler first encodes relevant ecological knowledge into a library of generic entities and processes, then instantiates these theoretical components, and finally assembles candidate models from these elements. We illustrate this methodology with a model of the Ross Sea ecosystem.

A Method for Representing and Developing Process Models
Borrett, S. R., Bridewell, W., Langley, P., & Arrigo, K. R. (2007)
Ecological Complexity, 4, 1-12
(PDF)

Learning Declarative Bias

Abstract

In this paper, we introduce an inductive logic programming approach to learning declarative bias. The target learning task is inductive process modeling, which we briefly review. Next we discuss our approach to bias induction while emphasizing predicates that characterize the knowledge and models associated with the HIPM system. We then evaluate how the learned bias affects the space of model structures that HIPM considers and how well it generalizes to other search problems in the same domain.

Results indicate that the bias reduces the size of the search space without removing the most accurate structures. In addition, our approach reconstructs known constraints in population dynamics. We conclude the paper by discussing a generalization of the technique to learning bias for inductive logic programming and by noting directions for future work.

Learning Declarative Bias
Bridewell, W., & Todorovski, L. (2007)
Proceedings of the Seventeenth International Conference on Inductive Logic Programming
Corvallis, OR, USA (PDF)

Extracting Constraints for Process Modeling

Abstract

In this paper, we introduce an approach for extracting constraints on process model construction. We begin by clarifying the type of knowledge produced by our method and how one may apply it. Next, we review the task of inductive process modeling, which provides the required data.

We then introduce a logical formalism and a computational method for acquiring scientific knowledge from candidate process models. Results suggest that the learned constraints make sense ecologically and may provide insight into the nature of the modeled domain. We conclude the paper by discussing related and future work.

Extracting Constraints for Process Modeling
Bridewell, W., & Todorovski, L. (2007)
Proceedings of the Fourth International Conference on Knowledge Capture
Whistler, BC, Canada (PDF)

2008

Inductive Process Modeling

Abstract

In this paper, we pose a novel research problem for machine learning that involves constructing a process model from continuous data. We claim that casting learned knowledge in terms of processes with associated equations is desirable for scientific and engineering domains, where such notations are commonly used. We also argue that existing induction methods are not well suited to this task, although some techniques hold partial solutions.

In response, we describe an approach to learning process models from time-series data and illustrate its behavior in three domains. In closing, we describe open issues in process model induction and encourage other researchers to tackle this important problem.

Inductive Process Modeling
Bridewell, W., Langley, P., Todorovski, L., & Dzeroski, S. (2008)
Machine Learning, 71, 1-32
(PDF)

Processes and Constraints in Explanatory Scientific Discovery

In previous publications, we have reported a computational approach to constructing explanatory process models of dynamic systems from time-series data and background knowledge. We have not aimed to mimic the detailed behavior of human researchers, but we maintain that our systems address the same tasks as ecologists, biologists, and other theory-guided scientists, and that they carry out search through similar problem spaces. ...

Processes and Constraints in Explanatory Scientific Discovery
Langley, P., & Bridewell, W. (2008)
Proceedings of the Thirtieth Annual Meeting of the Cognitive Science Society
Washington, D.C., USA (PDF)

2009

Supporting Innovative Construction of Explanatory Scientific Models

Abstract

Scientific modeling is a creative activity that can benefit from computational support. This chapter reports five challenges that arise in developing such aids, as illustrated by PROMETHEUS, a software environment that supports the construction and revision of explanatory models. These challenges include the paucity of relevant data, the need to incorporate prior knowledge, the importance of comprehensibility, an emphasis on explanation, and the practicality of user interaction.

The responses to these challenges include the use of quantitative processes to encode models and background knowledge, as well as the combination of AND/OR search through a space of model structures with gradient descent to estimate parameters. This chapter reports our experiences with PROMETHEUS on three scientific modeling tasks and some lessons we have learned from those efforts. This chapter concludes by noting additional challenges that were not apparent at the outset of our work.

Supporting Innovative Construction of Explanatory Scientific Models
Bridewell, W., Borrett, S. R., & Langley, P. (2009)
In A. B. Markman & K. L. Wood (Eds.), Tools for Innovation.
Oxford, UK: Oxford University Press.

2010

Two Kinds of Knowledge in Scientific Discovery

Abstract

Research on computational models of scientific discovery investigates both the induction of descriptive laws and the construction of explanatory models. Although the work in law discovery centers on knowledge-lean approaches to searching a problem space, research on deeper modeling tasks emphasizes the pivotal role of domain knowledge. As an example, our own research on inductive process modeling uses information about candidate processes to explain why variables change over time.

However, our experience with IPM, an artificial intelligence system that implements this approach, suggests that process knowledge is insufficient to avoid consideration of implausible models. To this end, the discovery system needs additional knowledge that constrains the model structures. We report on an extended system, SC-IPM, that uses such information to reduce its search through the space of candidates and to produce models that human scientists find more plausible. We also argue that although people carry out less extensive search than SC-IPM, they rely on the same forms of knowledge -- processes and constraints -- when constructing explanatory models.

Two Kinds of Knowledge in Scientific Discovery
Bridewell, W., & Langley, P. (2010)
Topics in Cognitive Science, 2, 36-52

The Induction and Transfer of Declarative Bias

Abstract

People constantly apply acquired knowledge to new learning tasks, but machines almost never do. Research on transfer learning attempts to address this dissimilarity. Working within this area, we report on a procedure that learns and transfers constraints in the context of inductive process modeling, which we review. After discussing the role of constraints in model induction, we describe the learning method, MISC, and introduce our metrics for assessing the cost and benefit of transferred knowledge.

The reported results suggest that cross-domain transfer is beneficial in the scenarios that we investigated, lending further evidence that this strategy is a broadly effective means for increasing the efficiency of learning systems. We conclude by discussing the aspects of inductive process modeling that encourage effective transfer, by reviewing related strategies, and by describing future research plans for constraint induction and transfer learning.

The Induction and Transfer of Declarative Bias
Bridewell, W., & Todorovski, L. (2010)
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (pp. 401-406)
Atlanta, GA, USA: AAAI Press. (PDF)

Integrated Systems for Inducing Spatio-Temporal Process Models

Abstract

Quantitative modeling plays a key role in the natural sciences, and systems that address the task of inductive process modeling can assist researchers in explaining their data. In the past, such systems have been limited to data sets that recorded change over time, but many interesting problems involve both spatial and temporal dynamics.

To meet this challenge, we introduce SCISM, an integrated intelligent system which solves the task of inducing process models that account for spatial and temporal variation. We also integrate SCISM with a constraint learning method to reduce computation during induction. Applications to ecological modeling demonstrate that each system fares well on the task, but that the enhanced system does so much faster than the baseline version.

Integrated Systems for Inducing Spatio-Temporal Process Models
Park, C., Bridewell, W., & Langley, P. (2010)
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence
Atlanta, GA, USA: AAAI Press. (PDF)

2011

Combining Data-Driven and Knowledge-Guided Methods to Induce Interpretable Physiological Models

Abstract

In this paper, we review the paradigm of inductive process modeling and examine its application to human physiology. This framework represents models as a set of interacting processes, each with associated differential or algebraic equations that express causal relations among variables. Simulating such a quantitative process model produces trajectories for variables over time that one can compare to observations. Background knowledge about candidate processes enables search through the space of model structures and their associated parameters, and thus identify quantitative models that explain time-series data.

We present an initial process model for aspects of human physiology, consider its uses for health monitoring, and discuss the induction of such models. In closing, we consider related efforts on physiological modeling and our plans for collecting data to evaluate our framework in this domain.

Combining Data-Driven and Knowledge-Guided Methods to Induce Interpretable Physiological Models
Langley, P., & Bridewell, W. (2011)
Computational Physiology - Papers from the AAAI 2011 Spring Symposium (SS-11-04)
Stanford, CA, USA: AAAI Press. (PDF)

2012

Discovering Constraints for Inductive Process Modeling

Abstract

Scientists use two forms of knowledge in the construction of explanatory models: generalized entities and processes that relate them; and constraints that specify acceptable combinations of these components. Previous research on inductive process modeling, which constructs models from knowledge and time-series data, has relied on handcrafted constraints.

In this paper, we report an approach to discovering such constraints from a set of models that have been ranked according to their error on observations. Our approach adapts inductive techniques for supervised learning to identify process combinations that characterize accurate models.

We evaluate the method's ability to reconstruct known constraints and to generalize well to other modeling tasks in the same domain. Experiments with synthetic data indicate that the approach can successfully reconstruct known modeling constraints. Another study using natural data suggests that transferring constraints acquired from one modeling scenario to another within the same domain considerably reduces the amount of search for candidate model structures while retaining the most accurate ones.

Discovering Constraints for Inductive Process Modeling
Todorovski, L., Bridewell, W., & Langley, P. (2012)
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence
Toronto, Canada: AAAI Press. (PDF)

Combining Data with Knowledge To Construct Interpretable Scientific Models

Abstract

Early research in e-science emphasized representing and simulating models that reflected scientists' knowledge, but these models often made little contact with data. Recent work in e-science has utilized machine learning and data mining to uncover regularities in data, but makes few connections to scientists' knowledge. In this talk, I present an approach known as inductive process modeling that combines these two traditions. The paradigm encodes scientific models as sets of processes that incorporate differential equations, induces the models from time-series data, and uses background knowledge to guide their construction.

The resulting models are interpretable, but they are also accurate, in that they match observations. I illustrate this approach in the context of ecology and environmental science, and I report extensions that increase the plausibility of induced models and efficiency at finding them. In addition, I report an interactive software environment for the construction, evaluation, and revision of such interpretable scientific models. This talk describes joint work at Stanford University and ISLE with Kevin Arrigo, Stuart Borrett, Matt Bravo, Will Bridewell, and Ljupco Todorovski.

CMSV-TOC: Pat Langley 2012-11-13 (video)


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r19 - 04 Apr 2016, RichMorin