An Integrated Framework for Production Data Analysis Using Machine Learning and Wavelets

Investigator: Dante Orta (PhD student)

Modeling reservoir response with data-driven methods requires a set of input variables that encode the information needed to infer the response from available data, such as oil or liquid flow rates, and then to estimate future behavior or fill in data gaps. These input variables, also referred to as features, are fundamental in determining a model’s performance and its ability to capture complex behavior. Defining features that capture reservoir response can be challenging because of the complexity of the physical process and changes in flow regimes or operation modes. In a reservoir, several processes are often superimposed at the same time, such as noise, well interference, and water-injection pressure support. These processes complicate the choice of features, but they can also be leveraged to inform potential choices of model input variables.

A useful set of features is one that allows the modeling technique to extract maximum information with minimum effort. This means preserving relevant information such as short-term localized events as well as longer-duration effects that might be present in the data. Wavelet transforms have been shown to have some of these desirable properties, which makes them a natural candidate for use in the design of features for modeling reservoir response. This work showcases some of the properties of wavelet transforms and uses them to build a fully data-driven modeling framework for capturing reservoir response from production data.

The framework covers the full data processing pipeline that a modeler must go through in a real-life scenario, and it deals with challenges that appear when working with production data, such as data imperfections and discontinuities. In addition to fitting the mapping from flow rate to pressure, further goals were established for the full modeling framework. These objectives are:

Use a fully data-driven model. No explicit use of physical equations or reservoir model knowledge is assumed.
Minimize the process of data cleaning and avoid the selection of individual transients for modeling.
Seamlessly deal with the presence of noise in the data. Assume that noise is inherent to the data.
Allow for the use of incomplete data in the model building process. This can be in the form of data discontinuities in time or uneven sampling frequencies between data variables.

The modeling framework introduced the Maximum Overlap Discrete Wavelet Transform Multiresolution Analysis (MODWT-MRA) as a useful transform for decomposing production time series data. Moreover, the research proved that applying the MODWT-MRA is equivalent to decomposing a single well’s data into a set of virtual wells that exhibit simpler behavior than the original flow rate and pressure readings (Figure 1). This virtual-well decomposition is then leveraged by machine learning and deep learning models to capture the reservoir response.

*Figure 1. The MODWT-MRA decomposes a single well’s data into virtual wells*
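
To make the additive nature of the decomposition concrete, the following is a minimal sketch of a MODWT-MRA in pure Python. It rests on several assumptions that do not come from this work: the Haar filter is used only because it is the simplest choice (the wavelet used in the framework is not stated here), the series is treated as circular at its boundaries, and the function names `modwt_haar`, `_inverse_step`, and `modwt_mra_haar` as well as the sample flow rates are hypothetical. Each returned component has the same length as the input, and the components sum back to the original series, which is the property that allows each one to be read as a separate "virtual well".

```python
def modwt_haar(x, levels):
    """Forward Haar MODWT with circular boundaries (illustrative assumption).

    Returns (details, smooth): `details[j-1]` holds the level-j wavelet
    coefficients and `smooth` holds the final scaling coefficients.
    """
    n = len(x)
    v = [float(val) for val in x]
    details = []
    for j in range(1, levels + 1):
        s = 2 ** (j - 1)  # filter taps are spread 2^(j-1) samples apart
        w = [(v[t] - v[(t - s) % n]) / 2 for t in range(n)]
        v = [(v[t] + v[(t - s) % n]) / 2 for t in range(n)]
        details.append(w)
    return details, v


def _inverse_step(w, v, j):
    """Undo one level: combine level-j wavelet (w) and scaling (v) coefficients."""
    n = len(v)
    s = 2 ** (j - 1)
    return [
        (w[t] - w[(t + s) % n] + v[t] + v[(t + s) % n]) / 2
        for t in range(n)
    ]


def modwt_mra_haar(x, levels):
    """Return [D_1, ..., D_levels, S_levels]; the components sum to x."""
    details, smooth = modwt_haar(x, levels)
    zeros = [0.0] * len(x)
    components = []
    for k in range(1, levels + 1):
        # Invert with every coefficient set to zero except the level-k details.
        part = _inverse_step(details[k - 1], zeros, k)
        for j in range(k - 1, 0, -1):
            part = _inverse_step(zeros, part, j)
        components.append(part)
    # Invert with only the coarsest scaling coefficients kept.
    part = smooth
    for j in range(levels, 0, -1):
        part = _inverse_step(zeros, part, j)
    components.append(part)
    return components


# Hypothetical flow-rate history with a rate change and a shut-in.
flow_rate = [100.0, 100.0, 80.0, 80.0, 80.0, 0.0, 0.0, 120.0]
virtual_wells = modwt_mra_haar(flow_rate, levels=2)
rebuilt = [sum(c[t] for c in virtual_wells) for t in range(len(flow_rate))]
max_error = max(abs(a - b) for a, b in zip(rebuilt, flow_rate))
print(len(virtual_wells), max_error)
```

Because the transform is not decimated, every component stays aligned in time with the original series, so the components can be used directly as model input features sample by sample. A library implementation would normally be preferable to this hand-written version for longer filters.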

The framework was tested with a variety of learning methods, including linear and neural-network-based models, of which LassoNet performed best. The proposed model can both recover the pressure response from flow rate data and reconstruct the flow rate history from pressure readings with varying amounts of missing data (Figure 2).

The framework was extended to a two-well system with both wells operating at the same time. In this scenario, the model’s target was to identify an individual well’s response and encode it so as to reproduce that well’s response as it would be in isolation. This process effectively identifies and filters out interference while adequately capturing the well’s pressure response (Figure 3).

Overall, the framework demonstrates how to leverage the properties of wavelets to produce data-driven reservoir response models from imperfect production time series. Noise and incomplete data are treated as inherent parts of the data, and by using wavelets, model complexity is kept to the minimum necessary even for more difficult tasks such as interference detection.