N479 Applied Statistical Modeling and Big Data Analytics

Course Facts

Course Code:
3 days
2.4 Continuing Education Units
24 Professional Development Hours
Certificate Issued Upon Completion


Business Impact: This training course will provide a hands-on introduction to statistical modeling and big data analytics so that participants can use them for petroleum engineering and geoscience applications.

Topics to be covered include: (a) easy-to-understand descriptions of commonly used techniques, (b) case studies demonstrating the applicability, limitations and value of these methods, and (c) hands-on problem sessions using open-source and/or commercial software.

This course will provide engineers and geologists with practical techniques for identifying hidden patterns and relationships in large datasets and for extracting data-driven insights that can be turned into actionable information, contributing to lower cost, improved efficiency and/or increased productivity in oil and gas operations.

It will equip petroleum engineers and geoscientists with advanced capabilities to extract new insights from E&P data that can help: (a) learn hidden patterns and relationships in geologic datasets, (b) identify production sweet spots in developed plays, (c) determine the factors that separate good wells from poor producers, (d) build fast surrogate models of reservoir performance, and (e) assist in predictive maintenance by identifying failure-inducing conditions from historical records.

For a more in-depth summary of D479, please use the following link to watch Dr. Mishra discuss his course in detail:


Duration and Training Method

A three-day classroom course consisting of lectures interspersed with worked examples, hands-on exercises and discussions (Days 1-2). Participants will then build and present a machine-learning-driven model as a capstone group project (Day 3).

Participants will learn to:
  1. Apply foundational concepts in probability and statistics for basic data analysis
  2. Perform linear regression for building simple input-output models
  3. Conduct multivariate data reduction and clustering for finding sub-groups of data that have similar attributes
  4. Apply machine learning techniques for regression and classification for developing data-driven input-output models
  5. Converse with confidence about big data, data analytics and machine learning terminology and fundamental concepts, and critique statistical modeling and data analytics studies

Course Content

  1. Foundational Concepts (Day 1)
    • Big data technologies, basic data analytics and machine learning terminology/concepts
    • Data, statistics, and probability
    • Distributions (models, fitting distributions to data)
    • Inference (confidence limits, bootstrap, significance tests, analysis of variance)
  2. Basic Regression Analysis (Day 1)
    • Linear regression (univariate and multivariate regression)
    • Understanding regression statistics
    • Non-parametric regression
  3. Multivariate Data Analysis (Day 1)
    • Dimension reduction (principal component analysis)
    • Cluster analysis (k-means, hierarchical clustering, self-organizing maps)
    • Data visualization
  4. Machine Learning Basics (Day 2)
    • Overview of techniques
    • Evaluating model performance (model validation, goodness-of-fit, common pitfalls)
    • Variable importance
    • Model aggregation
  5. Machine Learning for Regression and Classification (Day 2)
    • Tree-based methods (decision trees, random forest, gradient boosting machine)
    • Advanced methods (neural network, support vector machine)
  6. Group Projects (Day 3)
    • Participants will be divided into multi-disciplinary groups and will develop and present a machine learning model as a capstone project
  7. Wrap-up (Day 3)
    • Key takeaways and resources
    • Data analytics dos and don'ts
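
To give a flavor of the Day 1 material on linear regression, regression statistics and bootstrap confidence limits, the sketch below fits a straight line by least squares and attaches a percentile bootstrap interval to the slope. It is an illustration only and is not taken from the course materials: the function names (fit_line, bootstrap_slope_interval), the synthetic data and the choice of plain Python are all assumptions made for this example.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def bootstrap_slope_interval(x, y, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap confidence interval for the regression slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and refit the line.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        slopes.append(fit_line(xs, ys)[1])
    slopes.sort()
    lower = slopes[int((1 - level) / 2 * n_boot)]
    upper = slopes[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic input-output data (hypothetical, for illustration only):
    # true intercept 2.0, true slope 0.5, unit-variance Gaussian noise.
    rng = random.Random(42)
    x = [rng.uniform(0.0, 10.0) for _ in range(50)]
    y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

    a, b, r2 = fit_line(x, y)
    lower, upper = bootstrap_slope_interval(x, y)
    print(f"intercept={a:.3f} slope={b:.3f} R^2={r2:.3f}")
    print(f"90% bootstrap interval for slope: [{lower:.3f}, {upper:.3f}]")
```

Resampling whole (x, y) pairs, rather than residuals, is one common bootstrap variant; it was chosen here because it needs no assumption about the noise distribution.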

Who should attend

This course is designed for petroleum engineers, geoscientists, and managers interested in becoming smart users of statistical modeling and data analytics.

Prerequisites and linking courses

Participants should have a basic knowledge of statistics or should have attended N480 (Introduction to Statistical Modeling & Big Data Analytics).

Instructor

Srikanta Mishra