Machine Learning Modeling of Concrete Compressive Strength
Dataset available at https://archive-beta.ics.uci.edu/dataset/165/concrete+compressive+strength (license: CC BY 4.0).
This project consists of developing a machine learning regression model that predicts concrete compressive strength as a nonlinear function of the mixture ingredients and curing age. It is a widely used benchmark dataset in the ML literature.
This dataset contains 1030 instances (data points) and 9 attributes: 8 input features (Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, and Age) plus the concrete compressive strength, which is our target (label).
Description of Variables and Dataset
Inputs (features):
Cement (kg in a m³ mixture)
Blast Furnace Slag (kg in a m³ mixture)
Fly Ash (kg in a m³ mixture)
Water (kg in a m³ mixture)
Superplasticizer (kg in a m³ mixture)
Coarse Aggregate (kg in a m³ mixture)
Fine Aggregate (kg in a m³ mixture)
Age (days) — curing age of the concrete sample
Output (target / label):
Compressive Strength (MPa) — concrete compressive strength at given age
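One way to state the regression problem formally. The squared-error training objective and the model class \(\mathcal{G}\) are assumptions: each regressor you choose implies its own \(\mathcal{G}\) (and possibly a regularization term) and approximates this minimization in its own way. The three metrics are the standard definitions, computed over the \(m\) held-out test points.

```latex
\begin{aligned}
&\mathbf{x} = (x_1, \dots, x_8) \in \mathbb{R}^8,
  \quad x_1, \dots, x_7 \ \text{in kg/m}^3, \quad x_8 \ \text{in days},
  \qquad y \in \mathbb{R} \ \text{(MPa)} \\
&y = f(\mathbf{x}) + \varepsilon,
  \qquad \hat f = \arg\min_{g \in \mathcal{G}} \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - g(\mathbf{x}_i)\bigr)^2 \\
&\mathrm{MAE} = \frac{1}{m} \sum_{j=1}^{m} \lvert y_j - \hat y_j \rvert,
  \qquad \mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{j=1}^{m} (y_j - \hat y_j)^2},
  \qquad R^2 = 1 - \frac{\sum_{j=1}^{m} (y_j - \hat y_j)^2}{\sum_{j=1}^{m} (y_j - \bar y)^2}
\end{aligned}
```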
Data file: CSV from UCI dataset
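A minimal loading sketch, assuming the CSV has a header row and that its nine columns appear in the order listed above (the eight inputs, then strength). The short column names and the `load_concrete` helper are hypothetical; renaming by position avoids depending on the long original headers, but verify the order against your copy of the file first.

```python
import pandas as pd

# Hypothetical short names, in the column order listed above (inputs, then target)
COLUMNS = [
    "cement", "slag", "fly_ash", "water", "superplasticizer",
    "coarse_agg", "fine_agg", "age", "strength",
]
FEATURES = COLUMNS[:-1]
TARGET = COLUMNS[-1]


def load_concrete(path):
    """Read the CSV (path or file-like object) and rename its columns by position."""
    df = pd.read_csv(path)
    if df.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {df.shape[1]}")
    df.columns = COLUMNS
    return df
```

Usage: `df = load_concrete("concrete_data.csv")`, where the file name is a placeholder; `df[FEATURES]` and `df[TARGET]` then give the inputs and the label.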
All units follow the dataset as provided. For definitions and context on compressive strength and mixture design, please see the UCI dataset page.
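For the basic inspection and unit checks, a per-column summary is a reasonable first step. This sketch assumes every column is numeric and that the target column is named `strength` (a hypothetical short name); since all inputs are masses per m³ or ages in days, any negative value would signal a data or unit problem.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame, target: str = "strength") -> pd.DataFrame:
    """Per-column summary for a first look at the data (assumes numeric columns)."""
    summary = df.describe().T  # count, mean, std, min, quartiles, max
    summary["n_missing"] = df.isna().sum()
    # Masses in kg per m^3 and ages in days should never be negative
    summary["n_negative"] = (df < 0).sum()
    # Zeros are legitimate (e.g. a mix without fly ash) but worth counting
    summary["n_zero"] = (df == 0).sum()
    # Pearson correlation of each column with the target (the target's own row is 1.0)
    summary["corr_with_target"] = df.corr()[target]
    return summary
```

Pearson correlation only captures linear association, so a weak value for a feature does not rule out a strong nonlinear effect on strength.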
Using the provided dataset, build three different ML regression models to predict Compressive Strength (MPa) from the given input features. Use scikit-learn and machine learning models of your own choosing.
You should:
Formalize the ML problem (features, target, assumptions).
Load and inspect the dataset; perform basic analysis of the data (distributions, correlations, unit checks).
Define and justify your data split (train/test).
Build a pipeline (preprocessing + regressor).
Train and validate your models (you may try multiple algorithms, but clearly report your final choice).
Evaluate with MAE, RMSE, and \(R^2\) on the held-out test set; include residual and predicted-vs-true plots.
Provide a short discussion: model choice rationale, performance, physical sanity-checks (e.g., does higher cement content generally increase strength?), limitations.
Note: Feel free to remove features that are not truly continuous, such as Age (integer days drawn from a small set of curing ages) and Blast Furnace Slag (zero in many mixtures).
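Before dropping anything, it may help to check which columns are actually integer-valued in your copy of the data. The helper names below are hypothetical; a column counts as integer-valued only if every non-missing value is a whole number, and dropping returns a new DataFrame so the original stays available for comparison.

```python
import numpy as np
import pandas as pd


def integer_valued_columns(df: pd.DataFrame) -> list:
    """Names of columns whose non-missing values are all whole numbers."""
    cols = []
    for col in df.columns:
        values = df[col].dropna().to_numpy(dtype=float)
        if values.size > 0 and np.all(np.mod(values, 1.0) == 0):
            cols.append(col)
    return cols


def drop_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df without the given columns (df itself is not modified)."""
    return df.drop(columns=list(columns))
```

Comparing test-set metrics with and without a candidate feature is a simple way to judge whether removing it costs accuracy; Age in particular is known to have a strong effect on strength, so dropping it may hurt.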