This dataset was collected to develop an imaging technique for estimating soil aggregate size distribution. It contains images of 4 soil samples, cropped and rectified to a top-down perspective. Each sample was imaged both as-dug and dry. Images of each sample were taken with both a professional camera (resulting in rectified images of 2500 x 2500 px) and a phone camera (2000 x 2000 px).

We took images of each sample at three different scales. The soil sample (approximately 400 g) was spread on a square surface of pre-set dimensions, and the camera was set at a fixed height for each scale so that the soil sample covered as much of the image as possible. At each scale, the soil sample was reconfigured (i.e. taken off the square surface, mixed, and distributed on the surface again) so that images of two different configurations were taken. For phone images, the camera height was approximately the same but not completely fixed. Instead, the phone pictures were taken as close as possible with the sample still fitting in the image, resembling a real-world scenario. The settings for the different scales and the properties of the resulting images are listed in the table.

Scale                             close     middle    far
Area [cm^2]                       15 x 15   20 x 20   25 x 25
Camera height [cm]                60        78        94
Image resolution camera [px/mm]   16.7      12.5      10
Image resolution phone [px/mm]    13.3      10        8
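
The resolutions in the table follow from the rectified image size and the side length of the square surface. The sketch below reproduces that arithmetic, assuming the rectified image covers exactly the square surface (an assumption that matches the table values). The function and constant names are hypothetical and are not part of the dataset.

```python
# Side length of the rectified square images in pixels (from the description).
IMAGE_SIDE_PX = {"camera": 2500, "phone": 2000}

# Side length of the square surface at each scale, in centimetres (from the table).
SURFACE_SIDE_CM = {"close": 15, "middle": 20, "far": 25}


def resolution_px_per_mm(device: str, scale: str) -> float:
    """Pixels per millimetre, assuming the image spans the whole surface."""
    side_mm = SURFACE_SIDE_CM[scale] * 10  # convert cm to mm
    return IMAGE_SIDE_PX[device] / side_mm


def px_to_mm(length_px: float, device: str, scale: str) -> float:
    """Convert a length measured in pixels to millimetres."""
    return length_px / resolution_px_per_mm(device, scale)


if __name__ == "__main__":
    # Print one resolution value per device and scale.
    for device in IMAGE_SIDE_PX:
        for scale in SURFACE_SIDE_CM:
            res = resolution_px_per_mm(device, scale)
            print(f"{device:6s} {scale:6s} {res:.1f} px/mm")
```

Such a conversion may be useful when an aggregate size measured in pixels has to be compared across scales or devices. For phone images the value is only approximate, since the camera height was not completely fixed.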


We provide the ground truth for classification and regression experiments. For the regression experiments, we provide the sample weight (in g) and sample volume (in cm^3). For the classification experiments, the ground truth is provided in two different ways:

  • same scale: two images are assigned the same class only if they show the same soil sample and were taken at the same scale,
  • all scales: two images are assigned the same class if they show the same soil sample, regardless of the image scale.
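
The two labelling schemes can be illustrated with a minimal sketch. It assumes each image is described by a (sample, scale) pair. This representation, the label strings and the function names are hypothetical and do not reflect the format of the ground truth files.

```python
from itertools import product

SAMPLES = ["A", "B", "C", "D"]
SCALES = ["close", "middle", "far"]


def class_label(sample: str, scale: str, scheme: str) -> str:
    """Return the class label of an image under the given scheme."""
    if scheme == "same_scale":
        # Same sample at a different scale counts as a different class.
        return f"{sample}_{scale}"
    if scheme == "all_scales":
        # The scale is ignored; only the soil sample defines the class.
        return sample
    raise ValueError(f"unknown scheme: {scheme}")


def is_match(image_a: tuple, image_b: tuple, scheme: str) -> bool:
    """Check whether two (sample, scale) pairs fall into the same class."""
    return class_label(*image_a, scheme) == class_label(*image_b, scheme)


if __name__ == "__main__":
    for scheme in ("same_scale", "all_scales"):
        labels = {class_label(s, sc, scheme) for s, sc in product(SAMPLES, SCALES)}
        print(scheme, len(labels), "classes")
```

Under these assumptions the same scale scheme yields 12 classes (4 samples at 3 scales), while the all scales scheme yields 4 classes.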


Examples of the soil samples A to D at the far scale (25 x 25 cm^2):

[Images: Soil A, Soil B, Soil C, Soil D]



We provide:

  • Soil aggregate size distribution dataset, collected in 2018 in Lincoln (4 samples at 3 different scales in 2 configurations each, taken by camera and phone),
  • classification ground truth for the same scale and across all scales,
  • regression ground truth in terms of sample weight and volume distribution.


If you use this data, please cite the following:

Petra Bosilj, Iain Gould, Tom Duckett, and Grzegorz Cielniak: “Pattern Spectra from Different Component Trees for Estimating Soil Size Distribution”, Mathematical Morphology and its Applications to Signal and Image Processing (proceedings of International Symposium on Mathematical Morphology) (2019)
    author = {Bosilj, Petra and Gould, Iain and Duckett, Tom and Cielniak, Grzegorz},
    title = {Pattern Spectra from Different Component Trees for Estimating Soil Size Distribution},
    booktitle = {Mathematical Morphology and its Applications to Signal and Image Processing (proceedings of International Symposium on Mathematical Morphology)},
    year = 2019



Soil aggregate size distribution 2018

*If you would like more information about the dataset, or are thinking of using it, please consider sending us an email (Petra Bosilj) telling us a little more about your ideas.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Copyright (c) 2018 Petra Bosilj, Iain Gould, Grzegorz Cielniak, and Tom Duckett.