Using machine learning models such as Random Forest on GEE

This time, I will show how to train machine learning models and make predictions with them on Google Earth Engine (GEE).

On GEE, you can use various machine learning models such as Decision Trees, Random Forest, Boosted Trees, k-Nearest Neighbors, SVM, and Naive Bayes classifiers.

While the specific parameters for machine learning models may vary, the overall procedure remains the same.

In this example, I will use Random Forest, but you can run the same process with other models by replacing the line that defines the Random Forest classifier. Feel free to experiment with it.

If you only want to see the source code, skip to the Source code section.

Procedure
Source code
Usage data

Procedure

The general procedure is: prepare the ground truth data, format the data, define and train the model, make predictions, and validate the accuracy.

Ground Truth Data Preparation

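First, prepare the input image. In this case, we will use the Landsat 8 32-day NDVI composite dataset from the Google Earth Engine catalog and take the per-pixel median NDVI over 2018. (Landsat Collection 1 datasets like this one have since been deprecated in the catalog; if the ID no longer loads, switch to its Collection 2 counterpart.)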

Here’s the code:

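// Landsat 8 32-day NDVI composites for 2018
var dataset = ee.ImageCollection('LANDSAT/LC08/C01/T1_32DAY_NDVI').filterDate('2018-01-01', '2018-12-31');
// per-pixel median NDVI over the year
var ndvi = dataset.select('NDVI').median();

// center the map on the study area and display NDVI (water low, vegetation high)
Map.setCenter(131.4325, 31.8442, 10);
Map.addLayer(ndvi, {min: -1, max: 1, palette: ['blue', 'white', 'green']}, 'NDVI');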
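median() reduces the collection pixel by pixel: for each pixel and band, it takes the median of that pixel's values across all composites in the date range. The plain-JavaScript sketch below illustrates this reduction. It assumes each "image" is a small hypothetical array of NDVI values and that null marks a masked pixel. This is a textbook median, so Earth Engine's reducer may differ in details such as how even-length inputs are handled.

```javascript
// Per-pixel median across a stack of images, in plain JavaScript.
// Each "image" is a flat array of NDVI values (hypothetical data);
// null marks a masked pixel (e.g. cloud) and is ignored.
function median(values) {
  const v = values.filter((x) => x !== null).sort((a, b) => a - b);
  if (v.length === 0) return null; // every image is masked at this pixel
  const mid = Math.floor(v.length / 2);
  return v.length % 2 === 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2;
}

function medianComposite(images) {
  const numPixels = images[0].length;
  const out = [];
  for (let p = 0; p < numPixels; p++) {
    // collect this pixel's value from every image, then reduce
    out.push(median(images.map((img) => img[p])));
  }
  return out;
}

// Three hypothetical composites of a 3-pixel image.
const images = [
  [0.125, 0.5, null],
  [0.375, 0.75, -0.25],
  [0.25, null, -0.5],
];
console.log(medianComposite(images)); // [ 0.25, 0.625, -0.375 ]
```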

Next, prepare the ground truth data.

You can define it in code with ee.Geometry, or draw polygons with the geometry drawing tools on the Code Editor map and import them.

If drawing polygons is tedious, just copy and paste the definitions below as is!

// sea polygon (class 0)
var sea = ee.Geometry({
  "type": "Polygon",
  "coordinates": [
    [[131.49705254467017,31.931674468692545],
     [131.47507988842017,31.847721354715404],
     [131.5039189997483,31.790542852705553],
     [131.4874395075608,31.71580743663545],
     [131.60142266185767,31.71580743663545],
     [131.60142266185767,31.81388529310435],
     [131.5808232966233,31.907196050862545],
     [131.49705254467017,31.931674468692545]]
  ]
});

// land polygon (class 1)
var land = ee.Geometry({
  "type": "Polygon",
  "coordinates": [
    [[131.16608940990454,31.949155063780058],
     [131.03425347240454,31.827887926453258],
     [131.18668877513892,31.672573515050868],
     [131.3336309138108,31.74384028743355],
     [131.38993584545142,31.816219212793197],
     [131.33500420482642,31.916521930540263],
     [131.24436699779517,31.971292377027723],
     [131.16608940990454,31.949155063780058]]
  ]
});

Data formatting

To train a classifier in GEE, the training data must be a FeatureCollection.

So first, wrap the polygons in a FeatureCollection, as shown below.

var polygons = ee.FeatureCollection([
  ee.Feature(sea, {"class": 0}),
  ee.Feature(land, {"class": 1}),
]);

The class property holds the label you want to predict: 0 for sea and 1 for land.

This time, we will classify pixels into these two classes.

Next, use sampleRegions to extract the image pixels that fall inside the polygons.

var training = ndvi.sampleRegions({
  collection: polygons,
  properties: ['class'],
  scale: 100
});

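sampleRegions converts each image pixel that falls inside the given regions into a feature. Each feature gets one property per image band (here, NDVI) plus the properties listed in properties (here, class), so the result is a FeatureCollection of labeled samples.

The scale argument sets the sampling resolution in meters; here it is 100 m.

A smaller scale yields more samples, but be careful: too many samples can cause memory or computation errors.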
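Since sampleRegions returns roughly one sample per pixel at the chosen scale, the sample count grows with the inverse square of the scale. Here is a quick back-of-the-envelope sketch in plain JavaScript. It assumes a hypothetical 200 km² of total polygon area, which was not measured from the polygons above.

```javascript
// Rough estimate of how many samples sampleRegions returns:
// about one sample per pixel, so count ≈ area / (scale * scale).
// The 200 km² area used below is a hypothetical value.
function estimateSampleCount(areaKm2, scaleMeters) {
  const areaM2 = areaKm2 * 1e6; // km² -> m²
  return Math.round(areaM2 / (scaleMeters * scaleMeters));
}

for (const scale of [100, 50, 30]) {
  console.log(`scale ${scale} m -> about ${estimateSampleCount(200, scale)} samples`);
}
```

Halving the scale from 100 m to 50 m quadruples the number of samples, which is why small scales quickly run into memory limits.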

When using a machine learning model, you usually shuffle the data and split it into training data and test data.

In GEE, the randomColumn function adds a column named random to each sample, holding a uniform random value between 0 and 1. You then filter on that column to separate the training data from the test data. randomColumn also takes an optional seed (0 by default), so the split is reproducible.

In code, it looks like this:

var sample = training.randomColumn();

// create train data
var train_data = sample.filter(ee.Filter.lt('random', 0.8));
// create test data
var test_data = sample.filter(ee.Filter.gte('random', 0.8));

Filtering on the random column splits the samples into roughly 80% training data and 20% test data.
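To make the mechanics concrete, here is a plain-JavaScript sketch of what randomColumn plus the lt/gte filters do. Samples are plain objects standing in for features, which simplifies things. Also, Math.random is unseeded, unlike GEE's seeded randomColumn.

```javascript
// Plain-JavaScript sketch of randomColumn() + ee.Filter.lt / ee.Filter.gte.
function addRandomColumn(samples, rng = Math.random) {
  // attach a uniform value in [0, 1) under the name "random"
  return samples.map((s) => ({ ...s, random: rng() }));
}

function splitByRandom(samples, threshold = 0.8) {
  const train = samples.filter((s) => s.random < threshold); // like ee.Filter.lt
  const test = samples.filter((s) => s.random >= threshold); // like ee.Filter.gte
  return { train, test };
}

// Hypothetical samples: 1000 points alternating between sea (0) and land (1).
const samples = Array.from({ length: 1000 }, (_, i) => ({
  NDVI: i % 2 === 0 ? -0.1 : 0.6,
  class: i % 2,
}));
const { train: trainSet, test: testSet } = splitByRandom(addRandomColumn(samples));
console.log(trainSet.length, testSet.length); // roughly 800 and 200; varies per run
```

Every sample lands in exactly one of the two sets, and the 80/20 ratio holds only approximately.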

Machine learning model definition

Next, we will define and train the machine learning model.

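Machine learning models that can be used with GEE include Decision Trees (ee.Classifier.smileCart), Random Forest (ee.Classifier.smileRandomForest), Boosted Trees (ee.Classifier.smileGradientTreeBoost), k-Nearest Neighbors (ee.Classifier.smileKNN), SVM (ee.Classifier.libsvm), and Naive Bayes (ee.Classifier.smileNaiveBayes).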

This time we will use Random Forest, which is defined with ee.Classifier.smileRandomForest; the argument 10 is the number of trees. Once the model is defined, call its train function to train it.

The train function takes the training data (features), the name of the property holding the correct label (classProperty, here class), and the names of the properties used as inputs (inputProperties, here the image's band names).

Here, the train_data variable is passed as the features argument.

// Random Forest with 10 trees
var classifier = ee.Classifier.smileRandomForest(10);

// use the image's bands (here, just NDVI) as input features
var bands = ndvi.bandNames();
// train the Random Forest
var trained_rdf = classifier.train({
  features: train_data,
  classProperty: 'class',
  inputProperties: bands
});
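Conceptually, smileRandomForest(10) builds 10 decision trees, each trained on a bootstrap sample of the training data. In classification mode, the forest predicts by majority vote among its trees. The toy sketch below shows only that voting step in plain JavaScript. The three threshold "trees" are hypothetical stand-ins, not what GEE actually learns.

```javascript
// Toy illustration of random forest voting: each tree votes for a class
// and the most common vote wins. The trees are hypothetical NDVI stumps.
const trees = [
  (x) => (x.NDVI > 0.2 ? 1 : 0),
  (x) => (x.NDVI > 0.1 ? 1 : 0),
  (x) => (x.NDVI > 0.35 ? 1 : 0),
];

function forestPredict(trees, sample) {
  // count votes per predicted label
  const votes = new Map();
  for (const tree of trees) {
    const label = tree(sample);
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  // return the label with the most votes
  let best = null;
  let bestCount = -1;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

console.log(forestPredict(trees, { NDVI: 0.25 })); // 1 (two of three trees vote land)
```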

Make predictions after training

To make predictions, call the classify function on a FeatureCollection.

Pass the trained model to classify. It returns the collection with a classification property, holding the predicted label, added to each feature.

// predict test data
var test_result = test_data.classify(trained_rdf);
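This plain-JavaScript sketch mimics what classify does to a FeatureCollection. Each feature keeps its original properties and gains the prediction under "classification", which is the default output name. The predict function is a hypothetical threshold rule standing in for the trained classifier.

```javascript
// Sketch of FeatureCollection.classify(): copy each feature and add the
// predicted label under outputName ("classification" by default).
function classifyFeatures(features, predict, outputName = 'classification') {
  return features.map((f) => ({ ...f, [outputName]: predict(f) }));
}

// Hypothetical test features and a stand-in for the trained model.
const testData = [
  { NDVI: -0.05, class: 0 },
  { NDVI: 0.62, class: 1 },
];
const predict = (f) => (f.NDVI > 0.2 ? 1 : 0);

console.log(classifyFeatures(testData, predict));
```

Keeping the original class property next to classification is what lets errorMatrix compare the two in the next step.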

Accuracy verification

Finally, we will verify the accuracy.

Common accuracy metrics include overall accuracy and the kappa coefficient.

Both are computed from a confusion matrix, so first build a confusion matrix for the test data.

To get the confusion matrix, call errorMatrix on the result of classify, passing the property holding the actual label (class) and the property holding the predicted label (classification).

// get the error (confusion) matrix
var test_confusion_matrix = test_result.errorMatrix('class', 'classification');
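errorMatrix tallies each test sample by its actual label (row) and predicted label (column). Here is a plain-JavaScript sketch of that tally, using hypothetical classified samples shaped like the features classify returns.

```javascript
// Build a confusion matrix with the same layout as errorMatrix:
// rows are actual labels, columns are predicted labels.
function confusionMatrix(features, actualProp, predictedProp, numClasses) {
  const m = Array.from({ length: numClasses }, () => new Array(numClasses).fill(0));
  for (const f of features) {
    m[f[actualProp]][f[predictedProp]] += 1;
  }
  return m;
}

// Hypothetical classified test samples.
const results = [
  { class: 0, classification: 0 },
  { class: 0, classification: 0 },
  { class: 0, classification: 1 },
  { class: 1, classification: 1 },
  { class: 1, classification: 1 },
];
console.log(confusionMatrix(results, 'class', 'classification', 2)); // [ [ 2, 1 ], [ 0, 2 ] ]
```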

You can then calculate each metric on GEE by calling the accuracy, kappa, and fscore functions on the confusion matrix, as shown below.

// view the matrix
print(test_confusion_matrix);
// overall accuracy
print(test_confusion_matrix.accuracy());

// kappa coefficient
print(test_confusion_matrix.kappa());

// per-class F1 score
print(test_confusion_matrix.fscore());
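For reference, here is a plain-JavaScript sketch of how these metrics are defined, computed on a hypothetical 2x2 matrix (rows = actual, columns = predicted). Overall accuracy is the share of samples on the diagonal. Kappa compares that observed agreement with the agreement expected by chance from the row and column totals. The F1 score (what fscore() returns per class, with its default beta of 1) is the harmonic mean of precision and recall. These are textbook definitions, not Earth Engine's implementation.

```javascript
// Metrics from a confusion matrix m (rows = actual, columns = predicted).
function accuracy(m) {
  let correct = 0;
  let total = 0;
  for (let i = 0; i < m.length; i++) {
    for (let j = 0; j < m.length; j++) {
      total += m[i][j];
      if (i === j) correct += m[i][j]; // diagonal = correct predictions
    }
  }
  return correct / total;
}

function kappa(m) {
  const n = m.length;
  let total = 0;
  const rowSums = new Array(n).fill(0);
  const colSums = new Array(n).fill(0);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      total += m[i][j];
      rowSums[i] += m[i][j];
      colSums[j] += m[i][j];
    }
  }
  const po = accuracy(m); // observed agreement
  let pe = 0; // agreement expected by chance
  for (let i = 0; i < n; i++) pe += (rowSums[i] * colSums[i]) / (total * total);
  return (po - pe) / (1 - pe);
}

// Per-class F1 = 2 * precision * recall / (precision + recall).
function f1Scores(m) {
  return m.map((row, i) => {
    const tp = m[i][i];
    if (tp === 0) return 0; // avoid 0/0 when a class is never predicted correctly
    const actual = row.reduce((a, b) => a + b, 0); // row sum
    const predicted = m.reduce((a, r) => a + r[i], 0); // column sum
    const precision = tp / predicted;
    const recall = tp / actual;
    return (2 * precision * recall) / (precision + recall);
  });
}

const m = [[2, 1], [0, 2]]; // hypothetical matrix
console.log(accuracy(m)); // 0.8
console.log(kappa(m).toFixed(3)); // 0.615
console.log(f1Scores(m).map((v) => v.toFixed(2))); // [ '0.80', '0.80' ]
```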

The Console now shows the metrics for the test data.

The accuracy this time should be around 99%, since sea and land have very different NDVI values.

Source code

Finally, here is the entire source code.

The polygons are also defined in the source code, so it should work just by copying and pasting it into the Code Editor!

var dataset = ee.ImageCollection('LANDSAT/LC08/C01/T1_32DAY_NDVI').filterDate('2018-01-01', '2018-12-31');
var ndvi = dataset.select('NDVI').median();

Map.setCenter(131.4325, 31.8442, 10);
Map.addLayer(ndvi, {min: -1, max: 1, palette: ['blue', 'white', 'green']}, 'NDVI');

var sea = ee.Geometry({
  "type": "Polygon",
  "coordinates": [
    [[131.49705254467017,31.931674468692545],
     [131.47507988842017,31.847721354715404],
     [131.5039189997483,31.790542852705553],
     [131.4874395075608,31.71580743663545],
     [131.60142266185767,31.71580743663545],
     [131.60142266185767,31.81388529310435],
     [131.5808232966233,31.907196050862545],
     [131.49705254467017,31.931674468692545]]
  ]
});

var land = ee.Geometry({
  "type": "Polygon",
  "coordinates": [
    [[131.16608940990454,31.949155063780058],
     [131.03425347240454,31.827887926453258],
     [131.18668877513892,31.672573515050868],
     [131.3336309138108,31.74384028743355],
     [131.38993584545142,31.816219212793197],
     [131.33500420482642,31.916521930540263],
     [131.24436699779517,31.971292377027723],
     [131.16608940990454,31.949155063780058]]
  ]
});

var polygons = ee.FeatureCollection([
  ee.Feature(sea, {"class": 0}),
  ee.Feature(land, {"class": 1}),
]);

var training = ndvi.sampleRegions({
  collection: polygons,
  properties: ['class'],
  scale: 100
});

var sample = training.randomColumn();

var train_data = sample.filter(ee.Filter.lt('random', 0.8));
var test_data = sample.filter(ee.Filter.gte('random', 0.8));

var classifier = ee.Classifier.smileRandomForest(10);

var bands = ndvi.bandNames();
var trained_rdf = classifier.train({
      features: train_data,
      classProperty: 'class',
      inputProperties: bands
});

var test_result = test_data.classify(trained_rdf);
var test_confusion_matrix = test_result.errorMatrix('class', 'classification');

print(test_confusion_matrix);
print(test_confusion_matrix.accuracy());
print(test_confusion_matrix.kappa());
print(test_confusion_matrix.fscore());

Usage data

Landsat 8 Collection 1 Tier 1 32-Day NDVI Composite (LANDSAT/LC08/C01/T1_32DAY_NDVI)