Abstract: To address the computational burden of processing massive remote sensing images, a method based on Apache Spark is proposed and implemented for retrieving MODIS Sea Surface Temperature (SST), with improvements to image acquisition, the retrieval algorithm, and the computing process. To make image acquisition more efficient, four rounds of network requests are used to fetch only the data for a user-defined time range and region. To make the algorithm parallelizable, the split-window algorithm is modified by reducing its parameters and simplifying its intermediate models, which adapts it to fast parallel computation. By relying on narrow dependencies between Resilient Distributed Datasets (RDDs), delays caused by interaction between partitions are avoided. In a comparison of single-machine mode and cluster mode, the cluster mode running Apache Spark was ten times as efficient as the single-machine mode. The study shows that retrieving MODIS SST with cluster computing techniques is more efficient than retrieving it on a single machine.
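The role of narrow dependencies can be illustrated with a minimal sketch in plain Python, without Spark. It assumes a generic linear split-window form, SST = a0 + a1*T11 + a2*(T11 - T12), where T11 and T12 stand for brightness temperatures in two thermal infrared bands. The coefficients `A0`, `A1`, `A2` and the helper names `split_window_sst`, `make_partitions`, and `map_partition` are hypothetical placeholders chosen for illustration; they are not the values or the code used in the study.

```python
from typing import List, Tuple

# Hypothetical placeholder coefficients, NOT the values used in the study.
A0, A1, A2 = 1.0, 1.0, 2.0

# One pixel: (T11, T12) brightness temperatures in kelvin (assumed layout).
Pixel = Tuple[float, float]


def split_window_sst(t11: float, t12: float) -> float:
    """Assumed generic linear split-window form, evaluated for one pixel."""
    return A0 + A1 * t11 + A2 * (t11 - t12)


def make_partitions(pixels: List[Pixel], n: int) -> List[List[Pixel]]:
    """Split the pixel list into at most n contiguous chunks."""
    size = max(1, -(-len(pixels) // n))  # ceiling division
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def map_partition(part: List[Pixel]) -> List[float]:
    """Each output chunk depends on exactly one input chunk.

    No data is exchanged between chunks, which is the property that a
    narrow dependency expresses.
    """
    return [split_window_sst(t11, t12) for t11, t12 in part]


pixels = [(290.0, 289.0), (295.0, 293.5), (300.0, 298.0), (285.0, 284.5)]
partitions = make_partitions(pixels, 2)

# Chunks can be processed in any order, or concurrently on different workers.
sst_by_partition = [map_partition(p) for p in partitions]
sst = [value for part in sst_by_partition for value in part]
```

In Spark, the analogous per-partition step would be a narrow transformation such as `map` or `mapPartitions` on an RDD, which requires no shuffle between partitions. Because the calculation is pixel-wise, the partitioned result equals the result of processing all pixels serially.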