After improving the raster classification script, in this entry, is the turn for the PNG creation script.
As usual, all the code is at GitHub
The improvement is really big and, sometimes, the old version will crash because not enough memory will be available.
As usual, all the code is at GitHub
Problems in the original script
- The PIL python module was used to draw the PNG image pixel by pixel. This can be changed using the SciPy function scipy.misc.imsave. Using it, all the iterations, comparisons, etc. are done at a lower level, so the script is cleaner and much faster.
- GDAL blocks weren't used, so all the image was loaded into a huge array if the original raster was big. Using blocks as in the last entry example, performance improves a lot.
- All the values in the color file were evaluated. If the minimum and maximum values of the raster are known, many of these comparisons can be avoided, so the performance there is also improved.
The new script
The new script works exactly the same way as the old one at the user level.
''' Script to generate a png from a raster file. It can be used from the command line or from an other python script ''' from os.path import exists from osgeo import gdal from osgeo.gdalconst import GA_ReadOnly from sys import argv from sys import exit from numpy import logical_and from numpy import zeros from numpy import uint8 import scipy ''' Main function. Requires the input file, the color file and the otuput file name. The raster band is 1 by default. If nothing is asked, the discrete colroscale image is created. ''' def raster2png(raster_file, color_file, out_file_name, raster_band=1, discrete=True): if exists(raster_file) is False: raise Exception('[Errno 2] No such file or directory: \'' + raster_file + '\'') dataset = gdal.Open(raster_file, GA_ReadOnly ) if dataset == None: raise Exception("Unable to read the data file") band = dataset.GetRasterBand(raster_band) block_sizes = band.GetBlockSize() x_block_size = block_sizes[0] y_block_size = block_sizes[1] xsize = band.XSize ysize = band.YSize max_value = band.GetMaximum() min_value = band.GetMinimum() if max_value == None or min_value == None: stats = band.GetStatistics(0, 1) max_value = stats[1] min_value = stats[0] #Reading the color table color_table = readColorTable(color_file) #Adding an extra value to avoid problems with the last & first entry if sorted(color_table.keys())[0] > min_value: color_table[min_value - 1] = color_table[sorted(color_table.keys())[0]] if sorted(color_table.keys())[-1] < max_value: color_table[max_value + 1] = color_table[sorted(color_table.keys())[-1]] #Preparing the color table and the output file classification_values = color_table.keys() classification_values.sort() rgb = zeros((ysize, xsize, 4), dtype = uint8) for i in range(0, ysize, y_block_size): if i + y_block_size < ysize: rows = y_block_size else: rows = ysize - i for j in range(0, xsize, x_block_size): if j + x_block_size < xsize: cols = x_block_size else: cols = xsize - j values = band.ReadAsArray(j, i, cols, rows) r = zeros((rows, cols), dtype = uint8) g = zeros((rows, cols), dtype = uint8) b = zeros((rows, cols), dtype = uint8) a = zeros((rows, cols), dtype = uint8) for k in range(len(classification_values) - 1): #print classification_values[k] if classification_values[k] < max_value and (classification_values[k + 1] > min_value ): mask = logical_and(values >= classification_values[k], values < classification_values[k + 1]) if discrete == True: r = r + color_table[classification_values[k]][0] * mask g = g + color_table[classification_values[k]][1] * mask b = b + color_table[classification_values[k]][2] * mask a = a + color_table[classification_values[k]][3] * mask else: v0 = float(classification_values[k]) v1 = float(classification_values[k + 1]) r = r + mask * (color_table[classification_values[k]][0] + (values - v0)*(color_table[classification_values[k + 1]][0] - color_table[classification_values[k]][0])/(v1-v0) ) g = g + mask * (color_table[classification_values[k]][1] + (values - v0)*(color_table[classification_values[k + 1]][1] - color_table[classification_values[k]][1])/(v1-v0) ) b = b + mask * (color_table[classification_values[k]][2] + (values - v0)*(color_table[classification_values[k + 1]][2] - color_table[classification_values[k]][2])/(v1-v0) ) a = a + mask * (color_table[classification_values[k]][3] + (values - v0)*(color_table[classification_values[k + 1]][3] - color_table[classification_values[k]][3])/(v1-v0) ) rgb[i:i+rows,j:j+cols, 0] = r rgb[i:i+rows,j:j+cols, 1] = g rgb[i:i+rows,j:j+cols, 2] = b rgb[i:i+rows,j:j+cols, 3] = a scipy.misc.imsave(out_file_name, rgb) def readColorTable(color_file): ''' The method for reading the color file. * If alpha is not defined, a 255 value is set (no transparency). ''' color_table = {} if exists(color_file) is False: raise Exception("Color file " + color_file + " does not exist") fp = open(color_file, "r") for line in fp: if line.find('#') == -1 and line.find('/') == -1: entry = line.split() if len(entry) == 5: alpha = int(entry[4]) else: alpha=255 color_table[eval(entry[0])]=[int(entry[1]),int(entry[2]),int(entry[3]),alpha] fp.close() return color_table ''' Usage explanation ''' def Usage(): print "Not enough arguments." print "Usage:" print argv[0] + ' [-exact_color_entry] [-band=1] data_file color_file output_file' exit() ''' Program Mainline ''' if __name__ == "__main__": file_name = None colorfile_name = None out_file_name = None discrete = False band = 1 i = 1 while i < len(argv): arg = argv[i] if arg == '-exact_color_entry': discrete = True elif arg == '-band': band = argv[i+1] i = i + 1 elif file_name is None: file_name = arg file_name = file_name.replace("'","") file_name = file_name.replace('"','') elif colorfile_name is None: colorfile_name = arg colorfile_name = colorfile_name.replace("'","") colorfile_name = colorfile_name.replace('"','') elif out_file_name is None: out_file_name = arg out_file_name = out_file_name.replace("'","") out_file_name = out_file_name.replace('"','') i = i + 1 if len(argv) == 1 or file_name == None or colorfile_name == None or out_file_name == None: Usage() ''' try: raster2png(file_name,colorfile_name,out_file_name,band,discrete) except Exception, ex: print "Error: " + str(ex) ''' raster2png(file_name,colorfile_name,out_file_name,band,discrete)
- The part of the code that reads the input (lines 130 - 182) hasn't changed, like the function readColorTable, that reads the color file so it can be processed. Only the main method raster2png has changed.
- Lines 26 to 45 read the raster file metadata, including the extreme values in the file.
- Then, the colorfile is read, and extra values are added if the raster file has values above the maximum value at the color file or below the minimum (lines 47-57). If this isn't done, some strange nodata zones will appear.
- At line 60, an array is created, where the four bands (red, green, blue and alpha) of the output png will be stored.
- Then, every block of data is evaluated (after line 62). Empty arrays are created for each band, and for every value in the colorfile, the affected pixels are chosen at line 84.
- Depending on the user choice, the colour is assigned either directly from the colorfile value or interpolating from the two most close colors. (lines 85-97) Line 91 and 92 have a float cast, very important to get the right value.
- Lines 99-103 assign the block to the right output array region (see how a region of a NumPy array is chosen)
- Line 104 creates the png file.
Benchmarks
To check the results, I have used the same raster used at the original post, and for bigger stuff, the rasters I downloaded for the last post (read it for the big file creation).
Old script
File | with -exact_color_entry | Continuous colors |
---|---|---|
w001001.adf (1126x576px) | 9.363s | 13.329s |
srtm_54_05.tif (6001x6001px) | 9m19s | 12m44s |
big.tif (12001x12001px) | Error | Error |
New script
File | with -exact_color_entry | Continuous colors |
---|---|---|
w001001.adf (1126x576px) | 0.822s | 1.653s |
srtm_54_05.tif (6001x6001px) | 22.503s | 1m07s |
big.tif (12001x12001px) | 1m46s | 4m58s |
What's next
I'm not satisfied yet with the way the image is saved using SciPy, because all the image must be in a single array, instead of working by blocks. Maybe there's an alternative to do this more efficiently. With really big images, the script will fail even using the new script version.