Published on 26.10.2022

On the rising importance of SQL for geospatial data experts

The growth of geospatial data is astounding. What language do data analysts and scientists use to programmatically examine the growing volumes of data? SQL is becoming the standard language for analyzing big data. 

As a vital data input for all decision-making processes, geospatial data is an integral component of any big data platform. For those who work with geospatial data in general, spatial SQL and SQL in general are crucial.

SQL is gaining importance and organizations don’t want to fall behind. Millions of specialists use SQL more and more frequently as part of various cloud-based big data platforms. These platforms are expanding their support for spatial SQL.

SQL
The amount of geospatial data is on the rise. Image: Santtu Pyykkönen; Data: USGS.

PostGIS, the North Star of spatial SQL

Here in Gispo we assist our clients in building geospatial solutions with modern open source geospatial software. We are also fully aware that spatial SQL lies at the very heart of  geospatial solutions. The primary tool we use for practically all consulting projects is PostGIS, which is the driving force behind Gispo’s development.

PostGIS, the world’s most popular geospatial database engine, enables developers and analysts to use, update, and analyze geographical data using SQL. It enables the geospatial enterprise-ready OLAP-databases (Online Analytical Processing) for quick, predetermined data queries and the OLTP-type databases (Online Transaction Processing) for transactional geospatial data processes. 

Besides being the number one favorite tool for developers and analysts, PostGIS is also the reference project that guides the product development for spatial analytics (at least partially) for some of the cloud-based big data platform providers, such as BigQuery (Google) or Snowflake, as well as Presto. 

Why has SQL attracted attention?

So, PostGIS is the reference for Spatial SQL, but why SQL came to be the lingua franca for different big data platforms? There are a few key factors that have contributed to SQL becoming the preferred data programming language :

  • Ease-of-use:
    • Domain specialists and business analysts can use SQL to effectively extract insights from big data to address business issues.
  • Reproducibility: 
    • By using SQL you use code for your analysis. You can quickly change and rerun your code. 
  • Standard-based:
    • Easy to deploy processing over different SQL-engines. You use almost the same syntax in different platforms.  
  • Common language in-between silos: 
    • Data engineers can effectively communicate with data analysts by using SQL. 
  • No need for separate runtime – process the data where the data lives: 
    • SQL does not require any separate engine for running the queries. 

The importance of Spatial SQL for enterprise GIS 

A GIS analyst is trained to work locally on their computer using the graphical user interface of their favorite desktop GIS. Databases and SQL are also hugely useful on localhost and on one’s computer, but the significant benefits come from working in an enterprise environment.

Spatial data curation with QGIS and PostGIS

When discussing the creation, modification, and construction of data with spatial databases (the “OLTP” use case), one example is the usage of PostGIS as a backend for storing data in a QGIS-based data generation process. QGIS works as a map-based frontend for the users for creating and editing data. An example would be a QGIS-based solution we did for the City of Tampere for coordinating the logistics around urban soil management. The city officials need to know where different soil types are stored and generated (e.g. construction sites) in the city and where they need to move them next. 

SQL
A relational database model-based workflow enables for example automation and multi-user data processes as well as facilitates data integrity and data analysis.  

SQL
At the same time user “just” sees the map and some easy-to-use QGIS forms (in Finnish, sorry!)

The spatial analytics use case

For data analysts and specially for data scientists SQL- and other code-based processes are part of their daily routines, whereas for traditionally trained GIS experts, it’s all about using the graphical interface for any analysis, right? It shouldn’t be like this forever, right?

Spatial SQL becomes important as data volumes increase and different cloud-based big data choices (including the all-powerful PostGIS) incorporate spatial analytics features more widely. The databases embedded and backing up the enterprise GIS systems should be accessible applications clients that support SQL. For instance, automated reporting to assist wise decision-making will be made easier as a result.

SQL
Workflows in spatial analytics are often based on SQL. Image: Santtu Pyykkönen; Data: Mapbox Movement (sample data).  

To give an example (the “OLAP” use case), in a recent project here at Gispo we built a PostGIS-backed reporting tool for municipal waste management. The data was stored in a RDMS (i.e. PostgreSQL) and was accessible for different client applications such as QGIS and PowerBI. While QGIS-users appreciated the map-based exploration of the data, the PowerBI dashboard sought to enhance decision-making with simple-to-read graphs and tables based on automated table views (i.e. materialized views) aggregated from dozens of tables with row counts over 5 figures.

Spatial SQL as an investment for the future

As said, at Gispo, we help our customers in almost all the projects by using PostGIS as the spatial database engine of choice. With some customers, we build PostGIS-based data workflows where users create data from QGIS and with some customers we build data analysis data pipelines for driving their business values and organizational goals by using geographical data. This is how we see spatial SQL as a key feature of our customer solutions.  

Profiilikuva

Santtu Pyykkönen

Santtu Pyykkönen is MSA who is interested in GIS (and all its glory), open source software and open communities and urban development. Freetime activities consist of running and reading.