World Climate and Elevation Database

This database creation project involves the use of SQL aggregation queries to populate a polygon database having all major world cities.

Dataset Overview

Along with the location data of major cities, data was provided for every region in the world, showing the recorded temperature, precipitation, elevation, climate, sunshine, lithology, geology, and pedology.

The final data format was to can be best described using table 1 below.

ColumnsSourceDescription
idWorld Cities dataid of this data
nameWorld Cities dataname of this data
climateclimateextract value according to legend
situationDEMcalculate the rate of cities and classify them according to <<Plain: 0-200m <<Colline: 200-800m <<Montagne: >800m
sunshineDiurnal rangecal. the average radiation of polygons
elevation_minDEMcal. the minimum elevation of polygons
elevation _moyDEMcal. the average elevation of polygons
elevation_maxDEMcal. the maximum elevation of polygons
precipitation_minRainfallcal. the minimum rainfall of polygons
precipetation_moyRainfallcal. the average rainfall of polygons
precipitation_maxRainfallcal. the maximum rainfall of polygons
temperature_minTemperature mincal. the minimum temperature of polygons
temperature_moyTemperature moycal. the average temperature of polygons
temperature_maxTemperature maxcal. the maximum temperature of polygons
geologyGeologyextract a list of the 3 biggest geology, separated by a comma, ordered by size.
pedologyPedologyextract a list of the 3 biggest pedology, separated by a comma, ordered by size.
lithologyLithologyextract a list of the 3 biggest lithologies, separated by a comma, ordered by size.

Data Cleaning

The presence of invalid entries in the world cities database will cause any written query to crash. First, check for invalid entries;

/*To check for VALID and INVALID entries in polygons_wld*/
SELECT ST_IsValid(geom)
FROM polygons_wld

Next, eliminate all invalid entries;

/*To DELETE invalid entries in polygon_wld */
DELETE FROM polygons_wld
WHERE ST_IsValid(geom) = false

Data Analysis

To limit the run time of the final analysis, I broke down the code into smaller bits. This was done by creating multiple sub-tables.

The temperature columns have three different data sources for each temperature range. Multiple temperature values were recorded for the minimum, average, and maximum temperatures, over some time. For this analysis, the average of each temperature range, for each city was used in the analysis.

 /*To create a table showing the TEMPERATURE COLUMNS*/
CREATE TABLE temperature
AS
SELECT    za.id,
          za.nom name,
          AVG(temp_min.dn) Min_temperature,
          AVG(temp_moy.degres) Moy_temperature,
          AVG(temp_max.dn) Max_temperatures,
          za.geom
  FROM polygons_wld za
LEFT JOIN temp_minimales temp_min
  ON ST_Within(za.geom, temp_min.geom)
LEFT JOIN temp_moyennes temp_moy
  ON ST_Within(za.geom, temp_moy.geom)
LEFT JOIN temp_maximales temp_max
  ON ST_Within(za.geom, temp_max.geom)
GROUP BY 1,2, za.geom

The table of elevation columns is gotten from a single table. Rather than showing the elevation values, the case function was used to create names for each elevation ranges.

/*To create a table showing the ELEVATION COLUMNS*/
CREATE TABLE elevation
AS
SELECT  za.id id, za.nom name,
        MIN(dem.zlevel) elevation_min,
        AVG(dem.zlevel) elevation_moy,
        MAX(dem.zlevel) elevation_max,
        (CASE  WHEN AVG(dem.zlevel) <= 200 THEN 'Plaine'
              WHEN AVG(dem.zlevel) > 200 AND AVG(dem.zlevel) <= 800 THEN 'Colline'
              ELSE 'Montagne' END) AS situation,
        za.geom
  FROM polygons_wld za
LEFT JOIN "merge 8" dem
  ON ST_Disjoint(za.geom, dem.geom)
GROUP BY 1,2, za.geom

The rainfall amount in mm was aggregated to obtain the minimum, average, and maximum rainfall values, displayed in separate columns.

/* To create a table showing the PRECIPITATION COLUMNS */
CREATE TABLE Precipitation
AS
SELECT  za.id,
        za.nom name,
        MIN(rainfall.mm) precipetation_min,
        AVG(rainfall.mm) precipitation_moy,
        MAX(rainfall.mm) precipitation_max,
        za.geom
  FROM polygons_wld za
LEFT JOIN precipitation_moyennes rainfall
  ON ST_Within(za.geom, rainfall.geom)
GROUP BY 1,2, za.geom

The next table created held the average sunshine and weather values from their respective datasets.

/* TO create a column showing the WEATHER AND CLIMATE COLUMNS */
CREATE TABLE weatherclimate
AS
SELECT  za.id id,
        za.nom name,
        AVG(diurnal.dn) sunshine,
        AVG(weather.zlevel) climat,
        za.geom
  FROM polygons_wld za
LEFT JOIN diurnal_range diurnal
  ON ST_Within(za.geom, diurnal.geom)
LEFT JOIN weather
  ON ST_Within(za.geom, weather.geom)
GROUP BY 1,2, za.geom

The pedology column is required to show the three highest pedology in a city, ordered by square km.

First, a table is created to rank each pedology value by country and by sqkm

/*TO create a rank of the TOP THREE PEDOLOGY by Square KM and COUNTRY */
CREATE TABLE pedologyrank
AS
WITH M2 AS(
      SELECT name_fr, sqkm, country, geom,
              RANK() OVER (PARTITION BY country ORDER BY sqkm DESC) AS rnk
      FROM public.pedology
      ORDER BY country)
SELECT name_fr, sqkm, COUNTRY, rnk,geom
  FROM M2
WHERE rnk=1 OR rnk=2 OR rnk=3

Using the ranks created above, the top three values are concatenated into a single column.

/* TO create a table from the PEDOLOGY RANKING ABOVE */
CREATE TABLE pedologycolumn
AS
WITH
M1 AS   (SELECT name_fr, country, geom FROM pedologyrank WHERE rnk=1),
M2 AS   (SELECT name_fr, country FROM pedologyrank WHERE rnk=2),
M3 AS   (SELECT name_fr, country FROM pedologyrank WHERE rnk=3)
SELECT CONCAT(M1.name_fr, ', ', M2.name_fr, ', ', M3.name_fr)
        AS pedology, M1.geom
  FROM M1
LEFT JOIN M2
  ON M1.country = M2.country
LEFT JOIN M3
  ON M1.country = M3.country

Like the pedology column, the geology column is required to show the three highest pedology in a city, ordered by square km.

First, a table is created to rank each geology value by country and by perimeter

/*TO create a rank of the TOP THREE GEOLOGIES by perimeter and COUNTRY*/
CREATE TABLE geologyrank
AS
WITH geotab AS(
      SELECT name_fr, perimeter, country, geom,
              RANK() OVER (PARTITION BY country ORDER BY perimeter DESC) AS rnk
      FROM public.geology
      ORDER BY country)
SELECT name_fr, perimeter, COUNTRY, rnk, geom
  FROM geotab
WHERE rnk=1 OR rnk=2 OR rnk=3

Using the ranks created above, the top three values are concatenated into a single column.

/* TO create a table from the GEOLOGY RANKING ABOVE */
CREATE TABLE geologycolumn
AS
WITH
M1 AS   (SELECT name_fr, country, geom FROM geologyrank WHERE rnk=1),
M2 AS   (SELECT name_fr, country FROM geologyrank WHERE rnk=2),
M3 AS   (SELECT name_fr, country FROM geologyrank WHERE rnk=3)
SELECT CONCAT(M1.name_fr, ', ', M2.name_fr, ', ', M3.name_fr)
        AS geology, M1.geom
  FROM M1
LEFT JOIN M2
  ON M1.country = M2.country
LEFT JOIN M3
  ON M1.country = M3.country

In the end, a final table was formed by selecting all relevant columns and using join clauses.

/* A final query to compile all entries for the database */
SELECT  za.id id,
        za.nom name
        climat, situation, sunshine,
        elevation_min, elevation_moy, elevation_max,
        precipetation_min, precipitation_moy, precipitation_max,
        Min_temperature, Moy_temperature, Max_temperatures,
        pedology, za.geom

FROM polygons_wld za
LEFT JOIN temperature
ON ST_Within(za.geom, temperature.geom)
LEFT JOIN elevation
ON ST_Within(za.geom, elevation.geom)
LEFT JOIN situationtab
ON ST_Within(za.geom, situationtab.geom)
LEFT JOIN Precipitation
ON ST_Within(za.geom, Precipitation.geom)
LEFT JOIN weatherclimate
ON ST_Within(za.geom, weatherclimate.geom)
LEFT JOIN pedologycolumn
ON ST_Within(za.geom, pedologycolumn.geom)