Table of contents
This database creation project involves the use of SQL aggregation queries to populate a polygon database having all major world cities.
Dataset Overview
Along with the location data of major cities, data was provided for every region in the world, showing the recorded temperature, precipitation, elevation, climate, sunshine, lithology, geology, and pedology.
The final data format was to can be best described using table 1 below.
Columns | Source | Description |
id | World Cities data | id of this data |
name | World Cities data | name of this data |
climate | climate | extract value according to legend |
situation | DEM | calculate the rate of cities and classify them according to <<Plain: 0-200m <<Colline: 200-800m <<Montagne: >800m |
sunshine | Diurnal range | cal. the average radiation of polygons |
elevation_min | DEM | cal. the minimum elevation of polygons |
elevation _moy | DEM | cal. the average elevation of polygons |
elevation_max | DEM | cal. the maximum elevation of polygons |
precipitation_min | Rainfall | cal. the minimum rainfall of polygons |
precipetation_moy | Rainfall | cal. the average rainfall of polygons |
precipitation_max | Rainfall | cal. the maximum rainfall of polygons |
temperature_min | Temperature min | cal. the minimum temperature of polygons |
temperature_moy | Temperature moy | cal. the average temperature of polygons |
temperature_max | Temperature max | cal. the maximum temperature of polygons |
geology | Geology | extract a list of the 3 biggest geology, separated by a comma, ordered by size. |
pedology | Pedology | extract a list of the 3 biggest pedology, separated by a comma, ordered by size. |
lithology | Lithology | extract a list of the 3 biggest lithologies, separated by a comma, ordered by size. |
Data Cleaning
The presence of invalid entries in the world cities database will cause any written query to crash. First, check for invalid entries;
/*To check for VALID and INVALID entries in polygons_wld*/
SELECT ST_IsValid(geom)
FROM polygons_wld
Next, eliminate all invalid entries;
/*To DELETE invalid entries in polygon_wld */
DELETE FROM polygons_wld
WHERE ST_IsValid(geom) = false
Data Analysis
To limit the run time of the final analysis, I broke down the code into smaller bits. This was done by creating multiple sub-tables.
The temperature columns have three different data sources for each temperature range. Multiple temperature values were recorded for the minimum, average, and maximum temperatures, over some time. For this analysis, the average of each temperature range, for each city was used in the analysis.
/*To create a table showing the TEMPERATURE COLUMNS*/
CREATE TABLE temperature
AS
SELECT za.id,
za.nom name,
AVG(temp_min.dn) Min_temperature,
AVG(temp_moy.degres) Moy_temperature,
AVG(temp_max.dn) Max_temperatures,
za.geom
FROM polygons_wld za
LEFT JOIN temp_minimales temp_min
ON ST_Within(za.geom, temp_min.geom)
LEFT JOIN temp_moyennes temp_moy
ON ST_Within(za.geom, temp_moy.geom)
LEFT JOIN temp_maximales temp_max
ON ST_Within(za.geom, temp_max.geom)
GROUP BY 1,2, za.geom
The table of elevation columns is gotten from a single table. Rather than showing the elevation values, the case function was used to create names for each elevation ranges.
/*To create a table showing the ELEVATION COLUMNS*/
CREATE TABLE elevation
AS
SELECT za.id id, za.nom name,
MIN(dem.zlevel) elevation_min,
AVG(dem.zlevel) elevation_moy,
MAX(dem.zlevel) elevation_max,
(CASE WHEN AVG(dem.zlevel) <= 200 THEN 'Plaine'
WHEN AVG(dem.zlevel) > 200 AND AVG(dem.zlevel) <= 800 THEN 'Colline'
ELSE 'Montagne' END) AS situation,
za.geom
FROM polygons_wld za
LEFT JOIN "merge 8" dem
ON ST_Disjoint(za.geom, dem.geom)
GROUP BY 1,2, za.geom
The rainfall amount in mm was aggregated to obtain the minimum, average, and maximum rainfall values, displayed in separate columns.
/* To create a table showing the PRECIPITATION COLUMNS */
CREATE TABLE Precipitation
AS
SELECT za.id,
za.nom name,
MIN(rainfall.mm) precipetation_min,
AVG(rainfall.mm) precipitation_moy,
MAX(rainfall.mm) precipitation_max,
za.geom
FROM polygons_wld za
LEFT JOIN precipitation_moyennes rainfall
ON ST_Within(za.geom, rainfall.geom)
GROUP BY 1,2, za.geom
The next table created held the average sunshine and weather values from their respective datasets.
/* TO create a column showing the WEATHER AND CLIMATE COLUMNS */
CREATE TABLE weatherclimate
AS
SELECT za.id id,
za.nom name,
AVG(diurnal.dn) sunshine,
AVG(weather.zlevel) climat,
za.geom
FROM polygons_wld za
LEFT JOIN diurnal_range diurnal
ON ST_Within(za.geom, diurnal.geom)
LEFT JOIN weather
ON ST_Within(za.geom, weather.geom)
GROUP BY 1,2, za.geom
The pedology column is required to show the three highest pedology in a city, ordered by square km.
First, a table is created to rank each pedology value by country and by sqkm
/*TO create a rank of the TOP THREE PEDOLOGY by Square KM and COUNTRY */
CREATE TABLE pedologyrank
AS
WITH M2 AS(
SELECT name_fr, sqkm, country, geom,
RANK() OVER (PARTITION BY country ORDER BY sqkm DESC) AS rnk
FROM public.pedology
ORDER BY country)
SELECT name_fr, sqkm, COUNTRY, rnk,geom
FROM M2
WHERE rnk=1 OR rnk=2 OR rnk=3
Using the ranks created above, the top three values are concatenated into a single column.
/* TO create a table from the PEDOLOGY RANKING ABOVE */
CREATE TABLE pedologycolumn
AS
WITH
M1 AS (SELECT name_fr, country, geom FROM pedologyrank WHERE rnk=1),
M2 AS (SELECT name_fr, country FROM pedologyrank WHERE rnk=2),
M3 AS (SELECT name_fr, country FROM pedologyrank WHERE rnk=3)
SELECT CONCAT(M1.name_fr, ', ', M2.name_fr, ', ', M3.name_fr)
AS pedology, M1.geom
FROM M1
LEFT JOIN M2
ON M1.country = M2.country
LEFT JOIN M3
ON M1.country = M3.country
Like the pedology column, the geology column is required to show the three highest pedology in a city, ordered by square km.
First, a table is created to rank each geology value by country and by perimeter
/*TO create a rank of the TOP THREE GEOLOGIES by perimeter and COUNTRY*/
CREATE TABLE geologyrank
AS
WITH geotab AS(
SELECT name_fr, perimeter, country, geom,
RANK() OVER (PARTITION BY country ORDER BY perimeter DESC) AS rnk
FROM public.geology
ORDER BY country)
SELECT name_fr, perimeter, COUNTRY, rnk, geom
FROM geotab
WHERE rnk=1 OR rnk=2 OR rnk=3
Using the ranks created above, the top three values are concatenated into a single column.
/* TO create a table from the GEOLOGY RANKING ABOVE */
CREATE TABLE geologycolumn
AS
WITH
M1 AS (SELECT name_fr, country, geom FROM geologyrank WHERE rnk=1),
M2 AS (SELECT name_fr, country FROM geologyrank WHERE rnk=2),
M3 AS (SELECT name_fr, country FROM geologyrank WHERE rnk=3)
SELECT CONCAT(M1.name_fr, ', ', M2.name_fr, ', ', M3.name_fr)
AS geology, M1.geom
FROM M1
LEFT JOIN M2
ON M1.country = M2.country
LEFT JOIN M3
ON M1.country = M3.country
In the end, a final table was formed by selecting all relevant columns and using join clauses.
/* A final query to compile all entries for the database */
SELECT za.id id,
za.nom name
climat, situation, sunshine,
elevation_min, elevation_moy, elevation_max,
precipetation_min, precipitation_moy, precipitation_max,
Min_temperature, Moy_temperature, Max_temperatures,
pedology, za.geom
FROM polygons_wld za
LEFT JOIN temperature
ON ST_Within(za.geom, temperature.geom)
LEFT JOIN elevation
ON ST_Within(za.geom, elevation.geom)
LEFT JOIN situationtab
ON ST_Within(za.geom, situationtab.geom)
LEFT JOIN Precipitation
ON ST_Within(za.geom, Precipitation.geom)
LEFT JOIN weatherclimate
ON ST_Within(za.geom, weatherclimate.geom)
LEFT JOIN pedologycolumn
ON ST_Within(za.geom, pedologycolumn.geom)