wiki:Import impact data to dbeel

back to start page ..

Tickets

Import impact data to dbeel

The different data from data providers are imported in the schema sudoang of the dbeel database.
You can see the code in: source:eda/EDAcommun/sql/sudoang/dbeel_sudoang_obstruction.sql

It has been necessary to reproject some of the layers

SELECT UpdateGeometrySRID('sudoang','catalonia_obstruction','geom', 25831); 
SELECT UpdateGeometrySRID('sudoang','minho_obstruction','geom', 25829); 
SELECT UpdateGeometrySRID('sudoang','spain_obstruction_hpp','geom', 4326); 
SELECT UpdateGeometrySRID('sudoang','catalonia_obstruction_dams','geom', 25831); 
SELECT UpdateGeometrySRID('sudoang','galicia_obstruction','geom', 3041); 

source:eda/data/Docs/trac/sudoang/dam01.png

Create the fist table spain_obstruction

This is the most complete table we have for spain, it's been sent by the ministry, and we will import it first and then import other data.

We have used the table of dams sent by the ministry, it's the latest but not complete, in particular it does not contain the information about HPP that we had in a previous layer sent by the ministry.

So first we have updated the layer with data from less than one kilometre from the ministy's layer.

-- Returns the id of hpp (spain_obstruction_hpp) closest to the dam (spain_obstruction)
alter table sudoang.spain_obstruction add column id_hpp numeric;
alter table sudoang.spain_obstruction add column potencia_t numeric;
alter table sudoang.spain_obstruction add column corriente character varying(50);
alter table sudoang.spain_obstruction add column sbruto_m numeric;

select* from sudoang.spain_obstruction

update  sudoang.spain_obstruction set (id_hpp, potencia_t, corriente, sbruto_m)= (sub2.id, sub2.potencia_t, sub2.corriente, sub2.sbruto_m) from (
select distinct on("PRESA") id, "PRESA", potencia_t, corriente, sbruto_m , dist from (
select hpp.*, "PRESA", 
 st_distance(st_transform(hpp.geom, 25830), obs.geom) as dist
 from (select * from sudoang.spain_obstruction_hpp where geom is not null) hpp join
      (select * from sudoang.spain_obstruction where geom is not null) obs on 
st_dwithin(st_transform(hpp.geom, 25830), obs.geom, 1000)
 order by "PRESA", dist asc) sub) sub2
 where sub2."PRESA"=spain_obstruction."PRESA"; -- 624 rows, 438 msec

Some of the hpp data were not present (415) we have added them using

insert into sudoang.spain_obstruction("PRESA", id_hpp, potencia_t, corriente, sbruto_m, geom) 
 select 
 nextval('presa') as "PRESA",
 id as id_hpp,
 sub.potencia_t,
 sub.corriente,
 sub.sbruto_m,
 st_transform(geom,25830) from (
 select * from sudoang.spain_obstruction_hpp where id in (
select id from sudoang.spain_obstruction_hpp
except
select id from sudoang.spain_obstruction_hpp
where id in (
select distinct id_hpp as id from sudoang.spain_obstruction))
and (sbruto_m >0 or potencia_t>0)  
) sub --415

One of the problem we must solve it that this layer has several points nearly on top of each other (e.g. one per "dike" on each dam). So when calculated multiple impact we might run into problems https://trello.com/c/IIMjWVkX

-- !!! some duplicate inside the spain_obstruction_in_spain table
-- eg SO_288 and SO_874 which have different ALT_CAU !!!
-- internal duplicate here
WITH duplicate AS
(
        SELECT sp1.id AS id1, sp2.id AS id2 FROM sudoang.spain_obstruction_in_spain AS sp1, sudoang.spain_obstruction_in_spain AS sp2
        WHERE substring(sp1.id from 4) < substring(sp2.id from 4) AND ST_Distance(sp1.geom, sp2.geom)<0.01
) SELECT *, rank() OVER(PARTITION BY id1 ORDER BY id2) FROM duplicate ORDER BY substring(id1 from 4)::integer
;
id1id2rank
SO_14SO_151
SO_14SO_173
SO_14SO_162
SO_15SO_161
SO_15SO_172
SO_16SO_171
SO_42SO_441

We have also reprojected all data and created a unique id for all tables, the id is

prefix table
SO Spain_obstruction & spain_obstruction_in_spain
CO Catalonia_obstruction (deprecated)
COD catalonia_obstruction_dams
COH catalonia_obstruction_hpp
MIO minho_obstruction
MIO minho_obstruction
AFR Amber France
AFS Amber Spain
FC fcul_obstruction

To join the data COD and COH it's a bit complex, see the script, instead of indentifying the mother table, the daughter table is idenditied. This was done with a list of coma separated value, first we had to correct it (remove space, replace ; with ,). To join the two tables we had to create an intermediate table catalonia_obstruction_join_dams_hpp source:eda/EDAcommun/sql/sudoang/dbeel_sudoang_obstruction.sql

Create the dbeel database

The script is there : source:eda/EDAcommun/sql/sudoang/dbeel_sudoang_obstruction.sql
We have simply added the column fish pass to it as this information is the only one that is not present that will be usefull

For data providers we have used (MARIA CHECK THIS !)

INSERT INTO dbeel.establishment(et_establishment_name) VALUES ('min esp'); --TODO: TO BE MODIFIED spanish minstry
-- add data provider
INSERT INTO dbeel.data_provider(dp_name, dp_et_id)
        SELECT 'Belen', et_id FROM dbeel.establishment WHERE et_establishment_name = 'min esp'

And for AMBER

-- add establishment
INSERT INTO dbeel.establishment(et_establishment_name) VALUES ('AMBER Project'); --TODO: TO BE MODIFIED
-- add data provider
INSERT INTO dbeel.data_provider(dp_name, dp_et_id)
        SELECT 'Rosa', et_id FROM dbeel.establishment WHERE et_establishment_name = 'AMBER Project'
; --TODO: TO BE MODIFIED

We have created two table for the dbeel, the location and the table containing data regarding the obstruction. It must be noticed that we have added the name of the country and the original id of the dam to link back to original data in dbeelobstruction_place, and we have added the type of fishway in dbeel_physical_obstruction

--------------------------
-- inital modification
--------------------------

-- correction dbeel
ALTER TABLE dbeel.physical_obstruction 
ADD CONSTRAINT fk_ot_phys_obstruction_type FOREIGN KEY (ot_no_obstruction_type) REFERENCES dbeel_nomenclature.obstruction_type(no_id) ON UPDATE CASCADE ON DELETE RESTRICT;
ALTER TABLE dbeel.physical_obstruction 
ADD CONSTRAINT fk_ot_mortality_type FOREIGN KEY (ot_no_mortality_type) REFERENCES dbeel_nomenclature.biological_characteristic_type(no_id) ON UPDATE CASCADE ON DELETE RESTRICT;

-- dump of rivers to dbeel pg_dump -U postgres --table spain.rivers -f "spain.rivers.sql" eda2.0
-- psql -U postgres  -c "create extension ltree" dbeel
-- psql -U postgres -f "spain.rivers.sql" dbeel
ALTER TABLE spain.rivers
 ALTER COLUMN geom TYPE geometry(MultiLineString,3035) 
  USING ST_Transform(geom,3035);

REINDEX TABLE spain.rivers;

--------------------------
-- dbeel creation
--------------------------
-- create the obs_place for obstacles
DROP TABLE if exists sudoang.dbeel_obstruction_place CASCADE;
CREATE TABLE sudoang.dbeel_obstruction_place (
                id_original character varying(10),
                country character varying(2),
        CONSTRAINT pk_obs_op_id PRIMARY KEY (op_id),
        CONSTRAINT c_uk_id_roe UNIQUE (id_original),
        CONSTRAINT fk_so_observation_place_type_id FOREIGN KEY (op_no_observationplacetype) 
                REFERENCES dbeel_nomenclature.observation_place_type (no_id) 
                MATCH SIMPLE ON UPDATE CASCADE ON DELETE RESTRICT
) INHERITS (dbeel.observation_places);

-- creation index spatial
CREATE INDEX 
  ON sudoang.dbeel_obstruction_place
  USING gist
  (the_geom);

-- create physical obstruction
DROP TABLE if exists sudoang.dbeel_physical_obstruction CASCADE;
CREATE TABLE sudoang.dbeel_physical_obstruction (
  fishway_type text,
  CONSTRAINT physical_obstruction_id PRIMARY KEY (ob_id),
  CONSTRAINT fk_dp FOREIGN KEY (ob_dp_id) REFERENCES dbeel.data_provider (dp_id),
  CONSTRAINT fk_ob_origin FOREIGN KEY (ob_no_origin)REFERENCES dbeel_nomenclature.observation_origin (no_id) ,
  CONSTRAINT fk_ob_period FOREIGN KEY (ob_no_period) REFERENCES dbeel_nomenclature.period_type (no_id) ,
  CONSTRAINT fk_ob_type FOREIGN KEY (ob_no_type) REFERENCES dbeel_nomenclature.observation_type (no_id) ,
  CONSTRAINT fk_po_obstruction_passability FOREIGN KEY (po_no_obstruction_passability)REFERENCES dbeel_nomenclature.obstruction_impact (no_id),
  CONSTRAINT fk_ot_obstruction_type FOREIGN KEY (ot_no_obstruction_type) REFERENCES dbeel_nomenclature.obstruction_type(no_id) ON UPDATE CASCADE ON DELETE RESTRICT,
  CONSTRAINT fk_ot_mortality_type FOREIGN KEY (ot_no_mortality_type) REFERENCES dbeel_nomenclature.biological_characteristic_type(no_id) ON UPDATE CASCADE ON DELETE RESTRICT
) INHERITS(dbeel.physical_obstruction);

The data from our previous work (see above) has been used for initial insert.... Starting from the next (AMBER) it becomes more complicated because we have duplicates...

INSERT INTO sudoang.dbeel_physical_obstruction
        SELECT
          uuid_generate_v4() as ob_id,
          11 AS ob_no_origin, -- raw data
          16 AS ob_no_type, -- obstruction
          74 AS ob_no_period, -- Unknown
          NULL AS ob_starting_date,
          NULL AS ob_ending_date,
          dbeel_obstruction_place.op_id as ob_op_id,
          dp_id AS ob_dp_id,
          219 as ot_no_obstruction_type,-- physical
          1 as ot_obstruction_number,
          NULL AS ot_no_mortality_type,
          NULL AS ot_no_mortality,
          NULL as po_no_obstruction_passability,
          case when "ALT_CAU" is not NULL AND "ALT_CAU">0.0  then "ALT_CAU"
        else REPLACE("ALT_CIM", ',', '.')::numeric end AS po_obstruction_height,
          NULL AS po_turbine_number
        FROM sudoang.spain_obstruction_in_spain JOIN sudoang.dbeel_obstruction_place ON (id_original = id), 
        dbeel.data_provider WHERE dp_name = 'Belen' --TODO: TO BE MODIFIED
; -- 3120

AMBER DATA (possible duplicates)

For new data (amber) it's more complicated as we have to deal with duplicates. What we do is search for duplicates and add a column relating to the dbeel table. So we will have a link between amber data and the dbeel.

ALTER TABLE sudoang.amber ADD COLUMN dbeel_op_id uuid;

We have decide to reproject both the dbeel and amber on the rivers and compute the distance between the two re-projected points The column is named geom_reproj

 --reproject on the rivers
SELECT addgeometrycolumn('sudoang','amber','geom_reproj',3035,'POINT',2);
update sudoang.amber set geom_reproj = sub2.geom_reproj from (
select distinct on (id) id, geom_reproj,distance from(
select
id, 
ST_ClosestPoint(r.geom,a.geom) as geom_reproj ,
ST_distance(r.geom,a.geom) as distance
from sudoang.amber a join
spain.rivers r on st_dwithin(r.geom,a.geom,300) 
order by id, distance
)sub )sub2
where sub2.id= amber.id; -- 16809

Then there is a check for internal duplicated value in amber based on the name substring(sp2.id from 5) and a very close distance ST_DWithin(sp1.geom, sp2.geom, 0.01)

-- check for internal duplicate
WITH duplicate AS
(
        SELECT sp1.id AS id1, sp2.id AS id2 FROM sudoang.amber AS sp1, sudoang.amber AS sp2
        WHERE substring(sp1.id from 5) < substring(sp2.id from 5) AND ST_DWithin(sp1.geom, sp2.geom, 0.01)
) SELECT *, rank() OVER(PARTITION BY id1 ORDER BY id2) FROM duplicate ORDER BY substring(id1 from 5)::integer
;

Finally the link between amber and dbeel is made by updating a column containing the identifier from the dbeel. CHECK THIS. THE DBEEL NEEDS TO BE PROJECTED ON RIVERS

-- link amber TO dbeel
ALTER TABLE sudoang.amber ADD COLUMN dbeel_op_id uuid;
Finally we check for duplicates in dbeel, and update amber values using those duplicates

-- duplicate with data already in dbeel
-- UPDATE sudoang.amber SET dbeel_op_id = NULL; -- to reset the column
WITH duplicate_amber AS 
(
        SELECT DISTINCT geom_reproj, op_id
        FROM sudoang.amber, sudoang.dbeel_obstruction_place
        WHERE id LIKE 'ASP%' AND geom_reproj IS NOT NULL AND ST_DWithin(the_geom, geom_reproj, 500) -- to be tuned
) 
UPDATE sudoang.amber SET dbeel_op_id = op_id
FROM duplicate_amber 
WHERE id LIKE 'ASP%' AND amber.geom_reproj IS NOT NULL AND amber.geom_reproj = duplicate_amber.geom_reproj
; --1190

There is a problem of numeric / character numeric data below

-- update height for these dams
WITH amber_obs AS
(
        SELECT *, REPLACE(REPLACE(height, ',', '.'), ' ', '') AS height2
        FROM sudoang.amber
        WHERE dbeel_op_id IS NOT NULL
), amber_obs_max_height AS
(
        SELECT dbeel_op_id, max(height2) AS height FROM amber_obs GROUP BY dbeel_op_id --max is choosen, if mean the process should be different
)       
UPDATE sudoang.dbeel_physical_obstruction SET po_obstruction_height = height::real FROM amber_obs_max_height
WHERE dbeel_op_id = ob_op_id AND po_obstruction_height < height::real; --147

Finally we integrate new dams

-- new dams
WITH unique_obs AS
(
        SELECT DISTINCT geom_reproj FROM sudoang.amber WHERE dbeel_op_id IS NULL AND geom_reproj IS NOT NULL
) 
INSERT INTO sudoang.dbeel_obstruction_place
        SELECT uuid_generate_v4() AS op_id, 'SUDOANG' AS op_gis_systemname, 'amber' AS op_gis_layername, NULL AS op_gislocation, NULL AS op_placename, 11 AS op_no_observationplacetype, NULL op_op_id, geom_reproj AS the_geom, NULL AS id_original, 'SP' AS country
        FROM unique_obs; --15571

And of course the id of the newly integrated data must be filled in the amber database to get the right link

-- record back the link with dbeel
-- UPDATE sudoang.amber SET dbeel_op_id = NULL; -- to reset the column
UPDATE sudoang.amber SET dbeel_op_id = op_id 
FROM sudoang.dbeel_obstruction_place 
WHERE geom_reproj = the_geom AND geom_reproj && the_geom AND op_gis_layername='amber';  -- 15619
        
-- choose an id (last) to record in dbeel
WITH id_obs AS
        (SELECT dbeel_op_id, 'ASP_' || max(substring(id from 5)::integer) AS id FROM sudoang.amber WHERE dbeel_op_id IS NOT NULL GROUP BY dbeel_op_id)
UPDATE sudoang.dbeel_obstruction_place SET id_original = id, op_gislocation = id FROM id_obs WHERE op_id = dbeel_op_id AND op_gis_layername='amber'; --15571

        

INSERT INTO sudoang.dbeel_physical_obstruction
        WITH amber_obs AS
        (
                SELECT *, REPLACE(REPLACE(height, ',', '.'), ' ', '')::real AS height2
                FROM sudoang.amber
                WHERE dbeel_op_id IS NOT NULL
                AND NOT (height LIKE '%-%' OR height LIKE '%>%' OR height LIKE '%a%')
        ), amber_obs_max_height AS
        (
                SELECT dbeel_op_id, max(height2)::real AS height FROM amber_obs GROUP BY dbeel_op_id --max is choosen, if mean the process should be different
        )
        SELECT
          uuid_generate_v4() as ob_id,
          11 AS ob_no_origin, -- raw data
          16 AS ob_no_type, -- obstruction
          74 AS ob_no_period, -- Unknown
          NULL AS ob_starting_date,
          NULL AS ob_ending_date,
          dbeel_obstruction_place.op_id as ob_op_id,
          dp_id AS ob_dp_id,
          219 as ot_no_obstruction_type,-- physical
          1 as ot_obstruction_number,
          NULL AS ot_no_mortality_type,
          NULL AS ot_no_mortality,
          NULL as po_no_obstruction_passability,
          height::real AS po_obstruction_height,
          NULL AS po_turbine_number
        FROM amber_obs_max_height JOIN sudoang.dbeel_obstruction_place ON (dbeel_op_id = op_id), 
        dbeel.data_provider 
        WHERE dp_name = 'Rosa' --TODO: TO BE MODIFIED
        AND op_gis_layername='amber'
; -- 12766      

Portugese data

There is something wrong with the tables when passing to postgis using shp2ppgsql, finally I had to get rid of one of the columns.

-------------------------
-- PORTUGESE LAYERS
------------------------
Three layers, one is barragens and has only like 5 dams, not worth the effort
/*
-- the prj corresponds to 102164 https://epsg.io/102164
ESRI:102164: "Lisboa Hayford Gauss IGeoE" is equivalent to EPSG:20790
ESRI:102165: "Lisboa Hayford Gauss IPCC" is equivalent to EPSG:20791
ESRI:104106: "GCS Datum Lisboa Hayford" is equivalent to EPSG:4207
ESRI:102161: "Datum 73 Hayford Gauss IPCC" is equivalent to EPSG:27493
ESRI: "GCS_Datum_73" is equivalent to EPSG:4274: Datum 73
-- Barragens layer
shp2pgsql  -s shp2pgsql -s 20790 -W "latin1" -I -k Barragens.shp sudoang.barragens>barragens.sql
psql -U postgres -f barragens.sql dbeel

psql:barragens.sql:16: ERREUR:  syntaxe en entrée invalide pour le type numeric : « 000000000000,000000 »
LIGNE 1 : ...LUES ('1','Lindoso',NULL,NULL,NULL,NULL,NULL,NULL,'000000000...

shp2pgsql  -s shp2pgsql -s 20790 -W "latin1" -I -i -k Barragens.shp sudoang.barragens>barragens.sql
-- pas mieux
-- Anyways there is almost nothing in this layer ... not sure it's worth the effort

shp2pgsql  -s shp2pgsql -s 20790 -W "latin1" -I -i -k Grandes_barragens.shp sudoang.Grandes_barragens>Grandes_barragens.sql
psql -U postgres -f Grandes_barragens.sql dbeel
Does not work, same pb

Open in Qgis, delete two last column (empty)
OK works

Same treatment for mini hydricas (delete two last columns)
shp2pgsql  -s shp2pgsql -s 20790 -W "latin1" -I -i -k Mini_hidricas.shp sudoang.Mini_hidricas>Mini_hidricas.sql
psql -U postgres -f Mini_hidricas.sql dbeel



RESTORATION INSTRUCTION FOR MARIA


Remove existing tables

psql -U postgres -c "drop table if exists sudoang.spain_obstruction" dbeel
psql -U postgres -c "drop table if exists sudoang.catalonia_obstruction" dbeel
psql -U postgres -c "drop table if exists sudoang.catalonia_obstruction_dams" dbeel
psql -U postgres -c "drop table if exists  sudoang.catalonia_obstruction_hpp" dbeel
psql -U postgres -c "drop table if exists  sudoang.minho_obstruction" dbeel
psql -U postgres -c "drop table if exists  sudoang.amber" dbeel
psql -U postgres -c "drop table if exists  sudoang.fcul_obstruction" dbeel
psql -U postgres -c "drop table if exists  sudoang.galicia_obstruction" dbeel
psql -U postgres -c "drop table if exists  sudoang.catalonia_obstruction_join_dams_hpp" dbeel
psql -U postgres -c "drop table if exists  sudoang.spain_obstruction_in_spain" dbeel
psql -U postgres -c "drop table if exists  sudoang.spain_obstruction_not_in_spain" dbeel
psql -U postgres -c "drop table if exists  sudoang.dbeel_obstruction_place cascade" dbeel

Restore missing tables -- go to GT2 A21 Définition of data.... I've put three dump there ....

psql -U postgres -f "all_obstacle_tables_except_dbeel.sql"  dbeel
psql -U postgres -f "portugese_dam.sql"  dbeel
psql -U postgres -f "dbeel_tables.sql"  dbeel

Maria also to do (but it's too slow from home you can do this much faster and it will save a bear!!) dump of rivers to dbeel

pg_dump -U postgres --table spain.rivers -f "spain.rivers.sql" eda2.0
psql -U postgres  -c "create extension ltree" dbeel
psql -U postgres -f "spain.rivers.sql" dbeel

in SQL

ALTER TABLE spain.rivers
 ALTER COLUMN geom TYPE geometry(MultiLineString,3035) 
  USING ST_Transform(geom,3035);

REINDEX TABLE spain.rivers;

-- DUMP QUERIES USED BY CEDRIC (for memory)


pg_dump -U postgres --table sudoang.spain_obstruction --table sudoang.catalonia_obstruction --table sudoang.catalonia_obstruction_dams --table sudoang.catalonia_obstruction_hpp --table sudoang.minho_obstruction --table sudoang.amber  --table sudoang.fcul_obstruction --table sudoang.galicia_obstruction  --table sudoang.catalonia_obstruction_join_dams_hpp  --table sudoang.spain_obstruction_in_spain  --table sudoang.spain_obstruction_not_in_spain -f "all_obstacle_tables_except_dbeel.sql" dbeel

pg_dump -U postgres --table sudoang.dbeel_physical_obstruction  --table sudoang.dbeel_obstruction_place -f "dbeel_tables.sql" dbeel

pg_dump -U postgres --table sudoang.grandes_barragens --table sudoang.mini_hidricas -f "portugese_dam.sql" dbeel
Last modified 6 years ago Last modified on Jan 4, 2019 12:39:16 PM