wiki:CookBook join ROE_CCMv2

Version 8 (modified by cedric, 14 years ago) (diff)

--

back to first page ..
back to Recipes for EDA CookBook Eda

How to join the CCM and ROE database and avoid projection pb, this is version 2 of the script

see Joining ROE and the CCM layer CookBook join ROE_CCM
for the first version.
The trouble is that some dams or electrofishing stations are sometimes badly projected
We should start at step 6

1. putting bd_carthage into postgis

hylcov_arc and hylcov_node are created in a database bd_carthage see CookBook join BDMAP_CCM v2

2. getting distance sea and distance source for bd_carthage

This was also done , though imperfectly (some dist source are missing and there are differerences between ccm and bd_carthage) CookBook join BDMAP_CCM v2
Finally we have a table called hylcov_arc_dist2 where distance_source are included from Cédric calculataions (noeuds), distance_sea correspond to eda1.0 and strahler correpond to Laurent calculations.

3. Get distance into the roe table

31. Saving table geobs2010.obstacle_referentiel from bd_carthage to eda2.0

cd C:\eda\backup
C:\"Program Files"\PostgreSQL\8.4\bin\pg_dump  -U postgres -p 5433 -t geobs2010.obstacle_referentiel eda2.0> obstacle_referentiel.sql
C:\"Program Files"\PostgreSQL\8.4\bin\psql -d bd_carthage -p 5433 -U postgres -f obstacle_referentiel.sql
REM une erreur quand il essaye de mettre le schéma bd_map mais là on a pas besoin d'avoir les données dans un schéma et d'autres liées à l'absence des autres tables..; Voir par la suite si ce n'est pas handicapant mais normalement non car on ne sort qu'une table faisant la jointure...

at this stage it is usefull to check the geometry column table and correct it eventually

32. project stationsp2 again and get table of all projections within 300

Some stations were missing...

  • first create two index to speed up queries
    C:\"Program Files"\PostgreSQL\8.4\bin\psql -d bd_carthage -p 5433 -U postgres -c"CREATE INDEX indexobstacle_referentiel ON obstacle_referentiel USING GIST(ref_position_etrs89 GIST_GEOMETRY_OPS);
    
  -- Here we project dams on the bd_carthage
  -- The result is stored in table roe_bd_carthage 
  -- I'm changing ref_position_etrs89 => the_geom

  
DROP TABLE IF EXISTS roe_bd_carthage;
CREATE TABLE roe_bd_carthage as (
     SELECT distinct on (ref_id) ref_id, dist_source,dist_sea,strahler,id_trhyd, min(distance) as distance, the_geom FROM (
               SELECT ob.ref_id, n.distance_mer as dist_sea,n.strahler ,d.id_trhyd ,d.dist_source_max as dist_source, CAST(distance(d.the_geom, ob.ref_position_etrs89) as  decimal(15,1)) as distance,ob.ref_position_etrs89 as the_geom
               FROM obstacle_referentiel As ob
               INNER JOIN hylcov_arc_dist2 d ON ST_DWithin(d.the_geom, ob.the_geom,300)
                LEFT JOIN noeud_troncon_final n ON d.id_som_f = n.id_bdcarthage -- pour récupérer les strahler
               ORDER BY ob.ref_id) as sub
        GROUP BY ref_id, distance, dist_source,dist_sea,strahler, id_trhyd, the_geom
 );
alter table  roe_bd_carthage add column id serial;
-- mise à jour de la table geometry_columns
INSERT INTO geometry_columns(f_table_catalog, f_table_schema, f_table_name, f_geometry_column, coord_dimension, srid, "type")
SELECT '', 'public', 'roe_bd_carthage', 'the_geom', ST_CoordDim(the_geom), ST_SRID(the_geom), GeometryType(the_geom)
FROM roe_bd_carthage LIMIT 1;

-- creation d'index, clé primaire, et constraintes qui vont bien
alter table roe_bd_carthage add CONSTRAINT enforce_dims_the_geom CHECK (ndims(the_geom) = 2);
alter table roe_bd_carthage add  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL);
alter table roe_bd_carthage add  CONSTRAINT enforce_srid_the_geom CHECK (srid(the_geom) = 3035);
alter table roe_bd_carthage ADD CONSTRAINT pk_id_roe_bd_carthage PRIMARY KEY(id);
CREATE INDEX indexroe_bd_carthage ON roe_bd_carthage
  USING GIST ( the_geom GIST_GEOMETRY_OPS );

33. Saving back table stationsp2 from bd_carthage to eda

cd C:\eda\backup
C:\"Program Files"\PostgreSQL\8.4\bin\pg_dump  -U postgres -p 5433 -t roe_bd_carthage bd_carthage> roe_bd_carthage.sql
C:\"Program Files"\PostgreSQL\8.4\bin\psql -d eda2.0 -p 5433 -U postgres -c "drop table if exists geobs2010.roe_bd_carthage"
C:\"Program Files"\PostgreSQL\8.4\bin\psql -d eda2.0 -p 5433 -U postgres -f roe_bd_carthage.sql
C:\"Program Files"\PostgreSQL\8.4\bin\psql -d eda2.0 -p 5433 -U postgres -c "alter table roe_bd_carthage set schema geobs2010"

4. define criteria helping to do the projection

-- the table geobs2010.roe_ccm_300 was created in CookBook join ROE_CCM but we need to add some columns

-- launch in eda2.0
DROP TABLE IF EXISTS geobs2010.roe_ccm_300_2;
CREATE TABLE geobs2010.roe_ccm_300_2 as (
               SELECT ref_id, r.cum_len_sea as dist_sea_ccm, r.distance_source as dist_source_ccm, r.strahler as strahler_ccm,gid ,CAST(distance(r.the_geom, s.ref_position_etrs89) as  decimal(15,1)) as distance,s.ref_position_etrs89 
               FROM geobs2010.obstacle_referentiel As s
               INNER JOIN  ccm21.riversegments r ON ST_DWithin(r.the_geom, s.ref_position_etrs89,300)
               ORDER BY  ref_id
);
alter table geobs2010.roe_ccm_300_2 add column id serial;
select * from geobs2010.roe_bd_carthage b full outer join geobs2010.roe_ccm_300_2 c on b.ref_id=c.ref_id;

The table from the request joining geobs2010.roe_bd_carthage with the geobs2010.roe_ccm_300 is exported there (use .csv exort to C:\Documents and Settings\cedric\Mes documents\Migrateur\programmes\workspace3.5\EDAdata\dataEDAccm\export_jointure_roe_ccm.csv source:data/dataEDAccm/export_jointure_roe_ccm.csv

Some processing with R and some results in carto_trial.R
We are trying to see if distance_source distance_sea and strahler are consistent between bd_carthage and ccm

VERSION 1.0 DEPRECATED !!!!!!!!!!!!!!!!!!
# analyse des relations entre distance source et distance mer pour les projections sur la ccm
library(lattice)
library(ggplot2)
ccm_bd<-read.csv(paste(datawd,"/dataEDAccm/export_jointure_bdcarthage_ccm.csv",sep=""),sep=";",stringsAsFactors = FALSE)
str(ccm_bd)
ccm_bd$dist_source_ccm=ccm_bd$dist_source_ccm/1000
ccm_bd$dist_sea_ccm=ccm_bd$dist_sea_ccm/1000
ccm_bd$pbstrahler=ccm_bd$strahler_ccm>(ccm_bd$strahler+1) # which strahler are larger in ccm than in bd_carthage, allowing diff of one 
ratio_dist_mer=ccm_bd$cum_len_sea/ccm_bd$distance_mer
ccm_bd$pb_ratio_dist_mer<-ratio_dist_mer>2&ccm_bd$cum_len_sea>100 # la distance mer est le tripe pour la ccm (peut arriver pour les faibles valeurs)
ratio_sources<-ccm_bd$distance_source_ccm/ccm_bd$distance_source&ccm_bd$cum_len_sea>100
ccm_bd$pbratiosource<-ratio_sources>2 # la distance source ccm est égale à trois fois la distance source bd_carthage

sum(ccm_bd$pbratiosource,na.rm=TRUE)/nrow(ccm_bd) # 16 % 2279
sum(ccm_bd$pb_ratio_dist_mer,na.rm=TRUE)/nrow(ccm_bd) # 0.1 % 15
sum(ccm_bd$pbstrahler,na.rm=TRUE)/nrow(ccm_bd) #3% 517
sum(!ccm_bd$pbratiosource&!ccm_bd$pbstrahler&!ccm_bd$pb_ratio_dist_mer,na.rm=TRUE)/nrow(ccm_bd) # on garde 79 % des données
# hitograms
layout(matrix(c(1,2), 1, 2, byrow = TRUE))
hist(ratio_dist_mer[ratio_dist_mer<2],100,main="ratio dist. sea")
abline(v=2,col="red")
mtext(paste(round(sum(ccm_bd$pb_ratio_dist_mer,na.rm=TRUE)/nrow(ccm_bd),2)*100,"% out"),col="red")
hist(ratio_sources[ratio_sources<10],100,main="ratio dist source")
abline(v=3,col="red")
mtext(paste(round(sum(ccm_bd$pbratiosource,na.rm=TRUE)/nrow(ccm_bd),3)*100,"% out"),col="red")
# plots
g<-ggplot(ccm_bd)
g+geom_point(aes(x=distance_source,y=distance_source_ccm,colour=pbstrahler))
g+geom_point(aes(distance_mer,cum_len_sea,colour=pb_ratio_dist_mer))+geom_abline(slope=1,colour="red") +geom_abline(slope=1,intercept=1,colour="green") 
#g+geom_point(aes(distance_mer,cum_len_sea,colour=pbratiosource))+geom_abline(slope=1,colour="red")
g+geom_jitter(aes(x=strahler,y=strahlerccm,colour=pbstrahler),alpha=0.3)
g1<-g+geom_point(aes(x=distance_source,y=distance_source_ccm,colour=pbratiosource))+geom_abline(slope=2,colour="red")
ccm_bd1<-ccm_bd[ccm_bd$pbstrahler,]
g1+geom_point(aes(x=distance_source,y=distance_source_ccm),size=2,data=ccm_bd1,pch=1)

The following criteria have been used to select stations v1
For distance_sea(ccm) > 100 distance_seaccm/distance_sea_bd_carthage<=2 => we mostly strive to avoid the projection of stations on short tributaries on larger streams from the ccm
Distance_source(ccm)/distance_source(bd_carthage)<=2
ccm_bd$strahlerccm-ccm_bd$strahler<=1
Finally 78% lines kept but this include doubles
source:data/Docs/trac/BDMAP/percentage_out.jpeg source:data/Docs/trac/BDMAP/comp_dist_source.jpeg source:data/Docs/trac/BDMAP/comp_dist_mer.jpeg source:data/Docs/trac/BDMAP/com_strahler.jpeg

VERSION 3.0

#---------------------------------------------------------------------------------------------
#---------------------------------------------------------------------------------------------
# analyse des relations entre distance source et distance mer pour les projections sur la ccm
#---------------------------------------------------------------------------------------------
#---------------------------------------------------------------------------------------------
source("EDACCM/init.r")
library(lattice)
library(ggplot2)
ccm_bd<-read.csv(paste(datawd,"/dataEDAccm/export_jointure_bdcarthage_ccm.csv",sep=""),sep=";",stringsAsFactors = FALSE)
str(ccm_bd)
ccm_bd$dist_source_ccm=ccm_bd$dist_source_ccm/1000
ccm_bd$dist_sea_ccm=ccm_bd$dist_sea_ccm/1000
ccm_bd$pbstrahler=ccm_bd$strahler_ccm>(ccm_bd$strahler+1) # which strahler are larger in ccm than in bd_carthage, allowing diff of one 
ratio_dist_sea=ccm_bd$dist_sea_ccm/ccm_bd$dist_sea
ccm_bd$pb_ratio_dist_sea<-ratio_dist_sea>2&ccm_bd$dist_sea_ccm>100 # la distance mer est le tripe pour la ccm (peut arriver pour les faibles valeurs)
layout(matrix(c(1,2), 2, 1, byrow = TRUE))
hist(ccm_bd$dist_source,100,ylim=c(50,15000))
hist(ccm_bd$dist_source,100,ylim=c(0,50))
diffsource<-ccm_bd$dist_source_ccm-ccm_bd$dist_source
ratiosource<-ccm_bd$dist_source_ccm/ccm_bd$dist_source
ccm_bd$pbdiffsource<-!((diffsource>-20&diffsource< 20)|(ratiosource>0.7&ratiosource<1.3))
sum(ccm_bd$pbdiffsource,na.rm=TRUE)/nrow(ccm_bd) # 14 % 2027
sum(ccm_bd$pb_ratio_dist_sea,na.rm=TRUE)/nrow(ccm_bd) # 12 0.00087
sum(ccm_bd$pbstrahler,na.rm=TRUE)/nrow(ccm_bd) #3% 457
#sum(!ccm_bd$pbdiffsource&!ccm_bd$pbstrahler&!ccm_bd$pb_ratio_dist_sea,na.rm=TRUE)/nrow(ccm_bd) # 64 % si on garde strahler
# finally I don't trust there is a strahler rank pb
sum(!ccm_bd$pbdiffsource&!ccm_bd$pb_ratio_dist_sea,na.rm=TRUE)/nrow(ccm_bd)  # on garde 74% des données
# histograms
layout(matrix(c(1,2), 1, 2, byrow = TRUE))
hist(ratio_dist_sea[ratio_dist_sea<4],100,main="ratio dist. sea")
abline(v=2,col="red")
mtext(paste(round(sum(ccm_bd$pb_ratio_dist_sea,na.rm=TRUE)/nrow(ccm_bd),2)*100,"% out"),col="red")
hist(diffsource,100,main="ratio dist source",100)
abline(v=-20,col="red")
abline(v=20,col="red")
mtext(paste(round(sum(ccm_bd$pbdiffsource,na.rm=TRUE)/nrow(ccm_bd),3)*100,"% out"),col="red")
# plots
g<-ggplot(ccm_bd)
breaks=as.vector(c(1, 2, 5) %o% 10^(-1:3))
g+geom_point(aes(x=dist_source,y=dist_source_ccm,colour=pbdiffsource),alpha=0.5,size=0.3)+
		geom_abline(slope=1.3,colour="blue",lty=2)+
		geom_abline(slope=0.7,colour="blue",lty=2)+
		#scale_x_log10(name="distance source bd carthage (log scale)",breaks = breaks, labels = breaks,limits=c(1,1000))+
		#scale_y_log10(name="distance source ccm (log scale)",breaks = breaks, labels = breaks,limits=c(1,1000))	+	
		geom_abline(slope=1,colour="brown")
### same with a log scale

breaks=as.vector(c(1, 2, 5) %o% 10^(-1:3))
g1<-g+geom_point(aes(x=dist_source,y=dist_source_ccm,colour=pbdiffsource),alpha=0.5,size=0.3)+
		geom_hline(yintercept=1.3,colour="green")+	
		geom_vline(xintercept=1.3,colour="green")+	
		scale_x_log10(name="distance source bd carthage (log scale)",breaks = breaks, labels = breaks,limits=c(1,1000))+
		scale_y_log10(name="distance source ccm (log scale)",breaks = breaks, labels = breaks,limits=c(1,1000))	+	
		geom_abline(slope=1,colour="brown")
ccm_bd1<-ccm_bd[ccm_bd$pbstrahler,]
g1+geom_point(aes(x=dist_source,y=dist_source_ccm),size=2,data=ccm_bd1,pch=1)

g+geom_point(aes(dist_sea,dist_sea_ccm,colour=pb_ratio_dist_sea))+geom_abline(slope=1,colour="red") +geom_abline(slope=1,intercept=1,colour="green") 
#g+geom_point(aes(dist_sea,dist_sea_ccm,colour=pbdiffsource))+geom_abline(slope=1,colour="red") +geom_abline(slope=1,intercept=1,colour="green") 

g+geom_jitter(aes(x=strahler,y=strahler_ccm,colour=pbstrahler),alpha=0.3)

Criteria have been used to select stations (v3)
dist sea unchanged, strahler might detect some pb but less than distance source so left out
For the distance source, final choice below = either a difference less than 20 km or a ratio departing less than 0.3 from 1

source:data/Docs/trac/BDMAP/comp_dist_source_1.jpeg source:data/Docs/trac/BDMAP/comp_dist_source_2.jpeg source:data/Docs/trac/BDMAP/percentage_out1.jpeg

5. query to select the final data from geobs2010.bdmap_ccm2

select count(*) from geobs2010.bdmap_ccm2; --13695 this one is with repeated projection on ccm riversegments within 300 m
select count(*) from geobs2010.bdmap_ccm; --8884 this is without selection (first treatment) for the riversegement corresponding to the lowest proj dist
/*
REQUEST WITH SELECTION OF CORRECT CRITERIA FOR dist SEA, dist SOURCE, (STRAHLER RANK not taken)
*/
select * from (
                select b.st_codecsp, 
                b.dist_source as dist_source_bdcar,
                c.dist_source_ccm/1000 as dist_source_ccm,
                (((c.dist_source_ccm/(1000*b.dist_source))<1.3) AND
                (c.dist_source_ccm/(1000*b.dist_source))>0.7) OR
                ((c.dist_source_ccm/1000-b.dist_source)> -20 AND
                (c.dist_source_ccm/1000-b.dist_source)<20) AS dist_source_ratio,
                b.dist_sea as dist_sea_bdcar,
                c.dist_sea_ccm/1000 as dist_sea_ccm,
                NOT((c.dist_sea_ccm/(1000*b.dist_sea))>2 and c.dist_sea_ccm>100000) as dist_sea_ratio,
                b.strahler as strahler_bdcar,
                c.strahler_ccm as strahler_ccm,
                (c.strahler_ccm-b.strahler)<=1 as strahler_diff,
                b.distance as distproj_bdcar,
                c.distance as distproj_ccm,
                id_trhyd,
                gid,
                c.the_geom
                from geobs2010.roe_bd_carthage b join geobs2010.bdmap_ccm2 c on b.st_codecsp=c.st_codecsp
                where b.dist_sea>0) as sub
WHERE (dist_source_ratio is TRUE OR dist_source_ratio IS NULL)
AND dist_sea_ratio is TRUE
--AND strahler_diff is TRUE; 
--9837 lines this is with repeated stations but selection has removed ~ 3000 lines
/*
FINAL REQUEST
*/
drop table if exists geobs2010.bdmap_ccm_final;
create table geobs2010.bdmap_ccm_final as (        
select distinct on(st_codecsp) st_codecsp,
        gid,
        id_trhyd,
        dist_source_bdcar,
        dist_source_ccm,
        dist_sea_bdcar,
        dist_sea_ccm,
        strahler_bdcar,
        strahler_ccm,
        distproj_bdcar,
        min(distproj_ccm) as distproj_ccm,
        the_geom
        from( select *  
                from (
                select b.st_codecsp, 
                        b.dist_source as dist_source_bdcar,
                        c.dist_source_ccm/1000 as dist_source_ccm,
                        (((c.dist_source_ccm/(1000*b.dist_source))<1.3) AND
                        (c.dist_source_ccm/(1000*b.dist_source))>0.7) OR
                        ((c.dist_source_ccm/1000-b.dist_source)> -20 AND
                        (c.dist_source_ccm/1000-b.dist_source)<20) AS dist_source_ratio,
                        b.dist_sea as dist_sea_bdcar,
                        c.dist_sea_ccm/1000 as dist_sea_ccm,
                        NOT((c.dist_sea_ccm/(1000*b.dist_sea))>2 and c.dist_sea_ccm>100000) as dist_sea_ratio,
                        b.strahler as strahler_bdcar,
                        c.strahler_ccm as strahler_ccm,
                        (c.strahler_ccm-b.strahler)<=1 as strahler_diff,
                        b.distance as distproj_bdcar,
                        c.distance as distproj_ccm,
                        id_trhyd,
                        gid,
                        c.the_geom
                        from geobs2010.roe_bd_carthage b join geobs2010.bdmap_ccm2 c on b.st_codecsp=c.st_codecsp
                        where b.dist_sea>0
                ) as sub
                WHERE (dist_source_ratio is TRUE or dist_source_ratio is NULL)
                AND dist_sea_ratio is TRUE
                --AND strahler_diff is TRUE
        ) as sub1 
        group by st_codecsp, gid,id_trhyd,dist_source_bdcar,dist_source_ccm,dist_sea_bdcar,dist_sea_ccm,strahler_bdcar,strahler_ccm,distproj_bdcar,     distproj_ccm,
        the_geom
        order by st_codecsp 
) ;

/*
* ANALYSIS OF WITHDRAWN NUMBERS
*/
select count(*) from geobs2010.roe_bd_carthage where dist_source=0 ;-- 0
select count(*) from geobs2010.roe_bd_carthage where dist_sea=0; -- 5
-- voir comment on gère ces cas particuliers par la suite....
select sum(cast(not(dist_sea_ratio) as integer)) as sum_pb_sea,
        sum(cast(not(dist_source_ratio) as integer)) as sum_pb_source,
        sum(cast((dist_source_ratio is NULL) as integer)) as sum_pb_source_NULL,
        sum(cast(not(strahler_diff) as integer)) as sum_pb_strahler
        from (
                select b.st_codecsp, 
                b.dist_source as dist_source_bdcar,
                c.dist_source_ccm/1000 as dist_source_ccm,
                (((c.dist_source_ccm/(1000*b.dist_source))<1.3) AND
                (c.dist_source_ccm/(1000*b.dist_source))>0.7) OR
                ((c.dist_source_ccm/1000-b.dist_source)> -20 AND
                (c.dist_source_ccm/1000-b.dist_source)<20) AS dist_source_ratio,
                b.dist_sea as dist_sea_bdcar,
                c.dist_sea_ccm/1000 as dist_sea_ccm,
                NOT((c.dist_sea_ccm/(1000*b.dist_sea))>2 and c.dist_sea_ccm>100000) as dist_sea_ratio,
                b.strahler as strahler_bdcar,
                c.strahler_ccm as strahler_ccm,
                (c.strahler_ccm-b.strahler)<=1 as strahler_diff,
                b.distance as distproj_bdcar,
                c.distance as distproj_ccm,
                id_trhyd,
                gid,
                c.the_geom
                from geobs2010.roe_bd_carthage b join geobs2010.bdmap_ccm2 c on b.st_codecsp=c.st_codecsp
                where b.dist_sea>0
        ) as sub; -- 12 (R=12), 1814-169 NULLS (R=2027 including zero), 457 (R=457) OK
/*
* TABLE TO SHOW THE RESULTS IN A MAP FOR THOSE THAT WERE NOT SELECTED TOO
*/

drop table if exists geobs2010.bdmap_ccm_full;
create table geobs2010.bdmap_ccm_full as (
        select b.st_codecsp, 
        b.dist_source as dist_source_bdcar,
        c.dist_source_ccm/1000 as dist_source_ccm,
        (((c.dist_source_ccm/(1000*b.dist_source))<1.3) AND
        (c.dist_source_ccm/(1000*b.dist_source))>0.7) OR
        ((c.dist_source_ccm/1000-b.dist_source)> -20 AND
        (c.dist_source_ccm/1000-b.dist_source)<20) AS dist_source_ratio,
        b.dist_sea as dist_sea_bdcar,
        c.dist_sea_ccm/1000 as dist_sea_ccm,
        NOT((c.dist_sea_ccm/(1000*b.dist_sea))>2 and c.dist_sea_ccm>100000) as dist_sea_ratio,
        b.strahler as strahler_bdcar,
        c.strahler_ccm as strahler_ccm,
        (c.strahler_ccm-b.strahler)<=1 as strahler_diff,
        b.distance as distproj_bdcar,
        c.distance as distproj_ccm,
        id_trhyd,
        gid,
        c.the_geom
        from geobs2010.roe_bd_carthage b join geobs2010.bdmap_ccm2 c on b.st_codecsp=c.st_codecsp
        where b.dist_sea>0);

v0.1 There is a problem of distance source too short for bd_carthage this trouble was already seen on the first figure on this page

source:data/Docs/trac/BDMAP/stations_selection_1.jpg

Reference for the qgis project for Cédric c/eda/bdmpa/projection_bdmap.qgs En normandie, pas mal de segments manquants.. Pourquoi ? Le ratio distance source est null, l'autoriser Problèmes de calculs bd_carthage sur les rivières en réseau...

10896 station sp2=> 8884 projetées à 300m => 6896 après projection final Après examen détaillé il s'agit plus de calculs de distances sources que de réels problèmes de projection. A vérifier sur les pb de rang de strahler....