Converting mixed EPSG codes to a unified CRS in Python

To convert mixed EPSG codes to a unified CRS in Python, read each dataset’s spatial reference, validate it against the EPSG registry using pyproj, and apply GeoDataFrame.to_crs() or rasterio.warp.reproject() to your target projection. The most reliable ETL pattern wraps this in a validation layer that catches undefined, deprecated, or malformed CRS strings before transformation, logs discrepancies, and enforces strict or permissive fallback behavior depending on pipeline tolerance.

Because a single GeoDataFrame can only store one .crs attribute, handling truly mixed EPSG codes requires processing multiple files or frames sequentially, normalizing each, and concatenating the results. Below is the production-ready pattern used by GIS analysts and data engineers for spatial consistency.
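
Before the full validated function, the sequential normalize-then-concatenate idea can be sketched in a few lines. This is a minimal illustration only, with no validation layer. The two frames, their names, and the sample coordinates are hypothetical and exist purely to show that each input is reprojected on its own before a single concat.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Two hypothetical single-point frames, each carrying its own CRS.
utm_frame = gpd.GeoDataFrame(
    {"name": ["utm_point"]},
    geometry=[Point(500000, 0)],  # on the central meridian of UTM zone 31N
    crs="EPSG:32631",
)
wgs_frame = gpd.GeoDataFrame(
    {"name": ["wgs_point"]},
    geometry=[Point(3.0, 0.0)],
    crs="EPSG:4326",
)

# Normalize each frame to the target CRS first, then concatenate once.
target = "EPSG:4326"
normalized = [frame.to_crs(target) for frame in (utm_frame, wgs_frame)]
unified = pd.concat(normalized, ignore_index=True)

print(unified.crs.to_epsg())
print(unified)
```

Concatenating only after every frame shares the target CRS keeps the merged frame's single .crs attribute truthful for every row.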

Core Validation & Transformation Workflow

  1. Parse & Resolve: Use pyproj.CRS.from_user_input() to normalize WKT, PROJ strings, or EPSG integers into a canonical CRS object.
  2. Registry Validation: Call .to_epsg(), which returns None when the CRS does not match an entry in the EPSG Geodetic Parameter Registry. This flags legacy or custom projections that lack official authority codes.
  3. Strict vs Permissive Mode:
  • strict=True: Raises CRSError on undefined, ambiguous, or unresolvable CRS definitions. Ideal for regulated or reproducible pipelines.
  • strict=False: Logs warnings, skips transformation, or assigns the target CRS as a fallback. Useful for exploratory data cleaning.
  4. Batch Transform & Concatenate: Convert each validated frame to the target CRS, then merge using pandas.concat(). This avoids in-place geometry corruption and preserves original metadata for auditing.
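
The parse, validate, and strict-versus-permissive steps above can be sketched as one small helper. The function name resolve_crs and its return convention (None for an unresolvable input in permissive mode) are assumptions made for illustration; only CRS.from_user_input(), .to_epsg(), and CRSError come from pyproj itself.

```python
import logging
from typing import Optional

from pyproj import CRS
from pyproj.exceptions import CRSError

logger = logging.getLogger(__name__)


def resolve_crs(value, strict: bool = True) -> Optional[CRS]:
    """Hypothetical helper: normalize a CRS input and check for an EPSG match."""
    try:
        # Accepts EPSG integers, "EPSG:xxxx" strings, WKT, and PROJ strings.
        crs = CRS.from_user_input(value)
    except CRSError:
        if strict:
            raise  # strict mode: surface the malformed definition immediately
        logger.warning("Unresolvable CRS input: %r", value)
        return None  # permissive mode: let the caller decide how to fall back

    # to_epsg() returns None when no registry entry matches the definition.
    if crs.to_epsg() is None:
        logger.warning("No EPSG match for CRS: %s", crs.name)
    return crs


print(resolve_crs(4326).name)
print(resolve_crs("EPSG:3857").to_epsg())
print(resolve_crs("not-a-crs", strict=False))
```

Returning the CRS object even when no EPSG code matches is a design choice in this sketch: a custom projection can still be transformed, so the missing code is logged rather than treated as fatal.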

Production-Ready Conversion Function

import geopandas as gpd
import pyproj
from pyproj.exceptions import CRSError
import logging
from pathlib import Path
from typing import Iterable, Union
import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)

def unify_mixed_epsg(
    datasets: Iterable[Union[gpd.GeoDataFrame, Path, str]],
    target_epsg: int = 4326,
    strict: bool = True
) -> gpd.GeoDataFrame:
    """
    Validates and converts multiple datasets with mixed EPSG codes to a unified CRS.
    """
    target_crs = pyproj.CRS.from_epsg(target_epsg)
    logger.info(f"Target CRS: {target_crs.name} (EPSG:{target_epsg})")

    converted_frames = []
    
    for i, ds in enumerate(datasets):
        # Load if path, copy if already a GeoDataFrame
        gdf = gpd.read_file(ds) if isinstance(ds, (Path, str)) else ds.copy()
        
        if gdf.empty:
            logger.warning(f"Dataset {i} is empty. Skipping.")
            continue

        source_crs = gdf.crs
        if source_crs is None:
            if strict:
                raise CRSError(f"Dataset {i} has undefined CRS in strict mode.")
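            # Permissive fallback: this only labels the data. It assumes the
            # coordinates are already expressed in the target CRS.
            logger.warning(f"Dataset {i} CRS undefined. Assigning target CRS.")
            gdf = gdf.set_crs(target_crs)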
            converted_frames.append(gdf)
            continue

        try:
            crs_obj = pyproj.CRS.from_user_input(source_crs)
            epsg = crs_obj.to_epsg()
            if epsg is None:
                logger.warning(f"Dataset {i} lacks EPSG code: {crs_obj.to_string()}")
            else:
                logger.info(f"Dataset {i} validated: EPSG:{epsg}")
        except CRSError as e:
            if strict:
                raise CRSError(f"Dataset {i} invalid CRS: {e}") from e
            logger.error(f"Dataset {i} validation failed: {e}. Proceeding with raw CRS.")

        try:
            converted_frames.append(gdf.to_crs(target_crs))
        except Exception as e:
            logger.error(f"Transformation failed for dataset {i}: {e}")
            if strict:
                raise

    if not converted_frames:
        logger.warning("No valid datasets to concatenate.")
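        # Return an empty frame that still carries a geometry column and the target CRS
        return gpd.GeoDataFrame(geometry=[], crs=target_crs)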

    return pd.concat(converted_frames, ignore_index=True)

Implementation & Environment Notes

  • Library Baseline: Requires geopandas>=0.12.0 and pyproj>=3.0.0. Older pyproj releases relied on the now-deprecated +init=epsg:XXXX syntax, whose axis-order handling differs from the modern CRS API and can produce silently swapped coordinates.
  • PROJ Database Synchronization: pyproj delegates authority lookups to the proj.db database in the underlying PROJ data directory. In containerized or air-gapped environments, an outdated database may fail to resolve recently added EPSG codes or may lack newer transformation definitions. Check the active directory with pyproj.datadir.get_data_dir() and mount or update it (commonly /usr/share/proj) to keep the registry current. Note that pyproj.network.set_network_enabled(True) only lets PROJ download transformation grids on demand; it does not update the EPSG database itself. Reference the official pyproj documentation for environment configuration.
  • Performance Considerations: Avoid repeated .to_crs() calls on the same frame. The function above processes inputs sequentially and concatenates once, avoiding the repeated copies that incremental appends would create. For raster-heavy workflows, swap gdf.to_crs() with rasterio.warp.reproject() and align grid resolutions before merging.
  • Geometry Integrity: Always verify output bounds after transformation. Coordinate rounding errors are typically negligible, but datum shifts (e.g., NAD27 → WGS84) can introduce offsets ranging from sub-meter to tens of meters or more when the appropriate transformation grid is unavailable. Use gdf.total_bounds and gdf.is_valid post-conversion to catch misplaced or invalid geometries.
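
The geometry integrity check in the last note can be sketched as a small post-transform helper. The function name check_reprojected, the returned dictionary shape, and the WGS84_ENVELOPE constant are assumptions for illustration; the check itself relies only on total_bounds and is_valid.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon


def check_reprojected(gdf, expected_bounds):
    """Hypothetical post-transform check: bounds envelope plus geometry validity."""
    minx, miny, maxx, maxy = gdf.total_bounds
    exp_minx, exp_miny, exp_maxx, exp_maxy = expected_bounds
    in_bounds = bool(
        minx >= exp_minx and miny >= exp_miny
        and maxx <= exp_maxx and maxy <= exp_maxy
    )
    # is_valid is a boolean Series; count the rows that fail the validity test.
    invalid_count = int((~gdf.is_valid).sum())
    return {"in_bounds": in_bounds, "invalid_count": invalid_count}


# Assumed envelope for a geographic target CRS such as EPSG:4326.
WGS84_ENVELOPE = (-180.0, -90.0, 180.0, 90.0)

points = gpd.GeoDataFrame(
    geometry=[Point(3.0, 0.0), Point(-70.5, 42.1)], crs="EPSG:4326"
)
bowtie = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])],  # self-intersecting ring
    crs="EPSG:4326",
)

print(check_reprojected(points, WGS84_ENVELOPE))
print(check_reprojected(bowtie, WGS84_ENVELOPE))
```

A bounds envelope only catches gross errors such as swapped axes or a wrong source CRS. Small datum offsets still need comparison against known control points.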

Pipeline Integration

Embedding this normalization step early prevents downstream spatial join failures, incorrect distance calculations, and visualization misalignment. Teams implementing broader CRS Normalization Across Mixed Datasets strategies typically wrap this function in a DAG node that runs before schema validation and attribute harmonization.

For production deployments, pair the transformation with a metadata ledger that records original EPSG codes, transformation parameters, and validation outcomes. This audit trail satisfies data governance requirements and simplifies debugging when coordinate mismatches surface in analytics or mapping layers. Integrating this validation into your Automated Vector & Raster Cleaning Workflows ensures spatial consistency scales across batch jobs, streaming ingest, and multi-source geospatial lakes.
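
One possible shape for such a metadata ledger is sketched below using only the standard library. The LedgerEntry fields, the status strings, and the JSON Lines output format are assumptions chosen for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LedgerEntry:
    """Hypothetical audit record for one dataset passing through normalization."""
    dataset_id: str
    source_epsg: Optional[int]  # None when the source CRS had no EPSG match
    target_epsg: int
    status: str                 # e.g. "converted", "fallback", "failed"
    detail: str = ""
    recorded_at: str = ""


def record_entry(ledger: List[LedgerEntry], dataset_id: str,
                 source_epsg: Optional[int], target_epsg: int,
                 status: str, detail: str = "") -> LedgerEntry:
    """Append a timestamped entry to the in-memory ledger and return it."""
    entry = LedgerEntry(
        dataset_id=dataset_id,
        source_epsg=source_epsg,
        target_epsg=target_epsg,
        status=status,
        detail=detail,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    ledger.append(entry)
    return entry


def dump_ledger(ledger: List[LedgerEntry]) -> str:
    """Serialize the ledger as JSON Lines, one entry per line."""
    return "\n".join(json.dumps(asdict(entry)) for entry in ledger)


ledger: List[LedgerEntry] = []
record_entry(ledger, "parcels.shp", 32631, 4326, "converted")
record_entry(ledger, "legacy.geojson", None, 4326, "fallback",
             detail="CRS undefined; target CRS assigned")
print(dump_ledger(ledger))
```

Writing one JSON object per line keeps the ledger append-friendly, so batch jobs can add entries without rewriting earlier records.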