@dsmedia commented Oct 31, 2025

This PR corrects the coordinate signs in airports.csv for eight airports in US Pacific territories that were placed in the wrong hemisphere. The errors were discovered while preparing metadata updates for airports.csv in #728; they were cross-checked against FAA data and confirmed visually on Google Maps. They do not appear to have been introduced intentionally when the dataset was added to this repo 10 years ago. No impact is expected on Vega/VL/Altair gallery examples: the only known gallery use of airports.csv is in flight map charts, and these eight airports do not appear in the flight schedule data.

Problem

These airports had incorrect hemisphere signs, causing them to appear thousands of kilometers from their actual locations. No other major discrepancies were found in airports.csv when compared against official FAA coordinates.
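A hemisphere sanity check per territory is one way to catch this class of error. The sketch below is illustrative only and is not the method used for this PR: `Airport`, `EXPECTED_SIGNS`, and `find_hemisphere_errors` are hypothetical names, the territory codes (AS, CQ, GU) are assumed to match the `state` column of airports.csv, and the sample rows reuse coordinates listed under Changes.

```python
from typing import NamedTuple


class Airport(NamedTuple):
    code: str
    state: str
    latitude: float
    longitude: float


# Hypothetical lookup: expected sign of (latitude, longitude) per territory.
# American Samoa is in the Southern and Western Hemispheres; Guam and the
# Northern Mariana Islands are in the Northern and Eastern Hemispheres.
EXPECTED_SIGNS = {"AS": (-1, -1), "CQ": (1, 1), "GU": (1, 1)}


def sign(value: float) -> int:
    """Return +1 for non-negative values and -1 for negative values."""
    return 1 if value >= 0 else -1


def find_hemisphere_errors(airports: list[Airport]) -> list[tuple[str, str]]:
    """Return (code, field) pairs whose sign contradicts the expected hemisphere."""
    errors = []
    for airport in airports:
        expected = EXPECTED_SIGNS.get(airport.state)
        if expected is None:
            continue  # no expectation recorded for this territory
        if sign(airport.latitude) != expected[0]:
            errors.append((airport.code, "latitude"))
        if sign(airport.longitude) != expected[1]:
            errors.append((airport.code, "longitude"))
    return errors


sample = [
    Airport("FAQ", "AS", 14.21577583, -169.4239058),   # pre-fix values
    Airport("GUM", "GU", 13.48345, -144.7959825),      # pre-fix values
    Airport("PPG", "AS", -14.33102278, -170.7105258),  # corrected values
]
print(find_hemisphere_errors(sample))
# → [('FAQ', 'latitude'), ('GUM', 'longitude')]
```

A check like this only covers territories with an entry in the lookup, so it complements rather than replaces a comparison against FAA data.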

Changes

### American Samoa airports - Latitude sign corrected from positive to negative (Northern → Southern Hemisphere)

| Airport | Code | INCORRECT (Northern Pacific) | CORRECT (American Samoa) |
|---|---|---|---|
| Fitiuta | FAQ | 14.21577583, -169.4239058 (14.216°N, 169.424°W) | -14.21577583, -169.4239058 (14.216°S, 169.424°W) |
| Pago Pago International | PPG | 14.33102278, -170.7105258 (14.331°N, 170.711°W) | -14.33102278, -170.7105258 (14.331°S, 170.711°W) |
| Ofu | Z08 | 14.18435056, -169.6700236 (14.184°N, 169.670°W) | -14.18435056, -169.6700236 (14.184°S, 169.670°W) |

### Mariana Islands/Guam airports - Longitude sign corrected from negative to positive (Western → Eastern Hemisphere)

| Airport | Code | INCORRECT (Eastern Pacific) | CORRECT (Western Pacific) |
|---|---|---|---|
| Rota International | GRO | 14.1743075, -145.2425353 (14.174°N, 145.243°W) | 14.1743075, 145.2425353 (14.174°N, 145.243°E) |
| Saipan International | GSN | 15.11900139, -145.7293561 (15.119°N, 145.729°W) | 15.11900139, 145.7293561 (15.119°N, 145.729°E) |
| West Tinian | TNI | 14.99685028, -145.6180383 (14.997°N, 145.618°W) | 14.99685028, 145.6180383 (14.997°N, 145.618°E) |
| Pagan Airstrip | TT01 | 18.12444444, -145.7686111 (18.124°N, 145.769°W) | 18.12444444, 145.7686111 (18.124°N, 145.769°E) |
| Guam International | GUM | 13.48345, -144.7959825 (13.483°N, 144.796°W) | 13.48345, 144.7959825 (13.483°N, 144.796°E) |

Verification

Coordinates verified against FAA National Airspace System Resource (NASR) data (30 Oct 2025 release). This change only flips the hemisphere signs; the original coordinate precision is preserved.
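To illustrate what "only flips the hemisphere signs" means in practice, here is a minimal sketch with two hypothetical helpers. `is_pure_sign_flip` tests whether a value differs from a reference only in sign, and `flip_sign_text` negates a coordinate as text so that no digits are changed by float round-tripping. Neither helper is part of the PR; the tolerance and the choice to operate on strings are assumptions.

```python
import math


def is_pure_sign_flip(value: float, reference: float, tol: float = 1e-6) -> bool:
    """True when value and reference agree in magnitude but differ in sign."""
    same_magnitude = math.isclose(abs(value), abs(reference), abs_tol=tol)
    opposite_sign = value * reference < 0
    return same_magnitude and opposite_sign


def flip_sign_text(text: str) -> str:
    """Negate a decimal string without touching its digits."""
    text = text.strip()
    return text[1:] if text.startswith("-") else "-" + text


# FAQ (Fitiuta): pre-fix latitude vs corrected latitude from the table above
print(is_pure_sign_flip(14.21577583, -14.21577583))    # latitude was flipped
print(is_pure_sign_flip(-169.4239058, -169.4239058))   # longitude was already right
print(flip_sign_text("-145.2425353"))                  # GRO longitude as text
```

Editing the sign at the text level is one way to guarantee that the stored digits, and therefore the coordinate precision, stay exactly as they were.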

Impact

  • 8 airports will now appear in their correct geographic locations
  • No changes to coordinate precision or other airport data
  • American Samoa airports moved ~3,168 km from Northern Pacific to their actual South Pacific locations
  • Mariana Islands/Guam airports moved ~7,382 km from Eastern Pacific to their actual Western Pacific locations
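
The displacement figures above can be approximated with a great-circle calculation. This is a rough sketch using the haversine formula on a spherical Earth with an assumed mean radius of 6371.0088 km; `haversine_km` is a hypothetical helper, and per-airport results will differ somewhat from the group figures quoted in the list.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Pago Pago (PPG): latitude sign flipped, longitude unchanged
ppg_shift = haversine_km(14.33102278, -170.7105258, -14.33102278, -170.7105258)

# Guam (GUM): longitude sign flipped, latitude unchanged
gum_shift = haversine_km(13.48345, -144.7959825, 13.48345, 144.7959825)

print(f"PPG moved about {ppg_shift:,.0f} km")
print(f"GUM moved about {gum_shift:,.0f} km")
```

Because the haversine formula takes the shorter arc, the Guam shift is measured across the antimeridian rather than the long way around through the prime meridian.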

Flight Data Impact

None of these 8 airports appear in any of the flights datasets in this repository (flights-airport.csv, flights-2k.json, flights-5k.json, flights-10k.json, flights-20k.json, flights-200k.json, flights-3m.parquet, flights-200k.arrow). These are small regional airports in remote Pacific territories that don't have commercial flights in the main US flight tracking datasets. This fix will only impact direct usage of airports.csv for mapping or geographic analysis.
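
A membership check of this kind can be sketched as follows. The column names `origin` and `destination` are assumptions about the flights files, `codes_in_flights` is a hypothetical helper, and the inline rows are made-up placeholders rather than actual repository data.

```python
import csv
import io

# The eight airport codes corrected in this PR
CORRECTED_CODES = {"FAQ", "PPG", "Z08", "GRO", "GSN", "TNI", "TT01", "GUM"}


def codes_in_flights(rows, codes):
    """Return the subset of codes that appear as an origin or destination."""
    found = set()
    for row in rows:
        found.update(codes & {row["origin"], row["destination"]})
    return found


# Placeholder rows for illustration only; replace with a real flights file,
# e.g. csv.DictReader(open("data/flights-airport.csv", newline="")).
SAMPLE = "origin,destination,count\nATL,DFW,1\nSFO,LAX,2\n"
rows = csv.DictReader(io.StringIO(SAMPLE))
overlap = codes_in_flights(rows, CORRECTED_CODES)
print(overlap or "no corrected airports appear in the flights data")
```

An empty result for every flights file is what the statement above asserts; any non-empty set would identify a chart that could change after this fix.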

Replication code for data table comparison with FAA and map visualization with Leafmap
#!/usr/bin/env python3
"""
Generate airport coordinate comparison and reference imagery.

This script:
1. Downloads current FAA NASR data
2. Reads vega-datasets airports.csv
3. Identifies airports with coordinate discrepancies
4. Generates both a comparison table (markdown) and reference panel (PNG)

All coordinates are read programmatically from data sources.
"""

import math
import io
import requests
import zipfile
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont
import pandas as pd


def download_faa_data(url: str, extracted_file: str) -> pd.DataFrame:
    """
    Download and extract FAA NASR airport data.

    Args:
        url: URL to FAA NASR ZIP file
        extracted_file: Name of CSV file within the ZIP

    Returns:
        DataFrame with FAA airport data
    """
    print(f"Downloading FAA data from {url}...")
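    response = requests.get(url, timeout=60)
    response.raise_for_status()  # Fail early on HTTP errors instead of unzipping an error page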

    print(f"Extracting {extracted_file}...")
    with zipfile.ZipFile(io.BytesIO(response.content), 'r') as zip_ref:
        zip_ref.extract(extracted_file)

    # Columns 44-46 need to be read as strings to avoid precision issues
    dtype_spec = {44: 'object', 45: 'object', 46: 'object'}
    faa_data = pd.read_csv(extracted_file, dtype=dtype_spec)

    print(f"✓ Loaded {len(faa_data)} airports from FAA NASR")
    return faa_data


def load_vega_airports(csv_path: str) -> pd.DataFrame:
    """
    Load vega-datasets airports.csv.

    Args:
        csv_path: Path to local airports.csv file

    Returns:
        DataFrame with vega airports data
    """
    print(f"Loading vega-datasets airports from {csv_path}...")
    airports = pd.read_csv(csv_path)
    print(f"✓ Loaded {len(airports)} airports from vega-datasets")
    return airports


def find_coordinate_discrepancies(
    vega_df: pd.DataFrame,
    faa_df: pd.DataFrame,
    iata_codes: list[str] | None = None,
    threshold: float = 0.01
) -> pd.DataFrame:
    """
    Find airports with coordinate discrepancies between vega and FAA data.

    Args:
        vega_df: Vega datasets airports DataFrame
        faa_df: FAA NASR airports DataFrame
        iata_codes: Optional list of specific IATA codes to check
        threshold: Minimum difference to consider a discrepancy (degrees)

    Returns:
        DataFrame with airports that have discrepancies
    """
    print("Comparing coordinates...")

    if iata_codes:
        vega_subset = vega_df[vega_df['iata'].isin(iata_codes)].copy()
    else:
        vega_subset = vega_df.copy()

    # Merge with FAA data
    merged = vega_subset.merge(
        faa_df[['ARPT_ID', 'LAT_DECIMAL', 'LONG_DECIMAL', 'ARPT_NAME']],
        left_on='iata',
        right_on='ARPT_ID',
        how='inner'
    )

    # Calculate differences
    merged['lat_diff'] = abs(merged['latitude'] - merged['LAT_DECIMAL'])
    merged['lon_diff'] = abs(merged['longitude'] - merged['LONG_DECIMAL'])

    # Find discrepancies
    discrepancies = merged[
        (merged['lat_diff'] > threshold) | (merged['lon_diff'] > threshold)
    ].copy()

    print(f"✓ Found {len(discrepancies)} airports with coordinate discrepancies")

    return discrepancies


def generate_comparison_table(
    vega_df: pd.DataFrame,
    faa_df: pd.DataFrame,
    iata_codes: list[str],
    output_path: str = "pr/coordinate_comparison.md"
) -> None:
    """
    Generate markdown comparison table.

    Args:
        vega_df: Vega datasets airports DataFrame
        faa_df: FAA NASR airports DataFrame
        iata_codes: List of IATA codes to include
        output_path: Path to save markdown file
    """
    print("Generating comparison table...")

    # Prepare vega data
    vega_for_comparison = vega_df[vega_df['iata'].isin(iata_codes)].copy()
    vega_for_comparison['Coordinates (Vega)'] = vega_for_comparison.apply(
        lambda row: f"({round(row['latitude'], 4)}, {round(row['longitude'], 4)})",
        axis=1
    )
    vega_for_comparison = vega_for_comparison[
        ['iata', 'name', 'city', 'state', 'country', 'Coordinates (Vega)']
    ].rename(columns={
        'iata': 'IATA',
        'name': 'Name',
        'city': 'City',
        'state': 'State',
        'country': 'Country'
    })

    # Prepare FAA data
    faa_coords = faa_df[faa_df['ARPT_ID'].isin(iata_codes)][
        ['ARPT_ID', 'LAT_DECIMAL', 'LONG_DECIMAL']
    ].copy()
    faa_coords['Coordinates (FAA)'] = faa_coords.apply(
        lambda row: f"({round(row['LAT_DECIMAL'], 4)}, {round(row['LONG_DECIMAL'], 4)})",
        axis=1
    )
    faa_coords = faa_coords[['ARPT_ID', 'Coordinates (FAA)']].rename(
        columns={'ARPT_ID': 'IATA'}
    )

    # Merge and sort
    comparison_table = pd.merge(vega_for_comparison, faa_coords, on='IATA', how='left')
    comparison_table_sorted = comparison_table.sort_values(by='State')

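    # Save to markdown (DataFrame.to_markdown needs the optional `tabulate` package)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)  # ensure pr/ exists
    with open(output_path, 'w', encoding='utf-8') as f: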
        f.write("# Coordinate Comparison: Vega Datasets vs FAA Data\n\n")
        f.write("Comparison of problematic airport coordinates between Vega Datasets and FAA official data.\n\n")
        f.write(comparison_table_sorted.to_markdown(index=False))
        f.write("\n")

    print(f"✓ Saved comparison table to {output_path}")


# Tile fetching and image generation functions
def deg2num(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert lat/lon to tile coordinates."""
    lat_rad = math.radians(lat_deg)
    n = 2.0 ** zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return (xtile, ytile)


def deg2num_precise(lat_deg: float, lon_deg: float, zoom: int) -> tuple[float, float]:
    """Convert lat/lon to precise tile coordinates (with fractional parts)."""
    lat_rad = math.radians(lat_deg)
    n = 2.0 ** zoom
    xtile = (lon_deg + 180.0) / 360.0 * n
    ytile = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    return (xtile, ytile)


def get_satellite_tile(xtile: int, ytile: int, zoom: int) -> Image.Image | None:
    """Fetch a satellite tile from ESRI World Imagery."""
    url = f"https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{zoom}/{ytile}/{xtile}"
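    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return Image.open(io.BytesIO(response.content))
        # Report non-200 responses instead of silently falling back to a gray tile
        print(f"   Warning: Tile {xtile},{ytile} returned HTTP {response.status_code}")
    except Exception as e: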
        print(f"   Warning: Failed to fetch tile {xtile},{ytile}: {e}")
    return None


def create_airport_satellite_image(
    iata: str,
    name: str,
    location: str,
    lat: float,
    lon: float,
    width: int = 600,
    height: int = 400,
    zoom: int = 13
) -> Image.Image:
    """
    Create a satellite image for an airport by fetching tiles.

    Args:
        iata: Airport IATA code
        name: Airport name
        location: Location description (territory/state)
        lat: Latitude (FAA official)
        lon: Longitude (FAA official)
        width: Image width in pixels
        height: Image height in pixels
        zoom: Tile zoom level

    Returns:
        PIL Image with satellite view and text overlay
    """
    # Get the tile containing the airport center
    center_x, center_y = deg2num(lat, lon, zoom)

    # Get precise position within tile grid (with fractional parts)
    precise_x, precise_y = deg2num_precise(lat, lon, zoom)

    # Fetch 3x3 grid of tiles centered on the airport
    tiles = []
    for dy in [-1, 0, 1]:
        row = []
        for dx in [-1, 0, 1]:
            tile = get_satellite_tile(center_x + dx, center_y + dy, zoom)
            if tile is None:
                # Create blank tile if fetch fails
                tile = Image.new('RGB', (256, 256), color='gray')
            row.append(tile)
        tiles.append(row)

    # Composite tiles into single image (3x3 tiles = 768x768 pixels)
    tile_composite = Image.new('RGB', (768, 768))
    for i, row in enumerate(tiles):
        for j, tile in enumerate(row):
            tile_composite.paste(tile, (j * 256, i * 256))

    # Calculate exact pixel position of airport in the 3x3 tile grid
    airport_pixel_x = ((precise_x - (center_x - 1)) * 256)
    airport_pixel_y = ((precise_y - (center_y - 1)) * 256)

    # Center the crop on the airport's exact pixel position
    crop_x = int(airport_pixel_x - width // 2)
    crop_y = int(airport_pixel_y - height // 2)

    # Ensure crop stays within bounds
    crop_x = max(0, min(crop_x, 768 - width))
    crop_y = max(0, min(crop_y, 768 - height))

    img = tile_composite.crop((crop_x, crop_y, crop_x + width, crop_y + height))

    # Add green dot marker with black border
    draw = ImageDraw.Draw(img)
    marker_x = int(airport_pixel_x - crop_x)
    marker_y = int(airport_pixel_y - crop_y)

    # Green dot with black border
    dot_radius = 8
    draw.ellipse(
        [marker_x - dot_radius, marker_y - dot_radius,
         marker_x + dot_radius, marker_y + dot_radius],
        fill='lime',
        outline='black',
        width=3
    )

    # Add text overlay
    try:
        font_large = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 32)
        font_medium = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 20)
        font_small = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 16)
    except Exception:
        font_large = ImageFont.load_default()
        font_medium = ImageFont.load_default()
        font_small = ImageFont.load_default()

    # Semi-transparent overlays
    overlay = Image.new('RGBA', img.size, (0, 0, 0, 0))
    overlay_draw = ImageDraw.Draw(overlay)

    # Top section
    overlay_draw.rectangle([(0, 0), (width, 100)], fill=(0, 0, 0, 180))
    # Bottom section
    overlay_draw.rectangle([(0, height - 60), (width, height)], fill=(0, 0, 0, 180))

    img = Image.alpha_composite(img.convert('RGBA'), overlay).convert('RGB')
    draw = ImageDraw.Draw(img)

    # Draw text - top
    draw.text((10, 10), iata, fill="white", font=font_large)
    draw.text((10, 50), name, fill="white", font=font_medium)
    draw.text((10, 75), location, fill="lightgray", font=font_small)

    # Draw coordinates - bottom (using signed decimal degrees to match table)
    coord_str = f"({lat:.4f}, {lon:.4f})"
    draw.text((10, height - 45), coord_str, fill="lightgreen", font=font_medium)
    draw.text((10, height - 20), "FAA NASR Oct 2025", fill="lightgray", font=font_small)

    return img


def generate_reference_panel(
    faa_df: pd.DataFrame,
    vega_df: pd.DataFrame,
    iata_codes: list[str],
    output_path: str = "pr/airports_reference_panel.png",
    cols: int = 2,
    panel_width: int = 600,
    panel_height: int = 400,
    zoom: int = 13,
    margin: int = 15,
    title: str = "Pacific Island Airports - FAA Official Coordinates",
) -> None:
    """
    Generate a grid panel of airport satellite images.

    Args:
        faa_df: FAA NASR DataFrame with official coordinates
        vega_df: Vega datasets DataFrame (for location info)
        iata_codes: List of IATA codes to include
        output_path: Path to save the composite image
        cols: Number of columns in grid
        panel_width: Width of each panel
        panel_height: Height of each panel
        zoom: Zoom level for satellite imagery
        margin: Margin between panels
        title: Title for the panel
    """
    rows = (len(iata_codes) + cols - 1) // cols

    # Calculate composite dimensions
    title_height = 80
    footnote_height = 50
    composite_width = (panel_width * cols) + (margin * (cols + 1))
    composite_height = title_height + (panel_height * rows) + (margin * (rows + 1)) + footnote_height

    # Create blank composite
    composite = Image.new('RGB', (composite_width, composite_height), color='black')
    draw = ImageDraw.Draw(composite)

    # Add title
    try:
        title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 40)
        subtitle_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 20)
        footnote_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 16)
    except Exception:
        title_font = ImageFont.load_default()
        subtitle_font = ImageFont.load_default()
        footnote_font = ImageFont.load_default()

    draw.text((margin, 15), title, fill="white", font=title_font)
    draw.text(
        (margin, 58),
        "Corrected coordinates showing actual airport locations",
        fill="lightgray",
        font=subtitle_font,
    )

    # Generate and place each airport image
    print(f"Generating {len(iata_codes)} airport satellite images...")
    for idx, iata in enumerate(iata_codes):
        # Get FAA data
        faa_row = faa_df[faa_df['ARPT_ID'] == iata].iloc[0]
        lat = faa_row['LAT_DECIMAL']
        lon = faa_row['LONG_DECIMAL']

        # Get vega data for location info
        vega_row = vega_df[vega_df['iata'] == iata].iloc[0]
        name = vega_row['name']
        state = vega_row['state']

        # Determine location label
        if state == 'AS':
            location = "American Samoa"
        elif state == 'CQ':
            location = "N. Mariana Islands"
        elif state == 'GU':
            location = "Guam"
        else:
            location = state

        print(f"  [{idx+1}/{len(iata_codes)}] {iata} - {name}...")

        # Generate satellite image
        airport_img = create_airport_satellite_image(
            iata, name, location, lat, lon, panel_width, panel_height, zoom
        )

        # Calculate position in grid
        row = idx // cols
        col = idx % cols

        x = margin + (col * (panel_width + margin))
        y = title_height + margin + (row * (panel_height + margin))

        # Paste into composite
        composite.paste(airport_img, (x, y))

    # Add footnote explaining coordinate convention
    footnote_y = composite_height - footnote_height + 10
    draw.text(
        (margin, footnote_y),
        "Coordinate Convention: (latitude, longitude) in signed decimal degrees",
        fill="lightgray",
        font=footnote_font,
    )
    draw.text(
        (margin, footnote_y + 22),
        "Negative latitude = South (S) | Positive latitude = North (N) | Negative longitude = West (W) | Positive longitude = East (E)",
        fill="darkgray",
        font=footnote_font,
    )

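    # Save composite, creating the output directory if needed
    # (PNG is lossless, so no JPEG-style quality setting is passed)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    composite.save(output_path)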
    print(f"✓ Panel saved to: {output_path}")
    print(f"   Dimensions: {composite_width}x{composite_height}px")
    print(f"   Grid: {rows} rows × {cols} columns")


def main():
    """Main execution function."""
    print("="*80)
    print("AIRPORT COORDINATE REFERENCE GENERATOR")
    print("="*80)
    print()

    # Configuration
    FAA_URL = "https://nfdc.faa.gov/webContent/28DaySub/extra/30_Oct_2025_APT_CSV.zip"
    FAA_FILE = "APT_BASE.csv"
    VEGA_CSV = "data/airports.csv"
    PROBLEMATIC_IATA = ['FAQ', 'GRO', 'GSN', 'GUM', 'PPG', 'TNI', 'TT01', 'Z08']

    # Step 1: Download FAA data
    faa_df = download_faa_data(FAA_URL, FAA_FILE)

    # Step 2: Load vega-datasets
    vega_df = load_vega_airports(VEGA_CSV)

    # Step 3: Generate comparison table
    print()
    generate_comparison_table(vega_df, faa_df, PROBLEMATIC_IATA)

    # Step 4: Generate reference panel
    print()
    generate_reference_panel(faa_df, vega_df, PROBLEMATIC_IATA)

    print()
    print("="*80)
    print("COMPLETE!")
    print("="*80)
    print("\nGenerated files:")
    print("  • pr/coordinate_comparison.md - Comparison table")
    print("  • pr/airports_reference_panel.png - Reference imagery panel")


if __name__ == "__main__":
    main()
[Image: airports_reference_panel]
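
The tile lookup in the script uses the standard slippy-map (Web Mercator) conversion, which makes the effect of a flipped longitude easy to see in tile space. The sketch below repeats that conversion as a standalone function and compares the pre-fix and corrected Guam coordinates at zoom 13, the script's default. It makes no network requests.

```python
import math


def deg2num(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert lat/lon to integer slippy-map tile indices (x, y)."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile


ZOOM = 13
wrong = deg2num(13.48345, -144.7959825, ZOOM)  # pre-fix GUM coordinates
right = deg2num(13.48345, 144.7959825, ZOOM)   # corrected GUM coordinates

print("pre-fix tile:  ", wrong)
print("corrected tile:", right)
print("tile columns apart:", right[0] - wrong[0])
```

Since only the longitude sign changes, both points share the same tile row and differ only in tile column, which is why the pre-fix imagery would show open ocean at the same latitude.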

Corrects coordinate signs for 8 airports in US Pacific territories:

American Samoa (should be Southern Hemisphere):
- FAQ (Fitiuta): latitude corrected to negative
- PPG (Pago Pago International): latitude corrected to negative
- Z08 (Ofu): latitude corrected to negative

Mariana Islands/Guam (should be Eastern Hemisphere):
- GRO (Rota International): longitude corrected to positive
- GSN (Saipan International): longitude corrected to positive
- TNI (West Tinian): longitude corrected to positive
- TT01 (Pagan Airstrip): longitude corrected to positive
- GUM (Guam International): longitude corrected to positive

These airports were incorrectly placed in opposite hemispheres,
causing significant geographic errors (e.g., showing in the eastern
Pacific instead of the western Pacific).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@dsmedia requested a review from @domoritz on October 31, 2025 02:40
@dsmedia added the dataset-content label (Pull requests that add, modify, or update dataset contents) on Oct 31, 2025
@dsmedia changed the title from "Fix: Correct Hemisphere Signs for Pacific Territories Airports (airports.csv)" to "fix: Correct Hemisphere Signs for Pacific Territories Airports (airports.csv)" on Oct 31, 2025
@dsmedia merged commit cde4370 into vega:main on Nov 3, 2025
3 of 4 checks passed
@dsmedia deleted the fix-pacific-airport-coordinates branch on November 3, 2025 03:07