Unit 3: Spatial & Contextual Analysis
Where in the World?
Geomapping Digital Behavior

Learn to convert messy location data into powerful maps. Discover how spatial analysis reveals patterns invisible in other data, and when it becomes surveillance.

📍 Why Location Matters

Location data reveals patterns invisible in other data:

🦠 Public Health

COVID-19 Tracking (2020-2022)

  • Mapped infection clusters to identify hotspots
  • Tracked spread between neighborhoods
  • Targeted interventions to specific zip codes

✊ Social Movements

George Floyd Protests (2020)

  • Mapped 7,000+ protests across 2,000+ US cities
  • Showed movement spread from Minneapolis globally
  • Revealed which cities had most sustained activity

🏙️ Urban Planning

Food Deserts Analysis

  • Map grocery store locations vs population
  • Identify neighborhoods without fresh food access
  • Plan where to open new stores

🗳️ Political Analysis

Election Results Mapping

  • Visualize voting patterns by district
  • Show urban vs rural divides
  • Track demographic shifts over time

Common thread: Location adds a spatial dimension that reveals WHERE things happen and HOW patterns spread across space.

😡 The Challenge: Messy Location Data

Social media location data is a MESS:

Here's what you get when you extract location from 100 tweets:

✅ Clean

"New York, NY"

✅ Usable

"Brooklyn"

✅ GPS

40.7128° N, 74.0060° W

❌ Vague

"East Coast"

❌ Joke

"Mars"

❌ Meme

"In your mom's house"

❌ Abstract

"Everywhere and nowhere"

❌ Missing

[blank]

Reality Check:

Of 100 social media posts with "location":

  • ~30-40% are usable (real places)
  • ~20-30% are vague ("USA", "California")
  • ~30-40% are jokes, fake, or missing

You need to clean this data before mapping.

The Solution:

Geocoding - Converting location strings to standardized latitude/longitude coordinates

🌐 What is Geocoding?

Definition:

Geocoding = Converting a location description (address, city name, landmark) into geographic coordinates (latitude, longitude)

How it works:

Input (Location String) → Output (Coordinates)
"New York City" → 40.7128° N, 74.0060° W
"Eiffel Tower, Paris" → 48.8584° N, 2.2945° E
"1600 Pennsylvania Ave, DC" → 38.8977° N, 77.0365° W
"Los Angeles, CA" → 34.0522° N, 118.2437° W
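
The table writes coordinates with hemisphere letters, while most APIs and mapping libraries expect signed decimal degrees (south and west are negative). Here is a minimal sketch of that conversion. The function name `to_signed` is made up for illustration, and it assumes the input is shaped exactly like the table rows.

```python
import re

# Matches strings such as "40.7128° N, 74.0060° W" (the format used in the table).
_PATTERN = re.compile(
    r"^\s*([\d.]+)\s*°?\s*([NS])\s*,\s*([\d.]+)\s*°?\s*([EW])\s*$",
    re.IGNORECASE,
)

def to_signed(text: str) -> tuple[float, float]:
    """Convert 'LAT° N/S, LON° E/W' into signed (lat, lon) decimal degrees."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate format: {text!r}")
    lat_value, lat_hemi, lon_value, lon_hemi = match.groups()
    # South and west become negative numbers.
    lat = float(lat_value) * (-1 if lat_hemi.upper() == "S" else 1)
    lon = float(lon_value) * (-1 if lon_hemi.upper() == "W" else 1)
    return lat, lon

print(to_signed("40.7128° N, 74.0060° W"))  # (40.7128, -74.006)
print(to_signed("48.8584° N, 2.2945° E"))   # (48.8584, 2.2945)
```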

Geocoding Services:

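Google Maps Geocoding API

Often the most accurate; about $5 per 1,000 requests after the free tier (check current pricing)

Nominatim (OpenStreetMap)

Free and open-source, but generally less accurate; the public server allows at most 1 request per second

Mapbox Geocoding

Good accuracy, 100,000 free requests/month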

Why geocode?

Mapping software (Leaflet, Google Maps, Mapbox) needs coordinates, not text.

Geocoding = translating human-readable locations into machine-readable coordinates

🎬 Geocoding Demo

Scenario: You have 100 tweets with location strings. Let's geocode them.

Step 1: Clean Input Data

Raw Location | Cleaned | Valid?
"New York" | "New York, NY, USA" | ✅
"Brooklyn" | "Brooklyn, NY, USA" | ✅
"Mars" | [filter out] | ❌
"USA" | [too vague, skip] | ⚠️
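
The cleaning step can be sketched as a small classifier that runs before any geocoding request is made. The word lists and the `clean_location` function here are hypothetical placeholders; a real project would build its own lists from the data it actually sees.

```python
# Hypothetical word lists for illustration only.
JOKE_LOCATIONS = {"mars", "in your mom's house", "everywhere and nowhere"}
TOO_VAGUE = {"usa", "east coast", "earth"}
# Hypothetical expansions that add state and country context to short names.
EXPANSIONS = {"new york": "New York, NY, USA", "brooklyn": "Brooklyn, NY, USA"}

def clean_location(raw):
    """Return (status, cleaned), where status is 'ok', 'vague', 'invalid', or 'missing'."""
    text = (raw or "").strip()
    if not text:
        return "missing", None
    key = text.lower()
    if key in JOKE_LOCATIONS:
        return "invalid", None
    if key in TOO_VAGUE:
        return "vague", None
    # Anything else is passed on, expanded when we know a fuller form.
    return "ok", EXPANSIONS.get(key, text)

for raw in ["New York", "Brooklyn", "Mars", "USA", ""]:
    print(repr(raw), "->", clean_location(raw))
```

Filtering before geocoding also saves money on paid services, since joke and blank values never reach the API.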

Step 2: Geocode with API

Input: "New York, NY, USA"
API Call: Google Maps Geocoding API
Output: {"lat": 40.7128, "lng": -74.0060}
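
Many tweets share the same location string, so it can help to cache results and send each distinct string to the service only once. The sketch below assumes a `lookup` function that takes a string and returns `(lat, lng)` or `None`. In practice it would wrap a call to a geocoding service; here it is a hard-coded stand-in (`fake_lookup`, a hypothetical name) so the example runs offline.

```python
from typing import Callable, Optional

Coordinates = tuple[float, float]

def make_cached_geocoder(lookup: Callable[[str], Optional[Coordinates]]):
    """Wrap a lookup function so each distinct string is sent to it only once."""
    cache: dict[str, Optional[Coordinates]] = {}

    def geocode(location: str) -> Optional[Coordinates]:
        key = location.strip().lower()
        if key not in cache:
            cache[key] = lookup(location.strip())
        return cache[key]

    return geocode

# Stand-in for a real API call: a tiny hard-coded table (value from the example).
FAKE_SERVICE = {"New York, NY, USA": (40.7128, -74.0060)}
calls = []

def fake_lookup(location: str) -> Optional[Coordinates]:
    calls.append(location)             # record each "request" so we can count them
    return FAKE_SERVICE.get(location)  # None means "not found"

geocode = make_cached_geocoder(fake_lookup)
print(geocode("New York, NY, USA"))  # (40.7128, -74.006)
print(geocode("New York, NY, USA"))  # served from the cache, no new request
print(geocode("Mars"))               # None
print(len(calls))                    # 2
```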

Step 3: Handle Errors

Common Issues:

  • Ambiguous: "Springfield" (exists in 30+ US states)
  • Misspelled: "Los Angelos" vs "Los Angeles"
  • Not found: Fake locations, typos
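
These three cases can be told apart in code. The sketch below uses a tiny hypothetical gazetteer (`GAZETTEER`) and Python's standard `difflib` module to suggest a correction for near-miss spellings; real geocoding services handle this differently, so treat it as an illustration of the idea rather than how any particular API works.

```python
import difflib

# Hypothetical gazetteer: place name -> list of candidate coordinates.
GAZETTEER = {
    "los angeles": [(34.0522, -118.2437)],
    "springfield": [(39.7817, -89.6501), (37.2090, -93.2923)],  # Illinois, Missouri
}

def resolve(name, cutoff=0.8):
    """Return (status, detail): status is 'ok', 'corrected', 'ambiguous', or 'not_found'."""
    key = name.strip().lower()
    status = "ok"
    if key not in GAZETTEER:
        # Look for a known name that is spelled almost the same.
        close = difflib.get_close_matches(key, list(GAZETTEER), n=1, cutoff=cutoff)
        if not close:
            return "not_found", None
        key, status = close[0], "corrected"
    candidates = GAZETTEER[key]
    if len(candidates) > 1:
        return "ambiguous", candidates  # needs more context, such as a state
    return status, candidates[0]

for name in ["Los Angeles", "Los Angelos", "Springfield", "Mars"]:
    print(name, "->", resolve(name))
```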

Result:

Of 100 tweets:

  • 60 successfully geocoded
  • 20 too vague/ambiguous
  • 20 invalid/joke locations

60% success rate is typical for social media data

🎯 CommDAAF Checkpoint: The Privacy Minefield

📊 DISCOVER

How precise is location data?

Example:

  • City level: "New York, NY" - 8 million people
  • Neighborhood: "Brooklyn Heights" - 20,000 people
  • GPS coordinates: 40.7128°, -74.0060° - Exact building

GPS coordinates can pinpoint your home, workplace, school.

Calculate: If someone posts 3 tweets from their home location (GPS coordinates), can you identify them?

🔍 ANALYZE

Real example: Fitness app data leaks

2018: Strava (fitness app) published heatmap of user running routes

  • Revealed locations of secret military bases (soldiers used the app)
  • Showed patrol patterns of troops in warzones
  • All from "anonymized" aggregated data

Even aggregated location data can reveal sensitive information. Analyze: What went wrong?

⚖️ ASSESS

What's the harm of publishing aggregated location data?

Scenario: You're mapping tweet locations for a research paper on climate protests.

Your map shows:

  • 100 tweets from climate protests in 20 cities
  • Precise GPS coordinates for each tweet

Potential harms:

  • Law enforcement could identify protesters
  • Could reveal people's home addresses
  • Activists in authoritarian countries at risk

🛠️ FORMULATE

Design anonymization rules for geospatial research:

Before publishing a map, you must:

  • 1. ☐ _____________________
  • 2. ☐ _____________________
  • 3. ☐ _____________________

Propose 3 rules to protect privacy while enabling spatial research.

Hint: Think about precision levels, minimum counts, data aggregation

🗺️ Three Types of Maps

Once you have geocoded data, how do you visualize it? Three main approaches:

1. 🌈 Choropleth Maps (Color-Coded Regions)

What: Regions (states, counties, zip codes) colored by data value

Best for: Showing aggregated data across regions

Example: COVID cases by state

CA: High, TX: Med, WY: Low

Examples: Election results, COVID rates, median income

2. 📍 Marker Maps (Pins on Map)

What: Individual points (pins/markers) at specific locations

Best for: Showing discrete events or locations

Example: Protest locations

📍 📍 📍 📍 📍

Each pin = one protest

Examples: Store locations, crime incidents, protest events

3. 🔥 Heatmaps (Density Visualization)

What: Color gradient showing concentration/density

Best for: Showing where activity is concentrated

Example: Tweet density in NYC

Red = high density, Yellow = low

Examples: Tweet density, crime hotspots, foot traffic

🌈 Choropleth Maps: When to Use

Perfect for:

  • Comparing regions (which state has highest X?)
  • Data that exists at regional level (census data, election results)
  • Showing patterns across geographic boundaries

Real Example: 2024 Election Results

Data: % of votes for each candidate by state

Map type: Choropleth (color each state by winner)

Insight: See regional voting patterns and swing states (the urban vs rural divide only shows up with county-level data)

Limitations:

  • Can be misleading if regions vary in size/population
  • Example: Wyoming (about 580,000 people) covers more than half the map area of California (about 39 million people)
  • Loses detail within regions (can't see city-level patterns)

How to Create:

1. Get regional data (state, county, zip code level)
↓
2. Choose color scale (green-yellow-red, blue-white-red)
↓
3. Use mapping library (Leaflet.js + GeoJSON)
↓
4. Add legend and labels
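
Step 2, choosing a color scale, comes down to sorting each region's value into a class and giving each class a color. Here is a minimal sketch using equal-interval classes. The function name `assign_colors` and the numbers are illustrative assumptions, and other classing schemes (quantiles, for example) are just as common.

```python
def assign_colors(values, colors):
    """Map each region's value to a color using equal-interval classes."""
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1  # avoid division by zero when all values are equal
    result = {}
    for region, value in values.items():
        # Scale to 0..1, then pick a class; the top value falls in the last class.
        index = min(int((value - low) / span * len(colors)), len(colors) - 1)
        result[region] = colors[index]
    return result

# Illustrative numbers only (cases per 100,000 residents), not real data.
rates = {"CA": 900, "TX": 500, "WY": 100}
palette = ["green", "yellow", "red"]  # low -> high
print(assign_colors(rates, palette))
# {'CA': 'red', 'TX': 'yellow', 'WY': 'green'}
```

The resulting region-to-color mapping is what a mapping library would then use to style each GeoJSON region.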

📍 Marker Maps: When to Use

Perfect for:

  • Individual events/locations (protests, crimes, stores)
  • When exact location matters
  • Small-to-medium datasets (< 1,000 points)

Real Example: George Floyd Protests (2020)

Data: 7,000+ protests with GPS coordinates

Map type: Marker map (one pin per protest)

Insight: See which cities had most sustained activity

Limitations:

  • Cluttered if too many points (thousands of markers overlap)
  • Hard to see density patterns
  • Performance issues with large datasets

Solution: Use marker clustering (group nearby markers)

Interactive Features:

Click marker → Show popup

Example: Click protest marker → See date, size, demands

Filter markers

Example: Show only protests > 1,000 people

Cluster markers

Example: "15 protests in this area" → zoom to see individuals
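
To make the clustering idea concrete, here is a simplified grid-based sketch: points that fall into the same square cell are merged into one marker with a count. Mapping libraries typically use distance-based clustering that adapts to the zoom level, so this is an assumption-laden illustration, not how any specific plugin works. `cluster_markers` and `cell_deg` are hypothetical names.

```python
from collections import defaultdict

def cluster_markers(points, cell_deg=1.0):
    """Group (lat, lon) points into square cells and return one summary per cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division gives a stable cell index, including for negative longitudes.
        cells[(lat // cell_deg, lon // cell_deg)].append((lat, lon))
    clusters = []
    for members in cells.values():
        center_lat = sum(p[0] for p in members) / len(members)
        center_lon = sum(p[1] for p in members) / len(members)
        clusters.append({"center": (center_lat, center_lon), "count": len(members)})
    return clusters

points = [
    (40.7580, -73.9855),   # Midtown Manhattan
    (40.6782, -73.9442),   # Brooklyn
    (34.0522, -118.2437),  # Los Angeles
]
clusters = cluster_markers(points, cell_deg=1.0)
for cluster in clusters:
    print(cluster["count"], "marker(s) near", cluster["center"])
```

One weakness of a fixed grid is that two close points on opposite sides of a cell boundary end up in different clusters, which is one reason real clustering plugins work from distances instead.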

🔥 Heatmaps: When to Use

Perfect for:

  • Showing density/concentration patterns
  • Large datasets (thousands of points)
  • Finding hotspots (where activity is highest)

Real Example: Tweet Density During Protests

Data: 100,000 tweets with GPS coordinates

Map type: Heatmap (color gradient by density)

Insight: See which neighborhoods had most social media activity

Advantages:

  • ✅ Handles large datasets (millions of points)
  • ✅ Shows patterns that markers would miss
  • ✅ Visually striking (red = hotspot grabs attention)

Limitations:

  • ❌ Loses individual detail (can't see specific events)
  • ❌ Misleading if not normalized by population
  • ❌ Example: NYC always "hot" just because it has more people

Solution: Normalize by population (tweets per capita, not raw count)
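
Normalizing is a one-line calculation once you have a population figure for each area. A minimal sketch, with made-up area names and numbers:

```python
def per_100k(counts, population):
    """Convert raw counts into a rate per 100,000 residents for each area."""
    return {
        area: round(counts[area] / population[area] * 100_000, 1)
        for area in counts
        if population.get(area)  # skip areas with missing or zero population
    }

# Illustrative numbers only, not real data.
tweets = {"Big City": 8_000, "Small Town": 300}
people = {"Big City": 8_000_000, "Small Town": 20_000}
print(per_100k(tweets, people))  # {'Big City': 100.0, 'Small Town': 1500.0}
```

In raw counts the big city looks far "hotter", but per resident the small town is 15 times more active.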

Heatmap Settings:

Radius: How wide each point's "heat" spreads
Intensity: How much each point contributes
Gradient: Color scale (blue→yellow→red)
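
To see how radius and intensity interact, here is a toy calculation of the "heat" at one spot on the map. It assumes flat x/y coordinates and a linear falloff to zero at the radius; real heatmap libraries use their own kernels and options, so the function `heat_at` is only a sketch of the concept.

```python
import math

def heat_at(x, y, points, radius=1.0, intensity=1.0):
    """Sum each point's contribution at (x, y), fading linearly to zero at `radius`."""
    total = 0.0
    for px, py in points:
        distance = math.hypot(x - px, y - py)
        if distance < radius:
            total += intensity * (1 - distance / radius)
    return total

points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
print(heat_at(0.0, 0.0, points, radius=1.0))  # 1.5
print(heat_at(0.0, 0.0, points, radius=2.0))  # 1.75
```

A larger radius lets nearby points reinforce each other more, which smooths the map but can also blur distinct hotspots together. The gradient then maps these totals to colors.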

🔒 Privacy Protection Techniques

The Problem:

Even 3-4 location points can de-anonymize someone.

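Research finding: In a study of mobile phone records from 1.5 million people, 4 spatiotemporal points (a place plus a time) were enough to uniquely identify 95% of individuals (de Montjoye et al., 2013).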

Four Privacy Protection Techniques:

1. K-Anonymity (Minimum Count Rule)

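Rule: Only show data if at least K people (usually 5-10) share that location. Count distinct people, not posts: eight tweets from one account still describe one person.

Example:

Location A: tweets from 8 people → ✅ Show (8 ≥ K, with K = 5)

Location B: tweets from 2 people → ❌ Hide (2 < K)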

Protects: Individual identification
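
The rule is stated in terms of people, so this sketch counts distinct user IDs per location rather than raw posts. The function name and the `(user_id, location)` input shape are assumptions for illustration. Note that this minimum-count filter is a simplified version of k-anonymity, not the full formal definition.

```python
from collections import defaultdict

def k_anonymous_locations(posts, k=5):
    """Keep only locations with posts from at least k distinct users.

    `posts` is an iterable of (user_id, location) pairs.
    """
    users_by_location = defaultdict(set)
    for user_id, location in posts:
        users_by_location[location].add(user_id)
    return {
        location: len(users)
        for location, users in users_by_location.items()
        if len(users) >= k
    }

# Location A: 8 different users. Location B: 3 posts but only 2 users.
posts = [(f"user{i}", "Location A") for i in range(8)]
posts += [("user1", "Location B"), ("user1", "Location B"), ("user2", "Location B")]
print(k_anonymous_locations(posts, k=5))  # {'Location A': 8}
```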

2. Grid Aggregation (Snap to Grid)

Rule: Instead of exact coordinates, group into grid cells (e.g., 1km x 1km)

Example:

Exact: 40.7128°, -74.0060° (specific building)

Grid: "Grid cell 1234" (1km area, ~10,000 people)

Protects: Exact home/work locations
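
One way to snap coordinates to a grid is to replace each point with the center of the cell it falls in. The sketch below relies on the rough approximation that one degree of latitude is about 111 km and one degree of longitude is about 111 km times the cosine of the latitude. `snap_to_grid` is a hypothetical helper, and a production system would more likely use an established grid or projection.

```python
import math

def snap_to_grid(lat, lon, cell_km=1.0):
    """Return the center of the grid cell containing (lat, lon).

    Approximation: 1 degree of latitude ~ 111 km, and
    1 degree of longitude ~ 111 km * cos(latitude).
    """
    lat_step = cell_km / 111.0
    lat_cell = math.floor(lat / lat_step)
    snapped_lat = (lat_cell + 0.5) * lat_step
    # Use the snapped latitude so every point in the same row shares one longitude step.
    lon_step = cell_km / (111.0 * math.cos(math.radians(snapped_lat)))
    lon_cell = math.floor(lon / lon_step)
    snapped_lon = (lon_cell + 0.5) * lon_step
    return round(snapped_lat, 5), round(snapped_lon, 5)

# Two points a few dozen meters apart land in the same cell.
a = snap_to_grid(40.7128, -74.0060)
b = snap_to_grid(40.7130, -74.0062)
print(a == b)  # True
```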

3. Spatial Cloaking (Reduce Precision)

Rule: Round coordinates to fewer decimal places

Example:

Precise: 40.7128° (4 decimal places, about ±10 meters)

Cloaked: 40.71° (2 decimal places, about ±1 kilometer)

Protects: Precision-based identification
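
Rounding is simple to apply, and the precision lost at each step can be estimated from the fact that one degree of latitude is roughly 111 km. A minimal sketch (the helper names are assumptions for illustration):

```python
def cloak(lat, lon, decimals=2):
    """Round coordinates to `decimals` places to reduce their precision."""
    return round(lat, decimals), round(lon, decimals)

def approx_precision_m(decimals):
    """Rough size in meters of one step at the given number of decimal places.

    Based on about 111 km per degree of latitude; longitude steps are
    smaller away from the equator.
    """
    return 111_000 / 10 ** decimals

print(cloak(40.7128456, -74.0060152))  # (40.71, -74.01)
for d in (2, 3, 4):
    print(d, "decimals ~", approx_precision_m(d), "m")
```

Each decimal place dropped makes the location about ten times coarser, so two places is neighborhood scale and four places is building scale.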

4. Temporal Fuzzing (Blur Timestamps)

Rule: Don't show exact times, round to hour/day/week

Example:

Exact: "2:47 PM, June 15, 2025"

Fuzzed: "Afternoon, June 2025"

Protects: Tracking movement patterns
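
Timestamps can be coarsened the same way coordinates are: by keeping only the part you are willing to publish. A minimal sketch using the standard `datetime` module, where `fuzz_timestamp` and its `level` argument are hypothetical names:

```python
from datetime import datetime

def fuzz_timestamp(ts, level="day"):
    """Coarsen a datetime to its hour, day, or ISO week, returned as a string."""
    if level == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if level == "day":
        return ts.strftime("%Y-%m-%d")
    if level == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown level: {level!r}")

exact = datetime(2025, 6, 15, 14, 47)  # 2:47 PM, June 15, 2025
print(fuzz_timestamp(exact, "hour"))  # 2025-06-15 14:00
print(fuzz_timestamp(exact, "day"))   # 2025-06-15
print(fuzz_timestamp(exact, "week"))  # 2025-W24
```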

Best Practice: Combine multiple techniques (K-anonymity + Grid aggregation + Temporal fuzzing)

🎯 CommDAAF Checkpoint: Surveillance or Journalism?

📊 DISCOVER

Research this 2022 case:

2022: Cell Phone Data at Abortion Clinics

Around the time Roe v. Wade was overturned, reporters found data brokers selling location data that showed:

  • Which phones visited abortion clinics
  • Where those phones "lived" (likely home locations)

The data could be purchased legally.

Anti-abortion groups have reportedly used this kind of data to target people who visited clinics.

What did you learn about the legal vs ethical boundaries of location data use?

🔍 ANALYZE

Spectrum of geospatial analysis:

Use Case | Ethical?
A: Mapping COVID spread to allocate resources | ✅ Public health
B: Mapping protest locations for journalism | ⚠️ Depends on anonymization
C: Tracking abortion clinic visitors | ❌ Surveillance

When does geospatial analysis cross from research/journalism into surveillance?

⚖️ ASSESS

Design ethical boundaries:

You're a data journalist. Your editor asks you to map:

  • Scenario A: Locations of political rallies (from public tweets)
  • Scenario B: Home addresses of rally attendees (from phone location data)

Questions:

  • Is the data legally obtained? (Yes for both)
  • Is publishing it ethical?
  • Could it cause harm?
  • Is there public interest?

🛠️ FORMULATE

Create ethical boundaries for location data journalism:

Before publishing a map with location data, ask:

  • 1. ☐ _____________________
  • 2. ☐ _____________________
  • 3. ☐ _____________________
  • 4. ☐ _____________________

Propose 4 questions journalists should ask before publishing geospatial data.

💪 Interactive Mapping Challenge

Challenge:

You have CSV data with messy location strings. Create a map.

Your Dataset: Climate Protests (2025)

Date | Location | Size
2025-03-15 | New York City | 10,000
2025-03-15 | Los Angeles | 5,000
2025-03-20 | Brooklyn, NYC | 2,000
2025-03-22 | San Francisco | 3,500
... | ... | ...
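
Before geocoding, the file has to be read into a usable form. The sketch below assumes the dataset is a CSV with `Date`, `Location`, and `Size` columns, and that values containing commas (such as "Brooklyn, NYC" and "10,000") are quoted. Only the four rows shown are used, and the actual file layout may differ.

```python
import csv
import io

# The rows shown in the dataset, written as CSV (an assumed layout).
RAW = '''Date,Location,Size
2025-03-15,New York City,"10,000"
2025-03-15,Los Angeles,"5,000"
2025-03-20,"Brooklyn, NYC","2,000"
2025-03-22,San Francisco,"3,500"
'''

def load_protests(text):
    """Parse the CSV text into dicts, turning sizes like '10,000' into integers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": row["Date"],
            "location": row["Location"].strip(),
            "size": int(row["Size"].replace(",", "")),
        })
    return rows

protests = load_protests(RAW)
for protest in protests:
    print(protest)
```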

Your Tasks:

Task 1: Geocode locations

Convert "New York City" → coordinates

Task 2: Choose map type

Choropleth? Marker? Heatmap?

Task 3: Apply privacy filters

Use k-anonymity or grid aggregation

Task 4: Generate map

Create interactive map with Leaflet.js
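
One common way to hand geocoded events to Leaflet.js is as GeoJSON, which Leaflet can load with `L.geoJSON(...)`. The sketch below builds that structure in Python. The input field names (`lat`, `lon`, and so on) are assumptions for illustration.

```python
import json

def to_geojson(events):
    """Build a GeoJSON FeatureCollection from dicts with lat, lon, and other fields."""
    features = []
    for event in events:
        properties = {k: v for k, v in event.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude], the reverse of the usual "lat, lon".
            "geometry": {"type": "Point", "coordinates": [event["lon"], event["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}

events = [{"location": "New York City", "date": "2025-03-15", "size": 10000,
           "lat": 40.7128, "lon": -74.0060}]
print(json.dumps(to_geojson(events), indent=2))
```

Swapping latitude and longitude is one of the most common mapping bugs: the pins quietly land in the wrong hemisphere or in the ocean.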

In VineAnalyst: You'd upload CSV → Choose settings → Generate map automatically

For now, this is a conceptual walkthrough.

🎯 Challenge Step 2: Choose Map Type

Your data: 50 climate protests across 30 US cities

Question: Which map type should you use?

Option A: Choropleth (State-Level)

How it would look: Color each state by # of protests

Problem: Loses city-level detail. Can't see NYC vs rural NY.

Verdict: ❌ Wrong choice for this data

Option B: Marker Map

How it would look: One pin per protest

Advantage: See each individual protest location

Interactive: Click pin → See date, size, location

Verdict: ✅ Good choice! (50 protests = manageable)

Option C: Heatmap

How it would look: Color gradient showing density

Advantage: See concentration (e.g., NYC has many protests)

Problem: Loses individual event detail

Verdict: ⚠️ Could work, but markers better for this size

Decision: Marker Map

Why? 50 protests is small enough to show individually, and we want to preserve event-level detail.

🔒 Challenge Step 3: Apply Privacy Filters

Problem:

Some protests have < 5 attendees. Showing exact location could identify individuals.

Privacy Filter Options:

Option A: K-Anonymity (K=5)

Rule: Only show protests with ≥5 attendees

Result:

45 protests shown (≥5 people)

5 protests hidden (<5 people)

Pro: Simple, protects small groups

Con: Loses data on small protests

Option B: Grid Aggregation

Rule: Snap all coordinates to 10km grid

Result:

Instead of exact street address, show "Grid 1234" (10km area)

Pro: Keeps all data, reduces precision

Con: Loses neighborhood-level detail

Option C: Hybrid (K-Anonymity + Spatial Cloaking)

Rules:

  • Show protests ≥10 people: Exact location
  • Show protests 5-9 people: Cloaked to city level
  • Hide protests <5 people

Pro: Balances detail with privacy

Con: More complex
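
The hybrid rules translate directly into a small function applied to each record before it goes on the map. This is a sketch under assumed field names (`city`, `lat`, `lon`, `size`); the thresholds are the ones listed in the rules.

```python
def apply_hybrid_rule(protest):
    """Return a publishable version of one protest record, or None to hide it.

    `protest` is a dict with 'city', 'lat', 'lon', and 'size'.
    """
    size = protest["size"]
    if size < 5:
        return None  # too small: hide entirely
    if size < 10:
        # 5-9 people: publish the city name only, with no coordinates
        return {"city": protest["city"], "size": size, "precision": "city"}
    # 10 or more people: publish the exact location
    return {**protest, "precision": "exact"}

records = [
    {"city": "New York City", "lat": 40.7128, "lon": -74.0060, "size": 10000},
    {"city": "Los Angeles", "lat": 34.0522, "lon": -118.2437, "size": 7},
    {"city": "Brooklyn", "lat": 40.6782, "lon": -73.9442, "size": 3},
]
published = [r for r in map(apply_hybrid_rule, records) if r is not None]
for record in published:
    print(record)
```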

Decision: Hybrid Approach

Shows most data while protecting small groups.

🎯 CommDAAF Checkpoint: Digital Redlining

📊 DISCOVER

Research historical context:

1930s-1960s: Redlining

Banks drew red lines on maps around Black neighborhoods, denying them mortgages.

Government maps literally color-coded neighborhoods:

  • Green: "Best" (white, wealthy)
  • Red: "Hazardous" (Black, immigrant)

Result: Systemic wealth inequality that persists today.

How did maps encode discrimination?

🔍 ANALYZE

Modern digital redlining examples:

Example 1: Uber Surge Pricing

Some research has reported higher ride-hailing prices in predominantly Black neighborhoods, even at similar levels of demand.

Example 2: Food Delivery Zones

DoorDash, Uber Eats exclude certain zip codes (often low-income, minority).

Example 3: Broadband Access

ISPs map "unprofitable" areas (often redlined neighborhoods) and don't invest in infrastructure.

How do location-based algorithms perpetuate historical discrimination?

⚖️ ASSESS

Can geospatial analysis perpetuate discrimination?

Consider:

  • Using zip code as proxy for race/income
  • Maps that show "crime hotspots" → reinforce stereotypes
  • Redlining 2.0: Algorithms draw new lines on maps

When does mapping reveal inequality vs. perpetuate it?

🛠️ FORMULATE

Design location-aware systems that don't discriminate:

If you're building a location-based service (ride-sharing, delivery, etc.), how do you:

  • Ensure equitable pricing across neighborhoods?
  • Avoid excluding low-income areas?
  • Audit for discriminatory patterns?
  • Make decisions transparent?

Propose 3 principles for equitable geospatial systems.

🎓 Key Takeaways

What You Learned:

✅ Geocoding

Converting messy location strings → standardized coordinates. 60% success rate typical for social media data.

✅ Three Map Types

  • Choropleth: Color-coded regions (election results, COVID rates)
  • Marker: Pins for individual events (protests, crimes)
  • Heatmap: Density visualization (tweet concentration, foot traffic)

✅ Privacy Protection

  • K-anonymity: Minimum count rule (≥5 people)
  • Grid aggregation: Snap to grid cells
  • Spatial cloaking: Reduce precision
  • Temporal fuzzing: Blur timestamps

✅ Real-World Applications

Public health (COVID tracking), social movements (protest mapping), urban planning (food deserts), journalism (election analysis)

Critical Thinking:

⚠️ Privacy Minefield

As few as 4 location points with timestamps can uniquely identify 95% of people in a dataset. Always apply privacy protections.

⚠️ Surveillance vs. Journalism

Legal ≠ ethical. Buying location data on abortion clinic visitors may be legal, but it is still harmful.

⚠️ Digital Redlining

Location-based algorithms can perpetuate historical discrimination (Uber surge pricing, delivery zones).

The Golden Rule: With great spatial data comes great responsibility. Map thoughtfully.

Lesson Complete!

You've mastered geospatial analysis and understand the ethical boundaries of mapping digital behavior.