Learn to convert messy location data into powerful maps. Discover how spatial analysis reveals patterns invisible in other data, and when it becomes surveillance.
Location data reveals patterns invisible in other data:
COVID-19 Tracking (2020-2022)
George Floyd Protests (2020)
Food Deserts Analysis
Election Results Mapping
Common thread: Location adds a spatial dimension that reveals WHERE things happen and HOW patterns spread across space.
Here's what you get when you extract location from 100 tweets:
✅ Clean
"New York, NY"
✅ Usable
"Brooklyn"
✅ GPS
40.7128° N, 74.0060° W
⚠️ Vague
"East Coast"
❌ Joke
"Mars"
❌ Meme
"In your mom's house"
❌ Abstract
"Everywhere and nowhere"
❌ Missing
[blank]
Of 100 social media posts with a "location" field, only the clean, usable, and GPS entries can be mapped directly. You need to clean this data before mapping.
Geocoding: converting a location description (an address, city name, or landmark) into standardized geographic coordinates (latitude and longitude).
| Input (Location String) | Geocoding → | Output (Coordinates) |
|---|---|---|
| "New York City" | → | 40.7128° N, 74.0060° W |
| "Eiffel Tower, Paris" | → | 48.8584° N, 2.2945° E |
| "1600 Pennsylvania Ave, DC" | → | 38.8977° N, 77.0365° W |
| "Los Angeles, CA" | → | 34.0522° N, 118.2437° W |
Google Maps Geocoding API: most accurate, $5 per 1,000 requests after the free tier
Nominatim (OpenStreetMap): free, open source, less accurate
Mapbox: good accuracy, 100,000 free requests/month
Mapping software (Leaflet, Google Maps, Mapbox) needs coordinates, not text. Geocoding translates human-readable locations into machine-readable coordinates.
Scenario: You have 100 tweets with location strings. Let's geocode them.
| Raw Location | Cleaned | Valid? |
|---|---|---|
| "New York" | "New York, NY, USA" | ✅ |
| "Brooklyn" | "Brooklyn, NY, USA" | ✅ |
| "Mars" | [filter out] | ❌ |
| "USA" | [too vague, skip] | ⚠️ |
Common Issues:
Of 100 tweets, expect roughly 60 to geocode successfully; a 60% success rate is typical for social media data.
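The cleaning step above can be sketched as a small filter. The lookup tables (`CITY_EXPANSIONS`, `JUNK`, `TOO_VAGUE`) are illustrative assumptions, not a complete list:

```python
# Sketch of the cleaning step: expand known city names, drop joke, meme,
# vague, and missing entries. The lookup tables are illustrative only.
CITY_EXPANSIONS = {"new york": "New York, NY, USA", "brooklyn": "Brooklyn, NY, USA"}
JUNK = {"mars", "everywhere and nowhere", "in your mom's house"}
TOO_VAGUE = {"usa", "east coast", "earth"}

def clean_location(raw):
    """Return a cleaned location string, or None if unusable."""
    if not raw or not raw.strip():
        return None                      # missing
    key = raw.strip().lower()
    if key in JUNK or key in TOO_VAGUE:  # jokes, memes, vague regions
        return None
    return CITY_EXPANSIONS.get(key, raw.strip())

locations = ["New York", "Brooklyn", "Mars", "USA", ""]
print([clean_location(loc) for loc in locations])
# ['New York, NY, USA', 'Brooklyn, NY, USA', None, None, None]
```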
How precise is location data?
Example:
GPS coordinates can pinpoint your home, workplace, school.
Calculate: If someone posts 3 tweets from their home location (GPS coordinates), can you identify them?
Real example: Fitness app data leaks
2018: Strava, a fitness app, published a global heatmap of user running routes that inadvertently revealed the locations of secret military bases.
Even aggregated location data can reveal sensitive information. Analyze: What went wrong?
What's the harm of publishing aggregated location data?
Scenario: You're mapping tweet locations for a research paper on climate protests.
Your map shows:
Potential harms:
Design anonymization rules for geospatial research:
Before publishing a map, you must:
Propose 3 rules to protect privacy while enabling spatial research.
Hint: Think about precision levels, minimum counts, data aggregation
Once you have geocoded data, how do you visualize it? Three main approaches:
What: Regions (states, counties, zip codes) colored by data value
Best for: Showing aggregated data across regions
Example: COVID cases by state
Examples: Election results, COVID rates, median income
What: Individual points (pins/markers) at specific locations
Best for: Showing discrete events or locations
Example: Protest locations
📍 📍 📍 📍 📍
Each pin = one protest
Examples: Store locations, crime incidents, protest events
What: Color gradient showing concentration/density
Best for: Showing where activity is concentrated
Example: Tweet density in NYC
Red = high density, Yellow = low
Examples: Tweet density, crime hotspots, foot traffic
Data: % of votes for each candidate by state
Map type: Choropleth (color each state by winner)
Insight: See urban vs rural divide, swing states
Data: 7,000+ protests with GPS coordinates
Map type: Marker map (one pin per protest)
Insight: See which cities had most sustained activity
Solution: Use marker clustering (group nearby markers)
Example: Click a protest marker → see date, size, demands
Example: Show only protests > 1,000 people
Example: "15 protests in this area" → zoom to see individual protests
Data: 100,000 tweets with GPS coordinates
Map type: Heatmap (color gradient by density)
Insight: See which neighborhoods had most social media activity
Solution: Normalize by population (tweets per capita, not raw count)
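The normalization fix is a one-line division; the counts and populations below are made-up illustrative numbers:

```python
# Raw tweet counts largely mirror population, so divide by residents
# to get tweets per capita. All numbers here are illustrative.
raw_counts = {"Manhattan": 50_000, "Staten Island": 5_000}
population = {"Manhattan": 1_600_000, "Staten Island": 500_000}

per_capita = {area: raw_counts[area] / population[area] for area in raw_counts}
for area, rate in per_capita.items():
    print(f"{area}: {rate * 1000:.1f} tweets per 1,000 residents")
```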
Even 3-4 location points can de-anonymize someone.
Research finding: 95% of people can be uniquely identified from 4 spatiotemporal points.
Rule: Only show data if at least K people (usually 5-10) share that location
Example:
Location A: 8 tweets → ✅ Show (K ≥ 5)
Location B: 2 tweets → ❌ Hide (K < 5)
Protects: Individual identification
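K-anonymity reduces to a count filter. A sketch with K=5, matching the rule above:

```python
# K-anonymity filter: publish a location's count only when at least
# K posts share that location (K=5 here, per the rule above).
from collections import Counter

def k_anonymize(locations, k=5):
    """Return {location: count}, keeping only locations with >= k entries."""
    counts = Counter(locations)
    return {loc: n for loc, n in counts.items() if n >= k}

tweets = ["Location A"] * 8 + ["Location B"] * 2
print(k_anonymize(tweets))  # {'Location A': 8}  (Location B hidden, K < 5)
```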
Rule: Instead of exact coordinates, group into grid cells (e.g., 1km x 1km)
Example:
Exact: 40.7128°, -74.0060° (specific building)
Grid: "Grid cell 1234" (1 km area, ~10,000 people)
Protects: Exact home/work locations
Rule: Round coordinates to fewer decimal places
Example:
Precise: 40.7128456° (±10 meters)
Cloaked: 40.71° (±1 kilometer)
Protects: Precision-based identification
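Grid aggregation and spatial cloaking both reduce coordinate precision; a sketch of each (the ~1 km figures assume mid-latitude degrees, and the grid cell IDs are just integer tuples rather than the labels used above):

```python
# Spatial cloaking: round coordinates to fewer decimals.
# Grid aggregation: snap coordinates to a fixed-size cell.
def cloak(lat, lon, decimals=2):
    """Round coordinates to reduce precision (2 decimals ~ 1 km)."""
    return (round(lat, decimals), round(lon, decimals))

def grid_cell(lat, lon, cell_deg=0.01):
    """Snap coordinates to a ~1 km grid cell, identified by integer indices."""
    return (int(lat // cell_deg), int(lon // cell_deg))

print(cloak(40.7128456, -74.0060))      # (40.71, -74.01)
print(grid_cell(40.7128456, -74.0060))  # one cell shared by the whole block
```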
Rule: Don't show exact times, round to hour/day/week
Example:
Exact: "2:47 PM, June 15, 2025"
Fuzzed: "Afternoon, June 2025"
Protects: Tracking movement patterns
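A sketch of temporal fuzzing that maps an exact timestamp to the coarse buckets above (the cutoff hours are assumptions):

```python
# Temporal fuzzing: replace an exact timestamp with a coarse bucket
# (part of day + month), as in the example above.
from datetime import datetime

def fuzz_time(ts):
    """Map a datetime to 'Morning/Afternoon/Evening/Night, Month Year'."""
    if 5 <= ts.hour < 12:
        part = "Morning"
    elif 12 <= ts.hour < 17:
        part = "Afternoon"
    elif 17 <= ts.hour < 21:
        part = "Evening"
    else:
        part = "Night"
    return f"{part}, {ts.strftime('%B %Y')}"

print(fuzz_time(datetime(2025, 6, 15, 14, 47)))  # Afternoon, June 2025
```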
Best Practice: Combine multiple techniques (K-anonymity + Grid aggregation + Temporal fuzzing)
Research this 2022 case:
2022: Cell Phone Data at Abortion Clinics
After Roe v Wade was overturned, data brokers sold location data showing:
Anti-abortion groups used this to identify and target women seeking abortions.
What did you learn about the legal vs ethical boundaries of location data use?
Spectrum of geospatial analysis:
| Use Case | Ethical? |
|---|---|
| A: Mapping COVID spread to allocate resources | ✅ Public health |
| B: Mapping protest locations for journalism | ⚠️ Depends on anonymization |
| C: Tracking abortion clinic visitors | ❌ Surveillance |
When does geospatial analysis cross from research/journalism into surveillance?
Design ethical boundaries:
You're a data journalist. Your editor asks you to map:
Questions:
Create ethical boundaries for location data journalism:
Before publishing a map with location data, ask:
Propose 4 questions journalists should ask before publishing geospatial data.
You have CSV data with messy location strings. Create a map.
| Date | Location | Size |
|---|---|---|
| 2025-03-15 | New York City | 10,000 |
| 2025-03-15 | Los Angeles | 5,000 |
| 2025-03-20 | Brooklyn, NYC | 2,000 |
| 2025-03-22 | San Francisco | 3,500 |
| ... | ... | ... |
Convert "New York City" β coordinates
Choropleth? Marker? Heatmap?
Use k-anonymity or grid aggregation
Create interactive map with Leaflet.js
In VineAnalyst: you'd upload a CSV → choose settings → generate a map automatically
For now, this is a conceptual walkthrough.
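Even as a conceptual walkthrough, the last step can be sketched: a short Python script that writes a self-contained HTML page using Leaflet's marker API (the protest data and output file name are illustrative):

```python
# Sketch of the final step: emit an HTML page that loads Leaflet from its
# CDN and drops one marker per protest. Data and file name are illustrative.
protests = [
    {"city": "New York City", "lat": 40.7128, "lon": -74.0060, "size": 10000},
    {"city": "Los Angeles", "lat": 34.0522, "lon": -118.2437, "size": 5000},
]

markers = "\n".join(
    f'L.marker([{p["lat"]}, {p["lon"]}]).addTo(map)'
    f'.bindPopup("{p["city"]}: {p["size"]:,} people");'
    for p in protests
)

html = f"""<!DOCTYPE html>
<html><head>
<link rel="stylesheet" href="https://unpkg.com/leaflet/dist/leaflet.css"/>
<script src="https://unpkg.com/leaflet/dist/leaflet.js"></script>
</head><body>
<div id="map" style="height: 600px"></div>
<script>
var map = L.map('map').setView([39.8, -98.6], 4);  // center on the US
L.tileLayer('https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png').addTo(map);
{markers}
</script>
</body></html>"""

with open("protest_map.html", "w") as f:
    f.write(html)
```

Opening `protest_map.html` in a browser shows one clickable pin per protest; `L.map`, `L.tileLayer`, `L.marker`, and `bindPopup` are standard Leaflet calls.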
Your data: 50 climate protests across 30 US cities
Question: Which map type should you use?
How it would look: Color each state by # of protests
Problem: Loses city-level detail. Can't see NYC vs rural NY.
Verdict: ❌ Wrong choice for this data
How it would look: One pin per protest
Advantage: See each individual protest location
Interactive: Click a pin → see date, size, location
Verdict: ✅ Good choice! (50 protests = manageable)
How it would look: Color gradient showing density
Advantage: See concentration (e.g., NYC has many protests)
Problem: Loses individual event detail
Verdict: ⚠️ Could work, but markers are better at this size
Why? 50 protests is small enough to show individually, and we want to preserve event-level detail.
Some protests have < 5 attendees. Showing exact location could identify individuals.
Rule: Only show protests with ≥ 5 attendees
Result:
45 protests shown (≥ 5 people)
5 protests hidden (< 5 people)
Pro: Simple, protects small groups
Con: Loses data on small protests
Rule: Snap all coordinates to 10km grid
Result:
Instead of exact street address, show "Grid 1234" (10km area)
Pro: Keeps all data, reduces precision
Con: Loses neighborhood-level detail
Rules:
Pro: Balances detail with privacy
Con: More complex
Shows most data while protecting small groups.
Research historical context:
1930s-1960s: Redlining
Banks drew red lines on maps around Black neighborhoods, denying them mortgages.
Government maps literally color-coded neighborhoods:
Result: Systemic wealth inequality that persists today.
How did maps encode discrimination?
Modern digital redlining examples:
Example 1: Uber Surge Pricing
Research found higher surge pricing in predominantly Black neighborhoods, even with same demand.
Example 2: Food Delivery Zones
DoorDash, Uber Eats exclude certain zip codes (often low-income, minority).
Example 3: Broadband Access
ISPs map "unprofitable" areas (often redlined neighborhoods) and don't invest in infrastructure.
How do location-based algorithms perpetuate historical discrimination?
Can geospatial analysis perpetuate discrimination?
Consider:
When does mapping reveal inequality vs. perpetuate it?
Design location-aware systems that don't discriminate:
If you're building a location-based service (ride-sharing, delivery, etc.), how do you:
Propose 3 principles for equitable geospatial systems.
✅ Geocoding
Converting messy location strings → standardized coordinates. A 60% success rate is typical for social media data.
✅ Three Map Types
✅ Privacy Protection
✅ Real-World Applications
Public health (COVID tracking), social movements (protest mapping), urban planning (food deserts), journalism (election analysis)
⚠️ Privacy Minefield
3-4 location points can de-anonymize 95% of people. Always apply privacy protections.
⚠️ Surveillance vs. Journalism
Legal ≠ ethical. Tracking abortion clinic visitors is legal but harmful.
⚠️ Digital Redlining
Location-based algorithms can perpetuate historical discrimination (Uber surge pricing, delivery zones).
The Golden Rule: With great spatial data comes great responsibility. Map thoughtfully.
You've mastered geospatial analysis and understand the ethical boundaries of mapping digital behavior.