Using the Threads API for Academic Research: A Practical Guide
Why Researchers Need Threads Data

Threads by Meta crossed 320 million monthly active users in 2025 and continues to grow rapidly. For social media researchers, this creates a rich dataset for studying public discourse, content virality, platform migration behavior, and community dynamics.

Published research on Threads data is already appearing in IEEE, IIETA, and other academic venues — covering sentiment analysis, political discourse, and user adoption patterns. But accessing the data at scale remains a challenge.

The Data Access Problem

Academic researchers face several barriers when collecting Threads data:

Official Meta Threads API limitations:

  • Requires a Meta business account and OAuth approval (2-6 weeks)
  • Limited to your own content — you cannot query other users’ public data
  • Rate limits: 250 posts/24h for publishing, 500 keyword searches/7 days
  • No bulk export functionality

Web scraping drawbacks:

  • Violates Meta Terms of Service
  • Requires proxy infrastructure ($50-200/month)
  • Breaks frequently when Meta updates their frontend
  • Ethical concerns for IRB approval

Meta Content Library:

  • Restricted to approved academic institutions
  • Limited availability and long approval process
  • API access varies by region

A Better Approach: Structured API Access

The thredly API provides programmatic access to public Threads data through a simple REST API. No OAuth, no scraping, no browser automation — just your API key and HTTP requests.

This approach is well-suited for academic research because:

  • Structured, consistent data — JSON responses with typed fields
  • Repeatable collection — Same query returns same format every time
  • Ethical access — Accessing public data through structured endpoints
  • Affordable — Free tier (100 requests/month), paid plans from $9/month

Getting Started for Researchers

Prerequisites

  • Python 3.8+
  • requests and pandas libraries
  • A free API key from RapidAPI
pip install requests pandas

Setup

import requests
import pandas as pd
from datetime import datetime

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://threads-api-pro.p.rapidapi.com"
HEADERS = {
    "X-RapidAPI-Key": API_KEY,
    "X-RapidAPI-Host": "threads-api-pro.p.rapidapi.com"
}

def api_get(endpoint):
    """Reusable API request function with error handling."""
    response = requests.get(f"{BASE_URL}{endpoint}", headers=HEADERS, timeout=30)
    response.raise_for_status()
    result = response.json()
    if not result.get("success"):
        raise Exception(result.get("error", "Unknown API error"))
    return result["data"]

Research Use Case 1: User Profile Analysis

Collect profile data for a set of public figures or influencers:

def collect_profiles(usernames):
    """Collect profile data for multiple users."""
    profiles = []
    for username in usernames:
        try:
            data = api_get(f"/api/user/{username}")
            profiles.append({
                "username": data["username"],
                "full_name": data.get("full_name", ""),
                "followers": data["follower_count"],
                "following": data["following_count"],
                "is_verified": data.get("is_verified", False),
                "collected_at": datetime.now().isoformat()
            })
        except Exception as e:
            print(f"Error fetching {username}: {e}")
    return pd.DataFrame(profiles)

# Example: collect profiles for study participants
subjects = ["zuck", "mosseri", "instagram"]
df = collect_profiles(subjects)
df.to_csv("profiles_dataset.csv", index=False)

Research Use Case 2: Content Collection for Sentiment Analysis

Collect posts from specific users for NLP analysis:

def collect_user_posts(username, save_path=None):
    """Collect all available posts for a user."""
    posts = api_get(f"/api/user/{username}/posts")

    records = []
    for post in posts:
        records.append({
            "username": username,
            "post_id": post.get("id"),
            "text": post.get("text", ""),
            "like_count": post.get("like_count", 0),
            "reply_count": post.get("reply_count", 0),
            "created_at": post.get("created_at"),
            "engagement": post.get("like_count", 0) + post.get("reply_count", 0)
        })

    df = pd.DataFrame(records)

    if save_path:
        df.to_csv(save_path, index=False)
        print(f"Saved {len(records)} posts to {save_path}")

    return df

This data can feed directly into sentiment analysis pipelines using libraries like TextBlob, VADER, or transformer-based models.
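As a minimal sketch of that hand-off, the snippet below scores the `text` column with a tiny hand-rolled polarity lexicon. The lexicon is purely illustrative — a real study would substitute VADER, TextBlob, or a transformer-based classifier at the same step:

```python
import pandas as pd

# Tiny illustrative polarity lexicon -- a stand-in for VADER/TextBlob,
# kept dependency-free for the sketch.
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "terrible": -1, "hate": -1}

def toy_polarity(text):
    """Sum word polarities, normalized by word count (0.0 for empty text)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0) for w in words) / len(words)

posts = pd.DataFrame({
    "text": ["I love this platform", "terrible update, bad idea", ""]
})
posts["polarity"] = posts["text"].apply(toy_polarity)
print(posts[["text", "polarity"]])
```

Because the scorer only touches the `text` column, swapping in a real model means replacing `toy_polarity` with, e.g., VADER's compound score — the surrounding DataFrame code stays the same.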

Research Use Case 3: Engagement Pattern Analysis

Study how engagement varies across users, time periods, or content types:

def analyze_engagement(posts_df):
    """Calculate engagement metrics from collected posts."""
    if posts_df.empty:
        return {}

    return {
        "total_posts": len(posts_df),
        "avg_likes": posts_df["like_count"].mean(),
        "avg_replies": posts_df["reply_count"].mean(),
        "avg_engagement": posts_df["engagement"].mean(),
        "max_engagement_post": posts_df.loc[
            posts_df["engagement"].idxmax(), "text"
        ][:100],
        "reply_to_like_ratio": (
            posts_df["reply_count"].sum()
            / max(posts_df["like_count"].sum(), 1)
        )
    }

Research Use Case 4: Longitudinal Data Collection

For studies tracking changes over time, set up periodic collection:

import json
from pathlib import Path

def longitudinal_snapshot(usernames, output_dir="data"):
    """Take a daily snapshot of user metrics."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d")

    snapshot = {}
    for username in usernames:
        try:
            profile = api_get(f"/api/user/{username}")
            snapshot[username] = {
                "followers": profile["follower_count"],
                "following": profile["following_count"],
                "timestamp": datetime.now().isoformat()
            }
        except Exception as e:
            print(f"Error: {username} - {e}")

    filepath = f"{output_dir}/snapshot_{timestamp}.json"
    with open(filepath, "w") as f:
        json.dump(snapshot, f, indent=2)

    print(f"Snapshot saved: {filepath}")

Run this daily via cron to build longitudinal datasets.

Exporting Data for Analysis Tools

Export to CSV (for Excel, SPSS, Stata)

df.to_csv("threads_dataset.csv", index=False, encoding="utf-8")

Export to JSON (for custom pipelines)

df.to_json("threads_dataset.json", orient="records", indent=2)

Export to Parquet (for large datasets; requires pyarrow or fastparquet)

df.to_parquet("threads_dataset.parquet", index=False)

Ethical Considerations

When using Threads data for research:

  • Collect only public data — thredly only accesses publicly available profiles and posts
  • Anonymize when publishing — Remove or hash usernames in published results unless studying public figures
  • Follow your institution’s IRB guidelines — Consult your ethics board about social media data collection
  • Respect rate limits — Don’t overwhelm the API; plan your collection schedule
  • Document your methodology — Record API version, collection dates, and endpoints used for reproducibility
  • Data retention — Follow your institution’s data management plan for storage and deletion

Rate Limits for Research Projects

| Plan | Requests/Month | Best For |
| --- | --- | --- |
| Free | 100 | Pilot study, testing methodology |
| Basic ($9) | 10,000 | Small-scale study (50 users, weekly) |
| Pro ($49) | 100,000 | Medium study (500 users, daily) |
| Enterprise ($199) | 1,000,000 | Large-scale study (5,000+ users) |

Citing thredly in Your Research

If you use thredly in published research, we recommend citing it as a data source in your methodology section. Example:

Data was collected using the thredly API (thredly.dev), a REST API providing structured access to public Threads data, between [start date] and [end date].

Next Steps