Why Researchers Need Threads Data
Threads by Meta crossed 320 million monthly active users in 2025 and continues to grow rapidly. For social media researchers, this creates a rich dataset for studying public discourse, content virality, platform migration behavior, and community dynamics.
Published research on Threads data is already appearing in IEEE, IIETA, and other academic venues — covering sentiment analysis, political discourse, and user adoption patterns. But accessing the data at scale remains a challenge.
The Data Access Problem
Academic researchers face several barriers when collecting Threads data:
Official Meta Threads API limitations:
- Requires a Meta business account and OAuth approval (2-6 weeks)
- Limited to your own content — you cannot query other users’ public data
- Rate limits: 250 posts/24h for publishing, 500 keyword searches/7 days
- No bulk export functionality
Web scraping drawbacks:
- Violates Meta Terms of Service
- Requires proxy infrastructure ($50-200/month)
- Breaks frequently when Meta updates their frontend
- Ethical concerns for IRB approval
Meta Content Library:
- Restricted to approved academic institutions
- Limited availability and long approval process
- API access varies by region
A Better Approach: Structured API Access
The thredly API provides programmatic access to public Threads data through a simple REST API. No OAuth, no scraping, no browser automation — just your API key and HTTP requests.
This approach is well-suited for academic research because:
- Structured, consistent data — JSON responses with typed fields
- Repeatable collection — Same query returns same format every time
- Ethical access — Accessing public data through structured endpoints
- Affordable — Free tier (100 requests/month), paid plans from $9/month
Getting Started for Researchers
Prerequisites
- Python 3.8+
- `requests` and `pandas` libraries
- A free API key from RapidAPI
pip install requests pandas
Setup
import requests
import pandas as pd
from datetime import datetime

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://threads-api-pro.p.rapidapi.com"

HEADERS = {
    "X-RapidAPI-Key": API_KEY,
    "X-RapidAPI-Host": "threads-api-pro.p.rapidapi.com"
}

def api_get(endpoint):
    """Reusable API request function with error handling."""
    response = requests.get(f"{BASE_URL}{endpoint}", headers=HEADERS)
    response.raise_for_status()
    result = response.json()
    if not result.get("success"):
        raise Exception(result.get("error", "Unknown API error"))
    return result["data"]
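Collection loops can hit rate limits on any plan. A small retry wrapper with exponential backoff keeps scripts resilient; this is a sketch that assumes the API signals throttling with HTTP 429 (the helper names `backoff_delay` and `api_get_with_retry` are illustrative, not part of the API):

```python
import time
import requests

BASE_URL = "https://threads-api-pro.p.rapidapi.com"
HEADERS = {
    "X-RapidAPI-Key": "YOUR_API_KEY",
    "X-RapidAPI-Host": "threads-api-pro.p.rapidapi.com"
}

def backoff_delay(attempt, base_delay=2.0):
    """Exponential backoff: 2s, 4s, 8s, ... for attempts 0, 1, 2, ..."""
    return base_delay * (2 ** attempt)

def api_get_with_retry(endpoint, max_retries=3):
    """GET an endpoint, retrying with backoff if the server returns HTTP 429."""
    for attempt in range(max_retries + 1):
        response = requests.get(f"{BASE_URL}{endpoint}", headers=HEADERS)
        if response.status_code == 429 and attempt < max_retries:
            # Throttled: wait, then retry with a doubled delay.
            time.sleep(backoff_delay(attempt))
            continue
        response.raise_for_status()
        result = response.json()
        if not result.get("success"):
            raise Exception(result.get("error", "Unknown API error"))
        return result["data"]
```

For long collection runs, pairing this with a short fixed sleep between users (e.g. `time.sleep(1)`) also keeps you well inside your monthly quota.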
Research Use Case 1: User Profile Analysis
Collect profile data for a set of public figures or influencers:
def collect_profiles(usernames):
    """Collect profile data for multiple users."""
    profiles = []
    for username in usernames:
        try:
            data = api_get(f"/api/user/{username}")
            profiles.append({
                "username": data["username"],
                "full_name": data.get("full_name", ""),
                "followers": data["follower_count"],
                "following": data["following_count"],
                "is_verified": data.get("is_verified", False),
                "collected_at": datetime.now().isoformat()
            })
        except Exception as e:
            print(f"Error fetching {username}: {e}")
    return pd.DataFrame(profiles)

# Example: collect profiles for study participants
subjects = ["zuck", "mosseri", "instagram"]
df = collect_profiles(subjects)
df.to_csv("profiles_dataset.csv", index=False)
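Once profiles are collected, simple derived metrics often come next. As one sketch (assuming the `followers`/`following` column names from the DataFrame above; `add_follow_ratio` is an illustrative helper, not part of the API), a followers-per-following ratio is a common rough proxy for audience reach:

```python
import pandas as pd

def add_follow_ratio(profiles_df):
    """Add a followers-per-following ratio column, guarding against zero."""
    df = profiles_df.copy()
    # clip(lower=1) avoids division by zero for accounts following no one.
    df["follow_ratio"] = df["followers"] / df["following"].clip(lower=1)
    return df.sort_values("follow_ratio", ascending=False)
```

Sorting by the ratio gives a quick first look at which accounts in your sample are broadcast-style versus reciprocal followers.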
Research Use Case 2: Content Collection for Sentiment Analysis
Collect posts from specific users for NLP analysis:
def collect_user_posts(username, save_path=None):
    """Collect all available posts for a user."""
    posts = api_get(f"/api/user/{username}/posts")
    records = []
    for post in posts:
        records.append({
            "username": username,
            "post_id": post.get("id"),
            "text": post.get("text", ""),
            "like_count": post.get("like_count", 0),
            "reply_count": post.get("reply_count", 0),
            "created_at": post.get("created_at"),
            "engagement": post.get("like_count", 0) + post.get("reply_count", 0)
        })
    df = pd.DataFrame(records)
    if save_path:
        df.to_csv(save_path, index=False)
        print(f"Saved {len(records)} posts to {save_path}")
    return df
This data can feed directly into sentiment analysis pipelines using libraries like TextBlob, VADER, or transformer-based models.
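To show the shape of such a pipeline without pulling in an NLP dependency, here is a toy lexicon-based scorer applied to the `text` column. This is only a stand-in to illustrate the plumbing; for real studies you would substitute VADER, TextBlob, or a transformer model:

```python
# Tiny illustrative lexicons -- a real study would use VADER/TextBlob instead.
POSITIVE = {"great", "love", "excellent", "good", "amazing"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "poor"}

def toy_sentiment(text):
    """Return a score in [-1, 1]: (pos - neg) / total sentiment words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# Applied to a collected DataFrame:
# posts_df["sentiment"] = posts_df["text"].apply(toy_sentiment)
```

Whatever scorer you use, the pattern is the same: map the `text` column to a numeric score, then aggregate by user, date, or topic.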
Research Use Case 3: Engagement Pattern Analysis
Study how engagement varies across users, time periods, or content types:
def analyze_engagement(posts_df):
    """Calculate engagement metrics from collected posts."""
    if posts_df.empty:
        return {}
    return {
        "total_posts": len(posts_df),
        "avg_likes": posts_df["like_count"].mean(),
        "avg_replies": posts_df["reply_count"].mean(),
        "avg_engagement": posts_df["engagement"].mean(),
        "max_engagement_post": posts_df.loc[
            posts_df["engagement"].idxmax(), "text"
        ][:100],
        "reply_to_like_ratio": (
            posts_df["reply_count"].sum()
            / max(posts_df["like_count"].sum(), 1)
        )
    }
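For studies spanning multiple accounts, the same idea extends to a per-user comparison table. A sketch (assuming the `engagement` column from Use Case 2; `compare_users` is an illustrative helper):

```python
import pandas as pd

def compare_users(frames):
    """Build a per-user summary from a {username: posts_df} mapping."""
    rows = []
    for username, df in frames.items():
        rows.append({
            "username": username,
            "posts": len(df),
            "avg_engagement": df["engagement"].mean() if len(df) else 0.0,
        })
    # Rank users by average engagement, highest first.
    return pd.DataFrame(rows).sort_values("avg_engagement", ascending=False)
```

The resulting table drops straight into a results section or a grouped bar chart.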
Research Use Case 4: Longitudinal Data Collection
For studies tracking changes over time, set up periodic collection:
import json
from pathlib import Path

def longitudinal_snapshot(usernames, output_dir="data"):
    """Take a daily snapshot of user metrics."""
    Path(output_dir).mkdir(exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d")
    snapshot = {}
    for username in usernames:
        try:
            profile = api_get(f"/api/user/{username}")
            snapshot[username] = {
                "followers": profile["follower_count"],
                "following": profile["following_count"],
                "timestamp": datetime.now().isoformat()
            }
        except Exception as e:
            print(f"Error: {username} - {e}")
    filepath = f"{output_dir}/snapshot_{timestamp}.json"
    with open(filepath, "w") as f:
        json.dump(snapshot, f, indent=2)
    print(f"Snapshot saved: {filepath}")
Run this daily via cron to build longitudinal datasets.
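Once several daily snapshot files accumulate, they can be stitched into one long-format table for growth analysis. A sketch matching the snapshot schema above (`load_snapshots` is an illustrative helper):

```python
import json
from pathlib import Path
import pandas as pd

def load_snapshots(output_dir="data"):
    """Combine snapshot_YYYYMMDD.json files into one long-format DataFrame."""
    rows = []
    # Sorted glob keeps rows in chronological order (dates sort lexically).
    for path in sorted(Path(output_dir).glob("snapshot_*.json")):
        date = path.stem.replace("snapshot_", "")
        snapshot = json.loads(path.read_text())
        for username, metrics in snapshot.items():
            rows.append({"date": date, "username": username, **metrics})
    return pd.DataFrame(rows)
```

From there, `df.pivot(index="date", columns="username", values="followers")` gives a per-user follower time series ready for plotting.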
Exporting Data for Analysis Tools
Export to CSV (for Excel, SPSS, Stata)
df.to_csv("threads_dataset.csv", index=False, encoding="utf-8")
Export to JSON (for custom pipelines)
df.to_json("threads_dataset.json", orient="records", indent=2)
Export to Parquet (for large datasets)
df.to_parquet("threads_dataset.parquet", index=False)
Ethical Considerations
When using Threads data for research:
- Collect only public data — thredly only accesses publicly available profiles and posts
- Anonymize when publishing — Remove or hash usernames in published results unless studying public figures
- Follow your institution’s IRB guidelines — Consult your ethics board about social media data collection
- Respect rate limits — Don’t overwhelm the API; plan your collection schedule
- Document your methodology — Record API version, collection dates, and endpoints used for reproducibility
- Data retention — Follow your institution’s data management plan for storage and deletion
Rate Limits for Research Projects
| Plan | Requests/Month | Best For |
|---|---|---|
| Free | 100 | Pilot study, testing methodology |
| Basic ($9) | 10,000 | Small-scale study (50 users, weekly) |
| Pro ($49) | 100,000 | Medium study (500 users, daily) |
| Enterprise ($199) | 1,000,000 | Large-scale study (5,000+ users) |
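To pick a plan, estimate your monthly budget as users × endpoints per user × collection runs per day × ~30 days. A quick arithmetic helper (`monthly_requests` is illustrative, not part of the API):

```python
def monthly_requests(num_users, collections_per_day, endpoints_per_user=1):
    """Rough monthly request budget: users x endpoints x daily runs x 30 days."""
    return num_users * endpoints_per_user * collections_per_day * 30

# e.g. 500 users, one profile request daily -> 15,000 requests/month,
# which fits the Pro tier but exceeds Basic.
```

Leave headroom for retries and pilot runs; a budget near a tier's ceiling is a sign to move up a plan.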
Citing thredly in Your Research
If you use thredly in published research, we recommend citing it as a data source in your methodology section. Example:
Data was collected using the thredly API (thredly.dev), a REST API providing structured access to public Threads data, between [start date] and [end date].
Next Steps
- Get your API key — Set up in under 5 minutes
- Python tutorial — Detailed Python code examples
- Engagement tracking guide — Build a full engagement tracker
- API reference — All available endpoints