Scripts for creating a directed network of Twitter followers.
These scripts work in three parts.
- Taking an initial user, download information about who they follow. Repeat recursively until depth d.
- Generate the files
edges.csvandvertices.csvrequired to create the directed graph. - Create and visualize the network.
- Tweepy
- NetworkX
- Jupyter Notebooks
- Choose the Twitter user where the program will start e.g.
@NoticiasCaracol. - Decide what recursive depth you want to go. A depth of 1 or 2 should be done in a few hours (depending on how many people they are following), a depth of 5 can take several days.
python GetFollowing.py -s NoticiasCaracol -d 2
This will generate a directory structure like
.
├── following
│ ├── NoticiasCaracol.csv
│ └── Bogota.csv
└── twitter-users
├── 17813487.json
└── 57664761.json
The following directory contains a file for every Twitter user name searched. Each file is a .csv file showing who they are following.
The twitter-users directory contains a .json representation of every user. The file name is their Twitter ID.
This script parses the .csv files and creates two new .csv which contains the elements of the network.
python GenerateNetwork.py -s NoticiasCaracol
The file edges.csv contains the network's edges delimited by commas:
source,target
17813487,57664761
Column 1 is the Twitter ID of a User. Column 2 is the ID of a User they follow.
The file vertices.csv inside the directory network-data/ contains the network's vertices delimited by commas:
id,screen_name,followers,friends
17813487,NoticiasCaracol,8321076,1287
Column 1 is the Twitter ID. Column 2 is the screen name. Column 3 has the number of followers of the user. Column 4 has the number of users followed by this user.
This script gets up to 3200 of each user's latest tweets.
python GetStatuses.py
Results are saved in the statuses.csv file inside of the tweets/ directory.
user id,created_at,text,favorite_count,retweet_count,phone,sensitive,hashtags,no_hashtags,mentions,no_mentions,no_urls,no_media
The columns are:
- ID of the user who wrote the tweet
- created_at: Date the tweet was created
- text: Extended text of the tweet
- favorite_count: Number of times the tweet has received a 'favorite' from a user
- retweet_count: Number of times the tweet has been retweeted
- phone: Device from which the tweet was posted
- sensitive: Indicator that an URL contained in the Tweet may contain content or media identified as sensitive content
- hashtags: List of all the hashtags in the tweet
- no_hashtags: Number of hashtags in the tweet
- mentions: List of IDs of the users mentioned in the tweet
- no_mentions: Number of mentions in the tweet
- no_urls: Number of URLs in the tweet
- no_media: Number of media elements in the tweet
Run jupyter notebooks with the command:
jupyter notebook
Open the Graph.ipynb notebook and run it.
