The corpus contains over 4.5 million tweets (tweet IDs) automatically labeled by a machine learning program with stance regarding Brexit: Positive (supporting Brexit), Negative (opposing Brexit), or Neutral (uncommitted).
The Brexit referendum was held on June 23, 2016, to decide whether the UK should leave or remain in the EU. In the weeks before the referendum, starting on May 12, the UK geo-located Brexit-related tweets were continuously collected resulting in a dataset of around 4.5 million (4,508,440) tweets from almost one million (998,054) users. A large sample of the collected tweets (35,000) was manually labeled for the stance of their authors regarding Brexit: Positive (supporting Brexit), Negative (opposing Brexit), or Neutral (uncommitted). The labeled tweets were used to train a classifier which then automatically labeled all the remaining tweets.
The corpus contains tweet ids and stance labels. The tweets are grouped into files one hour per file. In each file, one row represents one entry (twitter_id, sentiment_label). Lines are ordered by the tweet time.
The data collection, annotation, model training and performance estimation is described in detail in:
Miha Grčar, Darko Cherepnalkoski, Igor Mozetič, Petra Kralj Novak: Stance and influence of Twitter users regarding the Brexit referendum. Computational Social Networks 4/6. 2017. http://dx.doi.org/10.1186/s40649-017-0042-6