Introduction

Applied Data Analysis Project

In the 2011 Arab Spring, the two most significant revolutions happened in Tunisia and Egypt. Seven years later, we revisit the most recurrent demands that people expressed on social media. Instead of fishing for pre-defined topics or keywords, we want to let the data directly reflect popular concerns, and to tell the story of the people behind the revolutions from their own perspective. The motivation behind this project is to bring you closer to that story and to help you understand what drove Tunisians and Egyptians to the streets. With this purpose in mind, we take a meticulous look at their expressed demands and concerns and analyse how they relate to each other.

The dataset used extends from January 13th to February 14th 2011, roughly covering the period between the Tunisian presidential resignation and the Egyptian one. With this analysis, we look at how one uprising led to the other, that is, how the Tunisian revolution brought about the Egyptian one. To accomplish this, we dive deeper into the data by studying people’s behaviour on social media, as well as taking a careful look at the news and web blogs we have at our disposal.

Finally, we will observe how accurately the news represents popular demands and concerns. This analysis aims to explore how different information sources describe the same context. Since there are noticeable vocabulary differences between information sources, we investigate this further using topic modelling and text mining techniques, broken down by post category, language, and country.
Our observations and answers to all these questions are presented as text descriptions for the combination of parameters the user chooses to explore. There is no single answer to all the questions we aim to address; instead, specific considerations apply to each case.

Dataset

The dataset we used in this work is the ICWSM 2011 Spinn3r dataset. It is very large (~3TB) and contains much information that is not relevant to our analysis. Defining a robust filtering function to collect meaningful data is therefore of extreme importance. To filter the data in an unbiased way, we use keyphrases related to the Tunisian and Egyptian revolutions. This way, we do not impose any bias from the beginning, since any file is considered as long as it relates to the theme we are working on. The keyphrases used for filtering are extracted automatically from the Wikipedia articles on the Tunisian and Egyptian revolutions.
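The filtering idea described above can be sketched as follows. This is only a minimal illustration, not the project's actual pipeline: the text does not say which keyphrase extraction method was used, so the sketch simply takes the most frequent stopword-free bigrams of a reference text. The function names, the tiny English stopword list, and the `top_n` parameter are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stopword list (an assumption).
STOPWORDS = {"the", "of", "in", "and", "a", "to", "was", "on", "by", "that"}


def tokenize(text):
    """Lowercase the text and return runs of letters (any script).

    Limitation: combining marks such as Arabic diacritics split words,
    so real Arabic text would need proper normalisation first.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def extract_keyphrases(reference_text, top_n=5):
    """Return the top_n most frequent bigrams that contain no stopword.

    Stand-in for keyphrase extraction from a reference article; the
    method actually used in the project is not specified in the text.
    """
    tokens = tokenize(reference_text)
    bigrams = [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    counts = Counter(bigrams)
    return {" ".join(bigram) for bigram, _ in counts.most_common(top_n)}


def is_relevant(post_text, keyphrases):
    """Keep a post if it contains at least one keyphrase as whole words."""
    normalized = " " + " ".join(tokenize(post_text)) + " "
    return any(f" {phrase} " in normalized for phrase in keyphrases)
```

In such a setup, `extract_keyphrases` would be run once per reference article and language, and `is_relevant` would then be applied to every post while streaming through the dataset, so that the full ~3TB never has to be held in memory.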

Evaluating the data not only in English but also in two additional languages provides good insight: French, Tunisia's colonial-era and second most popular language, and Arabic, the native language of the peoples of both Tunisia and Egypt. This enables the observation of interesting particularities, depending on which language the information is inferred from. The dataset contains posts from four different sources: Social Media, Web Blogs, News, and Forums. This diversity plays an important role in the following analysis, since different sources contain different types of content. The date records in the dataset also open a great opportunity to analyse how topics vary over time.

TUNISIA

STATISTICS

EGYPT

STATISTICS

Topics Detection

    Common Topics

  • The Domino Effect: how Tunisia’s Revolution brought about the Egyptian one; the vocabulary references both countries.
  • Religion: the Egyptian protests reached their maximum after Friday prayers.
  • Presidential Resignation: the vocabulary includes names of politicians and names of countries related to the events.
  • Popular Concerns: this topic expresses demands such as democracy, and concerns such as poverty and corruption.
  • Noise: outliers not providing relevant information.
  • Protests: mentions activities around the revolution.
  • International Reactions: mentions countries outside the MENA region reacting to the Arab Spring.
Analysis: In Social Media, one of the clusters clearly expresses Popular Concerns, with keywords such as Poverty and Corruption (as a bigram), Joblessness, and the People Want (a popular chant). It also mentions Anger, a main emotion of the Revolution, and whom it is targeted at: Egyptian President Mubarak, his son Jamal Mubarak (bigram), and the Government. A worrying bigram associates Torture with the Revolution, pointing to the widespread human rights abuses at the time. It is worth noting that Popular Concerns posts rise right before Egypt’s Revolution starts and diminish in favour of Live Information on Jan 25. Unlike in Egypt’s other Arabic-language clusters, there is a much larger presence of the Egyptian dialect, which overshadows Standard Arabic. An interesting topic cluster is Citizen Journalism, where people share, in Egyptian Arabic, videos, pictures, and news about what is going on in Egypt’s streets, which appear as a bigram. Facebook is one of the main keywords in the Protests cluster, whereas the Live Information cluster focuses on what is happening in Cairo’s streets (bigram) and mentions radio programs like Mahdath Lahdah Blahdah of the « Our Freedom » Radio.
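
An observation such as "Popular Concerns posts rise before Jan 25 and then give way to Live Information" can be checked by computing, for each day, the share of posts assigned to each topic. The sketch below is one possible way to do this and is not taken from the project's code: it assumes each post has already been reduced to a hypothetical `(day, topic_label)` pair, where the label comes from whatever clustering step was used.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_topic_shares(posts):
    """Compute the fraction of each day's posts that belongs to each topic.

    posts: iterable of (day, topic_label) pairs, with day a datetime.date.
           This input shape is an assumption made for the example.
    Returns a dict {day: {topic_label: share}} with days in ascending
    order; the shares for one day sum to 1.
    """
    per_day = defaultdict(Counter)
    for day, topic in posts:
        per_day[day][topic] += 1

    shares = {}
    for day in sorted(per_day):
        total = sum(per_day[day].values())
        shares[day] = {
            topic: count / total for topic, count in per_day[day].items()
        }
    return shares
```

Using shares rather than raw counts keeps days with very different posting volumes comparable, which matters here because activity is likely to spike around the main events.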