Thus, I had to create that dataset myself by scraping Medium's article archive using Selenium and Python. Finally, I collected the article text itself and the tags for each article. Ultimately, I collected 736 articles from the 19th of December 2018 until the 3rd of January 2019. If your article received 49 claps, you already received more claps than 75% of all data science articles. Commonly used words, ignoring fillers such as 'and', include 'data science', 'machine learning', and 'python'. This distribution still looks a little right-skewed; however, it seems to be centered at around 5 minutes. As evident from this seaborn distplot, most successful articles were published around 3 pm UTC.
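To make the claps statement concrete, here is a minimal sketch of how the share of articles below a given clap count could be computed. The function name `share_below` and the toy clap counts are my own assumptions for illustration; they are not the scraped data or the original analysis code.

```python
from bisect import bisect_left


def share_below(clap_counts, claps):
    """Return the fraction of articles with strictly fewer claps than `claps`."""
    ordered = sorted(clap_counts)
    # bisect_left gives the number of values strictly smaller than `claps`.
    return bisect_left(ordered, claps) / len(ordered)


# Hypothetical clap counts, invented purely for illustration.
toy_claps = [0, 3, 7, 12, 20, 35, 49, 180]
print(share_below(toy_claps, 49))
```

On the real dataset, the same idea would be applied to the clap counts of all 736 articles.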

Instead, I wanted to find out if there are specific characteristics that make certain articles perform better than others. Another aspect that could be of importance concerns the number and type of tags. Whereas 51% of all articles use fewer than 5 tags, only 19% of successful articles do. The title of an article is extremely important. Take a look at the 5 most commonly used words in successful articles. Using 'Data Science' and 'Machine Learning' seems to be a good idea. After all, the number of minutes it will take to read the article is one of the first things potential readers will glance at.
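The tag and title comparisons can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `top_title_words` and `share_with_few_tags`, the small filler list, and the toy records are all hypothetical. It also counts single words, whereas the analysis above reports phrases such as 'data science'.

```python
import re
from collections import Counter

# Hypothetical filler words to ignore; a real analysis would likely use a fuller stop-word list.
FILLERS = {"a", "an", "and", "the", "to", "of", "in", "for", "with", "is", "on"}


def top_title_words(titles, n=5):
    """Return the n most common non-filler words across the given titles."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in FILLERS)
    return counts.most_common(n)


def share_with_few_tags(articles, max_tags=5):
    """Return the fraction of articles that use fewer than max_tags tags."""
    few = sum(1 for a in articles if len(a["tags"]) < max_tags)
    return few / len(articles)


# Toy records invented for illustration only; they are not the scraped dataset.
toy_articles = [
    {"title": "Machine Learning with Python", "tags": ["ml", "python", "data", "ai", "code"]},
    {"title": "Data Science and Machine Learning", "tags": ["ds", "ml"]},
    {"title": "The Data Science Workflow", "tags": ["ds", "workflow", "data", "tips", "career"]},
    {"title": "Learning Python for Data Science", "tags": ["python", "ds", "learning"]},
]

print(top_title_words([a["title"] for a in toy_articles], n=3))
print(share_with_few_tags(toy_articles))
```

Running `share_with_few_tags` once on all articles and once on the successful subset would give the two percentages being compared.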

