Distributing your own package on PyPi – Part 2

In Distributing your own package on PyPi I wrote about my first package on PyPI. Here are some refinements aka lessons learned:

Project Description on PyPI

I wondered why the project description on PyPI was empty. Solution: you need a long_description. If you already have a README.md, you can read it into a string and use that as the description.

But you have to add long_description_content_type='text/markdown' as well, so that PyPI renders the text as Markdown.

from setuptools import setup

# read the contents of your README file
from os import path
this_directory = path.abspath(path.dirname(__file__))
with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='flask_url_mapping',
    version='0.6',
    packages=['flask_url_mapping'],
    url='https://github.com/jboegeholz/flaskurls',
    download_url='https://github.com/jboegeholz/flaskurls/archive/0.2.tar.gz',
    license='MIT',
    author='Joern Boegeholz',
    author_email='boegeholz.joern@gmail.com',
    description='Django-style URL handling for Flask',
    long_description=long_description,
    long_description_content_type='text/markdown',
    install_requires=["Flask", "Flask-Login"]
)
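The README-reading step can also be written with pathlib. This is only a minimal sketch, assuming the README sits next to setup.py; the helper name read_long_description is made up for this illustration and is not part of setuptools:

```python
from pathlib import Path


def read_long_description(base_dir, filename="README.md"):
    """Return the text of the README in base_dir as one string."""
    # read_text opens, reads and closes the file in a single call
    return (Path(base_dir) / filename).read_text(encoding="utf-8")


# Hypothetical usage inside setup.py:
# long_description = read_long_description(Path(__file__).parent)
```

The result is the same string as in the os.path version, so it can be passed to long_description unchanged.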

 

Dependencies of your Package

If your package relies on other Python packages, you should list them in your setup.py as well, via install_requires. pip then installs them together with your package.

from setuptools import setup

setup(
    name='flask_url_mapping',
    version='0.6',
    packages=['flask_url_mapping'],
    url='https://github.com/jboegeholz/flaskurls',
    download_url='https://github.com/jboegeholz/flaskurls/archive/0.2.tar.gz',
    license='MIT',
    author='Joern Boegeholz',
    author_email='boegeholz.joern@gmail.com',
    description='Django-style url handling for Flask',
    install_requires=["Flask", "Flask-Login"]
)
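install_requires also accepts version specifiers, so you can state which releases you have tested against. A small sketch of what that could look like; the version bounds below are made-up placeholders and not the real requirements of flask_url_mapping:

```python
# Hypothetical lower bounds, for illustration only. Replace them with
# the versions you have actually tested against.
INSTALL_REQUIRES = [
    "Flask>=1.0",        # any Flask release from 1.0 upwards
    "Flask-Login>=0.4",  # any Flask-Login release from 0.4 upwards
]

# In setup.py: setup(..., install_requires=INSTALL_REQUIRES)
```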

Checking test coverage with pytest-cov

Test coverage

I wanted to analyze my Python package flaskurls for test coverage. This is how you can do it:

pipenv install pytest-cov
py.test --cov=flask_url_mapping tests/
----------- coverage: platform win32, python 3.6.5-final-0 -----------
Name                              Stmts   Miss  Cover
-----------------------------------------------------
flask_url_mapping\__init__.py         1      0   100%
flask_url_mapping\flask_urls.py      73      0   100%
-----------------------------------------------------
TOTAL                                74      0   100%


========================== 10 passed in 0.60 seconds ==========================
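The Cover column is the share of statements that were executed while the tests ran. A minimal sketch of that arithmetic with the numbers from the report, assuming plain statement coverage without branch coverage; the function name cover_percent is made up for this example and is not part of pytest-cov:

```python
def cover_percent(stmts, miss):
    """Percentage of statements that were executed at least once."""
    if stmts == 0:
        # Assumption: treat a file without statements as fully covered
        return 100.0
    return 100.0 * (stmts - miss) / stmts


print(cover_percent(73, 0))  # flask_urls.py row → 100.0
print(cover_percent(74, 0))  # TOTAL row → 100.0
```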

10 things I didn’t know about Data Science a year ago

In my article My personal road map for learning data science in 2018 I wrote about how I try to tackle the data science knowledge sphere. Since 2018 is slowly coming to an end, I think it is time for a little wrap-up.

What are the things I learned about Data Science in 2018? Here we go:

1. The difference between Data Science, Machine Learning, Deep Learning and AI


Distributing your own package on PyPi

In Regular Expressions Demystified I developed a little Python package and distributed it via PyPI.

I wanted to publish my second self-written package as well, but coming back after almost a year, some things have changed in the world of PyPI, so the old tutorials no longer work.

So I wrote this article to bring some clarity into this topic.

Distutils vs Setuptools


Data Science: Pandas

Pandas is a data analysis tool. Together with NumPy and Matplotlib it is part of the data science stack.

You can install it via

pip install pandas

Working with real data

The data set we are using is the astronauts data set from Kaggle:

Download Data Set NASA Astronauts from Kaggle

During this introduction we want to answer the following questions:

  • Which American astronaut has spent the most time in space?
  • What university has produced the most astronauts?
  • What subject did the most astronauts major in at college?
  • Have most astronauts served in the military? What rank did they achieve?

Basic Usage

import pandas as pd

astronaut_data = pd.read_csv("./astronauts.csv")

With the len function you can get the number of rows in the dataset:

len(astronaut_data)

which gives us 357 astronauts

The columns property gives you the names of the individual columns

astronaut_data.columns

The method head() gives you the first five entries:

astronaut_data.head()

whereas the tail method gives you the last n entries

astronaut_data.tail(10)

With the iloc indexer you can access a row directly by its integer position:

astronaut_data.iloc[0]
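To try these calls without downloading the CSV, here is a small sketch on a toy DataFrame. The rows are made up and only serve to show what each call returns; the column names mirror the ones used in the astronauts data set:

```python
import pandas as pd

# Made-up rows, only to demonstrate the calls
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Space Flight (hr)": [100, 250, 50, 400, 0, 75, 300],
})

print(len(toy))            # number of rows → 7
print(list(toy.columns))   # column names as a plain list
first_five = toy.head()    # first five rows (the default is n=5)
last_two = toy.tail(2)     # last two rows
first_row = toy.iloc[0]    # first row as a Series, selected by position
```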

Which American astronaut has spent the most time in space?

most_time_in_space = astronaut_data.sort_values(by="Space Flight (hr)", ascending=False).head(1)
most_time_in_space[['Name', 'Space Flight (hr)']]

Sorting the dataframe can be done with sort_values. For this question we sort by Space Flight (hr). Because we want the most hours, we have to sort in descending order, which translates to ascending=False.

head(1) gives us the correct answer:

Jeffrey N. Williams. He spent 12818 hours (534 days) in space.

Have you heard of him? Unsung hero!

Note: the dataset was last updated in 2017. As of 2019, Peggy Whitson is the American who has spent the most time in space.

She has spent more than 665 days in space!
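The sorting step can be tried on toy data as well. The sketch below uses made-up rows; it shows the sort_values variant next to nlargest, which is an alternative way to get the same row, and the conversion from hours to days used for the 12818 hours above:

```python
import pandas as pd

# Made-up rows, only to demonstrate the calls
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Space Flight (hr)": [100, 250, 50, 400],
})

# Sort descending and keep the first row
top_sorted = toy.sort_values(by="Space Flight (hr)", ascending=False).head(1)

# Alternative: nlargest returns the n rows with the largest values
top_nlargest = toy.nlargest(1, "Space Flight (hr)")

# Convert hours to full days plus remaining hours
days, rest_hours = divmod(12818, 24)
print(days, rest_hours)  # → 534 2
```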

What university has produced the most astronauts?

The method value_counts counts the number of occurrences of each unique value and sorts the result so that the most frequent value comes first:

astronaut_data['Alma Mater'].value_counts().head(1)

The US Naval Academy produced 12 astronauts
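A minimal sketch of what value_counts does, on a made-up Series with placeholder school names instead of the real Alma Mater column:

```python
import pandas as pd

# Placeholder values, not taken from the astronauts data set
schools = pd.Series(["X", "Y", "X", "Z", "X", "Y"], name="Alma Mater")

counts = schools.value_counts()         # sorted by count, largest first
print(counts.index[0], counts.iloc[0])  # → X 3
```

head(1) on the result therefore always returns the most frequent value together with its count.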

What subject did the most astronauts major in at college?

astronaut_data['Undergraduate Major'].value_counts().head(1)

The same applies here: use the value_counts method on the Undergraduate Major column.
The answer is Physics: 35 astronauts studied physics in college.

Have most astronauts served in the military?

The count method returns the number of entries that are not missing (None or NaN):

astronaut_data['Military Rank'].count()

In this case 207 of the 357 astronauts have a military rank, so a majority (about 58%) have served in the military.

astronaut_data['Military Rank'].value_counts().head(1)

which gives us 94 Colonels.
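The difference between len and count is easy to see on a small made-up Series with missing values. The sketch also computes the share of astronauts with a military rank from the two numbers reported above:

```python
import pandas as pd

# Made-up values; None marks an astronaut without a military rank
ranks = pd.Series(["Colonel", None, "Captain", "Colonel", None])

print(len(ranks))     # → 5, all rows
print(ranks.count())  # → 3, missing values are skipped

# Share of astronauts with a military rank (207 of 357)
share = 207 / 357
print(f"{share:.0%}")  # → 58%
```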