• pandas is arguably the most famous library for Data Science in Python.
• To get started with pandas, we need to familiarize ourselves with its two fundamental data structures:
• Series
• DataFrame

## 1. What is a Series?

• A Series is a one-dimensional array-like object containing a sequence of values (of types similar to NumPy's) and an associated array of data labels, called its index.
• Use the pd.Series() constructor to create a Series from a list:
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj
## 0    4
## 1    7
## 2   -5
## 3    3
## dtype: int64
type(obj)
## <class 'pandas.core.series.Series'>
• Since we did not specify an index for the data, a default one consisting of the integers 0 through N - 1 (where N is the length of the data) is created.
• You can get the array representation and index object of the Series via its values and index attributes, respectively:
obj.values
## array([ 4,  7, -5,  3])
obj.index
## RangeIndex(start=0, stop=4, step=1)
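The short sketch below checks both points on the same data. The default index is a RangeIndex running from 0 to N - 1. The underlying array can also be obtained with to_numpy(), which the pandas documentation recommends over .values in current versions.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# With no index supplied, pandas builds a RangeIndex covering 0..N-1
default_ok = obj.index.equals(pd.RangeIndex(len(obj)))

# .to_numpy() returns the values as a NumPy array
arr = obj.to_numpy()
print(default_ok, type(arr).__name__, arr.tolist())  # True ndarray [4, 7, -5, 3]
```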
• We can also specify the index when we create the Series:
obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
obj2
## a    4
## b    7
## c   -5
## d    3
## dtype: int64
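When you supply an index, it must contain exactly one label per value. Here is a minimal check that reuses the data above. The exact wording of the error message may differ between pandas versions, so the sketch only tests that a ValueError is raised.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
print(list(obj2.index))  # ['a', 'b', 'c', 'd']

# Four values but only three labels: pandas refuses to build the Series
raised = False
try:
    pd.Series([4, 7, -5, 3], index=['a', 'b', 'c'])
except ValueError:
    raised = True
print(raised)  # True
```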

## 2. Indexing, Slicing and Filtering

### 2.1 Indexing

• We put an index value inside square brackets [] to select elements from a Series. A list of index values selects several elements at once:
obj
## 0    4
## 1    7
## 2   -5
## 3    3
## dtype: int64
obj[0]
## 4
obj[[0, 3]]
## 0    4
## 3    3
## dtype: int64
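The two selections above return different kinds of objects. A single label gives back a scalar, while a list of labels gives back a (smaller) Series. This quick sketch makes the difference visible:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

single = obj[0]        # one label -> a scalar value
several = obj[[0, 3]]  # a list of labels -> a Series
print(single)                  # 4
print(several.index.tolist())  # [0, 3]
```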
• If the Series has a labeled index, simply use the label to select an element:
obj2
## a    4
## b    7
## c   -5
## d    3
## dtype: int64
obj2['b']
## 7
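Label selection works the same way with several labels at once, and the result follows the order in which you list them. A label that is not in the index raises a KeyError. Here is a small sketch on the same obj2:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

# A list of labels returns a Series in the requested order
subset = obj2[['c', 'a', 'd']]
print(subset.tolist())  # [-5, 4, 3]

# A label that does not exist raises KeyError
missing = False
try:
    obj2['z']
except KeyError:
    missing = True
print(missing)  # True
```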
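• If you use an integer (instead of the label), pandas falls back to treating it as a position, so obj2[1] returns the second element. Recent pandas versions deprecate this fallback and emit a FutureWarning. Use obj2.iloc[1] to select by position explicitly: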
obj2[1]
## 7
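The explicit positional form uses iloc, which is covered in more detail in section 2.4. It leaves no doubt between labels and positions, and it also accepts negative positions counted from the end:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

second = obj2.iloc[1]  # position 1 -> the second element
last = obj2.iloc[-1]   # negative positions count from the end
print(second, last)    # 7 3
```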

### 2.2 Slicing

• Slicing with integers works by position and, as in standard Python slicing, excludes the endpoint:
obj[0:2]
## 0    4
## 1    7
## dtype: int64
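To confirm that positional slicing matches ordinary list slicing, you can compare the two directly on the same values:

```python
import pandas as pd

values = [4, 7, -5, 3]
obj = pd.Series(values)

# Positional slices stop before the end position, exactly like list slices
first_two = obj[0:2].tolist()
print(first_two)                 # [4, 7]
print(first_two == values[0:2])  # True
```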
• Slicing works with label index too, but it behaves differently than normal Python slicing in that the endpoint is inclusive:
obj2['b':'c']
## b    7
## c   -5
## dtype: int64
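One way to see the inclusive endpoint is to compare the label slice with the positional slice that selects the same rows. To get 'b' and 'c' by position, you need the end position 3, one past 'c':

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

by_label = obj2['b':'c']   # endpoint label 'c' is included
by_pos = obj2.iloc[1:3]    # positions 1 and 2; end position 3 excluded
print(by_label.equals(by_pos))  # True
```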

### 2.3 Filtering

• Boolean filtering works with a Series just as it does with a NumPy array:
obj
## 0    4
## 1    7
## 2   -5
## 3    3
## dtype: int64
obj[obj < 2]
## 2   -5
## dtype: int64
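Filtering happens in two steps. First, the comparison produces a boolean Series with the same index as obj. Then indexing with that mask keeps only the rows where it is True. This sketch separates the two steps:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

mask = obj < 2             # element-wise comparison -> boolean Series
print(mask.tolist())       # [False, False, True, False]
print(obj[mask].tolist())  # [-5]
```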
• This is a real “upgrade” over Python's built-in list, which does not support element-wise comparison:
a = [4, 7, -5, 3]
a[a < 2]
## TypeError: '<' not supported between instances of 'list' and 'int'
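With a plain list, you have to write the filter yourself. A list comprehension is the usual idiom:

```python
a = [4, 7, -5, 3]

# Lists have no element-wise comparison, so test each element explicitly
filtered = [x for x in a if x < 2]
print(filtered)  # [-5]
```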
• We can also filter on more complex conditions:
# select only even observations
obj[obj % 2 == 0]
## 0    4
## dtype: int64
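You can combine several conditions with the element-wise operators & (and), | (or) and ~ (not). Python's and/or keywords do not work here, because the truth value of a whole Series is ambiguous and pandas raises a ValueError. Wrap each comparison in parentheses, since & and | bind more tightly than comparison operators:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Keep elements that are both positive and odd
positive_odd = obj[(obj > 0) & (obj % 2 == 1)]
print(positive_odd.tolist())  # [7, 3]
```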

### 2.4 Selecting with loc and iloc

• These two indexers come into their own with a DataFrame.
• With a Series they are less essential, but they make it explicit whether you are selecting by label or by position, so let’s discuss them anyway:
• loc: label location
• iloc: integer location
obj2
## a    4
## b    7
## c   -5
## d    3
## dtype: int64
obj2.loc['b']
## 7
obj2.iloc[1]
## 7
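• They do behave differently with slicing: loc includes the endpoint label, while iloc excludes the endpoint position, as in standard Python slicing: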
obj2.loc['b':'c']
## b    7
## c   -5
## dtype: int64
obj2.iloc[1:2]
## b    7
## dtype: int64
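The distinction matters most when a Series has integer labels that differ from the positions. In that case, loc and iloc with the same number can return different elements. Here is a sketch with a hypothetical Series:

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
s = pd.Series(['x', 'y', 'z'], index=[2, 0, 1])

by_label = s.loc[0]      # the element labelled 0
by_position = s.iloc[0]  # the element at position 0
print(by_label, by_position)  # y x
```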

## 3. Arithmetic and Data Alignment

### 3.1 Arithmetic operations on objects with same indexes

• pandas Series behave much like NumPy arrays in arithmetic: operations are vectorized, that is, applied element-wise without an explicit loop.
s1 = pd.Series([7.3, -2.5, 3.4, 1.5])
s1
## 0    7.3
## 1   -2.5
## 2    3.4
## 3    1.5
## dtype: float64
• Let’s add s1 to itself:
s1 + s1
## 0    14.6
## 1    -5.0
## 2     6.8
## 3     3.0
## dtype: float64
• What about multiplication, division, or raising to a power?
s1 * 5
## 0    36.5
## 1   -12.5
## 2    17.0
## 3     7.5
## dtype: float64
s1 / 2
## 0    3.65
## 1   -1.25
## 2    1.70
## 3    0.75
## dtype: float64
s1 ** 2
## 0    53.29
## 1     6.25
## 2    11.56
## 3     2.25
## dtype: float64
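To check the NumPy parallel directly, you can compare a Series operation with the same operation on its underlying array. NumPy ufuncs such as np.sqrt also accept a Series and return a Series. This is a minimal sketch:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5])
arr = s1.to_numpy()

# Each operator acts element by element, matching the NumPy results
same_mul = np.allclose((s1 * 5).to_numpy(), arr * 5)
same_pow = np.allclose((s1 ** 2).to_numpy(), arr ** 2)

# A NumPy ufunc applied to a Series keeps the pandas type
roots = np.sqrt(s1.abs())
print(same_mul, same_pow, type(roots).__name__)  # True True Series
```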
• What happens if we add two Series of different lengths?
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1])
s1 + s2
## 0    5.2
## 1    1.1
## 2    1.9
## 3    5.5
## 4    NaN
## dtype: float64
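The NaN at label 4 appears because pandas pairs elements by index label, not by position, and label 4 exists only in s2. A small sketch with hypothetical data shows this label matching. The two Series below contain the same labels in a different order:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[2, 1, 0])

# Values are matched by label: label 0 pairs 1 with 30, label 2 pairs 3 with 10
total = a + b
print(total.sort_index().tolist())  # [31, 22, 13]
```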

### 3.2 Arithmetic operations on objects with different indexes

• An important pandas feature for some applications is the behavior of arithmetic between objects with different indexes.
• When you add objects together, the index of the result is the union of their indexes, and labels that appear in only one of the objects get a missing value (NaN).
• For those with database experience, this is similar to an automatic outer join on the index labels.
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s1
## a    7.3
## c   -2.5
## d    3.4
## e    1.5
## dtype: float64
s2
## a   -2.1
## c    3.6
## e   -1.5
## f    4.0
## g    3.1
## dtype: float64
• Let’s add s1 and s2 together!
s1 + s2
## a    5.2
## c    1.1
## d    NaN
## e    0.0
## f    NaN
## g    NaN
## dtype: float64
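Both claims can be checked directly. The result's index equals the union of the two indexes, and only the labels present in both Series, here a, c and e, get an actual sum:

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

total = s1 + s2
# The result's index is the union of both indexes ...
union_ok = total.index.equals(s1.index.union(s2.index))
# ... and only labels shared by both Series have a non-missing value
shared = total.dropna().index.tolist()
print(union_ok, shared)  # True ['a', 'c', 'e']
```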
• We can instead use the add() method with the fill_value argument. It substitutes the given value for a label that is missing from one of the two Series before adding, so those labels no longer produce NaN:
s1.add(s2, fill_value=0)
## a    5.2
## c    1.1
## d    3.4
## e    0.0
## f    4.0
## g    3.1
## dtype: float64
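fill_value only covers a label that is missing on one side. According to the pandas documentation, if both Series are missing a value at the same label, the result there stays NaN. The sub(), mul() and div() methods accept the same parameter. Here is a sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
y = pd.Series([10.0, np.nan], index=['a', 'b'])

# 'c' is missing only in y, so 0 is used there; 'b' is missing in both
res = x.add(y, fill_value=0)
print(res.tolist())  # [11.0, nan, 3.0]
```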

This lecture note draws on material from Chapter 5 of Wes McKinney’s Python for Data Analysis, 2nd ed.