1. What is a Series?

import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj
## 0    4
## 1    7
## 2   -5
## 3    3
## dtype: int64
type(obj)
## <class 'pandas.core.series.Series'>
obj.values
## array([ 4,  7, -5,  3])
obj.index
## RangeIndex(start=0, stop=4, step=1)
obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
obj2
## a    4
## b    7
## c   -5
## d    3
## dtype: int64

2. Indexing, Slicing and Filtering

2.1 Indexing

  • We use the index and square brackets [] to select elements from a Series:
obj
## 0    4
## 1    7
## 2   -5
## 3    3
## dtype: int64
obj[0]
## 4
obj[[0, 3]]
## 0    4
## 3    3
## dtype: int64
  • If the index has labels, simply use the label to select an element:
obj2
## a    4
## b    7
## c   -5
## d    3
## dtype: int64
obj2['b']
## 7
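  • If you use an integer (instead of the label) on a label-indexed Series, pandas falls back to interpreting it as the row position. Recent pandas releases emit a FutureWarning for this fallback, so obj2.iloc[1] (see 2.4) is the explicit way to select by position: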
obj2[1]
## 7
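  • Labels can also be passed as a list, and the in operator checks the index rather than the values. A minimal sketch using the same obj2 as above (the variable names subset and has_b are only illustrative):

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

# A list of labels returns a Series holding just those entries.
subset = obj2[['a', 'd']]
print(subset.tolist())  # [4, 3]

# Membership tests look at the index labels, not at the stored values.
has_b = 'b' in obj2
has_7 = 7 in obj2
print(has_b, has_7)  # True False
```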

2.2 Slicing

  • Slicing works with a numerical index:
obj[0:2]
## 0    4
## 1    7
## dtype: int64
  • Slicing works with a label index too, but unlike normal Python slicing, the endpoint is inclusive:
obj2['b':'c']
## b    7
## c   -5
## dtype: int64
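  • To make the endpoint difference concrete, the sketch below slices the two Series from section 1 side by side. The names by_position and by_label are only illustrative:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

# Positional slice: the stop value 2 is excluded, as with a Python list.
by_position = obj[0:2]

# Label slice: the stop label 'c' is included in the result.
by_label = obj2['a':'c']

print(len(by_position))  # 2
print(len(by_label))     # 3
```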

2.3 Filtering

  • Boolean filtering works with a Series just as it does with a NumPy array!
obj
## 0    4
## 1    7
## 2   -5
## 3    3
## dtype: int64
obj[obj < 2]
## 2   -5
## dtype: int64
  • This is quite an “upgrade” from the built-in list, which does not support this kind of comparison:
a = [4, 7, -5, 3]
a[a < 2]
## TypeError: '<' not supported between instances of 'list' and 'int'
  • We can even use somewhat more “complicated” conditions:
# select only even observations
obj[obj % 2 == 0]
## 0    4
## dtype: int64
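  • Conditions can also be combined. A small sketch, assuming the same obj as above: use & (and), | (or) and ~ (not) instead of the Python keywords, and wrap each comparison in parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Both conditions must hold: positive AND odd.
positive_and_odd = obj[(obj > 0) & (obj % 2 == 1)]

# Either condition may hold: negative OR greater than 5.
negative_or_large = obj[(obj < 0) | (obj > 5)]

# ~ inverts a boolean mask: everything that is NOT less than 2.
not_small = obj[~(obj < 2)]

print(positive_and_odd.tolist())   # [7, 3]
print(negative_or_large.tolist())  # [7, -5]
print(not_small.tolist())          # [4, 7, 3]
```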

2.4 Selecting with loc and iloc

  • These two indexers are much more interesting when we use them with a DataFrame.
  • For a Series they are rarely essential, but let’s discuss them anyway:
    • loc: label location
    • iloc: integer location
obj2
## a    4
## b    7
## c   -5
## d    3
## dtype: int64
obj2.loc['b']
## 7
obj2.iloc[1]
## 7
  • They behave differently when slicing: loc includes the endpoint, while iloc excludes it:
obj2.loc['b':'c']
## b    7
## c   -5
## dtype: int64
obj2.iloc[1:2]
## b    7
## dtype: int64
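  • One case where the distinction may matter even for a Series is an index whose labels are themselves integers. The sketch below uses a hypothetical Series s with labels in descending order, so label 0 and position 0 point at different elements:

```python
import pandas as pd

# Hypothetical example: integer labels that do not match the positions.
s = pd.Series([4, 7, -5, 3], index=[3, 2, 1, 0])

by_label = s.loc[0]      # the element whose label is 0 (the last one)
by_position = s.iloc[0]  # the element at position 0 (the first one)

print(by_label)     # 3
print(by_position)  # 4
```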

3. Arithmetic and Data Alignment

3.1 Arithmetic operations on objects with same indexes

  • pandas Series behave much like NumPy arrays in arithmetic: operations are vectorized, that is, applied element by element.
s1 = pd.Series([7.3, -2.5, 3.4, 1.5])
s1
## 0    7.3
## 1   -2.5
## 2    3.4
## 3    1.5
## dtype: float64
  • Let’s add s1 to itself:
s1 + s1
## 0    14.6
## 1    -5.0
## 2     6.8
## 3     3.0
## dtype: float64
  • What about multiplication, division, or raising to a power?
s1 * 5
## 0    36.5
## 1   -12.5
## 2    17.0
## 3     7.5
## dtype: float64
s1 / 2
## 0    3.65
## 1   -1.25
## 2    1.70
## 3    0.75
## dtype: float64
s1 ** 2
## 0    53.29
## 1     6.25
## 2    11.56
## 3     2.25
## dtype: float64
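  • As a rough illustration of what element-wise execution means, the sketch below compares a vectorized operation with the equivalent explicit loop, and applies a NumPy function to the Series directly. The names doubled, looped and magnitudes are only illustrative:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5])

# Vectorized: one expression operates on every element.
doubled = s1 * 2

# The same computation written as an explicit Python loop.
looped = [x * 2 for x in s1]
print(doubled.tolist() == looped)  # True

# NumPy functions also work element-wise and return a Series
# that keeps the original index.
magnitudes = np.abs(s1)
print(magnitudes.tolist())  # [7.3, 2.5, 3.4, 1.5]
```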
  • How will pandas react if we add two Series with different lengths?
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1])
s1 + s2
## 0    5.2
## 1    1.1
## 2    1.9
## 3    5.5
## 4    NaN
## dtype: float64
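  • The NaN at label 4 appears because pandas matches elements by index label, and s1 has no label 4. The sketch below contrasts this with plain NumPy arrays of the same lengths, which cannot be added at all. The flag numpy_failed is only illustrative:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1])

# pandas aligns on the index; label 4 exists only in s2, so the sum is NaN there.
total = s1 + s2
print(total.isna().tolist())  # [False, False, False, False, True]

# The underlying arrays have shapes (4,) and (5,), which NumPy cannot broadcast.
numpy_failed = False
try:
    s1.to_numpy() + s2.to_numpy()
except ValueError as exc:
    numpy_failed = True
    print("NumPy raised ValueError:", exc)
```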

3.2 Arithmetic operations on objects with different indexes

  • An important pandas feature for some applications is the behavior of arithmetic between objects with different indexes.
  • When you add objects together, the index of the result is the union of the two indexes; a label that appears in only one of the objects gets NaN in the result.
  • For those with database experience, this is similar to an automatic outer join on the index labels.
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s1
## a    7.3
## c   -2.5
## d    3.4
## e    1.5
## dtype: float64
s2
## a   -2.1
## c    3.6
## e   -1.5
## f    4.0
## g    3.1
## dtype: float64
  • Let’s add s1 and s2 together!
s1 + s2
## a    5.2
## c    1.1
## d    NaN
## e    0.0
## f    NaN
## g    NaN
## dtype: float64
  • We can also use the add() method with the fill_value argument: a label missing from one of the Series is treated as that value in the calculation, so the result is no longer NaN:
s1.add(s2, fill_value=0)
## a    5.2
## c    1.1
## d    3.4
## e    0.0
## f    4.0
## g    3.1
## dtype: float64
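  • Note that fill_value is applied before the addition, which is not the same as replacing NaN afterwards. A short sketch of the difference, using the same s1 and s2 (the names filled_before and filled_after are only illustrative):

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Missing operand replaced by 0 first: 'd' becomes 3.4 + 0.
filled_before = s1.add(s2, fill_value=0)

# NaN replaced after the addition: the value 3.4 at 'd' is lost.
filled_after = (s1 + s2).fillna(0)

print(filled_before['d'], filled_after['d'])  # 3.4 0.0
```

  • The sub(), mul() and div() methods accept fill_value in the same way.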

This lecture note references material from Chapter 5 of Wes McKinney’s Python for Data Analysis, 2nd ed.