Byteli

Exploring CDF vs PPF in SciPy: Understanding Probability Functions

Photo by Robert Stump on Unsplash

Introduction

In probability theory, the Percent Point Function (PPF) and the Cumulative Distribution Function (CDF) are fundamental tools for quantifying the uncertainty in a random variable. In this post, we look at what each function means, how the two relate, and how to compute them in practice by analyzing the daily returns of Apple’s stock in 2023. Along the way, we introduce SciPy, an open-source Python library for scientific and technical computing.

Cumulative Distribution Function (CDF)

Definition

The Cumulative Distribution Function (CDF) describes the probability distribution of a random variable: for any value x, it gives the probability that the variable is less than or equal to x.

Mathematically, for a random variable X, the CDF is denoted as F(x) and is expressed as:

$F(x) = P(X \leq x)$, for all $x \in \mathbb{R}$
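
To make the definition concrete, here is a minimal sketch for one specific case: a normal random variable, whose CDF has the closed form $F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$. The helper name `normal_cdf` is ours and exists only for illustration; it uses nothing beyond Python's standard library.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Half of the probability mass of a symmetric distribution lies at or below its mean.
p_at_mean = normal_cdf(0.0)
# Roughly 84% of the mass lies at or below one standard deviation above the mean.
p_one_sigma = normal_cdf(1.0)
print(p_at_mean, round(p_one_sigma, 4))
```

Because a CDF accumulates probability from left to right, its value never decreases as x grows, and it approaches 0 and 1 at the two extremes.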

Percent Point Function (PPF), or Inverse Cumulative Distribution Function

Definition

The Percent Point Function (PPF), also known as the inverse cumulative distribution function or quantile function, is the inverse of the CDF. It takes a probability as input and returns the value of the random variable at which the CDF equals that probability.

Mathematically, if F(x) is the CDF of a random variable X, then the PPF is denoted as $F^{-1}(p)$ and is expressed as:

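$F^{-1}(p) = x$, such that $F(x) = p$, for $0 < p < 1$

This form assumes that F is continuous and strictly increasing, as it is for the normal distribution used later in this post. More generally, the PPF is defined as the smallest x for which $F(x) \geq p$.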

The PPF is particularly useful in statistics for determining values associated with specific probabilities, such as percentiles or critical values.
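
As a quick sanity check of the inverse relationship, the sketch below sends a value through the CDF and back through the PPF using the standard normal from `scipy.stats`. The variable names are ours, chosen for illustration.

```python
from scipy.stats import norm

# Standard normal: going from a value to a probability and back again.
x = 1.5
p = norm.cdf(x)        # probability that X <= 1.5
x_back = norm.ppf(p)   # value at which the CDF equals p

# A familiar critical value: the 97.5th percentile of the standard normal,
# which is approximately 1.96.
z_975 = norm.ppf(0.975)
print(x_back, z_975)
```

Up to floating-point error, `x_back` recovers the original 1.5, which is exactly what "inverse" means here.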

Contrasting PPF and CDF

Practical Example in SciPy

In this example, we’ll demonstrate the concepts of PPF and CDF within SciPy. We’ll use the daily returns of Apple (AAPL) in 2023 as our dataset.

Get Stock Return Data Using yfinance

# For getting historical financial data
import yfinance as yf
# For scientific computing in statistics
from scipy.stats import norm
# For plotting and visualization
import matplotlib.pyplot as plt

# Fetch AAPL data. auto_adjust=False keeps the 'Adj Close' column, which
# newer yfinance releases otherwise drop by adjusting the prices in place.
stock_data = yf.download('AAPL', start='2023-01-01', end='2023-12-31', auto_adjust=False)
# Get a concise summary of our DataFrame
stock_data.info()

"""
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 250 entries, 2023-01-03 to 2023-12-29
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Open       250 non-null    float64
 1   High       250 non-null    float64
 2   Low        250 non-null    float64
 3   Close      250 non-null    float64
 4   Adj Close  250 non-null    float64
 5   Volume     250 non-null    int64
dtypes: float64(5), int64(1)
memory usage: 13.7 KB
"""

# Get the daily return using `pct_change()` and remove missing values.
# squeeze() yields a Series even if yfinance returns a one-column DataFrame.
stock_returns = stock_data['Adj Close'].squeeze().pct_change().dropna()
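
To see exactly what `pct_change()` computes, here is a tiny sketch on made-up prices. The numbers are hypothetical and are not real AAPL quotes.

```python
import pandas as pd

# Hypothetical closing prices, not real AAPL data.
prices = pd.Series([100.0, 102.0, 99.96])

# pct_change() computes (p_t - p_{t-1}) / p_{t-1}. The first entry has no
# predecessor, so it is NaN and dropna() discards it.
toy_returns = prices.pct_change().dropna()
print(toy_returns.tolist())
```

The two remaining values are roughly 0.02 and -0.02, that is, +2% and -2%. Note that the returns are stored as fractions, so 0.01 means 1%.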

Inspect the distribution by plotting a histogram of daily returns

# Inspect the distribution by plotting a histogram of daily returns
plt.figure(figsize=(8, 6))
plt.hist(stock_returns, bins=50, density=True, alpha=0.7, color='skyblue', edgecolor='black')
plt.title('Histogram of Apple Inc. (AAPL) Daily Returns in 2023')
# pct_change() yields fractions (0.01 means 1%), so the axis is not in percent
plt.xlabel('Daily Return')
# With density=True the bar heights are densities, not counts
plt.ylabel('Density')
plt.grid(True)
plt.show()

[Figure: histogram of AAPL daily returns in 2023]
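
The `density=True` argument changes what the bar heights mean. The sketch below uses `np.histogram`, which applies the same normalization, on a synthetic sample (hypothetical parameters, fixed seed) to show that the total bar area becomes 1.

```python
import numpy as np

# Synthetic stand-in for daily returns (hypothetical parameters, fixed seed).
rng = np.random.default_rng(3)
sample = rng.normal(loc=0.001, scale=0.012, size=250)

# density=True rescales the bar heights so that the total bar area is 1.
heights, edges = np.histogram(sample, bins=50, density=True)
total_area = np.sum(heights * np.diff(edges))
print(total_area)
```

This is why a density histogram can be compared directly with a probability density function, while a histogram of raw counts cannot.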

Construct the normal distribution in SciPy

# Calculate mean and standard deviation of daily stock returns
mean_return = stock_returns.mean()
std_deviation = stock_returns.std()

# Create a normal distribution based on the calculated mean and standard deviation
appl_daily_return_distribution = norm(loc=mean_return, scale=std_deviation)
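
One detail worth knowing: pandas' `Series.std()` divides by n - 1 by default, whereas `norm.fit` returns maximum-likelihood estimates, which divide by n. The sketch below illustrates the difference on a synthetic sample (hypothetical parameters, fixed seed), so it runs without downloading any data.

```python
import numpy as np
from scipy.stats import norm

# Synthetic stand-in for daily returns (hypothetical parameters, fixed seed).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.001, scale=0.012, size=250)

# Same recipe as above: freeze a normal at the sample mean and sample std.
# ddof=1 mirrors the pandas default for Series.std().
frozen = norm(loc=sample.mean(), scale=sample.std(ddof=1))

# norm.fit returns maximum-likelihood estimates; its scale uses ddof=0.
mle_loc, mle_scale = norm.fit(sample)

print(frozen.mean(), frozen.std())
print(mle_loc, mle_scale)
```

With 250 observations the two scale estimates differ only slightly, so either choice would give nearly the same CDF and PPF values here.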

Calculating CDF and PPF in SciPy

The .cdf() and .ppf() methods, available on the probability distributions in the scipy.stats module, evaluate the Cumulative Distribution Function (CDF) and the Percent Point Function (PPF), respectively.

# CDF
appl_daily_return_distribution.cdf(0.01)
# 0.7420136860623278
# Under the fitted normal, 74.20% of Apple's daily returns are less than or equal to 1%

appl_daily_return_distribution.cdf(0.005)
# 0.5994001904523659
# 59.94% of the daily returns are less than or equal to 0.5%

appl_daily_return_distribution.cdf(0.01) - appl_daily_return_distribution.cdf(0.005)
# 0.14261349560996184
# 14.26% of the daily returns are greater than 0.5% and less than or equal to 1%

# PPF
appl_daily_return_distribution.ppf(0.9)
# 0.017944086843352632
# 90% of the daily returns are less than or equal to 1.79%

appl_daily_return_distribution.ppf(1 - 0.9)
# -0.014274232117790355
# 10% of the daily returns are less than or equal to -1.43%
# In other words, 90% of the daily returns are greater than -1.43%
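
For upper-tail questions such as "what share of returns exceeds 1%?", SciPy also offers `sf` (the survival function, 1 - CDF) and `isf` (its inverse), which avoid writing the subtraction by hand. The sketch below uses hypothetical, rounded parameters so it runs without downloading data; the numbers it prints are not the article's results.

```python
from scipy.stats import norm

# Hypothetical parameters, chosen only so the example is self-contained.
dist = norm(loc=0.0018, scale=0.0126)

# Upper-tail probability P(X > 1%): sf(x) equals 1 - cdf(x).
p_above = dist.sf(0.01)

# Value exceeded with probability 0.9: isf(q) equals ppf(1 - q).
x_exceeded_90 = dist.isf(0.9)

print(p_above, 1 - dist.cdf(0.01))
print(x_exceeded_90, dist.ppf(0.1))
```

Each printed pair agrees up to floating-point error, which confirms that `sf` and `isf` are the CDF and PPF viewed from the upper tail.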

Visualize the Distribution

CDF

# Calculate the CDF at each observed return, in ascending order
x_values = sorted(stock_returns)
y_cdf = appl_daily_return_distribution.cdf(x_values)

# Plotting the Cumulative Distribution Function (CDF)
plt.figure(figsize=(8, 6))
plt.plot(x_values, y_cdf, label='CDF')
plt.title("Cumulative Distribution Function (CDF) of Apple's Stock Daily Returns in 2023")
# Returns are fractions (0.01 means 1%), so the axis is not in percent
plt.xlabel('Daily Return')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.grid(True)
plt.show()

[Figure: CDF of the fitted normal distribution for AAPL daily returns in 2023]
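
The curve above is the CDF of the fitted normal model, not of the data itself. One way to judge how well the model describes the data is to compare it with the empirical CDF. The sketch below does this on a synthetic sample (hypothetical parameters, fixed seed); with real returns, the gap would indicate how far the normal assumption is from the observations.

```python
import numpy as np
from scipy.stats import norm

# Synthetic stand-in for the return series (hypothetical parameters, fixed seed).
rng = np.random.default_rng(1)
sample = np.sort(rng.normal(loc=0.001, scale=0.012, size=250))

# Empirical CDF: the i-th smallest observation has i/n of the data at or below it.
ecdf = np.arange(1, sample.size + 1) / sample.size

# Model CDF evaluated at the same points.
model = norm(loc=sample.mean(), scale=sample.std(ddof=1))
model_cdf = model.cdf(sample)

# Largest vertical gap between the two curves at the observed points.
max_gap = np.max(np.abs(ecdf - model_cdf))
print(max_gap)
```

A small gap suggests the normal model is a reasonable approximation; a large one would be a hint that the data have heavier tails or more skew than a normal distribution allows.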

PPF

# Calculate the PPF for a range of probabilities
probabilities = [0.05, 0.25, 0.5, 0.75, 0.95]
x_ppf = appl_daily_return_distribution.ppf(probabilities)

# Plotting the Percent Point Function (PPF)
plt.figure(figsize=(8, 6))
plt.plot(probabilities, x_ppf, marker='o', linestyle='None', label='PPF')
plt.title("Percent Point Function (PPF) of Apple's Stock Daily Returns in 2023")
plt.xlabel('Probability')
# Returns are fractions (0.01 means 1%), so the axis is not in percent
plt.ylabel('Daily Return')
plt.legend()
plt.grid(True)
plt.show()

[Figure: PPF of the fitted normal distribution at probabilities 0.05, 0.25, 0.5, 0.75, and 0.95]
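
Five points give only a coarse picture of the PPF. The sketch below evaluates it on a dense probability grid and sets it beside the empirical quantiles of the same sample. It uses synthetic data (hypothetical parameters, fixed seed), and the names `grid`, `model_quantiles`, and `empirical_quantiles` are ours.

```python
import numpy as np
from scipy.stats import norm

# Synthetic stand-in for the return series (hypothetical parameters, fixed seed).
rng = np.random.default_rng(2)
sample = rng.normal(loc=0.001, scale=0.012, size=250)
model = norm(loc=sample.mean(), scale=sample.std(ddof=1))

# A dense probability grid. 0 and 1 are excluded because the normal PPF
# is -inf and +inf there.
grid = np.linspace(0.01, 0.99, 99)
model_quantiles = model.ppf(grid)

# Empirical quantiles of the sample at the same probabilities.
empirical_quantiles = np.quantile(sample, grid)

# Compare the 1st, 50th, and 99th percentiles.
print(model_quantiles[[0, 49, 98]])
print(empirical_quantiles[[0, 49, 98]])
```

Plotting `empirical_quantiles` against `model_quantiles` would give a Q-Q plot, a common visual check of whether a sample follows the assumed distribution.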

Conclusion

In summary, the CDF gives the probability that a random variable is less than or equal to a particular value, while the PPF goes the other way and returns the value of the random variable for a given probability. Because each is the inverse of the other, they are often used together in statistical analysis and probability calculations.

#statistics

