Master data analysis with Pandas: from basics to advanced, with real code, hands-on labs, and expert guidance by Ram Sir.
Ram Sir, a specialist in Python data science, has 2+ years of experience in analytics, machine learning, and teaching Pandas to thousands of students and professionals.
Pandas provides Series (1D) and DataFrame (2D) as primary structures.
import pandas as pd
s = pd.Series([1, 2, 3], index=['a','b','c'])
df = pd.DataFrame({
'name': ['Alice','Bob'],
'age': [23, 34]
})
# From numpy array
import numpy as np
df2 = pd.DataFrame(np.arange(6).reshape(2,3), columns=['A','B','C'])
df = pd.read_csv("data.csv")    # Load a CSV file
df = pd.read_excel("data.xlsx") # Load Excel (requires openpyxl)
df = pd.read_json("data.json")  # Load JSON
For advanced construction, use pd.DataFrame.from_dict() or pd.DataFrame.from_records(). Each DataFrame column can have its own dtype.
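A minimal sketch of these constructors (the column and row names here are illustrative, not from the course data):

```python
import pandas as pd

# from_dict with the default orient='columns': dict keys become columns
d = {'name': ['Alice', 'Bob'], 'age': [23, 34]}
df_cols = pd.DataFrame.from_dict(d)

# orient='index': dict keys become row labels instead
df_rows = pd.DataFrame.from_dict(
    {'row1': [1, 2], 'row2': [3, 4]},
    orient='index', columns=['A', 'B']
)

# from_records: build from a list of tuples (or a structured array)
records = [('Alice', 23), ('Bob', 34)]
df_rec = pd.DataFrame.from_records(records, columns=['name', 'age'])
```

orient='index' is handy when each dict entry describes one row rather than one column.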
Access and manipulate data using powerful indexing features:
df['age'] # Series
df[['name','age']] # DataFrame
df.loc[0] # Row by label/index
df.iloc[0,1] # Row/col by integer
df.at[0,'age'] # Fast scalar access
df.iat[0,1] # Fast scalar by position
mask = df['age'] > 25
df[mask] # Filter rows
df.loc[df['age']>30, 'name'] # Select name where age>30
df.iloc[[0,2], [1,2]] # Fancy row/col selection
df.loc[1, 'age'] = 40
df['new_col'] = df['age'] * 2
Use loc for label-based and iloc for integer-based indexing. Boolean indexing is extremely powerful!
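To make boolean indexing concrete, a small worked sketch (the sample data is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Cara'],
                   'age': [23, 34, 29]})

# An element-wise comparison yields a boolean Series (the mask)
mask = df['age'] > 25

# Combine conditions with & and |; the parentheses are required
both = df[(df['age'] > 25) & (df['name'].str.startswith('B'))]

# loc accepts a mask plus a column selection in one step
names = df.loc[mask, 'name']
```

Note that & and | are used instead of Python's `and`/`or`, because the comparison operates on whole Series at once.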
Manipulate and analyze data with vectorized ops, stats, apply/map, and handle missing data:
df.sum()
df.mean()
df.describe()
df['age'].min()
df.count()
df['age_plus_10'] = df['age'].apply(lambda x: x+10)  # Element-wise via apply
df['name_len'] = df['name'].map(len)  # map applies a function to each Series element
df.transform({'age': np.sqrt})  # Per-column transform; preserves shape
df.isnull()  # Boolean mask of missing values
df.dropna()  # Drop rows containing NaN
df.fillna(0)  # Replace NaN with 0
df['col'].fillna(df['col'].mean())  # Fill NaN with the column mean
apply() works row- or column-wise on a DataFrame; map() applies a function element-wise to a Series. Use fillna() to replace missing values and dropna() to remove them.
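The fillna-with-mean pattern above in a runnable sketch (the ages are made up; mean() skips NaN by default, so the fill value comes only from the observed rows):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [23.0, np.nan, 29.0]})

# Replace the missing age with the mean of the non-missing ages
df['age'] = df['age'].fillna(df['age'].mean())
```

Mean imputation keeps the column mean unchanged, which is why it is a common default for numeric gaps.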
Combine, reshape, and summarize data:
df.groupby('dept')['age'].mean()
df.pivot_table(index='dept', columns='gender', values='salary', aggfunc='sum')
pd.melt(df, id_vars=['name'], value_vars=['age','salary'])
pd.concat([df1, df2], axis=0) # Stack rows
pd.concat([df1, df2], axis=1) # Stack columns
pd.merge(df1, df2, on='id', how='inner')
df1.join(df2, rsuffix='_other')  # Join on index; suffix avoids column-name clashes
df.stack()  # Pivot columns into the inner row index (wide -> long)
df.unstack()  # Pivot the inner row index into columns (long -> wide)
df.T  # Transpose
df.reset_index()  # Move the index back into a regular column
groupby computes summary statistics per group; pivot_table and melt reshape between wide and long form; merge works like a SQL join.
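A small sketch of groupby and merge together (the departments, ids, and salaries are invented):

```python
import pandas as pd

df = pd.DataFrame({'dept': ['IT', 'IT', 'HR'],
                   'age': [30, 40, 50]})

# Mean age per department; the result is a Series indexed by dept
mean_age = df.groupby('dept')['age'].mean()

# merge: SQL-style inner join on a shared key; only id=2 exists in both
left = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})
right = pd.DataFrame({'id': [2, 3], 'salary': [50000, 60000]})
joined = pd.merge(left, right, on='id', how='inner')
```

Switching how='inner' to 'left', 'right', or 'outer' controls which unmatched keys survive, exactly as in SQL.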
Handle dates, rolling windows, categorical data, IO, and advanced tricks:
dt = pd.date_range("2023-01-01", periods=5, freq='D')  # 5 daily timestamps
df['date'] = pd.to_datetime(df['date'])  # Parse strings into datetimes
df.set_index('date', inplace=True)  # A DatetimeIndex enables resampling
df.resample('M').mean()  # Monthly means (alias is 'ME' in pandas >= 2.2)
df.rolling(window=3).mean()  # 3-row rolling average
df['cat'] = df['col'].astype('category')  # Memory-efficient categorical dtype
df['name'].str.upper()  # Vectorized string methods via .str
df['email'].str.contains('gmail')  # Boolean Series, usable for filtering
df.to_csv("out.csv")
df.to_excel("out.xlsx")
df.to_json("out.json")
df.to_sql("table", conn)  # Requires a SQLAlchemy/DB-API connection
Pandas excels at time-series, categorical, and text data. Use the read_* and to_* families for file I/O.
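A self-contained time-series sketch (the dates and values are invented; weekly resampling is used here because the 'W' alias behaves the same across pandas versions):

```python
import pandas as pd

# A daily series over 10 days, with values 0..9
idx = pd.date_range("2023-01-01", periods=10, freq="D")
s = pd.Series(range(10), index=idx)

# 3-day rolling mean; the first two entries are NaN (window not yet full)
roll = s.rolling(window=3).mean()

# Downsample to weekly totals; bins end on Sundays by default ('W-SUN')
weekly = s.resample("W").sum()
```

rolling() keeps the original index and slides a window over it, while resample() replaces the index with one row per time bucket.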
Most-used Pandas functions with quick examples:
df = pd.read_csv("data.csv")  # Load a CSV
df = pd.DataFrame([[1,2],[3,4]], columns=["A","B"])  # From a nested list
df.loc[0, 'A']  # Scalar by label
df[df['A'] > 1]  # Boolean filter
df.mean()  # Column means
df['A'].apply(np.sqrt)  # Element-wise function
df.groupby('cat').sum()  # Group aggregation
pd.merge(df1, df2, on='id')  # SQL-style join
df['date'] = pd.to_datetime(df['date'])  # Parse dates
df.resample('M').sum()  # Monthly totals (needs a DatetimeIndex; 'ME' in pandas >= 2.2)
df.to_csv("out.csv")  # Save as CSV
df.to_sql("table", conn)  # Write to a database
df.dropna()  # Drop rows with NaN
df.fillna(0)  # Replace NaN with 0
df['name'].str.lower()  # Vectorized lowercase
df['type'] = df['type'].astype('category')  # Categorical dtype