AKShare Data Usage Tips

This documentation is an English translation of the original AKShare documentation.


Data Usage Guidelines

AKShare functions generally return pandas DataFrames. This section collects practical tips for inspecting, cleaning, transforming, exporting, and validating that data.


1. Data Format

Data Types

| Type       | Description      |
|------------|------------------|
| object     | String/text data |
| int64      | Integer numbers  |
| float64    | Decimal numbers  |
| datetime64 | Date/time data   |
| bool       | True/False values |

Common Issues

import pandas as pd

# Check data types; a numeric column reported as "object" usually holds strings
print(df.dtypes)

# Convert types
df['date'] = pd.to_datetime(df['date'])
df['close'] = pd.to_numeric(df['close'])
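
Conversion fails with an exception when a column contains a value that cannot be parsed. The sketch below uses a made-up frame (the column names and the "-" placeholder are assumptions for illustration) to show how `errors="coerce"` turns such values into NaN so they can be counted and handled afterwards.

```python
import pandas as pd

# Hypothetical frame: prices arrive as strings and one value is a placeholder
df = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "close": ["10.5", "-", "11.2"],
})

df["date"] = pd.to_datetime(df["date"])
# errors="coerce" converts unparseable values to NaN instead of raising
df["close"] = pd.to_numeric(df["close"], errors="coerce")

print(df.dtypes)
print(df["close"].isna().sum())  # number of values that failed to parse
```

Counting the NaN values right after the conversion makes silent data loss visible before any further analysis.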

2. Data Cleaning

Handling Missing Values

# Check for missing values
print(df.isnull().sum())

# Remove rows with missing values
df_clean = df.dropna()

# Fill missing values (choose one strategy)
df_filled = df.ffill()    # Forward fill: carry the last known value forward
df_filled = df.fillna(0)  # Or fill with 0
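
Which fill is appropriate depends on the column. As a rough sketch with made-up values, and assuming a missing row means the stock did not trade that day, a price can be carried forward while volume is set to zero:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing day
df = pd.DataFrame({
    "close": [10.0, np.nan, 10.4],
    "volume": [1000.0, np.nan, 1200.0],
})

filled = df.copy()
filled["close"] = filled["close"].ffill()      # carry the last known price forward
filled["volume"] = filled["volume"].fillna(0)  # no trades recorded, so volume is 0
print(filled)
```

Filling a price column with 0 would create an artificial crash in the series, which is why the two columns are treated differently here.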

Removing Duplicates

# Check for duplicates
print(df.duplicated().sum())

# Remove duplicates
df_clean = df.drop_duplicates()
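
By default both calls compare entire rows, so two records for the same date with slightly different prices are not treated as duplicates. A minimal sketch, assuming `date` is the key column and the later record should win:

```python
import pandas as pd

# Hypothetical data: the same date appears twice with different closes
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03"]),
    "close": [10.5, 10.6, 11.2],
})

# Full-row comparison finds nothing because the closes differ
print(df.duplicated().sum())

# Compare on the key column only and keep the last row for each date
df_clean = df.drop_duplicates(subset="date", keep="last")
print(df_clean)
```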

3. Data Transformation

Renaming Columns

# Rename columns
df = df.rename(columns={
    'date': 'Date',
    'open': 'Open',
    'close': 'Close'
})

Setting Index

# Set date as index
df = df.set_index('date')

# Reset index
df = df.reset_index()

Sorting Data

# Sort by date
df = df.sort_values('date')

# Sort descending
df = df.sort_values('date', ascending=False)

4. Data Aggregation

Group By

# Group by calendar month; to_period('M') keeps the same month of different years separate
df['month'] = df['date'].dt.to_period('M')
monthly = df.groupby('month').agg({
    'open': 'mean',
    'close': 'mean',
    'volume': 'sum'
})

Resample

# Resample daily bars to weekly (resample needs a DatetimeIndex, so index by date first)
weekly = df.set_index('date').resample('W').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
})
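
To make the weekly aggregation concrete, here is a small worked example on made-up daily bars. The prices are invented; the `'W'` rule groups rows into weeks ending on Sunday, so the six trading days below fall into two weekly bars.

```python
import pandas as pd

# Hypothetical daily bars spanning two calendar weeks
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-02", "2024-01-03", "2024-01-04",
        "2024-01-05", "2024-01-08", "2024-01-09",
    ]),
    "open": [10.0, 10.2, 10.1, 10.4, 10.5, 10.7],
    "high": [10.3, 10.5, 10.3, 10.6, 10.9, 10.8],
    "low": [9.9, 10.0, 9.8, 10.2, 10.4, 10.5],
    "close": [10.2, 10.1, 10.4, 10.5, 10.7, 10.6],
    "volume": [100, 120, 90, 110, 130, 95],
})

weekly = df.set_index("date").resample("W").agg({
    "open": "first",   # first open of the week
    "high": "max",     # highest high
    "low": "min",      # lowest low
    "close": "last",   # last close of the week
    "volume": "sum",   # total volume
})
print(weekly)
```

Each weekly row is labeled with the Sunday that closes the week, not with the first trading day.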

5. Data Export

CSV

df.to_csv('data.csv', index=False)

Excel

df.to_excel('data.xlsx', index=False)  # writing .xlsx requires the openpyxl package

JSON

df.to_json('data.json', orient='records')

SQL

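import sqlite3

conn = sqlite3.connect('data.db')
# Replace the table if it exists; index=False skips the DataFrame index column
df.to_sql('stocks', conn, if_exists='replace', index=False)
conn.close()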
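
Once the data is in SQLite it can be read back with a query instead of loading the whole table. The following sketch uses an in-memory database and a made-up two-row frame; the table and column names are illustrative only.

```python
import sqlite3

import pandas as pd

# Hypothetical data; dates are kept as strings for simple storage
df = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "close": [10.5, 11.2]})

conn = sqlite3.connect(":memory:")  # in-memory database for the demo
try:
    df.to_sql("stocks", conn, if_exists="replace", index=False)
    # Let the database do the filtering and load only matching rows
    loaded = pd.read_sql("SELECT date, close FROM stocks WHERE close > 11", conn)
finally:
    conn.close()

print(loaded)
```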

6. Performance Tips

Efficient Data Loading

import akshare as ak

# Use specific date ranges (symbol with exchange prefix, dates as YYYYMMDD)
df = ak.stock_zh_a_daily(
    symbol="sh600519",
    start_date="20240101",
    end_date="20240131"
)

# Instead of loading all history
df = ak.stock_zh_a_daily(symbol="sh600519")  # Loads all

Chunk Processing

# Process large datasets in chunks to keep memory usage bounded
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    process(chunk)  # process() stands for your own per-chunk function
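
Chunked processing works best when each chunk can be reduced to a small partial result that is merged into a running total. As an illustration, with an in-memory CSV standing in for the large file and invented symbols and volumes:

```python
import io

import pandas as pd

# Stand-in for a large CSV file on disk
csv_data = io.StringIO("symbol,volume\nA,100\nB,200\nA,300\nB,400\nA,500\n")

totals = {}
for chunk in pd.read_csv(csv_data, chunksize=2):
    # Reduce the chunk first, then merge the partial sums into the running totals
    partial = chunk.groupby("symbol")["volume"].sum()
    for symbol, volume in partial.items():
        totals[symbol] = totals.get(symbol, 0) + int(volume)

print(totals)
```

Only one chunk and the small `totals` dictionary are held in memory at any time.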

7. Error Handling

Try-Except

try:
    df = ak.stock_zh_a_daily(symbol="sh600519")
except Exception as e:
    print(f"Error: {e}")
    # Handle the error here, for example retry later or skip this symbol

Retry Logic

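import time

def fetch_with_retry(func, max_retries=3):
    """Call func(), retrying on any exception with exponential backoff."""
    for i in range(max_retries):
        try:
            return func()
        except Exception:
            if i == max_retries - 1:
                raise  # Out of retries: re-raise with the original traceback
            time.sleep(2 ** i)  # Exponential backoff: 1s, 2s, 4s, ...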
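
A usage sketch for the retry helper follows. It repeats the helper with one addition, a hypothetical `base_delay` parameter so the demo runs quickly, and uses a made-up `flaky_fetch` function that fails twice before succeeding.

```python
import time

def fetch_with_retry(func, max_retries=3, base_delay=1.0):
    """Call func(), retrying with exponential backoff (base_delay is an added, hypothetical knob)."""
    for i in range(max_retries):
        try:
            return func()
        except Exception:
            if i == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** i)

calls = {"count": 0}

def flaky_fetch():
    """Hypothetical fetch that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "data"

result = fetch_with_retry(flaky_fetch, max_retries=3, base_delay=0.01)
print(result, calls["count"])
```

Because the helper takes a zero-argument callable, a real request with arguments can be wrapped in a `lambda` or `functools.partial` before it is passed in.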

8. Best Practices

1. Validate Data

# Check data range
assert df['close'].min() > 0
assert df['volume'].min() >= 0

# Check date range
assert df['date'].min() >= pd.Timestamp('2000-01-01')
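
Bare `assert` statements stop at the first failure and are skipped entirely when Python runs with `-O`. One alternative, sketched here with a hypothetical `validate_prices` helper and made-up data, is to collect every problem and report them together:

```python
import pandas as pd

def validate_prices(df):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if (df["close"] <= 0).any():
        problems.append("non-positive close price")
    if (df["volume"] < 0).any():
        problems.append("negative volume")
    if df["date"].duplicated().any():
        problems.append("duplicate dates")
    if not df["date"].is_monotonic_increasing:
        problems.append("dates not sorted ascending")
    return problems

# Hypothetical frame with two deliberate defects
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-02"]),
    "close": [10.5, -1.0],
    "volume": [100, 200],
})
print(validate_prices(df))
```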

2. Cache Results

from functools import lru_cache

@lru_cache(maxsize=100)
def get_stock_data(symbol, start, end):
    return ak.stock_zh_a_daily(symbol=symbol, start_date=start, end_date=end)
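
One caveat with `lru_cache` is that every caller receives the same DataFrame object, so an in-place change by one caller alters what later callers see. A possible workaround, shown with a hypothetical `fetch_prices` stand-in instead of a real network call, is to cache privately and hand out copies:

```python
from functools import lru_cache

import pandas as pd

fetch_count = {"n": 0}

def fetch_prices(symbol, start, end):
    """Hypothetical stand-in for a slow network request."""
    fetch_count["n"] += 1
    return pd.DataFrame({"symbol": [symbol], "close": [10.0]})

@lru_cache(maxsize=100)
def _cached_fetch(symbol, start, end):
    return fetch_prices(symbol, start, end)

def get_stock_data(symbol, start, end):
    # Return a copy so callers cannot modify the cached frame
    return _cached_fetch(symbol, start, end).copy()

a = get_stock_data("sh600519", "20240101", "20240131")
a["close"] = 0.0  # changes the caller's copy only
b = get_stock_data("sh600519", "20240101", "20240131")
print(fetch_count["n"], b["close"].iloc[0])
```

The arguments must be hashable (strings here) for `lru_cache` to work, and the cache lives only as long as the Python process.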

3. Log Operations

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

symbol = "sh600519"
logger.info("Fetching data for %s", symbol)  # the message is formatted only if the record is emitted

9. Common Pitfalls

1. Survivorship Bias

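Issue: Analyzing only the stocks that are still listed today leaves out companies that were delisted or dropped from an index, which makes historical results look better than they were.

Solution: Use historical constituents data, so that each date in the analysis covers the stocks that were actually available at that time.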
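
As a rough illustration of point-in-time membership, the sketch below filters a made-up membership table by date. The table layout (`added` and `removed` columns) and the symbols are assumptions for the example, not an AKShare format.

```python
import pandas as pd

# Hypothetical membership history: one row per stock with entry and exit dates
members = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "added": pd.to_datetime(["2010-01-01", "2012-06-01", "2021-03-01"]),
    "removed": pd.to_datetime(["2019-12-31", None, None]),  # NaT means still a member
})

def constituents_on(date):
    """Return the symbols that were members on the given date."""
    date = pd.Timestamp(date)
    still_in = members["removed"].isna() | (members["removed"] >= date)
    active = (members["added"] <= date) & still_in
    return members.loc[active, "symbol"].tolist()

print(constituents_on("2015-01-01"))
print(constituents_on("2024-01-01"))
```

A backtest for 2015 built from today's list would miss `AAA` entirely, which is the bias described above.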

2. Look-Ahead Bias

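Issue: Using information that was not yet available at the time of the decision, such as trading on a day's closing price before that day has ended.

Solution: Ensure every calculation at time t uses only data from before t.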
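
A common way to enforce this in pandas is to shift an indicator by one row before using it. The example below is a minimal sketch on made-up closing prices with a 2-day moving average:

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0])

# The average at row t includes the close at t, which is known only after the session ends
ma = close.rolling(2).mean()

# Shift by one row so the value used at t depends only on rows up to t - 1
signal = ma.shift(1)

print(pd.DataFrame({"close": close, "ma": ma, "signal": signal}))
```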

3. Data Snooping

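Issue: Testing many strategies or parameters on the same historical data until one looks good, which overfits to that history.

Solution: Use out-of-sample testing: tune on one period and evaluate on a later period that was never used for tuning.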
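
For time series, the held-out data should come after the fitting period instead of being sampled at random. A simple sketch with made-up data and an assumed 70/30 split:

```python
import pandas as pd

# Hypothetical daily series
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "close": [10.0 + i for i in range(10)],
}).sort_values("date")

split = len(df) * 7 // 10     # first 70% of rows for fitting
train = df.iloc[:split]
holdout = df.iloc[split:]     # later rows, never used for tuning

print(len(train), len(holdout))
print(train["date"].max(), holdout["date"].min())
```

Splitting by position after sorting guarantees that every holdout row is later than every training row.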


**AKShare** | *Open Data. Open Minds.* [GitHub](https://github.com/akfamily/akshare) • [Documentation](https://akshare.akfamily.xyz)