A Python package for cleaning, parsing, and analyzing Chinese phone numbers in various formats.
- Clean and normalize phone numbers from various formats
- Extract area codes and map them to corresponding cities
- Detect different phone number formats (mobile, landline, international, etc.)
- Handle multiple phone numbers in a single string
- Support for extensions and special number formats
- Dataset-level analysis of phone number patterns (digit counts, extensions, international formats, area codes)
```bash
pip install chinese-phone-parser
```
```python
from chinese_phone_parser import PhoneParser

# Parse a single phone number
parser = PhoneParser()
result = parser.parse("+86-010-12345678")
print(result)
# Output: {
#     'original': '+86-010-12345678',
#     'normalized': '+8601012345678',
#     'type': 'landline',
#     'area_code': '010',
#     'city': 'Beijing'
# }
```
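The internals of `PhoneParser` are not described here, so the following is only a standard-library sketch of the kind of normalization and classification the output above implies. The function name `sketch_parse`, the abridged `AREA_CODES` table, and the rules (strip `+86`/`0086`, treat `1[3-9]` followed by 9 digits as mobile, use 3-digit codes for 010 and the 02x cities and 4-digit codes elsewhere) are assumptions for illustration, not the package's actual logic. Note that the sketch's `national` field keeps the trunk `0`, unlike the `normalized` value shown above.

```python
import re

# Hypothetical, abridged lookup table for illustration only; a real parser
# needs the full national area-code list.
AREA_CODES = {"010": "Beijing", "020": "Guangzhou", "021": "Shanghai", "0755": "Shenzhen"}


def sketch_parse(raw: str) -> dict:
    """Classify one phone number (illustrative sketch, not PhoneParser's logic)."""
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, "+", etc.
    international = text.startswith("+86") or digits.startswith("0086")
    # Remove the country code, written either as "+86" or "0086".
    if digits.startswith("0086"):
        digits = digits[4:]
    elif text.startswith("+86"):
        digits = digits[2:]

    result = {"original": raw, "national": digits, "type": "unknown",
              "area_code": None, "city": None}
    if re.fullmatch(r"1[3-9]\d{9}", digits):
        result["type"] = "mobile"
    elif re.fullmatch(r"[48]00\d{7}", digits):
        result["type"] = "service"  # 400 / 800 numbers
    else:
        # International landlines usually drop the trunk prefix "0" (+86 10 ...).
        if international and not digits.startswith("0"):
            digits = "0" + digits
        # Beijing (010) and the 02x cities use 3-digit codes; the rest use 4.
        code_len = 3 if digits[:2] in ("01", "02") else 4
        area, local = digits[:code_len], digits[code_len:]
        if digits.startswith("0") and len(local) in (7, 8):
            result.update(national=digits, type="landline",
                          area_code=area, city=AREA_CODES.get(area))
    return result


print(sketch_parse("+86-010-12345678"))
# {'original': '+86-010-12345678', 'national': '01012345678', 'type': 'landline', 'area_code': '010', 'city': 'Beijing'}
```

Anything that matches none of the rules (for example a short service number like `110`) is reported as `"unknown"` rather than guessed at.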
```python
# Analyze a dataset with phone numbers
import pandas as pd
from chinese_phone_parser.utils.helpers import analyze_phone_dataset, get_phone_stats, plot_phone_formats

# Load your data
df = pd.read_csv('your_data.csv')

# Analyze the dataset
analyzed_df = analyze_phone_dataset(df, phone_column='phone')

# Get comprehensive statistics
stats = get_phone_stats(df, phone_column='phone')

# Create visualizations
fig = plot_phone_formats(df, phone_column='phone')
fig.show()
```
Example input formats:

- Mobile, standard: `13812345678`
- Mobile, international: `+86 138 1234 5678`
- Mobile, with `0086` prefix: `0086-13812345678`
- Landline, 3-digit area code: `010-12345678`
- Landline, 4-digit area code: `0755-87654321`
- Landline, international: `+86 10 1234 5678`
- 400 numbers: `400-123-4567`
- 800 numbers: `800-123-4567`
- With extensions (`转` and `分机` are Chinese extension markers): `010-12345678-123`, `0755-87654321 转 456`, `010-12345678 ext 789`, `010-12345678 分机 321`
- Multiple numbers in one field: `010-12345678 / 13812345678`, `0755-87654321, 400-123-4567`
- Concatenated numbers: `0101234567813812345678` (`010-12345678` followed by `13812345678`)
- Missing area codes: `12345678`, `-12345678`
- Short service numbers: `110` (police), `12345` (government hotline)
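The extension, multiple-number, and concatenated cases are the trickiest inputs. One possible approach, sketched below with only the standard library, is to split on common separators, peel off a trailing extension, strip the country code, and then scan the remaining digits with a regex. The function name `sketch_extract`, the separator set, the extension markers, and the number patterns are assumptions drawn from the examples above, not the package's rules. Concatenated strings are inherently ambiguous when a 4-digit area code is involved (its local part may have 7 or 8 digits, and the regex greedily takes 8), and short service numbers such as `110` would need a separate lookup.

```python
import re
from typing import List, Optional, Tuple

# Patterns below are illustrative assumptions, not the package's rules.
SEPARATORS = re.compile(r"[/,，;；、]")
# Trailing extension introduced by "-", "转", "分机" or "ext".
EXTENSION = re.compile(r"\s*(?:-|转|分机|ext\.?)\s*(\d{1,6})\s*$", re.IGNORECASE)
# Mobile | 3-digit-area landline | 4-digit-area landline | 400/800 number.
NUMBER = re.compile(r"1[3-9]\d{9}|0(?:10|2\d)\d{8}|0[3-9]\d{2}\d{7,8}|[48]00\d{7}")


def sketch_extract(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (number, extension) pairs found in a free-text field."""
    found = []
    for chunk in SEPARATORS.split(text):
        chunk = chunk.strip()
        extension = None
        m = EXTENSION.search(chunk)
        # Accept the suffix as an extension only if what precedes it is already
        # a full number; otherwise "400-123-4567" would lose its last group.
        if m and len(re.sub(r"\D", "", chunk[: m.start()])) >= 10:
            extension, chunk = m.group(1), chunk[: m.start()]
        digits = re.sub(r"\D", "", chunk)
        # Drop the country code ("+86" or "0086") before matching.
        if digits.startswith("0086"):
            digits = digits[4:]
        elif chunk.startswith("+86"):
            digits = digits[2:]
        numbers = NUMBER.findall(digits)
        # Attach the extension to the last number in this chunk.
        for i, number in enumerate(numbers):
            found.append((number, extension if i == len(numbers) - 1 else None))
    return found


print(sketch_extract("0755-87654321 转 456"))  # [('075587654321', '456')]
print(sketch_extract("0101234567813812345678"))  # [('01012345678', None), ('13812345678', None)]
```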
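```python
from chinese_phone_parser.utils.helpers import analyze_phone_patterns

# Assumption: phone_numbers is a sequence of raw phone strings,
# e.g. the 'phone' column of the DataFrame loaded earlier
phone_numbers = df['phone'].tolist()
patterns = analyze_phone_patterns(phone_numbers)
print(patterns)
# Output includes:
# - Digit count distribution
# - Extension presence
# - Multiple number detection
# - International format detection
# - Area code analysis
# - Mobile/landline classification
```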
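The exact structure returned by `analyze_phone_patterns` is not shown here. As a rough illustration of the categories listed, the hypothetical `sketch_patterns` function below tallies digit counts, extension markers, separators, international prefixes, and a simple mobile/landline split using only the standard library. Its detection rules are assumptions, and fields containing several numbers are counted as "multiple" but left out of the mobile/landline split.

```python
import re

# Hypothetical detection rules mirroring the categories listed above.
EXTENSION_MARK = re.compile(r"转|分机|ext", re.IGNORECASE)
MULTI_SEPARATOR = re.compile(r"[/,，;；、]")
MOBILE = re.compile(r"1[3-9]\d{9}")


def sketch_patterns(raw_numbers):
    """Tally simple pattern statistics over an iterable of raw phone strings."""
    summary = dict.fromkeys(
        ("total", "has_extension", "multiple_numbers", "international", "mobile", "landline"), 0)
    digit_counts = {}
    for raw in raw_numbers:
        text = str(raw).strip()
        digits = re.sub(r"\D", "", text)
        digit_counts[len(digits)] = digit_counts.get(len(digits), 0) + 1
        summary["total"] += 1
        summary["has_extension"] += bool(EXTENSION_MARK.search(text))
        is_multi = bool(MULTI_SEPARATOR.search(text))
        summary["multiple_numbers"] += is_multi
        is_intl = text.startswith(("+86", "0086"))
        summary["international"] += is_intl
        if is_multi:
            continue  # skip mobile/landline classification for multi-number fields
        # Remove the country code before classifying.
        if digits.startswith("0086"):
            digits = digits[4:]
        elif text.startswith("+86"):
            digits = digits[2:]
        if MOBILE.fullmatch(digits):
            summary["mobile"] += 1
        elif digits.startswith("0") or is_intl:
            # Other numbers with the trunk "0", or international non-mobile
            # numbers (which drop that "0"), are treated as landlines.
            summary["landline"] += 1
    summary["digit_count_distribution"] = dict(sorted(digit_counts.items()))
    return summary


print(sketch_patterns(["13812345678", "+86 138 1234 5678", "010-12345678 ext 789",
                       "010-12345678 / 13812345678", "0755-87654321"]))
```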
```python
from chinese_phone_parser.utils.helpers import plot_area_code_map

# Visualize the distribution of the 10 most common area codes
# (df is the DataFrame with a 'phone' column loaded earlier)
fig = plot_area_code_map(df, phone_column='phone', top_n=10)
fig.show()
```
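Conceptually, a top-N area-code chart reduces to counting area-code prefixes. The hypothetical `sketch_top_area_codes` function below does that with `collections.Counter.most_common`, assuming domestic-format landlines (a 3-digit code for 010 and the 02x cities, otherwise a 4-digit code, followed by a 7- or 8-digit local number). It is not how `plot_area_code_map` is necessarily implemented; international and extension-bearing strings would need to be normalized first, and anything else is simply skipped.

```python
import re
from collections import Counter

# 3-digit codes (010, 02x) or 4-digit codes (0[3-9]xx), then a 7-8 digit local number.
LANDLINE = re.compile(r"(0(?:10|2\d))\d{7,8}|(0[3-9]\d{2})\d{7,8}")


def sketch_top_area_codes(raw_numbers, top_n=10):
    """Return the top_n (area_code, count) pairs among domestic-format landlines."""
    counts = Counter()
    for raw in raw_numbers:
        digits = re.sub(r"\D", "", str(raw))
        m = LANDLINE.fullmatch(digits)
        if m:
            # Exactly one of the two groups is set, depending on code length.
            counts[m.group(1) or m.group(2)] += 1
    # Ties keep first-encountered order, per Counter.most_common.
    return counts.most_common(top_n)


data = ["010-12345678", "0755-87654321", "010 87654321",
        "13812345678", "021-1234567", "0755 1234567"]
print(sketch_top_area_codes(data, top_n=2))  # [('010', 2), ('0755', 2)]
```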
For detailed usage and API documentation, please refer to the documentation.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.