Azure Invoice Analyzer Pro is an advanced Streamlit application that transforms Azure billing data into actionable business intelligence. The system provides comprehensive cost analysis through 9 business categories, interactive drill-down capabilities, and complete financial reconciliation with PDF export functionality.
| Chart Type | Calculation Method | Key Metrics |
|---|---|---|
| Cost Category Pie/Bar | df.groupby('CostCategory')['Cost'].sum() |
Category percentages of total cost |
| Service Provider Analysis | df.groupby('ConsumedService')['Cost'].sum() |
Total cost per Azure service |
| Resource Group Costs | df.groupby('ResourceGroup')['Cost'].sum() |
Total cost per resource group |
| Machine Efficiency | Cost / Quantity per resource |
Cost per unit/hour analysis |
| Interactive Drill-Down | Multi-level grouping + pattern matching | Resource Group → Machine → Category |
| Cost Reconciliation | abs(original_total - categorized_total) |
Financial validation & accuracy |
Cost Reconciliation:
original_total = df['Cost'].sum()
categorized_total = classified_df['Cost'].sum()
coverage = (categorized_cost / original_total) × 100Category Classification:
# 9 business categories based on Azure service patterns
if MeterSubcategory in ['Premium SSD Managed Disks']:
return 'Managed Disks'
elif ConsumedService == 'Microsoft.Compute' and not disk_category:
return 'VM Compute'Machine Cost Breakdown:
# Includes related resources (disks, NICs, etc.)
machine_data = df[df['ResourceName'] == machine_name]
related_data = df[df['ResourceName'].str.contains(machine_name)]
total_cost = combined_data['Cost'].sum()| Chart | Purpose | Key Insight | Optimization Action |
|---|---|---|---|
| Cost Category Pie | Budget allocation by business function | Which category consumes most budget | Focus optimization on largest segments |
| Service Provider Bar | Azure service spending distribution | Which Microsoft services cost most | Negotiate better rates, optimize usage |
| Resource Group Costs | Environment/team cost allocation | Which groups spend most | Budget allocation, cost accountability |
| Machine Efficiency | Resource utilization analysis | Cost per hour performance | Right-size over/under-utilized resources |
| Drill-Down Analysis | Granular cost investigation | Machine-level cost breakdown | Target specific machines for optimization |
| Reconciliation Validation | Financial accuracy verification | Data integrity and completeness | Ensure 100% cost accountability |
- Data Ingestion: CSV file upload with validation and error handling
- Cost Classification: 9-category business intelligence classification system
- Cost Reconciliation: Mathematical validation ensuring 100% cost accountability
- Interactive Analysis: Multi-level drill-down from resource groups to individual machines
- Export Generation: Professional PDF reports and individual chart downloads
-
Managed Disks (typically 60-80% of costs)
- Premium SSD Managed Disks
- Standard HDD Managed Disks
- Standard SSD Managed Disks
- Ultra SSD Managed Disks
-
VM Compute (10-25% of costs)
- Virtual machine runtime costs
- CPU and memory consumption
- Excludes storage components
-
CDN (5-15% of costs)
- Content Delivery Network services
- Data transfer and caching
-
Network/IP (3-10% of costs)
- Virtual Network services
- Public IP addresses
- Network security groups
-
Backup (2-8% of costs)
- Recovery Services vault
- Backup storage and operations
-
Load Balancer (1-5% of costs)
- Standard and Basic load balancers
- Application gateways
-
Other Storage (<2% of costs)
- Blob storage, File storage
- Table storage, Queue storage
-
Bandwidth (<1% of costs)
- Data transfer charges
- Inter-region communication
-
Key Vault (<1% of costs)
- Secret management services
- Certificate operations
-
Resource Group Level
- Total cost, machine count, usage hours
- High-level resource allocation view
-
Machine Level
- Individual machine costs and usage
- Service provider breakdown
- Related resource identification
-
Category Level
- Detailed cost breakdown by business category
- Service-level attribution
- Optimization recommendations
- Pattern Matching: Identifies associated resources using naming conventions
- Resource Relationships: Links VMs with their disks, NICs, and other components
- Complete Cost Attribution: Ensures all machine-related costs are captured
vm-name → Primary virtual machine
vm-name-disk → Managed disks
vm-name-nic → Network interfaces
vm-name_OsDisk → Operating system disks
vm-name-backup → Backup services
Mathematical Foundation:
# Core reconciliation calculation
original_total = df['Cost'].sum()
categorized_total = classified_df['Cost'].sum()
difference = abs(original_total - categorized_total)
reconciliation_success = difference < 0.01
# Coverage calculation
categorized_cost = total_cost - uncategorized_cost
coverage_percentage = (categorized_cost / original_total) × 100Key Metrics Explained:
- Original Invoice Total: Sum of all Cost column values from uploaded CSV
- Categorized Total: Sum of all costs after classification processing
- Difference: Mathematical variance between original and processed totals
- Coverage Percentage: (Categorized costs / Original total) × 100
- Reconciliation Status: ✅ Success if difference < $0.01, ❌ Failed otherwise
Business Interpretation:
- 100% Coverage: All costs properly classified, complete financial accountability
- 95-99% Coverage: Good classification, minor uncategorized items need review
- <95% Coverage: Significant classification gaps, immediate attention required
Calculation Method:
total_cost = df['Cost'].sum()
total_quantity = df['Quantity'].sum()
unique_resource_groups = df['ResourceGroup'].nunique()
unique_machines = df['ResourceName'].nunique()Metrics Meaning:
- Total Cost: Complete Azure spending for the invoice period
- Total Usage: Sum of all quantity values (typically hours)
- Top Category: Highest cost category from 9-category classification
- Resource Groups: Number of distinct Azure resource groups
- Services: Number of unique Azure service providers (Microsoft.Compute, etc.)
Calculation Method:
category_summary = classified_df.groupby('CostCategory').agg({
'Cost': 'sum',
'Quantity': 'sum'
}).round(4)
category_percentage = (category_summary['Cost'] / total_cost * 100).round(2)Chart Elements:
- Pie Segments: Each represents one of 9 business categories
- Percentage Labels: Category cost as percentage of total invoice
- Dollar Values: Absolute cost amount for each category
- Color Coding: Consistent colors across all charts for category identification
Business Interpretation:
- Dominant Categories: Largest segments indicate primary cost drivers
- Distribution Balance: Even distribution suggests diversified infrastructure
- Anomalies: Unusually large percentages may indicate optimization opportunities
Calculation Method:
category_breakdown = classified_df.groupby('CostCategory').agg({
'Cost': ['sum', 'count', 'mean'],
'Quantity': 'sum'
})
record_count = category_breakdown['Cost']['count']
avg_cost = category_breakdown['Cost']['mean']Data Points Explained:
- Horizontal Bars: Length represents total cost for each category
- Cost Values: Dollar amount displayed on each bar
- Percentage Labels: Category percentage of total costs
- Record Count: Number of invoice line items in each category (hover data)
Key Insights:
- Bar Length Comparison: Visual representation of spending distribution
- High Record Count: Many small transactions vs few large transactions
- Cost Concentration: Identifies which categories drive majority of spending
Calculation Method:
provider_summary = df.groupby('ConsumedService').agg({
'Cost': ['sum', 'count'],
'Quantity': 'sum'
}).round(4)
provider_percentage = (provider_summary['Cost'] / total_cost * 100).round(2)Chart Components:
- X-Axis: Azure service providers (Microsoft.Compute, Microsoft.Storage, etc.)
- Y-Axis: Total cost in USD for each provider
- Bar Height: Proportional to spending on each service
- Text Labels: Dollar amounts displayed above bars
Service Provider Breakdown:
- Microsoft.Compute: Virtual machines, disks, compute resources
- Microsoft.Network: Virtual networks, load balancers, IP addresses
- Microsoft.Storage: Blob storage, file storage, queues
- Microsoft.RecoveryServices: Backup and disaster recovery
- Microsoft.Cdn: Content delivery network services
Business Analysis:
- Compute Dominance: High Microsoft.Compute costs indicate VM-heavy workloads
- Storage Intensity: High Microsoft.Storage suggests data-intensive applications
- Network Costs: Significant Microsoft.Network indicates complex networking requirements
Calculation Method:
rg_data = df[df['ResourceGroup'] == selected_rg]
rg_cost = rg_data['Cost'].sum()
rg_machines = rg_data['ResourceName'].nunique()
rg_quantity = rg_data['Quantity'].sum()Metrics Explanation:
- Total Cost: Sum of all costs within selected resource group
- Machines: Count of unique ResourceName values in the group
- Total Usage: Sum of quantity values (usually compute hours)
Calculation Method:
machine_summary = rg_data.groupby('ResourceName').agg({
'Cost': 'sum',
'Quantity': 'sum',
'ConsumedService': lambda x: ', '.join(x.unique()),
'MeterCategory': lambda x: ', '.join(x.unique())
})
cost_percentage = (machine_summary['Cost'] / rg_cost * 100).round(2)Table Columns:
- Machine Name: ResourceName from Azure invoice
- Total Cost: Sum of all costs for this specific machine
- Total Usage: Sum of quantity values for this machine
- Services Used: List of Azure services consumed by this machine
- Meter Categories: Types of Azure meters associated with this machine
- Cost %: This machine's percentage of total resource group costs
Calculation Method:
# Enhanced resource detection
machine_data = classified_df[classified_df['ResourceName'] == selected_machine]
related_data = classified_df[
(classified_df['ResourceName'].str.contains(selected_machine, case=False)) |
(classified_df['ResourceName'].str.startswith(selected_machine + '-')) |
(classified_df['ResourceName'].str.startswith(selected_machine + '_'))
]
combined_data = pd.concat([machine_data, related_data]).drop_duplicates()Machine Metrics:
- Total Cost: Sum of costs for machine and all related resources
- Total Usage: Sum of quantity values across all related resources
- Categories: Number of cost categories this machine uses
- Cost/Hour: Total cost divided by total usage hours
Related Resources Detection:
- Primary Resource: Exact match of machine name
- Associated Disks: Resources with patterns like "vm-name-disk", "vm-name_OsDisk"
- Network Components: Resources like "vm-name-nic" (network interfaces)
- Backup Resources: Resources like "vm-name-backup"
Pie Chart (Cost Distribution):
category_breakdown = combined_data.groupby('CostCategory').agg({
'Cost': 'sum',
'Quantity': 'sum'
})
machine_percentage = (category_breakdown['Cost'] / machine_cost * 100).round(2)Elements:
- Pie Segments: Proportional to category costs for this machine
- Percentage Labels: Category percentage of machine's total cost
- Dollar Values: Absolute cost for each category
Bar Chart (Category Details):
- Horizontal Bars: Cost amount for each category
- Category Colors: Consistent color scheme matching pie chart
- Hover Data: Additional details including quantity used
Calculation Method:
efficiency_df = df.groupby('ResourceName').agg({
'Cost': 'sum',
'Quantity': 'sum'
})
efficiency_df['CostPerUnit'] = efficiency_df['Cost'] / efficiency_df['Quantity']
efficiency_df['EfficiencyScore'] = efficiency_df['Cost'] / efficiency_df['Quantity']Dual-Axis Chart Components:
- Left Y-Axis: Total cost in USD (bars)
- Right Y-Axis: Cost per unit/hour (line)
- X-Axis: Resource names (top 15 by cost)
- Bar Height: Total spending on each resource
- Line Points: Efficiency score (cost per hour)
Efficiency Interpretation:
- High Cost + Low Efficiency: Expensive resources with high cost per hour
- High Cost + High Efficiency: Expensive but efficient resources
- Low Cost + Low Efficiency: Cheap resources but poor cost per hour ratio
- Low Cost + High Efficiency: Cost-effective resources
Optimization Targets:
- High Efficiency Score: Resources with cost per hour above average
- Right-Sizing Candidates: Resources with consistently high efficiency scores
- Optimization Opportunities: Resources showing poor cost per unit ratios
Calculation Method:
cost_by_rg = df.groupby('ResourceGroup')['Cost'].sum().sort_values(ascending=False)Chart Elements:
- X-Axis: Resource group names
- Y-Axis: Total cost in USD
- Bar Height: Proportional to total spending per resource group
- Text Labels: Dollar amounts displayed above each bar
Business Usage:
- Environment Comparison: Compare prod, dev, test environment costs
- Department Allocation: Understand which teams/projects drive costs
- Budget Attribution: Assign costs to specific business units
Calculation Method:
cost_by_machine = df.groupby('ResourceName')['Cost'].sum().sort_values(ascending=False)
top_machines = cost_by_machine.head(Config.TOP_ITEMS_COUNT)Analysis Focus:
- Top N Machines: Configurable from 5-50 most expensive resources
- Cost Ranking: Machines ordered by total spending
- Optimization Targets: Highest-cost machines for immediate attention
Calculation Method:
agg_data = df.groupby('ResourceGroup').agg({
'Cost': 'sum',
'Quantity': 'sum'
})Dual-Axis Visualization:
- Primary Y-Axis (Bars): Total cost per resource group
- Secondary Y-Axis (Line): Total usage hours per resource group
- Correlation Analysis: Relationship between cost and usage
Business Insights:
- High Cost + Low Usage: Potentially over-provisioned resources
- Low Cost + High Usage: Efficient resource utilization
- Correlation Patterns: Expected vs unexpected cost-usage relationships
Calculation Method:
uncategorized_items = classified_df[classified_df['CostCategory'] == 'Other']
uncategorized_cost = uncategorized_items['Cost'].sum()
uncategorized_percentage = (uncategorized_cost / total_cost * 100)Key Indicators:
- Uncategorized Cost: Dollar amount not classified into business categories
- Percentage of Total: Uncategorized amount as percentage of total invoice
- Number of Items: Count of line items that couldn't be classified
- Status Assessment: ✅ Excellent (<1%),
⚠️ Good (1-5%), ❌ Needs Review (>5%)
Calculation Method:
service_breakdown = uncategorized_items.groupby([
'ConsumedService', 'MeterCategory', 'MeterSubcategory'
]).agg({
'Cost': 'sum',
'ResourceName': 'count'
})Table Analysis:
- Service Provider: Azure service generating uncategorized costs
- Meter Category: Type of Azure service meter
- Meter Subcategory: Specific service subtype
- Total Cost: Dollar amount for this service combination
- Item Count: Number of invoice lines for this service type
- % of Uncategorized: Percentage of uncategorized costs from this service
Calculation Method:
enhanced_breakdown = all_machine_data.groupby([
'ConsumedService', 'MeterCategory', 'MeterSubcategory'
]).agg({
'Cost': 'sum',
'Quantity': 'sum',
'ResourceName': lambda x: ', '.join(x.unique())
})Detailed Breakdown Components:
- Service Provider: Azure service (Microsoft.Compute, Microsoft.Storage)
- Meter Category: Service category (Virtual Machines, Storage, Network)
- Meter Subcategory: Specific service type (Premium SSD, D2s v3, etc.)
- Cost: Total spending for this service type
- Quantity: Total usage amount (hours, GB, transactions)
- Resource Names: Which specific resources use this service
- Suggested Category: What business category this should be classified as
Calculation Method:
storage_data = all_machine_data[
all_machine_data['MeterCategory'].str.contains('Storage|Disk', case=False)
]
storage_breakdown = storage_data.groupby(['MeterSubcategory', 'ResourceName']).agg({
'Cost': 'sum',
'Quantity': 'sum'
})
storage_percentage = (storage_cost / total_machine_cost * 100)Storage Metrics:
- Storage Type: Specific disk type (Premium SSD, Standard HDD)
- Resource Name: Individual disk or storage resource
- Cost: Dollar amount for this storage component
- Quantity: Storage usage (typically GB-hours or disk-hours)
- Storage Percentage: Storage costs as percentage of total machine costs
Optimization Insights:
- Disk Tier Analysis: Premium vs Standard SSD usage
- Storage Efficiency: Cost per GB analysis
- Right-Sizing Opportunities: Over-provisioned storage identification
Date,Cost,Quantity,ResourceGroup,ResourceName,ConsumedService,MeterCategory,MeterSubcategory
2024-01-01,45.67,720,prod-rg,web-server-01,Microsoft.Compute,Virtual Machines,D2s v3
2024-01-01,89.23,744,prod-rg,web-server-01-disk,Microsoft.Compute,Storage,Premium SSD Managed Disks
# Managed Disks Classification
if MeterSubcategory in ['Premium SSD Managed Disks', 'Standard HDD Managed Disks']:
return 'Managed Disks'
# VM Compute Classification
if ConsumedService == 'Microsoft.Compute' and MeterSubcategory not in disk_categories:
return 'VM Compute'
# Network Classification
if MeterCategory == 'Virtual Network':
return 'Network/IP'- Mathematical Validation: Original total = Sum of categorized costs
- Coverage Tracking: Percentage of costs successfully categorized
- Gap Identification: Uncategorized items analysis and recommendations
- Error Detection: Data quality issues and missing classifications
This comprehensive strategy document provides the technical foundation and business rationale for implementing Azure Invoice Analyzer Pro as a critical tool for Azure cost management and financial governance.