atctools: R Package for ATC Code Matching and Validation

atctools is an R package created by Ville Langén that matches drug names to Anatomical Therapeutic Chemical (ATC) codes and checks the integrity of ATC data in pharmaceutical datasets. It is aimed at pharmaceutical research, drug safety studies, and healthcare data analysis.
Overview
The atctools package provides comprehensive tools for working with ATC codes, which are used worldwide to classify drugs according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties. The package bridges the gap between drug names and their standardized ATC classifications.
Key Features
🔍 ATC Matching Functions
Fuzzy drug name matching based on Levenshtein distance, which tolerates spelling variations and formatting differences when identifying a drug.
🏷️ ATC Code Matching
Direct ATC-to-drug name matching for reverse lookups and data verification workflows.
✅ ATC Data Integrity Checks
Comprehensive validation tools to verify the structure and validity of ATC codes using authoritative reference lists.
Core Functions
drug_match()
Matches drug names to ATC codes without stripping trailing characters like “mg”, “g”, etc.
Usage:
library(atctools)
output_data <- drug_match(example_drug_data, "drug_name", example_reference_data)
output_data
Key Features:
- Preserves original drug name formatting
- Uses fuzzy matching algorithms for robust identification
- Flags doubtful matches through the drug_name_flag column
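The fuzzy-matching idea can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes a reference table with hypothetical columns atc_name and atc_code, and uses utils::adist() to compute Levenshtein distances and pick the closest reference name for each input.

```r
# Toy data; the column names are assumptions, not the atctools schema.
drugs <- data.frame(
  drug_name = c("paracetamol", "ibuprofein", "metformin"),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  atc_name = c("paracetamol", "ibuprofen", "metformin"),
  atc_code = c("N02BE01", "M01AE01", "A10BA02"),
  stringsAsFactors = FALSE
)

# adist() returns a matrix of Levenshtein distances:
# one row per input name, one column per reference name.
distances <- adist(tolower(drugs$drug_name), tolower(reference$atc_name))

# For each input name, take the reference entry with the smallest distance.
best <- apply(distances, 1, which.min)

drugs$matched_name <- reference$atc_name[best]
drugs$atc_code <- reference$atc_code[best]
drugs$distance <- distances[cbind(seq_along(best), best)]
drugs
```

A nearest-match rule like this always returns some candidate, which is why a separate quality flag or a distance cutoff is useful for catching poor matches.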
drug_match_strip()
Matches drug names to ATC codes with trailing characters stripped for cleaner matching.
Usage:
output_data_stripped <- drug_match_strip(example_drug_data_with_mg, "drug_name", example_reference_data)
output_data_stripped
Perfect for:
- Drug names with dosage information (e.g., “paracetamol 500mg”)
- Standardizing drug nomenclature
- Handling varied formatting in clinical datasets
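To show what stripping a trailing dose might look like, here is a small base R sketch. The helper strip_dose() and its list of units are hypothetical; the rules drug_match_strip() actually applies may differ.

```r
# Hypothetical helper (not part of atctools): remove a trailing dose such as
# "500mg", "400 mg", or "0.5 g" from the end of a drug name.
strip_dose <- function(x) {
  pattern <- "[[:space:]]*[0-9]+([.,][0-9]+)?[[:space:]]*(mg|mcg|g|ml|iu)[[:space:]]*$"
  trimws(sub(pattern, "", x, ignore.case = TRUE))
}

cleaned <- strip_dose(c("paracetamol 500mg", "Ibuprofen 400 mg", "metformin"))
cleaned
```

Names without a trailing number and unit pass through unchanged, so the helper is safe to apply to a mixed column.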
atc2drug()
Matches ATC codes back to drug names by exact lookup, with no fuzzy matching.
Usage:
output_data_atc2drug <- atc2drug(example_atc_data, "ATC_code", example_reference_data)
output_data_atc2drug
Use Cases:
- Reverse lookups from ATC to drug names
- Data verification and validation
- Creating drug dictionaries from ATC classifications
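An exact reverse lookup of this kind can be sketched with base R's match(). The reference columns atc_code and atc_name are assumptions made for the illustration, and the handling of unknown or missing codes shown here (returning NA) is not confirmed to be what atc2drug() does.

```r
# Toy inputs; column names in the reference table are hypothetical.
atc_data <- data.frame(
  ATC_code = c("N02BE01", "XXXXXXX", NA),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  atc_name = c("paracetamol", "ibuprofen", "metformin"),
  atc_code = c("N02BE01", "M01AE01", "A10BA02"),
  stringsAsFactors = FALSE
)

# match() gives the row of the first exact hit, or NA when there is none.
idx <- match(atc_data$ATC_code, reference$atc_code)
atc_data$drug_name <- reference$atc_name[idx]
atc_data
```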
🔍 ATC Validation Functions
validate_atc()
Checks if ATC codes follow the correct structural format (length, character pattern).
Usage:
validate_output <- validate_atc(example_atc_data_for_validation, "atc_1")
validate_output
Validation Criteria:
- Proper ATC code length and structure
- Character pattern verification
- Format compliance with WHO ATC standards
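A structural check can be expressed as a regular expression. The sketch below is an illustration rather than the package's rule set: it assumes that only full fifth-level codes count as valid (one letter, two digits, two letters, two digits, as in N02BE01) and that missing values are flagged as invalid.

```r
# Hypothetical check, not the atctools implementation.
is_valid_atc_format <- function(x) {
  # perl = TRUE keeps [A-Z] strictly upper-case ASCII regardless of locale.
  !is.na(x) & grepl("^[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}$", x, perl = TRUE)
}

codes <- c("N02BE01", "n02be01", "N02BE", "N02BE011", NA)

# Same convention as the package's output flags: 1 = invalid, 0 = valid.
atc_invalid <- as.integer(!is_valid_atc_format(codes))
atc_invalid
```

If shorter codes for higher ATC levels (for example N02 or N02BE) should also pass, the pattern would need optional groups for the later levels.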
validate_atc_by_reference()
Checks if ATC codes exist in a given reference list (e.g., WHO ATC index or validated datasets).
Usage:
validate_by_ref_output <- validate_atc_by_reference(example_atc_data_for_validation, "atc_2", example_reference_data)
validate_by_ref_output
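Conceptually, a reference check is a set-membership test. A minimal base R sketch, using a hypothetical vector of reference codes and assuming missing values count as "not found":

```r
# Toy data; reference_codes stands in for the code column of a reference list.
codes <- c("N02BE01", "A10BA02", "Z99ZZ99", NA)
reference_codes <- c("N02BE01", "M01AE01", "A10BA02")

# %in% never returns NA, so a missing code is simply reported as not found.
not_found <- as.integer(!(codes %in% reference_codes))
not_found
```

Note that Z99ZZ99 is well formed but absent from the reference, which is the case a format check alone cannot catch.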
Combined Validation Workflow
Comprehensive validation using both format and reference checking:
library(dplyr)  # provides the %>% pipe
output_combined <- example_atc_data_for_validation %>%
  validate_atc("atc_1") %>%
  validate_atc_by_reference("atc_2", example_reference_data)
output_combined
Output Flags:
- atc_1_invalid: Flag for invalid format (1 = invalid, 0 = valid)
- atc_2_invalid: Flag for “not found in reference list” (1 = not found, 0 = found)
Understanding drug_name_flag
The drug_name_flag column provides matching confidence in outputs from drug_match() and drug_match_strip():
- 0: First 3 characters of drug name match the first 3 of matched ATC code ✅
- 1: Mismatch detected between drug name and ATC prefix ⚠️
This flag helps identify potential matching errors and enables quality control in large datasets.
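In practice the flag is most useful as a filter for manual review. The data frame below is a hypothetical stand-in for matching output: only the drug_name_flag column is documented above, and the other columns and values are invented for the illustration.

```r
# Hypothetical stand-in for the output of drug_match().
matched <- data.frame(
  drug_name = c("paracetamol", "ibuprofein", "tylenol"),
  drug_name_flag = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Rows flagged with 1 are the ones worth checking by hand.
needs_review <- matched[matched$drug_name_flag == 1, ]

# Share of flagged rows, as a quick quality indicator for the whole dataset.
flag_rate <- mean(matched$drug_name_flag)
needs_review
```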
Example Datasets
atctools includes comprehensive example datasets for testing and learning:
example_drug_data
Drug names without dosage information for clean matching scenarios.
example_drug_data_with_mg
Drug names with trailing dosage units (e.g., “paracetamol 500mg”) for strip matching.
example_atc_data
ATC codes for testing atc2drug(), including valid, invalid, and NA values.
example_reference_data
Reference dataset for drug name ↔ ATC code matching workflows.
example_atc_data_for_validation ✨ New!
Contains ATC codes across multiple columns for testing both structural and reference-based validation.
Installation
Install the development version directly from GitHub:
# install.packages("devtools")
devtools::install_github("vljlangen/atctools")
Use Cases
Perfect for:
- Pharmaceutical researchers standardizing drug classifications
- Clinical data scientists validating medication datasets
- Epidemiologists conducting drug utilization studies
- Healthcare analysts ensuring data quality in EHR systems
- Regulatory affairs professionals working with drug databases
- Pharmacovigilance teams processing adverse event data
Key Updates & Features
✅ New Validation Functions
- validate_atc() checks format validity against WHO standards
- validate_atc_by_reference() verifies presence in reference datasets
🧪 Enhanced Testing
- New example_atc_data_for_validation dataset
- Comprehensive validation workflow examples
🔁 Improved Documentation
- Updated function examples with real-world scenarios
- Clear validation workflow documentation
🧹 Performance Improvements
- Enhanced atc2drug() function with better column detection
- Optimized matching algorithms for large datasets
Why atctools?
Working with ATC codes manually is error-prone and time-consuming. atctools solves this by:
- Automating matching between drug names and ATC codes
- Ensuring data quality through comprehensive validation
- Handling edge cases intelligently (dosage units, spelling variations)
- Providing confidence metrics for quality control
- Supporting research workflows with robust, tested functions
Technical Details
- Built for: R statistical computing environment
- Specialty: Pharmaceutical and healthcare data analysis
- Dependencies: Minimal dependencies for maximum compatibility
- License: Open source (check repository for specific license)
- Maintained: Actively developed with regular updates
Development & Support
atctools is open source and welcomes contributions from the pharmaceutical data science community. Bug reports, feature requests, and pull requests are encouraged on the GitHub repository.
Questions or issues? Feel free to reach out through the project’s GitHub issues page.
Transform your pharmaceutical data workflows from manual, error-prone processes into automated, validated pipelines with atctools!