OMOP-CDM ETL with Multi-DB Support

Designed and developed a secure, containerized ETL platform tailored for large-scale patient data processing in the healthcare sector. The system automates data ingestion, transformation, and loading workflows with out-of-the-box support for OMOP-CDM, ensuring compliance and interoperability. The solution improves efficiency, reduces manual intervention, and enables data-driven healthcare analytics and decision-making.

Workflow:

Designed a declarative ETL framework enabling schema and job configurations via YAML files.

Developed a CLI using Python Click to manage data workflows (create, populate, run).

Built transformation pipelines to standardize data into OMOP-CDM format.

Enabled support for PostgreSQL, MSSQL, and Oracle backends using SQLAlchemy.

Ensured performance and scalability through incremental ETL job execution.

Wrote unit and integration tests using Pytest and CliRunner to ensure reliability.

Containerized the entire ETL system using Docker and automated deployment to a cloud environment via GitLab CI.

Leveraged AWS RDS for PostgreSQL and S3 for secure data storage and ingestion workflows.
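
To illustrate how a declarative job definition and incremental execution can fit together, here is a minimal sketch. It is not the project's actual code. The JOB dictionary stands in for a parsed YAML job file, and the table names (src_patient, etl_watermark), the updated_at watermark column, and the run_incremental and demo functions are all hypothetical. SQLite from the standard library is used as a stand-in so the example runs anywhere; the real platform targets PostgreSQL, MSSQL, and Oracle through SQLAlchemy. The gender concept IDs (8507 male, 8532 female) are the standard OMOP values.

```python
import sqlite3

# Hypothetical job definition; in a YAML-driven framework this dict would be
# the parsed contents of a job file.
JOB = {
    "source_table": "src_patient",
    "target_table": "person",
    "watermark_column": "updated_at",
    # target column -> SQL expression evaluated over each source row
    "columns": {
        "person_id": "patient_id",
        "gender_concept_id": "CASE sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END",
        "year_of_birth": "CAST(substr(birth_date, 1, 4) AS INTEGER)",
    },
}


def run_incremental(conn, job):
    """Load only source rows newer than the stored watermark; return rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_watermark (job TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM etl_watermark WHERE job = ?", (job["target_table"],)
    ).fetchone()
    last = row[0] if row else ""  # ISO-8601 strings sort after the empty string

    src, tgt, wm = job["source_table"], job["target_table"], job["watermark_column"]
    count, new_max = conn.execute(
        f"SELECT COUNT(*), MAX({wm}) FROM {src} WHERE {wm} > ?", (last,)
    ).fetchone()
    if count == 0:
        return 0  # nothing changed since the previous run

    targets = ", ".join(job["columns"])
    exprs = ", ".join(job["columns"].values())
    # Upsert so that re-delivered or updated source rows overwrite earlier loads.
    conn.execute(
        f"INSERT OR REPLACE INTO {tgt} ({targets}) "
        f"SELECT {exprs} FROM {src} WHERE {wm} > ? AND {wm} <= ?",
        (last, new_max),
    )
    conn.execute(
        "INSERT OR REPLACE INTO etl_watermark (job, value) VALUES (?, ?)",
        (tgt, new_max),
    )
    conn.commit()
    return count


def demo():
    """Run the job three times against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src_patient "
        "(patient_id INTEGER, sex TEXT, birth_date TEXT, updated_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, year_of_birth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO src_patient VALUES (?, ?, ?, ?)",
        [
            (1, "F", "1980-05-01", "2024-01-01T00:00:00"),
            (2, "M", "1975-11-23", "2024-01-02T00:00:00"),
        ],
    )
    first = run_incremental(conn, JOB)   # initial load
    second = run_incremental(conn, JOB)  # no new rows
    conn.execute(
        "INSERT INTO src_patient VALUES (3, 'X', '1990-02-14', '2024-01-03T00:00:00')"
    )
    third = run_incremental(conn, JOB)   # picks up only the new row
    rows = conn.execute(
        "SELECT person_id, gender_concept_id, year_of_birth "
        "FROM person ORDER BY person_id"
    ).fetchall()
    return first, second, third, rows
```

Storing the watermark per job keeps reruns cheap: a second run with no new source rows touches nothing, and only rows whose updated_at exceeds the last recorded value are transformed and loaded.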

Services

Responsibilities

Python

Click

Pandas

SQLAlchemy

PostgreSQL

MSSQL

Oracle

Docker

GitLab CI/CD APIs