Available for opportunities
Hello, I'm

Krishna Kumar Yadav

Senior Data Engineer

Architecting data infrastructure that processes 10M+ transactions daily, delivering $2M+ annual savings, and building the backbone of fraud detection systems in banking.

10+ Years Exp
$2M+ Savings/yr
50% Query Boost
10M+ Txns/Day
01

About Me

I'm a Senior Data Engineer based in Bengaluru & Gurgaon, India, with a decade of experience architecting data solutions that power critical banking operations.

I specialize in building end-to-end data pipelines, real-time streaming architectures, and feature engineering systems serving ML teams at scale. My work spans fraud detection, risk management, and regulatory compliance in financial services.
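Feature engineering systems that serve ML teams hinge on point-in-time correctness: a training row may only see feature values as they existed at the event's timestamp, never later, or the model leaks future information. A minimal lookup sketch (pure Python; the schema and values are hypothetical, not from any actual system):

```python
import bisect


def feature_as_of(history, ts):
    """Return the latest feature value effective at or before ts.

    history: list of (effective_ts, value), sorted by effective_ts.
    Searching for (ts, +inf) makes a value stamped exactly at ts eligible.
    """
    i = bisect.bisect_right(history, (ts, float("inf"))) - 1
    return history[i][1] if i >= 0 else None


# Hypothetical rolling-average-transaction feature, keyed by effective time
avg_txn = [(100, 40.0), (200, 55.0), (300, 61.0)]
```

A training example with event time 250 would be joined against the value effective at 200 (55.0), while an event before the first snapshot gets no value rather than a leaked future one.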

Currently expanding into Generative AI, LLMs, MLOps, and cloud-native architectures to push the boundaries of what data engineering can achieve.

profile.py
class DataEngineer:
  def __init__(self):
    self.name = "Krishna Kumar Yadav"
    self.role = "Senior Data Engineer"
    self.experience = "10+ years"
    self.impact = {
      "savings": "$2M+ annual",
      "optimization": "50% faster",
      "scale": "10M+ txns/day"
    }
Location
Bengaluru & Gurgaon
Domain
Banking & FinServ
Education
M.Sc. CS (AI & ML)
Data Pipeline Architecture
02

Technical Arsenal

Big Data

Apache Spark · PySpark · Kafka · Spark Streaming · Hive · HDFS · Hadoop · Kinesis · MapReduce · Parquet · Avro · ORC

Pipeline & Orchestration

Apache Airflow · dbt · ETL / ELT Pipeline Dev · Workflow Automation · Batch Processing · Stream Processing

Feature Engineering & ML

Feature Store · Feature Pipelines · ML Data Pipelines · Great Expectations · Data Quality · Data Lineage

Warehouse & Lakehouse

Snowflake · Delta Lake · Apache Iceberg · Snowpipe · Data Modeling · Star Schema · Data Warehouse · Lakehouse

Cloud Platforms

Azure Databricks · Azure Data Factory · Azure Synapse · ADLS · AWS S3 · AWS Glue · AWS EMR · AWS Lambda · AWS Athena

Languages & Tools

Python · SQL · Scala · Bash · TigerGraph · PostgreSQL · Apache Impala · HBase · Git · Docker

Currently Learning & Exploring

Expanding Horizons
Generative AI · Large Language Models · LangChain · RAG Pipelines · MLOps · Kubernetes · Terraform · Apache Flink · Microsoft Fabric · Vector Databases · Data Mesh · Prompt Engineering
03

Work Experience

CURRENT
2023 — Present

Senior Data Engineer

Bank of America
  • Architected data pipelines processing 10M+ transactions daily with 15% performance improvement
  • Designed real-time streaming with sub-second latency for fraud alerts
  • Deployed TigerGraph modeling 50M+ relationships for fraud ring detection
  • Built feature engineering pipelines and Feature Store for ML model training
  • Reduced query time by 40% on 5TB+ data with Delta Lake Z-ordering
PySpark · Kafka · TigerGraph · dbt · Airflow · Delta Lake
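The Z-ordering gain above comes from how Delta Lake co-locates related records: the clustering columns' values are bit-interleaved into a single Morton code, and files are laid out in that order so a query filtering on either column can skip most files. A toy illustration of the interleaving itself (pure Python, not the Delta implementation):

```python
def morton_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x fills the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y fills the odd bit positions
    return code


# Sorting by Morton code keeps 2-D neighbours close together in 1-D file order
points = [(3, 1), (0, 0), (1, 3), (1, 0)]
z_sorted = sorted(points, key=lambda p: morton_2d(*p))
```

In Delta Lake proper this is a single `OPTIMIZE ... ZORDER BY (col_a, col_b)` statement; the sketch only shows why an interleaved key improves data skipping for filters on either column, where a plain sort helps only the leading one.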
2021 — 2023

Big Data Engineer

Standard Chartered Bank
  • Engineered big data architecture processing 5TB+ daily, reducing query time by 50%
  • Launched pipelines achieving 30% efficiency improvement
  • Created orchestration with 50+ DAGs and SLA monitoring
  • Enhanced Spark jobs reducing costs by 20% through optimization
Hadoop · Spark · Databricks · Snowflake · Airflow · dbt
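Orchestrating 50+ DAGs with SLA monitoring rests on two primitives: a topological order over task dependencies, and a per-run check that flags tasks exceeding their SLA. A minimal scheduler sketch (task names hypothetical; this is the idea, not Airflow's API):

```python
from collections import deque


def topo_order(deps: dict[str, set[str]]) -> list[str]:
    """Return tasks in dependency order; deps maps task -> upstream tasks."""
    indegree = {t: len(ups) for t, ups in deps.items()}
    downstream: dict[str, list[str]] = {t: [] for t in deps}
    for task, ups in deps.items():
        for up in ups:
            downstream[up].append(task)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order: list[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for down in downstream[task]:
            indegree[down] -= 1
            if indegree[down] == 0:
                ready.append(down)
    if len(order) != len(deps):
        raise ValueError("cycle in DAG")
    return order


def sla_misses(runtimes: dict[str, float], sla_s: float) -> list[str]:
    """Flag tasks whose runtime exceeded the SLA, for alerting."""
    return [t for t, r in runtimes.items() if r > sla_s]


dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
order = topo_order(dag)
late = sla_misses({"extract": 40.0, "transform": 310.0, "load": 55.0}, sla_s=300)
```

Airflow handles both concerns declaratively (`>>` dependencies and the `sla=` task argument); the sketch just makes the underlying mechanics concrete.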
2015 — 2021

System Operations Senior Analyst

Wells Fargo
  • Developed end-to-end data pipelines serving 500+ users
  • Improved workflow efficiency by 30% with Azure Databricks
  • Crafted Snowflake data models and star schema for regulatory reporting
  • Reduced data incidents by 45% with quality monitoring dashboards
PySpark · Snowflake · Airflow · Kafka · Databricks
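A star schema like the regulatory-reporting model above separates facts (transactions) from conformed dimensions, so reports reduce to a join plus an aggregation. A self-contained illustration with SQLite standing in for Snowflake (table and column names are hypothetical, not from the actual model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- One dimension table and one fact table: the smallest possible star
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    segment      TEXT
);
CREATE TABLE fact_txn (
    txn_id       INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer,
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'corporate');
INSERT INTO fact_txn VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 900.0);
""")

# A typical reporting query: aggregate facts by a dimension attribute
rows = cur.execute("""
    SELECT d.segment, SUM(f.amount)
    FROM fact_txn f
    JOIN dim_customer d USING (customer_key)
    GROUP BY d.segment
    ORDER BY d.segment
""").fetchall()
```

The same shape scales to many dimensions (date, product, branch) radiating from the fact table, which is what makes the layout a "star".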
04

Key Projects

02

Real-Time Transaction Monitoring

Standard Chartered Bank

Orchestrated a Kafka streaming pipeline processing 1M+ events/hour with Spark Structured Streaming, Delta Lake, and Databricks.

1M+ Events / Hour
Kafka · Spark Streaming · Delta Lake · Databricks · Airflow
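Event-rate metrics like the 1M+ events/hour figure come from windowed aggregation; in Spark Structured Streaming that is `groupBy(window(...))`, which reduces in miniature to bucketing each event into a tumbling window. A pure-Python sketch (event times and keys hypothetical):

```python
from collections import defaultdict


def tumbling_window_counts(events, window_s=60):
    """Count events per (window_start, key) bucket.

    events: iterable of (epoch_seconds, key); each event lands in exactly
    one non-overlapping window of width window_s.
    """
    counts: dict = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_s) * window_s
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(5, "card_a"), (30, "card_a"), (61, "card_a"), (90, "card_b")]
counts = tumbling_window_counts(events, window_s=60)
```

The real pipeline adds what this sketch omits: event-time watermarks for late data and incremental state kept between micro-batches, which is exactly what Structured Streaming manages for you.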
03

Data Warehouse Modernization

Wells Fargo

Spearheaded migration to a Snowflake data warehouse using dbt, Airflow, and PySpark, transforming legacy systems into a modern, scalable architecture.

60% Faster Queries
35% Cost Reduction
Snowflake · dbt · Airflow · PySpark · SQL
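Migrations like this typically lean on dbt incremental models, so each run reprocesses only new data instead of rebuilding the warehouse. A hypothetical model sketch (source, table, and column names assumed, not taken from the actual project):

```sql
-- models/fct_transactions.sql (hypothetical dbt incremental model)
{{ config(materialized='incremental', unique_key='txn_id') }}

select txn_id, customer_id, amount, txn_ts
from {{ source('core_banking', 'raw_transactions') }}
{% if is_incremental() %}
  -- on incremental runs, only pull rows newer than the target's high-water mark
  where txn_ts > (select max(txn_ts) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; afterwards the `is_incremental()` branch keeps each run proportional to new data, which is a large part of where query-time and cost wins in such migrations come from.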
05

Education & Certifications

Education

Master of Science in Computer Science
AI & Machine Learning
Woolf University, Malta
2024
Bachelor of Engineering in Computer Science
VTU, Bengaluru
2014

Certifications

DP-700
Microsoft Fabric Data Engineer
2025
SA
AWS Solutions Architect
2018
CP
AWS Cloud Practitioner
2023
06

Get In Touch

Have a project? Need data engineering expertise? Let's connect.

Location
Bengaluru & Gurgaon, India
Open to consulting and collaboration opportunities