I’m a seasoned infrastructure engineer with 3 years of experience. I have worked with teams from big tech companies like TikTok. Most recently, I worked at ArtaFinance, a financial technology startup that provides a RoboInvest service and private equity investment.
My expertise covers:
Work scope and contribution:
Ledger System: Architected and developed a ledger system that serves as the source of truth for all transactions and implements strict business rules governing all money movements, balance sheets, and account states. The ledger system uses strong cryptographic data protection and tamper-evident storage for PII and business-confidential information, backed by Google Spanner.
Trade Order Processor: Architected and built from the ground up an event-driven queue system for processing ad-hoc and block trading orders. The system can schedule future events as well as process events on demand. It also records the history of processed events and the causes of entity changes. Scaled and optimized the system for concurrent processing to handle up to 2,000 TPS.
Trading Readiness Check & Order Safeties: Designed and built a safeties rule engine that checks trade order readiness and enforces trade safeties for direct and block trade orders. The readiness check verifies that trades and accounts were reconciled the previous day. The safeties rule engine serves as the last line of defense against spurious trade orders being sent to our custodian in rapid succession, faster than human (FinOps) intervention could prevent costly errors. Modules are implemented as middleware and enforced on staging and production servers.
Trade Monitoring and Observability: Employed a metrics service (Prometheus) for monitoring trading service health and systemic risk. Integrated alerts for trade issues and risk-safeties failures on quarantined trades with PagerDuty and a Slack channel.
Tech stack: Python, TypeScript, Apache Airflow, Apache Beam, Google Dataflow, Google BigQuery, Google Kubernetes Engine (GKE)
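As an illustration of the kind of rule a safeties engine can enforce against a rapid burst of orders, here is a minimal sliding-window sketch. It is not the production implementation: the names BurstGuard, max_orders, window_seconds, and check_order, and the thresholds used, are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BurstGuard:
    """Hypothetical safety rule: allow at most max_orders per sliding window."""
    max_orders: int
    window_seconds: float
    _seen: deque = field(default_factory=deque)  # timestamps of allowed orders

    def allow(self, now: float) -> bool:
        # Forget orders that have fallen out of the sliding window.
        while self._seen and now - self._seen[0] >= self.window_seconds:
            self._seen.popleft()
        if len(self._seen) >= self.max_orders:
            return False  # too many recent orders: do not forward this one
        self._seen.append(now)
        return True


def check_order(guard: BurstGuard, now: float) -> str:
    """Middleware-style decision: forward the order or quarantine it."""
    return "send" if guard.allow(now) else "quarantine"
```

With a limit of 2 orders per second, a third order arriving within the same second is quarantined for review instead of being forwarded, and orders are accepted again once earlier ones age out of the window.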
Work scope and contribution:
Launched ByteHouse 1.0: A SaaS cloud-compute data warehousing platform by TikTok, ByteDance. Battle-tested by the TikTok Ads engineering team.
SQL Gateway Service: Architected and built from the ground up a SQL Gateway (TCP and HTTP) service that processes client SQL queries and routes them to virtual warehouse clusters. Optimized data transfer throughput to reach 655 MB/s between clients and data warehouse storage in both directions. Integrated monitoring, service profiling, and distributed tracing for observability and SLO evaluation.
Data Express System: Architected and built from the ground up a data orchestration system that runs asynchronous data loading jobs, which transfer data from customer data sources into ByteHouse storage. Supports data inflow and outflow via file upload, Kafka Connect, AWS S3, and Hive, in various data formats.
Virtual Warehouse Usage Billing: Architected and built data pipelines that capture usage metering from customers’ virtual warehouse clusters and store the metrics in InfluxDB. The pipelines serve usage data to the billing dashboard and to auto-recurring payment charges. Optimized the latency of time-series aggregate queries that summarize usage over time-window buckets, down to a P95 of 200 ms.
Tech stack: Golang, Jaeger Tracing, InfluxDB, Prometheus, Grafana, VictoriaMetrics, Apache Kafka, Apache Spark, ClickHouse
Work scope and contribution:
OCR ML Model for KYC: Developed OCR models to parse Gojek drivers’ identity cards as part of the KYC onboarding flow using simple OpenCV; improved model recall by 2% and brought P95 latency to 400 ms.
Object Detection for Go Screen Ads: Trained object detection models for pedestrian scenes and deployed the trained models to LCD screen devices using ONNX and TorchScript. Benchmarked off-the-shelf models such as RetinaNet and YOLOv2, and employed Kalman filtering to track detected objects once bounding boxes are generated.
MLflow Wrapper (Merlin): Took part in the development and release of the Merlin Python 3 SDK for deploying ML instances to staging and production clusters in GCP. The SDK helps orchestrate model releases, deployment, and evaluation, and is built on top of the MLflow library.
Tech stack: C++, Python, MLflow, PyTorch, TensorFlow, GKE
Work scope and contribution:
Zopim Automation Framework: Automation testing framework: Developed and maintained Zendesk end-to-end automation testing tools that manage and run 60+ API contract tests and 100 UI test scenarios covering 6 different Zendesk product suites. Employed the page-object model and factory pattern to keep test code clean and independent of UI changes; when a page changes, only its page object needs to change. Written using the Selenium framework.
Zopim Automation Dashboard: Test Reporting, Troubleshooting and Observability: Spearheaded test reporting and troubleshooting by integrating Sauce Labs into the testing framework and configuring structured test logs, test metrics, page screenshots, and a test outcome reporting dashboard. Automated Slack message notifications and Jira ticket creation upon test failures.
Jenkins Pipelines: Set up Jenkins in a Kubernetes cluster and orchestrated Jenkins build automation pipelines for end-to-end testing in staging and production pods. Configured administrative and third-party auth credentials, set up an identity-aware proxy for accessing the Jenkins dashboard, and employed HPA to scale Jenkins worker pods in and out, balancing on-demand usage spikes against cost.
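The page-object idea mentioned for the Zopim Automation Framework can be sketched roughly as follows. This is an illustrative sketch, not the framework's code: LoginPage, its locators, RecordingDriver, and RecordingElement are all hypothetical. The locators are plain (strategy, value) pairs in the shape Selenium's find_element(by, value) accepts, and the recording driver is a stand-in so the sketch runs without a browser.

```python
class LoginPage:
    """Hypothetical page object: the only place that knows this page's locators."""
    USERNAME = ("id", "username")
    PASSWORD = ("id", "password")
    SUBMIT = ("css selector", "button[type='submit']")

    def __init__(self, driver):
        # Any object exposing find_element(by, value), e.g. a Selenium WebDriver.
        self.driver = driver

    def log_in(self, user, password):
        # Tests call this method and never touch locators directly.
        self.driver.find_element(*self.USERNAME).send_keys(user)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()


class RecordingElement:
    """Stand-in for a web element that logs interactions instead of performing them."""
    def __init__(self, log, locator):
        self.log, self.locator = log, locator

    def send_keys(self, text):
        self.log.append(("type", self.locator, text))

    def click(self):
        self.log.append(("click", self.locator))


class RecordingDriver:
    """Stand-in driver so the sketch can run without a real browser."""
    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return RecordingElement(self.log, (by, value))
```

A test would call LoginPage(driver).log_in("agent", "secret"); if the login form's markup changes, only the locator constants in LoginPage need updating.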
Bachelor of Engineering - Electrical and Electronic Engineering. Specializing in Signal Processing.