How to include 3rd party (Maven) dependencies to Spark jobs submitted from...
Spark applications may depend on third-party Java or Scala packages stored in a Maven repository, and these packages can be included with the "--packages" parameter when submitting a Spark job. For example, if...
Sample AWS Lambda Function to Monitor Oracle Database
I wrote a very simple AWS Lambda function to demonstrate how to connect to an Oracle database, gather the tablespace usage information, and send these metrics to CloudWatch. First, I wrote this Lambda...
Amazon QLDB and the Missing Command Line Client
Amazon Quantum Ledger Database is a fully managed ledger database that tracks all changes to user data and maintains a verifiable history of changes over time. It was announced at AWS re:Invent...
Query a HBASE table through Hive using PySpark on EMR
In this blog post, I’ll demonstrate how we can access an HBase table through Hive from a PySpark script/job on an AWS EMR cluster. First I created an EMR cluster (EMR 5.27.0, Hive 2.3.5, HBase 1.4.0)....
Lambda Function to Resize EBS Volumes of EMR Nodes
I have to start by saying that you should not use EMR as a persistent Hadoop cluster. The power of EMR lies in its elasticity. You should launch an EMR cluster, process the data, write the data to S3...
How to Use AWS S3 bucket for Spark History Server
Since EMR Version 5.25, it’s possible to debug and monitor your Apache Spark jobs by logging directly into the off-cluster, persistent, Apache Spark History Server using the EMR Console. You do not...
How to Use IAM authentication for RDS PostgreSQL with Glue ETL Jobs
Amazon RDS enables you to use AWS Identity and Access Management (IAM) to manage database access for Amazon RDS for PostgreSQL DB instances. It’s possible to use IAM authentication with Glue...
Use Snowflake and Zepl to Analyse Covid-19 (coronavirus) Data
The coronavirus has changed our lives, and most of us are stuck at home, trying to follow everything about the pandemic. So I wanted to write a blog post that shows how to configure an environment that you...