Channel: Gokhan Atil – Gokhan Atil’s Blog
Browsing all 108 articles

How to include 3rd party (Maven) dependencies to Spark jobs submitted from...

Spark applications may depend on third-party Java or Scala packages stored in a Maven repository, and these packages can be included with the “--packages” parameter when submitting a Spark job. For example, if...

View Article



Sample AWS Lambda Function to Monitor Oracle Database

I wrote a very simple AWS Lambda function to demonstrate how to connect to an Oracle database, gather the tablespace usage information, and send these metrics to CloudWatch. First, I wrote this lambda...

View Article



Amazon QLDB and the Missing Command Line Client

Amazon Quantum Ledger Database is a fully managed ledger database that tracks all changes to user data and maintains a verifiable history of changes over time. It was announced at AWS re:Invent...

View Article

Query a HBASE table through Hive using PySpark on EMR

In this blog post, I’ll demonstrate how we can access an HBase table through Hive from a PySpark script/job on an AWS EMR cluster. First, I created an EMR cluster (EMR 5.27.0, Hive 2.3.5, HBase 1.4.0)....

View Article


Lambda Function to Resize EBS Volumes of EMR Nodes

I have to start by saying that you should not use EMR as a persistent Hadoop cluster. The power of EMR lies in its elasticity. You should launch an EMR cluster, process the data, write the data to S3...

View Article


How to Use AWS S3 bucket for Spark History Server

Since EMR Version 5.25, it’s possible to debug and monitor your Apache Spark jobs by logging directly into the off-cluster, persistent, Apache Spark History Server using the EMR Console. You do not...

View Article

How to Use IAM authentication for RDS PostgreSQL with Glue ETL Jobs

Amazon RDS enables you to use AWS Identity and Access Management (IAM) to manage database access for Amazon RDS for PostgreSQL DB instances. It’s possible to use IAM authentication with Glue...

View Article


Use Snowflake and Zepl to Analyse Covid-19 (coronavirus) Data

Coronavirus changed our lives, and most of us are stuck at home, trying to follow everything about the pandemic. So I wanted to write a blog post that will guide you through configuring an environment that you...

View Article
