twu-AWS/aws-emr-apache-ranger

 
 

Authorization and Auditing on Amazon EMR Using Apache Ranger

This repo provides a reference architecture for deploying Apache Ranger on Amazon EMR. Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Features include centralized security administration, fine-grained authorization across many Hadoop components (e.g., Hadoop, Hive, HBase, Storm, Knox, Solr, Kafka, and YARN), and central auditing. It uses agents to sync policies and users, and plugins that run within the same process as the Hadoop component, such as the NameNode and HiveServer2.

The repo contains the code that accompanies posts on the AWS Big Data Blog.

NOTE: The code has been unit and functionally tested against only a few recent versions of Amazon EMR, so it may not work with all EMR versions. Code and plugins marked as beta are not suitable for production use.

Please submit a pull request or create an issue.

Deployment options:

| Module | Description | Architecture | Details |
|--------|-------------|--------------|---------|
| V1 | Open Source Ranger Plugins with LDAP | | Basic deployment using AWS Simple AD, Hive and HDFS plugins, and an optional Presto plugin |
| V2 | Open Source Ranger Plugins with Kerberos-enabled cluster and AD | | Deploy a Kerberos-enabled EMR cluster using Windows AD, Hive and HDFS plugins, and optional Presto and HBase plugins |
| V3 | EMR Native plugins for Spark/S3 with Kerberos-enabled cluster and AD | | Deploy a Kerberos-enabled EMR cluster with the Amazon EMR native integration of Apache Ranger. Supports Hive, Spark, and Amazon S3 |

Compatibility/Supported plugins:

| Module | Tag | Region | Region Code | CloudFormation stack | Apache Ranger Version | EMR Version | Supported Plugins |
|--------|-----|--------|-------------|----------------------|-----------------------|-------------|-------------------|
| V1 | 1.0 | All | All | Foo | Apache Ranger 1.0, 2.1 | emr-5.28.1, emr-5.29.0, emr-5.30.1 | Hive 2.x, Hadoop 2.x, PrestoDB 0.227/0.232 (Presto plugin needs Ranger 2.0) |
| V1 | 1.1 | All | All | Foo | Apache Ranger 2.2 | emr-5.29.0, emr-5.30.1, emr-6.1.0 | Hive 3.x, Hadoop 3.x, PrestoSQL 338 or PrestoDB 0.232 |
| V2 | 2.0 | US East (Virginia) | us-east-1 | Step 1 - Setup VPC/AD server - Foo<br>Step 2 - Setup the Ranger Server/RDS Instance/EMR Cluster - Foo | Apache Ranger 2.1 | emr-5.30.1, emr-6.1.0, emr-6.2.0 | Hive 2.x, Hadoop 2.x, PrestoSQL 338/343, PrestoDB 0.227/0.232 (Presto plugin needs Ranger 2.0) |
| V3 | EMR Ranger GA Launch 3.0 | US East (Virginia) | us-east-1 | Step 1 - Use this script to upload SSL key and certs to AWS Secrets Manager - Script<br>Step 2 - Setup VPC/AD server - Foo<br>Step 3 - Setup the Ranger Server/RDS Instance/EMR Cluster - Foo | Apache Ranger 2.1 | emr-5.32.0, emr-6.3.0, emr-6.4.0 | Hive 2.x, Hadoop 2.x, Spark 2.x, Hive 3.x, Hadoop 3.x, Spark 3.x |
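
Because the code is tested against only the EMR releases listed above, it can help to check a target release before launching a stack. The following is a minimal sketch, not part of this repo: the `SUPPORTED_EMR` dictionary and the helper names `is_tested` and `modules_for` are hypothetical, and the data is simply transcribed from the compatibility table (the V3 entry is keyed by the tag number 3.0).

```python
# Hypothetical helper: EMR releases transcribed from the compatibility table.
# Keys are (module, tag number); values are the EMR releases listed for that row.
SUPPORTED_EMR = {
    ("V1", "1.0"): {"emr-5.28.1", "emr-5.29.0", "emr-5.30.1"},
    ("V1", "1.1"): {"emr-5.29.0", "emr-5.30.1", "emr-6.1.0"},
    ("V2", "2.0"): {"emr-5.30.1", "emr-6.1.0", "emr-6.2.0"},
    ("V3", "3.0"): {"emr-5.32.0", "emr-6.3.0", "emr-6.4.0"},
}


def is_tested(module, tag, emr_release):
    """Return True if the table lists emr_release for this module and tag."""
    return emr_release in SUPPORTED_EMR.get((module, tag), set())


def modules_for(emr_release):
    """Return the sorted (module, tag) pairs whose row lists emr_release."""
    return sorted(
        key for key, releases in SUPPORTED_EMR.items() if emr_release in releases
    )


if __name__ == "__main__":
    print(is_tested("V3", "3.0", "emr-6.4.0"))
    print(modules_for("emr-5.30.1"))
```

A release that is missing from the table is not necessarily broken; it has only not been listed as tested.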

WARNING: The current V1 setup does not enable strong cluster-level authentication (Kerberos) for EMR; only the Hue UI is LDAP-enabled. V2 adds Kerberos support. Refer to the roadmap for details.


Module V3 - Native Support of Apache Ranger on Amazon EMR 5.32+

Apache Spark Plugin

Amazon S3 Plugin


Module V1 (1.1) PrestoSQL Ranger plugin (EMR 6.1 & Ranger 2.2)

This demo shows how the plugin can be used to enable column-level access control, column masking, and row filtering. It uses the Presto Redshift connector; the same functionality should work with other Presto connectors.
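
To make the three controls concrete, here is a small, self-contained sketch of what they mean for a query result. It is illustrative only: it is not the Ranger API or the plugin's implementation, and the function `apply_policy`, the sample rows, and the `"****"` mask value are all hypothetical. In the actual setup, policies are defined in Ranger and enforced by the plugin.

```python
def apply_policy(rows, user, allowed_columns, masked_columns, row_filter):
    """Return only the rows and columns `user` may see, with masking applied.

    - column-level access: only `allowed_columns` are returned
    - column masking: values in `masked_columns` are replaced with "****"
    - row filter: rows for which `row_filter(user, row)` is False are dropped
    """
    visible = []
    for row in rows:
        if not row_filter(user, row):
            continue  # row filter: this user may not see the row at all
        out = {}
        for col in allowed_columns:
            out[col] = "****" if col in masked_columns else row[col]
        visible.append(out)
    return visible


# Hypothetical sample data and user.
rows = [
    {"name": "alice", "region": "us", "ssn": "111-22-3333"},
    {"name": "bob", "region": "eu", "ssn": "444-55-6666"},
]

result = apply_policy(
    rows,
    user={"name": "analyst1", "region": "us"},
    allowed_columns=["name", "ssn"],  # "region" is not exposed
    masked_columns={"ssn"},           # visible, but masked
    row_filter=lambda user, row: row["region"] == user["region"],
)
print(result)  # [{'name': 'alice', 'ssn': '****'}]
```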

Please open a GitHub issue if you would like to see updates or other plugin integrations.

References:

Reporting Bugs

If you encounter a bug, please create a new issue with as much detail as possible and steps for reproducing the bug. See the Contributing Guidelines for more details.

License

This sample code is made available under a modified MIT license. See the LICENSE file.
