Deepak Sanagapalli
Hadoop Developer
Email: [email protected] | [email protected]
Phone: (614) 726-2750
PROFESSIONAL SUMMARY:
8+ years of overall experience in the IT industry, including Java, Big Data technologies, and web applications in multi-tiered environments using Java, Hadoop, Hive, HBase, Pig, Sqoop, J2EE (Spring, JSP, Servlets), JDBC, HTML, and JavaScript (AngularJS).
Working knowledge of various other Cloudera Hadoop technologies (Impala, Sqoop, HDFS, Spark, Scala, etc.).
4 years of comprehensive experience in Big Data analytics.
Extensive experience in Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
Expertise in Apache Spark development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN, and NoSQL).
Well versed in installation, configuration, support, and management of Big Data and the underlying infrastructure of Hadoop clusters, including CDH3 and CDH4 clusters.
Designed and implemented a Cassandra-based database and related web service for storing unstructured data.
Experience with NoSQL databases including HBase and Cassandra.
Designed and implemented a Cassandra NoSQL database and an associated RESTful web service that persists high-volume user profile data for vertical teams.
Experience in building large-scale, highly available web applications; working knowledge of web services and other integration patterns.
Experience in managing and reviewing Hadoop log files.
Experience in using Pig, Hive, Sqoop, and Cloudera Manager.
Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
Hands-on experience in RDBMS and Linux shell scripting.
Extended Hive and Pig core functionality by writing custom UDFs.
Experience in analyzing data using HiveQL, Pig Latin, and MapReduce.
Developed MapReduce jobs to automate the transfer of data from HBase.
Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
Knowledge of data warehousing and ETL tools like Informatica and Pentaho. Experienced in Oracle Database Design and ETL with Informatica.
Mentored, coached, and cross-trained junior developers by providing domain knowledge and design advice.
Proven ability in defining goals, coordinating teams and achieving results.
Experience with Oracle procedures, functions, packages, views, materialized views, function-based indexes, triggers, dynamic SQL, and ad-hoc reporting using SQL for Business Intelligence (DW) applications; worked hands-on with ETL processes.
Knowledge of NoSQL databases such as HBase and Cassandra, and of administrative tasks such as installing Hadoop, commissioning and decommissioning nodes, and managing ecosystem components such as Flume, Oozie, Hive, and Pig.
Extensive experience in using MVC architecture, Struts, and Hibernate for developing web applications using Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML, and JSON.
Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, and JDBC.
Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimating, designing custom solutions, development, leading developers, producing documentation, and production support.
Excellent interpersonal and communication skills; creative, research-minded, technically competent, and result-oriented with problem-solving and leadership skills.
EDUCATION:
Bachelor of Technology, Jawaharlal Technological University, Hyderabad, India
TOOLS AND TECHNOLOGIES:
Programming Languages: C, C++, Java, Python, Scala, Shell Scripting, SQL, PL/SQL
J2EE Technologies: Core Java, Spring, Servlets, SOAP/REST services, JSP, JDBC, XML, Hibernate
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Impala, Cassandra, Oozie, Zookeeper, Flume, Ambari, Storm, Spark, Kafka
Databases: NoSQL, Oracle 10g/11g/12c, SQL Server 2008/2008 R2/2012/2014/2016/2017, MySQL 2003-2016
Database Tools: Oracle SQL Developer, MongoDB, TOAD, PL/SQL Developer
Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, Ajax
Web Services: WebLogic, WebSphere, Apache Cassandra, Tomcat
IDEs: Eclipse, NetBeans, WinSCP
Operating Systems: Windows, UNIX, Linux (Ubuntu), Solaris, CentOS, Windows Server 2003/2006/2008/2009/2012/2013/2016
Version and Source Control: CVS, SVN, Clear Case
Servers: IBM WebSphere 4.0/5.x/8.5/9.0, Apache Tomcat 4.x/5.x/6.x/7.0/8.x/9.0, JBoss 3.2/4.0/5.1/7.1/8.0/9.0/10.1
Frameworks: MVC, Struts, Log4J, JUnit, Maven, ANT, Web Services
PROFESSIONAL EXPERIENCE:
Client: Nationwide Insurance, Columbus, OH
Role: Hadoop Developer
Jan 2016 – Till date
Description: Nationwide Insurance is a leading company that provides many types of insurance, including commercial auto insurance and health insurance. We process huge chunks of customer data into structured formats using various tuning techniques and developed a web application to help Nationwide, a leading commercial auto insurance company, validate insurance claims. The application assigns a claim number, with the customer's details bound to it, which is sent to a claim administrator and then assigned to officers to process the claims. Responsibilities:
Installed and configured a multi-node, fully distributed Hadoop cluster.
Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper, and Spark.
Imported data into HDFS from various SQL databases and files using Sqoop, and from streaming systems using Storm, into the Big Data Lake.
Worked with NoSQL databases like HBase to create tables and store data.
Collected and aggregated large amounts of log data using Apache Flume and staged data in HDFS for further analysis.
Developed custom aggregate functions using Spark SQL and performed interactive querying.
Wrote Pig scripts to store the data into HBase.
Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
Stored the data in tabular formats using Hive tables and Hive SerDes.
Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team.
Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
Spark Streaming collects this data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in a NoSQL store (HBase).
Involved in installing Hadoop ecosystem components.
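The Kafka-to-HBase streaming path described above could look roughly like the following Java sketch, using Spark Streaming with the Kafka 0-10 integration; the broker, topic, table, and column names are illustrative assumptions, not taken from the project.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ClaimEventStream {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("ClaimEventStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");        // illustrative broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "claim-stream");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("claim-events"), kafkaParams));

        // For each micro-batch, write the raw events into an HBase table.
        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            org.apache.hadoop.conf.Configuration hbaseConf = HBaseConfiguration.create();
            try (Connection hbase = ConnectionFactory.createConnection(hbaseConf);
                 Table table = hbase.getTable(TableName.valueOf("claim_events"))) {
                while (records.hasNext()) {
                    ConsumerRecord<String, String> rec = records.next();
                    // Fall back to a random row key when the Kafka record has no key.
                    String rowKey = rec.key() != null ? rec.key() : UUID.randomUUID().toString();
                    Put put = new Put(Bytes.toBytes(rowKey));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                            Bytes.toBytes(rec.value()));
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```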
Responsible for managing data coming from different sources.
Set up the Hadoop cluster environment and its administration, including adding and removing cluster nodes, cluster capacity planning, and performance tuning.
Wrote complex MapReduce programs.
Involved in HDFS maintenance and administering it through the Hadoop Java API.
Configured the Fair Scheduler to provide service-level agreements for multiple users of a cluster.
Loaded data into the cluster from dynamically generated files using Flume and from RDBMS using Sqoop.
Involved in writing Java APIs for interacting with HBase.
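For illustration, a minimal Java sketch of the kind of HBase access code referred to above; the table name, column family, and row key are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClaimStore {
    private static final byte[] CF = Bytes.toBytes("info");    // hypothetical column family

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();       // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table claims = connection.getTable(TableName.valueOf("claims"))) {

            // Write one claim row keyed by claim number.
            Put put = new Put(Bytes.toBytes("CLM-0001"));
            put.addColumn(CF, Bytes.toBytes("customer"), Bytes.toBytes("John Doe"));
            put.addColumn(CF, Bytes.toBytes("status"), Bytes.toBytes("OPEN"));
            claims.put(put);

            // Read it back.
            Result result = claims.get(new Get(Bytes.toBytes("CLM-0001")));
            String status = Bytes.toString(result.getValue(CF, Bytes.toBytes("status")));
            System.out.println("Claim status: " + status);
        }
    }
}
```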
Involved in writing Flume and Hive scripts to extract, transform, and load data into the database.
Used HBase as the data store.
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Experienced in installing, configuring, and using Hadoop ecosystem components.
Experienced in importing and exporting data into HDFS and Hive using Sqoop.
Knowledge of performance troubleshooting and tuning of Hadoop clusters.
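A minimal sketch of a map-only cleaning job of the sort described above; the comma-separated field layout, counter names, and paths are assumptions made for the example.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Map-only job that drops malformed records and trims whitespace. */
public class CleanRecordsJob {

    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            // Assumed layout: id,name,amount; skip rows with missing fields.
            if (fields.length != 3 || fields[0].trim().isEmpty()) {
                context.getCounter("clean", "bad_records").increment(1);
                return;
            }
            String cleaned = fields[0].trim() + "," + fields[1].trim() + "," + fields[2].trim();
            context.write(NullWritable.get(), new Text(cleaned));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                        // map-only cleaning pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```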
Participated in the development and implementation of the Cloudera Hadoop environment.
Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
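The partitioning and bucketing pattern above can be illustrated with a small sketch that issues HiveQL through the HiveServer2 JDBC driver; the connection URL, table names, and column layout are placeholders, not the project's actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 JDBC connection; host and database are placeholders.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            stmt.execute("SET hive.exec.dynamic.partition = true");
            stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");
            stmt.execute("SET hive.enforce.bucketing = true");

            stmt.execute("CREATE TABLE IF NOT EXISTS claims_part ("
                    + " claim_id STRING, customer STRING, amount DOUBLE)"
                    + " PARTITIONED BY (claim_date STRING)"
                    + " CLUSTERED BY (claim_id) INTO 16 BUCKETS"
                    + " STORED AS ORC");

            // Dynamic partitions are derived from the last column of the SELECT.
            stmt.execute("INSERT OVERWRITE TABLE claims_part PARTITION (claim_date)"
                    + " SELECT claim_id, customer, amount, claim_date FROM claims_staging");
        }
    }
}
```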
Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
Experienced in working with various kinds of data sources such as Teradata and Oracle.
Successfully loaded files to HDFS from Teradata and loaded data from HDFS to Hive and Impala.
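An illustrative Hive UDF with a JUnit check, in the spirit of the UDF and unit-testing work described above; the masking logic and class names are invented for the example.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Masks all but the last four characters of a value, e.g. a policy number. */
public class MaskUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length() - 4; i++) {
            sb.append('*');                                 // mask the leading characters
        }
        return new Text(sb.append(value.substring(value.length() - 4)).toString());
    }
}
```

```java
import static org.junit.Assert.assertEquals;
import org.apache.hadoop.io.Text;
import org.junit.Test;

public class MaskUDFTest {
    @Test
    public void masksAllButLastFour() {
        MaskUDF udf = new MaskUDF();
        assertEquals("*****6789", udf.evaluate(new Text("123456789")).toString());
    }
}
```

In Hive, such a UDF jar is added with ADD JAR and the function registered with CREATE TEMPORARY FUNCTION before it can be used in queries.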
Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
Developed and delivered quality services on time and on budget; solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig, and other web technologies.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
Monitored and managed the Hadoop cluster using Apache Ambari.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g/11g/12c, Teradata, Cassandra, HDFS, Data Lake, Spark, MapReduce, Ambari, Cloudera, Tableau, Snappy, Zookeeper, NoSQL, Shell Scripting, Ubuntu, Solr.
Client: AT&T, Middletown, NJ
Role: Hadoop Developer
Jan 2015 – Dec 2015
Description: The project was to build a dashboard that displays visual data, mostly in the form of charts, used by VPs and senior VPs inside AT&T. After logging into a secure global page, the dashboard shows information about the projects being monitored under them. Responsibilities:
Responsible for building scalable distributed data pipelines using Hadoop.
Used Apache Kafka to track data ingestion into the Hadoop cluster.
Wrote Pig scripts to debug Kafka hourly data and perform daily roll ups.
Migrated data from existing Teradata systems to HDFS and built datasets on top of it.
Built a framework using shell scripts to automate Hive registration, handling dynamic table creation and automated addition of new partitions to tables.
Designed Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
Set up and benchmarked Hadoop/HBase clusters for internal use.
Developed simple to complex MapReduce programs.
Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
Developed Oozie workflows that chain Hive/MapReduce modules for ingesting periodic/hourly input data.
Wrote Pig and Hive scripts to analyze the data and detect user patterns.
Implemented device-based business logic using Hive UDFs to perform ad-hoc queries on structured data.
Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS filers.
Prepared Avro schema files for generating Hive tables and shell scripts for executing Hadoop commands in a single execution.
Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
Worked with the administration team to install operating system and Hadoop updates, patches, and version upgrades as required.
Developed ETL pipelines to source data to business intelligence teams to build visualizations.
Involved in unit testing, interface testing, system testing, and user acceptance testing of the workflow tool.
Environment: Cloudera Manager, MapReduce, HDFS, Pig, Hive, Sqoop, Apache Kafka, Oozie, Teradata, Avro, Java (JDK 1.6), Eclipse.
Client: VISA Inc, Wellesley, MA
Role: Hadoop Developer
May 2013 – Dec 2014
Description: Visa is a global payments technology company that connects consumers, businesses, banks and governments in more than 200 countries and territories, enabling them to use electronic payments instead of cash and checks. Responsibilities:
Created dashboards according to user specifications and prepared stories to provide understandable views.
Resolved user support requests; administered and supported Hadoop clusters.
Loaded data from RDBMS to Hadoop using Sqoop.
Provided solutions to ETL/data warehousing teams on where to store intermediate and final output files in the various layers in Hadoop.
Worked collaboratively to manage build-outs of large data clusters.
Helped design big data clusters and administered them.
Worked both independently and as an integral part of the development team.
Communicated all issues and participated in weekly strategy meetings. Administered back end services and databases in the virtual environment.
Implemented system wide monitoring and alerts.
Implemented big data systems in cloud environments. Created security and encryption systems for big data.
Performed administration, troubleshooting, and maintenance of ETL and ELT processes.
Collaborated with multiple teams on the design and implementation of big data clusters in cloud environments.
Developed Pig Latin scripts for the analysis of semi-structured data.
Developed industry-specific UDFs (user-defined functions).
Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs.
Used Sqoop to import data into HDFS and Hive from other data systems.
Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Migrated ETL processes from RDBMS to Hive to validate easier data manipulation.
Developed Hive queries to process the data for visualization.
Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster. Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
Developed a custom file system plugin for Hadoop to access files on the data platform.
The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to access files directly.
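A toy sketch of how such a file system plugin is registered with Hadoop; the dpfs:// scheme and class simply delegate to the local file system here, standing in for the proprietary data-platform client, which is not shown.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;

/** Toy "dpfs://" scheme that delegates to the local file system; a real plugin
 *  would override open(), create(), listStatus(), etc. against the platform's API. */
public class DataPlatformFileSystem extends RawLocalFileSystem {
    @Override public String getScheme() { return "dpfs"; }
    @Override public URI getUri() { return URI.create("dpfs:///"); }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Registering the scheme makes it reachable from MapReduce, Hive, Pig, and HBase
        // through the ordinary FileSystem API.
        conf.set("fs.dpfs.impl", DataPlatformFileSystem.class.getName());
        FileSystem fs = FileSystem.get(URI.create("dpfs:///tmp"), conf);
        for (FileStatus status : fs.listStatus(new Path("dpfs:///tmp"))) {
            System.out.println(status.getPath());
        }
    }
}
```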
Experience in defining, designing, and developing Java applications, especially using MapReduce, by leveraging frameworks such as Cascading and Hive.
Extensive knowledge of and experience with Teradata.
Extracted feeds from social media sites.
Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet the business requirements.
Created Hive tables and worked on them using HiveQL.
Environment: Hadoop, HDFS, Hive, ETL, Pig, UNIX, Linux, CDH 4 distribution, Tableau, Impala, Teradata, Sqoop, Flume, Oozie.
Client: GNS Healthcare, Cambridge, MA
Role: Hadoop Developer
Aug 2012- Apr 2013
Description: The health record team of the GNS Health initiative gathers patient/person information across all the data sources and creates a Person record that is used by downstream systems for running analytics against that data. Responsibilities:
Oversaw design work to develop technical solutions from analysis documents.
Exported data from DB2 to HDFS using Sqoop.
Developed MapReduce jobs using the Java API.
Installed and configured Pig and wrote Pig Latin scripts.
Wrote MapReduce jobs using Pig Latin.
Developed workflows using Oozie for running MapReduce jobs and Hive queries.
Worked on cluster coordination services through Zookeeper.
Worked on loading log data directly into HDFS using Flume.
Involved in loading data from LINUX file system to HDFS.
Responsible for managing data from multiple sources.
Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
Assisted in exporting analyzed data to relational databases using Sqoop.
Implemented JMS for asynchronous auditing purposes.
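A minimal JMS 1.1 sketch of the asynchronous auditing pattern mentioned above; the JNDI names and queue are placeholders.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

/** Publishes audit events to a queue so auditing never blocks the main flow. */
public class AuditPublisher {

    public void audit(String event) throws Exception {
        InitialContext jndi = new InitialContext();                       // JNDI names below are placeholders
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
        Queue auditQueue = (Queue) jndi.lookup("jms/AuditQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(auditQueue);
            TextMessage message = session.createTextMessage(event);
            producer.send(message);                                       // a consumer processes it asynchronously
        } finally {
            connection.close();
        }
    }
}
```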
Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
Experience in defining, designing, and developing Java applications, especially using Hadoop MapReduce, by leveraging frameworks such as Cascading and Hive.
Developed monitoring and performance metrics for Hadoop clusters.
Documented designs and procedures for building and managing Hadoop clusters.
Strong experience in troubleshooting the operating system, maintaining the cluster, and resolving Java-related bugs.
Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
Involved in Creating, Upgrading, and Decommissioning of Cassandra clusters.
Involved in working on the Cassandra database to analyze how the data gets stored.
Successfully loaded files to Hive and HDFS from MongoDB and Solr.
Automated deployment, management, and self-serve troubleshooting of applications.
Defined and evolved the existing architecture to scale with growing data volume, users, and usage.
Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
Installed and configured Hive and wrote Hive UDFs.
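A sketch of the kind of Java service layer over Cassandra described above, using the DataStax Java driver 3.x; the keyspace, table, and query are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

/** Thin data-access layer over Cassandra for a hypothetical commerce profile table. */
public class ProfileDao implements AutoCloseable {
    private final Cluster cluster;
    private final Session session;
    private final PreparedStatement selectByUser;

    public ProfileDao(String contactPoint) {
        this.cluster = Cluster.builder().addContactPoint(contactPoint).build();
        this.session = cluster.connect("commerce");                      // hypothetical keyspace
        this.selectByUser = session.prepare(
                "SELECT user_id, email FROM user_profiles WHERE user_id = ?");
    }

    public String findEmail(String userId) {
        ResultSet rs = session.execute(selectByUser.bind(userId));
        Row row = rs.one();
        return row == null ? null : row.getString("email");
    }

    @Override
    public void close() {
        session.close();
        cluster.close();
    }
}
```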
Experience in managing CVS and migrating to Subversion.
Experience in managing development time, bug tracking, project releases, development speed, release forecast, scheduling and many more.
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL, Ubuntu, Zookeeper, Java (JDK 1.6).
Client: Order fulfillment system, Bengaluru, India
Role: Java Developer
Jan 2011- Jul 2012
Description: Cadence enables global electronic-design innovation and plays an essential role in the creation of today's integrated circuits and electronics. Customers use Cadence software and hardware, methodologies, and services to design and verify advanced semiconductors, printed-circuit boards and systems used in consumer electronics, networking and telecommunications equipment, and computer systems. Responsibilities:
Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
Developed HTML and JSP to present the client-side GUI.
Involved in development of JavaScript code for client-side validations.
Designed the HTML-based web pages for displaying the reports.
Developed the HTML-based web pages for displaying the reports.
Developed java classes and JSP files.
Extensively used JSF framework.
Created Cascading Style Sheets that are consistent across all browsers and platforms.
Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
Developed dynamic content of presentation layer using JSP.
Developed user-defined tags using XML.
Developed Cascading Style Sheets (CSS) for creating effects in Visualforce pages.
Developed automatic emailing with JavaMail and used JNDI to interact with the knowledge server.
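An illustrative JavaMail snippet for the automatic emailing mentioned above; the SMTP host and addresses are placeholders.

```java
import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

/** Sends a plain-text notification mail through a configured SMTP relay. */
public class Notifier {
    public static void send(String to, String subject, String body) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");   // placeholder relay
        Session session = Session.getInstance(props);

        Message message = new MimeMessage(session);
        message.setFrom(new InternetAddress("[email protected]"));
        message.setRecipient(Message.RecipientType.TO, new InternetAddress(to));
        message.setSubject(subject);
        message.setText(body);
        Transport.send(message);
    }
}
```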
Used Struts Framework to implement J2EE design patterns (MVC).
Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
Developed Enterprise Java Beans, including entity beans, session beans (both stateless and stateful), and message-driven beans.
Environment: Java, J2EE 6, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1, Windows 2003.
Client: Maruthi Insurance, Hyderabad, India
Mar 2009 - Dec 2010
Role: Java Developer
Description: The project involved the automation of the existing system for generating quotations for automobile insurance. The Insured Auto project has two modules; the user module, "RetailAutoQuote", is used to register new customers for a group insurance policy, generate quotations for automobile insurance, edit details, and place requests. Responsibilities:
Created the Database, User, Environment, Activity, and Class diagram for the project (UML).
Implemented the database using the Oracle database engine.
Designed and developed a fully functional, generic, n-tiered J2EE application platform; the environment was Oracle technology driven. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF RichFaces.
Created entity objects (business rules and policy, validation logic, default value logic, security).
Created view objects, view links, association objects, and application modules with data validation rules (exposing linked views in an application module), LOVs, dropdowns, value defaulting, and transaction management features.
Web application development using J2EE: JSP, Servlets, JDBC, Java Beans, Struts, Ajax, JSF, JSTL, Custom Tags, EJB, JNDI, Hibernate, ANT, JUnit, Apache Log4J, Web Services, and Message Queue (MQ).
Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.
Used Cascading Style Sheets (CSS) to attain uniformity across all the pages.
Created reusable components (ADF Library and ADF Task Flow).
Experience using version control systems such as CVS, PVCS, and Rational ClearCase.
Created modules using bounded and unbounded task flows.
Generated WSDL (web services) and created workflows using BPEL.
Handled AJAX functions (partial trigger, partial submit, auto submit).
Created the Skin for the layout.
Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.