Data Engineer Summary
Hi! I’m a Data Engineer: a designer, builder, and manager of Big Data infrastructures. I’m a collaborative engineer with 17+ years of experience designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting. I’m known for using the right tools when and where they make sense, and for creating intuitive architectures that help organizations effectively analyze and process terabytes of structured and unstructured data.
SEP 2019 – PRESENT
- Big data architecture on AWS: We extract petabytes of data every month from external data sources, both structured and unstructured.
- We use batch and streaming data processes. For batch: AWS S3 → Glue + Athena → Redshift.
- For streaming we use Kafka: AWS Lambda / AWS Step Functions → AWS MSK → AWS Kinesis → S3 → Snowflake.
- On the Data Warehouse: We use Redshift and have started migrating to Snowflake, where we mainly use a Star Schema model with separate databases that define the Landing, Stage, and Final zones. In the Final zone we store our fact tables and dimensions.
- We also implemented Apache Airflow for DAG automation that connects data from other services on AWS such as Redshift, Kafka, and Athena. There we store our ETL code, written in Python, PySpark, and Scala.
- ML Modeling: Using AWS EMR Notebooks → time series analysis (with correlograms) to detect and assess the intensity of change points or new trends in KPIs, and to match them with events (such as redefining “user”, which impacts user count). Also click fraud and botnet detection, and taxonomy creation.
- CI/CD: We mainly use Python with Alembic (a SQLAlchemy migration library) for SQL data migrations, plus CircleCI and GitHub.
- For C-level reports we use Power BI as the main data visualization tool and have started building some others with Looker.
- AWS (Redshift, Lambda, Athena, S3, DynamoDB, Glue), PostgreSQL, Spectrum, Snowflake Platform, Jupyter Notebook, Python 3.8, PySpark, Scala, Apache Airflow, Kafka, Power BI, GitHub, Jira
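To illustrate the correlogram-based analysis mentioned in the ML Modeling bullet, here is a minimal sketch in plain Python. It is not the production code: the function name `autocorrelation` and the sample KPI series are assumptions made up for this example. It computes the sample autocorrelation at each lag; a correlogram that decays slowly from 1 is one common hint that the series contains a level shift or trend worth investigating.

```python
from typing import List


def autocorrelation(series: List[float], max_lag: int) -> List[float]:
    """Return sample autocorrelations for lags 0..max_lag (hypothetical helper)."""
    n = len(series)
    if n < 2 or max_lag < 0 or max_lag >= n:
        raise ValueError("need at least 2 points and 0 <= max_lag < len(series)")

    mean = sum(series) / n
    centered = [x - mean for x in series]

    # Lag-0 sum of squares; used to normalise every other lag.
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")

    acf = []
    for lag in range(max_lag + 1):
        # Sum of products between the series and itself shifted by `lag`.
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(cov / variance)
    return acf


if __name__ == "__main__":
    # Assumed example: a KPI that jumps from 10 to 20 halfway through.
    kpi = [10.0] * 20 + [20.0] * 20
    print([round(v, 3) for v in autocorrelation(kpi, 3)])
    # -> [1.0, 0.925, 0.85, 0.775]
```

In practice a library routine (for example from statsmodels) would normally replace the hand-written loop; the sketch only shows what a correlogram measures.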
NOV 2017 – AUGUST 2019
- Built Power BI and QlikView reports from heterogeneous data sources.
- Mentoring of Data Science courses for internal employees.
- Creation of Machine learning algorithms for reports and recommendation systems.
- Microsoft Azure architecture design for data stores, data lakes, and hybrid stores built from on-premises data stores.
- Used machine learning with Python on Azure and Anaconda, with data visualization reports in Power BI, QlikView, and Spotfire (legacy). Used GitHub and Azure DevOps as repositories.
- The main focus was “What if” analysis for predictive reports and dynamic dashboards.
- Clouds: AWS, Azure, and GCP (depending on the business line). Tools: Microsoft Azure, Power BI, QlikView, GitHub, Azure DevOps, ML, Python, SQL.
NOV 2015 – OCT 2017
Using the Oracle Big Data Appliance, I created a large number of reports with Python and SciPy. I also created machine learning algorithms for case studies on “What if” forecast dashboards.
- Senior Principal Advanced Data Engineer for customer DIRECTV.
- Reviews customer incidents to ascertain cause, impact to the customer environment, remediation activities and identify continuous service improvements.
- Collects, creates, and provides regular reporting to the customer describing service results, value add, recommendations, and issue resolution.
- Works with the customer and internal Oracle team to ensure stability of connectivity between the customer data center and Oracle NOC.
- Monitors and tunes the system to achieve optimum performance levels in standalone and multi-tiered environments.
- Conducts system analysis, configuration management and develops improvements for system software performance, availability and reliability.
- Performs incident resolution, problem determination, and root cause analysis in accordance with Service Level Agreements.
- Designs, develops, recommends and implements new or revised system software, utilities and automated processes as necessary.
- Provides technical expertise for system transitions, migrations, and consolidations.
- Oracle Cloud.
- Designed a large data warehouse using star and snowflake schemas.
- Designed and developed a Big Data analytics platform for processing customer viewing preferences and social media comments using Java, Hadoop, Hive.
JUN 2014 – OCT 2015
- Data leader at IBM with a team of 12 DBAs from the USA, India, Brazil, China, and Egypt.
- Responsible for the operation and support of corporate financial applications for the E-pricer Project at IBM.
- Production support readiness: implement work and quality processes and Agile engineering methodologies for production support services.
- Manage financial, material, and human resources. Request resources and ensure they are delivered in a timely manner.
- Ensure and maintain the daily operations of financial systems, meeting defined Service Level Agreements.
- Collaborate with groups across the organization to handle service requests or resolve problems.
- Initial implementation of work processes: ITIL engineering methodologies and processes for Maintenance, Requirement Definition, Analysis and Design, Quality Assurance, Testing, and Application Implementation, as well as configuration management and change control processes.
- Assist the team members to ensure the proper implementation of internal methodology.
- Provide quality assurance for the services requested: review functional specifications, technical specifications, unit tests, and handover. Ensure compliance with local, global, and IBM standards.
- Create strategies and solutions to improve the daily operation.
- Consistently reduced operating costs.
- Support definition of new solutions and business proposals. Support for Project
- Consolidate financial information for the collection of HP services to customers.
- Main achievements:
- I performed the transition of IT services from IBM India to the local IBM IT team. A smooth transition was achieved without interruptions while maintaining the service levels expected by the client.
- I started the implementation of IBM Agile processes and working methods in the applications area. This established a standard work process based on IT industry best practices.
- I assured the correct day-to-day operation of IBM applications fulfilling SLAs.
Data Engineer Technologies and tools
- Operating Systems: Windows, Linux, and UNIX.
- Programming Languages: Oracle PL/SQL, Hyperion Oracle, R, .NET, Java, HTML, Microsoft SQL, Python, and data science libraries: TensorFlow, SciPy, etc.
- Databases: Oracle, DB2, MongoDB, MS SQL Server, MS Access, MySQL, PostgreSQL, Progress. Hadoop: HBase, Cassandra, etc.
- Applications: TOAD, Erwin, StarTeam, Remedy, MS Office, MS Project, MS SharePoint, Visio, Oracle Financial, Oracle PeopleSoft*, Oracle JD Edwards*, SQL Server Management Studio, Analysis Services, Reporting Services, Oracle Workbench, Oracle RMAN, Oracle Data Guard, SQL*Plus, Oracle SQL Developer, Oracle Exadata Machine.
- Hyperion, WebLogic, OBI, ODI, OAS, Exalogic, EBS, GoldenGate.
- Cloud: Verizon Cloud; Amazon AWS: S3, EC2, EBS, RDS; Terremark DC; Windows Azure; IBM SoftLayer; Oracle Cloud: SaaS, PaaS, IaaS; IBM Bluemix.
- Google Cloud.
- Hadoop distributions: Cloudera, Apache Hadoop, MapR. Main tools: Hive, Pig, Hadoop Streaming, MapReduce, R, SPSS, SAS, Python, MATLAB, NVIDIA CUDA.
- Cloud: AWS EC2, S3, ECS, Docker, Ansible, Terraform, Puppet.
- Linux: Wide variety of distributions as internet servers, file servers, etc.
The Sonatafy Advantage
The Company’s software service offerings are widely recognized as “best of breed”, serving enterprises ranging from small and medium-sized businesses to Fortune 500 companies, and help them:
- Maximize the entire software development life-cycle investment
- Build executive and manager trust in the development process, which reduces micromanagement and frees more attention for day-to-day business decisions
What makes Sonatafy unique?
- US-based management providing thought leadership, oversight, and consulting
- Speed of proposals and solution delivery
- Low employee attrition rate (<7%)
- Operating history and marquee clients
- Proven recruiting and screening systems/processes
- Solutions aligned to client needs
- Ongoing management / oversight of career growth and continued education for placed resources, which is unique in our industry
- We offer comprehensive code audits
While Sonatafy works well in any industry, the Company has achieved significant traction and domain expertise in several high-profile sectors such as:
- Healthcare, Financial Services, Software Services, SaaS and Consumer Products