Part 1: Rails API
Initial project setup
I added additional gems to the Gemfile
Install the gems via: bundle install
I added a basic CORS configuration to the file: config/initializers/cors.rb. This allows the React frontend to make API requests.
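A minimal sketch of what that initializer might contain, assuming the rack-cors gem and a frontend served from localhost:3000 (both assumptions):

```ruby
# config/initializers/cors.rb -- a sketch; the allowed origin is an assumption
Rails.application.config.middleware.insert_before 0, Rack::Cors do
  allow do
    origins "http://localhost:3000"
    resource "*",
      headers: :any,
      methods: [:get, :post, :put, :patch, :delete, :options, :head]
  end
end
```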
I executed rails active_storage:install and rake db:migrate to create and run the necessary Active Storage database migrations.
I added a migration to create a pictures table, and executed rake db:migrate.
I added the Picture model (file: app/models/picture.rb). The model implements methods for JSON serialization and defines a single Active Storage attachment. The JSON contains an attachment_url with a resized [200, 200] variant.
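A sketch of what that model might look like, assuming the attachment is named attachment and Rails 6-style variant options (both assumptions):

```ruby
# app/models/picture.rb -- a minimal sketch; attachment name and variant options are assumptions
class Picture < ApplicationRecord
  has_one_attached :attachment

  def as_json(options = {})
    super(options).merge(
      attachment_url: Rails.application.routes.url_helpers.rails_representation_url(
        attachment.variant(resize_to_limit: [200, 200]),
        only_path: true
      )
    )
  end
end
```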
The controller (file: app/controllers/pictures_controller.rb) implements the index and create methods.
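A sketch of the two actions, assuming the file is posted as params[:picture][:attachment] (the param name is an assumption):

```ruby
# app/controllers/pictures_controller.rb -- a minimal sketch; the param name is an assumption
class PicturesController < ApplicationController
  def index
    render json: Picture.all
  end

  def create
    picture = Picture.new
    picture.attachment.attach(params[:picture][:attachment])

    if picture.save
      render json: picture, status: :created
    else
      render json: picture.errors, status: :unprocessable_entity
    end
  end
end
```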
Last I added the picture controller routes to the file: config/routes.rb
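Roughly, routing only the two implemented actions:

```ruby
# config/routes.rb -- a sketch
Rails.application.routes.draw do
  resources :pictures, only: [:index, :create]
end
```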
Part 2: Testing
From the console, I changed into a directory containing test images to upload.
Next I set up RSpec for unit tests. I executed rails generate rspec:install to generate the configuration files.
I added a DatabaseCleaner strategy and included FactoryBot methods in the file: spec/rails_helper.rb
I added a FactoryBot factory for the picture model, file: spec/factories/pictures.rb, and copied the Rails logo into spec/fixtures/files/
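A sketch of the factory; the exact fixture filename and content type are assumptions:

```ruby
# spec/factories/pictures.rb -- a sketch; the fixture filename is an assumption
FactoryBot.define do
  factory :picture do
    after(:build) do |picture|
      picture.attachment.attach(
        io: File.open(Rails.root.join("spec/fixtures/files/rails-logo.png")),
        filename: "rails-logo.png",
        content_type: "image/png"
      )
    end
  end
end
```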
Here is a sample controller test, file: spec/controllers/pictures_controller_spec.rb
I executed rspec to ensure the tests run successfully.
Part 3: React front end
I created a new React project.
I included the Bootstrap CSS, file: src/index.js
I added a constants file to define the API host URL, new file: src/constants.js
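Something along these lines; the exported name and port are assumptions:

```javascript
// src/constants.js -- a sketch; the export name and port are assumptions
export const API_HOST = 'http://localhost:3001';
```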
I revised the main App component to include the Pictures component, file: src/App.js
I created a basic Pictures component (file: src/Pictures.js). On mount, it loads the existing pictures from the API and renders them in a defined number of columns. It also provides a file input which submits (on change) to create a new picture via the API.
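A sketch of such a component; the API_HOST import, column count, and param names are assumptions:

```jsx
// src/Pictures.js -- a minimal sketch; API_HOST, COLUMNS, and param names are assumptions
import React, { Component } from 'react';
import { API_HOST } from './constants';

const COLUMNS = 4;

class Pictures extends Component {
  state = { pictures: [] };

  componentDidMount() {
    // load existing pictures on mount
    fetch(`${API_HOST}/pictures`)
      .then(response => response.json())
      .then(pictures => this.setState({ pictures }));
  }

  handleFileChange = (event) => {
    // submit the selected file as multipart form data
    const data = new FormData();
    data.append('picture[attachment]', event.target.files[0]);
    fetch(`${API_HOST}/pictures`, { method: 'POST', body: data })
      .then(response => response.json())
      .then(picture =>
        this.setState({ pictures: [...this.state.pictures, picture] })
      );
  };

  render() {
    const { pictures } = this.state;
    return (
      <div>
        <input type="file" onChange={this.handleFileChange} />
        <div className="row">
          {pictures.map(picture => (
            <div key={picture.id} className={`col-sm-${12 / COLUMNS}`}>
              <img src={picture.attachment_url} alt="" className="img-fluid" />
            </div>
          ))}
        </div>
      </div>
    );
  }
}

export default Pictures;
```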
Last I added a bit of CSS to improve the pictures layout, file: src/App.css
The React front end can be started via: npm start.
A screenshot:
I installed the Java JDK from Oracle; Spark, Hadoop, Postgresql, and Scala via Homebrew; and downloaded Apache Zeppelin manually.
Part 1: Hadoop Setup
Ensure you can ssh to localhost without a password.
Changes I made to Hadoop configuration files, located in $HADOOP_CONF_DIR:
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
Prepare HDFS and start Hadoop services
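For a single-node setup, the typical sequence looks roughly like this (assuming the Hadoop sbin scripts are on the PATH):

```bash
# format the namenode (first run only), then start HDFS and YARN
hdfs namenode -format
start-dfs.sh
start-yarn.sh

# create a home directory for the current user and confirm the daemons are up
hdfs dfs -mkdir -p /user/$(whoami)
jps
```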
Test HDFS, Hadoop, MapReduce:
Part 2: Postgresql Setup
Ensure Postgresql is running
Create a Postgresql database and user for Spark development
Part 3: Apache Zeppelin Setup
I encountered an issue installing Apache Zeppelin via homebrew, so I manually downloaded the full package.
Part 4: Scala and Spark development via Zeppelin
I added my Postgresql credentials for the JDBC interpreter
I added a new Zeppelin Notebook (with default interpreter: spark/scala) and began adding paragraphs.
In my first paragraph, I included the postgresql JDBC driver jar, ex:
Ensure the Postgresql driver is loaded
Create 2 Postgresql tables which I plan to populate and join for demonstration purposes
Define credentials in Scala code and create an initial JDBC DataFrame connection variable for reading and writing
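In a Zeppelin spark paragraph this might look roughly like the following; the database name, credentials, and table names are assumptions:

```scala
// a sketch of the JDBC connection setup; database name and credentials are assumptions
import java.util.Properties

val jdbcUrl = "jdbc:postgresql://localhost:5432/spark_dev"
val connectionProperties = new Properties()
connectionProperties.setProperty("user", "spark")
connectionProperties.setProperty("password", "secret")
connectionProperties.setProperty("driver", "org.postgresql.Driver")

// jdbcUrl and connectionProperties are then passed to spark.read.jdbc / DataFrame.write.jdbc
// in the reading and writing paragraphs below
```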
Create a list of Account names as a DataFrame
Write Accounts DataFrame to Postgresql table
Load Accounts (with IDs) into a new DataFrame
Collect a list of Account IDs to populate foreign keys in the other table
Create a function to randomly select an Account ID
Create a list of Report names and a function to select a random name
Generate a million row DataFrame containing randomized Account IDs and Report names to populate the Reports table
Write DataFrame rows to the Postgresql Reports table
Load the Reports table into a DataFrame
Using Spark/SQL to join the tables together
Using Spark/Scala to join the tables together
Showing an aggregate count of Report records per Account
Joining and counting the records
For the final Spark operation, write out a CSV file to HDFS containing Report data for each Account.
Inspect CSV files in HDFS
Copy HDFS CSV files to local filesystem
Count records in each CSV file (the extra ten rows are the CSV headers)
Ensuring each CSV file was partitioned by Account
Inspecting the contents of a CSV file
Part 1: miniDC/OS installation
Initial installation via Homebrew.
Checking network setup.
Create local DCOS cluster.
At this point, the web interface should be accessible (ex: http://172.17.0.3), but you will need to authenticate using the dcos cli.
Part 2: DCOS CLI and cluster setup
Install the DCOS CLI tool via Homebrew and setup the DCOS instance.
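With a recent CLI, the setup is roughly as follows; the cluster URL is the example address from the minidcos output:

```bash
# install the CLI and point it at the local cluster
brew install dcos-cli
dcos cluster setup http://172.17.0.3
dcos auth login

# quick sanity check
dcos node
```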
At this point you should be able to authenticate to the web interface.
Mesosphere DC/OS Dashboard:
Show nodes:
In addition, a health report is available at the telemetry URL: http://172.17.0.3/system/health/v1/report
Part 3: Spark package installation
I used Spark to demonstrate installing a package.
Viewing Spark from the services page
The Marathon UI can be accessed directly or from the services page
Part 4: Deploying a Marathon Pod
To demonstrate deploying a Marathon Pod I created 3 containers (Rails API, Postgresql, and Nginx). I put the full source code for these containers on GitHub. I also provided a docker-compose file to test the container connectivity outside DCOS/Marathon.
I created a script to build, tag, and push each Docker container to Docker Hub. file: rails-stack/build-images.sh
I created an example pod JSON file for the three containers. file: rails-stack/rails-stack-pod.json
Deploying the Marathon Pod and testing container functionality
Viewing the Rails stack service in DCOS
…Next part coming soon!
Part 1: Rails API
Scaffold Rails project
Create Post model with title and body
Update the migration to disallow null values for the fields. Edit file: db/migrate/SOMEDATE_create_posts.rb
Execute rake db:migrate to create the posts table.
Add basic model validation, edit file: app/models/post.rb
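Roughly:

```ruby
# app/models/post.rb -- a minimal sketch of the validations
class Post < ApplicationRecord
  validates :title, presence: true
  validates :body, presence: true
end
```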
Create controller
Update the generated controller file to set the required params and remove location: @post from the create method. Edit file: app/controllers/api/posts_controller.rb
Add API namespaced controller routes, edit file: config/routes.rb
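Roughly:

```ruby
# config/routes.rb -- a sketch of the namespaced routes
Rails.application.routes.draw do
  namespace :api do
    resources :posts
  end
end
```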
Enable CORS for frontend access by adding gem 'rack-cors' to the Gemfile and executing bundle install. Update the CORS initializer, file: config/initializers/cors.rb
Start Rails API via: rails s -p 3000 -b 0.0.0.0
Test API endpoints via CURL
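For example, creating and then listing posts (host and port per the rails s command above):

```bash
# create a post
curl -H "Content-Type: application/json" \
  -d '{"post": {"title": "First post", "body": "Hello world"}}' \
  http://localhost:3000/api/posts

# list posts
curl http://localhost:3000/api/posts
```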
Part 2: React frontend
To scaffold the React frontend I decided to use reactstrap for Bootstrap 4 components and React Router for navigational components.
To get started I added a JS module to handle all Rails API calls using the Fetch API. Each module export method returns an Array containing error (Boolean) and data (or errors). new file: src/Api.js
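A sketch of such a module; the API URL and the exact endpoint paths are assumptions:

```javascript
// src/Api.js -- a minimal sketch; API_URL and endpoint paths are assumptions
const API_URL = 'http://localhost:3000/api';

async function request(path, options = {}) {
  try {
    const response = await fetch(`${API_URL}${path}`, {
      headers: { 'Content-Type': 'application/json' },
      ...options,
    });
    const data = response.status === 204 ? null : await response.json();
    // [error (Boolean), data or errors]
    return [!response.ok, data];
  } catch (error) {
    return [true, error];
  }
}

export const getPosts = () => request('/posts');
export const getPost = (id) => request(`/posts/${id}`);
export const createPost = (post) =>
  request('/posts', { method: 'POST', body: JSON.stringify({ post }) });
export const updatePost = (id, post) =>
  request(`/posts/${id}`, { method: 'PATCH', body: JSON.stringify({ post }) });
export const deletePost = (id) => request(`/posts/${id}`, { method: 'DELETE' });
```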
Next I updated the main App file and integrated with React Router. It defines a Router component with a list of Routes mapping to components. edit file: src/App.js
Next is the top level Posts component. It fetches existing posts, conditionally renders the PostsTable, and provides a button to add a new post. new file: src/Posts.jsx
Here is the PostsTable component; it utilizes ReactStrap for Bootstrap form components and provides a link to edit and delete each post. new file: src/PostsTable.jsx
Here is the PostForm component; it is used for editing and creating posts. On mount it conditionally (based on passed params) fetches the existing post and sets the initial state. As the user enters field data, onChange callbacks set the state of the component, and onSubmit the API is called to save the post. new file: src/PostForm.jsx
Below is the PostDelete component. It simply calls the Api delete method and redirects the user back to the Posts component. new file: src/PostDelete.jsx
Last I updated the index.js file to add the Bootstrap CSS include, edit file: src/index.js
Frontend screenshot
I first defined a RedisBase parent class that the workers and producer will inherit from. It contains all the Redis client methods. On initialize it creates a connection to Redis from environment variables. new file: redis_base.rb
I defined the producer class (RedisProducer) below. In a loop, it queues work by pushing a task into the work queue, and then publishes to the pub/sub channel to inform subscribers there is new work to complete. new file: producer.rb
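A sketch of the producer; the queue/channel names and the @redis client exposed by RedisBase are assumptions:

```ruby
# producer.rb -- a minimal sketch; names and the @redis client from RedisBase are assumptions
require_relative "redis_base"
require "securerandom"

class RedisProducer < RedisBase
  WORK_QUEUE = "work_queue".freeze
  CHANNEL    = "work_channel".freeze

  def run
    loop do
      task = SecureRandom.uuid
      @redis.lpush(WORK_QUEUE, task)       # queue the task
      @redis.publish(CHANNEL, "new_work")  # tell subscribed workers there is new work
      sleep 1
    end
  end
end

RedisProducer.new.run if __FILE__ == $PROGRAM_NAME
```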
Next I defined the worker class (RedisWorker). On initialize, it creates a pub/sub client, checks if there is incomplete work to resume, and then subscribes to the pub/sub channel for new work tasks. new file: worker.rb
I created a monitor script to show queued tasks and the completed tasks for each worker. new file: monitor.rb
Here is a Dockerfile definition to run the workers and producer Ruby code, new file: Dockerfile
I defined a docker compose file to start Redis, a producer, and 10 workers. new file: docker-compose.yml
I started the docker containers via compose and then executed the monitor script inside the producer container to show the results.
First I created a RabbitMQ base class to contain shared functionality between the producer and workers. On initialize, the base class waits for the RabbitMQ and Elasticsearch services to be available before starting. file: rabbitmq_base.rb
The producer subclass publishes a set number of tasks to complete and then exits. file: producer.rb
The worker subclass subscribes to the queue, checks if the task matches an available worker method, and then generates a person document in Elasticsearch. file: worker.rb
I created a Ruby-based Dockerfile for the producer and workers, file: Dockerfile
I used docker compose to create a cluster of services. I implemented a deploy/replicas configuration to spin up 10 worker apps to distribute the load. file: docker-compose.yml
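A sketch of that compose file; the image tags and service names are assumptions:

```yaml
# docker-compose.yml -- a sketch; image tags and service names are assumptions
version: "3.7"
services:
  rabbitmq:
    image: rabbitmq:3-management
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
    environment:
      - discovery.type=single-node
  producer:
    build: .
    command: ruby producer.rb
    depends_on:
      - rabbitmq
      - elasticsearch
  worker:
    build: .
    command: ruby worker.rb
    depends_on:
      - rabbitmq
      - elasticsearch
    deploy:
      replicas: 10
```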
Here are the commands I executed to run the apps and verify the results:
Project setup
Extraction script, file: extract.py
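A minimal sketch of such a script using OpenCV's Haar cascade face detector; the actual detection library used may differ:

```python
# extract.py -- a sketch, assuming OpenCV; the original may use a different detector
import sys

import cv2


def extract_faces(image_path):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # crop each detected face and write it out as person_N.jpg
    for i, (x, y, w, h) in enumerate(faces, start=1):
        cv2.imwrite(f"person_{i}.jpg", image[y:y + h, x:x + w])


if __name__ == "__main__":
    extract_faces(sys.argv[1])
```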
Example usage:
My test image:
Output images:
person_1.jpg
person_2.jpg
person_3.jpg
A pull request should be small and enforce the single responsibility principle, as in the “S” in “SOLID”. If your pull request is too complex, separate functional components into multiple pull requests.
Create a pull request template for your repository. This will help users fill in important information when creating a new pull request.
Example file: .github/pull_request_template.md, contents:
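For example, a minimal template might look like:

```markdown
## What does this PR do?

## How should this be tested?

## Screenshots (if appropriate)

## Related issues
```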
Provide important information in a pull request description, answering the following questions:
If the feature change can be shown visually, provide a screenshot or GIF. I use LICEcap to capture functionality in an animated GIF.
Integrate your Github project with continuous integration service(s); example: Jenkins, CodeShip, CircleCI, Travis. A pull request should have a successful build before it is reviewed by your team.
Code tips:
Append ?w=1 to a diff URL to hide whitespace-only changes.
Use code blocks and syntax highlighting (using triple backticks) to make code comments more legible.
Preview:
Use task lists in your pull request description to track progress.
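The markdown source uses checkbox syntax, for example:

```markdown
- [x] Add model validations
- [x] Update controller tests
- [ ] Update documentation
```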
Preview:
Fully utilize the functionality in the discussion sidebar. Request reviewers relevant to your project and use the pull request review workflow. Assign the team members who are responsible for the pull request. Define workflow labels to track the status of a pull request (ex: On hold, Do not review, Help wanted).
Mention @somebody in comments to involve another Github user in conversation.
As a reviewer of a pull request, add inline code comments from the Files changed tab. Additional commits can be pushed to the PR branch. When the feedback has been addressed, click “Resolve conversation” on the Conversation tab. A running conversation on a pull request is good collaboration, and the history can be helpful in the future. When the pull request has been approved, be sure to squash all commits and rebase before merging.
Initial project setup
I created a Terraform file to setup the backend S3 state configuration and AWS provider version, new file: main.tf
Terraform file for configurable parameters, variables.tf
Terraform S3 bucket and bucket notification (to lambda), s3.tf
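A sketch of that file; the resource names, the variable, and the lambda permission reference (defined in lambda.tf) are assumptions:

```hcl
# s3.tf -- a minimal sketch; resource and variable names are assumptions
resource "aws_s3_bucket" "uploads" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_notification" "uploads_notification" {
  bucket = aws_s3_bucket.uploads.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.meta_lambda.arn
    events              = ["s3:ObjectCreated:*"]
  }

  # the invoke permission is assumed to be defined in lambda.tf
  depends_on = [aws_lambda_permission.allow_s3]
}
```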
This following file defines the lambda resources, its IAM role, policy, and permissions. file: lambda.tf
Below is the Node.js Lambda script. It pulls environment variables, defines the exports handler, receives the S3 bucket notification event, collects the metadata from the S3 object/file path, makes an S3 HEAD request to get the S3 metadata, and publishes to the SNS topic. file: meta_lambda.js
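A sketch of such a handler, assuming the aws-sdk v2 client and a TOPIC_ARN environment variable (both assumptions):

```javascript
// meta_lambda.js -- a minimal sketch; TOPIC_ARN and attribute names are assumptions
const AWS = require('aws-sdk');

const s3 = new AWS.S3();
const sns = new AWS.SNS();

exports.handler = async (event) => {
  const record = event.Records[0].s3;
  const bucket = record.bucket.name;
  const key = decodeURIComponent(record.object.key.replace(/\+/g, ' '));

  // HEAD request to read the object's user metadata
  const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();

  await sns.publish({
    TopicArn: process.env.TOPIC_ARN,
    Message: JSON.stringify({ bucket, key, metadata: head.Metadata }),
    MessageAttributes: {
      // message attributes can drive the filtered SQS subscription
      object_key: { DataType: 'String', StringValue: key },
    },
  }).promise();
};
```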
Terraform SNS topic and filtered SQS subscription, file: sns.tf
Terraform SQS queue and its IAM policy, file: sqs.tf
I put my configuration variables in secrets.auto.tfvars
Here is a BASH script to pass environment variables to the Terraform backend configuration, and execute Terraform init, plan, and apply. file: main.sh
To test SQS queue delivery I created an SQS client. First, add the SQS NPM dependency:
And created the NodeJS script, file: sqs-client.js
I executed the terraform apply script, pushed a file to S3 with metadata, and executed the SQS client script to E2E test this functionality:
I used Gradle as the build tool and for dependency management. I created a new project via: gradle init.
I added the dependencies to the build file: kafka-log-aggregator/build.gradle
I created a class to represent a log entry consisting of a code, message, and the aggregate count. It uses Gson to deserialize and serialize JSON. new file: kafka-log-aggregator/src/main/java/LogEntry.java
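A sketch of such a class; the field and method names are assumptions based on the description:

```java
// LogEntry.java -- a minimal sketch; field and method names are assumptions
import com.google.gson.Gson;

public class LogEntry {
    private static final Gson GSON = new Gson();

    private String code;
    private String message;
    private long count;

    public LogEntry() {}

    public LogEntry(String code, String message, long count) {
        this.code = code;
        this.message = message;
        this.count = count;
    }

    public String getCode() { return code; }
    public String getMessage() { return message; }
    public long getCount() { return count; }

    // serialize this entry to JSON
    public String toJson() { return GSON.toJson(this); }

    // deserialize an entry from JSON
    public static LogEntry fromJson(String json) {
        return GSON.fromJson(json, LogEntry.class);
    }
}
```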
Next I created the LogAggregator class, which will be used by the Kafka Streams app to contain and aggregate all the log entries, file: kafka-log-aggregator/src/main/java/LogAggregator.java
The LogAggregator class requires a serializer class to convert it to a byte array. file: kafka-log-aggregator/src/main/java/LogAggregatorSerializer.java
And here is the class to deserialize from the byte array. file: kafka-log-aggregator/src/main/java/LogAggregatorDeserializer.java
Next I created the main class to build and run the log aggregator Kafka Streams app. file: kafka-log-aggregator/src/main/java/LogAggregatorApp.java
I added a Scala unit test to ensure the aggregation of logs works as planned. file: kafka-log-aggregator/src/test/scala/LogAggregatorAppTest.scala
I created a simple Kafka producer Ruby script to pipe messages onto the topic, wait a while (in this case a minute, for the next session window), and pipe some more. file: kafka-log-aggregator/ruby/producer.rb
At this point I was ready to start Zookeeper, Kafka, and build/run the streams app:
I develop on a Mac using Homebrew or Docker; here is my environment for this post: