The Perfect Beginner Setup For Learning SQL

Photo by Christian Englmeier on Unsplash

Two tools that have expedited my understanding of SQL and my passion for data are Elephant SQL and Mockaroo. This article will show you how to leverage both to optimize your learning path.

ElephantSQL provides managed PostgreSQL instances in the cloud. In addition, they offer a free tier that can store up to 20 MB of data (plenty sufficient for, say, a personal blog). The three most valuable aspects of ElephantSQL for a beginner are that:

  1. It’s free.
  2. It’s a breeze to set up.
  3. It allows you to run SQL queries and commands from your browser.

Mockaroo is an easy-to-use website that generates fake data which you can use for practice and testing. The process is simple: you define a table schema (column names and their data types), and then Mockaroo will produce fake data according to those specifications. Even better, you can choose SQL as the output format, which will generate the code necessary to create tables and insert data into them.

When combined, ElephantSQL and Mockaroo allow you to experiment with various SQL commands on fake data that mimic what you might find in the real world.

The Implementation Details

To understand how these tools fit together, we will set up a free ElephantSQL instance, generate two tables of sales data in Mockaroo, then add the data to our ElephantSQL database, and lastly, run a few basic queries against the data.

Setting Up An ElephantSQL Database

Elephant SQL is straightforward to set up. You won’t worry about configuration, installing packages, or downloading applications. The entire process, from end to end, includes creating an account and selecting what data center will host your database. It’s that easy. 💪🏻

Here’re are the general steps:

  1. Create an ElephantSQL account by heading over to https://customer.elephantsql.com/signup
  2. Choose your sign-up method via Google, Github, or Email.
  3. Once you’ve completed signing up, click the green “Create New Instance” button in the upper-right-hand corner of the screen.
  4. Give your instance a name.
  5. Now select a data center closest to your location. A quick note on choosing a data center: the data center is the physical location where your data will exist. Any time you or anyone else access the data from their computer, the information will travel from the data center to the requesting computer’s location. Even though the data nearly travels at the speed of light, selecting the data center closest to you and your users is a good idea.
  6. Make sure to select the tiny turtle plan, the free tier at the time of writing.

Voilà! We now have a fully managed PostgreSQL instance for our studies. After setting up the database, click the instance name, then click the “Browser” link in the left-hand toolbar. After the page loads, the text input at the top of the page is where you can input and run SQL commands.

Here is where you can write and execute SQL

Generating Mockaroo Data

Now we will use Mockaroo to generate some fake data that we can add to our newly minted database via SQL.

For this tutorial, let’s pretend we want to manage and analyze a sales contact strategy where sales reps make calls daily to potential clients. To represent this data, we will create the following two tables.

Table 1: sales_rep_calls

Here’s the first row of call data:

A sample row of the sales_rep_calls table

In this table, each row represents a sales call to a potential client. The outcome column will indicate whether the sales rep was able to close the sale.

Table 2: potential_clients

And here’s the first row of client data:

A sample row of the potential_clients table

In this table, each row represents a potential client. Here the information is more fundamental. We only have two columns, one identifying the client by name and one indicating if the client is a previous customer.

The client_id column in the sales_rep_call table maps to the id column in the potential_client table. Because these two columns are associated with one another, we can use them to join together the sales_rep_calls and potential_clients tables in our SQL queries. This table structure is called a relationship in the SQL world.

Configuring The Data With Mockaroo

Go ahead and open https://mockaroo.com/ in your web browser. After the page loads, you’ll see an interface to customize the field names and data types for the data generated by Mockaroo.

Here’s a screenshot of the settings I used to create the sales_rep_calls table:

A screenshot of Mockaroo’s schema settings

Notice that at the bottom, there is an option to select the output data format. We want to choose SQL, provide the table name (sales_rep_calls), and check “include CREATE TABLE”.

Mockaroo output formate options

Afterward, we can download the SQL code by clicking the “DOWNLOAD DATA” button at the bottom of the screen.

For the sake of brevity, I’ve added the Mockaroo SQL output for our two test tables to the links below. You can copy and paste from these links rather than generate the SQL on your own. Bookmark Mockaroo in your browser. You’ll have plenty of time to experiment with generating data on your own.

Create Table and Insert SQL for sales_rep_calls (200 rows)
Create Table and Insert SQL for potential_clients (10 rows)

Adding The Mockaroo Data To ElephantSQL

Now that we have the SQL for the two tables, we can flip back to ElephantSQL and execute the code produced by Mockaroo.

As mentioned previously, the ElephantSQL browser allows you to execute SQL. Therefore, what we need to do for each table is copy the code from the links above, paste it into the browser input, and then click execute.

After executing the code, you should see a brief success message if there were no errors. Now we can verify that the tables were created by doing a quick select all on each table. So go ahead and run the following SQL queries from the browser to do just that.

SELECT * FROM sales_rep_calls

This will pull down 200 rows of call data.

SELECT * FROM potential_clients

And this will pull down 10 rows of client data.

Analyzing the Data

We have set up a fully managed PostgreSQL database in a few simple steps and loaded it with practice data. Now the fun begins 🚀. We’re ready to use the data to ask and answer questions about sale activity.

Here are three basic questions, followed by an example of how to write each query.

  1. How many calls did each sales rep make?
  2. Which sales rep had the most sales?
  3. What clients received a call?

How many calls did each sales rep make?

Call volume is always top of mind in sales. So let’s see the query that will answer this question.

SELECT sales_rep,
Count(*)
FROM sales_rep_calls
GROUP BY sales_rep

Executing this query returns the following data:

Robert, Ginger, Paula, and Javier made roughly the same amount of calls.

Which sales rep had the most sales?

We’re adding a bit of complexity here, but we can add one small addition to the above query to answer this question.

SELECT sales_rep,
Count(*)
FROM sales_rep_calls
WHERE outcome = 'sale'
GROUP BY sales_rep

By adding WHERE outcome = 'sale', we filter the data to only include rows where a sale was successful. Then the group by clause will count the total sales for each rep.

What clients received a call?

To answer this question, we need to join the call data to the client data. More specifically, we need to join the sales_rep_calls table to the potiental_clients table by matching the client_id column (from sales_rep_calls) and the id column (from potiental_clients).

Here’s what the SQL query looks like:

SELECT sales_rep,
client_name,
outcome
FROM sales_rep_calls
JOIN potential_clients
ON client_id = potential_clients.id

If you are new to joins, this query may strike you as confusing. To simplify things, you can break this query down into two chunks, the SELECT clause, and the FROM clause.

  1. The SELECT clause allows you to indicate the columns you’d like to pull down.
  2. The FROM clause allows you to define the logic that will produce the table of data from which you pull down said columns.

In this case, the logic we define in the FROM clause joins our two tables together, in effect, it handles this segment of our query, which I described earlier:

…we need to join the sales_rep_calls table to the potiental_clients table by matching the client_id column (from sales_rep_calls) and the id column (from potiental_clients).

Wrapping Up

As you can see, using ElephantSQl combined with Mockaroo allows you to quickly hone your SQL skills on fake data in a low-stakes environment. Even though the two data tables we created were far more straightforward than what you’d encounter in the real world, the opportunities for data analysis are abundant.

Additionally, using Mockaroo to configure data types, and create relationships between mock tables, will give you room to conceptualize how data is stored and modeled in an SQL database. Taking advantage of these two tools will reduce the time it takes you to gain confidence in your data engineering and analytics skills.

For further practice, try answering the following questions using the ElephantSQL browser to query the data.

  1. Who was the first client called?
  2. Which client received the most calls?
  3. Which client made the most purchases?
  4. Which sales rep was the most productive (meaning, which had the best call to sales ratio)?

If you found this article helpful, go ahead and give me a follow. And as always, please share your questions and comments below! Thanks for reading.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jon McEwen

Jon McEwen

One part coder, one part mystic. I write mostly about data analytics, project management, and content creation.