Skip to main content

· 10 min read

Introduction to SQL

SQL is a standard language designed for managing data in relational databases. It's commonly used to query, insert, update, and modify data. Most RDBMS (Relational Database Management System) like MySQL, SQLite, Oracle, and PostgreSQL use SQL.

As a data analyst, you'll often work with large volumes of data stored in these databases. SQL becomes an essential tool to retrieve, manipulate, and analyze this data.

1.1 RDBMS and Tables

In SQL, data is stored in tables, just like an Excel spreadsheet. A table is made up of rows (records) and columns (fields). Here's an example of a table, Employees:

EmployeeIDFirstNameLastNamePosition
1JohnDoeAnalyst
2JaneDoeEngineer
3MaryJohnsonManager

2. Basic SQL Syntax

Let's look at the fundamental SQL commands: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.

2.1 SELECT and FROM

The SELECT statement is used to select data from a database, and the FROMstatement specifies which table to get the data from.

SELECT FirstName, LastName 
FROM Employees;

This query retrieves all first and last names from the Employees table. If you want to select all columns, use the * symbol:

SELECT * FROM Employees; 

2.2 WHERE

The WHERE clause is used to filter records:

SELECT * 
FROM Employees
WHERE Position = 'Analyst';

This query retrieves all data for employees who are analysts.

2.3 GROUP BY and HAVING

GROUP BY groups rows that have the same values in specified columns into aggregated data. HAVING is used instead of WHERE with aggregated data.

SELECT Position, COUNT(*) 
FROM Employees
GROUP BY Position
HAVING COUNT(*) > 1;

This query shows positions held by more than one employee.

2.4 ORDER BY

ORDER BY is used to sort the data in ascending or descending order:

SELECT * 
FROM Employees
ORDER BY LastName ASC;

This query sorts employees by their last name in ascending order.

3. Querying Data

The SELECT statement is not just for selecting simple rows. We can use it to perform calculations, concatenations, and more.

SELECT FirstName || ' ' || LastName as FullName, Position
FROM Employees;
This query concatenates the first and last names, separated by a space, and displays it as FullName.

4. Filtering and Sorting Data

Apart from WHERE and ORDER BY, SQL offers BETWEEN, LIKE, and IN to filter data.

4.1 BETWEEN

BETWEEN is used to filter by a range:

SELECT * 
FROM Orders
WHERE OrderDate BETWEEN '2023-01-01' AND '2023-12-31';

This query selects all orders placed in the year 2023.

4.2 LIKE and ILIKE

LIKE is used in a WHERE clause to search for a specified pattern in a column. The "%" sign is used to define wildcards (missing letters) both before and after the pattern. Also, note that LIKE is case sensitive. ILIKE can be used for case-insensitive search.

SELECT * 
FROM Employees
WHERE FirstName LIKE 'J%';

This query selects all employees with a first name starting with 'J'.

4.3 IN

IN allows you to specify multiple values in a WHERE clause:

SELECT * 
FROM Employees
WHERE Position IN ('Analyst', 'Engineer');

This query selects all analysts and engineers.

5. Joining Tables

JOIN statements are used to combine rows from two or more tables based on a related column. The different types of joins include INNER JOIN, LEFT (OUTER) JOIN, RIGHT (OUTER) JOIN, and FULL (OUTER) JOIN. Consider this additional table, Departments:

DepartmentIDDepartmentName
1IT
2Sales
3HR

And suppose we add a DepartmentID field to the Employees table. Here's how we can use different types of joins:

5.1 INNER JOIN

SELECT Employees.LastName, Employees.FirstName, Departments.DepartmentName 
FROM Employees
INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

This query retrieves the list of employees along with their respective department names.

  • JOIN More Than 2 Tables
-- Example of INNER JOIN with 3 tables
SELECT c.CustomerID, c.FullName, o.FoodName, d.FullAddress
FROM Customer c
INNER JOIN CustomerOrder o ON c.CustomerID = o.CustomerID
INNER JOIN DeliveryAddress d ON d.ID = o.DeliveryAddressID;

Extending the concept of INNER JOIN to involve three tables for a more comprehensive result set.

5.2 LEFT (OUTER) JOIN

SELECT Employees.LastName, Employees.FirstName, Departments.DepartmentName FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

This query retrieves all employees and their departments, including employees with no department (the DepartmentName for them will be NULL).

5.3 RIGHT (OUTER) JOIN

SELECT Employees.LastName, Employees.FirstName, Departments.DepartmentName FROM Employees
RIGHT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

This query retrieves all departments and their employees, including departments with no employees.

5.4 FULL (OUTER) JOIN

SELECT Employees.LastName, Employees.FirstName, Departments.DepartmentName FROM Employees
FULL JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

This query retrieves all combinations of employees and departments, including employees with no department and departments with no employees.

6. Aggregation Functions

SQL provides several functions to perform calculations on data, such as COUNT(), SUM(), AVG(), MIN(), MAX(), and GROUP_CONCAT().

SELECT COUNT(*) 
FROM Orders
WHERE OrderDate BETWEEN '2023-01-01' AND '2023-12-31';

This query returns the total number of orders placed in the year 2023.

7. Subqueries and Nested Queries

A subquery is a SQL query nested inside a larger query. A subquery may occur in:

  • A SELECT clause
  • A FROM clause
  • A WHERE clause The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery.
SELECT EmployeeID, FirstName, Position 
FROM Employees
WHERE EmployeeID IN (SELECT EmployeeID FROM Orders WHERE OrderTotal > 1000);

This query selects all employees who have made orders totaling more than 1000.

8. Modifying Database Information

SQL allows you to insert, update, and delete data with INSERT, UPDATE, and DELETE commands respectively. Be careful when using these commands as you can change your data permanently.

8.1 INSERT

INSERT INTO Employees (EmployeeID, FirstName, LastName, Position) VALUES (4, 'Mark', 'Anderson', 'Analyst'); 

This query adds a new row to the Employees table.

8.2 UPDATE

UPDATE Employees 
SET Position = 'Senior Analyst'
WHERE EmployeeID = 4;

This query changes Mark Anderson's position to Senior Analyst.

8.3 DELETE

DELETE FROM Employees WHERE EmployeeID = 4; 

This query deletes Mark Anderson's record from the Employees table.

9. Advanced SQL Techniques

Let's delve into more complex techniques with the help of examples.

9.1 Handling NULL values

NULL value in SQL means no or zero value. Here's how you can use IS NULL and IS NOT NULL:

SELECT * 
FROM Employees
WHERE DepartmentID IS NULL;

This query selects all employees who don't belong to any department.

SELECT * 
FROM Employees
WHERE DepartmentID IS NOT NULL;

This query selects all employees who belong to a department.

9.2 String Functions

SQL offers several functions to manipulate strings. Some examples include:

  • CONCAT(): Concatenates two or more strings.
  • TRIM(): Removes leading and trailing spaces of a string.
  • LENGTH(): Returns the length of a string.
SELECT CONCAT(FirstName, ' ', LastName) as FullName, TRIM(Position), LENGTH(FirstName) as NameLength 
FROM Employees;

This query retrieves a full name by combining first and last names, the position after removing leading and trailing spaces, and the length of the first name.

9.3 Date and Time Functions

SQL provides many functions to work with date and time. Some examples include:

  • NOW(): Returns the current date and time.
  • CURDATE(): Returns the current date.
  • CURTIME(): Returns the current time.
SELECT OrderID, OrderTotal, NOW() as QueryTime 
FROM Orders
WHERE OrderDate = CURDATE();

This query retrieves today's orders along with the query execution time.

9.4 Case Statements

Case statements help in implementing conditional logic in SQL:

SELECT FirstName, Position, 
CASE
WHEN Position = 'Analyst' THEN 'Junior Level'
WHEN Position = 'Engineer' THEN 'Mid Level'
ELSE 'Senior Level'
END as JobLevel
FROM Employees;

This query categorizes employees into job levels based on their positions.

9.5 Window Functions

Window functions perform calculations across a set of table rows that are related to the current row:

SELECT FirstName, Position, Salary, 
RANK() OVER (PARTITION BY Position ORDER BY Salary DESC) as Rank FROM Employees;

This query ranks employees within their respective positions based on their salaries.

Optimization and Performance Tuning

Here are some examples demonstrating SQL optimization techniques:

10.1 EXPLAIN

Most SQL databases support the EXPLAIN command, which shows the execution plan of an SQL statement. This can help you understand how your SQL query will be executed and where you can optimize it.

EXPLAIN SELECT * FROM Employees; 

10.2 Avoid SELECT

Rather than using SELECT *, specify the columns you need. This reduces the amount of data that needs to be read from the disk.

SELECT FirstName, LastName FROM Employees; 

10.3 Use LIMIT

If you only need a specific number of rows, use LIMIT to prevent reading unnecessary data.

SELECT * FROM Employees ORDER BY Salary DESC LIMIT 10; 

This query gets the top 10 employees with the highest salaries. 10.4 Index your data Indexing your data can significantly speed up data retrieval times. Here's how you can add an index:

CREATE INDEX idx_employees_position ON Employees(Position); 

What's Next?

  1. Practice, practice, practice: The best way to reinforce your SQL skills is by practicing. Websites like LeetCode, HackerRank, and SQLZoo provide hundreds of SQL problems that you can practice with.
  2. Learn Database Design: Understanding how databases are structured and designed will help you write better and more efficient SQL queries. Look into topics such as normalization, entity-relationship models, and data integrity.
  3. Explore Advanced SQL Topics: This tutorial covered the basics, but there's still a lot to learn. Delve into more advanced topics like stored procedures, triggers, views, and transaction control.
  4. Learn About Database Administration: Although not typically part of a Data Analyst's role, understanding how a database is administered can provide useful context. This can also open up new opportunities in the realm of database management.
  5. Understand SQL in the context of a programming language: If you are familiar with a programming language like Python or R, try to use SQL commands within these languages. This often gives you more flexibility and allows you to perform more complex operations with your data.
  6. Learn About Different SQL Databases: There are many different SQL databases, such as MySQL, SQLite, PostgreSQL, and Oracle. Each has its own unique features and syntax nuances. Familiarize yourself with the one that's most relevant to your work or interests.
  7. Apply SQL in Your Projects: The ultimate test of your skills will be applying SQL in your projects. Whether it's for cleaning data, data wrangling, or analysis, the practical application of SQL will solidify your learning and give you valuable experience. Remember, becoming proficient in SQL is a journey, not a destination. Enjoy the process of learning and experimenting. Happy querying!

· 2 min read

Welcome to our latest project - a Proof of Concept (PoC) repository that explores dynamic distance calculation using delegates. In this blog post, we'll take you through the features, the motivation behind the project, and how you can experiment with dynamic resolution of distance calculation services in a web API.

Why Delegates?

Delegates in C# are powerful constructs that allow us to treat methods as first-class citizens. Leveraging this capability, our PoC demonstrates how to dynamically resolve distance calculation services at runtime. This flexibility opens the door to easily switch between different providers based on specific conditions or user preferences.

Key Features

Dynamic Resolution with Delegates

The heart of this project lies in the DistanceCalculator class, which uses a delegate (DistanceProviderResolver) to dynamically resolve the distance calculation service. This allows us to switch between Google and PCMiler providers with ease, providing a seamless and extensible solution.

var distanceCalculator = new DistanceCalculator();
var distanceService = distanceCalculator.ResolveDistanceProvider(DistanceProvider.Google);
double distance = distanceService.CalculateDistance("Origin", "Destination");

Configurability

Experiment with different distance calculation providers by customizing the Program.cs file. This configurability makes it easy to adapt the PoC to various scenarios and requirements.

Limitations and Roadmap

As a Proof of Concept, it's important to note that this project currently uses simulated distance calculation logic with dummy values. While it showcases the concept effectively, it may lack certain features expected in a production-ready application.

We encourage you to explore, experiment, and contribute. If you encounter issues or have ideas for improvements, please open an issue on our GitHub repository.

Demo

Clone the Repository :

git clone https://github.com/nhonvo/delegate-poc.git
cd delegate-poc

Conclusion

Dynamic distance calculation with delegates offers a glimpse into the world of flexible and configurable service resolution. We hope this PoC sparks ideas and discussions about how such a concept could be integrated into real-world applications.

Feel free to dive into the code, contribute your insights, and join us on this exploration of dynamic resolution in C#.

Happy coding!

· 3 min read

Introduction

  • Entity Framework Core is an ORM framework, open-source, lightweight and cross-platform developed by Microsoft.
  • It enables developers to work with databases using .NET object and EF Core is built on top of ADO.NET

Relationship

One-to-One Relationship

modelBuilder.Entity<Author>()
.HasOne(a => a.Book) // Author has one Book
.WithOne(b => b.Author) // Book has one Author
.HasForeignKey<Book>(b => b.AuthorId); // Foreign key in Book referencing AuthorId

One-to-Many Relationship

modelBuilder.Entity<Author>()
.HasMany(a => a.Books) // Author has many Books
.WithOne(b => b.Author) // Book has one Author
.HasForeignKey(b => b.AuthorId); // Foreign key in Book referencing AuthorId

Many-to-Many Relationship

modelBuilder.Entity<StudentCourse>()
.HasKey(sc => new { sc.StudentId, sc.CourseId }); // Define composite primary key for the join table

modelBuilder.Entity<StudentCourse>()
.HasOne(sc => sc.Student)
.WithMany(s => s.Courses)
.HasForeignKey(sc => sc.StudentId); // Foreign key in join table referencing StudentId

modelBuilder.Entity<StudentCourse>()
.HasOne(sc => sc.Course)
.WithMany(c => c.Students)
.HasForeignKey(sc => sc.CourseId); // Foreign key in join table referencing CourseId

DeleteBehavior

The DeleteBehavior enum in EF Core includes the following options:

  1. Cascade: Deleting the principal/parent entity will cause the dependent/child entities to be deleted as well. OnDelete(DeleteBehavior.Cascade)

  2. SetNull: Deleting the principal/parent entity will set the foreign key properties in the dependent/child entities to null. OnDelete(DeleteBehavior.SetNull)

  3. SetDefault: Deleting the principal/parent entity will set the foreign key properties in the dependent/child entities to their default values. OnDelete(DeleteBehavior.SetDefault)

  4. Restrict: Prevents the deletion of the principal/parent entity if there are dependent/child entities. OnDelete(DeleteBehavior.Restrict) . An exception will be thrown.

  5. NoAction: Similar to Restrict, it is used to specify no action on delete. OnDelete(DeleteBehavior.NoAction) and you'll need to handle constraints in your application logic.

For example:

modelBuilder.Entity<ParentEntity>()
.HasMany(p => p.ChildEntities)
.WithOne(c => c.ParentEntity)
.OnDelete(DeleteBehavior.Restrict);

This configuration would set the delete behavior for the relationship between ParentEntity and ChildEntity to Restrict.

Best pratices

  1. Indexing: Creating indexes on frequently accessed columns can improve query performance.

  2. Proper data modeling: Designing tables and relationships properly can improve query performance and prevent performance issues.

  3. Caching: Storing frequently accessed data in a cache can reduce database calls and improve application performance.

  4. Query optimization: Writing efficient queries can improve performance. Techniques such as avoiding unnecessary joins and reducing the number of returned columns can help.

  5. Connection pooling: Reusing database connections instead of creating new ones can improve performance.

  6. Batch processing: Performing multiple operations in a single database call can improve performance and reduce overhead.

  7. Asynchronous programming: Using asynchronous programming techniques can improve performance by allowing the application to continue executing while waiting for database calls to complete.

· 6 min read

Are you ready to dive into the world of MongoDB and its powerful shell, mongosh? Let's begin our journey by learning how to install mongosh.

MongoDB Node.js Driver - Connection String Options

MongoDB supports various connection string options, including:

  • MaxPoolSize
  • MinPoolSize
  • maxIdleTimeMS

MongoDB Data Types - BSON Types

MongoDB uses BSON types to represent data. Here are some key BSON types:

  • Double: MongoDB uses Double; there is no Float.
  • String
  • Object
  • 32-Integer
  • 64-Integer
  • ObjectId
  • Boolean
  • Date
  • Timestamp
  • Decimal128
  • Array

Data types ObjectId init from timestamp, increment, machineId, randomNumber

Querying MongoDB

MongoDB provides powerful querying capabilities. Let's explore some query operations:

  1. Collection.findOne(query, options)

  2. Collection.find().project().sort().skip().limit().count()

  • Comparison: The following operators can be used in queries to compare values: {field:{operator:value}}
    • $eq: Values are equal
    • $ne: Values are not equal
    • $gt: Value is greater than another value
    • $gte: Value is greater than or equal to another value
    • $lt: Value is less than another value
    • $lte: Value is less than or equal to another value
    • $in: Value is matched within an array {"salary":{$in:[5,10]}}
    • $nin: Value not in
    • $all: find in an array all elements matching in finding ['Smartphones', 'iOS']

Eg: db.trips.find({"id":{$in:[1,20]}})

  • Logical: The following operators can logically compare multiple queries. operator:[{condition1},{condition2},..]
    • $and: Returns documents where both queries match
    • $or: Returns documents where either query matches
    • $nor: Returns documents where both queries fail to match
    • $not: Returns documents where the query does not match

Eg: db.trips.find({$or:["id":{$gt:10},"price":{$lt:19}]})

Evaluation: The following operators assist in evaluating documents.

  • $regex: Allows the use of regular expressions when evaluating field values

  • $text: Performs a text search

  • $where: Uses a JavaScript expression to match documents

  • $expr: $expr: {operator:[field, value]}

  • $elemMatch: find an object in an array of objects only, cannot find in a field ot an array

  • $size: find the number of elements in an array {$scores: {$size: 6}}

  1. Collection.find().count()

  2. Collection.countDocuments()

  3. Collection.aggregate()

    1. [{$match:{}}, {$group:{}}, {$sort: {}}, {$skip: 10}, {$limit}]
    2. Group: total, average
    3. Var pipeline = [state1, state2, state3]
    4. {$count: "total"}
  4. Collection.listIndexes()

  5. Collection.countDocuments(query)

  6. FindAndModify(query:{}, update: {}, {new: true}): new = true: return a modified document

  7. Expression: db.movies.find({$expr:{$gt: ["$idbm.votes", "$year"]}})

  8. Db.products.find("name":"Smartphone").count();

{
"customer.gender": "M",
items: {
$elemMatch: {
name: 'printer paper'
}
}
}

Working with Documents - Insert, Update, and Delete

  • Inserting Documents

  • Collection.insertOne(document, option): Option = { writeConcern: { w : "majority", wtimeout : 100 } }

  • Db.insert({}, {returnId: true})

  • Db.insert([])

  • Collection.insertMany()

  • Updating Documents

  • Collection.updateOne(filter, update, option)

    • Filter = query
    • Update = {$set: {}} or {$inc: {balance: 1000}}
    • Update = {$push: {readings: {v: 10, t: new Date()}}}
    • Option = {upsert: true} upsert not $upsert
  • Collection.update()

  • Collection.updateMany()

  • await salesCollection.updateMany({items: {$elemMatch: {name: "printer paper"}}},{$set: {"items.$.price": 20 }});

  • Deleting Documents

  • deleteMany(query): If the query = {} delete all documents in the collection

  • deleteOne(): return ({acknowledge:true, deleteCount: 1})

  • deleteMany().deletedCount

MongoDB Indexing

MongoDB offers various indexing options to optimize query performance:

  • Single Index

  • Compound Index

  • MultiKey Index

  • Unique Index vs Non-Unique Index

  • Sparse Index vs Non-Sparse Index

  • Geography index: createIndex({locationField:'2dsphere'})

    • Example Query: You can perform geospatial queries using operators like $near, $geoWithin, $geoIntersects, and others to find documents based on their geographical proximity or containment.

MongoDB Aggregation Framework

MongoDB's Aggregation Framework is a powerful tool for data transformation and analysis. Let's explore some aggregation stages:

  • $match
  • $addFields
  • $group
  • $sample: {size: 10} get random 10 documents
  • $project: {total_avg: $round {number, place}}
  • $sortByCount: "$city" => descending = $group $sort descending
  • $lookup:{from: , localField:, foreignField, as:}
  • $out: to another collection must be in the same database => Last stage only
  • $bucketAuto :{groupBy: "$tripduration", buckets:5, output: {}}
  • $bucket: {groupBy:"$tripduration",boundaries: [10,100,1000,1000,100000], default: "other", output}

Full Text Search in MongoDB

MongoDB supports full-text search with various options for tokenization and fuzzy searching.

{
"mappings": {
"dynamic": false,
"fields": {
"company": {
"type": "string"
},
"title": {
"type": "string"
}
}
}
}

Tokenization and Fuzzy Options

  • EdgeGram beginning of the word
  • rightEdgeGram
  • nGram

Fuzzy option:

  • maxEdits
  • MaxExpansions
  • prefixLength

MongoDB Transactions

MongoDB provides support for transactions, ensuring data consistency in complex operations.

// Transaction
Session startSession()
UpdateOne({}, {$inc: {balance: -30}})
CommitTransaction
AbortTransaction

MongoDB Sharding and Replica Set

Sharding and Replica Set are essential concepts for scaling and ensuring high availability in MongoDB.

Sharding

Sharding distributes data across multiple servers to improve read and write scalability.

Replica Set

A Replica Set consists of multiple nodes with one primary node and two replicate nodes.

MongoDB Best Practices

MongoDB offers various best practices and considerations for efficient data modeling and management.

Data Model Patterns

There are 12 data model patterns, including Computed Pattern, Attribute Pattern, Polymorphic pattern, Bucket, Outline, and others.

Capped Collection

db.createCollection("", {capped: true, size: 10, max: 3})

Capped collection has a Limit size.

Read and Write Concern

  • Read Concern: Available, Majority, Local
  • Write Concern: Majority

MongoDB Views

Creating and working with views in MongoDB:

Db.createView("view_name","source_collection",[pipeline],collation)
Db.createView("name", "source", [])

Cannot update to view.

MongoDB Administration and Tools

MongoDB provides tools and commands for administrative tasks, backup, and restore.

mongorestore

Use mongorestore to restore a dump file to MongoDB.

Cursor Method and Indexing

Choose the appropriate cursor method, like Cursor.hint(), to force MongoDB to use a specific index for a query.

MongoDB Security

Ensure the security of your MongoDB instance by understanding the importance of the "admin" database and controlling user access.

Conclusion

In conclusion, MongoDB offers a robust and flexible database solution with a wide range of features for efficient data management and querying. Understanding the various concepts and best practices will empower you to make the most out of MongoDB in your projects. Happy coding!