[SQL] data.world tutorial SQL OUTER JOIN 예제 풀이

데이터 분석/DB & SQL 2020. 4. 30. 10:48

OUTER JOIN 은 언제나 조금 어색하다. 사실 비즈니스에서는 INNER JOIN 을 훨씬 많이 쓰기 때문.
이번 기회에 정리하면서 OUTER JOIN 에 대한 원리학습해보는 것으로!

https://docs.data.world/documentation/sql/concepts/intermediate/Joins.html

Joins

Learn how to JOIN multiple tables in this SQL tutorial. Visit for a full course on learning the basics of SQL.

docs.data.world

OUTER JOIN 의 원리 1줄 정리

There are, however, times when you might want to return all the rows form one table matched up with values (where they exist) from another table.

- return all the rows from one table: 한 테이블의 모든 행을 응답한다.
- matched up with values (where they exist) from another table: 특정 값을 기준으로 다른테이블에 있는 값과 매칭시켜서

1. 요구사항을 검토한다.

For example, if we wanted a list of all the sales agents with a indication of they have deals in the pipeline

- if we wanted a list of all the sales agents: 모든 sales agents 리스트를 원하는데
- with a indication of they have deals in the pipeline: 파이프라인 테이블의 그들이 맺어온 거래도 같이 표시하고 싶다면
- 그 중 현재 진행중인 거래만 표시하고 싶다면 (요구사항엔 없고 이후에 추가된 요구사항!)

2. Dataset 을 확인한다.

- 모든 sales_agent 정보: sales_teams
- 모든 거래 정보 (with sales_agent): sales_pipeline
- 2개 테이블 모두에 sales_agent 를 담은 정보가 있으므로 matched value 는 sales_agents

2개의 테이블을 살펴본다.

SELECT DISTINCT sales_agent FROM sales_teams;
-- sales_teams 내 모든 sales_agents 는 총 35 명

SELECT DISTINCT sales_agent FROM sales_pipeline;
-- pipleline 내 거래정보가 있는 sales_agents 는 총 30 명

3. 여러 테이블을 JOIN 한다. JOIN 방식 검토

기준 테이블: 모든 sales agents 가 있는 테이블, sales_teams
부가 테이블: 거래정보가 있는 테이블, sales_pipeline

기준 테이블은 우리가 데이터를 분석하는데 사용되는 기준이 되는 테이블이고
부가 테이블은 기준 테이블의 데이터를 중심으로 (sales_agents) JOIN 대상이 되는 테이블로, 땡겨오고 싶은 데이터가 있는 테이블이다.

SELECT DISTINCT sales_teams.sales_agent FROM sales_teams
LEFT OUTER JOIN sales_pipeline ON sales_teams.sales_agent = sales_pipeline.sales_agent;

-- sales_teams 내 모든 sales agents 를 기준으로 거래정보를 붙인 결과에서도 sales_agent 는 35명

기준테이블에만 있는 데이터는 응답결과에 row 는 생성이 된다.
다만, 기준테이블의 정보만 가져오고, 부가테이블의 정보는 NULL 처리된다.

SELECT * FROM sales_teams
LEFT OUTER JOIN sales_pipeline ON sales_teams.sales_agent = sales_pipeline.sales_agent
WHERE sales_teams.sales_agent NOT IN (
SELECT DISTINCT sales_agent FROM sales_pipeline
)

-- sales_teams 내 모든 sales agents 를 기준으로 거래정보를 붙인 결과에서
-- pipleline 내 거래정보가 있는 sales_agents 를 제외한 결과 확인
-- 거래정보가 모두 NULL 처리된 5명, 5개의 row 가 응답된다.

요 걸 다이어그램으로 표시하면 다음과 같다. 쿼리할 때 마다 다이어그램을 찾아보지 않는 그 날을 위해!

요건 비교대상 INNER JOIN

또 다른 비교대상 FULL OUTER JOIN

4. 추출 대상 컬럼을 지정한다.

명시적인 추출대상은 sales_agents 하나 이다.
하지만 아마도 진행중인 거래만 보고싶으니, 거래 상태도 같이 추가해주면 좋겠다.

결론. 추출대상컬럼: sales_teams.sales_agent, sales_pipeline.deal_stage

5. (선택) 최종 조회결과의 컬럼명과 데이터타입을 확인한다.

sales_teams.sales_agent 는 agent 로 sales_pipeline.deal_stage 는 deal_stage 로 컬럼명을 깔끔하게 처리하자.
그리고 두 컬럼 모두 STRING 값이니, 데이터타입은 변환이 불필요하다.

최종 쿼리

SELECT DISTINCT sales_teams.sales_agent AS agent,
       sales_pipeline.deal_stage AS deal_stage
  FROM sales_teams
       LEFT OUTER JOIN sales_pipeline ON sales_teams.sales_agent = sales_pipeline.sales_agent
WHERE sales_pipeline.deal_stage = "In Progress"

다이어그램에 해방되는 그날을 위하여!

'데이터 분석 > DB & SQL' 카테고리의 다른 글

[SQL] data.world tutorial SQL Aggregatio 예제 풀이 (0)	2020.05.01
[SQL] data.world tutorial SQL SELF JOIN 예제 풀이 (0)	2020.05.01
[SQL] data.world tutorial SQL INNER JOIN 예제 풀이 (0)	2020.04.26
[SQL] 내가 추출하고 싶은 데이터가 여러 테이블이 걸쳐 저장되어 있다면 (1)	2020.04.26
[SQL] 조건문 Where cluase 비교연산 사용법 (0)	2020.04.25

ABOUT ME