친구의 웹툰 동적 쿼리 작성 시 발생했던 문제 그리고 최적화 (MySQL - group by)

친구의 웹툰이라는 페이지가 있다.

이름, 나이, 성별 정보를 검색할 수 있고 입력하지 않고 검색할 수도 있다.

검색의 목적은 해당 정보와 맞는 유저들의 웹툰 평가 정보를 보기 위함이다.

예를 들면 26세 남자 홍길동씨의 평가 목록과 몇점을 주었는지에 대한 정보를 확인할 수 있다.

혹은 정보를 일부 입력하지 않는다면 26세 남성들이 평가한 웹툰들의 목록과 평균 평가 점수를 확인할 수 있다.

이를 위한 로직은 쿼리 작성이 중요한 부분이었고, 다른 방법으로는 동적쿼리 작성이 얼마나 힘든지 알고 있었기 때문에 Querydsl을 사용하여 작성하였다.

구현

  
@Override 
public Page<FriendsWebtoonDto> findSearchConditionWebtoon(String name, String gender, Integer age, Pageable pageable) { 

	QWebtoon webtoon = QWebtoon.webtoon; 
	QAccount account = QAccount.account; 
	QEvaluation evaluation = QEvaluation.evaluation; 

	List<FriendsWebtoonDto> content = queryFactory 
			.select(Projections.constructor(FriendsWebtoonDto.class,
					 webtoon.titleId, 
					 webtoon.imgSrc, 
					 evaluation.rating.avg().as("avgRating"))) 
			.from(evaluation) 
			.join(evaluation.webtoon, webtoon) 
			.join(evaluation.account, account) 
			.where(usernameEq(name), genderEq(gender), ageEq(age)) 
			.groupBy(webtoon.titleId) 
			.offset(pageable.getOffset()) 
			.limit(pageable.getPageSize()) 
			.fetch(); 

	long total = queryFactory 
		.select(evaluation.count()) 
		.from(evaluation) 
		.join(evaluation.webtoon, webtoon) 
		.join(evaluation.account, account) 
		.where(usernameEq(name), genderEq(gender), ageEq(age)) 
		.fetchOne(); 

	return new PageImpl<>(content, pageable, total); 
} 

private BooleanExpression usernameEq(String username) { 
	return hasText(username) ? account.username.eq(username) : null; } 

private BooleanExpression genderEq(String gender) { 
	return hasText(gender) ? account.gender.eq(Gender.from(gender)) : null; 
} 

private BooleanExpression ageEq(Integer age) { 
	return age != null ? account.age.eq(age) : null; 
}

동적 쿼리를 작성하기 위해 BooleanExpression과 BooleanBuilder를 사용하는 방법이 있다.

BooleanBuilder의 단점

어떠한 쿼리가 발생할 지 예측하기 힘들다.

  
BooleanBuilder builder = new BooleanBuilder();

    if (hasText(condition.getUsername())) {
        builder.and(member.username.eq(condition.getUsername()));
    }

    if(hasText(condition.getTeamName())){
        builder.and(team.name.eq(condition.getTeamName()));
    }

    if(condition.getAgeGoe() != null) {
        builder.and(member.age.goe(condition.getAgeGoe()));
    }

    if(condition.getAgeLoe() != null){
        builder.and(member.age.loe(condition.getAgeLoe()));
    }

이렇게 했을 경우 Querydsl 구현부에는

  
.where(builder)

이렇게만 사용되어 구현부만 보고 한 눈에 어떤 조건을 걸었는 지 확인하기 힘들다.

Querydsl 사용 시 주의사항

Cross Join을 회피하자.

묵시적 조인이라고 하는 조인을 명시하지 않고 엔티티에서 다른 엔티티를 조회해 비교하는 경우 JPA가 알아서 크로스 조인을 하게 된다.

크로스 조인을 하게 되면 나올 수 있는 데이터가 더 많아지기 때문에 성능상 단점이 있다.

따라서 쿼리를 보고 크로스 조인이 발생한다면 명시적 조인을 통해 해결할 수 있다.

select
    member0_.member_id as member_i1_0_,
    member0_.age as age2_0_,
    member0_.team_id as team_id4_0_,
    member0_.username as username3_0_ 
from
    member member0_ cross 
join
    team team1_ 
where
    member0_.team_id=team1_.team_id 
    and member0_.team_id>team1_.member_id

select
    member0_.member_id as member_i1_0_,
    member0_.age as age2_0_,
    member0_.team_id as team_id4_0_,
    member0_.username as username3_0_ 
from
    member member0_ 
inner join
    team team1_ 
        on member0_.team_id=team1_.team_id 
where
    member0_.team_id>team1_.member_id

select 컬럼에 entity를 자제하자.

select절 안에 entity를 넣으면 엔티티의 모든 컬럼이 조회된다.

필요한 컬럼만 조회하도록 하고 싶다면 Dto를 사용해 필요한 컬럼만 조회하도록 하자.

  
queryFactory
            .select(
                    new QMemberTeamDto2(
                            member.id,
                            member.username,
                            member.age,
                            member.team // team 에 있는 모든 칼럼을 가지고오게 된다.
                    )
            )
            .from(member)
            .innerJoin(member.team, team)
            .where(member.team.id.eq(teamId))
            .fetch();

member.team 대신 member.team.id 와 같이 필요한 컬럼만 가져오도록 한다.

발생한 문제 (GroupBy)

org.springframework.dao.InvalidDataAccessResourceUsageException: JDBC exception executing SQL ...

위의 구현코드를 실행해보면, 위와 같은 에러를 만날 수 있다.

이는 MySQL의 sql_mode=only_full_group_by 설정때문이다.

  
.where(usernameEq(name), genderEq(gender), ageEq(age)) 
.groupBy(webtoon.titleId) 
//...

Group By 절에서 명시되지 않은 컬럼을 select하기 때문에 발생하는 문제이다.
imgSrc 컬럼이 Group By 절에 포함되지 않았기 때문에 발생한 문제이다.

이 문제는 크게 2가지 해결 방법이 있다.

sql_mode를 변경한다.
정보를 가져오는 쿼리와 평점을 계산하는 쿼리를 따로 작성한 후 애플리케이션 로직 내에서 합친다.
Group by 절에 imgSrc를 추가한다.

나는 이 중 3번째 방법을 택했다.

3번째 방법을 선택한 이유

sql_mode를 변경한다.
- 장점: 쿼리 작성이 간단해진다.
- 단점: 그룹화된 컬럼 외의 다른 컬럼들에 대해 임의의 값이 반환될 수 있어 예상치 못한 결과를 보일 수 있다.
GroupBy절에 imgSrc를 추가한다.
- 장점: 간단한 쿼리 작성 및 모드를 변경하지 않음으로써 모드 변경 시 발생할 수 있는 문제를 방지할 수 있다.
- 단점: 그룹화 조건이 추가됨에 따라 쿼리의 실행 속도가 느려질 수 있다.

종합하면 성능보다 안정성을 택한 것이다.

해결

  
@Override  
public Page<FriendsWebtoonDto> findSearchConditionWebtoon(String name, String gender, Integer age,  
                                                          Pageable pageable) {  
    QWebtoon webtoon = QWebtoon.webtoon;  
    QAccount account = QAccount.account;  
    QEvaluation evaluation = QEvaluation.evaluation;  
  
    JPAQuery<FriendsWebtoonDto> query = queryFactory  
		    .select(Projections.constructor(FriendsWebtoonDto.class,  
	                    webtoon.titleId,  
	                    webtoon.imgSrc,  
	                    evaluation.rating.avg().as("avgRating")))  
		    .from(evaluation)  
		    .join(evaluation.webtoon, webtoon)  
		    .join(evaluation.account, account)  
		    .where(usernameEq(name), genderEq(gender), ageEq(age))  
		    .groupBy(webtoon.titleId, webtoon.imgSrc)  
		    .orderBy(evaluation.rating.avg().desc())  
		    .offset(pageable.getOffset())  
		    .limit(pageable.getPageSize());  
  
    List<FriendsWebtoonDto> content = query.fetch();  
    long total = query.fetchCount();  
  
    return new PageImpl<>(content, pageable, total);  
}

성능에 비교적 좋지 않은 방식으로 해결했기 때문에, Querydsl의 쿼리 성능 최적화 방법을 찾아 적용해보기로 했다.

동적 쿼리 작성 최적화하기

Group By 최적화

일반적으로 MySQL에서는 Group By를 실행하면 Group By column에 의해 Filesort라는 정렬 알고리즘이 추가적으로 실행된다고 한다.

이 쿼리는 index가 없을 때 발생한다.

Filesort가 발생하면 상대적으로 더 느려지기 때문에 이 경우를 막으려면 order by null을 사용하면 된다고 한다.

하지만 Querydsl에서는 order by null을 지원하지 않기 때문에 직접 구현해 사용할 수 있다.

  
public class OrderByNull extends OrderSpecifier {

    public static final OrderByNull DEFAULT = new OrderByNull();

    private OrderByNull(){
        super(Order.ASC, NullExpression.DEFAULT, NullHandling.Default);
    }
}

문제는 페이징 쿼리의 경우 사용하지 못한다. 따라서 이 방법은 적용하지 못했다.

추가적으로 정렬의 경우 100건 이하라면 애플리케이션 메모리로 가져와 정렬하는 것을 추천한다고 한다. DB 자원의 사용보다 더 효율적이다.

페이징 성능 개선 with No Offset

offset과 limit을 이용한 페이징 방식은 서비스가 커짐에 따라 장애를 유발할 수 있다고 한다.

데이터가 점점 많아지면 느려지기 때문이다. offset을 이용하면 offset + limit 만큼의 데이터를 읽어야 한다.

offset 을 이용하는 기존 페이징 쿼리는 다음과 같다.

  
SELECT *
FROM items 
WHERE 조건문
ORDER BY id desc 
OFFSET 페이지 번호
LIMIT 페이지 사이즈 

이와 같은 형태는 페이지 번호가 뒤로 갈수록 앞에서 읽었던 행을 다시 읽어야 한다.

offset이 10000이고 limit가 20이라면 10,020 행을 읽어야 하고 이후 10,000 개의 행을 버리는 것이다.

그렇기 때문에 성능 상에 안좋다는 것이다.

이 문제를 No Offset 방식으로 해결할 수 있다.

No Offset 방식은 시작 지점을 인덱스로 빠르게 찾아 첫 페이지부터 읽도록 하는 방식이다.

  
SELECT *
FROM items
WHERE 조건문
AND id < 마지막 조회 ID 
ORDER BY id desc 
LIMIT 페이지 사이즈 

이전에 조회된 결과를 한 번에 건너뛸 수 있도록 하여 마지막 조회 결과의 ID를 조건문에 사용하는 방식을 이용한 것이다.

  
public List<MemberDto> noOffset(Long lastMemberId, int limit){
    return queryFactory
        .select(new QMemberDto(
                member.username,
                member.age
        ))
        .from(member)
        .where(member.username.contains("member")
                .and(memberIdLt(lastMemberId)))
        .orderBy(member.id.desc())
        .limit(limit)
        .fetch();
}

private BooleanExpression memberIdLt(Long lastMemberId) {
    return lastMemberId != null ? member.id.lt(lastMemberId): null;
}

하지만 이 또한 내 쿼리에는 적용할 수 없었고, 적용할 이유도 크게 없었다.

우선 웹툰 개수 자체가 한정적이기 때문에 성능에 큰 차이가 없을 것이다. 그리고 이 방식은 More버튼 같이 순차적으로 페이지에 접근할 때 가능한 방법이다.

나의 경우 평균 평점순으로 정렬되어 있는데, 평점이 낮은 웹툰을 보고 싶을 경우 끝 페이지로 가야하기 때문에 불가능하다.

결과적으로 위 부분에서 성능 개선을 위한 로직을 적용할 수는 없었지만, 성능 개선이 필요할 때 이러한 부분들을 참고하여 개선할 수 있을 듯 하다.

친구의 웹툰 동적 쿼리 작성 시 발생했던 문제 그리고 최적화 (MySQL - group by)

구현

Querydsl 사용 시 주의사항

발생한 문제 (GroupBy)

해결

동적 쿼리 작성 최적화하기

Further Reading

프리티어 기간 만료에 따른 배포 서버 종료하기

recommtoon.com 재개발 결정.

recommtoon.com 기존 프로젝트 정리 및 문제점.