Redis from an Operations Perspective (2)

Building a Cluster-Mode Setup

  
  ## Redis Cluster Node (Master 3, Replica 3)

  redis-7000:
    image: redis:7.2
    hostname: redis-7000
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./cluster/redis-7000/redis.conf:/usr/local/etc/redis/redis.conf
      - ./cluster/redis-7000/data:/data
    ports:
      - "7000:6379"

  redis-7001:
    image: redis:7.2
    hostname: redis-7001
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./cluster/redis-7001/redis.conf:/usr/local/etc/redis/redis.conf
      - ./cluster/redis-7001/data:/data
    ports:
      - "7001:6379"

  redis-7002:
    image: redis:7.2
    hostname: redis-7002
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./cluster/redis-7002/redis.conf:/usr/local/etc/redis/redis.conf
      - ./cluster/redis-7002/data:/data
    ports:
      - "7002:6379"

  redis-7003:
    image: redis:7.2
    hostname: redis-7003
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./cluster/redis-7003/redis.conf:/usr/local/etc/redis/redis.conf
      - ./cluster/redis-7003/data:/data
    ports:
      - "7003:6379"

  redis-7004:
    image: redis:7.2
    hostname: redis-7004
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./cluster/redis-7004/redis.conf:/usr/local/etc/redis/redis.conf
      - ./cluster/redis-7004/data:/data
    ports:
      - "7004:6379"

  redis-7005:
    image: redis:7.2
    hostname: redis-7005
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./cluster/redis-7005/redis.conf:/usr/local/etc/redis/redis.conf
      - ./cluster/redis-7005/data:/data
    ports:
      - "7005:6379"

  ## Sentinel (auto failover)

  sentinel-26379:
    image: redis:7.2
    command: ["redis-server", "/usr/local/etc/redis/sentinel.conf", "--sentinel"]
    volumes:
      - ./sentinel/sentinel-26379/sentinel.conf:/usr/local/etc/redis/sentinel.conf
    ports:
      - "26379:26379"
  
  sentinel-26380:
    image: redis:7.2
    command: ["redis-server", "/usr/local/etc/redis/sentinel.conf", "--sentinel"]
    volumes:
      - ./sentinel/sentinel-26380/sentinel.conf:/usr/local/etc/redis/sentinel.conf
    ports:
      - "26380:26379"

  sentinel-26381:
    image: redis:7.2
    command: ["redis-server", "/usr/local/etc/redis/sentinel.conf", "--sentinel"]
    volumes:
      - ./sentinel/sentinel-26381/sentinel.conf:/usr/local/etc/redis/sentinel.conf
    ports:
      - "26381:26379"

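I added the services above to the docker-compose file I was already using.

command tells the Redis server which configuration file, at a path inside the container, to load when it starts.
volumes mounts the configuration file on the host at that path inside the container.

So each service needs a configuration file to read.

In the directory that contains the docker-compose file, create a cluster folder and a sentinel folder.

cluster
- redis-7000
- redis-7001
- …
sentinel
- sentinel-26379
- sentinel-26380
- …

Then put the matching configuration file into each folder.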

  
## redis.conf

# Default port (6379 inside the container; published on the host as 700X:6379)
port 6379

# Enable cluster mode
cluster-enabled yes

# File where the cluster node state is stored
cluster-config-file /data/nodes.conf

# Node timeout (ms)
cluster-node-timeout 5000

# Enable AOF persistence (to resemble a production environment)
appendonly yes

# Example RDB snapshot intervals (keeping the defaults is fine too)
# save 900 1
# save 300 10
# save 60 10000

# Disable protected mode (local testing only)
protected-mode no

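# Address and ports this node announces to the rest of the cluster.
# redis.conf does not accept trailing comments, so the per-node notes sit on their own lines.
cluster-announce-ip 127.0.0.1
# 7001, 7002, ... on the other nodes
cluster-announce-port 7000
# 17001, 17002, ... on the other nodes
cluster-announce-bus-port 17000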

# Log level (optional)
# loglevel notice

# Maximum number of concurrent clients (optional)
# maxclients 10000

# Memory limit (optional)
# maxmemory 512mb
# maxmemory-policy allkeys-lru

cluster-node-timeout 5000
- A node that stays unreachable for more than 5 seconds (5000 ms) is treated as failing.
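
Only the two announce ports differ between the six redis.conf files, so the files can be generated instead of copied by hand. This is a minimal sketch, assuming the bus port is always the announce port plus 10000, as in the 7000/17000 pair above. NodeConfig and render are names invented for this example, and the output is printed rather than written to disk.

```java
final class NodeConfig {
    // Renders a redis.conf for one node. Assumption: bus port = announce port + 10000.
    static String render(String announceIp, int hostPort) {
        return String.join("\n",
                "port 6379",
                "cluster-enabled yes",
                "cluster-config-file /data/nodes.conf",
                "cluster-node-timeout 5000",
                "appendonly yes",
                "protected-mode no",
                "cluster-announce-ip " + announceIp,
                "cluster-announce-port " + hostPort,
                "cluster-announce-bus-port " + (hostPort + 10000),
                "");
    }

    public static void main(String[] args) {
        // Print the config for each of the six nodes, labelled with its target path.
        for (int port = 7000; port <= 7005; port++) {
            System.out.println("# cluster/redis-" + port + "/redis.conf");
            System.out.println(render("127.0.0.1", port));
        }
    }
}
```

Keeping the per-node values in one place makes it harder for a copied file to keep the wrong announce port.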

  
## sentinel.conf

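# Sentinel default port (26379 inside the container; host ports 26379, 26380, and 26381 map to it)
port 26379

# Sentinel only accepts a hostname such as redis-7000 when hostname resolution is enabled (Redis 6.2+)
sentinel resolve-hostnames yes

# Master name, host, port, and quorum (number of Sentinels that must agree) to monitor
sentinel monitor mymaster redis-7000 6379 2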

# Time (ms) without a response before the master is considered down
sentinel down-after-milliseconds mymaster 5000

# Failover timeout (ms)
sentinel failover-timeout mymaster 10000

# Number of replicas that sync with the new master at the same time during failover
sentinel parallel-syncs mymaster 1

# Allow external connections
protected-mode no

# Log file (optional)
# logfile "/var/log/redis/sentinel-26379.log"

# Sentinel resource limit (optional)
# maxmemory 128mb

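sentinel monitor mymaster redis-7000 6379 2
- Monitors the master at redis-7000:6379 under the name mymaster. With a quorum of 2, at least 2 of the 3 Sentinels must agree that the master is down before a failover can be triggered. The failover itself must also be authorized by a majority of the Sentinels.
parallel-syncs
- Sets how many replicas may be reconfigured to sync with the new master at the same time after a failover.
- It is set to 1 so that replicas resync one at a time, which keeps the load on the new master low and makes a failed sync easier to diagnose.
- The setup is small, so there are few replicas to sync and parallel syncing would gain little.
- A common approach is to start at 1, monitor sync speed and load, and raise the value only when needed.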
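
The quorum and the number of votes needed to actually run a failover are easy to confuse, so here is a minimal sketch of the two checks as the Sentinel documentation describes them. SentinelQuorum and its methods are hypothetical names for illustration. A real Sentinel also tracks epochs and timeouts, which are left out here.

```java
final class SentinelQuorum {
    // The master is flagged objectively down once at least `quorum` Sentinels report it unreachable.
    static boolean isObjectivelyDown(int sentinelsReportingDown, int quorum) {
        return sentinelsReportingDown >= quorum;
    }

    // Votes one Sentinel needs to lead the failover:
    // a majority of all Sentinels, and never fewer than the quorum.
    static int votesNeeded(int totalSentinels, int quorum) {
        int majority = totalSentinels / 2 + 1;
        return Math.max(majority, quorum);
    }

    static boolean canFailover(int totalSentinels, int quorum, int reportingDown, int votes) {
        return isObjectivelyDown(reportingDown, quorum)
                && votes >= votesNeeded(totalSentinels, quorum);
    }

    public static void main(String[] args) {
        // The setup in this post: 3 Sentinels, quorum 2.
        System.out.println("votes needed: " + votesNeeded(3, 2));
        System.out.println("2 report down, 2 votes: " + canFailover(3, 2, 2, 2));
        System.out.println("1 reports down, 3 votes: " + canFailover(3, 2, 1, 3));
    }
}
```

With 3 Sentinels and a quorum of 2 the two thresholds coincide, which is why the difference only becomes visible in larger deployments such as 5 Sentinels with a quorum of 2.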

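To briefly recap Cluster and Sentinel:

Cluster
- Distributes keys across nodes by hash slot, for data sharding and high availability.
- The 16,384 slots are spread across the master nodes, and a given key always maps to the same slot (CRC16 of the key modulo 16384).
- Slots can be moved between nodes while the cluster stays online (resharding) as nodes are added or removed, and spreading a large dataset over several nodes increases throughput.
Sentinel
- Detects failures and performs failover automatically in a single-master setup.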
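
As a rough illustration of how a key lands in one of the 16,384 slots, the sketch below computes CRC16 of the key modulo 16384, which is the mapping the Redis Cluster specification documents, including the {hash tag} rule. The class and method names (HashSlot, slot, hashedPart) are made up for this example and are not part of any Redis client.

```java
import java.nio.charset.StandardCharsets;

final class HashSlot {
    static final int SLOT_COUNT = 16384;

    // CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // If the key contains a non-empty {tag}, only the tag is hashed; otherwise the whole key.
    static byte[] hashedPart(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                return key.substring(open + 1, close).getBytes(StandardCharsets.UTF_8);
            }
        }
        return key.getBytes(StandardCharsets.UTF_8);
    }

    static int slot(String key) {
        return crc16(hashedPart(key)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"foo", "user:1000", "{user:1000}:cart"}) {
            System.out.println(key + " -> slot " + slot(key));
        }
    }
}
```

Keys that share a hash tag, such as {user:1000}:cart and {user:1000}:profile, map to the same slot, so multi-key commands on them stay on one node.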

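Before testing the Cluster and Sentinel setup, I reran the JMeter performance test from earlier.

This is the result of taking 200,000 requests per minute for 10 minutes. In the earlier test, around 350,000 requests per minute was the point of clear failure.

The few errors recorded this time came from a brief spike in system CPU usage while many other programs were running.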

Cluster

Move to the directory that contains the docker-compose file and run the commands from there.

docker exec -it observability-redis-7000-1 sh

After opening a shell in the container with the command above, run

redis-cli --cluster create <HOST_IP>:7000 <HOST_IP>:7001 <HOST_IP>:7002 <HOST_IP>:7003 <HOST_IP>:7004 <HOST_IP>:7005 --cluster-replicas 1 --cluster-yes

or

docker run --rm -it --network observability_default redis:7.2 redis-cli --cluster create <HOST_IP>:7000 <HOST_IP>:7001 <HOST_IP>:7002 <HOST_IP>:7003 <HOST_IP>:7004 <HOST_IP>:7005 --cluster-replicas 1 --cluster-yes

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica redis-7004:6379 to redis-7000:6379
Adding replica redis-7005:6379 to redis-7001:6379
Adding replica redis-7003:6379 to redis-7002:6379
M: b9d66d8ebe653dddde56ed177a65f7049e2f222e redis-7000:6379
   slots:[0-5460] (5461 slots) master
M: 2bd99410ea897d81a6fb2dcd6246b7c8cd46fc05 redis-7001:6379
   slots:[5461-10922] (5462 slots) master
M: 4f28fb073bc3ac001044a75871724207e58e0e2b redis-7002:6379
   slots:[10923-16383] (5461 slots) master
S: 49bd0c225226c56a11573fb4d6c8ed7004b10ef4 redis-7003:6379
   replicates 4f28fb073bc3ac001044a75871724207e58e0e2b
S: 77ccf055cd599ad317420b740cf8890bbc2a9ce7 redis-7004:6379
   replicates b9d66d8ebe653dddde56ed177a65f7049e2f222e
S: fff755013945965432fed1e6a8b3e27bbd07e86d redis-7005:6379
   replicates 2bd99410ea897d81a6fb2dcd6246b7c8cd46fc05
Can I set the above configuration? (type 'yes' to accept):

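If the confirmation prompt above appears (it does when --cluster-yes is omitted), type yes to finish creating the cluster.

When creation completes successfully, redis-cli prints a success message.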

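Current cluster state
- 6 Redis instances in total: 3 masters, each with 1 replica.
- 7000: slots 0 to 5460 -> replicated by 7004
- 7001: slots 5461 to 10922 -> replicated by 7005
- 7002: slots 10923 to 16383 -> replicated by 7003
- AOF persistence is enabled, and the node-down timeout is 5 seconds.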
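
The slot ranges in the redis-cli output above (0-5460, 5461-10922, 10923-16383) come from dividing the 16,384 slots as evenly as possible. The sketch below reproduces that split under the assumption that each boundary is rounded from a fractional per-master share. SlotAllocation is a hypothetical helper written for this post, not code taken from redis-cli.

```java
import java.util.ArrayList;
import java.util.List;

final class SlotAllocation {
    static final int SLOT_COUNT = 16384;

    // Returns one {firstSlot, lastSlot} pair per master, covering 0..16383 without gaps.
    static List<int[]> split(int masters) {
        if (masters < 1) {
            throw new IllegalArgumentException("at least one master is required");
        }
        List<int[]> ranges = new ArrayList<>();
        double slotsPerNode = (double) SLOT_COUNT / masters;
        double cursor = 0.0;
        int first = 0;
        for (int i = 0; i < masters; i++) {
            int last = (int) Math.round(cursor + slotsPerNode - 1);
            // The final master always takes whatever remains.
            if (last > SLOT_COUNT - 1 || i == masters - 1) {
                last = SLOT_COUNT - 1;
            }
            if (last < first) {
                last = first;
            }
            ranges.add(new int[] {first, last});
            first = last + 1;
            cursor += slotsPerNode;
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<int[]> ranges = split(3);
        for (int i = 0; i < ranges.size(); i++) {
            int[] r = ranges.get(i);
            System.out.println("Master[" + i + "] -> Slots " + r[0] + " - " + r[1]);
        }
    }
}
```

For three masters this yields the same three ranges as the output above, with the middle master holding one slot more than the others.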

The application code also has to be switched to cluster mode.

  
spring.data.redis.cluster.nodes=<HOST_IP>:7000,<HOST_IP>:7001,<HOST_IP>:7002  
spring.data.redis.cluster.max-redirects=3

  
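import java.time.Duration;
import java.util.List;

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
@EnableCaching
public class RedisConfig {

    // Previous single-server settings, kept for reference.
//    @Value(value = "${spring.data.redis.host}")
//    private String host;
//
//    @Value(value = "${spring.data.redis.port}")
//    private int port;

    // Comma-separated host:port seed nodes from spring.data.redis.cluster.nodes
    @Value("${spring.data.redis.cluster.nodes}")
    private List<String> clusterNodes;

    // Maximum number of cluster redirections to follow (defaults to 3)
    @Value("${spring.data.redis.cluster.max-redirects:3}")
    private Integer maxRedirects;

    @Bean
    public RedissonClient redissonClient() {
        Config config = new Config();
//        config.useSingleServer().setAddress("redis://localhost:6379"); // Redis address (single server)
//        return Redisson.create(config);

        // Redisson expects each node address with a redis:// scheme prefix.
        config.useClusterServers()
                .addNodeAddress(clusterNodes.stream()
                        .map(addr -> "redis://" + addr).toArray(String[]::new));

        return Redisson.create(config);
    }

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
//        return new LettuceConnectionFactory(host, port);

        RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration(clusterNodes);
        clusterConfig.setMaxRedirects(maxRedirects);

        // Fail fast: 2-second command and shutdown timeouts.
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .commandTimeout(Duration.ofSeconds(2))
                .shutdownTimeout(Duration.ofSeconds(2))
                .build();

        return new LettuceConnectionFactory(clusterConfig, clientConfig);
    }
}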

Monitoring is currently wired to Redis-exporter on port 6379. Point it at a single node instead, such as redis-7000:6379, and turn on its cluster mode. It then collects data from the remaining nodes automatically.

Waiting for the cluster to join……

If this message keeps repeating while the cluster is being created, a firewall may be the cause.

On Windows, go to Firewall & network protection and open ports 7000 to 7005 and 17000 to 17005.
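
Before changing firewall rules it can help to see which of those ports actually accept a TCP connection from the machine running redis-cli. This is a small sketch using only java.net. ClusterPortCheck is a hypothetical name, and the default host 127.0.0.1 is an assumption to be replaced with <HOST_IP>.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

final class ClusterPortCheck {
    // Client ports 7000-7005 and cluster bus ports 17000-17005, as used in this post.
    static List<Integer> portsToCheck() {
        List<Integer> ports = new ArrayList<>();
        for (int p = 7000; p <= 7005; p++) {
            ports.add(p);
        }
        for (int p = 17000; p <= 17005; p++) {
            ports.add(p);
        }
        return ports;
    }

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        for (int port : portsToCheck()) {
            String state = isReachable(host, port, 1000) ? "open" : "unreachable";
            System.out.println(host + ":" + port + " " + state);
        }
    }
}
```

A bus port that shows as unreachable may also simply not be published by the container, so the result is a starting point for investigation rather than proof of a firewall block.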

Testing Cluster Mode

Now that the cluster is built, let's test a few scenarios.

Failover Test

redis-cli -c -h <HOST_IP> -p 7000 cluster nodes | grep master

This command lists the current master nodes. It confirms that 7000, 7001, and 7002 are the masters.

Let's force one of the three to stop. I stopped 7001 the simple way, with the container stop button in Docker Desktop.

redis-cli -c -h <HOST_IP> -p 7000 cluster nodes

Checking with this command showed that the replica was not being promoted automatically.

redis-cli -h <HOST_IP> -p 7005 INFO replication

# Replication
role:slave # this node is a replica
master_host:192.168.219.101 # address of the master it replicates from
master_port:7001
master_link_status:down # the replication link to the master is down (not established)
master_last_io_seconds_ago:-1 # -1 because there is no live link to the master (0 or more when healthy)
master_sync_in_progress:0
slave_read_repl_offset:1 # replication offset the replica has read and processed
slave_repl_offset:1
master_link_down_since_seconds:-1 # how long the replication link has been down (-1: no down time recorded)
slave_priority:100 
slave_read_only:1 
replica_announced:1 
connected_slaves:0 
master_failover_state:no-failover 
master_replid:f74a6c566a828fa7532c7c436fcb97f88313d865 
master_replid2:0000000000000000000000000000000000000000 
master_repl_offset:0 
second_repl_offset:-1 
repl_backlog_active:0 
repl_backlog_size:1048576 
repl_backlog_first_byte_offset:0 
repl_backlog_histlen:0

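In the end, automatic promotion did not happen because the replica had never managed to establish replication with its master. A Redis Cluster replica whose data looks too stale does not attempt a failover (cluster-replica-validity-factor controls this), which would fit what is seen here.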
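
The same check can be automated by parsing the INFO replication text. The sketch below assumes the plain key:value format shown above and strips any trailing annotation. ReplicationInfo and isReplicaWithLinkDown are illustrative names, and fetching the text from Redis is left to whichever client is in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReplicationInfo {
    // Parses "key:value" lines, skipping blank lines and "# Section" headers.
    static Map<String, String> parse(String info) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : info.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int colon = trimmed.indexOf(':');
            if (colon <= 0) {
                continue;
            }
            String value = trimmed.substring(colon + 1);
            // Drop annotations such as " # note" that were added by hand.
            int note = value.indexOf(" #");
            if (note >= 0) {
                value = value.substring(0, note);
            }
            fields.put(trimmed.substring(0, colon), value.trim());
        }
        return fields;
    }

    // True when the node reports itself as a replica and its link to the master is not up.
    static boolean isReplicaWithLinkDown(Map<String, String> fields) {
        return "slave".equals(fields.get("role"))
                && !"up".equals(fields.get("master_link_status"));
    }

    public static void main(String[] args) {
        String sample = "# Replication\nrole:slave\nmaster_port:7001\nmaster_link_status:down\n";
        Map<String, String> fields = parse(sample);
        System.out.println("link down: " + isReplicaWithLinkDown(fields));
    }
}
```

Running this against every replica after creating the cluster would have surfaced the broken replication before the failover test.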

* Starting BGSAVE for SYNC with target: replicas sockets 
* 1:M  * Background RDB transfer started by pid 186 
* 186:C  * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB 
* 1:M  * Diskless rdb transfer, done reading from pipe, 1 replicas still up. 
* 1:M  * Connection with replica 172.18.0.1:6379 lost. 
* 1:M  * Replica 172.18.0.1:6379 asks for synchronization 
* 1:M  * Full resync requested by replica 172.18.0.1:6379 
* 1:M  * Current BGSAVE has socket target. Waiting for next BGSAVE for SYNC 
* 1:M  * Background RDB transfer terminated with success

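That is what the log shows. A diskless full resync starts, but the replica's connection drops right away and it asks for a sync again, over and over.

In this environment the Spring server runs locally and everything else runs in Docker, so the Redis Cluster was configured so that replicas attach to their master through the host IP, and that is what keeps replication from completing.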

