I. Problem Background
To enable HTTPS, I abandoned the inflexible domestic shared hosting, took the opportunity to rewrite the old PHP service in Node, and deployed it on a pricey VPS. After the move I found that fetching domestic RSS feeds often timed out, and even when they didn't, a load took about 20s, which was completely unusable. Since the decision to move was final, I had to find a way to speed things up.
The previous solution was to fetch on request, parse, and then respond. The process sounds slow but was actually fast: a load generally took no more than 3s, acceptable for personal use, so only client-side in-memory caching and offline caching were implemented.
Now 20s is completely unbearable, so the first step is the quickest win, in-memory caching:
- Scheduled fetching, storing results in redis in advance
- Redis in-memory caching with a simple expiration strategy
Fetch everything every 2 hours and store it in redis. Requests check the cache first: if the cache has the data, don't fetch live; the only exception is right after a service restart, when a live fetch is needed. Scheduled fetching doesn't affect normal responses, because this scenario doesn't need to worry about dirty data: slightly newer or older doesn't matter much (though in the worst case the data is as old as the scheduled fetch interval plus the client cache expiration time, which is quite stale).
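The read path described above is the classic cache-aside pattern; a minimal sketch, where `cache` and `fetchLive` are hypothetical stand-ins for the real redis client and RSS fetcher:

```javascript
// Cache-aside read path: serve from cache on a hit, fetch live on a miss
// and backfill the cache so later requests are served from memory.
function makeFeedReader(cache, fetchLive) {
  return async function read(url) {
    const cached = cache.get(url);
    if (cached !== undefined) return cached;   // cache hit: no live fetch
    const data = await fetchLive(url);         // cache miss: fetch live
    cache.set(url, data);                      // backfill for later requests
    return data;
  };
}
```

In the real service the cache is redis with a TTL, so stale entries drop out on their own instead of living forever in the Map shown here.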
II. Install and Configure Redis
Installation
CentOS environment, compile and install redis stable:
# Download
wget http://download.redis.io/releases/redis-stable.tar.gz
# Extract
tar -axvf redis-stable.tar.gz
cd redis-stable
# Compile and install
make
make test
make install
Default installation path is /usr/local/bin:
$ ls /usr/local/bin | grep 'redis'
redis-benchmark
redis-check-aof
redis-check-rdb
redis-cli
redis-sentinel
redis-server
Configuration
Default configuration file is in the package root directory redis.conf:
mkdir -p /etc/redis/
cp redis.conf /etc/redis/
# Modify configuration items
vi /etc/redis/redis.conf
# Run in the background (default is foreground)
# In the GENERAL section change daemonize no to daemonize yes
# Set a password (default is passwordless login)
# In the SECURITY section uncomment requirepass and change it to requirepass <mypassword>
Other configuration items such as the port number and log directory don't matter much; modify them if needed, then start and verify:
# Start service
redis-server /etc/redis/redis.conf
# Client connection
redis-cli
auth <mypassword>
# Operations
set 'key' 'value'
get 'key'
P.S. For more redis commands, see Command reference – Redis, or try them online at Try Redis.
Add to System Service
Starting with redis-server /etc/redis/redis.conf every time is awkward; adding it as a system service lets you manage it via service redis <cmd>:
# Copy startup script
cp util/redis_init_script /etc/init.d/
# Rename
mv /etc/init.d/redis_init_script /etc/init.d/redis
Then modify configuration items:
vi /etc/init.d/redis
# Change the second-line comment chkconfig: xxx to chkconfig: 2345 80 90
# Confirm port number correct REDISPORT=6379
# Confirm server executable path correct EXEC=/usr/local/bin/redis-server
# Confirm cli path correct CLIEXEC=/usr/local/bin/redis-cli
# Confirm redis.conf path correct CONF="/etc/redis/${REDISPORT}.conf"
# Change start to background execution
# Change $EXEC $CONF to $EXEC $CONF &
P.S. In # chkconfig: 2345 80 90, 2345 is the run levels, and 80 and 90 are the startup and shutdown priorities respectively; a smaller value means higher priority, which controls ordering. For more information see chkconfig.
By default the script reads /etc/redis/${REDISPORT}.conf, i.e. /etc/redis/6379.conf, to handle multi-instance setups, so rename the configuration file accordingly:
mv /etc/redis/redis.conf /etc/redis/6379.conf
Finally register system service:
# Register
chkconfig --add redis
# Set auto-start
chkconfig redis on
Now it can be managed via the service command:
service redis start
III. Node Connects to Redis
There's a ready-made third-party module, node_redis:
npm install redis --save
Try connecting:
const redis = require('redis');

const PORT = 6379;
const HOST = '127.0.0.1';
const PWD = 'mypassword';
const OPTS = { auth_pass: PWD };

// connect to redis
let client = redis.createClient(PORT, HOST, OPTS);
client
  .on('error', (err) => console.log('Error ' + err))
  .on('ready', () => console.log('redis connected'));
Once connected you can operate freely; the API mirrors the redis commands:
// Write
client.set(key, val, callback);
// Read
client.get(key, (error, val) => {});
// Set expiration
client.expire(key, seconds);
// Check remaining time to live
client.ttl(key, (error, ttl) => {
  if (ttl > 0) console.log('alive');
  else console.log('died');
});
Special note: all callbacks follow the classic Node style, with err as the first parameter, not data.
Build a simple cache layer, structured as follows:
cache
- queue
- clearQueue()
- expire()
- ttl()
+ set()
+ get()
+ checkFresh()
Cooperate with access fetching and scheduled fetching:
fetch
- onerror(error) => {
    emitter.emit('error', error);
  };
- onsuccess(data) => {
    data && cache.set(url, data);
  };
- fetchNow()
+ fetch() => {
    if (noCache) {
      cache.checkFresh(url, (fresh) => {
        if (!fresh) console.log('schedule force fetch now'), fetchNow();
        else oncancel('cache is still fresh now');
      });
    }
    else {
      cache.get(url, (data) => {
        if (data) console.log('fetch from cache'), onsuccess(data);
        else fetchNow();
      });
    }
  }
Request-time fetching goes through the cache: serve directly from cache, and fetch only on a miss. Scheduled fetching deliberately bypasses the cached value, but first checks expiration: if the data is still fresh, the fetch task is cancelled; if not, it fetches live and, on success, records the result through the cache layer.
P.S. Scheduled fetching checks expiration to avoid unnecessary repeated fetching; for example, if the service crashes and restarts, the redis data is unaffected and still fresh, so there's no need to fetch again.
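The scheduled side can be sketched as follows, with the freshness check pulled out as a pure function so the decision is easy to see; names like isFresh and runScheduledFetch are mine, and `cache.ttl` / `fetchNow` are hypothetical stand-ins for the real cache layer and fetcher:

```javascript
// Decide whether a scheduled fetch is needed: skip when the cached entry's
// remaining TTL is still positive (e.g. right after a crash-restart, redis
// data survives and is still fresh).
function isFresh(ttlSeconds) {
  return ttlSeconds > 0;
}

// Scheduled force-fetch: bypass the cached value, but honor freshness.
function runScheduledFetch(url, cache, fetchNow) {
  cache.ttl(url, (err, ttl) => {
    if (err || !isFresh(ttl)) fetchNow(url); // stale or unknown: fetch live
    // else: cancel, cache is still fresh
  });
}

// In the real service this would run on a timer, e.g. every 2 hours:
// setInterval(() => urls.forEach((u) => runScheduledFetch(u, cache, fetchNow)),
//             2 * 3600 * 1000);
```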
IV. Summary
The speed improvement is very obvious: domestic resources that previously took 20s to load now take 5-6s, and foreign resources are faster by about 1-2s. Compared to the old PHP service, 5-6s is still quite slow, but the next optimization items aren't this simple and crude:
todo
1. API splitting: some endpoints return 128K of text; consider pagination
2. Redis data structure optimization: currently each url key maps to one very large JSON string; there should be a more sensible layout
3. Long-lived connections, to reduce round-trip cost