Syed Jahanzaib – Personal Blog to Share Knowledge !

August 13, 2012

Youtube caching with Squid + Nginx

Filed under: Linux Related — Syed Jahanzaib / Pinochio~:) @ 11:52 AM

An updated version of Squid that caches YouTube and many other contents is covered in the following post:

https://aacable.wordpress.com/2012/01/19/youtube-caching-with-squid-2-7-using-storeurl-pl/

Advantages of Youtube Caching   !!!

In most parts of the world bandwidth is very expensive, so in some scenarios it is very useful to cache YouTube videos (or any other flash videos). If one user downloads a video / flash file, why can't the same user, or another user, get that file from the CACHE instead of sucking the internet pipe for the same content again and again?
People on the same LAN often watch similar videos. If I put some YouTube video link on FACEBOOK, TWITTER or a similar site, all my friends will watch that video, and that particular video gets viewed many times in a few hours. Since videos are usually shared over Facebook and other social networking sites, the chances of multiple hits per popular video from my LAN users / friends are high. [syed.jahanzaib]

This is the reason why I wrote this article.

Disadvantages of Youtube Caching   !!!

The chance that another user will watch the same video is really slim. If I search for something specific on YouTube, I get hundreds of search results for the same video. What is the chance that another user will search for the same thing and click on the same link / result? YouTube hosts more than 10 million videos, which is too much to cache anyway. You need a lot of space to cache videos, and accordingly you will need very fast hardware with tons of SPACE to handle a cache giant of this kind. Anyhow, try it.

AFAIK you are not supposed to cache YouTube videos; YouTube doesn't like it. I don't understand why. Probably because their ranking mechanism relies on views, and possibly completed views, which wouldn't be measurable if the content was served from a local cache.

After struggling unsuccessfully with the storeurl.pl method, I was searching for an alternate way to cache YouTube videos. Finally I found a Ruby-based method that uses Nginx to cache YT. Using this method I was able to cache YouTube videos almost perfectly (not 100%, but it works fine in most cases with some modification; I am sure there will be some improvements in the near future).

Updated: 24th August, 2012

Thanks to Mr. Eliezer Croitoru, Mr. Christian Loth & others for their kind guidance.

The following components were used in this guide.

Proxy Server Configuration:
Ubuntu Desktop 10.4
Nginx version: nginx/0.7.65
Squid Cache: Version 2.7.STABLE7

Client Configuration for testing videos:
Windows XP with Internet Explorer 6
Windows 7 with Internet Explorer 8

Let's start with the Proxy Server Configuration:

1) Update Ubuntu

First install Ubuntu. After installation, configure its networking components, then update it with the following commands:
apt-get update
apt-get upgrade

2) Install SSH Server [Optional]

Now install the SSH server so that you can manage your server remotely using PuTTY or any other SSH tool.

apt-get install openssh-server
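
For example, once openssh-server is running you can log in from another machine like this (an illustrative command; replace the user name and IP with your own):

ssh youruser@192.168.0.1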

3) Install Squid Server

Now install the Squid server with the following command:
apt-get install squid
[This will install squid 2.7 by default]

Now edit the squid configuration file with the following command:

nano /etc/squid/squid.conf

Remove all lines and paste the following data

# SQUID 2.7/ Nginx TEST CONFIG FILE
# Email: aacable@hotmail.com
# Web  : https://aacable.wordpress.com
# PORT and Transparent Option
http_port 8080 transparent
server_http11 on
icp_port 0

# Cache is set to 5GB in this example (zaib)
store_dir_select_algorithm round-robin
cache_dir aufs /cache1 5000 16 256
cache_replacement_policy heap LFUDA
memory_replacement_policy heap LFUDA

# If you want to enable date/time in SQUID logs, use the following
emulate_httpd_log on
logformat squid %tl %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt
log_fqdn off

# How many days to keep users' web access logs
# You need to rotate your log files with a cron job. For example:
# 0 0 * * * /usr/local/squid/bin/squid -k rotate
logfile_rotate 14
debug_options ALL,1
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log

#[zaib] I used the DNSMASQ service for fast dns resolving
#so install it by using "apt-get install dnsmasq" first
dns_nameservers 127.0.0.1 221.132.112.8

#ACL Section
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443 563 # https, snews
acl SSL_ports port 873 # rsync
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl Safe_ports port 631 # cups
acl Safe_ports port 873 # rsync
acl Safe_ports port 901 # SWAT
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access allow all
http_reply_access allow all
icp_access allow all

#[zaib] I used UBUNTU so the user is proxy, in FEDORA you may use squid
cache_effective_user proxy
cache_effective_group proxy
cache_mgr aacable@hotmail.com
visible_hostname proxy.aacable.net
unique_hostname aacable@hotmail.com

cache_mem 8 MB
minimum_object_size 0 bytes
maximum_object_size 100 MB
maximum_object_size_in_memory 128 KB

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern (Release|Packages(.gz)*)$ 0 20% 2880
refresh_pattern . 0 50% 4320
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache

# Youtube Cache Section [zaib]
url_rewrite_program /etc/nginx/nginx.rb
url_rewrite_host_header off
acl youtube_videos url_regex -i ^http://[^/]+\.youtube\.com/videoplayback\?
acl range_request req_header Range .
acl begin_param url_regex -i [?&]begin=
acl id_param url_regex -i [?&]id=
acl itag_param url_regex -i [?&]itag=
acl sver3_param url_regex -i [?&]sver=3
cache_peer 127.0.0.1 parent 8081 0 proxy-only no-query connect-timeout=10
cache_peer_access 127.0.0.1 allow youtube_videos id_param itag_param sver3_param !begin_param !range_request
cache_peer_access 127.0.0.1 deny all

Save & Exit.
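
Two quick checks before moving on (these are my own suggestions, not part of the original guide): the config above points dns_nameservers at 127.0.0.1, so dnsmasq must be installed as noted in the comment, and squid can be asked to parse the new config to catch typos:

apt-get install dnsmasq
# parse the configuration and report any syntax errors without starting squid
squid -k parse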

4) Install Nginx

Now install Nginx with:
apt-get install nginx

Now edit its config file with the following command:
nano /etc/nginx/nginx.conf

Remove all lines and paste the following data

# This config file is not written by me, [syed.jahanzaib]
# My Email address is inserted Just for tracking purposes
# For more info, visit http://code.google.com/p/youtube-cache/
# Syed Jahanzaib / aacable [at] hotmail.com
user www-data;
worker_processes 4;
pid /var/run/nginx.pid;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    gzip on;
    gzip_static on;
    gzip_comp_level 6;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_types text/plain text/css text/xml text/javascript application/json application/x-javascript application/xml application/xml+rss;
    gzip_proxied expired no-cache no-store private auth;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;

    # starting youtube section
    server {
        listen 127.0.0.1:8081;

        location / {
            root /usr/local/www/nginx_cache/files;
            #try_files "/id=$arg_id.itag=$arg_itag" @proxy_youtube; # Old one
            #try_files "$uri" "/id=$arg_id.itag=$arg_itag.flv" "/id=$arg_id-range=$arg_range.itag=$arg_itag.flv" @proxy_youtube; #old2
            try_files "/id=$arg_id.itag=$arg_itag.range=$arg_range.algo=$arg_algorithm" @proxy_youtube;
        }

        location @proxy_youtube {
            resolver 221.132.112.8;
            proxy_pass http://$host$request_uri;
            proxy_temp_path "/usr/local/www/nginx_cache/tmp";
            #proxy_store "/usr/local/www/nginx_cache/files/id=$arg_id.itag=$arg_itag"; # Old 1
            proxy_store "/usr/local/www/nginx_cache/files/id=$arg_id.itag=$arg_itag.range=$arg_range.algo=$arg_algorithm";
            proxy_ignore_client_abort off;
            proxy_method GET;
            proxy_set_header X-YouTube-Cache "aacable@hotmail.com";
            proxy_set_header Accept "video/*";
            proxy_set_header User-Agent "YouTube Cacher (nginx)";
            proxy_set_header Accept-Encoding "";
            proxy_set_header Accept-Language "";
            proxy_set_header Accept-Charset "";
            proxy_set_header Cache-Control "";
        }
    }
}

Save & Exit.

Now create directories to hold the cache files

mkdir /usr/local/www
mkdir /usr/local/www/nginx_cache
mkdir /usr/local/www/nginx_cache/tmp
mkdir /usr/local/www/nginx_cache/files
chown www-data /usr/local/www/nginx_cache/files/ -Rf
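
With the config saved and the cache directories created, you can optionally test the nginx configuration for syntax errors before starting it (my addition, not part of the original guide):

# test the configuration file and report errors
nginx -t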

Now create the nginx.rb file

touch /etc/nginx/nginx.rb
chmod 755 /etc/nginx/nginx.rb
nano /etc/nginx/nginx.rb

Paste the following data in this newly created file

#!/usr/bin/env ruby1.8
# This script is not written by me,
# My Email address is inserted Just for tracking purposes
# For more info, visit http://code.google.com/p/youtube-cache/
# Syed Jahanzaib / aacable [at] hotmail.com
# url_rewrite_program <path>/nginx.rb
# url_rewrite_host_header off

require "syslog"
require "base64"

class SquidRequest
  attr_accessor :url, :user
  attr_reader :client_ip, :method

  def method=(s)
    @method = s.downcase
  end

  def client_ip=(s)
    @client_ip = s.split('/').first
  end
end

def read_requests
  # URL <SP> client_ip "/" fqdn <SP> user <SP> method [<SP> kvpairs]<NL>
  STDIN.each_line do |ln|
    r = SquidRequest.new
    r.url, r.client_ip, r.user, r.method, *dummy = ln.rstrip.split(' ')
    (STDOUT << "#{yield r}\n").flush
  end
end

def log(msg)
  Syslog.log(Syslog::LOG_ERR, "%s", msg)
end

def main
  Syslog.open('nginx.rb', Syslog::LOG_PID)
  log("Started")

  read_requests do |r|
    if r.method == 'get' && r.url !~ /[?&]begin=/ && r.url =~ %r{\Ahttp://[^/]+\.youtube\.com/(videoplayback\?.*)\z}
      log("YouTube Video [#{r.url}].")
      "http://127.0.0.1:8081/#{$1}"
    else
      r.url
    end
  end
end

main

Save & Exit.

5) Install RUBY

What is RUBY?
Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write. [syed.jahanzaib]

Now install RUBY with the following command:

apt-get install ruby
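
Since the helper script's shebang points at ruby1.8, it is worth confirming that the interpreter exists, and you can also test the rewriter by hand by feeding it one request line on stdin in the format described in its comment. This is only a sanity-check sketch; the videoplayback URL below is made up for illustration:

# confirm the interpreter referenced by the shebang is available
which ruby1.8

# feed one fake request line to the helper; a videoplayback URL with id/itag
# parameters should come back rewritten to the local nginx listener
echo "http://v1.example.youtube.com/videoplayback?id=abc123&itag=34 192.168.0.10/- - GET" | /etc/nginx/nginx.rb
# expected output: http://127.0.0.1:8081/videoplayback?id=abc123&itag=34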

6) Configure Squid Cache DIR and Permissions

Now create the cache dir and assign proper permissions to the proxy user

mkdir /cache1
chown proxy:proxy /cache1
chmod -R  777 /cache1

Now initialize the squid cache directories with:

squid -z

You should see the following message:

Creating Swap Directories
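
As an extra check (my addition), list the cache directory; squid -z should have created the first-level swap directories inside /cache1:

ls /cache1
# expect numbered sub-directories such as 00 01 02 ...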

7) Finally Start/restart SQUID & Nginx

service squid start
service nginx restart
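
The squid.conf above uses "http_port 8080 transparent", so client HTTP traffic has to be redirected to port 8080 somehow. The rule below is only a sketch of one common way to do it, assuming this box is also the LAN gateway and that eth1 and 192.168.0.0/24 stand in for your real LAN interface and subnet:

# redirect HTTP traffic arriving from the LAN to squid's transparent port
iptables -t nat -A PREROUTING -i eth1 -s 192.168.0.0/24 -p tcp --dport 80 -j REDIRECT --to-ports 8080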

Now, from a test PC, open YouTube and play any video. After it downloads completely, delete the browser cache and play the same video again; this time it will be served from the cache. You can verify this by monitoring your WAN link utilization while playing the cached file.
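
Besides watching the WAN graph, you can also confirm what is happening from the proxy itself (log paths taken from the configs above; these checks are my own suggestion):

# watch video requests flowing through squid
tail -f /var/log/squid/access.log | grep videoplayback
# and requests reaching the local nginx cache listener
tail -f /var/log/nginx/access.log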

Look at the WAN utilization graph below; it was taken while watching a clip that was not yet in the cache.

WAN utilization of Proxy, While watching New Clip (Not in cache)

Now look at the WAN utilization graph below; it was taken while watching the clip that is now in the CACHE.

WAN utilization of Proxy, While watching already cached Clip

Playing Video, loaded from the cache chunk by chunk

It will load the first chunk from the cache; if the user keeps watching the clip, it will load the next chunk at the end of the first one, and will continue to do so.

Video cache files can be found in the following location:
/usr/local/www/nginx_cache/files

e.g:

ls -lh /usr/local/www/nginx_cache/files

The file listing above shows that the clip is in 360p quality and that its length is 5:54.
itag=34 indicates that the video quality is 360p.

Credits: Thanks to Mr. Eliezer Croitoru, Mr. Christian Loth & others for their kind guidance.

Find files that have not been accessed for x days. This is useful for deleting old cache files that have not been accessed in the last x days.

 find  /usr/local/www/nginx_cache/files  -atime  +30  -type f
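
If you want to go a step further and actually remove such stale files automatically, the lines below are a suggested sketch (my own addition; adjust the +30 day threshold to taste):

# delete cache files that have not been accessed for 30 days
find /usr/local/www/nginx_cache/files -atime +30 -type f -delete

# or run the cleanup nightly at 01:00 via cron (crontab -e)
0 1 * * * find /usr/local/www/nginx_cache/files -atime +30 -type f -delete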

Regards,
Syed Jahanzaib