Tuesday, February 28, 2017

Cassandra Walkthrough

INTRODUCTION

Today I am going to write about the Cassandra database system. The first question that comes to mind is why on earth we need this database when there are already other, more popular databases. Those serve well as long as a relational database holds a modest amount of data. But as a company grows, so does its database, and conventional relational databases eventually struggle to hold data at that scale. We then have to think about alternatives that overcome this problem: NoSQL databases, which are designed to store and access huge amounts of data efficiently.

WHY CASSANDRA?

Cassandra is a NoSQL database used by large companies to store big data. Its storage structure differs from that of relational databases, and it supports replicating data across the nodes of a cluster. So, basically, Cassandra can store huge amounts of data and still provide fast access to it.

INSTALLATION

The installation of Cassandra is very simple. We need to download Cassandra from here

http://cassandra.apache.org/

in *.tar.gz format. We have to extract it somewhere (e.g. /opt/cassandra), and this becomes the root directory of Cassandra.
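The extraction step can be sketched as a small shell function. This is only an illustration: install_cassandra is a hypothetical helper name, and it assumes the archive contains a single top-level directory (as the Apache tarballs normally do), which is stripped so that bin/ and conf/ end up directly under the target directory.

```shell
# Hypothetical helper: unpack a Cassandra tarball into a root directory.
# Usage: install_cassandra <tarball> [destination, default /opt/cassandra]
install_cassandra() {
  tarball=$1
  dest=${2:-/opt/cassandra}

  # Create the target directory if it does not exist yet.
  mkdir -p "$dest" || return 1

  # Assumption: the archive has one top-level directory.
  # --strip-components=1 drops it, so bin/ and conf/ land directly in $dest.
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

Writing to /opt usually requires root privileges, so the function would typically be run from a root shell, or pointed at a directory you own.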

To run the Cassandra commands directly, we have to add /opt/cassandra/bin to the PATH. Just add it to the PATH entry in the /etc/environment file and restart the system.
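For trying things out without a restart, the same directory can be added to PATH for the current shell session only. A minimal sketch, assuming Cassandra was extracted to /opt/cassandra as above (CASSANDRA_HOME here is just a convenience variable for this snippet):

```shell
# Assumption: Cassandra was extracted to /opt/cassandra.
CASSANDRA_HOME=/opt/cassandra

# Append the bin directory to PATH, unless it is already there.
case ":$PATH:" in
  *":$CASSANDRA_HOME/bin:"*) ;;                 # already present, do nothing
  *) PATH="$PATH:$CASSANDRA_HOME/bin" ;;
esac
export CASSANDRA_HOME PATH
```

This change is lost when the shell exits; putting the same lines in a shell startup file such as ~/.profile would make it persistent for that user.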


STARTING THE SERVER

The cassandra/bin directory contains all the executables needed. To start the server:

cassandra -f  (starts the server in the foreground, which makes it easy to stop: just press CTRL+C)
cassandra  (starts the server in the background; to stop it, we have to kill the Cassandra process)

pgrep -f CassandraDaemon  (gets the pid of Cassandra)
kill <pid>  (or, in one step: pkill -f CassandraDaemon)

After the server is started, we can connect to it with a client. Cassandra comes with a very handy client tool called cqlsh, which connects to the Cassandra server and lets us execute queries to carry out database operations.

nodetool status
(checks whether Cassandra is running)

Connection and running commands:
cqlsh localhost
cqlsh localhost -u cassandra -p cassandra 
describe keyspaces
use [KEYSPACE]
describe tables
exit  
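The same statements can also be run without opening an interactive session, using the -e option of cqlsh. The wrapper below is only a sketch: run_cql, CQLSH and CQLSH_HOST are hypothetical names introduced here for illustration, not part of Cassandra.

```shell
# Hypothetical helper: run one CQL statement non-interactively.
#   CQLSH       path to the cqlsh executable (default: cqlsh from PATH)
#   CQLSH_HOST  host to connect to (default: localhost)
run_cql() {
  # cqlsh -e executes the given statement and then exits.
  "${CQLSH:-cqlsh}" "${CQLSH_HOST:-localhost}" -e "$1"
}
```

For example, run_cql "DESCRIBE KEYSPACES" would list the keyspaces of a local server, which is handy in scripts.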



NETWORK CONNECTION

By default, Cassandra accepts connections only from the local machine. That means we have to make some changes in the conf/cassandra.yaml file to allow connections from the network.
Note that, for connections from the network to work, port 9042 must not be blocked by a firewall.

To carry out the changes, we open the file:

conf/cassandra.yaml

and find the listen_address line, and replace its default value with the IP address of the server, e.g.

listen_address: 10.8.102.62

Next, we change rpc_address as well:

rpc_address: 0.0.0.0

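and then we set broadcast_rpc_address. This is the address the node advertises to clients, so it must be one they can reach (normally the server's own IP, not the subnet's broadcast address), and it has to be set whenever rpc_address is 0.0.0.0:

broadcast_rpc_address: 10.8.102.62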

If rpc_address is set to a fixed address instead, we can leave broadcast_rpc_address blank or commented out.

Lastly, we add the server's IP address to the seed provider.

Find the seed_provider line and go to its parameters, where one seed is already defined, and add the IP address of the Cassandra server:

- seeds: "127.0.0.1,10.8.102.62"

We save this configuration and restart the server. Now we can connect to the server from another computer on the network:


cqlsh 10.8.102.62
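The configuration edits from this section can also be scripted instead of made by hand. The function below is a rough sketch, not an official tool: configure_network is a hypothetical name, it assumes the four settings appear in cassandra.yaml the way they do in the stock file (listen_address and rpc_address at the start of a line, broadcast_rpc_address possibly commented out, a "- seeds:" entry under seed_provider), and it assumes that the server's own IP is the address clients should use for broadcast_rpc_address.

```shell
# Hypothetical helper: apply the network settings to a cassandra.yaml file.
# Usage: configure_network <path/to/cassandra.yaml> <server-ip>
configure_network() {
  conf_file=$1
  server_ip=$2

  # -i.bak edits the file in place and keeps the original as <file>.bak.
  sed -i.bak \
    -e "s|^listen_address:.*|listen_address: $server_ip|" \
    -e "s|^rpc_address:.*|rpc_address: 0.0.0.0|" \
    -e "s|^#* *broadcast_rpc_address:.*|broadcast_rpc_address: $server_ip|" \
    -e "s|^\( *- seeds:\).*|\1 \"127.0.0.1,$server_ip\"|" \
    "$conf_file"
}
```

For example, configure_network /opt/cassandra/conf/cassandra.yaml 10.8.102.62 would apply the values used in this post. It is worth comparing the result against the .bak copy before restarting the server, since a file that has been heavily customized may not match the patterns above.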