## Overview
Elasticsearch is a lot of things to a lot of people; the simplest definition might be an open source search engine that uses Apache Lucene as its engine.
Lucene uses a type of index called an inverted index. Instead of mapping each page to the words it contains (page -> words), it maps each word to the pages that contain it (word -> pages). This is similar to the index we usually find at the back of a book.
Notable Elasticsearch users include StumbleUpon, Quora, Foursquare, Etsy, SoundCloud, GitHub, Stack Exchange and Netflix.
To see how other people define Elasticsearch, you might want to watch Elastic's video titled “How would you describe Elasticsearch?”.
In this tutorial we’ll learn how to install Elasticsearch on a single node, how to manage it, and how to use its basic features.
## Installing Prerequisites
Elasticsearch needs a Java Virtual Machine (JVM) to run. In this tutorial we will use Oracle JDK 8 instead of OpenJDK, and we will install it from the WebUpd8 team PPA repository.
Add the WebUpd8 team PPA repository:

```
$ sudo add-apt-repository ppa:webupd8team/java
...
Press [ENTER] to continue or ctrl-c to cancel adding it
...
OK
```

Press Enter when prompted to continue adding the repository. The output above is truncated to show only the most important part.
Let apt-get download and read the metadata of the repository we just added:

```
$ sudo apt-get update
```
Install JDK 8:

```
$ sudo apt-get -y install oracle-java8-installer
```

The ```-y``` option automatically answers yes when apt-get asks to install the listed packages, including dependencies. If you want to review which packages will be installed first, remove the ```-y``` option.
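During the installation you will be asked to accept the Oracle Binary Code License Terms. On the package configuration screen choose **OK**, then choose **Yes** to accept the license terms.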
After installing Java 8, check the current Java version with the command below:

```
$ java -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
```

This confirms that JDK 8 is installed.
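If you script this setup, you may want to check the Java version before installing Elasticsearch. Below is a minimal sketch that assumes ```java -version``` prints a line like ```java version "1.8.0_66"``` as shown above; the function names are hypothetical helpers, not part of any package.

```shell
# Print the quoted version string from `java -version` output read on stdin,
# e.g. 'java version "1.8.0_66"' -> 1.8.0_66
java_version_from_output() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Print the major version: "1.8.0_66" -> 8 (legacy 1.x scheme), "11.0.2" -> 11.
java_major_version() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;   # drop the legacy "1." prefix
  esac
  echo "${v%%[._]*}"    # keep everything before the first "." or "_"
}

# Succeed only if the installed java reports major version 8 or newer.
require_java8() {
  # java -version writes to stderr, so redirect it into the pipe
  major=$(java_major_version "$(java -version 2>&1 | java_version_from_output)")
  [ -n "$major" ] && [ "$major" -ge 8 ]
}

# Usage: require_java8 || echo "Java 8 or newer is required" >&2
```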
## Installing Elasticsearch
Now let’s install Elasticsearch. The first step is to add the elastic.co package signing key:

```
$ wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
```
Next, add the Elasticsearch repository from elastic.co. Elastic provides a separate repository for each release series: 1.1.x versions come from the 1.1 repository, 1.7.x versions from the 1.7 repository, and so on. This separation prevents accidental upgrades to a new series.
The command below adds the 1.7 repository. To use Elasticsearch 1.6 instead, for example, replace both occurrences of 1.7 with 1.6.

```
$ echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list
```
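Because the repository line follows a fixed pattern, a small helper can derive it from a full version number. This is a sketch: ```es_repo_line``` is a hypothetical name, and it assumes the URL pattern above holds for the series you choose.

```shell
# Build the APT source line for the release series of a given version,
# e.g. 1.7.3 -> the 1.7 repository.
es_repo_line() {
  series=$(echo "$1" | cut -d. -f1,2)   # keep "MAJOR.MINOR"
  echo "deb http://packages.elastic.co/elasticsearch/${series}/debian stable main"
}

# Usage (plain tee overwrites the file, so re-running it adds no duplicate line):
#   es_repo_line 1.7.3 | sudo tee /etc/apt/sources.list.d/elasticsearch-1.7.list
```

The original command uses ```tee -a```, which appends; running it twice leaves the same line in the file twice.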
Now refresh the repository metadata so that it includes the Elasticsearch repository, then install Elasticsearch:

```
$ sudo apt-get update
$ sudo apt-get -y install elasticsearch
```
Let’s check the Elasticsearch service status:

```
$ sudo service elasticsearch status
* elasticsearch is not running
```
Elasticsearch is not started automatically after installation. Start it with the command below:

```
$ sudo service elasticsearch start
* Starting Elasticsearch Server
```
After starting Elasticsearch, the status should change to running:

```
$ sudo service elasticsearch status
* elasticsearch is running
```
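The service status only tells us that the process is running; the HTTP API on port 9200 may need a few more seconds before it answers. A minimal polling sketch, assuming ```curl``` is installed; ```wait_for_http``` is a hypothetical helper name.

```shell
# Poll URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: non-zero exit on HTTP errors, --max-time: cap each try
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: wait_for_http http://localhost:9200/ 30 1 && echo "Elasticsearch is up"
```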
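To start Elasticsearch automatically at boot, register its init script with ```update-rc.d```. The arguments ```95 10``` are the start and stop sequence numbers:

```
$ sudo update-rc.d elasticsearch defaults 95 10
```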
Let’s verify this setting by rebooting the machine. Before rebooting, make sure the server is not running other services that users currently depend on.

```
$ sudo reboot
```
After the reboot, check the Elasticsearch service once again:

```
$ sudo service elasticsearch status
* elasticsearch is running
```

Elasticsearch now starts automatically at boot.
## Alternative Ways to Install Elasticsearch
There are alternative ways to install Elasticsearch. The first is to download the ```.deb``` package directly and install it with the ```dpkg``` command. With this method you don't need to add the Elasticsearch repository.
Download the Elasticsearch Debian package:

```
$ wget -c https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.deb
```

Download the file that contains the SHA1 hash of the package:

```
$ wget -c https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.deb.sha1.txt
```
Verify the downloaded ```.deb``` package with the ```sha1sum``` command:

```
$ sha1sum -c elasticsearch-1.7.3.deb.sha1.txt
elasticsearch-1.7.3.deb: OK
```
The output should be **OK**. If you get output like the one below instead, re-download the Elasticsearch ```.deb``` package:

```
elasticsearch-1.7.3.deb: FAILED
sha1sum: WARNING: 1 computed checksum did NOT match
```
Now install the package with the ```dpkg``` command:

```
$ sudo dpkg -i elasticsearch-1.7.3.deb
```
You can then follow the steps above to start Elasticsearch, check its status, and make sure it starts automatically at boot.
### Installing Elasticsearch from a Tarball Archive
The last method is to download the Elasticsearch binaries as a ```.zip``` or ```.tar.gz``` archive. In this tutorial we'll download the ```.tar.gz``` file. This method is useful when you use Elasticsearch while developing applications and want to run it on demand rather than as a service.
Download the archive with ```wget```:

```
$ wget -c https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.tar.gz
```

Download the file that contains the SHA1 hash of the archive:

```
$ wget -c https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.tar.gz.sha1.txt
```
Check the SHA1 hash of the archive:

```
$ sha1sum -c elasticsearch-1.7.3.tar.gz.sha1.txt
elasticsearch-1.7.3.tar.gz: OK
```

The output should be **OK**. If you get anything else, re-download the Elasticsearch archive.
Now extract the archive:

```
$ tar xzf elasticsearch-1.7.3.tar.gz
```

Rename the extracted directory to ```elasticsearch``` and move it to the ```/opt``` directory:

```
$ mv elasticsearch-1.7.3 elasticsearch
$ sudo mv elasticsearch /opt/
```
To run Elasticsearch in the foreground, run:

```
$ cd /opt/elasticsearch
$ bin/elasticsearch
```
To run Elasticsearch as a daemon instead, add the ```-d``` option:

```
$ cd /opt/elasticsearch
$ bin/elasticsearch -d
```
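When Elasticsearch runs as a daemon, you need its process ID to stop it. The startup script accepts a ```-p``` option that writes the PID to a file (```bin/elasticsearch -d -p elasticsearch.pid```). The helper below is a hypothetical sketch that stops the process recorded in such a file.

```shell
# Stop the process whose PID is stored in PIDFILE, wait up to ~10 seconds
# for it to exit, then remove the PID file.
stop_from_pidfile() {
  pidfile=$1
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  kill "$pid" 2>/dev/null || return 1      # SIGTERM asks for a clean shutdown
  i=0
  while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 10 ]; do
    sleep 1                                # still running: wait a bit longer
    i=$((i + 1))
  done
  kill -0 "$pid" 2>/dev/null && return 1   # did not exit in time
  rm -f "$pidfile"
}

# Usage:
#   cd /opt/elasticsearch && bin/elasticsearch -d -p elasticsearch.pid
#   stop_from_pidfile /opt/elasticsearch/elasticsearch.pid
```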
## Configuring Elasticsearch
When Elasticsearch is installed from the package, its configuration is located in ```/etc/elasticsearch```. This directory contains two files: ```elasticsearch.yml```, which holds Elasticsearch settings such as the node name and cluster name, and ```logging.yml```, which configures logging. (With the tarball installation, the same files are in ```/opt/elasticsearch/config```.)
Another configuration file is ```/etc/default/elasticsearch```, which contains the environment variables and Java options used by Elasticsearch.
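Settings in ```elasticsearch.yml``` are ```key: value``` lines, for example ```cluster.name``` and ```node.name```. If you provision servers with scripts, a helper like the hypothetical ```set_yaml_key``` below can set a top-level key without duplicating it. It is a sketch that assumes flat ```key: value``` lines, GNU ```sed -i```, and values without ```|``` characters; restart Elasticsearch afterwards so the change takes effect.

```shell
# Set KEY to VALUE in a flat "key: value" YAML file: replace an existing
# uncommented line, otherwise append a new one. Commented lines are left alone.
set_yaml_key() {
  file=$1
  key=$2
  value=$3
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')   # escape dots for the regex
  if grep -q "^${key_re}:" "$file"; then
    sed -i "s|^${key_re}:.*|${key}: ${value}|" "$file"
  else
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root), then restart the service:
#   set_yaml_key /etc/elasticsearch/elasticsearch.yml cluster.name my-cluster
#   set_yaml_key /etc/elasticsearch/elasticsearch.yml node.name node-1
#   service elasticsearch restart
```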
## Managing Elasticsearch
You can manage Elasticsearch with the ```service``` command. To see the available actions, run:

```
$ sudo service elasticsearch
* Usage: /etc/init.d/elasticsearch {start|stop|restart|force-reload|status}
```

You can use any of the actions listed inside the curly braces, for example ```sudo service elasticsearch restart```.
To check which ports Elasticsearch is listening on, use the ```netstat``` command:

```
$ sudo netstat -naptu | grep LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 906/sshd
tcp6 0 0 :::9300 :::* LISTEN 2549/java
tcp6 0 0 :::22 :::* LISTEN 906/sshd
tcp6 0 0 :::9200 :::* LISTEN 2549/java
```
Port 9300 is used by clients that connect to Elasticsearch with the native Elasticsearch protocol; Elasticsearch nodes also use this port to talk to each other when forming a cluster. You can create an Elasticsearch cluster when you have more than one Elasticsearch server.
Port 9200 serves the Elasticsearch HTTP RESTful API. Data is exchanged between your application and Elasticsearch in [JSON](http://www.json.org/) format.
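To check from a script that both ports are open, you can filter the same ```netstat``` output. A sketch that assumes the column layout shown above (local address in column 4, state in column 6, PID/program in column 7); ```es_listen_ports``` is a hypothetical name.

```shell
# Read `netstat -naptu` output on stdin and print the ports on which
# a java process is listening, one per line, sorted numerically.
es_listen_ports() {
  awk '$6 == "LISTEN" && $7 ~ /\/java$/ {
    n = split($4, parts, ":")   # the port follows the last ":" of the local address
    print parts[n]
  }' | sort -n | uniq
}

# Usage: sudo netstat -naptu | es_listen_ports
```

With the output above this prints 9200 and 9300.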
## Using Elasticsearch
In this tutorial we'll only use ```curl``` to send data to and retrieve data from Elasticsearch. Let's start with the simplest request:
```
$ curl -XGET 'http://localhost:9200/'
{
  "status" : 200,
  "name" : "Kurt Wagner",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.3",
    "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
    "build_timestamp" : "2015-10-15T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
```
The output shows the node name (```name```), the cluster name (```cluster_name```), and details about the Elasticsearch version we are running.
Now, let's check the cluster health. The cluster health endpoint is ```http://localhost:9200/_cluster/health```.

```
$ curl -XGET 'http://localhost:9200/_cluster/health'
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}
```
The output above is a single line and hard to read. Append ```?pretty``` to any request to get indented, more readable output. Let's check the cluster health again:

```
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
```
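In scripts you usually need only the ```status``` field. Below is a sketch that extracts it with ```sed``` instead of a real JSON parser, assuming the response contains a single string-valued ```"status"``` field as the health output does; the helper names are hypothetical. The health API can also wait for a given status itself, e.g. ```curl 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'```.

```shell
# Print the value of the first "status": "<word>" field in JSON read on stdin
# (green, yellow or red). Works for compact and ?pretty output.
json_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeed when the cluster is usable: green, or yellow (some replicas unassigned).
health_is_usable() {
  case $(json_status) in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: curl -s localhost:9200/_cluster/health | health_is_usable && echo "cluster OK"
```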
Another piece of information we want is the node info. Run the command below to get information about the nodes:

```
$ curl 'http://localhost:9200/_nodes?pretty'
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "iP5vBg_GSO-8iZIca46aHg" : {
      "name" : "Kurt Wagner",
      "transport_address" : "inet[/10.15.0.6:9300]",
      "host" : "labs",
      "ip" : "127.0.1.1",
      "version" : "1.7.3",
      "build" : "05d4530",
      "http_address" : "inet[/10.15.0.6:9200]",
      "settings" : {
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "path" : {
          "conf" : "/etc/elasticsearch",
          "data" : "/var/lib/elasticsearch",
          "logs" : "/var/log/elasticsearch",
          "work" : "/tmp/elasticsearch",
          "home" : "/usr/share/elasticsearch"
        },
…
```

The output above is truncated because it is quite long.
### Playing With Indices
Enough of the basics; let's do more with Elasticsearch. I won't go into detail here, only give a brief introduction to how Elasticsearch manages its data.
An index is the main grouping of our data, and inside an index we create types. In this tutorial, for example, we'll create an index named ```movies``` that holds documents of type ```movie```.
Let's check the current indices on our Elasticsearch server:

```
$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
```
There is no index yet. Let's create a new index called ```movies```:

```
$ curl -XPUT 'localhost:9200/movies?pretty'
{
  "acknowledged" : true
}
```
Let's check the indices again:

```
$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open movies 5 1 0 0 575b 575b
```
Now we have one index called ```movies``` that contains zero documents (```docs.count```).
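The ```yellow``` health is expected here. Elasticsearch 1.x creates an index with 5 primary shards and 1 replica by default (the ```pri``` and ```rep``` columns), and a replica is never allocated on the same node as its primary, so on a single node the replicas stay unassigned. One way to get a green index on a single-node setup is to pass explicit settings when creating it. A sketch; ```index_settings_json``` is a hypothetical helper.

```shell
# Print an index-creation body with the given shard and replica counts.
index_settings_json() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Usage: one primary shard and no replicas, which a single node can fully allocate
#   curl -XPUT 'localhost:9200/movies?pretty' -d "$(index_settings_json 1 0)"
```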
Now let's delete the index with the command below:

```
$ curl -XDELETE 'localhost:9200/movies?pretty'
{
  "acknowledged" : true
}
```
If we check the indices again, the ```movies``` index is gone:

```
$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
```
### Input Data
You don't have to create an index before adding data to Elasticsearch. The command below adds a document directly and, in doing so, creates the ```movies``` index and the ```movie``` type:

```
$ curl -XPOST 'localhost:9200/movies/movie/1' -d '
{
  "title": "Wings",
  "imdbId": "tt0018578",
  "releaseDate": "1927-05-19T05:00:00.000Z",
  "releaseCountry": "USA",
  "releaseYear": 1927,
  "releaseMonth": 4,
  "releaseDay": 19
}'
```
Checking the indices again shows the ```movies``` index with one document:

```
$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open movies 5 1 1 0 4.2kb 4.2kb
```
Let's add more data. Each movie gets its own ID (2, 3 and 4); reusing an ID would overwrite the existing document instead of adding a new one:

```
$ curl -XPOST 'localhost:9200/movies/movie/2?pretty' -d '{
  "title": "The Broadway Melody",
  "imdbId": "tt0019729",
  "releaseDate": "1929-02-01T05:00:00.000Z",
  "releaseCountry": "USA",
  "releaseYear": 1929,
  "releaseMonth": 1,
  "releaseDay": 1
}'
$ curl -XPOST 'localhost:9200/movies/movie/3?pretty' -d '{
  "title": "All Quiet on the Western Front",
  "imdbId": "tt0020629",
  "releaseDate": "1930-04-21T04:00:00.000Z",
  "releaseCountry": "USA",
  "releaseYear": 1930,
  "releaseMonth": 3,
  "releaseDay": 21
}'
$ curl -XPOST 'localhost:9200/movies/movie/4?pretty' -d '{
  "title": "Cimarron",
  "imdbId": "tt0021746",
  "releaseDate": "1931-01-26T05:00:00.000Z",
  "releaseCountry": "USA",
  "releaseYear": 1931,
  "releaseMonth": 0,
  "releaseDay": 26
}'
```
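Sending one request per document is fine for a handful of movies. For larger imports, Elasticsearch has a ```_bulk``` endpoint that takes newline-delimited pairs of lines: an action line naming the index, type and ID, followed by the document on a single line. A sketch of a hypothetical helper that writes such pairs:

```shell
# Print one _bulk "index" action line followed by the document source line.
# Usage: bulk_index_pair INDEX TYPE ID 'SINGLE-LINE-JSON'
bulk_index_pair() {
  printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "$1" "$2" "$3"
  printf '%s\n' "$4"   # the document must fit on one line; every line ends with \n
}

# Usage: build a bulk file, then send it with --data-binary so the newlines
# are preserved (curl -d would strip them).
#   {
#     bulk_index_pair movies movie 2 '{"title":"The Broadway Melody","releaseYear":1929}'
#     bulk_index_pair movies movie 3 '{"title":"All Quiet on the Western Front","releaseYear":1930}'
#   } > movies.bulk
#   curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @movies.bulk
```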
Rechecking the index, we now have 4 documents inside the ```movies``` index:

```
$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open movies 5 1 4 0 15.8kb 15.8kb
```
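To read the document count from a script, you can pick the ```docs.count``` column for one index out of the same output. A sketch assuming the column layout shown above; ```docs_count``` is a hypothetical name.

```shell
# Read `_cat/indices?v` output on stdin and print docs.count for INDEX
# (columns: health status index pri rep docs.count ...).
docs_count() {
  awk -v idx="$1" '$3 == idx { print $6 }'
}

# Usage: curl -s 'localhost:9200/_cat/indices?v' | docs_count movies
```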
### Searching Data
Search is the main reason we use Elasticsearch. Let's query for any document that contains 1929:

```
$ curl 'localhost:9200/movies/movie/_search?q=1929'
{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.10848885,"hits":[{"_index":"movies","_type":"movie","_id":"2","_score":0.10848885,"_source":
{
  "title": "The Broadway Melody",
  "imdbId": "tt0019729",
  "releaseDate": "1929-02-01T05:00:00.000Z",
  "releaseCountry": "USA",
  "releaseYear": 1929,
  "releaseMonth": 1,
  "releaseDay": 1
}}]}}
```
As before, you can add ```&pretty``` to the URL to get readable output:

```
$ curl 'localhost:9200/movies/movie/_search?q=1929&pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.10848885,
    "hits" : [ {
      "_index" : "movies",
      "_type" : "movie",
      "_id" : "2",
      "_score" : 0.10848885,
      "_source" : {
        "title": "The Broadway Melody",
        "imdbId": "tt0019729",
        "releaseDate": "1929-02-01T05:00:00.000Z",
        "releaseCountry": "USA",
        "releaseYear": 1929,
        "releaseMonth": 1,
        "releaseDay": 1
      }
    } ]
  }
}
```
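In scripts, the number of matches (```hits.total```) is often all you need. Below is a sketch that extracts it with ```tr``` and ```sed``` rather than a JSON parser; ```hits_total``` is a hypothetical name. The usage line also restricts the query to a single field with the ```field:value``` query-string syntax (```q=releaseYear:1929```).

```shell
# Print hits.total from a search response read on stdin (compact or ?pretty).
hits_total() {
  # Remove spaces and newlines so both output styles look the same,
  # then take the number after "hits":{"total":
  tr -d ' \n' | sed -n 's/.*"hits":{"total":\([0-9]*\).*/\1/p'
}

# Usage: curl -s 'localhost:9200/movies/movie/_search?q=releaseYear:1929' | hits_total
```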
If you search an index without a query, Elasticsearch returns all of its documents (by default at most 10 hits per request):

```
$ curl 'http://localhost:9200/movies/movie/_search?pretty'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "movies",
      "_type" : "movie",
      "_id" : "4",
      "_score" : 1.0,
      "_source" : {
        "title": "Cimarron",
        "imdbId": "tt0021746",
        "releaseDate": "1931-01-26T05:00:00.000Z",
        "releaseCountry": "USA",
        "releaseYear": 1931,
        "releaseMonth": 0,
        "releaseDay": 26
      }
    }, {
      "_index" : "movies",
      "_type" : "movie",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "title": "Wings",
        "imdbId": "tt0018578",
        "releaseDate": "1927-05-19T05:00:00.000Z",
        "releaseCountry": "USA",
        "releaseYear": 1927,
        "releaseMonth": 4,
        "releaseDay": 19
      }
    }, {
      "_index" : "movies",
      "_type" : "movie",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "title": "The Broadway Melody",
        "imdbId": "tt0019729",
        "releaseDate": "1929-02-01T05:00:00.000Z",
        "releaseCountry": "USA",
        "releaseYear": 1929,
        "releaseMonth": 1,
        "releaseDay": 1
      }
    }, {
      "_index" : "movies",
      "_type" : "movie",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "title": "All Quiet on the Western Front",
        "imdbId": "tt0020629",
        "releaseDate": "1930-04-21T04:00:00.000Z",
        "releaseCountry": "USA",
        "releaseYear": 1930,
        "releaseMonth": 3,
        "releaseDay": 21
      }
    } ]
  }
}
```
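Besides the ```q``` URL parameter, a search can take a JSON request body. A minimal sketch using the ```match``` full-text query, with a hypothetical ```match_query``` helper that builds the body; it does not escape quotes or backslashes, so only use it with plain words.

```shell
# Print a request body for a full-text "match" query on one field.
# FIELD and TEXT are inserted verbatim (no JSON escaping).
match_query() {
  printf '{"query":{"match":{"%s":"%s"}}}\n' "$1" "$2"
}

# Usage:
#   curl -XGET 'localhost:9200/movies/movie/_search?pretty' -d "$(match_query title broadway)"
```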
### Delete Data
To delete a document, send a ```DELETE``` request (```curl -XDELETE```) to the document's URL. For example, to delete movie number 4:

```
$ curl -XDELETE 'http://localhost:9200/movies/movie/4?pretty'
{
  "found" : true,
  "_index" : "movies",
  "_type" : "movie",
  "_id" : "4",
  "_version" : 2
}
```
Checking the index again, it now holds 3 documents:

```
$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open movies 5 1 3 0 15.8kb 15.8kb
```
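The ```found``` field in the delete response tells you whether the document existed before the request; deleting an ID that is not in the index reports ```false```. A sketch of a hypothetical helper for scripts:

```shell
# Succeed if a delete response read on stdin reports "found": true.
doc_was_found() {
  # Remove spaces and newlines so compact and ?pretty output look the same
  tr -d ' \n' | grep -q '"found":true'
}

# Usage:
#   curl -s -XDELETE 'localhost:9200/movies/movie/4' | doc_was_found \
#     && echo "deleted" || echo "no such document"
```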
## Summary
In this tutorial we learned how to install Elasticsearch on Ubuntu 14.04, how to do basic configuration and management of Elasticsearch, and how to use its basic features.