Getting Started

Tip

To install Elasticsearch, follow the instructions on the Elastic web site (see "The Debian package for Elasticsearch").

Elasticsearch 6

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

Install

sudo apt update
sudo apt install elasticsearch
sudo service elasticsearch start

Tips

Tip

See https://www.kbsoftware.co.uk/docs/dev-elasticsearch.html for how to install the Phonetic Analysis plugin.

Query

Match all documents:

{ "match_all": {} }
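The clause above is only the inner query; when it is sent to the _search endpoint it sits under a top-level "query" key. A minimal sketch using only the standard library (the helper name build_search_body and its size default are illustrative assumptions, not part of these notes):

```python
import json


def build_search_body(query=None, size=10):
    """Wrap a query clause in a search request body (hypothetical helper)."""
    if query is None:
        # default to matching every document
        query = {"match_all": {}}
    return {"query": query, "size": size}


body = build_search_body()
print(json.dumps(body))
```

The resulting dictionary could then be passed as the body argument of es.search().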

Sample

Using HTTPie:

http GET http://localhost:9200/job-index/_analyze analyzer=my_analyzer text="Plymouth"

Stats

Using the python API:

from elasticsearch import Elasticsearch
es = Elasticsearch()
stats = es.indices.stats(index='my-index')

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stats)

stats['_all']['primaries']['docs']
# {'count': 6, 'deleted': 0}

DjangoConEU 2015

Based on Lucene.

There is no need to index only what is in the model: you can put as much as you want into a document. It does not have to be in a simple key/value format (it will happily accept lists etc.); it just has to be a valid JSON document.

We must have an _id field, e.g.:

def to_search(self):
    d = {
        '_id': self.pk,
        'creation_date': self.creation_date,
        'body': self.body,
        'score': self.score,
        'comments': [c.to_search() for c in self.comments()],
    }
    # using the DocType from below
    return QuestionDoc(meta={'id': d.pop('_id')}, **d)

Very easy to query many indexes at once.
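As a rough illustration of querying several indexes at once: Elasticsearch accepts a comma-separated list of index names in the request path. The sketch below only builds such a URL; the helper name search_url is an assumption, and the index names are borrowed from the examples elsewhere in these notes.

```python
from urllib.parse import quote


def search_url(indexes, base="http://localhost:9200"):
    """Build a _search URL covering several indexes (hypothetical helper)."""
    if not indexes:
        # no index named: search across everything
        return f"{base}/_search"
    # index names are joined with commas in the URL path
    path = ",".join(quote(name, safe="*") for name in indexes)
    return f"{base}/{path}/_search"


print(search_url(["job-index", "my-index"]))
```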

After loading

To verify that the information has loaded into Elasticsearch:

http://localhost:9200/
http://localhost:9200/_search
http://localhost:9200/_search?q=bean
http://localhost:9200/_search?q=tags:bean
http://localhost:9200/_search?q=awful flavor
  • http://localhost:9200/ will return the version number.

  • Scoring is not relevant when you only search for one word.

  • It used to ignore common words (e.g. "the"), but no longer does.
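A browser will quietly encode the space in the last URL above, but a script has to do it explicitly. A small sketch, assuming a node on localhost:9200 as in the URLs above (the helper name query_url is illustrative):

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:9200"  # assumed local node, as in the URLs above


def query_url(q):
    """Return a URI-search URL with the query string percent-encoded."""
    return f"{BASE}/_search?" + urlencode({"q": q}, quote_via=quote)


for q in ("bean", "tags:bean", "awful flavor"):
    print(query_url(q))
```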

Client:

# this is a very low level api
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.info()
es.search(q='awful flavour')
# note: "filtered" was removed in Elasticsearch 5.0; newer versions use "bool" with a "filter" clause
es.search(body={"query": {"filtered": {
    "query": {"bool": {"should": [
        {"match": {"title": "bean"}},
        {"match": {"body": "bean"}},
    ]}},
    "filter": {"term": {"tags": "beans"}},
}}})
es.indices.get_mapping(index='stack', doc_type='question')

# this is better
from elasticsearch_dsl import Q, Search
s = Search()
# one query type
s = s.query('match', body='bean')
s.to_dict()
# another query type (each call returns a new Search, so reassign)
s = s.filter('term', tags='beans')
s = s.query(
    'bool',
    should=[
        Q('match', title='beans'),
        Q('match', title__ngram='beans'),
        Q('match', title={'query': 'beans', 'fuzziness': 2}),
    ],
    minimum_should_match='30%'
)
# result can use dot notation e.g.
result.comment
# for the id, we use meta
result.meta.id
result.aggregations.per_tag.buckets

# DocType is just like a Django model
# in search.py
# Elasticsearch still uses the dynamic mappings
# (String is the pre-5.0 field type; elasticsearch-dsl 5.0+ uses Text / Keyword)
from elasticsearch_dsl import Date, DocType, String

class Question(DocType):
    creation_date = Date()
    tags = String(index='not_analyzed', multi=True)

Question._doc_type.mapping.to_dict()
# refresh the actual field types from elasticsearch
Question._doc_type.refresh()
Question._doc_type.mapping.to_dict()
Question.get(id=464)

Rely on post_save being more or less reliable, and then reindex everything every now and again:

from django.db.models.signals import post_save

def update_search(instance, **kwargs):
    instance.to_search().save()

post_save.connect(update_search, sender=Answer)
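The occasional full reindex could look something like the sketch below. It assumes each instance has the to_search() method described earlier, returning a document with a save() method; the function name reindex_all is hypothetical.

```python
def reindex_all(instances):
    """Re-save the search document for every instance; return how many were indexed."""
    indexed = 0
    for instance in instances:
        # to_search() builds the document, save() writes it to the index
        instance.to_search().save()
        indexed += 1
    return indexed
```

In a Django project this might be called from a management command or a scheduled job, for example reindex_all(Answer.objects.iterator()).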

You should have 1 server or more than 2. Do not have exactly 2 servers: if they lose contact with each other, each one can decide it is the master. This is called split brain.
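The arithmetic behind this advice can be sketched as follows. Electing a master needs a strict majority of the master-eligible nodes; in Elasticsearch 6 and earlier this is configured with the discovery.zen.minimum_master_nodes setting. The function name quorum is an illustrative assumption.

```python
def quorum(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    needed = quorum(nodes)
    # how many nodes can fail while a majority can still be formed
    tolerated = nodes - needed
    print(f"{nodes} node(s): quorum={needed}, tolerated failures={tolerated}")
```

With two nodes the quorum is 2, so the cluster tolerates no failures at all, while lowering the setting to 1 lets both halves of a partition elect their own master.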