| 1 | +Using the API |
| 2 | +============= |
| 3 | + |
| 4 | +Overview |
| 5 | +~~~~~~~~ |
| 6 | + |
| 7 | +Cloud Search allows an application to quickly perform full-text and |
| 8 | +geospatial searches without having to spin up instances |
| 9 | +and without the hassle of managing and maintaining a search service. |
| 10 | + |
| 11 | +Cloud Search provides a model for indexing documents containing structured data, |
| 12 | +with documents and indexes saved to a separate persistent store optimized |
| 13 | +for search operations. |
| 14 | + |
| 15 | +The API supports full-text matching on string fields and allows indexing |
| 16 | +any number of documents in any number of indexes. |
| 17 | + |
| 18 | +Client |
| 19 | +~~~~~~ |
| 20 | + |
| 21 | +:class:`Client <gcloud.search.client.Client>` objects provide a means to |
| 22 | +configure your Cloud Search applications. Each instance holds both a |
| 23 | +``project`` and an authenticated connection to the Cloud Search service. |
| 24 | + |
| 25 | +For an overview of authentication in ``gcloud-python``, see :doc:`gcloud-auth`. |
| 26 | + |
| 27 | +Assuming your environment is set up as described in that document, |
| 28 | +create an instance of :class:`Client <gcloud.search.client.Client>`. |
| 29 | + |
| 30 | +.. doctest:: |
| 31 | + |
| 32 | + >>> from gcloud import search |
| 33 | + >>> client = search.Client() |
| 34 | + |
| 35 | +Indexes |
| 36 | +~~~~~~~ |
| 37 | + |
| 38 | +Indexes are searchable collections of documents. |
| 39 | + |
| 40 | +List all indexes in the client's project: |
| 41 | + |
| 42 | +.. doctest:: |
| 43 | + |
| 44 | + >>> indexes = client.list_indexes() # API call |
| 45 | + >>> for index in indexes: |
| 46 | + ...     print(index.name) |
| 47 | + ...     field_names = ', '.join([field.name for field in index.fields]) |
| 48 | + ...     print('- %s' % field_names) |
| 49 | + index-name |
| 50 | + - field-1, field-2 |
| 51 | + another-index-name |
| 52 | + - field-3 |
| 53 | + |
| 54 | +Create a new index instance locally: |
| 55 | + |
| 56 | +.. doctest:: |
| 57 | + |
| 58 | + >>> new_index = client.index('new-index-name') |
| 59 | + |
| 60 | +.. note:: |
| 61 | + |
| 62 | + Indexes cannot be created, updated, or deleted directly on the server: |
| 63 | + they are derived from the documents which are created "within" them. |
| 64 | + |
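The note above can be pictured with a small in-memory model. This is only a sketch in plain Python: ``InMemorySearchStore`` and its methods are hypothetical names, not part of ``gcloud.search``, and the rule that an index disappears once its last document is deleted is an assumption of this model.

```python
from collections import defaultdict


class InMemorySearchStore:
    """Hypothetical local model; not part of gcloud.search."""

    def __init__(self):
        # index name -> {document id -> {field name -> list of values}}
        self._indexes = defaultdict(dict)

    def create_document(self, index_name, document_id, fields):
        # Creating a document is the only way an index comes into existence.
        self._indexes[index_name][document_id] = fields

    def delete_document(self, index_name, document_id):
        documents = self._indexes.get(index_name, {})
        documents.pop(document_id, None)
        # Assumption of this model: an index with no documents left vanishes.
        if index_name in self._indexes and not self._indexes[index_name]:
            del self._indexes[index_name]

    def list_indexes(self):
        # Index names are derived purely from the stored documents.
        return sorted(self._indexes)
```

In this model there is no ``create_index`` operation at all: an index name shows up in ``list_indexes()`` only after a document has been created within it.
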
| 65 | +Documents |
| 66 | +~~~~~~~~~ |
| 67 | + |
| 68 | +Create a document instance, which is not yet added to its index on |
| 69 | +the server: |
| 70 | + |
| 71 | +.. doctest:: |
| 72 | + |
| 73 | + >>> index = client.index('index-id') |
| 74 | + >>> document = index.document('document-1') |
| 75 | + >>> document.exists() # API call |
| 76 | + False |
| 77 | + >>> document.rank is None |
| 78 | + True |
| 79 | + |
| 80 | +Add one or more fields to the document: |
| 81 | + |
| 82 | +.. doctest:: |
| 83 | + |
| 84 | + >>> field = document.field('fieldname') |
| 85 | + >>> field.add_value('string') |
| 86 | + |
| 87 | +Save the document into the index: |
| 88 | + |
| 89 | +.. doctest:: |
| 90 | + |
| 91 | + >>> document.create() # API call |
| 92 | + >>> document.exists() # API call |
| 93 | + True |
| 94 | + >>> document.rank # set by the server |
| 95 | + 1443648166 |
| 96 | + |
| 97 | +List all documents in an index: |
| 98 | + |
| 99 | +.. doctest:: |
| 100 | + |
| 101 | + >>> documents = index.list_documents() # API call |
| 102 | + >>> [document.id for document in documents] |
| 103 | + ['document-1'] |
| 104 | + |
| 105 | +Delete a document from its index: |
| 106 | + |
| 107 | +.. doctest:: |
| 108 | + |
| 109 | + >>> document = index.document('to-be-deleted') |
| 110 | + >>> document.exists() # API call |
| 111 | + True |
| 112 | + >>> document.delete() # API call |
| 113 | + >>> document.exists() # API call |
| 114 | + False |
| 115 | + |
| 116 | +.. note:: |
| 117 | + |
| 118 | + To update a document in place after changing its fields or rank, |
| 119 | + create it again. For example: |
| 120 | + |
| 121 | + .. doctest:: |
| 122 | + |
| 123 | + >>> document = index.document('document-id') |
| 124 | + >>> document.exists() # API call |
| 125 | + True |
| 126 | + >>> document.rank = 12345 |
| 127 | + >>> field = document.field('field-name') |
| 128 | + >>> field.add_value('christina aguilera') |
| 129 | + >>> document.create() # API call |
| 130 | + |
| 131 | +Fields |
| 132 | +~~~~~~ |
| 133 | + |
| 134 | +Fields belong to documents and are the data that actually gets searched. |
| 135 | + |
| 136 | +Each field can have multiple values, which can be of the following types: |
| 137 | + |
| 138 | +- String (Python2 :class:`unicode`, Python3 :class:`str`) |
| 139 | +- Number (Python :class:`int` or :class:`float`) |
| 140 | +- Timestamp (Python :class:`datetime.datetime`) |
| 141 | +- Geovalue (Python tuple, (:class:`float`, :class:`float`)) |
| 142 | + |
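The mapping in the list above can be sketched as a small classifier. ``classify_value`` and the type names it returns are hypothetical and not part of ``gcloud.search``; the sketch targets Python 3 only, and rejecting ``bool`` (which is a subclass of ``int``) is a choice made here, not documented behavior.

```python
import datetime


def classify_value(value):
    """Return the name of the field value type a Python value maps to.

    Hypothetical helper for illustration; not part of gcloud.search.
    """
    # bool is a subclass of int in Python; this sketch chooses to reject it.
    if isinstance(value, bool):
        raise TypeError('unsupported value type: bool')
    if isinstance(value, str):
        return 'string'
    if isinstance(value, (int, float)):
        return 'number'
    if isinstance(value, datetime.datetime):
        return 'timestamp'
    # A geovalue is a pair of floats.
    if (isinstance(value, tuple) and len(value) == 2
            and all(isinstance(part, float) for part in value)):
        return 'geo'
    raise TypeError('unsupported value type: %r' % type(value))
```

For instance, ``classify_value((37.42, -122.08))`` falls through to the tuple check, while any other tuple shape raises ``TypeError``.
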
| 143 | +String values can be tokenized using one of three different types of |
| 144 | +tokenization, which can be passed when the value is added: |
| 145 | + |
| 146 | +- **Atom** (``atom``) means "don't tokenize this string": treat it as a |
| 147 | + single value to compare against. |
| 148 | + |
| 149 | +- **Text** (``text``) means "treat this string as normal text" and split words |
| 150 | + apart to be compared against. |
| 151 | + |
| 152 | +- **HTML** (``html``) means "treat this string as HTML", understanding the |
| 153 | + tags, and treating the rest of the content like Text. |
| 154 | + |
| 155 | +.. doctest:: |
| 156 | + |
| 157 | + >>> from gcloud import search |
| 158 | + >>> client = search.Client() |
| 159 | + >>> index = client.index('index-id') |
| 160 | + >>> document = index.document('document-id') |
| 161 | + >>> field = document.field('field-name') |
| 162 | + >>> field.add_value('britney spears', tokenization='atom') |
| 163 | + >>> field.add_value('<h1>Britney Spears</h1>', tokenization='html') |
| 164 | + |
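To make the difference between the three modes concrete, here is a rough local approximation in plain Python. It is not how the Cloud Search service tokenizes: ``tokenize`` is a hypothetical helper, and lowercasing and splitting on word characters are assumptions made only for this sketch.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize(value, tokenization='text'):
    """Hypothetical local approximation of the three tokenization modes."""
    if tokenization == 'atom':
        # The whole string is kept as a single token.
        return [value]
    if tokenization == 'html':
        # Strip the markup first, then treat the remainder like text.
        extractor = _TextExtractor()
        extractor.feed(value)
        extractor.close()
        value = ' '.join(extractor.chunks)
    elif tokenization != 'text':
        raise ValueError('unknown tokenization: %r' % tokenization)
    # Assumption: text is lowercased and split into word tokens.
    return re.findall(r'\w+', value.lower())
```

Under these assumptions an ``atom`` value only matches as a whole, while ``text`` and ``html`` values yield the same word tokens once the tags are removed.
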
| 165 | +Searching |
| 166 | +~~~~~~~~~ |
| 167 | + |
| 168 | +After populating an index with documents, search through them by |
| 169 | +issuing a search query: |
| 170 | + |
| 171 | +.. doctest:: |
| 172 | + |
| 173 | + >>> from gcloud import search |
| 174 | + >>> client = search.Client() |
| 175 | + >>> index = client.index('index-id') |
| 176 | + >>> query = client.query('britney spears') |
| 177 | + >>> matching_documents = index.search(query) # API call |
| 178 | + >>> for document in matching_documents: |
| 179 | + ...     print(document.id) |
| 180 | + document-id |
| 181 | + |
| 182 | +By default, query results are sorted by the ``rank`` value set when the |
| 183 | +document was created. See: |
| 184 | +https://cloud.google.com/search/reference/rest/v1/projects/indexes/documents#resource_representation.google.cloudsearch.v1.Document.rank |
| 185 | + |
| 186 | +To sort differently, use the ``order_by`` parameter: |
| 187 | + |
| 188 | +.. doctest:: |
| 189 | + |
| 190 | + >>> ordered = client.query('britney spears', order_by=['field1', '-field2']) |
| 191 | + |
| 192 | +The ``-`` prefix on ``field2`` requests descending order, so results are |
| 193 | +sorted ascending by ``field1`` and then descending by ``field2``. |
| 194 | + |
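The ``order_by`` convention described above can be mimicked locally on plain dictionaries. This is a sketch only: ``sort_documents`` is a hypothetical helper, not part of ``gcloud.search``, and it assumes every document carries every field named in ``order_by``.

```python
def sort_documents(documents, order_by):
    """Sort dicts by field names; a leading '-' means descending.

    Hypothetical local helper; not part of gcloud.search.
    """
    ordered = list(documents)
    # Python's sort is stable, so apply the sort keys from last to first:
    # the first entry in order_by ends up as the primary key.
    for spec in reversed(order_by):
        descending = spec.startswith('-')
        field_name = spec[1:] if descending else spec
        ordered.sort(key=lambda doc: doc[field_name], reverse=descending)
    return ordered
```

With ``order_by=['field1', '-field2']``, documents that tie on ``field1`` are ordered among themselves by ``field2`` from largest to smallest.
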
| 195 | +To limit the fields to be returned in the match, use the ``fields`` parameter: |
| 196 | + |
| 197 | +.. doctest:: |
| 198 | + |
| 199 | + >>> projected = client.query('britney spears', fields=['field1', 'field2']) |
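
The effect of limiting the returned fields can be illustrated on a plain dictionary. ``project_fields`` is a hypothetical helper, not part of ``gcloud.search``, and silently skipping requested fields that a document lacks is an assumption of this sketch.

```python
def project_fields(document, fields):
    """Return a copy of a dict-based document limited to the named fields.

    Hypothetical local helper; not part of gcloud.search.
    """
    # Assumption: requested fields missing from the document are skipped.
    return {name: document[name] for name in fields if name in document}
```

The original dictionary is left untouched, so the same document can be projected in different ways.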