Posts

Search Engine NoSQL Database

Search Engine NoSQL Database

Search engine databases are NoSQL databases that deal with data that does not necessarily conform to the rigid structural requirements of relation database management systems (RDBMS) as data for search may be text-based, semi-structured, or unstructured. Search engine databases are made to help users quickly find information they need in a high-quality and cost-effective manner. They are optimized for key word queries and typically offer specialized methods such as full-text search, complex search expressions, and ranking of search results.

Search engine databases contain two main components. First content is added to the search engine database index. Then when a user executes a query, relevant results are rapidly returned utilizing the search engine database index. Fast search responses are possible because instead of searching the text directly, queries perform searches against an index. This is the equivalent of retrieving pages in a book related to a keyword by searching the index at the back of a book, as opposed to searching each of the words in each page of the book. This type of index is known as an inverted index, because it converts a page-centric data structure to a keyword-centric data structure.

Search engine databases commonly support the following types of search functionality:

•  Full-text search:  Compares every word of the search request against every word within a file. Examines all the words in every stored file that contains natural language text such as English, French, or Spanish. And is appropriate when data to be discovered is mostly free-form text like that of a news article, academic paper, essay, or book.
•  Semi-structured search:  Searches of data that have both the rigid structure of an RDBMS and full-text sentences like those in a MS Word or PDF document as they can be converted to either XML or JSON format. Semi-structured data is a form of data that has a self-describing structure and contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data.
•  Geographic search:  Associates locations to web resources in order to answer location-based queries. Search results will not only be related to the topic of a query, but they will also be related to a physical location associated with the query. Thus, physical locations will be retrieved are in proximity of the search topic.
•  Network search:  Offers a relationship-oriented approach to search that lets users explore the connections in data within stored documents. This can include linkages between people, places, preferences, & products and is useful in discovering relevance of relationships. The search engine processes natural language queries to return information from across network graphs
•  Navigational search:  Augments other search capabilities with a guided-navigation system allowing users to narrow down search results by applying multiple filters based on classification of items. Navigational search uses a hierarchy structure or taxonomy of categories to enable users to browse information by choosing from a pre-determined set of categories. This allows a user to type in a simple query, then refine their search options by either navigating or drilling down into a category.
•  Vector search:  Ranks document results based upon how close they are to search keywords utilizing multi-dimensional vector distance models. Vector search is a way to conduct “fuzzy search”, i.e. a way to find documents that are close to a keyword. They help find inexact matches to documents that are “in-the-neighborhood” of search keywords.

Share

What is NoSQL?

Definition of NoSQL

Recently, NoSQL databases have been developed that provide a high-performance and salable alternative to more traditional relational database management systems (RDBMS), especially when dealing with large amounts of unstructured or semi-structured data. NoSQL, which stands for “Not Only SQL” (https://en.wikipedia.org/wiki/NoSQL) is unlike RDBMS as it is designed for processing large collections of distributed data that don’t fit well into strict rows and columns. And NoSQL databases are ideal solutions for implementations of Big Data initiatives. Moreover, the substantial increase in amount, speed, and variation of Big Data in recent years has greatly increased the need for deployments of NoSQL databases. While traditional RDBMS are very useful for the processing of highly-structured data, NoSQL databases typically accommodate either semi-structured data, fully-unstructured data, documents, graphs, or dynamic schema.  And NoSQL databases are now widely recognized for their ease of development, functionality, and performance at scale.

The term NoSQL can be applied to some databases that were available before traditional RDBMS, but more often the term refers to databases developed in the mid to late 2000s for the purpose of large-scale database processing within web and mobile based applications. Within these emerging applications, requirements for performance and scalability outweighed the conventional requirement for the rigid data consistency that existing RDBMS provided to transactional applications.  Subsequently, NoSQL databases for web applications have tended to focus on very specific characteristics of data management. The ability to process very large volumes of data and quickly distribute that data across computing processors and clusters has been very desirable in large-scale web application design. There has also been a greater need for flexible data schema, or no schema at all, in order to better implement rapid changes to applications.

An advantage of NoSQL databases over traditional RBMS is that they store and manage data in ways that allow for high operational speed and great flexibility on the part of system developers. In addition, data can be stored in a schema-less or free-form fashion. Any data can be stored in any record. And unlike traditional RDBMS, many NoSQL databases can be scaled horizontally across hundreds or thousands of commodity servers. And NoSQL databases typically utilize lower amounts system memory than RDBMS. This allows for NoSQL databases to achieve much higher performance than traditional RDBMS.

 

NoSQL Databases Typically Contain the Following Types of Data:

•  Semi-structured Data:  CSV, Word, Excel, PowerPoint, Documents, PDFs, Logs, XML, JSON
•  Unstructured Data:  Emails, Text, Messages, Blog Entries, Twitter
•  Binary Data:  Graphics, Images, Audio, Video

 

NoSQL Database Types

•  Key-Value Stores: A simple data storage system that pairs a unique key with an associated value.  Typical uses include: dictionaries, image stores, document/file stores, query cache, lookup tables.
•  Document Stores:  Data stores that pair each key with a complex data structure known as a document.  Documents are typically semi-structured either in XML or JSON formats.  Typical uses include: MS Word documents, MS Excel documents, spreadsheets, presentations, PDF files, sales orders, invoices, product descriptions, web pages, forms.
•  Graph Stores:  Data stores that organize data as nodes, which are like records in a relational database, and edges, which represent connections between nodes.  Typical uses include: social networks, fraud detection, pattern matching, relationship-heavy data.
•  Wide Column / Column Family Stores:  Data stores that have the ability to hold very large numbers of dynamic columns. But unlike a relational database, the names and format of the columns can vary from row to row in the same table.  Typical uses include: web crawling, large sparsely populated tables, highly-adaptive systems, high-variance systems.
•  Native XML Databases:  Data stores that allow data to be stored in the extensible markup language (XML) format, a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. XML databases are a sub-category of document stores.
•  Search Engines:  Information retrieval systems designed to help find information stored on a computer system.
•  Multi-Modal Databases:  Data stores that contain aspects of multiple types of NoSQL database all within one product.

 

NoSQL Database Products (2018)

 

Share