Log analytics the easy way with Amazon OpenSearch Serverless

We recently introduced the preview launch of Amazon OpenSearch Serverless, a new serverless option for Amazon OpenSearch Service, which makes it easy for you to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query responses for even the most demanding and unpredictable workloads.

OpenSearch Serverless supports two primary use cases:

  • Log analytics that focuses on analyzing large volumes of semi-structured, machine-generated time series data for operational, security, and user behavior insights
  • Full-text search that powers customer applications in their internal networks (content management systems, legal documents) and internet-facing applications such as ecommerce website catalog search and content search

This post focuses on building a simple log analytics pipeline with OpenSearch Serverless.

Solution overview

In the following sections, we walk through the steps to create and access a collection in OpenSearch Serverless, and demonstrate how to configure two different data ingestion pipelines to index data into the collection.

Create a collection

To get started with OpenSearch Serverless, you first create a collection. A collection in OpenSearch Serverless is a logical grouping of one or more indexes that represent an analytics workload.

The following graphic provides a quick walkthrough of creating a collection. Alternatively, refer to this blog post to learn more about how to create and configure a collection in OpenSearch Serverless.
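Collections can also be created programmatically with the boto3 opensearchserverless client. The following is a minimal sketch under stated assumptions: the helper functions and the name logs-collection are illustrative, not from this post, and an encryption policy must exist before the collection itself can be created. The live AWS calls are shown only as comments.

```python
import json

def encryption_policy_request(collection_name):
    # An encryption policy must cover the collection before it is created.
    policy = {
        "Rules": [{"ResourceType": "collection",
                   "Resource": [f"collection/{collection_name}"]}],
        "AWSOwnedKey": True,  # use the AWS-owned KMS key
    }
    return {"name": f"{collection_name}-enc",
            "type": "encryption",
            "policy": json.dumps(policy)}

def collection_request(collection_name):
    # TIMESERIES is the collection type suited to log analytics workloads.
    return {"name": collection_name, "type": "TIMESERIES"}

# With AWS credentials configured, the actual calls would be:
# aoss = boto3.client("opensearchserverless")
# aoss.create_security_policy(**encryption_policy_request("logs-collection"))
# aoss.create_collection(**collection_request("logs-collection"))
```

Building the request bodies separately keeps the policy JSON inspectable before anything is sent to AWS.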

Access the collection

You can use AWS Identity and Access Management (IAM) credentials with an access key ID and secret key for your IAM users and roles to access your collection programmatically. Alternatively, you can set up SAML authentication for accessing OpenSearch Dashboards. Note that SAML authentication is only available for accessing OpenSearch Dashboards; you need IAM credentials to perform any operations using the AWS Command Line Interface (AWS CLI), API, and OpenSearch clients for indexing and searching data. In this post, we use IAM credentials to access the collections.
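Before an IAM principal can index or search, the collection also needs a data access policy naming that principal. The following sketch builds such a policy document; the policy name, collection name, and role ARN are placeholders, and attaching the policy via boto3's create_access_policy is shown only as a comment.

```python
import json

def data_access_policy(collection_name, principal_arn):
    # Grant one IAM principal full rights on the collection and its indexes.
    return json.dumps([{
        "Rules": [
            {"ResourceType": "collection",
             "Resource": [f"collection/{collection_name}"],
             "Permission": ["aoss:*"]},
            {"ResourceType": "index",
             "Resource": [f"index/{collection_name}/*"],
             "Permission": ["aoss:*"]},
        ],
        "Principal": [principal_arn],
    }])

# With AWS credentials configured, the policy would be attached with:
# boto3.client("opensearchserverless").create_access_policy(
#     name="logs-data-access", type="data",
#     policy=data_access_policy("logs-collection",
#                               "arn:aws:iam::<account-id>:role/<role-name>"))
```

Scoping the Permission list down from aoss:* to specific actions is advisable outside of experimentation.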

Create a data ingestion pipeline

OpenSearch Serverless supports the same ingestion pipelines as open-source OpenSearch and managed clusters. These include applications like Logstash and Amazon Kinesis Data Firehose, and language clients like JavaScript, Python, Go, Java, and more. For more details on all of the ingestion pipelines and supported clients, refer to ingesting data into OpenSearch Serverless collections.

Using Logstash

The open-source version of Logstash (Logstash OSS) provides a convenient way to use the bulk API to upload data into your collections. OpenSearch Serverless supports the logstash-output-opensearch output plugin, which supports IAM credentials for data access control. In this post, we show how to use the file input plugin to send data from your command line console to an OpenSearch Serverless collection. Complete the following steps:

  1. Download the logstash-oss-with-opensearch-output-plugin file (this example uses the distro for macos-x64; for other distros, refer to the artifacts):
    wget https://artifacts.opensearch.org/logstash/logstash-oss-with-opensearch-output-plugin-8.4.0-macos-x64.tar.gz

  2. Extract the downloaded tarball:
    tar -zxvf logstash-oss-with-opensearch-output-plugin-8.4.0-macos-x64.tar.gz
    cd logstash-8.4.0/

  3. Update the logstash-output-opensearch plugin to the latest version:
    ./bin/logstash-plugin update logstash-output-opensearch

    The OpenSearch output plugin for OpenSearch Serverless uses IAM credentials to authenticate. In this example, we show how to use the file input plugin to read data from a file and ingest it into an OpenSearch Serverless collection.

  4. Create a log file with the following sample data and name it sample.log:
    {"deviceId":2823605996,"fleetRegNo":"IRV82MBYQ1","oilLevel":0.92,"milesTravelled":1.105,"totalFuelUsed":0.01,"carrier":"AOS Van Lines","temperature":14,"tripId":6741375582,"originODC":"ODC Las Vegas","originCountry":"United States","originCity":"Las Vegas","originState":"Nevada","originGeo":"36.16,-115.13","destinationODC":"ODC San Jose","destinationCountry":"United States","destinationCity":"San Jose","destinationState":"California","destinationGeo":"37.33,-121.89","speedInMiles":18,"distanceMiles":382.81,"milesToDestination":381.705,"@timestamp":"2022-11-17T17:11:25.855Z","traffic":"heavy","weather_category":"Cloudy","weather":"Cloudy"}
    {"deviceId":2823605996,"fleetRegNo":"IRV82MBYQ1","oilLevel":0.92,"milesTravelled":1.105,"totalFuelUsed":0.01,"carrier":"AOS Van Lines","temperature":14,"tripId":6741375582,"originODC":"ODC Las Vegas","originCountry":"United States","originCity":"Las Vegas","originState":"Nevada","originGeo":"36.16,-115.13","destinationODC":"ODC San Jose","destinationCountry":"United States","destinationCity":"San Jose","destinationState":"California","destinationGeo":"37.33,-121.89","speedInMiles":18,"distanceMiles":382.81,"milesToDestination":381.705,"@timestamp":"2022-11-17T17:11:26.155Z","traffic":"heavy","weather_category":"Cloudy","weather":"Heavy Fog"}
    {"deviceId":2823605996,"fleetRegNo":"IRV82MBYQ1","oilLevel":0.92,"milesTravelled":1.105,"totalFuelUsed":0.01,"carrier":"AOS Van Lines","temperature":14,"tripId":6741375582,"originODC":"ODC Las Vegas","originCountry":"United States","originCity":"Las Vegas","originState":"Nevada","originGeo":"36.16,-115.13","destinationODC":"ODC San Jose","destinationCountry":"United States","destinationCity":"San Jose","destinationState":"California","destinationGeo":"37.33,-121.89","speedInMiles":18,"distanceMiles":382.81,"milesToDestination":381.705,"@timestamp":"2022-11-17T17:11:26.255Z","traffic":"heavy","weather_category":"Cloudy","weather":"Cloudy"}
    {"deviceId":2823605996,"fleetRegNo":"IRV82MBYQ1","oilLevel":0.92,"milesTravelled":1.105,"totalFuelUsed":0.01,"carrier":"AOS Van Lines","temperature":14,"tripId":6741375582,"originODC":"ODC Las Vegas","originCountry":"United States","originCity":"Las Vegas","originState":"Nevada","originGeo":"36.16,-115.13","destinationODC":"ODC San Jose","destinationCountry":"United States","destinationCity":"San Jose","destinationState":"California","destinationGeo":"37.33,-121.89","speedInMiles":18,"distanceMiles":382.81,"milesToDestination":381.705,"@timestamp":"2022-11-17T17:11:26.556Z","traffic":"heavy","weather_category":"Cloudy","weather":"Heavy Fog"}
    {"deviceId":2823605996,"fleetRegNo":"IRV82MBYQ1","oilLevel":0.92,"milesTravelled":1.105,"totalFuelUsed":0.01,"carrier":"AOS Van Lines","temperature":14,"tripId":6741375582,"originODC":"ODC Las Vegas","originCountry":"United States","originCity":"Las Vegas","originState":"Nevada","originGeo":"36.16,-115.13","destinationODC":"ODC San Jose","destinationCountry":"United States","destinationCity":"San Jose","destinationState":"California","destinationGeo":"37.33,-121.89","speedInMiles":18,"distanceMiles":382.81,"milesToDestination":381.705,"@timestamp":"2022-11-17T17:11:26.756Z","traffic":"heavy","weather_category":"Cloudy","weather":"Cloudy"}

  5. Create a new file, add the following content, and save the file as logstash-output-opensearch.conf after providing the information about your file path, host, Region, access key, and secret access key:
    input {
       file {
         path => "<path/to/your/sample.log>"
         start_position => "beginning"
       }
    }
    output {
        opensearch {
            ecs_compatibility => disabled
            index => "logstash-sample"
            hosts => "<HOST>:443"
            auth_type => {
                type => 'aws_iam'
                aws_access_key_id => '<AWS_ACCESS_KEY_ID>'
                aws_secret_access_key => '<AWS_SECRET_ACCESS_KEY>'
                region => '<REGION>'
                service_name => 'aoss'
                }
            legacy_template => false
            default_server_major_version => 2
        }
    }

  6. Use the following command to start Logstash with the config file created in the previous step. This creates an index called logstash-sample and ingests the documents added in the sample.log file:
    ./bin/logstash -f <path/to/your/config/file>

  7. Search using OpenSearch Dashboards by running the following query:
    GET logstash-sample/_search
    {
      "query": {
        "match_all": {}
      },
      "track_total_hits" : true
    }

In this step, you used the file input plugin from Logstash to send data to OpenSearch Serverless. You can replace the input plugin with any other plugin supported by Logstash, such as Amazon Simple Storage Service (Amazon S3), stdin, tcp, or others, to send data to the OpenSearch Serverless collection.
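Because Logstash reads the sample log file line by line, one malformed line becomes one unparsed event in the collection. A quick local sanity check can catch this before you start the pipeline. This is a sketch: the field list below is just the handful of fields the later dashboard queries rely on, not a complete schema.

```python
import json

# Fields the downstream queries and dashboards depend on (assumed subset).
REQUIRED_FIELDS = {"deviceId", "tripId", "@timestamp", "traffic", "weather"}

def validate_log_line(line):
    # Parse one JSON log line and verify the expected fields are present.
    doc = json.loads(line)
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return doc

def validate_log_file(path):
    # Return all parsed documents; raises on the first bad line.
    with open(path) as f:
        return [validate_log_line(line) for line in f if line.strip()]
```

Running validate_log_file("sample.log") before starting Logstash surfaces JSON errors locally instead of as mapping surprises in the index.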

Using a Python client

OpenSearch provides high-level clients for several popular programming languages, which you can use to integrate with your application. With OpenSearch Serverless, you can continue to use your existing OpenSearch client to load and query your data in collections.

In this section, we show how to use the opensearch-py client for Python to establish a secure connection with your OpenSearch Serverless collection, create an index, send sample logs, and analyze that log data using OpenSearch Dashboards. In this example, we use a sample event generated from fleets carrying goods and packages. This data contains pertinent fields such as source, destination, weather, speed, and traffic. The following is a sample record:

"_source" : {
    "deviceId" : 2823605996,
    "fleetRegNo" : "IRV82MBYQ1",
    "carrier" : "AOS Van Lines",
    "temperature" : 14,
    "tripId" : 6741375582,
    "originODC" : "ODC Las Vegas",
    "originCountry" : "United States",
    "originCity" : "Las Vegas",
    "destinationCity" : "San Jose",
    "@timestamp" : "2022-11-17T17:11:25.855Z",
    "traffic" : "heavy",
    "weather" : "Cloudy"
    ...
    ...
}

To set up the Python client for OpenSearch, you must have the following prerequisites:

  • Python3 installed on your local machine or the server from where you are running this code
  • Package Installer for Python (PIP) installed
  • The AWS CLI configured; we use it to store the secret key and access key for credentials

Complete the following steps to set up the Python client:

  1. Add the OpenSearch Python client to your project and use Python's virtual environment to set up the required packages:
    mkdir python-sample
    cd python-sample
    python3 -m venv .env
    source .env/bin/activate
    .env/bin/python3 -m pip install opensearch-py
    .env/bin/python3 -m pip install requests_aws4auth
    .env/bin/python3 -m pip install boto3
    .env/bin/python3 -m pip install geopy

  2. Save your frequently used configuration settings and credentials in files that are maintained by the AWS CLI (see Quick configuration with aws configure) by running the aws configure command and providing your access key, secret key, and Region.
  3. The following sample code uses the opensearch-py client for Python to establish a secure connection to the specified OpenSearch Serverless collection and index a sample document. You must provide values for region and host. Note that you must use aoss as the service name for OpenSearch Serverless. Copy the code and save it in a file named sample_python.py:
    from opensearchpy import OpenSearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth
    import boto3
    
    host = '<host>' # OpenSearch Serverless collection endpoint
    region = '<region>' # e.g. us-west-2
    
    service = 'aoss'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service,
    session_token=credentials.token)
    
    # Create an OpenSearch client
    client = OpenSearch(
        hosts = [{'host': host, 'port': 443}],
        http_auth = awsauth,
        use_ssl = True,
        verify_certs = True,
        connection_class = RequestsHttpConnection
    )
    # Specify index name
    index_name = 'octank-iot-logs-2022-11-19'
    
    # Prepare a document to index
    document = {
        "deviceId" : 2823605996,
        "fleetRegNo" : "IRV82MBYQ1",
        "carrier" : "AOS Van Lines",
        "temperature" : 14,
        "tripId" : 6741375582,
        "originODC" : "ODC Las Vegas",
        "originCountry" : "United States",
        "originCity" : "Las Vegas",
        "destinationCity" : "San Jose",
        "@timestamp" : "2022-11-19T17:11:25.855Z",
        "traffic" : "heavy",
        "weather" : "Cloudy"
    }
    
    # Index the document
    response = client.index(
        index = index_name,
        body = document
    )
    
    print('\nDocument indexed with response:')
    print(response)
    
    
    # Search for the document
    q = 'heavy'
    query = {
        'size': 5,
        'query': {
            'multi_match': {
                'query': q,
                'fields': ['traffic']
            }
        }
    }
    
    response = client.search(
        body = query,
        index = index_name
    )
    print('\nSearch results:')
    print(response)

  4. Run the sample code:
    python3 sample_python.py

  5. On the OpenSearch Service console, choose your collection.
  6. On OpenSearch Dashboards, choose Dev Tools.
  7. Run the following search query to retrieve documents:
    GET octank-iot-logs-*/_search
    {
      "query": {
        "match_all": {}
      }
    }

After you have ingested the data, you can use OpenSearch Dashboards to visualize it. In the following example, we analyze the data visually to gain insights on various dimensions such as average fuel consumed by a specific fleet, traffic conditions, distance traveled, and average mileage by the fleet.
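Those dashboard insights map directly onto aggregation queries. As one hedged example (assuming dynamic mapping created a traffic.keyword subfield, as it does by default for string fields), the following query body computes the average fuel used per traffic condition; running it against a live collection is shown only as a comment.

```python
# Average totalFuelUsed bucketed by traffic condition.
avg_fuel_by_traffic = {
    "size": 0,  # aggregations only, no raw hits
    "aggs": {
        "by_traffic": {
            "terms": {"field": "traffic.keyword"},
            "aggs": {
                "avg_fuel": {"avg": {"field": "totalFuelUsed"}}
            }
        }
    }
}

# Against a live collection, using the client from the earlier section:
# response = client.search(index="octank-iot-logs-*", body=avg_fuel_by_traffic)
# for bucket in response["aggregations"]["by_traffic"]["buckets"]:
#     print(bucket["key"], bucket["avg_fuel"]["value"])
```

The same body pasted into Dev Tools after GET octank-iot-logs-*/_search returns the identical aggregation.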

Conclusion

In this post, you created a log analytics pipeline using OpenSearch Serverless, a new serverless option for OpenSearch Service. With OpenSearch Serverless, you can focus on building your application without having to worry about provisioning, tuning, and scaling the underlying infrastructure. OpenSearch Serverless supports the same ingestion pipelines and high-level clients as the open-source OpenSearch project. You can easily get started using the familiar OpenSearch indexing and query APIs to load and search your data, and use OpenSearch Dashboards to visualize that data.

Stay tuned for a series of posts focusing on the various options available for you to build effective log analytics and search applications. Get hands-on with OpenSearch Serverless by taking the Getting Started with Amazon OpenSearch Serverless workshop and build a similar log analytics pipeline to the one discussed in this post.


About the authors

Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Pavani Baddepudi is a senior product manager working in search services at AWS. Her interests include distributed systems, networking, and security.
