# Section indexer

This plugin parses files using Tika (opens new window) and indexes documents in ElasticSearch (opens new window).

You can use this plugin:

  • To parse all documents in a directory with Tika and index the results in Elastic. This is the main use of the classes in this plugin.
  • To parse documents with Tika and do something else with the result. For example, show a document's metadata in the screen.
  • To index the output of other RVT2 modules. For example, you can index the output of the PST parser.

# Running

If you use the Tika module, you must run Tika in server mode by running run.sh inside the $RVT2_HOME/external_tools/tika directory. The first time you run this command, it will download Tika.

If you use the ElasticSearch indexer, you'll need an ElasticSearch >=6 server somewhere in the network. In some cases, ElasticSearch might need a special file system configuration. Also, if you are planning to use the rvt2-analyzer, the ElasticSearch must allow CORS requests at least from the domain of the analyzer. An example script to run ElasticSearch can be found inside the directory $RVT2_HOME/external_tools/elastic.

# Jobs

  • indexer.parse_file: Parse a file and show the result in the standard output. Use for debugging.
  • indexer.parse_directory: Parse a directory and show the result in the standard output. Use for debugging.
  • indexer.directory: Parse a directory and save in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json.
  • indexer.save: Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.
  • indexer.save_directory: Run indexer.directory and then indexer.save with default parameters.
  • indexer.convert_json: Convert a JSON file to a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.convert_csv: Convert a CSV file to a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.query_and_tag: Query elastic, select all related documents (containers, attachments..) and tag all of them. You must indexer.save the output
  • indexer.export: Query elastic, select all documents matching a query and export them to a JSON.
  • indexer.tag_and_export: Runs indexer.query_and_tag, indexer.save, indexer.export.
  • indexer.blind_searches: Blind searches on a parsed JSON file, result from indexer.save.
  • indexer.index_timeline_body: Index a BODY file provided in the path.
  • indexer.export_pst: Export contents of every pst or ost file found in a source using pffexport.
  • indexer.pst: Parse PST files previously exported with indexer.export_pst.
  • indexer.mails: Export, parse and characterize contents of every pst or ost file found in a source. Runs export_pst, pst and characterize_mails
  • indexer.pst_item2eml: Convert a message extracted from a pst to an eml file.
  • indexer.365.all: Adapt Microsoft 365 parsed logs to JSON format suitable for Elastic Common Schema (ECS).
  • indexer.365.save: Save all events generated by "indexer.365.all" to Elastic.

# Job indexer.parse_file

Parse a file and show the result in the standard output. Use for debugging.

# Configurable parameters

Parameter Description Default
only_root Parse only the root file False

# Job indexer.parse_directory

Parse a directory and show the result in the standard output. Use for debugging.

# Configurable parameters

Parameter Description Default
filter List of file categories to parse. If not provided, parse all files. Predefined categories can be found in ./conf/file_categories.cfg configuration file ``

# Job indexer.directory

Parse a directory and save in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json. This file is compatible with indexers such as elasticdump, but you will prefer using indexer.save

# Configurable parameters

Parameter Description Default
path The path to the directory to parse ``
outfile Save the result of the parsing in this file MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json
index_name The name of the indx to save the parsed files SOURCE
rvtindex The name of the index to save metadata. Set to empty to not save metadata. rvtindexer
restartable If True, parsing can be restarted from the last error. Use with care! False
filter List of file categories to parse. If not provided, parse all files. Predefined categories can be found in ./conf/file_categories.cfg configuration file ``

# Job indexer.save

Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.

You can define the location of the elasticsearch server and username/password using:

--globals indexer:es_hosts="http://localhost:9200" --globals indexer:es_username=USERNAME --globals indexer:es_password=PASSWORD

# Jobs

# Configurable parameters

Parameter Description Default
path The path to a JSON file output from indexer.directory. ``
restartable In True, the index can be restarted from an error. Use with care! False
mapping Path to the file describing the mapping of fields to ElasticSearch. The mapping can only be used when the index is created. ./plugins/indexer/es-settings.json
index_name Index name in ElasticSearch. If index does not exists, create it. SOURCE
cooloff_every After this number of seconds, wait cooloff_seconds. 300
cooloff_seconds Seconds to wait to cool off ElasticSearch. 5
tabs Space separated tabs to add to the rvt2-analyzer. Available tabs can be found at "./plugins/indexer/analyzer-tabs.json". Examples: files, emails, apache, iis. ``

# Job indexer.save_directory

Run indexer.directory and then indexer.save with default parameters.

# Jobs

  • indexer.directory: Parse a directory and save in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json.
  • indexer.save: Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.

# Job indexer.convert_json

Convert a JSON file to a JSON suitable to be sent to ElasticSearch using indexer.save

# Configurable parameters

Parameter Description Default
path the CSV file to convert. It must be provided ``
outfile path to the generated json file output.json
disableCommonFields True
generate_id False
index_name name of the destination index at Elastic SOURCE

# Job indexer.convert_csv

Convert a CSV file to a JSON suitable to be sent to ElasticSearch using indexer.save

  • path: the CSV file to convert. It must be provided
  • output: configure outfile param. Default value: output.json

# Configurable parameters

Parameter Description Default
path the CSV file to convert. It must be provided ``
outfile path to the generated json file output.json
delimiter ;
disableCommonFields True
generate_id False
date_fields @timestamp
index_name name of the destination index at Elastic SOURCE

# Job indexer.query_and_tag

Query elastic, select all related documents (containers, attachments..) and tag all of them. You must indexer.save the output

# Configurable parameters

Parameter Description Default
index_name The name of the index to query. The name will be converted to lower case, since ES only accept lower case names. Wildcards can be used SOURCE
outfile The output of the job. You must indexer.save this file MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json
query The query to run in lucene language *
tag The name of the tag EXPORT
tag_field Save tags in this field. Use one of the registered tag fields in ElasticSearchBulkIndex (hints: tags-new or blindsearches-new) tags-new
max_results If the query will return more than this number of results, stop. 1000

# Job indexer.export

Query elastic, select all documents matching a query and export them to a JSON. The target JSON file may then be saved to any ElasticSearch server using indexer.save.

# Configurable parameters

Parameter Description Default
outfile The output of the job. You must indexer.save this file MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/export.json
index_name The name of the index to query. The name will be converted to lower case, since ES only accept lower case names. Wildcards can be used SOURCE
query The query to run in lucene language *
max_results If the query will return more than this number of results, stop 1000

# Job indexer.tag_and_export

Runs indexer.query_and_tag, indexer.save, indexer.export. In order to save the results to Elastic, you must run indexer.save to any desired ES_HOST on MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/exported.json

# Jobs

  • indexer.query_and_tag: Query elastic, select all related documents (containers, attachments..) and tag all of them. You must indexer.save the output
  • indexer.save: Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.
  • indexer.export: Query elastic, select all documents matching a query and export them to a JSON.

# Configurable parameters

Parameter Description Default
interfile MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/original.json
query The query to run. See indexer.query_and_tag. *
index_name The name of the index to query SOURCE
tag The name of the tag. See indexer.query_and_tag. EXPORT
outfile The output of the job. You must indexer.save this file MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/exported.json

# Job indexer.blind_searches

Blind searches on a parsed JSON file, result from indexer.save.

# Configurable parameters

Parameter Description Default
keyword_file The name of the keyword file in the searches directory. MORGUE/CLIENT/CASENAME/searches_files/kw
outfile Save the results to this file, ready to be used with indexer.save MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.blindsearches.json

# Job indexer.index_timeline_body

Index a BODY file provided in the path. Since _id for each file is shared with indexer_directory results, information from both timeline and Tika parsing may be combined and updated.

# Configurable parameters

Parameter Description Default
outfile MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.timeline.json

# Job indexer.export_pst

Export contents of every pst or ost file found in a source using pffexport. This job depends on plugins.common and the succesful generation of alloc_files.

# Configurable parameters

Parameter Description Default
outfile A CSV containing the path to the actual pstfiles and their reference MORGUE/CLIENT/CASENAME/SOURCE/output/mail/pstfiles.csv
outdir Export the contents of PST files to this directory MORGUE/CLIENT/CASENAME/SOURCE/output/mail/

# Job indexer.pst

Parse PST files previously exported with indexer.export_pst. This module also calls to indexer.pst.secondary.

# Configurable parameters

Parameter Description Default
outfile A JSON file with all the information in the mailboxes, ready to be imported into ElasticSearch MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.pst.json
path An absolute path to pstfiles.csv, output from indexer.export_pst MORGUE/CLIENT/CASENAME/SOURCE/output/mail/pstfiles.csv

# Job indexer.mails

Export, parse and characterize contents of every pst or ost file found in a source. Runs export_pst, pst and characterize_mails

All PSTs and OST files in the source are exported to MORGUE/CLIENT/CASENAME/SOURCE/output/mail, and a CSV file describing the PSTs will created there. The JSON with the parsed mails will be in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.pst.json.

# Jobs

  • indexer.export_pst: Export contents of every pst or ost file found in a source using pffexport.
  • indexer.pst: Parse PST files previously exported with indexer.export_pst.
  • characterize_mails: Create basic summary about mail accounts from a source.

# Job indexer.pst_item2eml

Convert a message extracted from a pst to an eml file.

  • path: the path to a Message folder

# Job indexer.365.all

Adapt Microsoft 365 parsed logs to JSON format suitable for Elastic Common Schema (ECS). After this, you can save this file using indexer.365.save

# Jobs

  • indexer.365.mailboxaudit: Convert results from office365.MailBoxAudit into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.messagetrace: Convert results from office365.MessageTrace into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.adminauditlogs: Convert results from office365.AdminAuditLogs into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.azureadauditlogs: Convert results from office365.AzureADAuditLogs into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.unifiedauditlogs: Convert results from office365.AuditRecords into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.interactive: Convert results from office365.InteractiveSignins into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.noninteractive: Convert results from office365.NonInteractiveSignins into a JSON suitable to be sent to ElasticSearch using indexer.save
  • indexer.365.save: Save all events generated by "indexer.365.all" to Elastic.

# Configurable parameters

Parameter Description Default
index_name Name of the indice in ElasticSearch SOURCE-microsoft365

# Job indexer.365.save

Save all events generated by "indexer.365.all" to Elastic.

You can save to Elastic just one file by setting the path:

rvt2 -j indexer.365.sav full/path/to/myfile.json

You can define the location of the elasticsearch server and username/password using:

--globals indexer:es_hosts="http://localhost:9200" --globals indexer:es_username=USERNAME --globals indexer:es_password=PASSWORD

# Configurable parameters

Parameter Description Default
path Glob pattern to find all events files previously generated. ``
restartable In True, the index can be restarted from an error. Use with care! False
mapping Path to the file describing the mapping of fields to ElasticSearch. The mapping can only be used when the index is created. ./plugins/indexer/ecs-settings.json
index_name Index name in ElasticSearch. If index does not exists, create it. Must coincide with the index name defined in every json processed SOURCE-microsoft365
cooloff_every After this number of seconds, wait cooloff_seconds. 10000
cooloff_seconds Seconds to wait to cool off ElasticSearch. 1

WARNING

This chapter was created automatically using autodoc.sh. Do not modify manually this file.