# Section indexer
This plugin parses files using Tika and indexes documents in ElasticSearch.
You can use this plugin:
- To parse all documents in a directory with Tika and index the results in Elastic. This is the main use of the classes in this plugin.
- To parse documents with Tika and do something else with the result. For example, show a document's metadata on the screen.
- To index the output of other RVT2 modules. For example, you can index the output of the PST parser.
# Running
If you use the Tika module, you must run Tika in server mode by running run.sh inside the $RVT2_HOME/external_tools/tika directory. The first time you run this command, it will download Tika.
If you use the ElasticSearch indexer, you'll need an ElasticSearch >=6 server somewhere in the network. In some cases, ElasticSearch might need a special file system configuration. Also, if you plan to use the rvt2-analyzer, ElasticSearch must allow CORS requests at least from the domain of the analyzer. An example script to run ElasticSearch can be found inside the directory $RVT2_HOME/external_tools/elastic.
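To allow the analyzer's CORS requests, settings along these lines in elasticsearch.yml are usually enough. The origin shown is only an example; adjust it to wherever the analyzer is actually served from:

```yaml
# elasticsearch.yml — example CORS settings for the rvt2-analyzer
http.cors.enabled: true
# allow requests from the domain serving the analyzer (example origin)
http.cors.allow-origin: "http://localhost:8080"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length
```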
# Jobs
indexer.parse_file
: Parse a file and show the result in the standard output. Use for debugging.

indexer.parse_directory
: Parse a directory and show the result in the standard output. Use for debugging.

indexer.directory
: Parse a directory and save in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json.

indexer.save
: Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.

indexer.save_directory
: Run indexer.directory and then indexer.save with default parameters.

indexer.convert_json
: Convert a JSON file to a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.convert_csv
: Convert a CSV file to a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.query_and_tag
: Query elastic, select all related documents (containers, attachments...) and tag all of them. You must indexer.save the output.

indexer.export
: Query elastic, select all documents matching a query and export them to a JSON.

indexer.tag_and_export
: Runs indexer.query_and_tag, indexer.save and indexer.export.

indexer.blind_searches
: Blind searches on a parsed JSON file, result from indexer.save.

indexer.index_timeline_body
: Index a BODY file provided in the path.

indexer.export_pst
: Export contents of every pst or ost file found in a source using pffexport.

indexer.pst
: Parse PST files previously exported with indexer.export_pst.

indexer.mails
: Export, parse and characterize contents of every pst or ost file found in a source. Runs export_pst, pst and characterize_mails.

indexer.pst_item2eml
: Convert a message extracted from a pst to an eml file.

indexer.365.all
: Adapt Microsoft 365 parsed logs to a JSON format suitable for the Elastic Common Schema (ECS).

indexer.365.save
: Save all events generated by "indexer.365.all" to Elastic.
# Job indexer.parse_file
Parse a file and show the result in the standard output. Use for debugging.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
only_root | Parse only the root file | False |
# Job indexer.parse_directory
Parse a directory and show the result in the standard output. Use for debugging.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
filter | List of file categories to parse. If not provided, parse all files. Predefined categories can be found in ./conf/file_categories.cfg configuration file | `` |
# Job indexer.directory
Parse a directory and save in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json.
This file is compatible with indexers such as elasticdump, but you will likely prefer using indexer.save.
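As an illustration of the kind of file bulk importers such as elasticdump consume, the sketch below writes one JSON document per line with `_index`, `_id` and `_source` fields. This is a minimal, hypothetical example; the exact fields emitted by indexer.directory may differ.

```python
import json

def to_dump_lines(index_name, docs):
    """Serialize documents as newline-delimited JSON, one per line,
    in the shape typically accepted by bulk importers such as elasticdump."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({
            "_index": index_name,
            "_id": doc_id,
            "_source": source,
        }))
    return "\n".join(lines)

# Example: two parsed files from a hypothetical source
docs = [
    ("f1", {"path": "/evidence/a.doc", "content_type": "application/msword"}),
    ("f2", {"path": "/evidence/b.pdf", "content_type": "application/pdf"}),
]
dump = to_dump_lines("source01", docs)
```

Each line is an independent JSON object, so the file can be streamed document by document instead of loaded whole.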
# Configurable parameters
Parameter | Description | Default |
---|---|---|
path | The path to the directory to parse | `` |
outfile | Save the result of the parsing in this file | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json |
index_name | The name of the index to save the parsed files | SOURCE |
rvtindex | The name of the index to save metadata. Set to empty to not save metadata. | rvtindexer |
restartable | If True, parsing can be restarted from the last error. Use with care! | False |
filter | List of file categories to parse. If not provided, parse all files. Predefined categories can be found in ./conf/file_categories.cfg configuration file | `` |
# Job indexer.save
Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.
You can define the location of the elasticsearch server and username/password using:
--globals indexer:es_hosts="http://localhost:9200" --globals indexer:es_username=USERNAME --globals indexer:es_password=PASSWORD
# Configurable parameters
Parameter | Description | Default |
---|---|---|
path | The path to a JSON file output from indexer.directory. | `` |
restartable | If True, the index can be restarted from an error. Use with care! | False |
mapping | Path to the file describing the mapping of fields to ElasticSearch. The mapping can only be used when the index is created. | ./plugins/indexer/es-settings.json |
index_name | Index name in ElasticSearch. If the index does not exist, create it. | SOURCE |
cooloff_every | After this number of seconds, wait cooloff_seconds. | 300 |
cooloff_seconds | Seconds to wait to cool off ElasticSearch. | 5 |
tabs | Space separated tabs to add to the rvt2-analyzer. Available tabs can be found at "./plugins/indexer/analyzer-tabs.json". Examples: files, emails, apache, iis. | `` |
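The cool-off parameters can be pictured with a small throttling helper: after roughly cooloff_every seconds of work, pause for cooloff_seconds so ElasticSearch can catch up. This is only a sketch of the idea, not the plugin's actual implementation; the clock and sleep functions are injected so the behavior can be tested.

```python
import time

class CoolOff:
    """Pause for `cooloff_seconds` once `cooloff_every` seconds have elapsed."""

    def __init__(self, cooloff_every=300, cooloff_seconds=5,
                 clock=time.monotonic, sleep=time.sleep):
        self.cooloff_every = cooloff_every
        self.cooloff_seconds = cooloff_seconds
        self.clock = clock
        self.sleep = sleep
        self.last_pause = clock()
        self.pauses = 0

    def tick(self):
        """Call between batches; sleeps when the cool-off period is due."""
        now = self.clock()
        if now - self.last_pause >= self.cooloff_every:
            self.sleep(self.cooloff_seconds)
            self.last_pause = self.clock()
            self.pauses += 1
```

With the defaults above, a long-running indexing loop would pause for 5 seconds roughly every 5 minutes.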
# Job indexer.save_directory
Run indexer.directory and then indexer.save with default parameters.
# Jobs
indexer.directory
: Parse a directory and save in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json.

indexer.save
: Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.
# Job indexer.convert_json
Convert a JSON file to a JSON suitable to be sent to ElasticSearch using indexer.save
# Configurable parameters
Parameter | Description | Default |
---|---|---|
path | the JSON file to convert. It must be provided | `` |
outfile | path to the generated json file | output.json |
disableCommonFields | | True |
generate_id | | False |
index_name | name of the destination index at Elastic | SOURCE |
# Job indexer.convert_csv
Convert a CSV file to a JSON suitable to be sent to ElasticSearch using indexer.save
- path: the CSV file to convert. It must be provided
- output: sets the outfile parameter. Default value: output.json
# Configurable parameters
Parameter | Description | Default |
---|---|---|
path | the CSV file to convert. It must be provided | `` |
outfile | path to the generated json file | output.json |
delimiter | | ; |
disableCommonFields | | True |
generate_id | | False |
date_fields | | @timestamp |
index_name | name of the destination index at Elastic | SOURCE |
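The conversion performed by indexer.convert_csv can be pictured roughly as follows: read CSV rows using the configured delimiter and emit one JSON document per line, targeted at index_name. This is an illustrative sketch, not the plugin's code; the field names in the sample are hypothetical.

```python
import csv
import io
import json

def csv_to_es_json(csv_text, index_name, delimiter=";"):
    """Turn CSV rows into newline-delimited JSON documents for an index."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    out = []
    for row in reader:
        # each row becomes one document destined for index_name
        out.append(json.dumps({"_index": index_name, "_source": row}))
    return "\n".join(out)

# Hypothetical CSV using the default ';' delimiter
sample = "user;action\nalice;login\nbob;logout\n"
converted = csv_to_es_json(sample, "source01")
```

The resulting file can then be sent to ElasticSearch with indexer.save.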
# Job indexer.query_and_tag
Query elastic, select all related documents (containers, attachments..) and tag all of them. You must indexer.save the output
# Configurable parameters
Parameter | Description | Default |
---|---|---|
index_name | The name of the index to query. The name will be converted to lower case, since ES only accepts lower case names. Wildcards can be used | SOURCE |
outfile | The output of the job. You must indexer.save this file | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.json |
query | The query to run in lucene language | * |
tag | The name of the tag | EXPORT |
tag_field | Save tags in this field. Use one of the registered tag fields in ElasticSearchBulkIndex (hints: tags-new or blindsearches-new) | tags-new |
max_results | If the query will return more than this number of results, stop. | 1000 |
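The tagging step can be sketched as: for every document matched by the query, append the tag to the configured tag_field without duplicating it, and stop when more than max_results documents match. This is hypothetical, simplified logic, not the plugin's implementation:

```python
def tag_documents(docs, tag="EXPORT", tag_field="tags-new", max_results=1000):
    """Add `tag` to `tag_field` of every document.

    Raises ValueError when the result set exceeds max_results,
    mirroring how the job stops on too many results.
    """
    if len(docs) > max_results:
        raise ValueError("too many results: %d > %d" % (len(docs), max_results))
    for doc in docs:
        tags = doc.setdefault(tag_field, [])
        if tag not in tags:
            tags.append(tag)  # never duplicate an existing tag
    return docs

docs = [{"path": "/a"}, {"path": "/b", "tags-new": ["EXPORT"]}]
tagged = tag_documents(docs)
```

The tagged documents still have to be sent back to ElasticSearch, which is why the job's output must be processed with indexer.save.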
# Job indexer.export
Query elastic, select all documents matching a query and export them to a JSON.
The target JSON file may then be saved to any ElasticSearch server using indexer.save.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
outfile | The output of the job. You must indexer.save this file | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/export.json |
index_name | The name of the index to query. The name will be converted to lower case, since ES only accepts lower case names. Wildcards can be used | SOURCE |
query | The query to run in lucene language | * |
max_results | If the query will return more than this number of results, stop | 1000 |
# Job indexer.tag_and_export
Runs indexer.query_and_tag, indexer.save and indexer.export.
In order to save the results to Elastic, you must run indexer.save to any desired ES_HOST on MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/exported.json
# Jobs
indexer.query_and_tag
: Query elastic, select all related documents (containers, attachments...) and tag all of them. You must indexer.save the output.

indexer.save
: Save a previously indexed database in an ElasticSearch server. Alternative to elasticdump.

indexer.export
: Query elastic, select all documents matching a query and export them to a JSON.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
interfile | | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/original.json |
query | The query to run. See indexer.query_and_tag. | * |
index_name | The name of the index to query | SOURCE |
tag | The name of the tag. See indexer.query_and_tag. | EXPORT |
outfile | The output of the job. You must indexer.save this file | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/exported.json |
# Job indexer.blind_searches
Blind searches on a parsed JSON file, result from indexer.save.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
keyword_file | The name of the keyword file in the searches directory. | MORGUE/CLIENT/CASENAME/searches_files/kw |
outfile | Save the results to this file, ready to be used with indexer.save | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.blindsearches.json |
# Job indexer.index_timeline_body
Index a BODY file provided in the path.
Since the _id for each file is shared with indexer.directory results, information from both the timeline and Tika parsing may be combined and updated.
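Because both jobs share the same `_id`, indexing the BODY file acts as a partial update: timeline fields are merged into the document that Tika parsing already created. A rough sketch of that merge semantics, with hypothetical field names:

```python
def merge_by_id(index, updates):
    """Merge update documents into an index (a dict keyed by _id).

    Mimics ElasticSearch partial updates: existing fields are kept
    unless the update provides a new value for them.
    """
    for doc_id, fields in updates.items():
        index.setdefault(doc_id, {}).update(fields)
    return index

# Tika parsing result, then timeline (BODY) information for the same _id
index = {"f1": {"path": "/evidence/a.doc", "content": "hello"}}
merge_by_id(index, {"f1": {"mtime": "2020-01-01T00:00:00"}})
```

After the merge, a single document carries both the parsed content and the timeline timestamps.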
# Configurable parameters
Parameter | Description | Default |
---|---|---|
outfile | | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.timeline.json |
# Job indexer.export_pst
Export contents of every pst or ost file found in a source using pffexport. This job depends on plugins.common and the successful generation of alloc_files.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
outfile | A CSV containing the path to the actual pstfiles and their reference | MORGUE/CLIENT/CASENAME/SOURCE/output/mail/pstfiles.csv |
outdir | Export the contents of PST files to this directory | MORGUE/CLIENT/CASENAME/SOURCE/output/mail/ |
# Job indexer.pst
Parse PST files previously exported with indexer.export_pst. This module also calls indexer.pst.secondary.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
outfile | A JSON file with all the information in the mailboxes, ready to be imported into ElasticSearch | MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.pst.json |
path | An absolute path to pstfiles.csv, output from indexer.export_pst | MORGUE/CLIENT/CASENAME/SOURCE/output/mail/pstfiles.csv |
# Job indexer.mails
Export, parse and characterize contents of every pst or ost file found in a source. Runs export_pst, pst and characterize_mails.
All PST and OST files in the source are exported to MORGUE/CLIENT/CASENAME/SOURCE/output/mail, and a CSV file describing the PSTs will be created there. The JSON with the parsed mails will be in MORGUE/CLIENT/CASENAME/SOURCE/output/indexer/SOURCE.pst.json.
# Jobs
indexer.export_pst
: Export contents of every pst or ost file found in a source using pffexport.

indexer.pst
: Parse PST files previously exported with indexer.export_pst.

characterize_mails
: Create a basic summary about mail accounts from a source.
# Job indexer.pst_item2eml
Convert a message extracted from a pst to an eml file.
- path: the path to a Message folder
# Job indexer.365.all
Adapt Microsoft 365 parsed logs to JSON format suitable for Elastic Common Schema (ECS). After this, you can save this file using indexer.365.save
# Jobs
indexer.365.mailboxaudit
: Convert results from office365.MailBoxAudit into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.messagetrace
: Convert results from office365.MessageTrace into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.adminauditlogs
: Convert results from office365.AdminAuditLogs into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.azureadauditlogs
: Convert results from office365.AzureADAuditLogs into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.unifiedauditlogs
: Convert results from office365.AuditRecords into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.interactive
: Convert results from office365.InteractiveSignins into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.noninteractive
: Convert results from office365.NonInteractiveSignins into a JSON suitable to be sent to ElasticSearch using indexer.save.

indexer.365.save
: Save all events generated by "indexer.365.all" to Elastic.
# Configurable parameters
Parameter | Description | Default |
---|---|---|
index_name | Name of the index in ElasticSearch | SOURCE-microsoft365 |
# Job indexer.365.save
Save all events generated by "indexer.365.all" to Elastic.
You can save just one file to Elastic by setting the path:
rvt2 -j indexer.365.save full/path/to/myfile.json
You can define the location of the elasticsearch server and username/password using:
--globals indexer:es_hosts="http://localhost:9200" --globals indexer:es_username=USERNAME --globals indexer:es_password=PASSWORD
# Configurable parameters
Parameter | Description | Default |
---|---|---|
path | Glob pattern to find all events files previously generated. | `` |
restartable | If True, the index can be restarted from an error. Use with care! | False |
mapping | Path to the file describing the mapping of fields to ElasticSearch. The mapping can only be used when the index is created. | ./plugins/indexer/ecs-settings.json |
index_name | Index name in ElasticSearch. If the index does not exist, create it. Must coincide with the index name defined in every JSON processed | SOURCE-microsoft365 |
cooloff_every | After this number of seconds, wait cooloff_seconds. | 10000 |
cooloff_seconds | Seconds to wait to cool off ElasticSearch. | 1 |
WARNING
This chapter was created automatically using autodoc.sh. Do not modify this file manually.