skusa108

Does anyone have guidance on accessing the extracted text through the Autopsy API? Specifically, I'm writing a report module and would like to include the extracted text for files included in the report, when available.

I'm using the Python API, so for example, I'm grabbing a file by ID:

item = sleuthkitCase.getAbstractFileById(fileID)

From there, I'm currently using the getCrtimeAsDate(), getAtimeAsDate(), and getMtimeAsDate() functions to retrieve the object's metadata. But I'd also like to get at the extracted text, and I don't see any obvious way to do so. It could be that I'm just completely missing something, though.

Any help is appreciated!
Re: API Access to Extracted Text

carrier

The best reference for this is this file:

https://github.com/sleuthkit/autopsy/bl ... arkup.java

Specifically, the getSolrContent() method.
Overall, the code needs to look _something_ like this:

Server solrServer = KeywordSearch.getServer();
String content = solrServer.getSolrContent(currentContent, chunkId);

(though in Python and not Java)

This is not well documented, but probably should be. We have a bunch of modules that do this type of thing.

You can get the number of chunks with this method:


A chunk is a page. Large files are broken up into smaller pages.
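
Since the text comes back one chunk at a time, a report module would probably need to loop over the chunks and join them. Below is a rough Python sketch of that loop, not tested against Autopsy itself. The helper name collect_extracted_text, the num_chunks argument, and the FakeServer class are made up for illustration; the only call taken from the Java snippet above is getSolrContent(content, chunkId). Chunk numbering starting at 1 is an assumption, so check it against the Autopsy source and adjust first_chunk_id if needed.

```python
def collect_extracted_text(solr_server, content, num_chunks, first_chunk_id=1):
    """Join the extracted text of every chunk (page) of one file.

    solr_server    -- object exposing getSolrContent(content, chunk_id)
    content        -- the file object (e.g. what getAbstractFileById returned)
    num_chunks     -- number of chunks, obtained separately from the server
    first_chunk_id -- ASSUMPTION: chunk ids start at 1
    """
    pages = []
    for chunk_id in range(first_chunk_id, first_chunk_id + num_chunks):
        text = solr_server.getSolrContent(content, chunk_id)
        if text:  # skip chunks that returned no text
            pages.append(text)
    return "\n".join(pages)


class FakeServer(object):
    """Hypothetical stand-in so the sketch can run outside Autopsy."""

    def __init__(self, pages):
        self.pages = pages  # dict mapping chunk id -> text

    def getSolrContent(self, content, chunk_id):
        return self.pages.get(chunk_id)


fake = FakeServer({1: "first page", 2: "second page"})
combined = collect_extracted_text(fake, content=None, num_chunks=2)
print(combined)
```

Inside Autopsy you would pass the object returned by KeywordSearch.getServer() instead of FakeServer, and the file you fetched with getAbstractFileById as content.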
