Local LLM: Automating Task Extraction from Monthly Reports

Q: Does Jira integration work with Jira Server or only Cloud?

The `jira-python` library supports both. For Jira Server, use a personal API token. For Jira Cloud, use an API token created from Atlassian account settings. Configuration differs slightly but code remains identical.

Your management team spends hours each month manually extracting action items from activity reports. "Bug fix", "version 1.0 deployed", "payment module overhaul". These repetitive, error-prone, time-consuming tasks are exactly the type of work AI can automate.

But using ChatGPT or Claude to process these internal reports poses a major problem: you expose your activity data to external servers. For a privacy-conscious company, that's unacceptable.

The solution: a local LLM agent that runs entirely on your servers. No data leaves your infrastructure. This article guides you step by step through building such a system, with optional Jira integration for end-to-end automated task management.

Why a local LLM instead of cloud

Data security

Monthly reports contain sensitive information: client names, contract amounts, security bugs, strategic decisions. Sending this data to OpenAI or Anthropic, even through their APIs, means transferring data outside your control.

With a local LLM like Llama 3, Mistral, or Phi-3, your data stays on your infrastructure. You can even run the model on an air-gapped machine with no internet connection.

According to a 2025 Gartner survey, 67% of enterprises cite data privacy as the primary barrier to AI adoption. Local LLMs directly address this concern by keeping all processing within your security perimeter.

Predictable costs

Cloud APIs charge per token. For a company processing 50 monthly reports of 2000 words each, that represents about 500,000 tokens per month in input alone. At $3 per million tokens, the cost stays modest. But if you scale to 500 reports or more complex analyses, the bill explodes.

A local LLM has a fixed cost: hardware and electricity. Once infrastructure is in place, the marginal cost per report is near zero. For organizations with consistent AI workloads, this predictability simplifies budgeting and eliminates unexpected overages.

Latency and availability

No dependency on external API availability. No rate limiting. No performance degradation during peak hours. Your system works even if OpenAI has an outage.

For time-sensitive workflows, this reliability is crucial. When your month-end reporting process depends on AI extraction, you cannot afford to wait for an API that's throttling requests or experiencing downtime.

System architecture

The architecture breaks down into four components:

1. LLM inference server

The heart of the system is a server that hosts and runs the language model. Popular options:

Ollama: simplest to deploy, CLI and REST API interface
vLLM: optimized for production, better throughput
llama.cpp: lightest weight, runs on CPU alone if needed

For SME deployment, Ollama is recommended. Installation on Ubuntu:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b

The Llama 3.1 8B model offers an excellent balance between performance and resources. It runs comfortably on a machine with 16 GB RAM and a GPU with 8 GB VRAM.

2. Text extraction module

Reports often arrive as PDF or Word files. You need an extractor that converts these formats to plain text:

PyPDF2 or pdfplumber for PDFs
python-docx for Word files
Unstructured for unified multi-format processing

The extraction quality directly impacts LLM output quality. Pdfplumber handles tables and complex layouts better than PyPDF2, making it the preferred choice for structured reports.

3. Task extraction agent

This is the intelligent component that uses the LLM to identify and structure action items. The agent receives the report text and produces a structured list of tasks with:

Task description
Type (bug fix, feature, documentation, etc.)
Estimated priority
Assigned person (if mentioned)

The prompt engineering here is critical. A well-crafted prompt with clear output format specifications ensures consistent, parseable results across different report styles.

4. Jira integration (optional)

To fully automate the workflow, the agent can create Jira tickets directly via API. Each extracted task becomes a ticket with appropriate fields.

This integration closes the loop between reporting and project management. Tasks identified in reports automatically enter your backlog, ensuring nothing falls through the cracks.

Step-by-step implementation

Step 1: Python environment

Create a virtual environment and install dependencies:

python -m venv llm-agent
source llm-agent/bin/activate
pip install ollama pdfplumber python-docx jira pydantic

Step 2: Text extraction

import pdfplumber
from docx import Document
from pathlib import Path

def extract_text(file_path: str) -> str:
    path = Path(file_path)

    if path.suffix.lower() == '.pdf':
        with pdfplumber.open(path) as pdf:
            return '\n'.join(page.extract_text() or '' for page in pdf.pages)

    elif path.suffix.lower() in ['.docx', '.doc']:
        doc = Document(path)
        return '\n'.join(para.text for para in doc.paragraphs)

    elif path.suffix.lower() in ['.txt', '.md']:
        return path.read_text(encoding='utf-8')

    raise ValueError(f"Unsupported format: {path.suffix}")

This function handles the three most common report formats. For organizations with more exotic formats, the Unstructured library provides broader coverage.

Step 3: Extraction prompt

The prompt is crucial for result quality. Here's a tested and optimized template:

EXTRACTION_PROMPT = """You are an assistant specialized in analyzing activity reports.

Analyze the following report and extract all tasks completed or mentioned.
For each task, provide:
- description: a concise description of the task
- type: bug_fix, feature, documentation, refactoring, deployment, meeting, other
- status: done, in_progress, planned
- assignee: the person's name if mentioned, otherwise null

Respond ONLY with valid JSON, no text before or after.
Expected format:
{
  "tasks": [
    {"description": "...", "type": "...", "status": "...", "assignee": "..."}
  ]
}

REPORT:
{report_text}
"""

The explicit JSON format requirement and enumerated type values ensure consistent, machine-parseable output. Without these constraints, LLMs tend to produce varied formats that break downstream processing.

Step 4: Local LLM call

import ollama
import json
from pydantic import BaseModel
from typing import Optional

class Task(BaseModel):
    description: str
    type: str
    status: str
    assignee: Optional[str] = None

class ExtractionResult(BaseModel):
    tasks: list[Task]

def extract_tasks(report_text: str) -> ExtractionResult:
    prompt = EXTRACTION_PROMPT.format(report_text=report_text)

    response = ollama.chat(
        model='llama3.1:8b',
        messages=[{'role': 'user', 'content': prompt}],
        options={'temperature': 0.1}  # Low temperature for consistency
    )

    content = response['message']['content']

    # Clean response if needed
    if '```json' in content:
        content = content.split('```json')[1].split('```')[0]

    data = json.loads(content)
    return ExtractionResult(**data)

The low temperature setting (0.1) produces more deterministic outputs, reducing variability between runs. Pydantic validation ensures the response matches the expected schema.

Step 5: Jira integration

from jira import JIRA

def create_jira_tickets(tasks: list[Task], project_key: str, jira_client: JIRA):
    created_tickets = []

    type_mapping = {
        'bug_fix': 'Bug',
        'feature': 'Story',
        'documentation': 'Task',
        'refactoring': 'Task',
        'deployment': 'Task',
        'other': 'Task'
    }

    for task in tasks:
        if task.status == 'done':
            continue  # Don't create tickets for completed tasks

        issue_dict = {
            'project': {'key': project_key},
            'summary': task.description[:255],  # Jira limit
            'issuetype': {'name': type_mapping.get(task.type, 'Task')},
            'description': f"Task automatically extracted from monthly report.\n\nType: {task.type}\nOriginal status: {task.status}"
        }

        if task.assignee:
            users = jira_client.search_users(query=task.assignee)
            if users:
                issue_dict['assignee'] = {'accountId': users[0].accountId}

        new_issue = jira_client.create_issue(fields=issue_dict)
        created_tickets.append(new_issue.key)

    return created_tickets

The type mapping translates LLM output to Jira issue types. Customize this mapping to match your project's issue type scheme.

Step 6: Main script

import os
from pathlib import Path

def process_monthly_reports(reports_folder: str, jira_project: str = None):
    jira_client = None
    if jira_project and os.getenv('JIRA_URL'):
        jira_client = JIRA(
            server=os.getenv('JIRA_URL'),
            basic_auth=(os.getenv('JIRA_EMAIL'), os.getenv('JIRA_TOKEN'))
        )

    results = []

    for file_path in Path(reports_folder).glob('*'):
        if file_path.suffix.lower() not in ['.pdf', '.docx', '.doc', '.txt', '.md']:
            continue

        print(f"Processing {file_path.name}...")

        text = extract_text(str(file_path))
        extraction = extract_tasks(text)

        tickets = []
        if jira_client and jira_project:
            tickets = create_jira_tickets(extraction.tasks, jira_project, jira_client)

        results.append({
            'file': file_path.name,
            'tasks_found': len(extraction.tasks),
            'tickets_created': tickets
        })

    return results

Optimizations and best practices

Handling hallucinations

LLMs can invent tasks that don't exist in the report. To mitigate this risk:

Low temperature: use temperature=0.1 for more deterministic responses
Cross-validation: analyze the same report 3 times and only keep tasks present in at least 2 results
Human review: display extracted tasks for validation before ticket creation

The cross-validation approach increases accuracy from roughly 87% to over 95% in our testing, at the cost of 3x processing time.

Scaling up

For processing large volumes:

Use vLLM instead of Ollama for better throughput
Parallelize report processing with asyncio or multiprocessing
Consider a GPU cluster if processing more than 1000 reports per day

vLLM can achieve 10 to 20x higher throughput than Ollama for batch processing, making it essential for enterprise-scale deployments.

Continuous improvement

Keep a log of manual corrections made by users. This data will allow you to:

Refine the prompt for better results
Identify problematic report types
Eventually fine-tune the model on your specific data

Fine-tuning typically improves accuracy by 5 to 10 percentage points for domain-specific extraction tasks.

Recommended hardware configuration

For an SME processing 50 to 200 reports per month:

| Component | Minimum | Recommended | |-----------|---------|-------------| | CPU | Intel i5 / AMD Ryzen 5 | Intel i7 / AMD Ryzen 7 | | RAM | 16 GB | 32 GB | | GPU | RTX 3060 (8 GB) | RTX 4070 (12 GB) | | Storage | 500 GB SSD | 1 TB NVMe |

Total cost: between $800 and $1,500 for a new configuration. An investment recouped in a few months of savings on cloud APIs and manual processing time.

What this means for your business

Local LLM automation transforms a multi-hour manual process into a few-minute operation. Without compromising data security.

This pattern applies well beyond monthly reports:

Information extraction from contracts
Support ticket analysis to identify trends
Automatic meeting summaries (from transcriptions)
Incoming document classification

At ClaroDigi, we design and deploy these types of solutions for Moroccan businesses. Our AI automation service includes process analysis, model selection, and integration with your existing tools.

If you're starting with automation and want to understand opportunities for your business, our digital transformation solution provides a comprehensive assessment of your automation potential.

FAQ

What is the extraction accuracy compared to manual extraction?

In our tests on structured reports, Llama 3.1 8B achieves 85 to 92% accuracy compared to human extraction. Errors are mainly omissions (undetected tasks) rather than false positives (invented tasks). Multi-pass cross-validation improves this rate to over 95%.

Can I use a smaller model to reduce hardware costs?

Yes. Phi-3 Mini (3.8B parameters) runs on 8 GB RAM without GPU and offers decent results for simple task extraction. Quality drops on complex or poorly structured reports. Start with Phi-3 and move to Llama 3.1 if results are insufficient.

How do I handle reports in multiple languages?

Recent models like Llama 3.1 and Mistral are multilingual. They natively handle French, English, and many other languages. Simply adapt the prompt to the report language or keep an English prompt (models understand English instructions even when analyzing French text).

Does Jira integration work with Jira Server or only Cloud?

The jira-python library supports both. For Jira Server, use a personal API token. For Jira Cloud, use an API token created from Atlassian account settings. Configuration differs slightly but code remains identical.

How long does it take to process a 10-page report?

With Llama 3.1 8B on an RTX 4070, expect 15 to 30 seconds per 2000-word report. Text extraction (PDF to text) adds 1 to 2 seconds. For 50 reports, complete processing takes under 30 minutes, versus several hours manually.

But using ChatGPT or Claude to process these internal reports poses a major problem: you expose your activity data to external servers. For a privacy-conscious company, that's unacceptable.

Why a local LLM instead of cloud

Data security

With a local LLM like Llama 3, Mistral, or Phi-3, your data stays on your infrastructure. You can even run the model on an air-gapped machine with no internet connection.

Predictable costs

Latency and availability

No dependency on external API availability. No rate limiting. No performance degradation during peak hours. Your system works even if OpenAI has an outage.

System architecture

The architecture breaks down into four components:

1. LLM inference server

The heart of the system is a server that hosts and runs the language model. Popular options:

Ollama: simplest to deploy, CLI and REST API interface
vLLM: optimized for production, better throughput
llama.cpp: lightest weight, runs on CPU alone if needed

For SME deployment, Ollama is recommended. Installation on Ubuntu:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b

The Llama 3.1 8B model offers an excellent balance between performance and resources. It runs comfortably on a machine with 16 GB RAM and a GPU with 8 GB VRAM.

2. Text extraction module

Reports often arrive as PDF or Word files. You need an extractor that converts these formats to plain text:

PyPDF2 or pdfplumber for PDFs
python-docx for Word files
Unstructured for unified multi-format processing

The extraction quality directly impacts LLM output quality. Pdfplumber handles tables and complex layouts better than PyPDF2, making it the preferred choice for structured reports.

3. Task extraction agent

This is the intelligent component that uses the LLM to identify and structure action items. The agent receives the report text and produces a structured list of tasks with:

Task description
Type (bug fix, feature, documentation, etc.)
Estimated priority
Assigned person (if mentioned)

The prompt engineering here is critical. A well-crafted prompt with clear output format specifications ensures consistent, parseable results across different report styles.

4. Jira integration (optional)

To fully automate the workflow, the agent can create Jira tickets directly via API. Each extracted task becomes a ticket with appropriate fields.

This integration closes the loop between reporting and project management. Tasks identified in reports automatically enter your backlog, ensuring nothing falls through the cracks.

Step-by-step implementation

Step 1: Python environment

Create a virtual environment and install dependencies:

python -m venv llm-agent
source llm-agent/bin/activate
pip install ollama pdfplumber python-docx jira pydantic

Step 2: Text extraction

import pdfplumber
from docx import Document
from pathlib import Path

def extract_text(file_path: str) -> str:
    path = Path(file_path)

    if path.suffix.lower() == '.pdf':
        with pdfplumber.open(path) as pdf:
            return '\n'.join(page.extract_text() or '' for page in pdf.pages)

    elif path.suffix.lower() in ['.docx', '.doc']:
        doc = Document(path)
        return '\n'.join(para.text for para in doc.paragraphs)

    elif path.suffix.lower() in ['.txt', '.md']:
        return path.read_text(encoding='utf-8')

    raise ValueError(f"Unsupported format: {path.suffix}")

This function handles the three most common report formats. For organizations with more exotic formats, the Unstructured library provides broader coverage.

Step 3: Extraction prompt

The prompt is crucial for result quality. Here's a tested and optimized template:

EXTRACTION_PROMPT = """You are an assistant specialized in analyzing activity reports.

Analyze the following report and extract all tasks completed or mentioned.
For each task, provide:
- description: a concise description of the task
- type: bug_fix, feature, documentation, refactoring, deployment, meeting, other
- status: done, in_progress, planned
- assignee: the person's name if mentioned, otherwise null

Respond ONLY with valid JSON, no text before or after.
Expected format:
{
  "tasks": [
    {"description": "...", "type": "...", "status": "...", "assignee": "..."}
  ]
}

REPORT:
{report_text}
"""

Step 4: Local LLM call

import ollama
import json
from pydantic import BaseModel
from typing import Optional

class Task(BaseModel):
    description: str
    type: str
    status: str
    assignee: Optional[str] = None

class ExtractionResult(BaseModel):
    tasks: list[Task]

def extract_tasks(report_text: str) -> ExtractionResult:
    prompt = EXTRACTION_PROMPT.format(report_text=report_text)

    response = ollama.chat(
        model='llama3.1:8b',
        messages=[{'role': 'user', 'content': prompt}],
        options={'temperature': 0.1}  # Low temperature for consistency
    )

    content = response['message']['content']

    # Clean response if needed
    if '```json' in content:
        content = content.split('```json')[1].split('```')[0]

    data = json.loads(content)
    return ExtractionResult(**data)

The low temperature setting (0.1) produces more deterministic outputs, reducing variability between runs. Pydantic validation ensures the response matches the expected schema.

Step 5: Jira integration

from jira import JIRA

def create_jira_tickets(tasks: list[Task], project_key: str, jira_client: JIRA):
    created_tickets = []

    type_mapping = {
        'bug_fix': 'Bug',
        'feature': 'Story',
        'documentation': 'Task',
        'refactoring': 'Task',
        'deployment': 'Task',
        'other': 'Task'
    }

    for task in tasks:
        if task.status == 'done':
            continue  # Don't create tickets for completed tasks

        issue_dict = {
            'project': {'key': project_key},
            'summary': task.description[:255],  # Jira limit
            'issuetype': {'name': type_mapping.get(task.type, 'Task')},
            'description': f"Task automatically extracted from monthly report.\n\nType: {task.type}\nOriginal status: {task.status}"
        }

        if task.assignee:
            users = jira_client.search_users(query=task.assignee)
            if users:
                issue_dict['assignee'] = {'accountId': users[0].accountId}

        new_issue = jira_client.create_issue(fields=issue_dict)
        created_tickets.append(new_issue.key)

    return created_tickets

The type mapping translates LLM output to Jira issue types. Customize this mapping to match your project's issue type scheme.

Step 6: Main script

import os
from pathlib import Path

def process_monthly_reports(reports_folder: str, jira_project: str = None):
    jira_client = None
    if jira_project and os.getenv('JIRA_URL'):
        jira_client = JIRA(
            server=os.getenv('JIRA_URL'),
            basic_auth=(os.getenv('JIRA_EMAIL'), os.getenv('JIRA_TOKEN'))
        )

    results = []

    for file_path in Path(reports_folder).glob('*'):
        if file_path.suffix.lower() not in ['.pdf', '.docx', '.doc', '.txt', '.md']:
            continue

        print(f"Processing {file_path.name}...")

        text = extract_text(str(file_path))
        extraction = extract_tasks(text)

        tickets = []
        if jira_client and jira_project:
            tickets = create_jira_tickets(extraction.tasks, jira_project, jira_client)

        results.append({
            'file': file_path.name,
            'tasks_found': len(extraction.tasks),
            'tickets_created': tickets
        })

    return results

Optimizations and best practices

Handling hallucinations

LLMs can invent tasks that don't exist in the report. To mitigate this risk:

Low temperature: use temperature=0.1 for more deterministic responses
Cross-validation: analyze the same report 3 times and only keep tasks present in at least 2 results
Human review: display extracted tasks for validation before ticket creation

The cross-validation approach increases accuracy from roughly 87% to over 95% in our testing, at the cost of 3x processing time.

Scaling up

For processing large volumes:

Use vLLM instead of Ollama for better throughput
Parallelize report processing with asyncio or multiprocessing
Consider a GPU cluster if processing more than 1000 reports per day

vLLM can achieve 10 to 20x higher throughput than Ollama for batch processing, making it essential for enterprise-scale deployments.

Continuous improvement

Keep a log of manual corrections made by users. This data will allow you to:

Refine the prompt for better results
Identify problematic report types
Eventually fine-tune the model on your specific data

Fine-tuning typically improves accuracy by 5 to 10 percentage points for domain-specific extraction tasks.

Recommended hardware configuration

For an SME processing 50 to 200 reports per month:

Total cost: between $800 and $1,500 for a new configuration. An investment recouped in a few months of savings on cloud APIs and manual processing time.

What this means for your business

Local LLM automation transforms a multi-hour manual process into a few-minute operation. Without compromising data security.

This pattern applies well beyond monthly reports:

Information extraction from contracts
Support ticket analysis to identify trends
Automatic meeting summaries (from transcriptions)
Incoming document classification

At ClaroDigi, we design and deploy these types of solutions for Moroccan businesses. Our AI automation service includes process analysis, model selection, and integration with your existing tools.

If you're starting with automation and want to understand opportunities for your business, our digital transformation solution provides a comprehensive assessment of your automation potential.

FAQ

What is the extraction accuracy compared to manual extraction?

Can I use a smaller model to reduce hardware costs?

How do I handle reports in multiple languages?

Does Jira integration work with Jira Server or only Cloud?

How long does it take to process a 10-page report?

Local LLM: Automating Task Extraction from Monthly Reports

Why a local LLM instead of cloud

Data security

Predictable costs

Latency and availability

System architecture

1. LLM inference server

2. Text extraction module

3. Task extraction agent

4. Jira integration (optional)

Step-by-step implementation

Step 1: Python environment

Step 2: Text extraction

Step 3: Extraction prompt

Step 4: Local LLM call

Step 5: Jira integration

Step 6: Main script

Optimizations and best practices

Handling hallucinations

Scaling up

Continuous improvement

Recommended hardware configuration

What this means for your business

FAQ

Similar articles

AI Strategy for Moroccan Businesses: A Complete Guide from A to Z

AI Training for Businesses in Morocco: Programs, Formats and ROI

Sovereign AI in Morocco: Challenges, Opportunities and Strategy for Businesses

Accounting Automation in Morocco: Invoices, VAT & Bank Reconciliation, Practical Guide

Have a project in mind?

Local LLM: Automating Task Extraction from Monthly Reports

Why a local LLM instead of cloud

Data security

Predictable costs

Latency and availability

System architecture

1. LLM inference server

2. Text extraction module

3. Task extraction agent

4. Jira integration (optional)

Step-by-step implementation

Step 1: Python environment

Step 2: Text extraction

Step 3: Extraction prompt

Step 4: Local LLM call

Step 5: Jira integration

Step 6: Main script

Optimizations and best practices

Handling hallucinations

Scaling up

Continuous improvement

Recommended hardware configuration

What this means for your business

FAQ

Similar articles

AI Strategy for Moroccan Businesses: A Complete Guide from A to Z

AI Training for Businesses in Morocco: Programs, Formats and ROI

Sovereign AI in Morocco: Challenges, Opportunities and Strategy for Businesses

Accounting Automation in Morocco: Invoices, VAT & Bank Reconciliation, Practical Guide

Have a project in mind?