Installation & Deployment¶
Prerequisites¶
- Python 3.11 or 3.12 (Ubuntu 24.04 LTS default: 3.12)
- PostgreSQL 16+ with pgvector extension
- Redis (for pipeline automation)
- Anthropic API key (Claude LLM)
- OntServe MCP server (port 8082) for ontology management
1. Clone and Install¶
git clone https://github.com/cr625/proethica.git
cd proethica
python3 -m venv venv-proethica
source venv-proethica/bin/activate
pip install -r requirements.txt
2. PostgreSQL Setup¶
Install PostgreSQL and pgvector¶
sudo apt-get update
sudo apt-get install -y postgresql postgresql-contrib postgresql-16-pgvector
sudo service postgresql start
Replace 16 in the pgvector package name with your installed PostgreSQL major version (run pg_lsclusters to check).
Create Database¶
-- As postgres user (sudo -u postgres psql):
CREATE DATABASE ai_ethical_dm;
\connect ai_ethical_dm
CREATE EXTENSION IF NOT EXISTS vector;
Tables are created automatically by SQLAlchemy on first run.
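The database created here is later referenced through SQLALCHEMY_DATABASE_URI (see Configuration), which embeds the user and password in a URL. Characters such as @ or / in a password must be percent-encoded or the URL will be parsed incorrectly. A minimal sketch of assembling the URI safely; build_db_uri and the example credentials are hypothetical and not part of ProEthica:

```python
from urllib.parse import quote


def build_db_uri(user: str, password: str, host: str = "localhost",
                 port: int = 5432, database: str = "ai_ethical_dm") -> str:
    """Return a PostgreSQL URI with the user and password percent-encoded.

    Hypothetical helper for illustration only; it is not a ProEthica API.
    """
    # safe="" makes quote() encode every reserved character, including "/".
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database}"


# Placeholder credentials, not real values.
print(build_db_uri("proethica_user", "p@ss/word"))
```

The printed value can be pasted into .env as the SQLALCHEMY_DATABASE_URI setting.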
3. Redis Setup¶
Redis serves as the message broker for Celery background tasks (pipeline automation).
sudo apt-get install -y redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server
Verify with:
redis-cli ping
# Should return: PONG
ProEthica uses Redis DB 1 (redis://localhost:6379/1) to avoid conflicts with other applications.
Redis is optional for basic usage (manual single-case extraction works without it), but required for batch pipeline automation.
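The trailing /1 in redis://localhost:6379/1 is what selects logical database 1; a URL without a path falls back to database 0. The sketch below only illustrates how such a URL breaks down. parse_redis_url is a hypothetical helper, and ProEthica's own handling of the URL may differ:

```python
from urllib.parse import urlparse


def parse_redis_url(url: str) -> dict:
    """Split a redis:// URL into host, port, and logical database index."""
    parts = urlparse(url)
    if parts.scheme != "redis":
        raise ValueError(f"expected a redis:// URL, got {url!r}")
    # The path component ("/1") carries the database index; empty means DB 0.
    db_text = parts.path.lstrip("/")
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 6379,
        "db": int(db_text) if db_text else 0,
    }


print(parse_redis_url("redis://localhost:6379/1"))
```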
4. Configuration¶
Create .env in the project root (use .env.production.example as a template):
# Database
SQLALCHEMY_DATABASE_URI=postgresql://<user>:<password>@localhost:5432/ai_ethical_dm
# API Keys
ANTHROPIC_API_KEY=<your-api-key>
# Flask
SECRET_KEY=your-secret-key-here
FLASK_ENV=development
# OntServe Integration
ONTSERVE_MCP_URL=http://localhost:8082
ONTSERVE_WEB_URL=http://localhost:5003
See Settings for the full list of environment variables.
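Before the first run it can help to confirm that .env no longer contains template placeholders. The following is a rough sketch, not a ProEthica tool: the list of required keys is an assumption based on the template above, and the placeholder test is a simple heuristic (a real password containing "<" would be flagged by mistake).

```python
# Assumed to be required, based on the template above; adjust as needed.
REQUIRED_KEYS = (
    "SQLALCHEMY_DATABASE_URI",
    "ANTHROPIC_API_KEY",
    "SECRET_KEY",
)


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(text: str) -> list:
    """Return messages for required keys that are missing or placeholders."""
    values = parse_env(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif "<" in value or value == "your-secret-key-here":
            problems.append(f"{key} still contains a placeholder")
    return problems
```

Call find_problems(open(".env").read()) from the project root; an empty list means no problems were detected. A random value for SECRET_KEY can be generated with the standard library call secrets.token_hex(32).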
5. OntServe Setup¶
ProEthica relies on the OntServe MCP server for ontology management (entity validation, class hierarchy, SPARQL queries).
# In a separate terminal:
cd /path/to/OntServe
source venv-ontserve/bin/activate
python servers/mcp_server.py
OntServe MCP listens on port 8082. See OntServe repository for full setup instructions.
ProEthica will start without OntServe, but ontology features (class assignment, entity commit) will be unavailable.
6. Run¶
Start ProEthica¶
source venv-proethica/bin/activate
python run.py
Access at: http://localhost:5000
Start Celery Worker (for pipeline automation)¶
In a separate terminal:
cd /path/to/proethica
source venv-proethica/bin/activate
celery -A celery_config.celery worker --loglevel=info
The worker processes background pipeline tasks (batch extraction, queue management). See Pipeline Automation for details.
All Services at Once¶
./scripts/start_all.sh start
This starts OntServe MCP, Redis, Celery worker, and Flask in the correct order.
Service Dependencies¶
| Service | Port | Purpose | Required |
|---|---|---|---|
| ProEthica (Flask) | 5000 | Web application | Yes |
| PostgreSQL | 5432 | Data storage | Yes |
| OntServe MCP | 8082 | Ontology integration | For ontology features |
| Redis | 6379 | Task queue | For pipeline automation |
| Celery Worker | - | Background tasks | For pipeline automation |
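A quick way to see which of the services in the table are up is to try a TCP connection to each port. The sketch below uses only the standard library and assumes every service runs on the local machine. The Celery worker has no listening port, so it is not covered.

```python
import socket

# Ports taken from the table above; the host is assumed to be local.
SERVICES = {
    "ProEthica (Flask)": 5000,
    "PostgreSQL": 5432,
    "OntServe MCP": 8082,
    "Redis": 6379,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```

An open port only shows that something is listening there; the service-specific checks in Verify Installation confirm that the right process is answering.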
7. Verify Installation¶
# Test imports
python -c "
from app import create_app
from app.services.llm import get_llm_manager
print('ProEthica imports successful')
"
# Check services
redis-cli ping # Redis
curl -s http://localhost:8082/ # OntServe MCP
curl -s http://localhost:5000/health/ready # ProEthica health check
# Run test suite
PYTHONPATH=/path/to/parent:$PYTHONPATH pytest tests/ -v
Production Deployment¶
Gunicorn¶
gunicorn -w 4 -b 127.0.0.1:5000 --max-requests 1000 --max-requests-jitter 50 --timeout 60 wsgi:app
- --max-requests 1000 recycles each worker after 1000 requests to contain memory leaks
- --max-requests-jitter 50 staggers restarts so all workers don't recycle at once
- --timeout 60 allows heavy case pages to complete under load (gunicorn's default is 30s)
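The same options can be kept in a gunicorn configuration file, which is plain Python. This is a sketch mirroring the command above; the file name gunicorn.conf.py and its placement in the project root are assumptions, not something the project is known to ship.

```python
# gunicorn.conf.py (assumed file name and location)

# Listen on localhost only; nginx is expected to proxy to this address.
bind = "127.0.0.1:5000"

# Number of worker processes (the -w 4 flag).
workers = 4

# Recycle a worker after it has handled this many requests.
max_requests = 1000

# gunicorn adds a random value between 0 and this number to each worker's
# limit, so workers do not all restart at the same moment.
max_requests_jitter = 50

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 60
```

With this file in place, the server would be started with gunicorn -c gunicorn.conf.py wsgi:app.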
Systemd Services¶
Production deployments use systemd for process management. Create a service file for each component (ProEthica, OntServe web, OntServe MCP) with appropriate gunicorn settings, including the --max-requests flags for worker recycling and the --timeout flag. After adding or editing a service file, reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart proethica
Celery in Production¶
celery -A celery_config.celery worker --loglevel=info --detach
Alternatively, run the worker as a systemd service so that it restarts automatically.
Nginx Configuration¶
Nginx serves as a reverse proxy with caching and bot protection. Configure:
- Reverse proxy to the gunicorn socket/port
- Response caching with proxy_cache for rendered HTML pages
- Rate limiting with limit_req_zone to prevent abuse
- Security headers (HSTS, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Permissions-Policy)
- TLS via Let's Encrypt / certbot
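Note: nginx inherits add_header directives from a parent context only when the current block defines none of its own. A single add_header in a location block therefore drops every header set at the server or http level, so repeat the security headers in every location block that uses add_header.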
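Because a missing header in one location block is easy to overlook, it can be worth checking responses from several routes. The sketch below compares a response's headers against the list above; the function names are hypothetical, and which URLs to probe depends on your nginx configuration.

```python
from urllib.request import urlopen

# Header names corresponding to the security headers listed above.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",  # HSTS
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_security_headers(headers: dict) -> list:
    """Return required header names absent from headers (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


def check_url(url: str) -> list:
    """Fetch url and report which required headers its response lacks."""
    with urlopen(url, timeout=10) as response:
        return missing_security_headers(dict(response.headers.items()))
```

Calling check_url on one URL per location block (for example a page route and a static file) shows whether any block is dropping the inherited headers.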
Cache management:
# Check cache status header on a response
curl -sI https://proethica.org/ | grep X-Cache-Status
# HIT = served from cache, MISS = fetched from gunicorn
robots.txt¶
The file at app/static/robots.txt is served at the domain root via nginx alias. It controls crawler behavior, including crawl delays and route restrictions. Update locally and deploy with git pull; no service restart needed.
fail2ban¶
fail2ban auto-bans IPs that repeatedly trigger nginx rate limits. Configure a jail that watches the nginx error log for rate-limit violations and bans offending IPs via iptables.
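To make concrete what such a jail looks for: when limit_req rejects a request, nginx writes an error-log line containing "limiting requests, excess: ... by zone ..., client: <ip>". The sketch below extracts the client address from lines of that shape and counts repeat offenders. It only illustrates the matching; fail2ban does the matching and banning itself through its own filter, and the zone name, addresses, and threshold here are made up.

```python
import re
from collections import Counter

# Matches the rate-limit message nginx writes to its error log.
LIMIT_REQ_LINE = re.compile(
    r'limiting requests, excess: [\d.]+ by zone "[^"]+", '
    r'client: (?P<ip>[0-9a-fA-F.:]+)'
)


def offending_ip(line: str):
    """Return the client IP from a rate-limit log line, or None."""
    match = LIMIT_REQ_LINE.search(line)
    return match.group("ip") if match else None


def repeat_offenders(lines, threshold: int = 3) -> list:
    """Return IPs that hit the rate limit at least threshold times."""
    hits = Counter(ip for ip in map(offending_ip, lines) if ip)
    return sorted(ip for ip, count in hits.items() if count >= threshold)
```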
Production Instance¶
The primary ProEthica instance is available at proethica.org, maintained for research and demonstration purposes.
Common Issues¶
Database connection refused¶
sudo systemctl status postgresql
sudo systemctl start postgresql
psql -h localhost -U postgres -d ai_ethical_dm -c "SELECT 1;"
ANTHROPIC_API_KEY not set¶
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
# Or add to .env file
OntServe not reachable¶
Ontology features need OntServe MCP to be listening on port 8082. Check that it responds:
curl -s http://localhost:8082/
Redis connection refused¶
sudo systemctl status redis-server
sudo systemctl start redis-server
redis-cli ping # Should return PONG
Celery worker not processing tasks¶
# Check worker status
celery -A celery_config status
# Check active tasks
celery -A celery_config inspect active
Project Structure¶
proethica/
├── app/ # Main application
│ ├── models/ # SQLAlchemy models
│ ├── routes/ # Flask blueprints (42 registered)
│ ├── services/ # Business logic and extraction
│ │ ├── llm/ # LLM manager
│ │ ├── extraction/ # Entity extraction
│ │ ├── synthesis/ # Step 4 analysis
│ │ └── narrative/ # Narrative generation
│ ├── tasks/ # Celery task definitions
│ └── templates/ # Jinja2 templates
├── scripts/ # Pipeline and analysis scripts
├── tests/ # Test suite
├── docs/ # MkDocs documentation source
├── config.py # Flask configuration
├── celery_config.py # Celery worker configuration
├── requirements.txt # Python dependencies
├── run.py # Development server
└── wsgi.py # Production WSGI entry point
Related Documentation¶
- System Architecture - Technical architecture overview
- Settings - Configuration options and environment variables
- Ontology Integration - OntServe MCP configuration
- Pipeline Automation - Batch processing setup