Accurately estimating software development effort is one of the most challenging tasks in software project management. Manual estimation using expert judgment and traditional methods like COCOMO leads to:
This system uses machine learning trained on real historical project data to provide objective, automated, data-driven effort estimates.
User (Browser)
      │
      ▼
GitHub Pages (index.html)  ←  Static frontend hosting
      │
      │  POST /predict (JSON payload)
      ▼
Render Cloud ─── FastAPI Backend  ←  REST API (Dockerized)
      │                │
      │         loads  ▼
      │         .joblib model files  ←  6 trained ML models
      │
      ▼
JSON Response  →  Effort (hrs) · Duration (months) · Cost (USD) · Category
| Model | Dataset | R² Score | RMSE | MAE |
|---|---|---|---|---|
| Linear Regression | Desharnais | 0.699 | 1,959 | 1,457 |
| MLP Neural Network | Combined | 0.646 | 4,916 | 2,456 |
| Random Forest | Desharnais | 0.606 | 2,246 | 1,813 |
| Decision Tree | Desharnais | -0.035 | 3,634 | 2,682 |
| Model | Accuracy | Classes |
|---|---|---|
| Logistic Regression | 76.5% | Low / Medium / High |
| Gaussian Naive Bayes | 52.9% | Low / Medium / High |
Note: IEEE and ACM research papers on the same Desharnais benchmark dataset report R² values of 0.55–0.72. Our Linear Regression (0.699) and Random Forest (0.606) results on Desharnais fall within that published range; the Decision Tree (-0.035) does not.
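For reference, the three regression metrics in the table can be computed as in the sketch below. It uses invented numbers (not the project's data) and plain Python rather than the project's Scikit-learn calls, and the function name `regression_metrics` is illustrative. It also shows why R² can be negative, as in the Decision Tree row: a model whose squared error exceeds that of always predicting the mean scores below zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Illustrative effort values in hours (invented for this example).
actual = [1000.0, 2000.0, 3000.0, 4000.0]
close_fit = [1100.0, 1900.0, 3200.0, 3900.0]
poor_fit = [4000.0, 1000.0, 4000.0, 1000.0]

r2_close, rmse_close, mae_close = regression_metrics(actual, close_fit)
r2_poor, rmse_poor, mae_poor = regression_metrics(actual, poor_fit)

print(f"close fit: R2={r2_close:.3f} RMSE={rmse_close:.1f} MAE={mae_close:.1f}")
print(f"poor fit:  R2={r2_poor:.3f} RMSE={rmse_poor:.1f} MAE={mae_poor:.1f}")
```

RMSE and MAE are in the same unit as the target, so they are only comparable between models evaluated on the same dataset.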
| Layer | Technology | Purpose |
|---|---|---|
| ML | Scikit-learn 1.7.2 | Model training and evaluation |
| Backend | FastAPI 0.136.1 | REST API framework |
| Server | Uvicorn 0.46.0 | ASGI production server |
| Validation | Pydantic 2.13.4 | Automatic input validation |
| Data | Pandas + NumPy | Preprocessing and feature engineering |
| Persistence | Joblib 1.5.2 | Model serialization |
| Frontend | HTML5 + CSS3 + JS | Responsive web application |
| Container | Docker 29.4.2 | Application containerization |
| Deployment | Render Cloud | Backend cloud hosting |
| Hosting | GitHub Pages | Frontend static hosting |
effort-estimation/
├── main.py                      # FastAPI backend: REST API with all endpoints
├── train_all.py                 # Train all 6 ML models and save .joblib files
├── convert_models.py            # Export model weights to JavaScript format
├── app.py                       # Streamlit app (Phase 1 prototype)
├── Dockerfile                   # Docker container configuration
├── requirements.txt             # Python dependencies with pinned versions
├── .dockerignore                # Files excluded from Docker build
│
├── data/
│   ├── desharnais.csv           # 81 software projects (10 features)
│   └── combined_dataset.csv     # 642 software projects (3 features)
│
├── models/
│   ├── mlp_model.joblib         # MLP Neural Network + StandardScaler
│   ├── lr_model.joblib          # Linear Regression + scaler
│   ├── dt_model.joblib          # Decision Tree + scaler
│   ├── rf_model.joblib          # Random Forest + scaler + feature importances
│   ├── gnb_model.joblib         # Gaussian NB + scaler + LabelEncoder
│   ├── log_model.joblib         # Logistic Regression + scaler + LabelEncoder
│   └── app_data.pkl             # Test set data + all model metrics
│
└── static/
    ├── index.html               # Complete web application (HTML + CSS + JS)
    └── models.js                # Exported model weights for browser inference
curl -X POST "https://effort-estimation-api.onrender.com/predict" \
-H "Content-Type: application/json" \
-d '{
"size": 100,
"duration": 6,
"team_exp": 2,
"manager_exp": 3,
"transactions": 100,
"entities": 50,
"points_na": 100,
"adjustment": 1.0,
"year_end": 2024,
"hours_per_month": 160,
"hourly_rate": 25,
"language": "Python"
}'
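The same request can be sent from Python. The sketch below uses only the standard library; the URL and field names are copied from the curl example, while the helper names `build_payload`, `build_request`, and `predict` are illustrative and not part of the project. Nothing is sent until `predict` is called.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://effort-estimation-api.onrender.com/predict"

# Field names and values copied from the curl example.
EXAMPLE_PROJECT = {
    "size": 100,
    "duration": 6,
    "team_exp": 2,
    "manager_exp": 3,
    "transactions": 100,
    "entities": 50,
    "points_na": 100,
    "adjustment": 1.0,
    "year_end": 2024,
    "hours_per_month": 160,
    "hourly_rate": 25,
    "language": "Python",
}


def build_payload(**overrides):
    """Return a copy of the example project with selected fields replaced."""
    unknown = set(overrides) - set(EXAMPLE_PROJECT)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return {**EXAMPLE_PROJECT, **overrides}


def build_request(payload, url=API_URL):
    """Wrap the payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, url=API_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `predict(build_payload(size=250, duration=9))` would request estimates for a larger, longer project while keeping the other fields at the example values.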
# 1. Clone the repository
git clone https://github.com/AbishekBino/effort-estimation.git
cd effort-estimation
# 2. Install dependencies
pip install -r requirements.txt
# 3. Train all models (generates .joblib files)
python train_all.py
# 4. Start the API server
uvicorn main:app --reload
# 5. Open in browser
# Website: http://localhost:8000
# API Docs: http://localhost:8000/docs
# Health: http://localhost:8000/health
# Build the Docker image
docker build -t effort-estimation-api .
# Run the container
docker run -p 8000:8000 effort-estimation-api
# Open: http://localhost:8000
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Serves the web application |
| GET | `/health` | API health status check |
| GET | `/models` | Returns all model performance metrics |
| POST | `/predict` | Run all 6 ML models and get predictions |
| GET | `/docs` | Interactive Swagger UI documentation |
Sample response from `POST /predict`:
{
"status": "success",
"mlp": {
"effort_hours": 1808.33,
"duration_months": 11.3,
"cost_usd": 45208.32,
"confidence_low": 1446.67,
"confidence_high": 2170.0
},
"linear_regression": {
"effort_hours": 2145.0,
"duration_months": 13.4,
"cost_usd": 53625.0
},
"decision_tree": {
"effort_hours": 1224.0,
"duration_months": 7.65,
"cost_usd": 30600.0
},
"random_forest": {
"effort_hours": 2013.81,
"duration_months": 12.59,
"cost_usd": 50345.12
},
"effort_category_gnb": "Medium",
"effort_category_logistic": "Low"
}
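The numbers in this sample are consistent, to within a few hundredths, with duration and cost being derived from the predicted effort using the request's `hours_per_month` (160) and `hourly_rate` (25), and with the MLP confidence bounds sitting 20% either side of its prediction. This is an inference from the sample values only, not something confirmed from `main.py`; the function below is a hypothetical sketch of that arithmetic.

```python
def derive_outputs(effort_hours, hours_per_month=160, hourly_rate=25, band=0.20):
    """Derive duration, cost, and a symmetric band from an effort estimate.

    Assumed relationships (inferred from the sample response, not from the code):
      duration_months = effort_hours / hours_per_month
      cost_usd        = effort_hours * hourly_rate
      confidence      = effort_hours * (1 -/+ band)
    """
    return {
        "effort_hours": round(effort_hours, 2),
        "duration_months": round(effort_hours / hours_per_month, 2),
        "cost_usd": round(effort_hours * hourly_rate, 2),
        "confidence_low": round(effort_hours * (1 - band), 2),
        "confidence_high": round(effort_hours * (1 + band), 2),
    }


# Effort values taken from the sample response.
mlp = derive_outputs(1808.33)
decision_tree = derive_outputs(1224.0)

print(mlp)
print(decision_tree)
```

Small differences from the sample (for example in the MLP cost) are what one would expect if the API derives these fields from the unrounded effort and rounds afterwards.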
Raw Datasets (CSV)
      │
      ▼
Data Cleaning
  ├── Remove Experience < 0 (invalid rows)
  ├── Drop NaN values (dropna)
  └── Cap top 1% outliers (Size, Effort)
      │
      ▼
Feature Engineering
  ├── Log_Size = log1p(Size)
  ├── Log_Duration = log1p(Duration)
  ├── Size_x_Duration = Size × Duration
  ├── Size_x_Experience = Size × Experience
  └── One-hot encoding (Language column)
      │
      ▼
Train-Test Split (80% / 20%, random_state=42)
      │
      ▼
StandardScaler (fit on train only, to prevent data leakage)
      │
      ▼
Train 6 Models + 5-Fold Cross Validation
      │
      ▼
Save with Joblib (.joblib files)
      │
      ▼
FastAPI serves predictions via REST API
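A minimal sketch of the cleaning, feature engineering, and train-only scaling steps, written in plain Python so it runs without dependencies. The project itself uses Pandas and Scikit-learn for these steps; the row layout, function names, and sample values here are invented for illustration, and outlier capping and one-hot encoding are omitted for brevity.

```python
import math
import random
import statistics


def clean(rows):
    """Drop rows with missing values or negative experience."""
    return [r for r in rows if None not in r.values() and r["experience"] >= 0]


def engineer(row):
    """Build the feature vector named in the pipeline diagram."""
    size, duration, exp = row["size"], row["duration"], row["experience"]
    return [
        size,
        duration,
        exp,
        math.log1p(size),      # Log_Size
        math.log1p(duration),  # Log_Duration
        size * duration,       # Size_x_Duration
        size * exp,            # Size_x_Experience
    ]


def split(items, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then hold out the last test_fraction."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - max(1, round(len(shuffled) * test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train):
    """Per-column mean and population std, computed from training rows only."""
    columns = list(zip(*train))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # constant column: scale 1
    return means, stds


def transform(rows, scaler):
    """Standardize rows using statistics learned elsewhere (the training set)."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Invented sample rows: ten valid projects plus two that cleaning should drop.
raw = [{"size": 50 + 25 * i, "duration": 3 + i, "experience": i % 5} for i in range(10)]
raw.append({"size": 120, "duration": 8, "experience": -1})   # invalid experience
raw.append({"size": None, "duration": 5, "experience": 2})   # missing value

features = [engineer(r) for r in clean(raw)]
train, test = split(features)
scaler = fit_scaler(train)           # the test rows never influence the scaler
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)
```

Fitting the scaler before the split would let test-set means and variances leak into training, which is the leakage the diagram's StandardScaler step is guarding against.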
[+] Building 3.8s (14/14) FINISHED
All models loaded successfully
INFO: Uvicorn running on http://0.0.0.0:8000
INFO: Application startup complete.
| Name | Role |
|---|---|
| Abishek Bino | ML Engineering, Backend, Deployment |
| Adarsh YL | Data Collection, Model Evaluation |
| Abhinand SS | Frontend Development, Testing |
| Aswin Jose | Documentation, Analysis |
Project Guide: Prof. Priya Shekhar

Institution: Lourdes Matha College of Science and Technology (LMCST)

Program: B.Tech Computer Science Engineering, KTU (2025–26)
This project was developed as a final-year B.Tech project for academic purposes.