Introduction

Visio AI is an open-source, enterprise-grade Data Science platform that empowers anyone to build Machine Learning models without writing a single line of code. It acts as a visual interface on top of powerful Python libraries like pandas, scikit-learn, and plotly.

Live Demo: Access the cloud version at visio-ai.streamlit.app

Why use Visio AI?

🌱 Why Visio AI? The Sustainable Choice

In an era dominated by Generative AI and Large Language Models (LLMs), Visio AI was built with a specific philosophy: Efficiency, Privacy, and Sustainability.

🌍 Reduced Carbon Footprint

Training and running LLMs (like GPT-4 or Gemini) utilizes massive GPU clusters that consume gigawatts of electricity and millions of gallons of water for cooling.

Visio AI is "Green AI". It utilizes classical Machine Learning algorithms (like Random Forest) that are mathematically efficient. They run on your local CPU, consuming a fraction of the energy.

🔒 Privacy & Offline Capability

When you use cloud-based AI, you upload your sensitive data to external servers.

Visio AI runs locally on your machine. Your financial or medical data never leaves your laptop. You can use the entire platform without an internet connection.

⚡ The Right Tool for the Job

You don't need a trillion-parameter "Brain" to fit a trend line. Using an LLM for tabular data analysis is like using a sledgehammer to crack a nut.

Visio AI provides the exact mathematical tools needed for structured data, resulting in faster, more interpretable results without the "hallucinations" of GenAI.

⚡ Quick Start

Follow these 3 steps to run your first predictive model.

Step 1: Launch

Run the application in your terminal:

streamlit run Home.py

Step 2: Load Data

Navigate to Data Loader. Drag and drop your `.csv` file.
Tip: Don't have data? Create a simple Excel file with columns like "YearsExperience" and "Salary".

Step 3: Analyze

Navigate to EDA to visualize your data, or go straight to Supervised to build a model.

1. Data Loading & Cleaning

Before any AI can work, your data must be clean. Real-world data is often "messy" (missing values, wrong types).

Features

Important: You CANNOT proceed to modeling if your data has missing values (NaN). Always run cleaning first.

2. Exploratory Data Analysis (EDA)

EDA is about "interviewing" your data to understand its story. Visio AI offers two modes:

Interactive (Plotly) vs Static (Seaborn)

Mode Best Used For...
Interactive Deep dives. You can zoom, pan, and hover over points to see details. Great for presentations.
Static Publication-quality images. Access to advanced statistical plots like Pair Plots (Correlations grid).

Smart Suggestions

The system analyzes your data types and suggests the best chart. For example, if you pick one numerical column, it suggests a Histogram (to see distribution). If you pick two, it suggests a Scatter Plot (to see relationships).

3. Supervised Machine Learning

This is where the magic happens. "Supervised" means you are teaching the computer by providing examples (Input + Correct Output).

The Workflow

  1. Select Target (Y): What do you want to predict? (e.g., "Price", "Diagnosis").
  2. Select Features (X): What data influences the target? (e.g., "Size", "Symptoms").
  3. Train: The system splits your data (80% for teaching, 20% for testing).
  4. Evaluate: Check the Accuracy or R2 Score.
  5. Predict: Use the "Prediction Lab" to enter new manual values and get a real-time prediction.
Concept: Regression vs. Classification
Regression: Predicting a distinct number (e.g., $500,000 House Price).
Classification: Predicting a category/group (e.g., "Spam" or "Not Spam").

4. Unsupervised Machine Learning

Used when you have data but NO target answer. You want the AI to find hidden patterns or groups on its own.

K-Means Clustering

Imagine you have 1,000 customers. You don't know who is who. K-Means will group them into clusters based on similarities (e.g., "Cluster 1: Young, High Spend", "Cluster 2: Older, Low Spend").

PCA (Dimensionality Reduction)

If you have 50 columns, you can't visualize them. PCA compresses 50 columns into 2 or 3 "Principal Components" that capture the most information, allowing you to plot a 3D chart of your complex data.

5. Image AI (Computer Vision)

Visio AI isn't just for spreadsheets. The Image AI module allows you to analyze visual data using state-of-the-art Multimodal LLMs.

The Model: Nvidia Nemotron

We utilize the nvidia/nemotron-nano-12b-v2-vl model via OpenRouter.

Workflow

  1. Upload: Drag & Drop any `.jpg` or `.png` image.
  2. Prompt: Ask a question (e.g., "What is written on the whiteboard?" or "Describe the architectural style").
  3. Analyze: The AI processes the image and returns a text description in seconds.

6. AutoML (Automated Machine Learning)

Not sure which algorithm to pick? The **AutoML** module runs a tournament.

It trains every available model (Random Forest, SVM, Regression, etc.) on your data and creates a **Leaderboard** ranked by accuracy. This is the fastest way to find the best model.

Algorithm Encyclopedia

A plain-english guide to the models available in Visio AI.

Classification Algorithms (Predicting Categories)

Logistic Regression Base

What is it? Despite the name, it's for classification. It draws a line (or plane) to separate two classes.
Use when: You need a simple, interpretable baseline (e.g., "Yes/No").

Random Forest Classifier Advanced

What is it? Creates hundreds of "Decision Trees" (flowcharts) and averages their votes.
Use when: You want high accuracy and have complex data. It's very robust against overfitting.

Support Vector Machine (SVM) Complex

What is it? Finds the widest possible "street" between two categories.
Use when: You have high-dimensional data (lots of features) and distinct margins.

XGBoost State-of-the-Art

What is it? Extreme Gradient Boosting. Builds trees sequentially, correcting previous errors.
Use when: You need maximum accuracy. It is the gold standard for structured data.

Regression Algorithms (Predicting Numbers)

Linear Regression Base

What is it? Draws a straight line through data points to predict a trend.
Use when: You expect a simple linear relationship (e.g., "As square footage goes up, price goes up").

Decision Tree Regressor Intermediate

What is it? Splits data into smaller and smaller groups to find an average value.
Use when: Data has non-linear patterns (e.g., "Prices go up with size, but ONLY in this specific zipcode").

FAQ & Troubleshooting

Q: My model failed to train!

A: Did you clean your data? If you have blank cells (NaN), models will crash. Go to Data Loader -> Auto Clean.

Q: Why is "Pair Plot" slow?

A: A Pair Plot creates a grid of Every Column vs Every Column. If you have 20 columns, that's 400 plots! Filter your data to 5-6 key columns first.

Q: Can I use my trained model elsewhere?

A: Yes! After training, click "Download PKL". This is a standard Python pickle file you can load in any other Python script using `joblib.load()`.

Visio AI Enterprise v2.0
Built with Python & Streamlit.
Open Source License.