Git for Analysts

As data analysis moves toward "Analytics as Code" (e.g., dbt, Python scripts, stored procedures), version control with Git has become an essential skill.

Why Analysts Need Git

Reproducibility: You can always go back to the exact code that generated a specific report.
Collaboration: Multiple analysts can work on the same project without overwriting each other's work (using branches).
Code Review: Peers can review your SQL logic or Python code before it is merged into production.
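To make the reproducibility point concrete, here is a minimal sketch run in a throwaway repository. The file name `churn.sql`, the query text, and the demo identity are hypothetical; the idea is only that `git show <commit>:<file>` prints a file exactly as it was at an earlier commit, without touching your current working copy.

```shell
set -eu

# Work in a scratch directory so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity used only inside this demo repository (hypothetical values).
git config user.name "Demo Analyst"
git config user.email "analyst@example.com"

# Version 1: the query behind a (hypothetical) published report.
echo "SELECT COUNT(*) FROM users;" > churn.sql
git add churn.sql
git commit -q -m "Add churn query"
report_commit=$(git rev-parse HEAD)

# Version 2: the logic changes after the report went out.
echo "SELECT COUNT(*) FROM users WHERE is_test = FALSE;" > churn.sql
git commit -q -am "Exclude test users"

# List every commit that touched the file.
git log --oneline -- churn.sql

# Recover the query exactly as it was when the report was generated.
old_query=$(git show "$report_commit:churn.sql")
echo "$old_query"
```

In a real project you would find the commit with `git log` rather than storing it in a variable; the variable just keeps the sketch self-contained.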

The Basic Workflow

Clone: Copy the remote repository (from GitHub/GitLab) to your local machine.
```
git clone https://github.com/your-org/your-repo.git
```
Branch: Create a new workspace for your specific task (never work directly on the main branch).
```
git checkout -b feature/update-churn-query
```
Edit: Make changes to your SQL, Python, or Markdown files.
Add (Stage): Tell Git which modified files you want to include in the next save.
```
git add my_query.sql
```

Commit: "Save" the changes with a descriptive message.
```
git commit -m "Updated the churn calculation to exclude test users"
```
Push: Send your local branch up to the remote repository.
```
git push origin feature/update-churn-query
```
Pull Request (PR): Go to GitHub/GitLab and open a PR. Ask a peer to review your code. Once approved, it is merged into main.
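If you want to rehearse the steps above without touching a real project, the following sketch runs the whole sequence against a local stand-in for the remote. It assumes a bare repository in a temporary folder plays the role of GitHub/GitLab; the directory names, the query text, and the demo identity are all hypothetical. The pull request step is not shown because it happens in the hosting site's web interface.

```shell
set -eu

# Scratch directory so the rehearsal never touches a real project.
work=$(mktemp -d)
cd "$work"

# Stand-in for the GitHub/GitLab repository: a local bare repo (demo assumption).
git init -q --bare remote.git

# Clone: copy the "remote" repository to a local working directory.
# Git warns that the repository is empty; that is expected here.
git clone -q remote.git analyst-repo
cd analyst-repo
# Identity used only inside this demo repository (hypothetical values).
git config user.name "Demo Analyst"
git config user.email "analyst@example.com"

# Branch: create a separate workspace for the task.
git checkout -q -b feature/update-churn-query

# Edit: change a file (here, create a small hypothetical query).
echo "SELECT user_id FROM users WHERE is_test = FALSE;" > my_query.sql

# Add (Stage): choose what goes into the next commit.
git add my_query.sql

# Commit: save the staged changes with a descriptive message.
git commit -q -m "Updated the churn calculation to exclude test users"

# Push: send the local branch up to the remote.
git push -q origin feature/update-churn-query

# Confirm the branch now exists on the remote.
pushed=$(git ls-remote --heads origin feature/update-churn-query)
echo "$pushed"
```

Against a real remote, only the `git clone` URL changes; the remaining commands are the same ones listed in the steps.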

`.gitignore` Best Practices

The most important rule for Data Analysts: Never commit raw data to Git!

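Git is meant for tracking code, not large .csv, .parquet, or database dump files. Committing a 500 MB CSV file bloats the repository permanently, because Git keeps every committed version in its history even if the file is deleted later, and it can leak sensitive PII (Personally Identifiable Information) to everyone with access to the repository.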

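Always make sure your repository has a .gitignore file that excludes common data formats, along with secrets files such as .env and generated folders such as __pycache__/: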

```
# .gitignore
*.csv
*.xlsx
*.parquet
*.sqlite
.env
__pycache__/
```
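A quick way to confirm that the ignore rules behave as intended is `git check-ignore`, which reports the rule that matches a given path. The sketch below assumes a scratch repository and two hypothetical files, `customers.csv` and `my_query.sql`; it writes the same patterns as the .gitignore shown above.

```shell
set -eu

# Scratch repository for the check.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write the ignore patterns, one per line.
printf '%s\n' '*.csv' '*.xlsx' '*.parquet' '*.sqlite' '.env' '__pycache__/' > .gitignore

# Hypothetical files: one data file that should be ignored, one query that should not.
echo "id,email" > customers.csv
echo "SELECT 1;" > my_query.sql

# Prints the .gitignore line that matches the path; exits non-zero if nothing matches.
git check-ignore -v customers.csv

# Short status lists only what Git would consider tracking.
untracked=$(git status --short)
echo "$untracked"
```

Note that .gitignore only affects files Git is not already tracking. If a data file was committed earlier, `git rm --cached <file>` stops tracking it going forward, but the old copy remains in the repository history.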

Useful Commands

git status - Shows which files are modified, staged, or untracked.
git log - Shows the history of commits.
git pull - Fetches the latest changes from the remote repository and merges them into your current local branch.
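As a small illustration of the three states that `git status` reports, the sketch below sets up one modified file, one staged file, and one untracked file in a scratch repository. The file names and the demo identity are hypothetical; `--short` and `--oneline` are the compact output formats of `git status` and `git log`.

```shell
set -eu

# Scratch repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity used only inside this demo repository (hypothetical values).
git config user.name "Demo Analyst"
git config user.email "analyst@example.com"

# One committed file to start from.
echo "SELECT 1;" > tracked.sql
git add tracked.sql
git commit -q -m "Add tracked query"

echo "SELECT 2 AS two;" > tracked.sql   # modified, but not staged
echo "SELECT 3;" > staged.sql
git add staged.sql                      # new file, staged for the next commit
echo "notes" > scratch.md               # untracked: Git has never seen it

# Short format: ' M' = modified, 'A ' = staged new file, '??' = untracked.
status_out=$(git status --short)
echo "$status_out"

# One line per commit: abbreviated hash followed by the message.
log_out=$(git log --oneline)
echo "$log_out"
```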
