---
title: "Getting started with logrittr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with logrittr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  eval      = FALSE
)
```

## Motivation

In SAS, every DATA step prints a log that tells you exactly what happened:

```
NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.SALES has 112847 observations and 11 variables.
```

`dplyr` pipelines in R are silent by default. `logrittr` fills that gap with
`%>=%`, a pipe operator that logs the following at each step:

- row count before/after (with signed delta)
- column count before/after (with signed delta)
- column names added or dropped
- elapsed time

`logrittr` masks no functions and has no dependencies beyond `cli` and `stringr`, which it uses to color and format console output.

> **Tip**: for Fira Code users with ligatures enabled, `%>=%` renders as a single
> wide arrow, close to a regular pipe.
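
To make the idea concrete, here is a minimal base-R sketch of a logging pipe. It is not logrittr's implementation: the operator name `%log%` and its output format are hypothetical, and it assumes both sides of each step are data frames. It captures the right-hand call unevaluated, inserts the left-hand side as the first argument, times the evaluation, and reports row/column deltas plus added and dropped column names.

```r
# Hypothetical `%log%` operator: a base-R sketch, NOT logrittr's implementation.
`%log%` <- function(lhs, rhs) {
  step <- substitute(rhs)                 # unevaluated call, e.g. head(3)
  stopifnot(is.call(step))
  # Rebuild the call with the left-hand side as its first argument
  call <- as.call(c(list(step[[1]], quote(.lhs)), as.list(step)[-1]))
  env <- new.env(parent = parent.frame())
  env$.lhs <- lhs

  t0 <- Sys.time()
  out <- eval(call, env)
  ms <- as.numeric(difftime(Sys.time(), t0, units = "secs")) * 1000

  # One metrics line per step: counts after the step and signed deltas
  message(sprintf(
    "%-40s rows: %5d %+d   cols: %3d %+d   [%6.1f ms]",
    paste(deparse(step), collapse = " "),
    nrow(out), nrow(out) - nrow(lhs),
    ncol(out), ncol(out) - ncol(lhs),
    ms
  ))
  dropped <- setdiff(names(lhs), names(out))
  added   <- setdiff(names(out), names(lhs))
  if (length(dropped)) message("  dropped: ", paste(dropped, collapse = ", "))
  if (length(added))   message("  added: ", paste(added, collapse = ", "))
  out
}

res <- mtcars %log%
  subset(mpg > 25, select = c(mpg, cyl)) %log%
  transform(kpl = mpg * 0.425)
```

Because the operator only wraps the call it is given, nothing in `dplyr` or base R has to be masked for the logging to work, which is the design point made above.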

## Installation

```{r install}
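# The second repo (CRAN) supplies dependencies such as cli and stringr
install.packages(
  "logrittr",
  repos = c("https://guillaumepressiat.r-universe.dev", "https://cloud.r-project.org")
)

# alternatively
# remotes::install_github("GuillaumePressiat/logrittr")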
```

## Basic usage

```{r basic}
library(logrittr)
library(dplyr)

iris %>=%
  as_tibble() %>=%
  filter(Sepal.Length < 5) %>=%
  mutate(rn = row_number()) %>=%
  group_by(Species) %>=%
  summarise(n = n_distinct(rn))
```

```
── iris  [rows:       150  cols:    5] ────────────────────────────────────────────
ℹ as_tibble()                                       rows:  150 +0     cols:  5 +0    [   0.0 ms]
ℹ filter(Sepal.Length < 5)                          rows:   22 -128   cols:  5 +0    [   1.0 ms]
ℹ mutate(rn = row_number())                         rows:   22 +0     cols:  6 +1    [   1.0 ms]
  added: rn
ℹ group_by(Species)                                 rows:   22 +0     cols:  6 +0    [   1.0 ms]
ℹ summarise(n = n_distinct(rn))                     rows:    3 -19    cols:  2 -4    [   1.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, and 2 others
  added: n
```

`%>=%` is fully composable with `|>` and `%>%`. Use it only where you want
visibility, and fall back to the native pipe for the rest.
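
Mixing works because R gives `|>` and every `%any%` operator the same precedence and evaluates them left to right. The snippet below only inspects how R (4.1 or later) parses a mixed pipeline. It uses `quote()`, so nothing is evaluated and neither `dplyr` nor `logrittr` has to be installed to run it.

```r
# quote() captures the expression without evaluating it
expr <- quote(
  iris |>
    as_tibble() %>=%
    filter(Sepal.Length < 5) |>
    nrow()
)

# The native pipe is rewritten at parse time; `%>=%` stays an ordinary call
parsed <- paste(deparse(expr), collapse = " ")
print(parsed)
#> [1] "nrow(as_tibble(iris) %>=% filter(Sepal.Length < 5))"
```

The left-hand side that `%>=%` receives is the already rewritten call `as_tibble(iris)`, and the logged result is then handed to `nrow()` without any further logging.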

## Nested pipelines

When `%>=%` appears inside an argument (e.g. inside `semi_join()`), the nested
steps are automatically indented with a `>` prefix so they are visually
distinct from the main pipeline:

```{r nested}
iris %>=%
  as_tibble() %>=%
  filter(Sepal.Length < 5) %>=%
  mutate(rn = row_number()) %>=%
  semi_join(
    iris %>% as_tibble() %>=%
      filter(Species == "setosa"),
    by = "Species"
  ) %>=%
  group_by(Species) %>=%
  summarise(n = n_distinct(rn))
```

```
── iris  [rows:       150  cols:    5] ─────────────────────────────────────
ℹ as_tibble()                                              rows:   150 +0     cols:  5 +0    [   1.0 ms]
ℹ filter(Sepal.Length < 5)                                 rows:    22 -128   cols:  5 +0    [   1.0 ms]
ℹ mutate(rn = row_number())                                rows:    22 +0     cols:  6 +1    [   1.0 ms]
  added: rn
ℹ > filter(Species == "setosa")                            rows:    50 -100   cols:  5 +0    [   2.0 ms]
ℹ semi_join(iris %>% as_tibble() %>=% filter(Species ==    rows:    20 -2     cols:  6 +0    [  32.0 ms]
  "setosa"), by = "Species")
ℹ group_by(Species)                                        rows:    20 +0     cols:  6 +0    [   1.0 ms]
ℹ summarise(n = n_distinct(rn))                            rows:     1 -19    cols:  2 -4    [   1.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, and 2 others
  added: n
```
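
Note that the nested `filter()` line is printed before the `semi_join()` line that contains it. This is what lazy evaluation would produce: the inner pipeline runs only when `semi_join()` forces its argument, and the outer step is logged once it has finished. The base-R sketch below shows one plausible way to get this ordering and the `>` prefix with a depth counter. The `%trace%` operator and the `state` environment are hypothetical, and logrittr's internals may differ.

```r
# Hypothetical `%trace%` operator with a nesting counter (illustration only)
state <- new.env()
state$depth <- 0L
state$lines <- character()

`%trace%` <- function(lhs, rhs) {
  step <- substitute(rhs)
  call <- as.call(c(list(step[[1]], quote(.lhs)), as.list(step)[-1]))
  env <- new.env(parent = parent.frame())
  env$.lhs <- lhs                    # forces earlier steps at the current depth

  depth <- state$depth
  state$depth <- depth + 1L          # pipes forced while evaluating `rhs` are nested
  on.exit(state$depth <- depth)
  out <- eval(call, env)

  # Logged after evaluation, so nested steps appear before their parent step
  line <- paste0(strrep("> ", depth), paste(deparse(step), collapse = " "))
  state$lines <- c(state$lines, line)
  message(line)
  out
}

x <- 1:10
res <- x %trace%
  head(5) %trace%
  intersect(x %trace% tail(8))
```

Running this prints `head(5)`, then `> tail(8)`, then `intersect(x %trace% tail(8))`, the same ordering as in the log above.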

## Options

All display options are controlled via `logrittr_options()`:

```{r options}
logrittr_options()
#> $wrap_width
#> [1] 52
#> $big_mark
#> [1] " "
#> $lang
#> [1] "en"
#> $max_cols
#> [1] 5
```

### Language

Switch to French with `lang = "fr"` (the metrics line uses `lignes`
instead of `rows`):

```{r lang}
logrittr_options(lang = "fr")

iris %>=%
  select(Species, Sepal.Length, Sepal.Width) %>=%
  filter(Sepal.Length > 5)
```

```
── iris  [lignes:       150  cols:    5] ─────────────────────────────────────────
ℹ select(Species, Sepal.Length, Sepal.Width)      lignes:   150 +0    cols:    3 -2  [   3.0 ms]
  dropped: Petal.Length, Petal.Width
ℹ filter(Sepal.Length > 5)                        lignes:   118 -32   cols:    3 +0  [   1.0 ms]
```

### Thousands separator

```{r bigmark}
logrittr_options(lang = "en", big_mark = ",")

big <- data.frame(x = seq_len(1e6), y = rnorm(1e6))
big %>=% filter(x > 500000)
```

```
── big  [rows: 1,000,000  cols:    2] ────────────────────────────────────────────
ℹ filter(x > 5e+05)                       rows:   500,000 -500000   cols:    2 +0    [  11.0 ms]
```

Or with an underscore as the separator (`big_mark = "_"`):

```
── big  [rows: 1_000_000  cols:    2] ────────────────────────────────────────────
ℹ filter(x > 5e+05)                       rows:   500_000 -500000   cols:    2 +0    [  11.0 ms]
```
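
In both logs the step is printed as `filter(x > 5e+05)` although the code says `500000`. This is presumably because the step label is obtained by deparsing the call, and R deparses that double literal in scientific notation. The base-R snippet below shows the deparsing behaviour and a base-R analogue of the separator formatting. It does not call logrittr, and the variable names are illustrative.

```r
# A double literal is deparsed in its shortest form; an integer literal keeps its digits
label_double  <- deparse(quote(filter(x > 500000)))
label_integer <- deparse(quote(filter(x > 500000L)))
print(label_double)
#> [1] "filter(x > 5e+05)"
print(label_integer)
#> [1] "filter(x > 500000L)"

# Base-R analogue of a thousands separator (not logrittr's own code)
with_comma      <- format(1e6, big.mark = ",", scientific = FALSE)
with_underscore <- format(1e6, big.mark = "_", scientific = FALSE)
print(c(with_comma, with_underscore))
#> [1] "1,000,000" "1_000_000"
```

Writing the threshold as `500000L` is one way to keep the digits readable in a deparsed label.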

### Column name truncation

When a step such as `select()` or a join adds or drops many columns, only the
first `max_cols` names are shown to keep the log readable:

```{r maxcols}
logrittr_options(max_cols = 2, lang = "en")

iris %>=%
  as_tibble() %>=%
  select(Species, Sepal.Length)
```

```
── iris  [rows:       150  cols:    5] ───────────────────────────────────────────────
ℹ as_tibble()                                 rows:   150 +0    cols:  5 +0   [   0.0 ms]
ℹ select(Species, Sepal.Length)               rows:   150 +0    cols:  2 -3   [   1.0 ms]
  dropped: Sepal.Width, Petal.Length, and 1 other
```

Use `max_cols = Inf` to always display all names:

```{r maxcols_inf}
logrittr_options(max_cols = Inf)
```

### Restoring defaults

`logrittr_options()` invisibly returns the previous values, which makes it easy
to restore the state after a temporary change:

```{r restore}
old <- logrittr_options(lang = "fr", big_mark = ",")
# ... work ...
do.call(logrittr_options, old)  # restore previous state
```
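
The same save-and-restore pattern can be wrapped in a small helper so that the previous state comes back even if the code in between fails. The sketch below is hypothetical (`with_settings()` is not part of logrittr) and is demonstrated with base `options()`, which follows the same contract of invisibly returning the previous values as a named list.

```r
# Hypothetical helper: `setter` is any function that, like base options(),
# accepts named values and invisibly returns the previous ones as a list.
with_settings <- function(setter, new, code) {
  old <- do.call(setter, new)
  on.exit(do.call(setter, old), add = TRUE)  # restore even if `code` errors
  force(code)                                # evaluated lazily, after the change
}

before <- getOption("digits")
res <- with_settings(options, list(digits = 3), format(pi))
print(res)
#> [1] "3.14"
print(identical(getOption("digits"), before))
#> [1] TRUE
```

Assuming `logrittr_options()` behaves as described above, a call such as `with_settings(logrittr_options, list(lang = "fr"), iris %>=% head())` should log one pipeline in French and then restore the previous options.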


## Using logrittr with lumberjack

If you already use the `lumberjack` package, `logrittr_logger` plugs directly
into its `%L>%` pipe. It produces the same console output as `%>=%`, and you
keep access to all lumberjack features (`run_file()`, custom loggers, etc.).

```{r lumberjack}
library(logrittr)    # provides logrittr_logger
library(lumberjack)
library(dplyr)

iris %L>%
  start_log(log = logrittr_logger$new(), label = "Iris Example") %L>%
  as_tibble() %L>%
  filter(Sepal.Length < 5) %L>%
  mutate(rn = row_number()) %L>%
  group_by(Species) %L>%
  summarise(n = n_distinct(rn)) %L>%
  dump_log(stop = TRUE)
```

```
── Iris Example  [rows:       150  cols:    5] ───────────────────────────────────────
ℹ as_tibble()                                      rows:       150 +0        cols:    5 +0    [    NA ms]
ℹ filter(Sepal.Length < 5)                         rows:        22 -128      cols:    5 +0    [  47.0 ms]
ℹ mutate(rn = row_number())                        rows:        22 +0        cols:    6 +1    [   5.0 ms]
  added: rn
ℹ group_by(Species)                                rows:        22 +0        cols:    6 +0    [  11.0 ms]
ℹ summarise(n = n_distinct(rn))                    rows:         3 -19       cols:    2 -4    [   4.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, rn
  added: n
✔ Log from Iris Example step written to ~/Documents/GitHub/logrittr/Iris Example_simple.csv
```

The first step always shows `NA ms` because lumberjack does not provide a
start time: elapsed time is measured as the interval between consecutive
steps, so the first step has no earlier timestamp to measure from.
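
The arithmetic can be sketched in a few lines of base R. The timestamps below are made-up values chosen to reproduce the intervals in the log above. They are not taken from lumberjack or logrittr.

```r
# Illustrative timestamps in seconds, one per logged step (made-up values)
stamps <- c(10.000, 10.047, 10.052, 10.063, 10.067)

# Each step's elapsed time is the gap to the previous step;
# the first step has no predecessor, hence NA.
elapsed_ms <- c(NA, diff(stamps)) * 1000
print(round(elapsed_ms, 1))
```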