turbo_broccoli.guard

If a block of code produces a JSON file, say out/foo.json, and there is no need to rerun the block when the output file already exists, then a guarded block handler is an alternative to

from pathlib import Path
import turbo_broccoli as tb
if not Path("out/foo.json").exists():
    ...
    if success:
        tb.save_json(result, "out/foo.json")
else:
    result = tb.load_json("out/foo.json")

A guarded block handler lets you guard an entire block of code, or even a loop on a per-iteration basis.

Guarding a simple block

Use it as follows:

from turbo_broccoli.guard import GuardedBlockHandler
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    # This whole block will be skipped if out/foo.json exists
    # If not, don't forget to set the results:
    h.result = ...
# In any case, the results of the block are available in h.result

I know the syntax isn't the prettiest. It would be more natural to use a with h: syntax, but Python doesn't allow a context manager to skip the body of its with block (a generator-based context manager that doesn't yield raises an error instead). The handler's result is None by default. If h.result is left as None, no output file is created. This allows for scenarios like

h = GuardedBlockHandler("out/foo.json")
for _ in h:
    ...  # Guarded code
    if success:
        h.result = ...
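The limitation on with blocks described above can be checked with the standard library alone. In the sketch below, skip_if and run_unless are hypothetical helper names, not part of turbo_broccoli. A contextlib.contextmanager that returns without yielding makes the with statement raise RuntimeError instead of skipping its body. A plain generator that returns without yielding just makes a for loop run zero times, and that is the behaviour the handler's for _ in h: syntax relies on.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def skip_if(condition: bool) -> Iterator[None]:
    """Hypothetical context manager that tries to skip its body."""
    if condition:
        return  # never yields
    yield


def run_unless(condition: bool) -> Iterator[None]:
    """Hypothetical generator that yields once, unless told to skip."""
    if condition:
        return  # never yields: a for loop over it runs zero times
    yield


# with: entering a generator-based context manager that doesn't yield
# raises RuntimeError, and the body is not run.
with_body_ran = False
with_error = ""
try:
    with skip_if(True):
        with_body_ran = True
except RuntimeError as e:
    with_error = str(e)

# for: a generator that doesn't yield simply produces no iterations.
skipped_runs = sum(1 for _ in run_unless(True))
normal_runs = sum(1 for _ in run_unless(False))
```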

It is also possible to use "native" saving/loading methods (https://altaris.github.io/turbo-broccoli/turbo_broccoli/native.html#save):

h = GuardedBlockHandler("out/foo.csv")
for _ in h:
    ...
    h.result = some_pandas_dataframe

See turbo_broccoli.native.save and turbo_broccoli.native.load. Finally, if the actual results of the block are not needed, use:

h = GuardedBlockHandler("out/large.json", load_if_skip=False)
for _ in h:
    ...
# If the block was skipped (out/large.json already exists), h.result is
# None instead of the content of out/large.json
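The simple-block behaviour described in this section can be summarised with a small, self-contained sketch. This is not turbo_broccoli's implementation. ToyGuard is a hypothetical class that only saves JSON through the standard json module, with no native formats and no Context. Its __iter__ is a generator with two paths:

- If the output file exists, it optionally loads the file (depending on load_if_skip) and returns without yielding, so the for body runs zero times.
- Otherwise, it yields once and then saves h.result, unless h.result is still None.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


class ToyGuard:
    """Hypothetical stand-in for GuardedBlockHandler (simple blocks, JSON only)."""

    def __init__(self, file_path: Path, load_if_skip: bool = True) -> None:
        self.file_path = Path(file_path)
        self.load_if_skip = load_if_skip
        self.result: Any = None

    def __iter__(self) -> Iterator["ToyGuard"]:
        if self.file_path.is_file():
            # Output exists: optionally load it, then return without
            # yielding, so the body of the for loop never runs.
            if self.load_if_skip:
                self.result = json.loads(self.file_path.read_text())
            return
        yield self  # no output yet: the body of the for loop runs once
        if self.result is not None:  # None means "failed": save nothing
            self.file_path.parent.mkdir(parents=True, exist_ok=True)
            self.file_path.write_text(json.dumps(self.result))


runs = 0
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out" / "foo.json"

    first = ToyGuard(path)  # file missing: the body runs and the result is saved
    for _ in first:
        runs += 1
        first.result = {"answer": 42}

    second = ToyGuard(path)  # skipped, result loaded from disk
    for _ in second:
        runs += 1

    third = ToyGuard(path, load_if_skip=False)  # skipped, nothing loaded
    for _ in third:
        runs += 1

    failed_runs = 0
    for attempt in range(2):  # result left as None: never saved, never skipped
        h = ToyGuard(Path(tmp) / "bar.json")
        for _ in h:
            failed_runs += 1
    bar_exists = (Path(tmp) / "bar.json").exists()
```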

Guarding a loop

Let's say you have a loop

for x in an_iterable:
    ...  # expensive code that produces a result you want to save

You can guard the loop as follows:

h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable):  # an_iterable is always enumerated!
    # h.result is already a dict, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result[x] = ...

The contents of h.result are saved to out/foo.json at the end of every iteration. However, if out/foo.json already exists, the loop skips all iterations whose results are already saved (with a dict result, those for which x is already a key of h.result). In detail, suppose that the contents of out/foo.json are

{"a": "aaa", "b": "bbb"}

Then the body of the following loop is only executed for "c":

for i, x in h(["a", "b", "c"]):
    h.result[x] = x * 3
# h.result is now {"a": "aaa", "b": "bbb", "c": "ccc"}
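Below is a minimal sketch of the per-iteration skipping just described, using only the standard library. ToyLoopGuard is a hypothetical name and stands in for the handler with a dict result. Following the module's source, it enumerates the full iterable before skipping, so the index i of "c" is still 2. It also saves the whole dict after every completed iteration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


class ToyLoopGuard:
    """Hypothetical stand-in for a guarded loop with a dict result."""

    def __init__(self, file_path: Path) -> None:
        self.file_path = Path(file_path)
        self.result: dict = {}

    def __call__(self, it: Iterable[Any]) -> Iterator[tuple[int, Any]]:
        # Resume from previously saved results, if any.
        if self.file_path.is_file():
            self.result = json.loads(self.file_path.read_text())
        for i, x in enumerate(it):
            if x in self.result:  # already computed: skip this iteration
                continue
            yield i, x
            # Save after every completed iteration.
            self.file_path.write_text(json.dumps(self.result))


executed = []
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    path.write_text(json.dumps({"a": "aaa", "b": "bbb"}))
    h = ToyLoopGuard(path)
    for i, x in h(["a", "b", "c"]):
        executed.append((i, x))
        h.result[x] = x * 3
    on_disk = json.loads(path.read_text())
```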

If you want h.result to be a list instead of a dict, use:

h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable, result_type="list"):
    # h.result is already a list, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result.append(...)
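With a list result, an iteration can't be recognised by its key, so the module's source skips it by position: index i is skipped when i < len(h.result). The sketch below reproduces that rule with the standard library only. guarded_list_loop is a hypothetical helper that fills the caller's list in place.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def guarded_list_loop(
    path: Path, it: Iterable[Any], result: list
) -> Iterator[tuple[int, Any]]:
    """Hypothetical sketch of a guarded loop with a list result.

    `result` is filled in place: first from the saved file (if any), then
    by the caller's loop body.
    """
    if path.is_file():
        result.extend(json.loads(path.read_text()))
    for i, x in enumerate(it):
        if i < len(result):  # iteration i was already saved: skip it
            continue
        yield i, x
        path.write_text(json.dumps(result))  # save after every iteration


executed = []
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    path.write_text(json.dumps([1, 4]))  # results of the first two iterations
    squares: list = []
    for i, x in guarded_list_loop(path, [1, 2, 3, 4], squares):
        executed.append(x)
        squares.append(x * x)
    on_disk = json.loads(path.read_text())
```

Because skipping is positional, this only behaves as intended if the iterable produces items in the same order on every run and the body appends exactly one element per iteration.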

Caveats

  • Recall that in the case of simple blocks, setting or leaving h.result as None is understood as a failed computation:

    for _ in h:
        h.result = None
    for _ in h:  # This block isn't skipped
        h.result = "Hello world"
    

    In the case of loops, however, if an entry of h.result is set to None, the corresponding iteration is not treated as failed: it still counts as done and is skipped on later runs. For example:

    for i, x in h(["a", "b", "c"]):
        h.result[x] = x * 3 if x != "b" else None
    # h.result is now {"a": "aaa", "b": None, "c": "ccc"}
    for i, x in h(["a", "b", "c"]):
        h.result[x] = x * 3
    # The second loop has been completely skipped, h.result is still
    # {"a": "aaa", "b": None, "c": "ccc"}
    
  • For guarded loops, the load_if_skip constructor argument has no effect: the JSON file is always loaded if it exists, since its contents determine which iterations are skipped. If you want some level of laziness, consider the following trick:

    from turbo_broccoli.context import EmbeddedDict

    h = GuardedBlockHandler("out/foo.json", nodecode_types=["embedded"])
    for i, x in h(["a", "b", "c"]):
        y = ...  # a dict that is expensive to compute
        h.result[x] = EmbeddedDict(y)
    

    By storing y as an EmbeddedDict instead of a plain dict, and by adding the "embedded" type to the nodecode_types of the handler's internal context, results that were already present in the JSON file are not decoded.
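The difference described in the first caveat comes down to which condition is checked. The snippet below uses only the standard library and mirrors the conditions in the module's source:

- A simple block counts as failed when its result is None, so nothing is saved.
- A loop iteration is skipped whenever its key is already present in the result dict, whatever the value.

The variable names are illustrative only. None round-trips through JSON as null, so a None entry survives saving and reloading and still counts as done.

```python
import json

# Simple block: the save condition looks at the value itself.
block_result = None
block_saved = block_result is not None  # False: nothing would be written

# Guarded loop: the skip condition is key membership, not the value.
saved = json.dumps({"a": "aaa", "b": None, "c": "ccc"})  # None -> null
reloaded = json.loads(saved)  # null -> None
to_run = [x for x in ["a", "b", "c"] if x not in reloaded]
```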

from pathlib import Path

try:
    from loguru import logger as logging
except ModuleNotFoundError:
    import logging  # type: ignore

from typing import Any, Generator, Iterable, Literal

from turbo_broccoli.context import Context
from turbo_broccoli.native import load as native_load
from turbo_broccoli.native import save as native_save


class GuardedBlockHandler:
    """See module documentation"""

    block_name: str | None
    context: Context
    file_path: Path
    load_if_skip: bool
    result: Any = None

    def __call__(
        self, it: Iterable, **kwargs
    ) -> Generator[tuple[int, Any], None, None]:
        """Alias for `GuardedBlockHandler.guard` with an iterable"""
        yield from self.guard(it, **kwargs)

    def __init__(
        self,
        file_path: str | Path,
        block_name: str | None = None,
        load_if_skip: bool = True,
        context: Context | None = None,
        **kwargs,
    ) -> None:
        """
        Args:
            file_path (str | Path): Output file path.
            block_name (str, optional): Name of the block, for logging
                purposes. Can be left as `None` to suppress such logs.
            load_if_skip (bool, optional): Whether to load the output file if
                the block is skipped.
            context (turbo_broccoli.context.Context, optional): Context to use
                when saving/loading the target JSON file. If left as `None`, a
                new context is built from the kwargs.
            **kwargs: Forwarded to the `turbo_broccoli.context.Context`
                constructor. Ignored if `context` is not `None`.
        """
        self.file_path = kwargs["file_path"] = Path(file_path)
        self.block_name, self.load_if_skip = block_name, load_if_skip
        self.context = context if context is not None else Context(**kwargs)

    def __iter__(self) -> Generator[Any, None, None]:
        """
        Alias for `GuardedBlockHandler.guard` with no iterable and no kwargs
        """
        yield from self.guard()

    def _guard_iter(
        self,
        it: Iterable,
        result_type: Literal["dict", "list"] = "dict",
        **kwargs,
    ) -> Generator[tuple[int, Any], None, None]:
        if self.file_path.is_file():
            self.result = native_load(self.file_path)
        else:
            self.result = {} if result_type == "dict" else []
        if result_type == "dict":
            yield from self._guard_iter_dict(it, **kwargs)
        else:
            yield from self._guard_iter_list(it, **kwargs)

    def _guard_iter_dict(
        self, it: Iterable, **kwargs
    ) -> Generator[tuple[int, Any], None, None]:
        for i, x in enumerate(it):
            if x in self.result:
                if self.block_name:
                    logging.debug(
                        f"Skipped iteration '{str(x)}' of guarded loop "
                        f"'{self.block_name}'"
                    )
                continue
            yield (i, x)
            self._save()

    def _guard_iter_list(
        self, it: Iterable, **kwargs
    ) -> Generator[tuple[int, Any], None, None]:
        for i, x in enumerate(it):
            if i < len(self.result):
                if self.block_name:
                    logging.debug(
                        f"Skipped iteration {i} of guarded loop "
                        f"'{self.block_name}'"
                    )
                continue
            yield (i, x)
            self._save()

    def _guard_no_iter(self, **kwargs) -> Generator[Any, None, None]:
        if self.file_path.is_file():
            self.result = (
                native_load(self.file_path) if self.load_if_skip else None
            )
            if self.block_name:
                logging.debug(f"Skipped guarded block '{self.block_name}'")
            return
        yield self
        if self.result is not None:
            self._save()
            if self.block_name is not None:
                logging.debug(
                    f"Saved guarded block '{self.block_name}' results to "
                    f"'{self.file_path}'"
                )

    def _save(self):
        """Saves `self.result`"""
        self.file_path.parent.mkdir(parents=True, exist_ok=True)
        native_save(self.result, self.file_path)

    def guard(
        self, it: Iterable | None = None, **kwargs
    ) -> Generator[Any, None, None]:
        """See `turbo_broccoli.guard.GuardedBlockHandler`'s documentation"""
        if it is None:
            yield from self._guard_no_iter(**kwargs)
        else:
            yield from self._guard_iter(it, **kwargs)
class GuardedBlockHandler:

See module documentation

GuardedBlockHandler( file_path: str | pathlib.Path, block_name: str | None = None, load_if_skip: bool = True, context: turbo_broccoli.context.Context | None = None, **kwargs)

Args:
    file_path (str | Path): Output file path.
    block_name (str, optional): Name of the block, for logging purposes. Can be left as None to suppress such logs.
    load_if_skip (bool, optional): Whether to load the output file if the block is skipped.
    context (turbo_broccoli.context.Context, optional): Context to use when saving/loading the target JSON file. If left as None, a new context is built from the kwargs.
    **kwargs: Forwarded to the turbo_broccoli.context.Context constructor. Ignored if context is not None.

block_name: str | None
file_path: pathlib.Path
load_if_skip: bool
result: Any = None
def guard( self, it: Optional[Iterable] = None, **kwargs) -> Generator[Any, NoneType, NoneType]:

See GuardedBlockHandler's documentation