
Questions tagged [checkpoint]

Check Point Software Technologies is a widely deployed brand of firewalls and other security and networking products.

0 votes
0 answers
10 views

Display the overall validation loss graph

Can I display the overall validation loss graph from the last checkpoint, so that I can compare the training loss and validation loss learning curves? From the code that I run ...
Af Farhat's user avatar
0 votes
0 answers
23 views

How to set checkpoints with an EfficientDet model and the TensorFlow Object Detection API

I have run the training process, but evaluation runs only at the last checkpoint step. Can someone help me change the checkpoint interval to every 100 steps, so ...
Af Farhat's user avatar
0 votes
0 answers
82 views

How to fix this error: KeyError: 'model.embed_tokens.weight'

This is the detailed error: Traceback (most recent call last): File "/home/cyq/zxc/SmartEdit/train/DS_MLLMSD11_train.py", line 769, in <module> train() File "/home/cyq/zxc/...
hshsh's user avatar
  • 11
0 votes
0 answers
51 views

Save and Load - Checkpoint Godot 4

I’m new to game development and recently started developing my first game, a simple 2D game to better understand the platform’s functionalities. I’m having trouble adjusting the checkpoint saving ...
Neitan's user avatar
  • 1
0 votes
1 answer
56 views

Playbook for Check Point

I want to make some playbooks for Check Point. My question is: is there a specific connection string for Check Point in Ansible? `Procedure to generate database backup in Security Management Server: $...
Alvis Sanchez's user avatar
1 vote
1 answer
11 views

Model Not Saving After Training in PyCharm Virtual Environment

I'm running this Python code on my Windows 10 PC in PyCharm 2024, in a virtual environment: `import os import numpy as np import librosa import soundfile as sf import tensorflow as tf ...
saad sagheer's user avatar
0 votes
0 answers
23 views

Saving a PyTorch model as a checkpoint gives very bad results after a few days

Problem: Loading a saved PyTorch model after a few days gives very bad results. Hamming score suddenly dropped from 75% to 0.1% and flat score dropped from 65% to 0.3%. torch.save(model, 'models/...
anushka Singh's user avatar
0 votes
1 answer
103 views

Flink checkpoint interval setting

I have a Flink task that uses the RocksDB StateBackend, and the checkpoint configuration is a minimum interval of 3 minutes and a timeout of 5 minutes. When I tested the checkpoint recovery mechanism, ...
fidodosomething's user avatar
0 votes
0 answers
21 views

Spark streaming with Kafka: ERROR in fold checkpoint Multiple streaming queries

I am a beginner in Apache Spark. I am researching the problem of restarting a job when an error occurs when using Spark streaming with Kafka. I tried deleting the latest file in the commit folder and ...
Đức Hân Trần's user avatar
0 votes
0 answers
184 views

Asynchronous part of checkpoint could not be completed

I am getting the following error in my JobManager: #0 - asynchronous part of checkpoint 11880 could not be completed.\njava.util.concurrent.CancellationException: null\n\tat java.util.concurrent....
Banupriya's user avatar
  • 161
0 votes
1 answer
145 views

Flink job stays in DEPLOYING or INITIALIZING

I deploy my Flink tasks with flink-kubernetes-operator. I also set up checkpointing, where the checkpoint directory is a mounted PVC. The StateBackend uses RocksDB and is configured with ...
fidodosomething's user avatar
1 vote
1 answer
87 views

Puzzled by Flink window state

I'm currently confused about windows and states. Suppose I have a program that counts user access data every minute and needs to do sum statistics in each window. Assume that at this time, I configure ...
fidodosomething's user avatar
0 votes
2 answers
128 views

Checkpoint size keeps growing without using state in TumblingProcessingTimeWindows

You can see that the checkpoint size keeps growing and never shrinks. In the web UI, you can see that it is caused by TumblingProcessingTimeWindows, and I found that the checkpoint size ...
jimmy_go's user avatar
0 votes
0 answers
33 views

Rerunning the iterations from a saved checkpoint in TensorFlow

I need to run code for 200000 iterations, but I have a time limit for running it, after which the code will stop. I have saved checkpoints up to 23000 iterations. Now, I want to restore the ...
Narcis's user avatar
  • 411
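
The resume pattern this question describes can be sketched without any framework. This is a minimal illustration only, not TensorFlow's checkpoint API: it stores a step counter and a toy state in a JSON file, and the names `train`, `ckpt_path`, `save_every`, and `stop_after` are all hypothetical. The key point is that the step counter is saved alongside the state, so a restarted run continues from the last saved step rather than from zero.

```python
import json
import os


def train(total_steps, ckpt_path, save_every, stop_after=None):
    """Run a toy training loop, resuming from ckpt_path if it exists.

    stop_after simulates the job being cut off by a time limit.
    Returns the (step, state) reached when the loop ends.
    """
    # Restore the step counter and state from the last checkpoint, if any.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            ckpt = json.load(f)
        step, state = ckpt["step"], ckpt["state"]
    else:
        step, state = 0, 0.0

    while step < total_steps:
        if stop_after is not None and step >= stop_after:
            break  # simulated time limit
        state += 1.0  # stand-in for one real training update
        step += 1
        if step % save_every == 0:
            # Write to a temporary file, then rename, so an interruption
            # during the write cannot corrupt the previous checkpoint.
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step, "state": state}, f)
            os.replace(tmp_path, ckpt_path)
    return step, state
```

Calling `train` a second time with the same `ckpt_path` picks up at the last saved step. Any work done after that save and before the interruption is repeated.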
0 votes
0 answers
250 views

KeyError: 'model_state_dict' when loading a fine-tuned pretrained ResNet model

I am trying to save a checkpoint every 10 epochs, but when I load the saved model it says model_state_dict is missing. Here, val loss and val accuracy are for validation. I have also added an early ...
Pranshu Jena's user avatar
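
One possible cause of this kind of KeyError is a mismatch between how the file was saved and how it is loaded, for example a bare state dict being saved while the loader expects a wrapper dict with a `model_state_dict` key. The question's code is truncated, so this may not be the cause here. The sketch below uses `pickle` and plain dicts as a stand-in for `torch.save` and `torch.load`, and the helper names `save_checkpoint` and `load_model_state` are hypothetical.

```python
import pickle


def save_checkpoint(path, model_state, optimizer_state, epoch):
    # The keys chosen here are the only keys available at load time.
    ckpt = {
        "model_state_dict": model_state,
        "optimizer_state_dict": optimizer_state,
        "epoch": epoch,
    }
    with open(path, "wb") as f:
        pickle.dump(ckpt, f)


def load_model_state(path):
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    # A file saved as a bare state dict has no wrapper key, so indexing
    # ckpt["model_state_dict"] directly would raise KeyError.
    if isinstance(ckpt, dict) and "model_state_dict" in ckpt:
        return ckpt["model_state_dict"]
    return ckpt
```

Printing the keys of the loaded object before indexing into it is a quick way to see which of the two layouts a given file uses.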
