Questions tagged [checkpoint]
Check Point Software Technologies is a widely deployed brand of firewalls and other security and networking products.
310
questions
0
votes
0
answers
10
views
Display result of overall loss validation graph
Can I display the overall validation loss graph from the last checkpoint, so that I can compare the training loss and validation loss learning curves? From the code that I run ...
0
votes
0
answers
23
views
How to set checkpoints with an EfficientDet model and the TensorFlow Object Detection API
I have run the training process, but evaluation runs only at the last checkpoint step. Can someone help me change the checkpoint setting to every 100 steps, so ...
0
votes
0
answers
82
views
How to fix this error: KeyError: 'model.embed_tokens.weight'
This is the detailed error:
Traceback (most recent call last):
File "/home/cyq/zxc/SmartEdit/train/DS_MLLMSD11_train.py", line 769, in <module>
train()
File "/home/cyq/zxc/...
0
votes
0
answers
51
views
Save and Load - Checkpoint Godot 4
I’m new to game development and recently started developing my first game, a simple 2D game to better understand the platform’s functionalities.
I’m having trouble adjusting the checkpoint saving ...
0
votes
1
answer
56
views
Playbook for checkpoint
I want to write some playbooks for Check Point. My question is: is there a specific connection string for Check Point in Ansible?
`Procedure to generate database backup in Security Management Server:
$...
1
vote
1
answer
11
views
Model Not Saving After Training in PyCharm Virtual Environment
I'm running this Python code on my Windows 10 PC in PyCharm 2024, in a virtual environment:
`import os
import numpy as np
import librosa
import soundfile as sf
import tensorflow as tf
...
0
votes
0
answers
23
views
Saving PyTorch model as checkpoint is giving very bad results after a few days
Problem: Loading a saved PyTorch model after a few days gives very bad results.
Hamming score suddenly dropped from 75% to 0.1% and flat score dropped from 65% to 0.3%.
torch.save(model, 'models/...
0
votes
1
answer
103
views
Flink checkpoint interval setting
I have a Flink task that uses the RocksDB state backend, and the checkpoint configuration is a minimum interval of 3 minutes and a timeout of 5 minutes.
When I tested the checkpoint recovery mechanism, ...
0
votes
0
answers
21
views
Spark streaming with Kafka: ERROR in fold checkpoint Multiple streaming queries
I am a beginner in Apache Spark. I am investigating how to restart a job after an error occurs when using Spark Streaming with Kafka.
I tried deleting the latest file in the commit folder and ...
0
votes
0
answers
184
views
Asynchronous part of checkpoint could not be completed
I am getting the following error in my JobManager.
#0 - asynchronous part of checkpoint 11880 could not be completed.\njava.util.concurrent.CancellationException: null\n\tat java.util.concurrent....
0
votes
1
answer
145
views
Flink job stays in DEPLOYING or INITIALIZING
I deploy my Flink tasks with flink-kubernetes-operator. I have also set up checkpointing, with the checkpoint directory on a mounted PVC. The state backend is RocksDB and is configured with ...
1
vote
1
answer
87
views
Puzzled by Flink window state
I'm currently confused about windows and state. Suppose I have a program that counts user access data every minute and needs to compute a sum in each window. Assume that, at this point, I configure ...
0
votes
2
answers
128
views
Checkpoint size keeps growing without using state in TumblingProcessingTimeWindows
You can see that the checkpoint size keeps growing and never shrinks.
In the web UI, you can see that it is caused by TumblingProcessingTimeWindows, and I found that the size of the checkpoint ...
0
votes
0
answers
33
views
Rerunning the iterations from a saved checkpoint in TensorFlow
I need to run my code for 200000 iterations, but there is a time limit after which the code stops. I have saved checkpoints up to iteration 23000. Now, I want to restore the ...
0
votes
0
answers
250
views
Getting error KeyError: 'model_state_dict' on loading the fine-tuned pretrained ResNet model
I am trying to save a checkpoint every 10 epochs, but when I load the model after saving, it reports that model_state_dict is missing. Here, val loss and val accuracy are the validation metrics. I have also added an early ...