
Wednesday, April 20, 2022

TensorFlow: "Fail to find the dnn implementation"

I am trying to run my Keras CuDNNGRU code in TensorFlow on the GPU, but I always get the error "Fail to find the dnn implementation", even though I have already installed CUDA and cuDNN.

I have reinstalled CUDA and cuDNN several times and upgraded cuDNN from 7.2.1 to 7.5.0, but nothing changes. I also tried running the code both in Jupyter Notebook and in the Python interpreter in a terminal, and the result is the same in both. Here are my hardware and software details:

  • Tesla V100 PCIe 16GB

  • Ubuntu 18.04

  • NVIDIA driver 384.183 (as reported by nvidia-smi)

  • CUDA 9.0

  • CuDNN 7.5.0

  • Miniconda 3

  • Python 3.6

  • Tensorflow 1.12

  • Keras 2.1.6

Here is my code:

    encoder_LSTM = tf.keras.layers.CuDNNGRU(hidden_unit,return_sequences=True,return_state=True)
    encoder_LSTM_rev=tf.keras.layers.CuDNNGRU(hidden_unit,return_state=True,return_sequences=True,go_backwards=True)
    encoder_outputs, state_h = encoder_LSTM(x)
    encoder_outputsR, state_hR = encoder_LSTM_rev(x)

And this is the error message:

    2019-05-27 19:08:06.814896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2019-05-27 19:08:06.814956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-05-27 19:08:06.814971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
    2019-05-27 19:08:06.814978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
    2019-05-27 19:08:06.815279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14678 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
    2019-05-27 19:08:08.050226: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
    2019-05-27 19:08:08.050350: E tensorflow/stream_executor/cuda/cuda_dnn.cc:381] Possibly insufficient driver version: 384.183.0
    2019-05-27 19:08:08.050378: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cudnn_rnn_ops.cc:1214: Unknown: Fail to find the dnn implementation.
    2019-05-27 19:08:08.050483: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
    2019-05-27 19:08:08.050523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:381] Possibly insufficient driver version: 384.183.0
    2019-05-27 19:08:08.050541: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cudnn_rnn_ops.cc:1214: Unknown: Fail to find the dnn implementation.
    Traceback (most recent call last):
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
    tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
    [[{{node cu_dnngru/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnngru/transpose, cu_dnngru/ExpandDims, gradients/while/Shape/Enter_grad/zeros/Const, cu_dnngru/concat)]]
    [[{{node mean_squared_error/value/_37}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1756_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    File "ta_skenario1.py", line 271, in <module>
    losss, op = sess.run([loss, optimizer], feed_dict={x:data,y_label:label,initial_input:begin_sentence})
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
    [[node cu_dnngru/CudnnRNN (defined at ta_skenario1.py:205) = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnngru/transpose, cu_dnngru/ExpandDims, gradients/while/Shape/Enter_grad/zeros/Const, cu_dnngru/concat)]]
    [[{{node mean_squared_error/value/_37}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1756_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    Caused by op 'cu_dnngru/CudnnRNN', defined at:
    File "ta_skenario1.py", line 205, in <module>
    encoder_outputs, state_h = encoder_LSTM(x)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 619, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 109, in call
    output, states = self._process_batch(inputs, initial_state)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 299, in _process_batch
    rnn_mode='gru')
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 116, in cudnn_rnn
    is_training=is_training, name=name)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
    File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()
    UnknownError (see above for traceback): Fail to find the dnn implementation.
    [[node cu_dnngru/CudnnRNN (defined at ta_skenario1.py:205) = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnngru/transpose, cu_dnngru/ExpandDims, gradients/while/Shape/Enter_grad/zeros/Const, cu_dnngru/concat)]]
    [[{{node mean_squared_error/value/_37}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1756_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Any ideas? Thanks.

UPDATE: I tried downgrading cuDNN from 7.5.0 to 7.1.4, but the result is still the same.
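The log also prints "Possibly insufficient driver version: 384.183.0". One way to test that hint is to compare the installed driver against NVIDIA's minimum Linux driver for the CUDA release in use. Below is a minimal sketch of that check. The two table entries (384.81 for CUDA 9.0, 410.48 for CUDA 10.0) come from NVIDIA's CUDA Toolkit release notes and should be re-checked there for your exact toolkit build; the function names are hypothetical helpers, not part of any NVIDIA or TensorFlow API.

```python
# Minimum Linux driver versions per CUDA release, from NVIDIA's CUDA Toolkit
# release notes (only the entries needed here; extend as required).
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),   # CUDA 9.0.76
    "10.0": (410, 48),  # CUDA 10.0.130
}


def parse_driver_version(text):
    """Turn a driver string such as '384.183.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda(driver, cuda):
    """Return True if the installed driver meets the minimum for `cuda`.

    Versions are compared as integer tuples; comparing the raw strings
    would wrongly rank '384.183' below '384.81'.
    """
    installed = parse_driver_version(driver)
    required = MIN_LINUX_DRIVER[cuda]
    # Compare only as many components as the table entry specifies.
    return installed[: len(required)] >= required


if __name__ == "__main__":
    # Driver version taken from the TensorFlow log in the question.
    print(driver_supports_cuda("384.183.0", "9.0"))   # True
    print(driver_supports_cuda("384.183.0", "10.0"))  # False
```

The prebuilt TensorFlow 1.12 GPU packages were built against CUDA 9.0, and driver 384.183 meets that minimum, which suggests the driver hint in the log is not the real cause in this setup.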


Solution

I'm not sure whether this will help, but in my case the problem was caused by using several Jupyter notebook files.

I was writing a simple neural network and decided to split it into two notebooks: one for training and one for prediction (so that anyone without the resources or time to train the network could use my model, which I had saved to a file).

If I ran the two notebooks "together", that is, first the training notebook and then the prediction notebook without shutting down the first notebook's kernel, I got this error.

Shutting down the first notebook's kernel before using the second one solved my problem.
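A likely explanation for why this works (an interpretation, not stated in the answer): by default TensorFlow 1.x maps nearly all of the memory of every visible GPU into the process, so a kernel that is still alive keeps that memory, and a second process can then fail to create a cuDNN handle (CUDNN_STATUS_NOT_INITIALIZED). The sketch below lists the processes currently holding GPU memory so a leftover kernel can be spotted. It assumes `nvidia-smi` is on the PATH and supports the `--query-compute-apps` fields shown (see `nvidia-smi --help-query-compute-apps`); `parse_compute_apps` and `list_gpu_processes` are hypothetical helper names.

```python
import subprocess

# Ask nvidia-smi for every compute process currently holding GPU memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank or unexpected rows
        pid_text, rest = line.split(",", 1)
        name, used_text = rest.rsplit(",", 1)  # tolerate commas in the name
        used_text = used_text.strip()
        # 'nounits' drops the ' MiB' suffix; some setups report '[N/A]'.
        used_mib = int(used_text) if used_text.isdigit() else None
        apps.append((int(pid_text), name.strip(), used_mib))
    return apps


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list (Python 3.5+)."""
    out = subprocess.run(QUERY, stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_compute_apps(out)


if __name__ == "__main__":
    for pid, name, used_mib in list_gpu_processes():
        print("PID {}: {} is using {} MiB".format(pid, name, used_mib))
```

Run it before starting the second notebook: if a process from the first notebook's environment still holds most of the GPU memory, shut that kernel down first.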
