chore(deps): update dependency transformers to v4.36.0 [security] - autoclosed #11099
Closed. renovate-bot wants to merge 1 commit into GoogleCloudPlatform:main.
This PR contains the following updates:
transformers: ==4.30.2 -> ==4.36.0
GitHub Vulnerability Alerts
CVE-2023-7018
Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.
CVE-2023-6730
Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.0.
Release Notes
huggingface/transformers (transformers)
v4.36.0: v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support
New model additions
Mixtral
Mixtral is the new open-source model from Mistral AI, announced in the Mixtral of Experts blog post. The model has been shown to have capabilities comparable to ChatGPT according to the benchmark results shared in the release blog post.
The architecture is a sparse Mixture of Experts with a Top-2 routing strategy, similar to the NllbMoe architecture in transformers. You can use it through the AutoModelForCausalLM interface:
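The original snippet was lost in extraction; the following is a minimal sketch, assuming the mistralai instruct checkpoint and generation settings shown below (not taken from the release notes).

```python
# Hypothetical sketch: load a Mixtral checkpoint through AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # illustrative mistralai checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("My favourite condiment is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```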
The model is compatible with existing optimisation tools such as Flash Attention 2, bitsandbytes and the PEFT library. The checkpoints are released under the mistralai organisation on the Hugging Face Hub.
Llava / BakLlava
Llava is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. In other words, it is a multi-modal version of LLMs fine-tuned for chat / instructions.
The Llava model was proposed in Improved Baselines with Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
[Llava] Add Llava to transformers by @younesbelkada in #27662
The integration also includes BakLlava, which is a Llava model trained with a Mistral backbone. The model is compatible with the "image-to-text" pipeline:
You can find all Llava weights under the llava-hf organisation on the Hub.
SeamlessM4T v2
SeamlessM4T-v2 is a collection of models designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text. It is an improvement on the previous version and was proposed in Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team from Meta AI.
For more details on the differences between v1 and v2, refer to the section Difference with SeamlessM4T-v1.
SeamlessM4T enables multiple tasks (such as speech-to-speech, speech-to-text, text-to-speech and text-to-text translation, as well as automatic speech recognition) without relying on separate models.
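As a hedged illustration of the single-model, multi-task idea (the facebook/seamless-m4t-v2-large checkpoint and the language codes below are assumptions, not taken from the release notes):

```python
# Hypothetical sketch: one SeamlessM4T v2 model handling both
# text-to-speech and text-to-text translation.
from transformers import AutoProcessor, SeamlessM4Tv2Model

model_id = "facebook/seamless-m4t-v2-large"  # illustrative checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4Tv2Model.from_pretrained(model_id)

text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")

# Text-to-speech translation: returns a waveform in the target language.
audio = model.generate(**text_inputs, tgt_lang="fra")[0]

# Text-to-text translation: skip speech generation and decode the token ids.
tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))
```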
PatchTST
The PatchTST model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong and Jayant Kalagnanam.
At a high level, the model vectorizes time series into patches of a given size and encodes the resulting sequence of vectors via a Transformer, which then outputs the prediction-length forecast via an appropriate head. (The figure illustrating the model is available in the upstream release notes.)
PatchTSMixer
The PatchTSMixer model was proposed in TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong and Jayant Kalagnanam.
PatchTSMixer is a lightweight time-series modeling approach based on the MLP-Mixer architecture. In this HuggingFace implementation, we provide PatchTSMixer’s capabilities to effortlessly facilitate lightweight mixing across patches, channels, and hidden features for effective multivariate time-series modeling. It also supports various attention mechanisms starting from simple gated attention to more complex self-attention blocks that can be customized accordingly. The model can be pretrained and subsequently used for various downstream tasks such as forecasting, classification and regression.
CLVP
The CLVP (Contrastive Language-Voice Pretrained Transformer) model was proposed in Better speech synthesis through scaling by James Betker.
Phi-1/1.5
The Phi-1 model was proposed in Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li.
The Phi-1.5 model was proposed in Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
TVP
The text-visual prompting (TVP) framework was proposed in the paper Text-Visual Prompting for Efficient 2D Temporal Video Grounding by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
This research addresses temporal video grounding (TVG), which is the process of pinpointing the start and end times of specific events in a long video, as described by a text sentence. Text-visual prompting (TVP) is proposed to enhance TVG. TVP involves integrating specially designed patterns, known as 'prompts', into both the visual (image-based) and textual (word-based) input components of a TVG model. These prompts provide additional spatial-temporal context, improving the model's ability to accurately determine event timings in the video. The approach employs 2D visual inputs in place of 3D ones. Although 3D inputs offer more spatial-temporal detail, they are also more time-consuming to process. The use of 2D inputs with the prompting method aims to provide similar levels of context and accuracy more efficiently.
DINOv2 depth estimation
Depth estimation is added to the DINOv2 implementation.
ROCm support for AMD GPUs
AMD's ROCm GPU architecture is now supported across the board and fully tested in our CI with MI210/MI250 GPUs. We further enable specific hardware acceleration for ROCm in Transformers, such as Flash Attention 2, GPTQ quantization and DeepSpeed.
PyTorch scaled_dot_product_attention native support
PyTorch's torch.nn.functional.scaled_dot_product_attention operator is now supported in the most-used Transformers models and is used by default when torch>=2.1.1 is installed, allowing dispatch to memory-efficient attention and Flash Attention backend implementations with no package other than torch required. This should significantly speed up attention computation on hardware that supports these fastpaths.
While Transformers automatically handles the dispatch to use SDPA when available, it is possible to force the usage of a given attention implementation ("eager" being the manual implementation, where each operation is implemented step by step), as sketched below.
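A minimal sketch of forcing an implementation via the attn_implementation argument; the checkpoint name is illustrative and the exact snippet from the release notes was not preserved.

```python
# Force a specific attention implementation at load time. With torch>=2.1.1
# and no flag, supported models pick SDPA automatically.
from transformers import AutoModelForCausalLM

# Request the torch.nn.functional.scaled_dot_product_attention backend...
model_sdpa = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", attn_implementation="sdpa"
)

# ...or fall back to the step-by-step "eager" implementation.
model_eager = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", attn_implementation="eager"
)
```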
"eager"being the manual implementation, where each operation is implemented step by step):Training benchmark, run on A100-SXM4-80GB.
"eager", s)"sdpa", s)"eager", MB)"sdpa", MB)Inference benchmark, run on A100-SXM4-80GB.
"eager"(ms)"sdpa"(ms)New Cache abstraction & Attention Sinks support
We are rolling out a new abstraction for the past_key_values cache, which enables the use of different types of caches. For now, only llama and llama-inspired architectures (mistral, persimmon, phi) support it, with other architectures scheduled to have support in the next release. By default, a growing cache (DynamicCache) is used, which preserves the existing behavior.
This release also includes a new SinkCache cache, which implements the Attention Sinks paper. With SinkCache, the model is able to continue generating high-quality text well beyond its training sequence length! Note that it does not expand the context window, so it can't digest very long inputs; it is suited for streaming applications such as multi-round dialogues. Check this colab for an example, and see the usage sketch below.
Cache abstraction and Attention Sinks support by @tomaarsen in #26681
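A minimal sketch of using SinkCache with generate; the checkpoint name, window length and sink-token count are illustrative assumptions, not values from the release notes.

```python
# Hypothetical sketch: pass a SinkCache as past_key_values so generation keeps
# the "attention sink" tokens plus a sliding window of recent tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, SinkCache

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative llama checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer(
    "Tell me a long story about a lighthouse keeper.", return_tensors="pt"
).to(model.device)

cache = SinkCache(window_length=1024, num_sink_tokens=4)
outputs = model.generate(**inputs, max_new_tokens=256, past_key_values=cache)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```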
Safetensors as a default
We continue toggling features enabling safetensors as a default across the board, in PyTorch, Flax, and TensorFlow.
When using a PyTorch model and forcing the load of safetensors files with use_safetensors=True, if the repository does not contain the safetensors files, they will now be converted on-the-fly server-side.
from_pt flag when loading with safetensors by @LysandreJik in #27394
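For illustration, a minimal sketch of the use_safetensors=True behavior described above; the checkpoint name is an assumption.

```python
# Force loading safetensors weights; if the repo only ships PyTorch .bin
# weights, they are now converted to safetensors server-side on the fly.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased", use_safetensors=True)
```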
Breaking changes
pickle files
We now disallow the use of pickle.load internally for security purposes. To circumvent this, you can set TRUST_REMOTE_CODE=True to indicate that you would still like to load it.
pickle.load unless TRUST_REMOTE_CODE=True by @ydshieh in #27776
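A minimal sketch, assuming TRUST_REMOTE_CODE is read as an environment variable (the release notes do not show the exact mechanism):

```python
# Hypothetical sketch: opt back in to pickle-based loading by setting the
# TRUST_REMOTE_CODE environment variable before the affected code runs.
# Only do this for checkpoints/files you fully trust.
import os

os.environ["TRUST_REMOTE_CODE"] = "True"
```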
Beam score calculation for decoder-only models
In the previous implementation of beam search, when length_penalty is active, the beam score for decoder-only models was penalized by the total length of both the prompt and the generated sequence. However, the length of the prompt should not be included in the penalization step; this release fixes it.
Slight API changes/corrections
AttentionMaskConverter compatible with torch.compile(..., fullgraph=True) by @fxmarty in #27868
Bugfixes and improvements
[PEFT/Tests] Fix peft integration failing tests by @younesbelkada in #27258
[Docs/SAM] Reflect correct changes to run inference without OOM by @younesbelkada in #27268
[FA2] Add flash attention for DistilBert by @susnato in #26489
[PretrainedTokenizer] add some of the most important functions to the doc by @ArthurZucker in #27313
Kosmos2Processor batch mode by @ydshieh in #27323
[FA2] Add flash attention for GPT-Neo by @susnato in #26486
[Whisper] Add conversion script for the tokenizer by @ArthurZucker in #27338
gpt_bigcode by @susnato in #27348
[Whisper] Nit converting the tokenizer by @ArthurZucker in #27349
Kosmos-2 device issue by @ydshieh in #27346
from_pt=True by @ydshieh in #27372
torch.range in test_modeling_ibert.py by @kit1980 in #27355
[CodeLlamaTokenizer] Nit, update init to make sure the AddedTokens are not normalized because they are special by @ArthurZucker in #27359
pyproject.toml by @ydshieh in #27366
pytest.mark directly by @ydshieh in #27390
FuyuConfig by @ydshieh in #27399
Owlv2 checkpoint name and a default value in Owlv2VisionConfig by @ydshieh in #27402
circleci/create_circleci_config.py is modified by @ydshieh in #27413
[Quantization] Add str to enum conversion for AWQ by @younesbelkada in #27320
[AttentionMaskConverter] Fix-mask-inf by @ArthurZucker in #27114
examples_torch_job faster by @ydshieh in #27437
utils/not_doctested.txt by @ydshieh in #27459
[Llama + Mistral] Add attention dropout by @ArthurZucker in #27315
gradient_checkpointing_kwargs by @tomaszcichy98 in #27470
python-Levenshtein for nougat in CI image by @ydshieh in #27465
[AWQ] Addresses TODO for awq tests by @younesbelkada in #27467
[Peft] modules_to_save support for peft integration by @younesbelkada in #27466
[CI-test_torch] skip test_tf_from_pt_safetensors for 4 models by @ArthurZucker in #27481
ExponentialDecayLengthPenalty doctest by @gante in #27485
GenerationConfig.from_pretrained can return unused kwargs by @gante in #27488
[CI-test_torch] skip test_tf_from_pt_safetensors and test_assisted_decoding_sample by @ArthurZucker in #27508
[CircleCI] skip test_assisted_decoding_sample for everyone by @ArthurZucker in #27511
[tokenizers] update tokenizers version pin by @ArthurZucker in #27494
[PretrainedConfig] Improve messaging by @ArthurZucker in #27438
en/model_doc docs to Japanese by @Yuki-Imajuku in #27401
[pytest] Avoid flash attn test marker warning by @ArthurZucker in #27509
usedforsecurity=False in hashlib methods (FIPS compliance) by @Wauplin in #27483
latest-pytorch-amd for now by @ydshieh in #27541
[Styling] stylify using ruff by @ArthurZucker in #27144
convert_hf_to_openai.py script to Whisper documentation resources by @zuazo in #27590
[FA-2] Add fa2 support for from_config by @younesbelkada in #26914
large-v3 version support by @flyingleafe in #27336
[core/gradient_checkpointing] add support for old GC method by @younesbelkada in #27610
past_key_values in generate by @gante in #27612
init_git_repo by @statelesshz in #27617
use_cache=True in Flash Attention tests by @fxmarty in #27635
resize_token_embeddings by @czy-orange in #26861
[dependency] update pillow pins by @ArthurZucker in #27409
max_steps documentation regarding the end-of-training condition by @qgallouedec in #27624
[FA2] Add flash attention for opt by @susnato in #26414
save_only_model arg and simplifying FSDP integration by @pacman100 in #27652
TransfoXL by @ydshieh in #27607
[DocString] Support a revision in the docstring add_code_sample_docstrings to facilitate integrations by @ArthurZucker in #27645
TVPModelTest by @ydshieh in #27695
en/model_doc to JP by @rajveer43 in #27264
TransfoXLTokenizer.__init__ by @ydshieh in #27721
tests/utils/tiny_model_summary.json is modified by @ydshieh in #27693
~transformer. -> ~transformers. by @tomaarsen in #27740
check_runner_status.yml by @ydshieh in #27767
GenerationConfig throws an exception when generate args are passed
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Never, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.