
Michał Junczyk PRO

michaljunczyk

https://goodmike31.github.io/michaljunczyk/

AI & ML interests

Automatic Speech Recognition, Data Annotation, ML Systems Design, ML Data Management, ML Systems Evaluation

Recent Activity

liked a dataset about 2 hours ago

confit/audioset-16khz-wds

liked a dataset about 2 hours ago

fosters/be-bel-audio-corpus

liked a dataset about 2 hours ago

isLucid/liepa-2


Organizations

upvoted a collection 6 days ago

ASR

Collection

A folder collecting ASR training data • 13 items • Updated Sep 8 • 1

upvoted 2 papers 25 days ago

BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

Paper • 2510.00232 • Published 27 days ago • 15

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published 26 days ago • 86

upvoted a collection 29 days ago

ParlaSpeech

Collection

Speech + text dataset collection based on the ParlaMint data. Paper describing the construction process: https://www.arxiv.org/abs/2409.15397. • 4 items • Updated Oct 11, 2024 • 1

upvoted 2 papers about 1 month ago

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them

Paper • 2509.21117 • Published Sep 25 • 29

Granary: Speech Recognition and Translation Dataset in 25 European Languages

Paper • 2505.13404 • Published May 19 • 2

upvoted a collection about 2 months ago

Audio Datasets

Collection

5 items • Updated Feb 12 • 1

upvoted 2 papers about 2 months ago

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Paper • 2508.20088 • Published Aug 27 • 20

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28 • 63

upvoted a collection 2 months ago

DeepSeek-V3.1

Collection

4 items • Updated Sep 22 • 240

upvoted an article 7 months ago

Fine-Tune Whisper with 🤗 Transformers

Article • Published Nov 3, 2022 • 317

upvoted a paper 11 months ago

Not All LLM Reasoners Are Created Equal

Paper • 2410.01748 • Published Oct 2, 2024 • 29

upvoted an article 11 months ago

EuroLLM-9B

Article • and 5 others • Published Dec 2, 2024 • 133

upvoted a paper 11 months ago

EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 28

upvoted a collection 12 months ago

Polish Automatic Speech Recognition

Collection

3 items • Updated Jan 26, 2024 • 3

upvoted 2 articles about 1 year ago

How to build a custom text classifier without days of human labeling

Article • and 4 others • Published Oct 17, 2024 • 55

Can foundation models label data like humans?

Article • Published Jun 12, 2023 • 1

upvoted 2 collections about 1 year ago

Llama-3.1-Nemotron-70B

Collection

SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 6 days ago • 155

Bielik-11B-v2.2

Collection

A collection of models based on Bielik-11B-v2.2 - instruct and quantized versions. • 17 items • Updated Jun 6 • 28

upvoted a paper about 1 year ago

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Paper • 2401.16658 • Published Jan 30, 2024 • 14
